By Jack - on Friday, 24 November 2023
Category: Technical

Splunk: Building a Test Instance

TLDR:

Building a Splunk test instance is really useful: it helps protect against production outages, and very performant hardware is now available for a fraction of what it used to cost, if you're willing to deal with some technical challenges!

Splunk Test instance

Many of our clients rely on Splunk as their production SIEM tool to monitor, detect and respond to cyber security events. The production system is therefore critically important, and to ensure that no logs are lost and no unnecessary downtime is experienced, it is really important to have a test instance that accurately mirrors the prod environment and the apps, sourcetypes and KOs (Knowledge Objects) in place.

This does not mean you have to duplicate your hardware, disk and licensing costs exactly, but test environments do offer a great opportunity to avoid problems if correctly planned and considered. As a certified Splunk consultant, continuous personal development, practice and ongoing training are equally necessary and valuable.

Since 2020 I have relied upon an S1 (Splunk Validated Architecture, standalone server) as my training environment, based upon an Intel NUC i5 with 4 CPU cores, 16GB RAM and M.2 NVMe solid state storage. I run Ubuntu 22.04 and Splunk 9.1.x. It has served me well as a platform, and I couple it with a similar-spec syslog server running a UF (Universal Forwarder) to accurately simulate data onboarding. Whilst it has been a good platform, it is under-spec'd: Splunk recommends a minimum of 16 cores for this type of standalone hardware, and it is not up to the task of running premium apps such as Enterprise Security or SOAR, both of which I have use cases to test.

Moore's Law

Since my 2020 purchase, CPUs have continued their upward trajectory in power and downward trajectory in price; there are now some quite incredible CPUs available that in the past would only have been affordable as high-end servers. AMD has also become a significant and viable challenger to Intel in some use cases, and its extreme simultaneous multi-threading capability is very useful in some industries. In the Splunk 7.x docs Splunk used to specify an Intel CPU specifically; now it is a more generic x86 (64-bit) CPU architecture (URL). This is brilliant for end-user choice and for building high-performance systems without single-vendor lock-in.

What hardware to choose?

I had a few options open to me:


Option 1: During my university days in the early noughties, building a custom PC with LEDs, water cooling and so on was good fun as a hobbyist pastime. It isn't for the faint-hearted though, and there were always issues with compatibility, drivers and, at that time, stability. Whilst potentially the most performant option, this isn't a scalable solution, and not one which I could recommend to clients who may wish to clone our test platform architecture. I discounted this option; custom builds are best left to the gaming community, to whom, by the way, we owe a lot for modern hardware capabilities, particularly GPUs.

Option 2: is to use a NUC, or Next Unit of Computing, Intel's commercial offering of an advanced single-board computer where (typically) all you have to bring is RAM and disk. There are a bunch of different offerings, and other vendors also offer this style of hardware now. This is a great option: having worked on numerous NUCs I know that they're stable, have Linux hardware drivers (for the most part) and allow you to get on with the platform build without getting bogged down in hardware challenges; however, more on this later.

Option 3: is to go to a reputable vendor and purchase a commercial-off-the-shelf (COTS) platform. This sounds sensible, but the reality is that the hardware on offer is stale, often last-generation and overpriced. For example, at the time of writing a small Dell PowerEdge tower server will set you back £3,300 for a single Xeon Silver CPU at 2.8GHz with a paltry 16GB RAM and a mechanical SAS hard drive; this is very much tired hardware IMHO in 2023, and I bet the IOPS (input/output operations per second, a measure of disk speed) are miserable, probably around 600-800.

Option 2 it was, and I selected the Intel NUC 13 i9 with 24 cores (NUC13RNGi9, vendor URL). Note that it has recently been discontinued as a product and can be snapped up for an absolute bargain.

Linux OR Linux

Ok, you can run Windows for Splunk Enterprise, but it is less performant, especially in the DS (Deployment Server) role, where it has a fifth of the capability. I also want to train on apps such as Splunk Enterprise Security, which does not currently support Windows at all. So the choice is Linux OR Linux (a great choice to have), and my preferred distro at present is Ubuntu Server 22.04; there are many respected variants, and the *nix community is vibrant, helpful and enjoyable to be a part of. Most clients running Splunk in production run on RHEL or, in the past, CentOS, a clear replacement for which is still relatively unsettled.

Putting it all together

Great, so NUC ordered, with 64GB of RAM in 2x32GB DDR5-4800 format from Corsair and two blisteringly fast NVMe drives. The idea with storage is that you have a smaller, maximum-speed OS disk and then a larger, slightly less performant index storage disk. In this case both offer a whopping 7,000MB/s, which is faster than anything server hardware can offer in 2.5-inch format.

The kit arrived less than 24 hours later, and I've put together a video of the process to show how easy it was to do. However, nothing in life is quite that straightforward, is it?

That wasn't supposed to happen!

I carefully inserted the RAM and storage, observing anti-static precautions, and powered it up, waiting for the roar of the platform bursting into life. Nothing, absolutely nothing at all; perhaps a small twitch of the CPU cooling fan, but that was it. Ok, start small and work it through: I reduced it to a single RAM module and no storage and built it back up. Still nothing. Ok, let's read the docs again, slowly. It was all rather frustrating for such a simple build; everything looked fine and the RAM was pushed in fully.

Technical Product Specifications for Intel® NUC Products

Tested Peripherals for Intel® NUC Products

On checking the compatibility docs, a few things jumped out. Firstly, only a small number of select vendors are listed as Intel lab verified; this shouldn't be a problem, but it does create some doubt. The next issue is that the docs stated the maximum supported SO-DIMM density was 4, 8 and 16 GB memory technology with SPD support. This was really misleading, as the maximum supported RAM for the NUC is 64GB, so how can you reach that in two slots if the maximum density is 16GB per stick? A support ticket and a little frustration later, it was clarified to me that this means per side of the module, but that is not stated in the docs! So in theory the RAM I had ordered should be fine despite not being validated.

Sadly I had to RMA it and switch to Crucial-branded RAM, which is fine, a reputable brand with equal performance stats, but it was just annoying that the docs are not clearer and that such a problem exists in 2023 with relatively commodity hardware. The original vendor was good about the RMA; apparently they were very well accustomed to Intel's compatibility page!

Fingers crossed...

With a barely perceptible sound the platform started. I wasn't sure the NUC had in fact powered up, and it was only the monitor output to the UEFI setup that alerted me that all was now well. I created a bootable Ubuntu USB using the following sequence:
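As a minimal sketch of that sequence, assuming you're working from another Linux machine and the USB stick shows up as /dev/sdX (the ISO filename and device name are examples; identify the real device with lsblk first, as dd will destroy whatever it is pointed at):

```shell
# Download the Ubuntu Server ISO and verify its checksum
wget https://releases.ubuntu.com/22.04/ubuntu-22.04.3-live-server-amd64.iso
wget https://releases.ubuntu.com/22.04/SHA256SUMS
sha256sum -c SHA256SUMS --ignore-missing

# Identify the USB stick CAREFULLY before writing to it
lsblk

# Write the ISO to the whole device (not a partition) and flush to disk
sudo dd if=ubuntu-22.04.3-live-server-amd64.iso of=/dev/sdX \
    bs=4M status=progress conv=fsync
```

Tools such as balenaEtcher or Rufus achieve the same result with a GUI if you'd rather not risk a mistyped device name.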

Ubuntu Build

With the platform live and at the command prompt, I set about the very basics of system administration before proceeding with a standard professional Splunk install. This is the recommended starting point; you also need to consider UFW (firewall configuration) and hardening to CIS benchmarks, depending on your environment.
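As an illustration of the UFW piece, a minimal ruleset for a standalone instance might look like the following (ports reflect Splunk's defaults: 8000 for Splunk Web, 8089 for management/REST and 9997 for forwarder data; restricting source ranges is left out here and should be added for anything beyond a lab):

```shell
# Allow SSH first so you don't lock yourself out of the box
sudo ufw allow OpenSSH

# Splunk Web, management port and inbound forwarder data
sudo ufw allow 8000/tcp
sudo ufw allow 8089/tcp
sudo ufw allow 9997/tcp

sudo ufw enable
sudo ufw status verbose
```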

Disk setup

Ok, this again proved slightly trickier than one might first imagine. On my previous instance I had a single storage 'disk' for the platform; I was now splitting into an OS disk and an index disk. This should simply be a case of using 'fdisk' and creating a new partition. However, this sits outside the responsibility of a Splunk engagement; it is up to the client's IT teams to undertake, so it isn't something we do as consultants. After working with a colleague, I found the problem was that I had not created a partition on the 'splunk-indexes' disk, so it couldn't be correctly seen by the OS and wasn't available for the 'splunkd' process to write to. Ok, fixed that by creating a proper partition.
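For reference, a minimal sketch of that fix, assuming the second NVMe drive appears as /dev/nvme1n1 and the indexes will live at /opt/splunk-indexes (both the device name and mount point are assumptions; confirm with lsblk before touching anything):

```shell
# Create a GPT label and a single partition spanning the whole disk
sudo parted /dev/nvme1n1 --script mklabel gpt mkpart primary ext4 0% 100%

# Format the new partition and give it a label for stable mounting
sudo mkfs.ext4 -L splunk-indexes /dev/nvme1n1p1

# Mount it now and persist the mount across reboots via /etc/fstab
sudo mkdir -p /opt/splunk-indexes
echo 'LABEL=splunk-indexes /opt/splunk-indexes ext4 defaults,noatime 0 2' | sudo tee -a /etc/fstab
sudo mount -a
```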

Splunk install

This is a blog topic in itself really, so I'll stick to the high level: setting THP, creating a systemd service, configuring ulimits and a splunk user, 'chowning' the /opt/splunk directory, migrating the splunk.secret file before first-time run, and so on. It is quite involved to deploy Splunk correctly and reliably; there is some guidance on the Splunk docs page, but it is worth having a professional admin check it fully if you're then going to clone it into production.
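In outline, and purely as a sketch (the tarball name and paths are assumptions; check the current download page and docs for your version), those steps look something like:

```shell
# Dedicated, non-login service account for splunkd
sudo useradd -m -r -s /usr/sbin/nologin splunk

# Unpack Splunk Enterprise under /opt and fix ownership
sudo tar -xzf splunk-9.1.x-linux-x86_64.tgz -C /opt
sudo chown -R splunk:splunk /opt/splunk

# Disable Transparent Huge Pages for this boot; persist the change via a
# systemd unit or the kernel command line in a real build
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled

# Accept the licence on first run as the splunk user
sudo -u splunk /opt/splunk/bin/splunk start --accept-license --answer-yes

# Let Splunk generate a systemd unit, which also carries the ulimit
# settings (LimitNOFILE and friends) for the service
sudo -u splunk /opt/splunk/bin/splunk stop
sudo /opt/splunk/bin/splunk enable boot-start -systemd-managed 1 -user splunk
```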

Install on Linux - Splunk Documentation

24 CPU cores

The output below shows 'htop' and the huge number of physical cores now available to the platform; this is before the Splunk install and the arrival of the second stick of RAM. The CPU architecture on this processor type is novel, mixing performance and efficiency cores, which will be interesting to test. It may be that Splunk performs poorly on the efficiency cores, but this is a lab and a great place to trial such things. Selecting an enterprise-grade Xeon would remove this uncertainty, but at a cost!
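A quick way to inspect the topology from the command line (on hybrid 13th-gen parts the P-core/E-core split shows up in the differing maximum clock speeds):

```shell
# Summarise the CPU model and core/thread counts
lscpu | grep -E 'Model name|^CPU\(s\)|Core\(s\)|Thread\(s\)'

# Total logical CPUs visible to the scheduler
nproc
```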

FIO / IOPS check:

To verify the IOPS, or disk speed, I installed 'fio' and ran an 8GB file write test; as you can see, this resulted in 342k read IOPS and 114k write IOPS. To put it into perspective, the Splunk recommended minimum is 800 IOPS; this is therefore roughly 143x the minimum write speed and 428x the minimum read speed.
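For anyone wanting to repeat the test, a representative fio invocation would be something like the following (the target directory and job parameters are assumptions, not my exact job file; tune block size and queue depth to your workload):

```shell
# 8GB mixed random read/write test with direct I/O against the index disk
fio --name=splunk-iops --directory=/opt/splunk-indexes \
    --size=8G --rw=randrw --rwmixread=75 --bs=4k \
    --ioengine=libaio --iodepth=64 --direct=1 \
    --runtime=60 --time_based --group_reporting
```

The read and write IOPS figures are reported in the job summary at the end of the run.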

Splunk Volumes

The final detail to change was the volume settings for the data migrated from my old NUC to the new one. I'd SCP'd the data over to the 1TB partition and the files were all fine. However, I needed to adjust the 'indexes.conf' file in Splunk to use the new volume; in the past it had been a single disk. This is relatively easily achieved, but you have to pay attention to the docs: the 'thawedPath' must be a direct local 'SPLUNK_DB'-style path rather than a volume reference, which tripped me up at first with no useful output in the error log.
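As an illustrative fragment (the volume name, size cap and paths are assumptions for a lab, not the exact values from my config):

```ini
# indexes.conf -- point hot/cold buckets at the new index disk via a volume
[volume:primary]
path = /opt/splunk-indexes
maxVolumeDataSizeMB = 900000

[main]
homePath = volume:primary/defaultdb/db
coldPath = volume:primary/defaultdb/colddb
# thawedPath cannot reference a volume; it must be a direct path
thawedPath = $SPLUNK_DB/defaultdb/thaweddb
```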

After that, only apps needed to be pulled down from GitHub or migrated in. I'm very pleased with the performance thus far and, as you can see from this MC (Monitoring Console) snapshot, it is ready for Premium App installation. I'll follow up with another post about how that goes in the coming weeks.

Leave Comments