
CND News and Blog

Splunk: Building a Test Instance


TLDR:

Building a Splunk test instance is really useful and helps protect against production outages, and very performant hardware is now available for a fraction of the price it used to be, if you're willing to deal with some technical challenges!

Splunk Test instance

Many of our clients rely on Splunk as their production SIEM tool to monitor, detect and respond to cyber security events. The production system is therefore critically important, and to ensure that no logs are lost and no unnecessary downtime is experienced, it is really important to have a test instance that accurately mirrors the production environment, including the apps, sourcetypes and KOs (Knowledge Objects) in place.

This does not mean you have to duplicate your hardware exactly, or your disk and licensing costs, but a correctly planned and considered test environment offers a great opportunity to avoid problems. As a certified Splunk consultant, I also consider continuous personal development, practice and ongoing training equally necessary and valuable.

Since 2020 I have relied upon an S1 (Splunk Validated Architecture, standalone server) as my training environment, based on an Intel NUC i5 with 4 CPU cores, 16GB RAM and M.2 NVMe solid-state storage. I run Ubuntu 22.04 and Splunk 9.1.x. It has served me well as a platform, and I couple it with a similar-spec syslog server running a UF (Universal Forwarder) to accurately simulate data onboarding. Whilst it has been a good platform, it is under-spec'd: Splunk recommends a minimum of 16 cores for this type of standalone hardware. It is also not up to the task of running premium apps such as Enterprise Security or SOAR, both of which I have use cases to test.

Moore's Law

Since my 2020 purchase, CPUs have continued their upward power trajectory while prices have fallen, and there are now some quite incredible CPUs available that in the past would only have been affordable in high-end servers. AMD has also become a significant and viable challenger to Intel in some use cases, and its extreme multi-threading capability is very useful in some industries. In the Splunk 7.x docs, Splunk specified an Intel CPU; it now specifies a more generic x86 (64-bit) CPU architecture (URL). This is brilliant for end-user choice and for building high-performance systems without single-vendor lock-in.

What hardware to choose?

I had a few options open to me:

  1. Custom hardware build, with appropriate CPU (Xeon, Intel, AMD)
  2. Barebones NUC-type hardware (bring your own RAM and disk)
  3. Off-the-shelf hardware, e.g. Dell PowerEdge

Option 1: During my university days in the early noughties, building a custom PC with LEDs, water cooling etc. was good fun as a hobbyist pastime. It isn't for the faint-hearted, though, and there were always issues with compatibility, drivers and, at that time, stability. Whilst potentially the most performant option, this isn't a scalable solution and not one I could recommend to clients who may wish to clone our test platform architecture. I discounted this option; leave custom builds to the gaming community, to whom, by the way, we owe a lot for modern hardware capabilities, particularly GPUs.

Option 2 is to use a NUC (Next Unit of Computing), Intel's commercial offering of an advanced small-form-factor computer where all you (typically) have to do is bring RAM and disk. There are a bunch of different offerings, and other vendors now offer this style of hardware too. This is a great option: having worked on numerous NUCs, I know that they're stable, have Linux hardware drivers (for the most part) and allow you to get on with the platform build without getting bogged down in hardware challenges; more on this later, however.

Option 3 is to go to a reputable vendor and purchase a commercial-off-the-shelf (COTS) platform. This sounds sensible, but the reality is that the hardware on offer is stale, often last generation and overpriced. For example, at the time of writing a Dell PowerEdge small tower server will set you back £3,300 for a single Xeon Silver CPU at 2.8GHz with a paltry 16GB RAM and a mechanical SAS hard drive. This is very much tired hardware in 2023 IMHO, and I bet the IOPS (input/output operations per second) are miserable, probably no more than a few hundred and short of the 800 IOPS that Splunk recommends.

Option 2 it was, and I selected the Intel NUC 13 i9 with 24 cores (NUC13RNGi9, vendor URL). Note that it has recently been discontinued as a product and can be snapped up for an absolute bargain.

Linux OR Linux

OK, you can run Windows for Splunk Enterprise, but it is less performant (especially in the deployment server (DS) role, where it has 1/5th of the capability), and I want to train on apps such as Splunk Enterprise Security, which does not currently support Windows at all. So the choice is Linux OR Linux (a great choice to have), and my preferred distro at present is Ubuntu Server 22.04; there are many respected variants, and the *nix community is vibrant, helpful and enjoyable to be a part of. Most clients running Splunk in production run it on RHEL or, in the past, CentOS, for which a replacement is still relatively unclear.

Putting it all together

Great, so the NUC was ordered, with 64GB of RAM (2x32GB DDR5-4800 from Corsair) and two blisteringly fast NVMe drives. The idea with storage is that you have a smaller, maximum-speed OS disk and a larger, slightly less performant disk for index storage. In this case both are rated at a whopping 7000MB/s, far faster than 2.5-inch SATA or SAS server drives can offer.

The kit arrived less than 24 hours later, and I've put together a video of the process to show how easy it was to do. However, nothing in life is quite that straightforward, is it?

That wasn't supposed to happen!

I carefully inserted the RAM and storage, observing anti-static precautions, and powered it up, waiting for the roar of the platform bursting into life. Nothing, absolutely nothing at all; perhaps a small twitch of the CPU cooling fan, but that was it. OK, start small and work it through: I reduced it to a single RAM module and no storage, intending to build it back up. Still nothing. OK, let's read the docs again, slowly. It was all rather frustrating for such a simple build, and everything looked fine, with the RAM pushed in fully.

Technical Product Specifications for Intel® NUC Products

Tested Peripherals for Intel® NUC Products

On checking the compatibility docs, a few things jumped out. Firstly, only a small number of select vendors are listed as Intel-labs verified; this shouldn't be a problem, but it does create some doubt. The next issue is that the docs state that the maximum SO-DIMM density is 4, 8 and 16 GB memory technology with SPD support. This was really misleading, as the maximum supported memory for the NUC is 64GB, so how can you reach that in two slots if the maximum density is 16GB per stick? A support ticket and a little frustration later, it was clarified to me that the density is per side, which the docs do not state: 2 slots x 2 sides x 16GB = 64GB. So in theory the RAM I had ordered should have been OK despite not being validated.

Sadly, in practice I had to RMA it and switch to Crucial-branded RAM, which is fine, being a reputable brand with equal performance stats, but it was annoying that the docs are not clearer and that there is a problem like this in 2023 with relatively commodity hardware. The original vendor was good about the RMA and was apparently very well accustomed to Intel's compatibility page!

Fingers crossed...

With a barely perceptible sound the platform started; I wasn't sure the NUC had in fact powered up, and it was only the monitor output of the UEFI setup that told me all was now well. I created a bootable Ubuntu USB using the following sequence:

# Creating a Bootable Ubuntu Disk:

<pre>
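# identify the USB disk (here /dev/sdb)
sudo fdisk -l

# verify the Ubuntu image name
ls ~/Downloads

# write the image to the whole USB device (/dev/sdb), not to a partition such as /dev/sdb1;
# dd overwrites the target without asking, so double-check the device name first
~/Downloads$ sudo dd bs=4M if=ubuntu-22.04.3-live-server-amd64.iso of=/dev/sdb status=progress oflag=sync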
2130706432 bytes (2.1 GB, 2.0 GiB) copied, 350 s, 6.1 MB/s 
508+1 records in
508+1 records out
2133391360 bytes (2.1 GB, 2.0 GiB) copied, 349.935 s, 6.1 MB/s
</pre> 
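Before writing the image, it is worth confirming the download is intact. Ubuntu publishes a `SHA256SUMS` file alongside each release; the sketch below assumes you have downloaded it into the same directory as the ISO. `verify_iso` is a hypothetical helper name, not part of any Ubuntu tooling.

```shell
# Hypothetical helper: check the ISO(s) in a directory against Ubuntu's SHA256SUMS file.
# Assumes SHA256SUMS (from the Ubuntu release page) sits next to the ISO.
# $1 = directory containing both files (default: current directory)
verify_iso() {
  dir="${1:-.}"
  # --ignore-missing skips entries for images you did not download;
  # sha256sum exits non-zero on a mismatch or if nothing was verified at all
  ( cd "$dir" && sha256sum --ignore-missing -c SHA256SUMS )
}

# Usage, before running dd:
#   verify_iso ~/Downloads && echo "checksum OK, safe to write"
```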

Ubuntu Build

With the platform live and at the command prompt, I set about the very basics of system administration before proceeding with a standard professional Splunk install. This is a recommended starting point; you also need to consider UFW (firewall) configuration and hardening to CIS benchmarks, depending on your environment.

# Update the platform
<pre>
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install net-tools
</pre>

# Set up the 'root' user
- Save new password to password manager
<pre>
  sudo passwd root
  # Enter password
  # Re-enter password
  su -
  exit
</pre>

# IP Setup
- [how to](https://www.linuxtechi.com/static-ip-address-on-ubuntu-server/)
<pre>
  cd /etc/netplan
  # Create a backup before starting
  sudo cp 00-installer-config.yaml 00-installer-config.yaml.bak

  sudo nano 00-installer-config.yaml

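  # Edit the file to match the example netplan below, then test it:
  # 'netplan try' applies the change and rolls it back automatically
  # unless you confirm it (by default within 120 seconds)
  sudo netplan try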
</pre>

### Example Netplan:
<pre>
  # This is the network config written by 'subiquity'
network:
  ethernets:
    enp3s0:
      addresses:
        - 192.168.x.x/24
      nameservers:
        addresses: [x.x.x.x, x.x.x.x]
      routes:
      - to: default
        via: 192.168.x.x
  version: 2
</pre>

## Disable Wireless RF
- Prevents unnecessary access to the platform when it is connected via Ethernet
<pre>
  sudo apt-get install network-manager
  sudo apt install rfkill

  #nmcli
  nmcli radio wifi
  sudo nmcli radio wifi off
  nmcli radio wifi

  #rfkill
  rfkill list
  sudo rfkill block bluetooth 
  sudo rfkill block wlan
  rfkill list
</pre>

- Reboot
- Re-check wireless is still OFF.
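The re-check after the reboot can be scripted. A minimal sketch, assuming the text format printed by `rfkill list` (one `Soft blocked: yes/no` line per device); `radios_blocked` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if no radio device reports "Soft blocked: no".
# Parses the text output of `rfkill list`; hard blocks are not inspected.
radios_blocked() {
  if rfkill list | grep -q 'Soft blocked: no'; then
    echo "at least one radio is still enabled" >&2
    return 1
  fi
  return 0
}

# Usage after the reboot:
#   radios_blocked && echo "wireless and bluetooth are off"
```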

# Time setup
- It is critical to set up time sync early in the build
<pre>
timedatectl status
timedatectl timesync-status
sudo timedatectl set-timezone UTC
timedatectl show-timesync --all

sudo nano /etc/systemd/timesyncd.conf
# set the servers under the existing [Time] section:
[Time]
NTP=time.cloudflare.com uk.pool.ntp.org
#exit

sudo systemctl restart systemd-timesyncd.service

timedatectl show-timesync --all
timedatectl status

# Ensure time is correct before moving onwards
</pre>
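"Ensure time is correct before moving onwards" can be turned into a check that waits for synchronisation. A minimal sketch, assuming systemd's `timedatectl show` with the `NTPSynchronized` property (available on Ubuntu 22.04); `wait_for_ntp` and its retry defaults are hypothetical choices.

```shell
# Hypothetical helper: poll systemd until the clock reports NTP synchronisation.
# $1 = number of attempts (default 30), $2 = seconds between attempts (default 2)
wait_for_ntp() {
  tries="${1:-30}"
  interval="${2:-2}"
  while [ "$tries" -gt 0 ]; do
    # NTPSynchronized=yes means the kernel reports the system clock as synchronised
    if [ "$(timedatectl show --property=NTPSynchronized --value)" = "yes" ]; then
      echo "clock synchronised"
      return 0
    fi
    tries=$((tries - 1))
    sleep "$interval"
  done
  echo "clock NOT synchronised; check /etc/systemd/timesyncd.conf" >&2
  return 1
}

# Usage:
#   wait_for_ntp || echo "fix time sync before installing Splunk"
```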

## Add legal warning login banner

<pre>
sudo nano /etc/issue.net
  

************************************************
* Unauthorised access or use of this equipment *
*   is prohibited and constitutes an offence   *
*     under the Computer Misuse Act 1990.      *
*    If you are not authorised to use this     *
*     system, terminate this session now.      *
************************************************

#exit
sudo nano /etc/ssh/sshd_config

# Update the following to include the path:
# no default banner path
Banner /etc/issue.net

#exit
# check the sshd_config syntax first; no output means it parsed cleanly
sudo sshd -t
sudo systemctl restart sshd.service

# log off via ssh
exit

# log in via ssh
</pre>
- Verify the new banner is there
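It is also worth confirming that the directive is active rather than still commented out. A minimal sketch; `banner_configured` is a hypothetical helper, and it only reads the main config file, so it will not see settings placed in `/etc/ssh/sshd_config.d/` drop-ins (`sudo sshd -T | grep -i banner`, which prints the effective configuration, covers those).

```shell
# Hypothetical helper: succeed if an uncommented "Banner <path>" line exists.
# $1 = expected banner path, $2 = config file (default /etc/ssh/sshd_config)
banner_configured() {
  # sshd keywords are case-insensitive, so match "Banner" with -i
  grep -Eiq "^[[:space:]]*Banner[[:space:]]+$1[[:space:]]*\$" "${2:-/etc/ssh/sshd_config}"
}

# Usage:
#   banner_configured /etc/issue.net || echo "Banner directive missing or commented out"
```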

## Install Lynis
- [Lynis](https://cisofy.com/lynis/)
<pre>
  sudo apt install lynis
  # Add basic Debian hardening packages
  sudo apt-get install libpam-tmpdir apt-listchanges fail2ban
</pre>

- Run lynis
<pre>
 # Example: save the audit to a dated file (YYYY-MM-DD) in your home directory
 sudo /usr/sbin/lynis audit system > ~/"$(date +%F)"-lynis-audit.txt
 
 more ~/"$(date +%F)"-lynis-audit.txt

</pre>
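Lynis also writes a machine-readable report, by default `/var/log/lynis-report.dat`. The sketch below assumes that file uses `key=value` lines such as `hardening_index=<n>`, `warning[]=<details>` and `suggestion[]=<details>`, which is how I understand the format; check your own report before relying on it. `lynis_summary` is a hypothetical helper, and as the report is normally only readable by root, run it from a root shell (e.g. after `su -`).

```shell
# Hypothetical helper: one-line summary of a Lynis report file.
# Assumes key=value lines: hardening_index=<n>, warning[]=..., suggestion[]=...
lynis_summary() {
  report="${1:-/var/log/lynis-report.dat}"
  index=$(sed -n 's/^hardening_index=//p' "$report")
  # grep -c prints 0 but exits 1 when nothing matches, hence "|| true"
  warnings=$(grep -c '^warning\[\]=' "$report" || true)
  suggestions=$(grep -c '^suggestion\[\]=' "$report" || true)
  echo "hardening_index=${index:-unknown} warnings=$warnings suggestions=$suggestions"
}

# Usage, from a root shell after an audit run:
#   lynis_summary
```

Tracking the hardening index between runs is a simple way to see whether each hardening change actually moved the needle.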
 

Disk setup

OK, this again proved to be slightly trickier than one might first imagine. On my previous instance I had a single storage 'disk' for the platform; I was now splitting it into an OS disk and an index disk. This should simply be a case of using 'fdisk' to create a new partition. However, this is not normally within the scope of a Splunk engagement and is up to the client's IT teams to undertake, so it isn't something we as consultants usually do. After working with a colleague, I found that the problem was that I had not created a partition on the 'splunk-indexes' disk (the filesystem had gone straight onto the raw disk), so it couldn't be correctly seen by the OS and wasn't available for the 'splunkd' process to write to. OK, I fixed that by creating a proper partition.

# Incorrectly mounted:
- There is no partition on nvme1n1; it needs a partition, which can then be mounted:

<pre>
lsblk -f 

james@nuc13:/mnt/splunk-indexes$ lsblk -f
NAME        FSTYPE FSVER LABEL UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
loop0                                                                     0   100% /snap/canonical-livepatch/246
loop1                                                                     0   100% /snap/core/16202
loop2                                                                     0   100% /snap/core20/1974
loop3                                                                     0   100% /snap/core20/2015
loop4                                                                     0   100% /snap/lxd/24322
loop5                                                                     0   100% /snap/snapd/19457
loop6                                                                     0   100% /snap/snapd/20290
nvme0n1                                                                            
├─nvme0n1p1 vfat   FAT32       F6FB-9CC3                                 1G     1% /boot/efi
└─nvme0n1p2 ext4   1.0         87f212dc-bce6-4d43-9bbb-8d9f23a5af6b  289.7G    31% /
nvme1n1     ext4   1.0         8d5202b4-29be-4ffe-8683-4e87a830f572  869.2G     0% /mnt/splunk-indexes
</pre>


# Linux fdisk build example:

<pre>
# Check which device is the empty index disk first: NVMe device names
# (nvme0n1, nvme1n1) are not guaranteed to stay the same across reboots
lsblk -f

sudo fdisk /dev/nvme0n1
# n   new partition (primary, if asked); accept the defaults to use the whole disk
# w   write the partition table and exit

sudo mkfs.ext4 -F /dev/nvme0n1p1

sudo mkdir -p /splunk-indexes
sudo mount /dev/nvme0n1p1 /splunk-indexes

# Output captured during the earlier, unpartitioned attempts. The same
# physical disks show up as nvme0n1 and nvme1n1 at different times.

# df -h: the index disk is mounted with no partition number
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           3.1G  2.0M  3.1G   1% /run
/dev/nvme1n1p2  457G  144G  290G  34% /
tmpfs            16G     0   16G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
/dev/nvme1n1p1  1.1G  6.1M  1.1G   1% /boot/efi
tmpfs           3.1G  4.0K  3.1G   1% /run/user/1000
/dev/nvme0n1    916G   28K  870G   1% /splunk-indexes

# fdisk -l output for the index disk
Disk /dev/nvme1n1: 931.51 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: CT1000P5PSSD8
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 5CDDBFDD-BECA-7A47-AA42-263E2B420203

# formatting the whole disk instead of a partition
james@nuc13:~$ sudo mkfs.ext4 -F /dev/nvme1n1
mke2fs 1.46.5 (30-Dec-2021)
Found a gpt partition table in /dev/nvme1n1
Discarding device blocks: done
Creating filesystem with 244190646 4k blocks and 61054976 inodes
Filesystem UUID: 8d5202b4-29be-4ffe-8683-4e87a830f572
Superblock backups stored on blocks: 
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
	4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968, 
	102400000, 214990848

Allocating group tables: done
Writing inode tables: done
Creating journal (262144 blocks): done
Writing superblocks and filesystem accounting information: done

# resulting mount
/dev/nvme1n1 on /mnt/splunk-indexes type ext4 (rw,relatime)

</pre>
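A manual `mount` does not survive a reboot, and as the outputs above show, the NVMe device names can swap between boots. Referencing the filesystem by UUID in `/etc/fstab` avoids both problems. A minimal sketch; `fstab_line` is a hypothetical helper, and the mount options (`defaults`, fsck pass `2`) are my assumption rather than a Splunk requirement.

```shell
# Hypothetical helper: print an /etc/fstab entry that mounts an ext4 filesystem by UUID.
# $1 = filesystem UUID, $2 = mount point
fstab_line() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "usage: fstab_line <uuid> <mount point>" >&2
    return 1
  fi
  # fields: device, mount point, type, options, dump, fsck pass order
  printf 'UUID=%s %s ext4 defaults 0 2\n' "$1" "$2"
}

# Usage (confirm the device name with lsblk first):
#   uuid=$(sudo blkid -s UUID -o value /dev/nvme0n1p1)
#   fstab_line "$uuid" /splunk-indexes | sudo tee -a /etc/fstab
#   sudo findmnt --verify    # sanity-check fstab before rebooting
#   sudo mount -a
```

Adding `nofail` to the options would stop a missing disk from blocking the boot, but splunkd could then start writing into the empty `/splunk-indexes` directory on the OS disk, which is why it is left out here.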

Splunk install

This is a blog topic in itself really, so I'll stick to the high level: disabling THP (Transparent Huge Pages), creating a systemd service, configuring ulimits and a splunk user, 'chowning' the /opt/splunk directory, migrating the splunk.secret file before the first run, etc. It is quite involved to deploy Splunk correctly and reliably; there is some guidance on the Splunk docs page, but it is worth having a professional admin check it fully if you're then going to clone it into production.

Install on Linux - Splunk Documentation
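Two of those prerequisites are easy to check once Splunk is running. A sketch under a few assumptions: THP is read from the standard kernel sysfs files (the active value is the one in square brackets, and Splunk's guidance is to disable THP), and the minimums of 64000 open files and 16000 processes are as I understand Splunk's *nix resource-limit guidance, so confirm them against the current docs. `thp_mode` and `limit_ok` are hypothetical helpers.

```shell
# Hypothetical helper: print the active THP setting, i.e. the value in [brackets]
# $1 = sysfs file (default: the THP "enabled" file)
thp_mode() {
  sed -n 's/.*\[\([^]]*\)\].*/\1/p' "${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
}

# Hypothetical helper: succeed if a limit is "unlimited" or at least a minimum
# $1 = actual value, $2 = minimum
limit_ok() {
  [ "$1" = "unlimited" ] || [ "$1" -ge "$2" ]
}

# Usage, against the running splunkd rather than your login shell:
#   [ "$(thp_mode)" = "never" ] || echo "THP is still enabled"
#   [ "$(thp_mode /sys/kernel/mm/transparent_hugepage/defrag)" = "never" ] || echo "THP defrag is still enabled"
#   pid=$(pgrep -o splunkd)
#   limit_ok "$(awk '/^Max open files/ {print $4}' /proc/"$pid"/limits)" 64000 || echo "open files too low"
#   limit_ok "$(awk '/^Max processes/ {print $3}' /proc/"$pid"/limits)" 16000 || echo "processes too low"
```

Checking `/proc/<pid>/limits` matters because a systemd-managed splunkd takes its limits from the unit file, not from `/etc/security/limits.conf` or your shell's `ulimit`.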

24 CPU cores

The output below shows 'htop' and the huge number of physical cores now available to the platform; this was before the Splunk install and before the second stick of RAM arrived. The CPU architecture on this processor type is novel and mixes performance and efficiency cores, which will be interesting to test. It may be that Splunk performs poorly on the efficiency cores, but this is a lab and a great place to trial such things. Selecting an enterprise-grade Xeon would remove this uncertainty, but at a cost!
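To trial Splunk on one core type or the other, you first need to know which logical CPUs are which. On recent kernels, Intel hybrid CPUs appear to expose the lists via `/sys/devices/cpu_core/cpus` (performance cores) and `/sys/devices/cpu_atom/cpus` (efficiency cores); treat those paths as an assumption and check they exist on your kernel. The hypothetical `count_cpus` helper below expands a kernel CPU list such as `0-3,8,10-11` into a count.

```shell
# Hypothetical helper: count the CPUs in a kernel CPU list, e.g. "0-3,8,10-11" -> 7
count_cpus() {
  total=0
  old_ifs=$IFS
  IFS=,
  for part in $1; do
    case $part in
      *-*) total=$((total + ${part#*-} - ${part%-*} + 1)) ;;  # range a-b
      *)   total=$((total + 1)) ;;                            # single CPU
    esac
  done
  IFS=$old_ifs
  echo "$total"
}

# Usage on a hybrid Intel CPU:
#   echo "P-core threads: $(count_cpus "$(cat /sys/devices/cpu_core/cpus)")"
#   echo "E-core threads: $(count_cpus "$(cat /sys/devices/cpu_atom/cpus)")"
```

The same CPU list could be given to systemd's `CPUAffinity=` setting in a drop-in for the Splunk service to pin splunkd to one core type for comparison; that is an untested idea for the lab rather than a recommendation.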

FIO / IOPS check:

To verify the IOPS, or disk speed, I installed 'fio' and ran an 8GB random read/write test (75% reads, 25% writes). As you can see, this resulted in 342k read IOPS and 114k write IOPS. To put that into perspective, Splunk's recommended minimum is 800 IOPS, so this is roughly 143x faster for writes and 428x faster for reads.

<pre>
sudo apt-get install fio

# run from a directory on the disk you want to test
# 75% random reads / 25% random writes, 4k blocks, 8GB test file
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=fiotest --filename=testfio --bs=4k --iodepth=64 --size=8G --readwrite=randrw --rwmixread=75

# delete the test file after use
rm -v testfio
</pre> 
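The comparison with Splunk's 800 IOPS minimum is simple division; the hypothetical `iops_ratio` helper below uses awk for the floating-point maths. If you run fio with `--output-format=json`, the read and write IOPS can also be pulled from its JSON output instead of being read off the screen.

```shell
# Hypothetical helper: how many times a measured IOPS figure exceeds a baseline
# $1 = measured IOPS, $2 = baseline IOPS (Splunk's recommended minimum is 800)
iops_ratio() {
  awk -v measured="$1" -v baseline="$2" 'BEGIN { printf "%.1f\n", measured / baseline }'
}

# Usage with the figures from the test above:
#   iops_ratio 342000 800   # reads  -> 427.5
#   iops_ratio 114000 800   # writes -> 142.5
```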

Splunk Volumes

The final detail to change was the volume settings for the data migrated from my old NUC to the new one. I'd SCP'd the data over to the 1TB partition and the files were all OK. However, I needed to adjust the 'indexes.conf' file in Splunk to use the new volume, as in the past it had been a single disk. This is relatively easily achieved, but you have to pay attention to the docs: 'thawedPath' cannot be defined in terms of a volume and has to use a path such as '$SPLUNK_DB/...', which tripped me up at first, with no useful output in the error log.

# Disk:
<pre>
james@nuc13:/opt/splunk/etc/apps/ORG_indexer_volume_indexes/local$ df -kh | grep /dev
/dev/nvme1n1p2  457G   54G  380G  13% /
/dev/nvme1n1p1  1.1G  6.1M  1.1G   1% /boot/efi
/dev/nvme0n1p1  916G   97G  774G  12% /splunk-indexes
</pre>

# Conf files:
indexes.conf
<pre>
[volume:primary]
path = /splunk-indexes
# Note: The *only* reason to use a volume is to set a cumulative size-based
# limit across several indexes stored on the same partition. There are *not*
# time-based volume limits.
# ~930 GB, sized to fit the 1TB index disk
maxVolumeDataSizeMB = 930000


[firepower]
coldPath = volume:primary/firepower/colddb
enableDataIntegrityControl = 0
enableTsidxReduction = 0
homePath = volume:primary/firepower/db
maxTotalDataSizeMB = 512000
# NOTE! thawedPath cannot use a volume: reference; it has to use $SPLUNK_DB here or splunkd fails to start.
# Keep comments on their own line: Splunk .conf files do not support trailing comments.
thawedPath = $SPLUNK_DB/firepower/thaweddb
</pre>
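Because Splunk .conf files only treat lines beginning with `#` as comments, a trailing comment becomes part of the setting's value. A minimal sketch of a check for that; `conf_inline_comments` is a hypothetical helper, and it will also flag legitimate values that contain ` #` (a regex, for example), so treat hits as prompts to look rather than errors. `$SPLUNK_HOME/bin/splunk btool indexes list firepower --debug` is also useful here, as it shows the effective settings and which file each one came from.

```shell
# Hypothetical helper: print "line:text" for settings that appear to carry a trailing # comment
# $1 = path to a .conf file
conf_inline_comments() {
  # a setting line (not starting with # or whitespace) whose value contains " #"
  grep -nE '^[^#[:space:]][^=]*=.*[[:space:]]#' "$1"
}

# Usage:
#   conf_inline_comments /opt/splunk/etc/apps/ORG_indexer_volume_indexes/local/indexes.conf
```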

 

After that, the apps just needed to be pulled down from GitHub or migrated in. I'm very pleased with the performance so far and, as you can see from this Monitoring Console (MC) snapshot, it is ready for Premium App installation. I'll follow up with another post about how that goes in the coming weeks.


Tuesday, 17 September 2024
