Hello community, we are currently a bit desperate because of a Splunk memory leak problem under Windows that most probably affects all of you, even if you have not noticed it yet. Here is the history and our analysis:

We first observed a heavy memory leak on a Windows Server 2019 instance after updating Splunk Enterprise from 9.0.7 to 9.1.3. The affected server has several Splunk apps installed (Symantec, ServiceNow, MS o365, DBconnect, Solarwinds) which start a lot of Python scripts at very short intervals. After the update the server crashed every few hours due to low memory. We opened Splunk case #3416998 on February 9th.

With the Microsoft Sysinternals tool rammap.exe we found a lot of "zombie" processes (PIDs no longer listed in Task Manager) that each still use a few KB of memory (~20-32 KB). The process names are btool.exe, python3.exe, splunk-optimiz and splunkd.exe. It seems that every time a process of one of these programs ends, it leaves such an allocation behind. The Splunk apps on our Windows server spawn these processes very often and very fast, which results in thousands of zombie processes. After this insight we downgraded Splunk on the server to 9.0.7 and the problem disappeared.

We then installed Splunk Enterprise 9.1.3 and 9.0.9 on a test server; both versions show the same issue. New Splunk case #3428922. On March 28th we got this information from Splunk: "... got an update from our internal dev team on this 'In Windows, after upgrading Splunk enterprise to 9.1.3 or 9.2.0 consumes more memory usage. (memory and processes are not released)' internal ticket. They investigated the diag files and seems system memory usage is high, but only Splunk running. This issue comes from the mimalloc (memory allocator). This memory issue will be fixed in the 9.1.5 and 9.2.2 ..."

9.2.2 arrived on July 1st: unfortunately the memory leak persists. We opened a third Splunk case, #3518811, which is still open. The issue is also not fixed in version 9.3.0. Even after an online session in which we showed them the rammap.exe screen, they wanted us to provide diags from our (test) servers again and again - but they should actually be able to reproduce it in their own lab.

The huge problem is: because of existing vulnerabilities in the installed (affected) versions we need to update Splunk (Heavy Forwarders) on our Windows servers, but we cannot do so due to the memory leak issue.

How to reproduce:
- OS tested: Windows Server 2016, 2019, 2022, Windows 10 22H2
- Splunk Enterprise versions tested: 9.0.9, 9.1.3, 9.2.2 (Universal Forwarder not tested)
- let the default installation run for some hours (splunk service running); a small Python sketch for logging free memory over time is included at the end of this post
- download rammap.exe from https://learn.microsoft.com/en-us/sysinternals/downloads/rammap and start it
- go to the Processes tab and sort by the Process column
- look for btool.exe, python3.exe and splunkd.exe entries with a small total memory usage of about 20-32 KB; the PIDs of these processes do not exist in the task list (see Task Manager or tasklist.exe)
- with the Splunk default installation (without any other apps) the memory usage increases only slowly, because the default apps run their scripts at fairly long intervals
- stopping the Splunk service releases the memory (and the zombie processes disappear in rammap.exe)
- for faster results you can add an app for excessive testing with python3.exe, starting it at short (0 second) intervals. The test.py does not need to exist - Splunk starts python3.exe anyway (a minimal no-op test.py is sketched after the inputs.conf example, if you prefer the file to exist). Only an inputs.conf file is needed:
\etc\apps\pythonDummy\local\inputs.conf

[script://$SPLUNK_HOME/etc/apps/pythonDummy/bin/test.py 0000]
python.version = python3
interval = 0

[script://$SPLUNK_HOME/etc/apps/pythonDummy/bin/test.py 1111]
python.version = python3
interval = 0

(if you want, add some more stanzas: 2222, 3333, and so on)

- the more python script stanzas there are, the more and the faster zombie processes appear in rammap.exe

Please share your experiences, and please also open tickets with Splunk support if you see the problem. We hope Splunk finally reacts.
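PS: As noted above, the test.py file does not even have to exist for the zombies to appear. If you prefer the scripted input to point at a real file (for example to keep error messages out of splunkd.log), a no-op script is enough. This is just our suggestion, not anything Splunk ships; the path simply matches the stanzas above:

\etc\apps\pythonDummy\bin\test.py

# Minimal no-op script for the pythonDummy test app.
# It exits immediately; the point of the test is only that splunkd
# launches and reaps a python3.exe process over and over again.
import sys

if __name__ == "__main__":
    sys.exit(0)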
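And here is the small helper mentioned in the reproduce steps for the "let it run for some hours" part. It is our own sketch (file name, 5-minute interval and output format are arbitrary), independent of Splunk: run it in a separate console and it logs the available physical memory via the Win32 GlobalMemoryStatusEx call. On an affected version the logged value slowly decreases while splunkd is running and recovers once the Splunk service is stopped.

watch_memory.py

# Log available physical memory over time (Windows only).
import ctypes
import ctypes.wintypes as wintypes
import time

class MEMORYSTATUSEX(ctypes.Structure):
    _fields_ = [
        ("dwLength", wintypes.DWORD),
        ("dwMemoryLoad", wintypes.DWORD),
        ("ullTotalPhys", ctypes.c_ulonglong),
        ("ullAvailPhys", ctypes.c_ulonglong),
        ("ullTotalPageFile", ctypes.c_ulonglong),
        ("ullAvailPageFile", ctypes.c_ulonglong),
        ("ullTotalVirtual", ctypes.c_ulonglong),
        ("ullAvailVirtual", ctypes.c_ulonglong),
        ("ullAvailExtendedVirtual", ctypes.c_ulonglong),
    ]

def available_physical_mb():
    stat = MEMORYSTATUSEX()
    stat.dwLength = ctypes.sizeof(MEMORYSTATUSEX)
    ctypes.windll.kernel32.GlobalMemoryStatusEx(ctypes.byref(stat))
    return stat.ullAvailPhys / (1024 * 1024)

while True:
    # One line per sample, e.g. "2024-08-15 14:05:00  6123 MB available"
    print(time.strftime("%Y-%m-%d %H:%M:%S"), "%5.0f MB available" % available_physical_mb())
    time.sleep(300)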