I need to build a dashboard that shows performance data for Linux instances.
A Universal Forwarder (UF) with the *nix app configured is installed on each Linux instance and forwards data to a central Windows indexer.
For one of those Linux hosts, AMERICA-3, I took the standard search under Splunk > *nix > CPU > CPU By Host > Load Factor and modified it as follows:
index=os sourcetype=vmstat host=* | multikv fields loadAvg1mi | search host="AMERICA-3" | timechart span=5m avg(loadAvg1mi) by host | sort _time
However, this returns some values for the 1-minute load average that exceed 1.000 and even 2.000, although loadAvg1mi is usually between 0.000 and 1.000. I am not sure what loadAvg1mi measures on a host with multiple CPUs, and I need to understand that better.
What I am trying to do is build the Linux equivalent of the following CPU load metric for Windows machines, which Splunk also provides out of the box:
sourcetype="Perfmon:CPU Load" | timechart avg(Value) by host | summaryindex spool=t uselb=t addtime=t index="summary" file="winCPULoad_556591856.stash_new" name="winCPULoad" marker=""
Based on my research, there are two other queries I could also try:
index=os sourcetype=vmstat host=* | multikv fields memUsedPct | search host="AMERICA-3" | timechart span=5m max(memUsedPct) by host | sort _time
index=os sourcetype=top host=* | multikv fields pctCPU COMMAND | search host="AMERICA-3" COMMAND="java" | timechart span=5m max(pctCPU) by host | sort _time
(For this one I only care about the java process.)
I am not sure whether either query gives an accurate figure.
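Linux load average as reported by vmstat is not bounded between 0 and 1, even per CPU. It is a damped average of the number of tasks that are running or waiting to run (Linux also counts tasks blocked in uninterruptible sleep, usually disk I/O). On a host with N CPUs, a sustained load of about N means the CPUs are roughly saturated, and values above N mean work is queuing. A somewhat old (but also very good) discussion is at http://www.linuxjournal.com/article/9001?page=0,1. The load average measurement, while useful, is not equivalent to the Windows Perfmon "CPU Load" counters.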
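To make the multi-CPU point concrete, here is a minimal Python sketch of how a 1-minute load average behaves, assuming the general scheme the Linux kernel uses: sample the number of runnable tasks every 5 seconds and fold it into an exponentially damped average with decay factor e^(-5/60). The names `damped_load` and `per_cpu` are hypothetical, and this floating-point version only approximates the kernel's fixed-point arithmetic. It shows why a busy multi-CPU host reports values well above 1.0.

```python
import math


def damped_load(run_queue_samples, interval_s=5.0, window_s=60.0, start=0.0):
    """Exponentially damped average of run-queue lengths.

    Assumption: follows the Linux idea of sampling the number of runnable
    tasks every `interval_s` seconds and decaying by e^(-interval/window).
    The kernel uses fixed-point integers; this float version is approximate.
    """
    decay = math.exp(-interval_s / window_s)
    load = start
    for n in run_queue_samples:
        # Blend the previous average with the newest run-queue sample.
        load = load * decay + n * (1.0 - decay)
    return load


def per_cpu(load, ncpus):
    """Normalize a load average by CPU count (hypothetical helper)."""
    return load / ncpus


if __name__ == "__main__":
    # Hypothetical 4-CPU host with 4 tasks always runnable for 10 minutes
    # (120 samples at 5 s): the 1-minute figure approaches 4.0, about 1.0 per CPU.
    load = damped_load([4] * 120)
    print(f"loadAvg1mi ~ {load:.3f}, per CPU ~ {per_cpu(load, 4):.3f}")
```

Dividing by the CPU count gives a rough "per CPU" figure that is easier to compare across hosts, but it still includes tasks waiting on I/O, so it is not the same thing as percent CPU busy.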
You may find the *nix app's "percent load by host" search to be more comparable:
index=os sourcetype=cpu host=* | multikv fields pctIdle | eval Percent_CPU_Load = 100 - pctIdle | timechart avg(Percent_CPU_Load) by host
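For intuition about what the `eval` plus `timechart avg()` pipeline computes, here is a minimal Python sketch. It assumes pctIdle readings for a single host arrive as (epoch seconds, pctIdle) pairs and uses a 5-minute span like the queries in the question (the search above sets no span, so Splunk would choose one from the time range). `percent_cpu_load` is a hypothetical name, and this is a rough analogue rather than Splunk's implementation.

```python
from collections import defaultdict


def percent_cpu_load(samples, span_s=300):
    """Average (100 - pctIdle) per time bucket.

    `samples` is a list of (epoch_seconds, pctIdle) pairs for one host.
    Rough analogue of `eval Percent_CPU_Load = 100 - pctIdle | timechart
    span=5m avg(Percent_CPU_Load)`; not Splunk's actual implementation.
    """
    buckets = defaultdict(list)
    for ts, pct_idle in samples:
        # Align each timestamp to the start of its span, roughly like timechart bins.
        start = int(ts // span_s) * span_s
        buckets[start].append(100.0 - pct_idle)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


if __name__ == "__main__":
    # Hypothetical readings: 90% and 70% idle in the first 5 minutes, 50% later.
    print(percent_cpu_load([(0, 90.0), (60, 70.0), (300, 50.0)]))
    # {0: 20.0, 300: 50.0}
```

Unlike load average, this figure is bounded between 0 and 100, which is why it lines up better with the Windows "CPU Load" counter.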
Hi Dwaddle, do you know of a similar search to gather just the "load average" you get from top?