Hello,
I wish to know the functional difference (if any) between the following:
| tstats count FROM datamodel=Endpoint.Processes where Processes.user=SYSTEM by _time span=1h Processes.dest ...
And
| tstats count FROM datamodel=Endpoint.Processes where Processes.user=SYSTEM by Processes.dest ...
| bin _time span=1h
I understand what it does, and that "| bin" would always be used for a non-tstats search, but within tstats is there any reason to place the "span" inside the "by" clause, or is it just cleaner/slightly faster?
Thanks in advance!
Hi @Corky_
Regarding the first option, where the span sits right after _time and before the other fields in the "BY" clause of your tstats command: I personally prefer to put the span at the end of the by list rather than in the middle, to keep it cleaner and so it isn't mistaken for a field. The tstats docs also suggest it should go at the end:
[ BY (<field-list> | (PREFIX(<field>))) [span=<timespan>]]
As for the second query, I'm confused as to how you could bin by _time with tstats if you haven't specified _time in the by clause in the first place. If you do not split by _time in the tstats part of the query, then the _time field won't be available to the bin command.
FWIW - I find the bin command good for doing stats by multiple fields over _time, when you cannot do that with timechart.
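As a rough illustration of that bin-then-stats pattern on a regular (non-tstats) search, something like the following should work. The index and sourcetype are placeholders, and "user" is only an assumed field name, so swap in whatever exists in your data:

```spl
index=<your_index> sourcetype=<your_sourcetype>
| bin _time span=1h
| stats count by _time host user
```

Here bin rounds each event's _time down to the hour, and stats then does the actual counting per hour, host and user.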
If you do tstats by time without binning and then do bin, you'll have to stats again to summarise your data. Bin on its own doesn't aggregate data, just aligns the field into discrete points.
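A minimal sketch of that re-aggregation, reusing the data model from the question. The 10m inner span is just an illustrative choice so that tstats returns buckets finer than the hour you want to end up with:

```spl
| tstats count from datamodel=Endpoint.Processes where Processes.user=SYSTEM by _time span=10m Processes.dest
| bin _time span=1h
| stats sum(count) as count by _time Processes.dest
```

Note the final stats uses sum(count) rather than count, because each row coming out of tstats is already a partial count. This should give the same hourly numbers as putting span=1h directly in tstats, just with extra steps.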
The first example will produce a count for each destination (and any other "by" fields) for each hour of the search time window. Something like this:
_time | Processes.dest | count
12:00 | foo            | 2
12:00 | bar            | 1
13:00 | foo            | 4
13:00 | bar            | 2
The second example will produce counts by destination, etc. The counts will not be broken down by time.
Processes.dest | count
foo            | 6
bar            | 3
The bin command will have no effect because there is no _time field at that point.
Putting span in the tstats command gives you control over the bin sizes. Without span, tstats will choose a span it thinks best fits the data.
I personally like to put _time span=whatever like you have in your first example everywhere it will work (like with "timechart") since it works and it makes it clear what you are spanning.
For the longest time I was not using timechart and span correctly until I learned you should put the span literally right next to the _time to make sure it is getting applied appropriately, so now I just do that everywhere 😁
But to answer your real question...what is the technical difference...IDK 😋