Hello,
I'm working on a Splunk query to track REST calls in our logs. Specifically, I’m trying to use the transaction command to group related logs — each transaction should include exactly two messages: a RECEIVER log and a SENDER log.
Here’s my current query:
index=...
("SENDER[" OR ("RECEIVER[" AND "POST /my-end-point*"))
| rex "\[(?<id>\d+)\]"
| transaction id startswith="RECEIVER" endswith="SENDER" mvlist=message
| search eventcount > 1
| eval count=mvcount(message)
| eval request=mvindex(message, 0)
| eval response=mvindex(message, 1)
| table id, duration, count, request, response, _raw
The idea is to group the RECEIVER and SENDER logs together using the transaction id that my logs create (e.g., RECEIVER[52] and SENDER[52]), and then extract the first and second messages of the transaction into separate request and response fields for better visualisation.
The transaction command seems to be grouping the logs correctly: I get the right number of transactions, and both receiver and sender logs are present in the _raw field.
For a few cases it works fine, and I get the proper request and response in two distinct fields as expected. But for many transactions, the response (second message) shows as NULL, even though eventcount is 2 and both messages are visible in _raw.
The message field is present at both ends of the transaction, as I can see it in the _raw output.
Can someone guide me on what is wrong with my query?
Just a heads-up: it was indeed an issue with the extraction of the fields. My events are so big that Splunk stops extracting fields at some point. Thanks all for the help 🙂
As @bowesmana diagnosed, default field extraction stops at 50K characters. You can change this in limits.conf: the stanza is [kv], and the property name is maxchars.
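For reference, a minimal limits.conf sketch (the file path and the 102400 value are illustrative; pick a limit that covers your largest events, and note a restart is needed for it to take effect):

```ini
# $SPLUNK_HOME/etc/system/local/limits.conf
[kv]
# Maximum characters of an event that automatic key/value
# extraction will scan. Raise it to cover your largest events.
# (102400 is an illustrative value, not a recommendation.)
maxchars = 102400
```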
I also recommend fixing another problem @livehybrid hinted at: you should extract the id field from the message field, not from _raw, i.e.,
| rex field=message "(SENDER|RECEIVER)\[(?<id>\d+)\]"
Is your data JSON?
If so, Splunk will only extract the first 5K of the JSON object in an event. I'm not totally sure whether that 5K limit applies to other automatic KV field extraction as well.
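If that JSON limit is what's biting, it is also configurable in limits.conf; a sketch, assuming the [spath] stanza (the raised value is illustrative):

```ini
# $SPLUNK_HOME/etc/system/local/limits.conf
[spath]
# Number of characters of an event that spath / automatic JSON
# extraction will read (defaults to 5000). Raise for large JSON events.
extraction_cutoff = 50000
```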
Hi @Jessydan
Couldn't agree more with the others regarding stats. It seems you're having issues extracting your message/id in these examples, though; does the following work for you? I used your provided sample data in the example below:
| windbag | head 2
| streamstats count as row_number
| eval _raw=if(row_number==1, "{\"severity\":\"INFO\",\"logger\":\"com.PayloadLogger\",\"thread\":\"40362833\",\"message\":\"RECEIVER[20084732]: POST /my-end-point Headers: {sedatimeout=[60000], x-forwarded-port=[443], jmsexpiration=[0], host=[hostname], content-type=[application/json], Content-Length=[1461], sending.interface=[ANY], Accept=[application/json], cookie=[....], x-forwarded-proto=[https]} {{\\\"content\\\":\\\"Any content here\\\"}}\",\"properties\":{\"environment\":\"any\",\"transactionOriginator\":\"any\",\"customerId\":\"any\",\"correlationId\":\"any\",\"configurationId\":\"any\"}}", "{\"severity\":\"INFO\",\"logger\":\"com.PayloadLogger\",\"thread\":\"40362833\",\"message\":\"SENDER[20084732]: Status: {200} Headers: {Date=[Mon, 05 May 2025 07:27:18 GMT], Content-Type=[application/json]} {{\\\"generalProcessingStatus\\\":\\\"OK\\\",\\\"content\\\":[]}}\",\"properties\":{\"environment\":\"any\",\"transactionOriginator\":\"any\",\"customerId\":\"any\",\"correlationId\":\"any\",\"configurationId\":\"any\"}}")
| spath input=_raw
| rex field=message "^(?<msgType>[A-Z]+)\[(?<id>[0-9]+)\].*"
| stats range(_time) as duration, count, values(msgType) as msgType by id
| where isnotnull(msgType) AND msgType="RECEIVER" AND msgType="SENDER"
It's hard to say what's "wrong" without knowing your data, but while transaction can sometimes be useful (in some strange use cases), it's often easier and faster to simply use stats, mostly because transaction has loads of limitations that stats doesn't have.
A quick glance at your search suggests that for some reason the message field is not extracted properly from your events, so you're not getting two separate values in your multivalue message output field.
As I said I'd go with
index=...
("SENDER[" OR ("RECEIVER[" AND "POST /my-end-point*"))
| rex "\[(?<id>\d+)\]"
| eval request=if(searchmatch("\"RECEIVER[\" AND \"POST /my-end-point*\""), message, null())
| eval response=if(searchmatch("SENDER["), message, null())
| stats range(_time) as duration, count, values(request) as request, values(response) as response, values(_raw) as _raw by id
transaction can silently ignore data, depending on data volume and the time between start and end, and you will not get any indication that data has been discarded.
It's far better to use stats to group by id - which you appear to have.
At the simplest level, you can replace transaction with stats like this:
index=...
("SENDER[" OR ("RECEIVER[" AND "POST /my-end-point*"))
| rex "\[(?<id>\d+)\]"
| stats list(_raw) as _raw list(message) as message min(_time) as start_time max(_time) as end_time by id
| eval duration=end_time - start_time, eventcount=mvcount(_raw)
| eval request=mvindex(message, 0)
| eval response=mvindex(message, 1)
| table id, duration, eventcount, request, response, _raw
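One caveat with the sketch above: mvindex(message, 0) assumes the RECEIVER message always lands first in the list() output, which follows event order rather than any guaranteed request/response order. Selecting each message by its prefix is order-independent, e.g.:

```spl
| eval request=mvfilter(match(message, "^RECEIVER\["))
| eval response=mvfilter(match(message, "^SENDER\["))
```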
I'm a bit puzzled now: while both of the queries you proposed work, they raise the same issue.
With this one I get the expected output, but with far fewer transactions than expected (around 10 instead of 100):
("SENDER[" OR ("RECEIVER[" AND "POST /my-end-point*"))
| rex "\[(?<id>\d+)\]"
| stats list(_raw) as _raw list(message) as message min(_time) as start_time max(_time) as end_time by id
| eval duration=end_time - start_time, eventcount=mvcount(_raw)
| eval request=mvindex(message, 0)
| eval response=mvindex(message, 1)
| table id, duration, eventcount, request, response, _raw
And with this one I have the same issue: _raw contains the response, but I don't get it in the response field; it is properly present for only around 10% of the transactions.
index=...
("SENDER[" OR ("RECEIVER[" AND "POST /my-end-point*"))
| rex "\[(?<id>\d+)\]"
| eval request=if(searchmatch("\"RECEIVER[\" AND \"POST /my-end-point*\""), message, null())
| eval response=if(searchmatch("SENDER["), message, null())
| stats range(_time) as duration, count, values(request) as request, values(response) as response, values(_raw) as _raw by id
| where isnotnull(request)
Apparently you have problems with proper extraction of the message field, so you should verify your data onboarding and start by fixing the extractions.
Not sure if that's what you expect; let me know if you need something else. Here are two raw events that my query matched together, where the response is not displayed (even though it is present in the output _raw):
{"severity":"INFO","logger":"com.PayloadLogger","thread":"40362833","message":"RECEIVER[20084732]: POST /my-end-point Headers: {sedatimeout=[60000], x-forwarded-port=[443], jmsexpiration=[0], host=[hostname], content-type=[application/json], Content-Length=[1461], sending.interface=[ANY], Accept=[application/json], cookie=[....], x-forwarded-proto=[https]} {{\"content\":"Any content here"}}","properties":{"environment":"any","transactionOriginator":"any","customerId":"any","correlationId":"any","configurationId":"any"}}
{"severity":"INFO","logger":"com.PayloadLogger","thread":"40362833","message":"SENDER[20084732]: Status: {200} Headers: {Date=[Mon, 05 May 2025 07:27:18 GMT], Content-Type=[application/json]} {{\"generalProcessingStatus\":\"OK\",\"content\":[]}}","properties":{"environment":"any","transactionOriginator":"any","customerId":"any","correlationId":"any","configurationId":"any"}}
I've been trying to use stats as well, but I have more trouble with it than with transaction, which works pretty well (apart from this missing response field). Can't say I'm a Splunk expert.
Please provide some (anonymised) sample data that demonstrates your issue.
Having said that, you could try using stats to gather your events by id, as this can be more deterministic than transaction.