Splunk transaction – Trouble extracting first and last messages

Jessydan
Engager

Hello,

I'm working on a Splunk query to track REST calls in our logs. Specifically, I'm trying to use the transaction command to group related logs. Each transaction should include exactly two messages: a RECEIVER log and a SENDER log.

Here’s my current query:

index=...
("SENDER[" OR ("RECEIVER[" AND "POST /my-end-point*"))
| rex "\[(?<id>\d+)\]"
| transaction id startswith="RECEIVER" endswith="SENDER" mvlist=message
| search eventcount > 1
| eval count=mvcount(message)
| eval request=mvindex(message, 0)
| eval response=mvindex(message, 1)
| table id, duration, count, request, response, _raw


The idea is to group RECEIVER and SENDER logs by the transaction id that my logs create (e.g., RECEIVER[52] and SENDER[52]), and then split the first and second messages of each transaction into request and response fields for easier reading.

The transaction command seems to group the logs correctly: I get the right number of transactions, and both the RECEIVER and SENDER logs are present in the _raw field.

In a few cases it works fine and I get the expected request and response in two distinct fields. For many transactions, however, the response (second message) shows as NULL, even though eventcount is 2 and both messages are visible in _raw.

The message field is present at both ends of the transaction, as I can see it in the _raw output.

Can someone tell me what is wrong with my query?

 


Jessydan
Engager

Just a heads-up: it was indeed an issue with field extraction. My events are so big that Splunk stops extracting fields at some point. Thanks all for the help 🙂


yuanliu
SplunkTrust

As @bowesmana diagnosed, automatic field extraction stops after a fixed number of characters. You can raise this limit in limits.conf: the stanza is [kv] and the setting is maxchars (10240 by default, as far as I know).

I also recommend fixing another problem that @livehybrid hinted at: you should extract the id field from the message field, not from _raw, i.e.,

| rex field=message "(SENDER|RECEIVER)\[(?<id>\d+)\]"

 


bowesmana
SplunkTrust

Is your data JSON?

If so, Splunk will only auto-extract fields from the first 5K of the JSON object in an event. I'm not sure whether the same 5K limit applies to other automatic key-value field extraction.
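
One way to check whether event size is the cause is to compare the raw length of events where message was extracted against those where it was not. This is only a sketch: it reuses the base search from the question (with the index still elided), and raw_len and message_extracted are field names introduced here for illustration.

```spl
index=...
("SENDER[" OR ("RECEIVER[" AND "POST /my-end-point*"))
| eval raw_len=len(_raw)
| eval message_extracted=if(isnotnull(message), "yes", "no")
| stats count, min(raw_len) as min_len, max(raw_len) as max_len by message_extracted
```

If the "no" group consists only of events longer than the extraction limit, the truncation explanation becomes much more likely.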

 


livehybrid
Super Champion

Hi @Jessydan 

I couldn't agree more with the others regarding stats. It seems you're having issues extracting your message/id in these examples, though. Does the following work for you? I used your sample data below:


 

| windbag | head 2
| streamstats count as row_number
| eval _raw=if(row_number==1, "{\"severity\":\"INFO\",\"logger\":\"com.PayloadLogger\",\"thread\":\"40362833\",\"message\":\"RECEIVER[20084732]: POST /my-end-point Headers: {sedatimeout=[60000], x-forwarded-port=[443], jmsexpiration=[0], host=[hostname], content-type=[application/json], Content-Length=[1461], sending.interface=[ANY], Accept=[application/json], cookie=[....], x-forwarded-proto=[https]} {{\\\"content\\\":\\\"Any content here\\\"}}\",\"properties\":{\"environment\":\"any\",\"transactionOriginator\":\"any\",\"customerId\":\"any\",\"correlationId\":\"any\",\"configurationId\":\"any\"}}", "{\"severity\":\"INFO\",\"logger\":\"com.PayloadLogger\",\"thread\":\"40362833\",\"message\":\"SENDER[20084732]: Status: {200} Headers: {Date=[Mon, 05 May 2025 07:27:18 GMT], Content-Type=[application/json]} {{\\\"generalProcessingStatus\\\":\\\"OK\\\",\\\"content\\\":[]}}\",\"properties\":{\"environment\":\"any\",\"transactionOriginator\":\"any\",\"customerId\":\"any\",\"correlationId\":\"any\",\"configurationId\":\"any\"}}") 
| spath input=_raw
| rex field=message "^(?<msgType>[A-Z]+)\[(?<id>[0-9]+)\].*"
| stats range(_time) as duration, count, values(msgType) as msgType by id
| where isnotnull(msgType) AND msgType="RECEIVER" AND msgType="SENDER"


PickleRick
SplunkTrust

It's hard to say what's "wrong" without knowing your data. While transaction can sometimes be useful (in some unusual use cases), it's often easier and faster to simply use stats, mostly because transaction has many limitations that stats doesn't have.

A quick glance at your search suggests that, for some reason, the message field is not extracted properly from some of your events, so you're not getting two separate values in your multivalued message output field.

As I said I'd go with

index=...
("SENDER[" OR ("RECEIVER[" AND "POST /my-end-point*"))
| rex "\[(?<id>\d+)\]"
| eval request=if(searchmatch("\"RECEIVER[\" AND \"POST /my-end-point*\""), message, null())
| eval response=if(searchmatch("\"SENDER[\""), message, null())
| stats range(_time) as duration, count, values(request) as request, values(response) as response, values(_raw) as _raw by id
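
To illustrate the diagnosis above: if message is extracted for only one of the two events, the grouped field holds a single value, so mvindex(message, 1) returns NULL even though two events were grouped. Below is a minimal, self-contained sketch using makeresults with made-up values (the id 1, the counter n, and the message text are placeholders, not real data).

```spl
| makeresults count=2
| streamstats count as n
| eval id=1, message=if(n==1, "RECEIVER[1]: request text", null())
| stats list(message) as message, count as eventcount by id
| eval request=mvindex(message, 0), response=mvindex(message, 1)
| table id, eventcount, request, response
```

Here eventcount is 2 but response is empty, which matches the symptom described in the question.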

 

bowesmana
SplunkTrust

transaction can silently discard data, depending on the data volume and the time between the start and end events, and you will not get any indication that data has been discarded.
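
If you want to see whether this is happening, transaction has a keepevicted option that also outputs evicted (incomplete) transactions and marks them with closed_txn=0. The following is a sketch based on the search from the question, with the index still elided.

```spl
index=...
("SENDER[" OR ("RECEIVER[" AND "POST /my-end-point*"))
| rex "\[(?<id>\d+)\]"
| transaction id startswith="RECEIVER" endswith="SENDER" keepevicted=true
| stats count by closed_txn
```

A non-zero count for closed_txn=0 would suggest that some transactions were output without ever being properly closed.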

It's far better to use stats to group by id, which you appear to have.

At the simplest level, you can replace transaction with stats like this:

index=...
("SENDER[" OR ("RECEIVER[" AND "POST /my-end-point*"))
| rex "\[(?<id>\d+)\]"
| stats list(_raw) as _raw list(message) as message min(_time) as start_time max(_time) as end_time by id
| eval duration=end_time - start_time, eventcount=mvcount(_raw)
| eval request=mvindex(message, 0)
| eval response=mvindex(message, 1)
| table id, duration, eventcount, request, response, _raw

 

Jessydan
Engager

I'm a bit puzzled now. Both of the queries you proposed work, but they raise the same issue.

With this one I get the expected output, but with far fewer transactions than expected (around 10 instead of 100):

("SENDER[" OR ("RECEIVER[" AND "POST /my-end-point*"))
| rex "\[(?<id>\d+)\]"
| stats list(_raw) as _raw list(message) as message min(_time) as start_time max(_time) as end_time by id
| eval duration=end_time - start_time, eventcount=mvcount(_raw)
| eval request=mvindex(message, 0)
| eval response=mvindex(message, 1)
| table id, duration, eventcount, request, response, _raw


And with this one, I have the same issue: _raw contains the response, but I don't get it in the response field. It is properly present for only around 10% of the transactions.

index=...
("SENDER[" OR ("RECEIVER[" AND "POST /my-end-point*"))
| rex "\[(?<id>\d+)\]"
| eval request=if(searchmatch("\"RECEIVER[\" AND \"POST /my-end-point*\""), message, null())
| eval response=if(searchmatch("\"SENDER[\""), message, null())
| stats range(_time) as duration, count, values(request) as request, values(response) as response, values(_raw) as _raw by id
| where isnotnull(request)

 

 


PickleRick
SplunkTrust

Apparently you have problems with proper extraction of the message field. So you should verify your data onboarding and start with fixing the extractions.
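
Until the extraction itself is fixed, one possible workaround is to avoid depending on the auto-extracted message field and pull the message type and id straight from _raw with rex. This is a sketch that assumes the JSON layout of the sample events posted in this thread (a "message" key whose value starts with RECEIVER[id] or SENDER[id]). msgType and eventcount are field names chosen here, and request and response hold the whole raw event rather than only the message text.

```spl
index=...
("SENDER[" OR ("RECEIVER[" AND "POST /my-end-point*"))
| rex "\"message\":\"(?<msgType>RECEIVER|SENDER)\[(?<id>\d+)\]"
| eval request=if(msgType=="RECEIVER", _raw, null())
| eval response=if(msgType=="SENDER", _raw, null())
| stats range(_time) as duration, count as eventcount, values(request) as request, values(response) as response by id
| where eventcount > 1
```

Anchoring the pattern on the "message" key is meant to avoid picking up other bracketed numbers in the event, such as header values.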

Jessydan
Engager

I'm not sure this is what you expect, so let me know if you need something else. Here are two raw events that my query matched together, but for which the response is not displayed (although it is present in the output _raw):

{"severity":"INFO","logger":"com.PayloadLogger","thread":"40362833","message":"RECEIVER[20084732]: POST /my-end-point Headers: {sedatimeout=[60000], x-forwarded-port=[443], jmsexpiration=[0], host=[hostname], content-type=[application/json], Content-Length=[1461], sending.interface=[ANY], Accept=[application/json], cookie=[....], x-forwarded-proto=[https]} {{\"content\":\"Any content here\"}}","properties":{"environment":"any","transactionOriginator":"any","customerId":"any","correlationId":"any","configurationId":"any"}}

{"severity":"INFO","logger":"com.PayloadLogger","thread":"40362833","message":"SENDER[20084732]: Status: {200} Headers: {Date=[Mon, 05 May 2025 07:27:18 GMT], Content-Type=[application/json]} {{\"generalProcessingStatus\":\"OK\",\"content\":[]}}","properties":{"environment":"any","transactionOriginator":"any","customerId":"any","correlationId":"any","configurationId":"any"}}

I've been trying to use stats as well, but I have more trouble with it than with transaction, which works pretty well (apart from this missing response field). I can't say I'm a Splunk expert.


ITWhisperer
SplunkTrust

Please provide some (anonymised) sample data that demonstrates your issue.

Having said that, you could try using stats to gather your events by id, as this can be more deterministic than transaction.
