Splunk Search

Regex problem: JSON structure and sub-structure

pck_npluyaud
Explorer

Hello.

I need to split a JSON log into fields, and I am stuck on a nested structure.

The extraction is configured on a forwarder (not a UF), in transforms.conf.

For example:

{ "var1":132,"var2":"toto","var3":{},"var4":{"A":1,"B":2},"var5":{"C":{"D":5}}}

The expected result:

  • "var1":132
  • "var2":"toto"
  • "var3":{}
  • "var4":{"A":1,"B":2}
  • "var5":{"C":{"D":5}}

Currently I use:

[extract_message]
SOURCE_KEY = field:message
REGEX = "([^"]*)":("[^"}]*"|[^,"]*|\d{1,})
FORMAT = $1::$2
REPEAT_MATCH = true
WRITE_META = true

In an online regex tester it works, but in Splunk it does not match...


marycordova
SplunkTrust

@pck_npluyaud I personally prefer @PickleRick's option 2, and as @yuanliu mentioned, if it isn't working, it's because the JSON isn't properly formatted.

If the JSON isn't properly formatted and it's produced in-house, you can try to get it fixed. If it comes from a paid product, that sucks; you can try to open a support ticket, but good luck.

If you have to use a regex because you can't get the JSON fixed...

  • go to regex101 and build your regex there
  • use the bare minimum of escapes ("\"), and none at all if you don't have to

The props file handles things ever so slightly differently than the Search GUI, so the same regex should work in both with teeny tweaks, but the cleanest version is the one you want in your props file.

@marycordova

pck_npluyaud
Explorer

Yes, for the fields at the root, there is no problem.
I left out one point: the JSON structure is nested inside another JSON structure, hence the "SOURCE_KEY = field:message" in transforms.conf.

{ "root": {
  "field1": "value1",
  "message": {
    "var1":132,
    "var2":toto",
    "var3":{},
    "var4":{"A":1,"B":2},
    "var5":{"C":{"D":5}}
  }
}
After indexing, field1 is accessible because the source event is recognized and parsed as JSON. I need message to be parsed the same way, so that the var* fields are available just like field1.


yuanliu
SplunkTrust

Is it possible that your developers made a mistake?  If the mock data accurately reflects the raw event structure, there are two errors:

  1. The value of root.message.var2 is missing the opening quotation mark (toto" instead of "toto").
  2. The structure is missing the final closing brace.

A corrected structure would be

{
  "root": {
    "field1": "value1",
    "message": {
      "var1":132,
      "var2":"toto",
      "var3":{},
      "var4":{"A":1,"B":2},
      "var5":{"C":{"D":5}}
    }
  }
}
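The two structural errors listed above can be checked mechanically before involving Splunk at all. Below is a minimal sketch in Python (not Splunk code) that runs both versions of the sample event through a strict JSON parser. The helper name `is_valid_json` and the single-line copies of the events are assumptions made for illustration only.

```python
import json


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise (hypothetical helper)."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Sample event as posted: var2 lacks its opening quote and one closing brace is missing.
broken = ('{ "root": { "field1": "value1", "message": { "var1":132, "var2":toto", '
          '"var3":{}, "var4":{"A":1,"B":2}, "var5":{"C":{"D":5}} } }')

# Corrected event: opening quote restored and final brace added.
fixed = ('{ "root": { "field1": "value1", "message": { "var1":132, "var2":"toto", '
         '"var3":{}, "var4":{"A":1,"B":2}, "var5":{"C":{"D":5}} } } }')

print("broken is valid:", is_valid_json(broken))
print("fixed is valid:", is_valid_json(fixed))
```

Running a captured raw event through a check like this is a quick way to tell whether the problem lies in the data or in the Splunk configuration.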

If the raw event has the correct structure, you don't need to do anything and Splunk will automatically extract the following:

Field                    Value
root.field1              value1
root.message.var1        132
root.message.var2        toto
root.message.var4.A      1
root.message.var4.B      2
root.message.var5.C.D    5

root.message.var3 will not show because its value is an empty JSON object.
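To illustrate where the dotted field names come from, here is a rough Python analogy that flattens the corrected event into path/value pairs. It is only a sketch of the naming scheme, not how Splunk implements its extraction; the `flatten` helper is hypothetical, and arrays are deliberately left unhandled.

```python
import json


def flatten(node, prefix=""):
    """Map every leaf value of a nested dict to its dotted path (hypothetical helper).

    Lists are treated as plain leaf values in this sketch.
    """
    fields = {}
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            fields.update(flatten(value, path))
    else:
        fields[prefix] = node
    return fields


event = ('{ "root": { "field1": "value1", "message": { "var1":132, "var2":"toto", '
         '"var3":{}, "var4":{"A":1,"B":2}, "var5":{"C":{"D":5}} } } }')

fields = flatten(json.loads(event))
for name, value in fields.items():
    print(f"{name} = {value}")
```

In this sketch the empty object under var3 contributes no leaf, so no root.message.var3 entry is produced, which mirrors the behaviour described above.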

PickleRick
SplunkTrust

As I always repeat: fiddling with regexes around structured data will only bring tears and won't give you the result you want. Or it will for a short time, but the moment your data is reordered or reindented (which is perfectly OK with JSON), your solution will stop working.

So there are three ways of handling json data with Splunk.

1. Indexed extractions

2. Automatic search-time KV-extraction

3. Manual use of the spath command.

Each of those ways has its pros and cons and yields slightly different results.

pck_npluyaud
Explorer

Ok. So, no solution.

1. Indexed extractions => the JSON is too complex

2. Automatic search-time KV-extraction => no, the fields need to be parsed ....

3. Manual use of the spath command => at search time ... too late

 

Well, thanks anyway.


PickleRick
SplunkTrust

I don't understand your objections vs. methods 1 and 2. Complexity of the json structure shouldn't matter as long as the event is a valid json and - in case 2 - doesn't exceed maximum number of fields handled by auto-kv.


ITWhisperer
SplunkTrust

All your picture shows is that makeresults can parse your string (which is valid json format) and extract the first level fields. This does not demonstrate that the regex you have used is fit for purpose.

Here I have updated your regex to escape the double quotes to demonstrate what is being extracted as fieldname and value.

[Screenshot: ITWhisperer_0-1746177598760.png]

You should either set your log format to json to let Splunk automatically extract the fields, or update your regex to take into account the recursive nature of json-structured data
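For readers without the screenshot, the behaviour can be reproduced outside Splunk. The sketch below applies the same pattern as the [extract_message] stanza to the sample message using Python's `re` module. This assumes Python and Splunk's regex engine treat this particular pattern the same way, which should hold for a pattern this simple but is not verified against Splunk itself.

```python
import re

# Same pattern as the REGEX line in the [extract_message] stanza.
PATTERN = re.compile(r'"([^"]*)":("[^"}]*"|[^,"]*|\d{1,})')

# Sample message from the original question.
message = '{ "var1":132,"var2":"toto","var3":{},"var4":{"A":1,"B":2},"var5":{"C":{"D":5}}}'

# findall returns one (field name, value) tuple per match, like REPEAT_MATCH = true.
pairs = PATTERN.findall(message)
for name, value in pairs:
    print(f"{name} = {value}")
```

The flat fields come out as expected, but var4 and var5 are captured as a lone "{" and their inner keys (A, B, C, D) surface as separate top-level fields with stray closing braces attached. A single non-recursive pattern cannot track brace depth, which is the limitation described above.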


livehybrid
Super Champion

Hi @pck_npluyaud 

If the data is JSON then you shouldn’t need to extract the fields manually. 
What do you get if you send the JSON but do not apply the transforms?
