Hello.
While splitting JSON logs, I have a problem with a complex structure.
The integration is on a heavy forwarder (not a UF), in transforms.conf.
For example:
{ "var1":132,"var2":"toto","var3":{},"var4":{"A":1,"B":2},"var5":{"C":{"D":5}}}
The expected result:
Currently I use:
[extract_message]
SOURCE_KEY = field:message
REGEX = "([^"]*)":("[^"}]*"|[^,"]*|\d{1,})
FORMAT = $1::$2
REPEAT_MATCH = true
WRITE_META = true
Online (in a regex tester), it works!
But in Splunk, that did not match...
As I always repeat - fiddling with regexes around structured data will only bring tears and won't give you the result you want (or it will for a short time, but the moment your data is reordered or re-indented (which is perfectly OK with JSON), your solution will stop working).
So there are three ways of handling JSON data with Splunk.
1. Indexed extractions
2. Automatic search-time KV-extraction
3. Manual use of the spath command.
Each of those ways has its pros and cons and yields a bit different results.
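For illustration, here is a minimal sketch of what each option can look like (the sourcetype name my_json and the index name are placeholders, not taken from this thread):

# Option 1 - indexed extractions (props.conf, on the instance that first parses the data)
[my_json]
INDEXED_EXTRACTIONS = json

# Option 2 - automatic search-time extraction (props.conf, on the search head)
[my_json]
KV_MODE = json

# Option 3 - explicit parsing at search time
index=my_index sourcetype=my_json | spath

Option 1 creates indexed fields at parse time, option 2 extracts fields at search time, and option 3 gives you explicit control over which paths you pull out. If you go with option 1, you generally don't also want option 2 on the same sourcetype, otherwise you can end up with duplicated field values.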
@pck_npluyaud I personally prefer @PickleRick's option 2, and as @yuanliu mentioned, if it isn't working it's because the JSON isn't properly formatted.
If the JSON isn't properly formatted and it's an in-house application, you can try to get it fixed; if it's a paid product, that's unfortunate, and you can try to open a support ticket (good luck).
If you have to fall back to a regex because you can't get the JSON fixed...
The props.conf file handles things ever so slightly differently than the Search GUI, so both should work with teeny tweaks, but the cleanest version is the one you want in your props file.
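To make the "teeny tweaks" concrete, here is a hedged sketch of the same (deliberately simplistic, flat-keys-only) extraction in both places; the field names k and v and the sourcetype my_json are placeholders.

In the Search GUI the pattern lives inside an SPL string, so the literal double quotes have to be escaped:

... | rex field=message "\"(?<k>[^\"]+)\":(?<v>[^,}]+)"

In props.conf the same pattern goes into an EXTRACT- stanza without the SPL-level escaping:

[my_json]
EXTRACT-message_kv = "(?<k>[^"]+)":(?<v>[^,}]+) in message

Both versions still suffer from the nested-JSON problem discussed above; the point is only to show why the two places need slightly different escaping.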
Yes, for the fields at the root, there is no problem.
I omitted one point: the JSON structure is nested inside another JSON structure, hence the "SOURCE_KEY = field:message" in transforms.conf:
{ "root": {
"field1": "value1",
"message": {
"var1":132,
"var2":toto",
"var3":{},
"var4":{"A":1,"B":2},
"var5":{"C":{"D":5}}
}
}
After indexing, field1 is accessible because the source JSON structure is recognized as JSON and interpreted as such. I need to parse message to extract the var* fields in the same way as field1.
Is it possible that your developers made a mistake? If the mock data accurately reflects the raw event structure, there are two errors: the value of var2 is missing its opening quote ("var2":toto"), and a closing brace is missing at the end.
A corrected structure would be
{
"root": {
"field1": "value1",
"message": {
"var1":132,
"var2":"toto",
"var3":{},
"var4":{"A":1,"B":2},
"var5":{"C":{"D":5}}
}
}
}
If the raw event has the correct structure, you don't need to do anything and Splunk will automatically extract the following:
root.field1 | root.message.var1 | root.message.var2 | root.message.var4.A | root.message.var4.B | root.message.var5.C.D |
value1 | 132 | toto | 1 | 2 | 5 |
root.message.var3 will not show because its value is an empty JSON object.
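If you want to check this without indexing anything, here is a quick test using only makeresults and spath (nothing in it depends on your props/transforms):

| makeresults
| eval _raw="{\"root\":{\"field1\":\"value1\",\"message\":{\"var1\":132,\"var2\":\"toto\",\"var3\":{},\"var4\":{\"A\":1,\"B\":2},\"var5\":{\"C\":{\"D\":5}}}}}"
| spath
| table root.field1 root.message.var1 root.message.var2 root.message.var4.A root.message.var4.B root.message.var5.C.D

If the real events have the corrected structure, the automatic extraction should give you the same field names as this test.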
OK. So, no solution:
1. Indexed extractions => the JSON is too complex.
2. Automatic search-time KV-extraction => no, the fields need to be parsed...
3. Manual use of the spath command => at search time... too late.
Well, thanks anyway.
I don't understand your objections to methods 1 and 2. The complexity of the JSON structure shouldn't matter as long as the event is valid JSON and - in case 2 - doesn't exceed the maximum number of fields handled by auto-kv.
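For what it's worth, a hedged sketch of the knobs involved in option 2 (the sourcetype name is a placeholder, and please double-check the setting names and defaults for your Splunk version):

# props.conf on the search head
[my_json]
KV_MODE = json

# limits.conf, only if you actually hit the auto-kv limits
[kv]
# maximum number of fields auto kv will extract per event
limit = 200
# how many characters of the event auto kv will consider
maxchars = 20480

Complex nesting by itself is not one of those limits; the automatic JSON extraction flattens it into dotted field names like root.message.var4.A.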
All your picture shows is that makeresults can parse your string (which is valid JSON) and extract the first-level fields. This does not demonstrate that the regex you have used is fit for purpose.
Here I have updated your regex to escape the double quotes, to demonstrate what is being extracted as field name and value.
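The exact query from the screenshot isn't reproduced here, but a hedged approximation of the same demonstration looks like this (the original FORMAT = $1::$2 has been adapted into named groups, since rex needs them; fieldname and value are just display names):

| makeresults
| eval message="{ \"var1\":132,\"var2\":\"toto\",\"var3\":{},\"var4\":{\"A\":1,\"B\":2},\"var5\":{\"C\":{\"D\":5}}}"
| rex field=message max_match=0 "\"(?<fieldname>[^\"]*)\":(?<value>\"[^\"}]*\"|[^,\"]*|\d{1,})"
| table fieldname value

The nested keys (A, B, C, D) come out with no parent context, and the nested objects show up as stray brace fragments in value, which is exactly why this regex is not fit for purpose on this data.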
You should either set your log format to JSON and let Splunk automatically extract the fields, or update your regex to take into account the recursive nature of JSON-structured data.
If the data is JSON then you shouldn’t need to extract the fields manually.
What do you get if you send the JSON but do not apply the transforms?