It's my understanding that the default frozenTimePeriodInSecs is 6 years. I am confused about what this graph means for my _internal index. This is Data Age vs Frozen Age.
1. I don't have 1,461 days of data because my deployment isn't that old.
2. It appears that my frozen age is 30 days? I'm not really sure what this number means.
Ok. One at a time 😉
1. While the default retention period for an index might be 6 years (I don't remember exactly, but that sounds legit), the retention period for the _internal index is just 30 days. You can check where it's defined by running
splunk btool indexes list _internal --debug
on your indexer (example output after this list).
2. While your environment might not be that old, there might be some events that were, for example, produced by hosts with badly configured clocks. Check the actual events to see where they came from. Remember that the timestamps do not have to reflect the time at which the event was ingested.
3. Events are rotated hot -> warm -> cold -> frozen (deleted) as whole buckets, so a bucket gets rolled to frozen only when the most recent event in that bucket is older than the retention period.
4. While most buckets will contain events from a relatively small time range, there is a special "quarantine" bucket created for each index which catches all "weird" events - older than expected or from the future. Since this bucket may contain events both very old (as is probably your case) and from the future, it might actually not get rotated for a long time.
5. Do a dbinspect of your index to see what buckets you have and what time ranges they cover - a sketch of that search follows below.
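For (1), btool with --debug prefixes each line with the file it came from, so the output should look something like this (paths and exact values from memory - check your own output):

/opt/splunk/etc/system/default/indexes.conf [_internal]
/opt/splunk/etc/system/default/indexes.conf frozenTimePeriodInSecs = 2592000
/opt/splunk/etc/system/default/indexes.conf homePath = $SPLUNK_DB/_internaldb/db

2592000 seconds is exactly 30 days, which is most likely where your Frozen Age number comes from.

And for (5), a minimal sketch of the dbinspect search (startEpoch/endEpoch are documented dbinspect fields; the formatting is just my preference):

| dbinspect index=_internal
| eval startTime=strftime(startEpoch, "%x %r"), endTime=strftime(endEpoch, "%x %r")
| sort startEpoch
| table bucketId, state, startTime, endTime, eventCount

A bucket whose startTime is ancient but whose endTime is recent (or in the future) is exactly the "weird" catch-all bucket from point 4.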
It's in the monitoring console
Indexes > Indexes and Volumes > Indexes and Volumes: Instance
I don't know where the 30-day Frozen Age is coming from.
Ok - I understand what you're looking at. I'm also assuming you're not on Splunk Cloud, since I will be referring to filesystem-level things.
Things like this are set within the indexes.conf configuration file. If you have your own defined indexes, you might have configured some of these time-to-live settings there. But within the Splunk install there is a default set of indexes configured. For example, the default basic configs are in %SPLUNK_HOME%\etc\system\default\indexes.conf
DO NOT EDIT THAT FILE.
But you can look at it to see where some of the configurations for the internal Splunk indexes come from, and what the default values are for settings like frozenTimePeriodInSecs.
When you create your own indexes, Splunk creates additional indexes.conf files in various places depending on how you configured things (%SPLUNK_HOME%\etc\system\local\indexes.conf, or maybe %SPLUNK_HOME%\etc\apps\some_app_you_created\local\indexes.conf).
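For illustration, a stanza overriding retention might look roughly like this (the index name and values here are made up; homePath/coldPath/thawedPath are the standard required settings):

[some_index_you_created]
homePath   = $SPLUNK_DB/some_index_you_created/db
coldPath   = $SPLUNK_DB/some_index_you_created/colddb
thawedPath = $SPLUNK_DB/some_index_you_created/thaweddb
# keep 90 days of data (90 * 86400 seconds)
frozenTimePeriodInSecs = 7776000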
Even if you tried to override the "default" value for something, it might not be in effect because of Splunk's conf file precedence rules.
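The btool trick mentioned earlier in this thread is the easiest way to see which file "wins": with --debug, each effective setting is printed next to the file it came from (index name here is hypothetical):

splunk btool indexes list some_index_you_created --debug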
I don't recognize that screenshot - is that from an SPL query/dashboard, or the monitoring console?
Keep in mind that there are multiple "timestamps" in Splunk. You have _time, which is the timestamp from within the event data (or, if there's no time in the event, the timestamp Splunk assigns when the data is seen). Also know that the values for _time can sometimes get wacky if the data wasn't extracted correctly (often a user issue when configuring new sourcetypes). This probably doesn't happen as often in _internal.
There is also _indextime, which is when Splunk actually indexed the event. Note, though, that when deciding to roll a bucket to frozen, Splunk goes by the newest event timestamp stored in the bucket (what dbinspect reports as endEpoch) rather than by _indextime - which is why oddly timestamped events can keep a bucket around well past the retention period.
If you want a quick and dirty way to see what sort of time data lives in your _internal index, run this SPL over all time. You might see stuff way in the past or way in the future if there's some weird data in the index:
index=_internal
| stats earliest(_time) as earliest_time, latest(_time) as latest_time, earliest(_indextime) as earliest_indextime, latest(_indextime) as latest_indextime
| eval earliest_time=strftime(earliest_time,"%x %r")
| eval latest_time=strftime(latest_time,"%x %r")
| eval earliest_indextime=strftime(earliest_indextime,"%x %r")
| eval latest_indextime=strftime(latest_indextime,"%x %r")
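If that quick check does surface timestamps far in the past or future, here's a follow-up sketch of my own (not from any shipped dashboard) that groups the _time-vs-_indextime lag by host, to help spot machines with badly configured clocks:

index=_internal
| eval lag_days = round((_indextime - _time) / 86400, 1)
| where abs(lag_days) > 1
| stats count, min(lag_days) as min_lag_days, max(lag_days) as max_lag_days by host, sourcetype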
Also keep in mind that Splunk doesn't apply your time-period thresholds to individual events; rather, it deals with entire buckets. If your buckets are configured in a way that leaves a year's worth of data in one bucket but the frozen setting is 60 days, Splunk won't handle that bucket (e.g., delete it when frozen) until all data in that bucket hits the time threshold. So you might still see "old" data showing up in results beyond your time configurations, because you still have buckets with pre- and post-threshold data in them.
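To see whether you actually have such wide buckets, a small dbinspect variation makes the span explicit (just a sketch built on documented dbinspect fields):

| dbinspect index=_internal
| eval spanDays = round((endEpoch - startEpoch) / 86400, 1)
| sort - spanDays
| table bucketId, state, spanDays, eventCount

Buckets whose spanDays is much larger than your retention window are the ones that can keep "old" data visible.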
You're touching on some advanced topics in bucket tuning and what Splunk is really doing under the hood. Here are some helpful docs to help you dig into these concepts.