Getting Data In

Splunk data retention

Karthikeya
Communicator

I was recently assigned to a project and didn't get a proper knowledge transfer (KT) from the people who left. I have questions about my current architecture and configuration, and I am not well versed in advanced admin concepts. Please help me with these queries:

  • We have 6 indexers (hosted on AWS EC2, not Splunk Cloud) with 6.9 TB of disk storage and a 1.5 GB/day license. Is this OK? I am checking the retention period, but frozenTimePeriodInSecs and maxTotalDataSizeMB are not set anywhere in local/; they only exist in default. I am also checking whether an archival location is set.

indexes.conf in Cluster Manager:

[new_index]
homePath   = volume:primary/$_index_name/db
coldPath   = volume:primary/$_index_name/colddb
thawedPath = $SPLUNK_DB/$_index_name/thaweddb
 
volumes indexes.conf:
 
[volume:primary]
path = $SPLUNK_DB
#maxVolumeDataSizeMB = 6000000
 
there is one more app which is pushing to indexers with indexes.conf: (not at all aware of this)
 
[default]
remotePath = volume:aws_s3_vol/$_index_name
maxDataSize = 750

[volume:aws_s3_vol]
storageType = remote
path = s3://conn-splunk-prod-smartstore/
remote.s3.auth_region = eu-west-1
remote.s3.bucket_name = conn-splunk-prod-smartstore
remote.s3.encryption = sse-kms
remote.s3.kms.key_id = XXXX
remote.s3.supports_versioning = false
 
Also, coldToFrozenDir is not set anywhere, and coldToFrozenScript is not mentioned either.
 
So are we storing archival data in the S3 bucket now? But maxDataSize is mentioned there, which relates to hot-to-warm rolling. So apart from hot bucket data, is everything else now stored in the S3 bucket?
 
And how will Splunk fetch data back from the S3 bucket when searches need it?
0 Karma

livehybrid
Super Champion

Hi

You are using Splunk SmartStore, which offloads warm and cold buckets to remote object storage (S3). Hot buckets remain on local indexer storage until they roll to warm, then get uploaded to S3.

Your remotePath and [volume:aws_s3_vol] config confirms SmartStore is enabled, meaning:

    • Hot data and cached copies of warm/cold buckets reside on the indexers
    • Warm and cold buckets are stored in S3
    • There is no need for coldToFrozenDir or coldToFrozenScript unless you want to archive frozen data elsewhere; those settings let data that has passed frozenTimePeriodInSecs be moved somewhere else instead of deleted.

Retention is controlled by frozenTimePeriodInSecs (age-based) or, for SmartStore indexes, maxGlobalDataSizeMB (size-based). If you don’t override these in local/, the defaults apply (frozenTimePeriodInSecs defaults to roughly 6 years).
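If you want retention to be explicit rather than relying on defaults, a minimal sketch would look like the following (the values are illustrative, not recommendations; verify them against your own retention requirements):

```ini
# indexes.conf (on the Cluster Manager, pushed to the peers)
[new_index]
# Age-based retention; 188697600s (~6 years) is the shipped default
frozenTimePeriodInSecs = 188697600
# Size-based retention for SmartStore indexes uses the global setting,
# not maxTotalDataSizeMB; 0 means no size limit
maxGlobalDataSizeMB = 0
```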

You can run the following command on one of your indexers to confirm the settings which have been applied: 

/opt/splunk/bin/splunk btool indexes list --debug | grep -A 10 new_index

Splunk automatically retrieves data from S3 to local cache when searches require it. This is transparent to users but may add latency for cold data which is not already in the cache. When the cache reaches capacity it will "evict" buckets based on the eviction policy which by default is the least-recently used bucket.
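The cache behaviour described above is governed by the cache manager in server.conf on each indexer. A sketch of the relevant stanza (the setting names are real SmartStore cache-manager settings; the values shown are illustrative defaults, so check your own deployment with btool):

```ini
# server.conf on each indexer
[cachemanager]
# lru (least-recently-used) is the default eviction policy
eviction_policy = lru
# MB of free disk space the cache manager tries to maintain
eviction_padding = 5120
# 0 = no explicit cache-size cap (disk/volume limits still apply)
max_cache_size = 0
```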

Some docs relating to SmartStore and index configuration that might be useful:

🌟 Did this answer help you? If so, please consider:

  • Adding karma to show it was useful
  • Marking it as the solution if it resolved your issue
  • Commenting if you need any clarification

Your feedback encourages the volunteers in this community to continue contributing

PickleRick
SplunkTrust
SplunkTrust

One small correction. With SmartStore there is no separate warm/cold storage. A bucket is uploaded to remote storage and cached locally when needed, but it doesn't go through the warm-to-cold lifecycle.

It's also worth noting that with some use cases (especially when your searches often cover a portion of remote storage far larger than your local storage) you might see a significant performance hit, because you're effectively not caching anything locally.

0 Karma

Karthikeya
Communicator

@PickleRick And will data be deleted from S3 if it reaches any limit? I mean, we didn't set frozenTimePeriodInSecs, so by default it is 6 years, meaning the older data stays in S3 for 6 years?

0 Karma

Karthikeya
Communicator

Thanks for this... So my understanding is that my index, with its 500 GB default size, will never fill up at all, because once a bucket reaches 750 MB (maxDataSize) it will roll over to warm, which lives in the S3 bucket? Am I correct?

0 Karma

livehybrid
Super Champion

Hi @Karthikeya 

For reference, the following docs page is useful for SmartStore retention settings: https://docs.splunk.com/Documentation/Splunk/9.4.1/Indexer/SmartStoredataretention

maxDataSize is the maximum size of a single bucket in MB, not the total size of the index.
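To make that distinction concrete, a hedged indexes.conf sketch (the comments are the point; the values are illustrative):

```ini
[new_index]
# Per-bucket cap: a hot bucket rolls to warm when it reaches this size (MB)
maxDataSize = 750
# Total index retention is controlled separately; for SmartStore that is
# frozenTimePeriodInSecs (age) and maxGlobalDataSizeMB (size), not maxDataSize
```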

Data will be "frozen" when either maxGlobalDataSizeMB or frozenTimePeriodInSecs is met (whichever comes first!) - so it is not safe to assume the data will be retained for 6 years if maxGlobalDataSizeMB is not large enough to hold 6 years of data.
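As a quick sanity check on whether 6 years would even fit, here is a rough sizing sketch in Python. The assumptions are the 1.5 GB/day license volume mentioned in the thread and a ~50% raw-to-disk compression ratio, which is a common rule of thumb, not a measurement from your environment:

```python
# Rough retention sizing sketch.
# Assumptions (not measured): ingest equals the 1.5 GB/day license,
# and indexed data occupies ~50% of raw size on disk.
ingest_gb_per_day = 1.5
compression_ratio = 0.5          # assumed stored-size / raw-size
retention_days = 6 * 365         # ~6 years, matching the default frozenTimePeriodInSecs

stored_gb = ingest_gb_per_day * retention_days * compression_ratio
print(f"~{stored_gb:.0f} GB needed to keep 6 years")  # well under the 6.9 TB available
```

Under those assumptions, 6 years of data fits comfortably in the local 6.9 TB, so age-based retention would be the binding limit rather than size.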

To clarify my previous post, as @PickleRick mentioned: in SmartStore indexes, cold buckets are functionally equivalent to warm buckets. They only exist in limited circumstances, and in any case the storage on S3 is the same.

Let me know if you have any further questions or need clarity on any of these points 🙂


 

0 Karma