Without a tiered storage model it seems like there would be little argument for using cold/frozen storage, except possibly if additional compression helps save space. If not, using only a homePath in indexes.conf would seem to make all data readily accessible as hot/warm.
However, checking the documentation, there seem to be three paths that are required for splunkd to start: homePath, coldPath, and thawedPath. (indexes.conf - Splunk Documentation)
So using a single disk/volume/mount, what should indexes.conf look like? Should the same path just be set for all three, making sure that maxVolumeDataSizeMB adds up to the total space available on /data/splunk/warm?
[volume:storage]
path = /data/splunk/warm/
# adjust when correct disk is mounted
maxVolumeDataSizeMB = 2800000
...
...
[volume:_splunk_summaries]
path = /data/splunk/warm/
# ~ 200GB
maxVolumeDataSizeMB = 200000
...
...
[main]
homePath = volume:storage/defaultdb/db
coldPath = volume:storage/defaultdb/colddb
thawedPath = $SPLUNK_DB/defaultdb/thaweddb
[history]
homePath = volume:storage/historydb/db
coldPath = volume:storage/historydb/colddb
thawedPath = $SPLUNK_DB/historydb/thaweddb
[summary]
homePath = volume:storage/summarydb/db
coldPath = volume:storage/summarydb/colddb
thawedPath = $SPLUNK_DB/summarydb/thaweddb
...
...
[windows]
homePath = volume:storage/windows/db
coldPath = volume:storage/windows/colddb
summaryHomePath = volume:storage/windows/summary
thawedPath = $SPLUNK_DB/windows/thaweddb
tstatsHomePath = volume:_splunk_summaries/windows/datamodel_summary
frozenTimePeriodInSecs = 63072000
[linux]
homePath = volume:storage/linux/db
coldPath = volume:storage/linux/colddb
summaryHomePath = volume:storage/linux/summary
thawedPath = $SPLUNK_DB/linux/thaweddb
tstatsHomePath = volume:_splunk_summaries/linux/datamodel_summary
frozenTimePeriodInSecs = 63072000
I'm assuming this would work, right?
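As a side note on the retention value used above, frozenTimePeriodInSecs = 63072000 works out to exactly two 365-day years:

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86400
retention_secs = 63072000        # frozenTimePeriodInSecs from the stanzas above

days = retention_secs // SECONDS_PER_DAY
print(days)  # 730 days, i.e. two 365-day years
```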
Though since Splunk seems to require, and does make use of, "cold" and "thawed" anyway, does it make more sense to just partition mounts for warm and cold separately?
[volume:warm]
path = /data/splunk/warm/
# adjust when correct disk is mounted
maxVolumeDataSizeMB = 500000
[volume:cold]
path = /data/splunk/cold/
# adjust when correct disk is mounted
maxVolumeDataSizeMB = 2500000
...
...
[volume:_splunk_summaries]
path = /data/splunk/warm/
# ~ 200GB
maxVolumeDataSizeMB = 200000
...
...
[main]
homePath = volume:warm/defaultdb/db
coldPath = volume:cold/defaultdb/colddb
thawedPath = $SPLUNK_DB/defaultdb/thaweddb
[history]
homePath = volume:warm/historydb/db
coldPath = volume:cold/historydb/colddb
thawedPath = $SPLUNK_DB/historydb/thaweddb
[summary]
homePath = volume:warm/summarydb/db
coldPath = volume:cold/summarydb/colddb
thawedPath = $SPLUNK_DB/summarydb/thaweddb
...
...
[windows]
homePath = volume:warm/windows/db
coldPath = volume:cold/windows/colddb
summaryHomePath = volume:warm/windows/summary
thawedPath = $SPLUNK_DB/windows/thaweddb
tstatsHomePath = volume:_splunk_summaries/windows/datamodel_summary
frozenTimePeriodInSecs = 63072000
[linux]
homePath = volume:warm/linux/db
coldPath = volume:cold/linux/colddb
summaryHomePath = volume:warm/linux/summary
thawedPath = $SPLUNK_DB/linux/thaweddb
tstatsHomePath = volume:_splunk_summaries/linux/datamodel_summary
frozenTimePeriodInSecs = 63072000
Does it matter, and what would be "best practice"?
While hot/warm and cold storage can be on the same device there is still sense of using cold space. Most importantly, cold is the only storage that allows you to limit the data by bucket age. So with hot/warm and cold on the same physical storage it's probably most reasonable to keep warm small and keep most data in cold so its retention can be flexibly configured.
That is probably the strongest argument right now, as there is only SSD storage at a flat rate.
So if I configure nothing else and just set a path for a single "storage" volume:
[volume:storage]
path = /data/splunk/warm/
# adjust when correct disk is mounted
maxVolumeDataSizeMB = 2800000
Then use "storage" for both warm and cold, I assume there is a "default" balance between home (hot/warm) and cold with regard to the space allocated. So I then have no control over the sizes of home and cold. However, I can set frozenTimePeriodInSecs and delete based on time from cold with higher accuracy.
I assume I could just add another definition for the same mounted volume and divide the space between hot and cold:
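For what it's worth, indexes.conf also supports per-index caps on home and cold storage, which may give finer control than volume-level limits alone (the sizes here are placeholders, not recommendations):

```ini
[main]
# cap hot/warm buckets for this index (MB)
homePath.maxDataSizeMB = 100000
# cap cold buckets for this index (MB)
coldPath.maxDataSizeMB = 400000
```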
[volume:warm]
path = /data/splunk/warm/
# adjust when correct disk is mounted
maxVolumeDataSizeMB = 1000000
[volume:cold]
path = /data/splunk/warm/
# adjust when correct disk is mounted
maxVolumeDataSizeMB = 1800000
This should produce similar results to just keeping the defaults (as indicated by other replies).
The question remaining, then, is whether I should separate volumes into warm and cold so that I can more easily expand cold storage when needed, along with possible efficiency gains.
Thank you for your feedback!
Hi
Should you define separate volumes for cold and hot/warm even if you have only one physical volume where the data is? As usual, it depends. If you can be absolutely sure that you will never have or need a separate volume for cold over the lifecycle of your system (also when you migrate it to a new one), then you can keep those in the same volume. BUT if there is even a small possibility that you will later get tiered storage on your system, then it's easier to migrate to a separate physical volume when this information is already in your indexes.conf.
Personally, in most cases I use separate hot/warm, cold, and summary volumes even if I have only one physical media type present in the nodes. BUT you should calculate how much space you have and divide it over those volumes, PLUS leave enough free space for filesystem cache etc. Without that last part, your I/O performance will suffer if the filesystem gets too full. And be sure that all usable space in that physical storage is defined as volumes and no separate FS part is used from it (like SPLUNK_DB etc.).
r. Ismo
The coldPath, thawedPath etc. are folders inside the index folder in Splunk, and they can be hosted on the same volume (the default installation of Splunk creates these folders inside the $SPLUNK_DB folder, so it does the same). For better performance (and sometimes cost efficiency), we recommend having a separate volume for hot/warm and cold buckets (keep faster disk for hot/warm and slower/cheaper disk for cold buckets, as they are searched less often).
Thank you for the feedback. The long answer to @PickleRick should cover most of my reply; just to clarify, there is no disk tier, only SSD all the way at a flat rate.
This is why I was curious if there was any point in keeping any or just a minimum amount of cold storage available. Retention time control seems to be the best argument so far 🙂
Hi @fatsug
By default these are typically used:
homePath = $SPLUNK_DB/<indexName>/db
coldPath = $SPLUNK_DB/<indexName>/colddb
thawedPath = $SPLUNK_DB/<indexName>/thaweddb
Where hot/warm buckets are in homePath and cold buckets are in coldPath.
thawedPath is used for restoring buckets which have been frozen out to an external location using a coldToFrozenScript. It's a required setting even if you don't plan to freeze/restore data.
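For completeness, a coldToFrozenScript is just an executable that Splunk invokes with the bucket directory as its first argument; a zero exit status tells Splunk the bucket may then be removed. A minimal sketch (the /data/splunk/frozen destination is an assumption, adjust to taste):

```python
# Minimal coldToFrozenScript sketch. Splunk calls this with the
# bucket directory as argv[1]; exiting 0 means "archived, safe to delete".
import os
import shutil
import sys

ARCHIVE_DIR = "/data/splunk/frozen"  # assumed archive destination

def archive_bucket(bucket_path, archive_dir):
    """Copy a frozen bucket directory into the archive directory."""
    os.makedirs(archive_dir, exist_ok=True)
    dest = os.path.join(archive_dir, os.path.basename(bucket_path.rstrip("/")))
    shutil.copytree(bucket_path, dest)
    return dest

if __name__ == "__main__" and len(sys.argv) == 2:
    archive_bucket(sys.argv[1], ARCHIVE_DIR)
```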
In terms of home vs cold - A lot of customers may choose faster storage (such as SSD) on the homePath location, versus cheaper/slower storage for coldPath where older data is typically located.
Your assumptions around using volumes are correct. You can specify multiple volumes on the same path, making sure that the combined maxVolumeDataSizeMB for your volumes doesn't exceed the disk size! You are essentially using it as a logical separation in the same physical space.
Please let me know how you get on and consider adding karma to this or any other answer if it has helped.
Regards
Will
Sure can 🙂
So, PickleRick and you basically made the same argument, which helped a lot. I'm not going to duplicate the whole reply, hope that's OK.
I can still split the same "disk" into warm and cold defining different amounts of storage, or just dump everything in the same volume and let Splunk figure it out.
Thanks for the feedback