I use 4 tiers and would like to calculate the storage requirements for each one, as well as create a calculator based on the number of metrics collected. According to the documentation, tier 0 requires around 0.6 bytes of storage per sample, tier 1 requires 6 bytes, while tier 2 requires a surprisingly large 18 bytes. So I have two questions:
Thank you!
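For reference, a minimal back-of-envelope sketch of the kind of calculator being asked for, using only the per-sample figures quoted above (0.6 / 6 / 18 bytes) and the default tier granularities. The tier 2 interval of one hour is an assumption based on the default 60x step between tiers, and the metric count, retention targets and function name are purely illustrative:

```python
# Rough per-tier storage estimate from the documented bytes-per-sample
# figures and the default per-tier collection intervals.
# All concrete numbers below are example/assumed values.

DOC_BYTES_PER_SAMPLE = {0: 0.6, 1: 6.0, 2: 18.0}   # bytes/sample from the documentation
TIER_INTERVAL_SECONDS = {0: 1, 1: 60, 2: 3600}      # default granularity (tier 2 assumed)

def estimate_tier_bytes(metrics: int, retention_days: float, tier: int) -> float:
    """Estimated on-disk bytes for one tier."""
    samples_per_metric = retention_days * 86400 / TIER_INTERVAL_SECONDS[tier]
    return metrics * samples_per_metric * DOC_BYTES_PER_SAMPLE[tier]

if __name__ == "__main__":
    metrics = 2000                                    # example metric count
    for tier, days in [(0, 14), (1, 90), (2, 365)]:   # example retention targets
        gib = estimate_tier_bytes(metrics, days, tier) / 1024**3
        print(f"tier {tier}: ~{gib:.2f} GiB for {days} days of {metrics} metrics")
```

With the default 60x step between tiers, each higher tier stores far fewer samples per metric, which is why it can tolerate a much higher per-sample cost.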
Hi @aldem

It has to do with the update frequency of the higher tiers. Higher tiers are updated every X iterations of data points collected by the previous tier. So in the default configuration tier 0 collects metrics every second, tier 1 will store 1 point for every 60 points of tier 0, and so on.

As collected metrics fill data pages for each tier, the pages are grouped into "blocks", compressed and stored to disk. Agent behavior (for example frequent restarts) will result in data pages with just a few data points, which can hurt the compression ratio (because even for a single data point there is associated metadata that needs to be stored).

As an example for a running agent, querying localhost:19999/api/v3/node_instances will provide some numbers that can help a bit with the calculations. For each tier, we are interested in disk_used and samples. For this agent's restart frequency (at least daily) we get:

Note that the numbers for Tier 0 and 1 are "better" than what is listed in the documentation (Tier 2 is worse). This is also why the documentation phrases these figures as "Usually On Disk". For an agent with more tiers, a similar approach can be used for the calculation.

Note: These disk_used numbers do not include agent metadata that is unrelated to data collection itself and is stored per metric (metric name, chart, context etc.), but that doesn't impact the higher tiers anyway.

I hope this helps
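A minimal sketch of that calculation, assuming you have already pulled the per-tier disk_used and samples values out of the /api/v3/node_instances response. The exact JSON layout varies by agent version, so the values are passed in directly here rather than parsed, and the numbers themselves are placeholders:

```python
# Observed bytes-per-sample per tier, computed from the disk_used and
# samples values reported by the agent. Replace the placeholder values
# with the ones your own agent reports.

observed = {
    # tier: (disk_used_bytes, samples) -- placeholder values
    0: (1_200_000_000, 2_000_000_000),
    1: (150_000_000, 33_000_000),
    2: (14_000_000, 550_000),
}

DOCUMENTED = {0: 0.6, 1: 6.0, 2: 18.0}  # bytes/sample from the documentation

for tier, (disk_used, samples) in observed.items():
    bytes_per_sample = disk_used / samples
    print(f"tier {tier}: {bytes_per_sample:.2f} bytes/sample "
          f"(documentation: {DOCUMENTED[tier]} bytes/sample)")
```

Dividing disk_used by samples gives the effective per-sample cost for that particular agent, which can then replace the documented figures in the estimate above.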
Thank you, @stelfrag - this really helps! I didn't know that it is possible to get exact usage, including the number of samples/metrics, via the API. Though now I believe that the best approach would be to "auto-tune" based on actual storage usage - let it run for a while, then use the actual numbers to size each tier.
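A sketch of that "auto-tune" idea, combining the observed bytes-per-sample from the API with a target retention per tier. The tier intervals, metric count and retention targets are assumptions, and since the per-tier disk-space option names differ between agent versions, only the raw sizes are computed here:

```python
# Project the disk space each tier needs for a target retention,
# using observed bytes-per-sample instead of the documented figures.
# All inputs are example/assumed values.

TIER_INTERVAL_SECONDS = {0: 1, 1: 60, 2: 3600}   # default granularity (tier 2 assumed)

def required_mib(metrics: int, retention_days: float,
                 bytes_per_sample: float, tier: int) -> float:
    """Disk space (MiB) needed to keep retention_days of data in one tier."""
    samples = metrics * retention_days * 86400 / TIER_INTERVAL_SECONDS[tier]
    return samples * bytes_per_sample / 1024**2

# Example inputs: observed per-sample costs and target retention per tier.
observed_bps = {0: 0.55, 1: 4.5, 2: 25.0}        # placeholder bytes/sample
targets_days = {0: 14, 1: 90, 2: 365}
metrics = 2000

for tier in sorted(observed_bps):
    mib = required_mib(metrics, targets_days[tier], observed_bps[tier], tier)
    print(f"tier {tier}: ~{mib:,.0f} MiB for {targets_days[tier]} days")
```

The resulting per-tier sizes can then be fed into whatever per-tier disk-space limits your agent version exposes.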