
Tictac AAE object_stats shows various statistics over a period of time. #1874

Open
@TeadRIM

Description

Hi.
Some context: we run riak version 3.12 with leveldb, ring_size = 64, n_val = 3, on 5 riak nodes. We tried switching to Tictac AAE in order to use NextGen replication. The following was added to the config:

       anti_entropy = passive
       tictacaae_active = active
       tictacaae_storeheads = enabled

After the Tictac AAE build completed, we see different object statistics from riak_client:aae_fold({object_stats, <<"domainRecord">>, all, all}). on immutable buckets (no external traffic reaches riak). The bucket holds more than 10 million objects, and object_stats returns a slightly different value on every run. erase_keys also returns different statistics. For example (the same request was repeated at 5-minute intervals):

riak_client:aae_fold({object_stats, <<"any_bucket">>, all, {date, {{2023, 8, 5}, {12, 0, 0}}, {{2023, 8, 5}, {13, 0, 0}}}}).
{ok,[{total_count,3652},
      {total_size,19184471},
      {sizes,[{2,105},{3,3156},{4,391}]},
      {siblings,[{1,3652}]}]}
riak_client:aae_fold({object_stats, <<"any_bucket">>, all, {date, {{2023, 8, 5}, {12, 0, 0}}, {{2023, 8, 5}, {13, 0, 0}}}}).
{ok,[{total_count,3645},
      {total_size,19145341},
      {sizes,[{2,105},{3,3149},{4,391}]},
      {siblings,[{1,3645}]}]}
riak_client:aae_fold({object_stats, <<"any_bucket">>, all, {date, {{2023, 8, 5}, {12, 0, 0}}, {{2023, 8, 5}, {13, 0, 0}}}}).
{ok,[{total_count,3643},
      {total_size,19132325},
      {sizes,[{2,105},{3,3147},{4,391}]},
      {siblings,[{1,3643}]}]}
riak_client:aae_fold({object_stats, <<"any_bucket">>, all, {date, {{2023, 8, 5}, {12, 0, 0}}, {{2023, 8, 5}, {13, 0, 0}}}}).
{ok,[{total_count,3650},
      {total_size,19171870},
      {sizes,[{2,105},{3,3154},{4,391}]},
      {siblings,[{1,3650}]}]}
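
To put the variation in numbers, here is a minimal sketch for the Erlang shell. It uses only the total_count values pasted above; the variable names are illustrative, and nothing in it calls Riak.

```erlang
%% total_count values copied from the four runs above, in order.
Counts = [3652, 3645, 3643, 3650].

%% Absolute spread between the largest and smallest result.
Spread = lists:max(Counts) - lists:min(Counts).

%% Spread as a percentage of the largest result.
RelPct = 100 * Spread / lists:max(Counts).

io:format("spread=~p (~.2f%)~n", [Spread, RelPct]).
%% prints: spread=9 (0.25%)
```

In the four outputs above, only the {3, N} entry of sizes changes between runs, and it changes by the same amount as total_count; the {2,105} and {4,391} entries are identical every time.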

Over a longer time period the difference is larger.
Also, when real-time replication is enabled, object_stats differs between the sink cluster and the source cluster. There are no errors in the logs.
Could you comment on this? Is this expected behavior? Without a tool for checking data integrity, we cannot be sure that the two clusters stay consistent, so we cannot rely on replication.
