Add prefix filter to sstable2json #586

Merged
d-guo merged 6 commits into palantir-cassandra-2.2.18 on Nov 25, 2024

Conversation

d-guo
Contributor

@d-guo d-guo commented Nov 22, 2024

Enables finding all keys with a prefix within an sstable through sstable2json.

Comment on lines 338 to 340
} else {
entry = sstable.getPosition(lastKey, SSTableReader.Operator.GT);
}
Contributor

won't this keep re-searching the sstable for every key instead of scanning forward from the first match?

Contributor Author

i think there are two pieces at play here: sstable.getPosition has to binary search the index for each key, but dfile.seek only ever moves forward from the previous seek position. this means searching for a prefix that has 100 keys costs the same as searching specifically for those 100 keys with -k
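A minimal sketch of the per-key lookup pattern described above, for illustration only. It models the sstable index as a `TreeSet` of hex-encoded keys (an assumption, as is byte-ordered key layout), with `ceiling` standing in for a `GE` lookup and `higher` for a `GT` lookup. `PerKeyLookupSketch` and `lookups` are hypothetical names, not part of the PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical model: a TreeSet stands in for the sstable index, and each
// ceiling()/higher() call stands in for one getPosition-style ordered search.
final class PerKeyLookupSketch {
    /** Number of ordered lookups performed by the most recent call. */
    static int lookups = 0;

    /** Collects keys starting with prefix, doing one ordered lookup per key. */
    static List<String> keysWithPrefix(TreeSet<String> index, String prefix) {
        List<String> out = new ArrayList<>();
        lookups = 0;
        String lastKey = null;
        while (true) {
            // First lookup is "greater or equal" to the prefix; later lookups
            // are "strictly greater" than the last key returned.
            String next = (lastKey == null) ? index.ceiling(prefix) : index.higher(lastKey);
            lookups++;
            if (next == null || !next.startsWith(prefix)) {
                break;
            }
            out.add(next);
            lastKey = next;
        }
        return out;
    }
}
```

In this model a prefix matching n keys costs n + 1 ordered lookups, which is the per-key search cost the reviewer is asking about.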

Contributor

If a "next prefix" were supplied, I think you would be able to run this like a normal range scan, right? E.g. "prefix1" is the left bound and "prefix2" is the right bound?

Contributor Author

good idea, added!
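For reference, one way a "next prefix" could be derived when the caller does not supply one. This is a sketch under the assumption that keys compare as unsigned bytes (byte-ordered layout); `NextPrefixSketch.nextPrefix` is a hypothetical helper and is not taken from the PR.

```java
import java.util.Arrays;

// Hypothetical helper: computes the exclusive upper bound for a prefix scan,
// assuming keys are ordered by unsigned byte comparison.
final class NextPrefixSketch {
    /**
     * Returns the smallest byte string that sorts after every key starting
     * with prefix, or null when no such bound exists (empty or all-0xFF prefix).
     */
    static byte[] nextPrefix(byte[] prefix) {
        for (int i = prefix.length - 1; i >= 0; i--) {
            if ((prefix[i] & 0xFF) != 0xFF) {
                // Drop any trailing 0xFF bytes and increment the last byte that can grow.
                byte[] next = Arrays.copyOf(prefix, i + 1);
                next[i]++;
                return next;
            }
        }
        return null;
    }
}
```

Under that assumption, every key with prefix `62` falls in the half-open interval from `62` (inclusive) to `63` (exclusive).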

Contributor

@wi11dey wi11dey Nov 22, 2024

i think with the update this

this means searching for a prefix that has 100 keys is the same as if you searched specifically for those 100 keys with -k

is still the case

                RowIndexEntry entry = sstable.getPosition(decoratedKey, keysCount == 0 ? SSTableReader.Operator.GE : SSTableReader.Operator.GT);

i think, like @zpear said, we can just use the on-disk atom iterator until we hit untilPrefix (or until the prefix doesn't match anymore) instead of re-searching for every key
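A rough sketch of the forward-scan alternative suggested here: seek once to the first key at or after the prefix, then iterate until the bound is reached. The `TreeSet` again stands in for the on-disk iterator, keys are assumed to be hex-encoded and byte-ordered, and the exact semantics of `untilPrefix` in the PR are an assumption; `ForwardScanSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical model of "seek once, then iterate forward".
final class ForwardScanSketch {
    /**
     * Returns keys from the first key >= prefix up to (but excluding) untilPrefix.
     * When untilPrefix is null, stops at the first key that no longer starts with prefix.
     */
    static List<String> scan(TreeSet<String> index, String prefix, String untilPrefix) {
        List<String> out = new ArrayList<>();
        // tailSet positions the iteration once; no further searches are needed.
        for (String key : index.tailSet(prefix, true)) {
            boolean done = (untilPrefix != null)
                    ? key.compareTo(untilPrefix) >= 0
                    : !key.startsWith(prefix);
            if (done) {
                break;
            }
            out.add(key);
        }
        return out;
    }
}
```

Compared with a per-key lookup, this does a single positioned search and then relies on key order, which is the trade-off being discussed in this thread.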

Contributor Author

d-guo commented Nov 22, 2024

manual testing

rescue@cqlsh> SELECT * FROM dg_sstables.prefix;

 row            | col1 | col2 | value1 | value2
----------------+------+------+--------+--------
   0x6170706c65 |    1 |    1 |     0x |     0x
       0x62616d |    1 |    1 |     0x |     0x
       0x62616e |    1 |    1 |     0x |     0x
 0x62616e616e61 |    1 |    1 |     0x |     0x
   0x7a65627261 |    1 |    1 |     0x |     0x

(5 rows)
rescue@cqlsh> exit
ptdocker@dg-expansion-0:/app$ ./service/bin/sstable2json mnt/cassandra-data-0/cassandra/dg_sstables/prefix-c5058400a8fd11efb2cc23a21fa52e0e -p 62

{
"lb-1-big-Data.db":[
{"key": "62616d",
 "cells": [["1:1:","",1732299544590381],
           ["1:1:value1","",1732299544590381],
           ["1:1:value2","",1732299544590381]]},
{"key": "62616e",
 "cells": [["1:1:","",1732299531590633],
           ["1:1:value1","",1732299531590633],
           ["1:1:value2","",1732299531590633]]},
{"key": "62616e616e61",
 "cells": [["1:1:","",1732299525101513],
           ["1:1:value1","",1732299525101513],
           ["1:1:value2","",1732299525101513]]}
]
}
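To read the output above against the table, the hex keys can be decoded back to text. The helper below is a small illustrative sketch (`HexKeySketch.hexToAscii` is a hypothetical name) and assumes the keys are ASCII-encoded blobs with an even number of hex digits.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for eyeballing hex-encoded row keys.
final class HexKeySketch {
    /** Decodes a hex string such as "62616d" into its ASCII text. */
    static String hexToAscii(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            // Each pair of hex digits is one byte of the key.
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.US_ASCII);
    }
}
```

Decoded this way, the three returned keys are "bam", "ban", and "banana", while the two omitted rows are "apple" and "zebra", which is consistent with a filter on the prefix byte 0x62 ("b").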

Contributor Author

d-guo commented Nov 25, 2024

talked offline: keeping the current implementation because DataRange represents an (exclusive, inclusive] range

A Range is responsible for the tokens between (left, right].

we want IncludingExcludingBounds, i.e. [left, right), but it doesn't appear SSTableReader accepts that type of bound in its current form
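To illustrate the mismatch, here is a simplified sketch of the four interval shapes. The enum constants only mirror the names of Cassandra's bound types; this is not their implementation, it ignores token wrap-around, and it compares plain strings as stand-ins for keys. `BoundsSketch` is a hypothetical name.

```java
// Hypothetical, simplified model of interval inclusivity; not Cassandra code.
final class BoundsSketch {
    enum Kind {
        RANGE,                     // (left, right]
        BOUNDS,                    // [left, right]
        EXCLUDING_BOUNDS,          // (left, right)
        INCLUDING_EXCLUDING_BOUNDS // [left, right)
    }

    /** Returns whether key lies in the interval of the given kind. */
    static boolean contains(Kind kind, String left, String right, String key) {
        int l = key.compareTo(left);
        int r = key.compareTo(right);
        switch (kind) {
            case RANGE:
                return l > 0 && r <= 0;
            case BOUNDS:
                return l >= 0 && r <= 0;
            case EXCLUDING_BOUNDS:
                return l > 0 && r < 0;
            default:
                return l >= 0 && r < 0;
        }
    }
}
```

With `left = "62"` and `right = "63"`, a key exactly equal to `"62"` is excluded by the `RANGE` shape and a key exactly equal to `"63"` is included, which is the opposite of what a prefix scan wants at both ends.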

@d-guo d-guo merged commit 862a381 into palantir-cassandra-2.2.18 Nov 25, 2024
6 checks passed
3 participants