-
Notifications
You must be signed in to change notification settings - Fork 159
Implement Incremental Merge Strategy for Clickhouse #6069
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
k-anshul
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two more general comments:
- May be I am missing something but replace looks exactly the same as merge so why call it a different strategy.
- As I understand the partition replacement works only for the partition key of the table and not any other combination so we can't take any key input for
Merge/Replacestrategy and the partition key specified during table creation should work for unique_key as well. Also if the user specifies strategy asMergewe should validate that table is partitioned.
@k-anshul The Merge strategy in DuckDB is designed to replace data records based on any specified key, offering flexibility as the key can be any column. This approach is somewhat similar to using ReplacingMergeTree in ClickHouse, where data can be replaced based on primary keys. The idea of the Replace strategy is to replace strictly at the partition level, using only the partition key to replace data. This means Replace does not support granular replacement on arbitrary fields, making it a more partition-specific approach. Also, it may not make sense for the Merge strategy to function differently in DuckDB compared to ClickHouse from an end-user perspective. By maintaining Merge and Replace as separate strategies, we can provide users clarity, ensuring they choose the appropriate method based on their specific data update requirements. @esevastyanov @begelundmuller let me know your thoughts on this |
You are right, the user may not be required to pass the key field, instead we can internally get it from the table definition. I can make changes accordingly.
@k-anshul Here, we may be contradicting our own rules for merge. If user just uses merge, he must specify the key. Instead, we can allow user to just use |
I am still not fully convinced for adding another strategy because the end result is same for both. There can always be limitations around the functionalities depending on the underlying OLAP engine capabilities. But yeah lets also wait for Benjamin/Eugene for their inputs. |
The disparity between strategies for different OLAP engines will still exist. |
this sounds like a simpler approach since doesn't require introducing new syntax
can the logic be updated so that no key is required in case of clickhouse? |
|
Thanks, I will make changes accordingly @k-anshul @esevastyanov |
|
I think it will also be good to add some validation that |
It involves the following steps:
Issue
https://github.com/rilldata/rill-private-issues/issues/892