Skip to content

Conversation

@rohithreddykota
Copy link
Contributor

@rohithreddykota rohithreddykota commented Nov 10, 2024

It involves the following steps:

  1. Create a Temporary Table: A temporary table is created with the same schema as the target table.
  2. Insert Data into Temporary Table: The new data is inserted into the temporary table.
  3. List Temporary Table Partitions: The partitions in the temporary table are listed based on the specified key.
  4. Replace Partitions: The partitions in the target table are replaced with the corresponding partitions from the temporary table.
  5. Drop Temporary Table: The temporary table is dropped after the partitions have been replaced.

Issue

https://github.com/rilldata/rill-private-issues/issues/892

@rohithreddykota rohithreddykota changed the title Implement Incremental Replace Strategy for ClickHouse Implement Incremental Replace Strategy for Click & Duck Nov 10, 2024
Copy link
Member

@k-anshul k-anshul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two more general comments:

  1. May be I am missing something but replace looks exactly the same as merge so why call it a different strategy.
  2. As I understand the partition replacement works only for the partition key of the table and not any other combination so we can't take any key input for Merge/Replace strategy and the partition key specified during table creation should work for unique_key as well. Also if the user specifies strategy as Merge we should validate that table is partitioned.

@rohithreddykota
Copy link
Contributor Author

rohithreddykota commented Nov 14, 2024

  1. May be I am missing something but replace looks exactly the same as merge so why call it a different strategy.

@k-anshul The Merge strategy in DuckDB is designed to replace data records based on any specified key, offering flexibility as the key can be any column. This approach is somewhat similar to using ReplacingMergeTree in ClickHouse, where data can be replaced based on primary keys.

The idea of the Replace strategy is to replace strictly at the partition level, using only the partition key to replace data. This means Replace does not support granular replacement on arbitrary fields, making it a more partition-specific approach.

Also, it may not make sense for the Merge strategy to function differently in DuckDB compared to ClickHouse from an end-user perspective. By maintaining Merge and Replace as separate strategies, we can provide users clarity, ensuring they choose the appropriate method based on their specific data update requirements.

@esevastyanov @begelundmuller let me know your thoughts on this

@rohithreddykota
Copy link
Contributor Author

rohithreddykota commented Nov 14, 2024

As I understand the partition replacement works only for the partition key of the table and not any other combination so we can't take any key input for Merge/Replace strategy and the partition key specified during table creation should work for unique_key as well.

You are right, the user may not be required to pass the key field, instead we can internally get it from the table definition. I can make changes accordingly.

Also if the user specifies strategy as Merge we should validate that table is partitioned.

@k-anshul Here, we may be contradicting our own rules for merge. If user just uses merge, he must specify the key. Instead, we can allow user to just use incremental_strategy: replace and not allow to pass any key field. This can be another reason to have replace different from merge (fyi @begelundmuller )

@k-anshul
Copy link
Member

@k-anshul Here, we may be contradicting our own rules for merge. If user just uses merge, he must specify the key. Instead, we can allow user to just use incremental_strategy: replace and not allow to pass any key field. This can be another reason to have replace different from merge (fyi @begelundmuller )

I am still not fully convinced for adding another strategy because the end result is same for both. There can always be limitations around the functionalities depending on the underlying OLAP engine capabilities.
Here is how I would put it.
Merge for duckdb works for arbitrary keys. Merge for clickhouse is limited in scope and only works for partition tables.

But yeah lets also wait for Benjamin/Eugene for their inputs.

@k-anshul
Copy link
Member

Here, we may be contradicting our own rules for merge. If user just uses merge, he must specify the key. Instead, we can allow user to just use incremental_strategy: replace and not allow to pass any key field. This can be another reason to have replace different from merge

The disparity between strategies for different OLAP engines will still exist. Replace is an alias for Merge for duckDB meaning user can(and must) provide keys. Replace for clickhouse does not take key input.

@esevastyanov
Copy link
Contributor

Merge for duckdb works for arbitrary keys. Merge for clickhouse is limited in scope and only works for partition tables.

this sounds like a simpler approach since doesn't require introducing new syntax

If user just uses merge, he must specify the key

can the logic be updated so that no key is required in case of clickhouse?

@rohithreddykota
Copy link
Contributor Author

Thanks, I will make changes accordingly @k-anshul @esevastyanov

@esevastyanov esevastyanov marked this pull request as ready for review December 2, 2024 11:42
@esevastyanov esevastyanov requested a review from k-anshul December 2, 2024 11:42
@rohithreddykota rohithreddykota changed the title Implement Incremental Replace Strategy for Click & Duck Implement Incremental Merge Strategy for Clickhouse Dec 2, 2024
@k-anshul
Copy link
Member

k-anshul commented Dec 3, 2024

I think it will also be good to add some validation that incremental strategy = replace is only added for models which create partitioned tables.

@esevastyanov esevastyanov requested a review from k-anshul December 3, 2024 22:18
@esevastyanov esevastyanov requested a review from k-anshul December 4, 2024 13:15
@esevastyanov esevastyanov merged commit d3c4c5f into main Dec 5, 2024
7 checks passed
@esevastyanov esevastyanov deleted the rohithreddykota/inc-strategy-replace branch December 5, 2024 19:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants