Describe the problem you faced
We were experiencing slow upsert performance when using Hudi with Flink SQL on AWS S3. Enabling the metadata table improved update speed, but the cleaner does not trigger even after 3 commits.
To Reproduce
Steps to reproduce the behavior:
Configure Hudi with the following settings for upserting data via Flink SQL:
Run a batch job to perform upserts.
Monitor logs for cleaning operations.
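The report omits the actual table options. A minimal sketch of a Flink SQL DDL with cleaning-related options might look like the following; the table name, schema, S3 path, and option values here are illustrative assumptions, not the reporter's actual configuration:

```sql
-- Hypothetical DDL; table name, path, and values are placeholders,
-- not the reporter's actual settings.
CREATE TABLE hudi_upsert_table (
  id BIGINT,
  name STRING,
  ts TIMESTAMP(3),
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'hudi',
  'path' = 's3a://bucket/path/hudi_upsert_table',
  'table.type' = 'MERGE_ON_READ',
  'write.operation' = 'upsert',
  'metadata.enabled' = 'true',
  -- cleaning options from Hudi's Flink configuration;
  -- the cleaner only removes commits beyond the retained count
  'clean.async.enabled' = 'true',
  'clean.retain_commits' = '3'
);
```

Note that with only 3 commits on the timeline and a retention count of 3 (or higher), the cleaner will find no earliest commit to retain and report nothing to clean, which matches the log lines below.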
Expected behavior
We expect the cleaner to trigger and remove older commits as per the defined configuration.
Environment Description
Hudi version: 1.14.1
Flink version: 1.17.1
Storage (HDFS/S3/GCS..): S3
Running on Docker? (yes/no): no (running on Kubernetes)
Additional context
After setting metadata.enabled to true, we observed a notable improvement in upsert speed. However, the cleaner does not seem to run as expected.
Are we missing any configs?
2024-09-15 14:48:38,399 WARN org.apache.hudi.config.HoodieWriteConfig [] - Increase hoodie.keep.min.commits=6 to be greater than hoodie.cleaner.commits.retained=20 (there is risk of incremental pull missing data from few instants based on the current configuration). The Hudi archiver will automatically adjust the configuration regardless.
2024-09-15 14:48:38,909 INFO org.apache.hudi.metadata.HoodieBackedTableMetadataWriter [] - Latest deltacommit time found is 20240915143952010, running clean operations.
2024-09-15 14:48:39,153 INFO org.apache.hudi.client.BaseHoodieWriteClient [] - Scheduling cleaning at instant time :20240915143952010002
2024-09-15 14:48:39,160 INFO org.apache.hudi.table.action.clean.CleanPlanner [] - No earliest commit to retain. No need to scan partitions !!
2024-09-15 14:48:39,160 INFO org.apache.hudi.table.action.clean.CleanPlanActionExecutor [] - Nothing to clean here. It is already clean
@dheemanthgowda Thanks for the feedback. It looks like your table has no partitioning fields, so each compaction triggers a whole-table rewrite, which is indeed costly for streaming ingestion. Did you try moving the compaction out into a separate job?
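Running compaction as a separate job can be done with Hudi's standalone Flink compactor. A hedged sketch of the submission command follows; the bundle jar name and the table path are placeholders, not values from this issue:

```shell
# Submit Hudi's standalone compactor as its own Flink job.
# The jar file name and S3 path below are placeholders.
flink run -c org.apache.hudi.sink.compact.HoodieFlinkCompactor \
  ./hudi-flink-bundle.jar \
  --path s3a://bucket/path/hudi_upsert_table
```

When compaction is moved out like this, async compaction in the streaming writer is typically disabled (e.g. 'compaction.async.enabled' = 'false' in the table options) so the writer only appends deltas and the separate job does the heavy rewrite.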