KAFKA-19967: Reduce GC pressure in tiered storage read path using direct memory buffers #21102
base: trunk
Conversation
Thanks for the PR. I can see new metrics have been introduced in this PR, which fall under the KIP process.
showuon left a comment:
@nandini12396, thanks for the PR! Some high-level comments:
- It looks to me that introducing a pool of indirect (heap) buffers could also fix the GC pressure issue, right? Is there any other reason we want to use direct buffers?
- New configs/metrics need to go through a KIP.
Hi @showuon, thanks for your review!
In tiered storage, maxBytes can reach 55 MB+ depending on the fetch configuration. Direct buffers move the data off-heap entirely into native memory, where the GC doesn't see it. We also get zero-copy I/O since the data is already in native memory for socket writes. To share some test results:
I will remove the metrics changes from this PR so we can focus on the buffer pool implementation.
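To illustrate the heap vs. direct distinction discussed above, here is a minimal standalone sketch (not code from the PR; the 55 MB size is only illustrative):

```java
import java.nio.ByteBuffer;

public class BufferAllocationExample {
    public static void main(String[] args) {
        int fetchSize = 55 * 1024 * 1024; // ~55 MB, illustrative tiered storage fetch size

        // Heap buffer: backed by a byte[] on the JVM heap. With G1, any object larger than
        // half a region (regions are 1-32 MB) is a "humongous" allocation placed directly
        // in the old generation, which is where the GC pressure comes from.
        ByteBuffer heap = ByteBuffer.allocate(fetchSize);

        // Direct buffer: native memory outside the heap, invisible to generational GC
        // accounting and usable for zero-copy transfers to the socket.
        ByteBuffer direct = ByteBuffer.allocateDirect(fetchSize);

        System.out.println("heap.isDirect()   = " + heap.isDirect());   // false
        System.out.println("direct.isDirect() = " + direct.isDirect()); // true
    }
}
```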
Force-pushed from 9a49d38 to d2592f2
I've updated the PR without the metrics changes to focus on fixing the issue. Could you please take a look and review? Thank you!
kamalcph left a comment:
The default consumer configs are max.partition.fetch.bytes = 1 MB and fetch.max.bytes = 50 MB. If the size of the messages in the topic is ~1 MB, then a 1 MB heap buffer might be fine for those cases.
We may have to provide another config to choose either a direct or a heap buffer when enabling the buffer pool, since using direct buffers might cause page faults and potentially impact produce latencies.
Thanks for the patch!
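As a rough illustration of the suggestion above, an allocation helper driven by such a config might look like the following sketch (hypothetical names and flags, not part of the PR):

```java
import java.nio.ByteBuffer;

public class FetchBufferAllocation {
    // Hypothetical helper: pick the buffer type from configuration when the buffer
    // pool is enabled. Parameter names and behavior are illustrative only.
    static ByteBuffer allocateFetchBuffer(int sizeBytes, boolean poolEnabled, boolean useDirectBuffers) {
        if (poolEnabled && useDirectBuffers) {
            try {
                // Off-heap allocation; may cause page faults on first touch.
                return ByteBuffer.allocateDirect(sizeBytes);
            } catch (OutOfMemoryError e) {
                // Direct memory exhausted: fall back to the heap, as the PR description notes.
                return ByteBuffer.allocate(sizeBytes);
            }
        }
        // Heap buffers remain the default; ~1 MB fetches put far less pressure on the
        // old generation than 50 MB+ tiered storage reads.
        return ByteBuffer.allocate(sizeBytes);
    }
}
```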
        MEDIUM,
-       REMOTE_LIST_OFFSETS_REQUEST_TIMEOUT_MS_DOC);
+       REMOTE_LIST_OFFSETS_REQUEST_TIMEOUT_MS_DOC)
+       .define(REMOTE_LOG_DIRECT_BUFFER_POOL_ENABLED_PROP,
could you mark the config as internal until the KIP gets approved?
.define -> .defineInternal
updated
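For reference, a minimal sketch of registering the property via defineInternal (the constant name comes from the diff above; the property string, default value, and importance are assumptions):

```java
import org.apache.kafka.common.config.ConfigDef;

public class RemoteLogConfigSketch {
    // Constant name from the diff above; the property string and default are assumptions.
    public static final String REMOTE_LOG_DIRECT_BUFFER_POOL_ENABLED_PROP =
            "remote.log.direct.buffer.pool.enable";

    public static ConfigDef configDef() {
        return new ConfigDef()
                // defineInternal keeps the config out of the generated public documentation
                // until the KIP is approved, after which it can be switched to define(...).
                .defineInternal(REMOTE_LOG_DIRECT_BUFFER_POOL_ENABLED_PROP,
                        ConfigDef.Type.BOOLEAN,
                        false,
                        ConfigDef.Importance.LOW);
    }
}
```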
Force-pushed from d2592f2 to 6da88aa
KAFKA-19967: Reduce GC pressure in tiered storage read path using direct memory buffers

This change addresses high GC pressure by allocating tiered storage fetch buffers in direct (off-heap) memory instead of the JVM heap. When direct memory is exhausted, the system gracefully falls back to heap allocation with a warning.

Problem:
During tiered storage reads, heap-allocated buffers bypass the young generation and go directly to the old generation (humongous allocations). Under high read load, these accumulate rapidly and trigger frequent, expensive G1 Old Generation collections, causing significant GC pause times.

Solution:
- Introduced DirectBufferPool that pools direct buffers using WeakReferences, allowing GC to reclaim buffers under memory pressure
- Modified RemoteLogInputStream to use pooled direct buffers instead of per-request heap allocation
- Graceful fallback to heap allocation when direct memory is exhausted
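As a rough sketch of the pooling idea described in the commit message (hypothetical class and method names, not the PR's actual DirectBufferPool implementation):

```java
import java.lang.ref.WeakReference;
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Minimal sketch of a pool of direct buffers held via WeakReferences, so the GC can
 * reclaim idle buffers under memory pressure. Illustrative only.
 */
public class DirectBufferPoolSketch {
    private final int bufferSize;
    private final Deque<WeakReference<ByteBuffer>> free = new ArrayDeque<>();

    public DirectBufferPoolSketch(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    public synchronized ByteBuffer acquire() {
        WeakReference<ByteBuffer> ref;
        while ((ref = free.poll()) != null) {
            ByteBuffer buffer = ref.get();
            if (buffer != null) {        // skip buffers already reclaimed by the GC
                buffer.clear();
                return buffer;
            }
        }
        try {
            return ByteBuffer.allocateDirect(bufferSize);
        } catch (OutOfMemoryError e) {
            // Direct memory exhausted: fall back to a heap buffer, as the commit describes.
            return ByteBuffer.allocate(bufferSize);
        }
    }

    public synchronized void release(ByteBuffer buffer) {
        if (buffer.isDirect() && buffer.capacity() == bufferSize) {
            free.offer(new WeakReference<>(buffer));  // heap fallbacks are not pooled
        }
    }
}
```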
Force-pushed from 6da88aa to 343a450