libhb: Improve buffer pool sizes #5232
Conversation
Nice. If it works as well as it is clean and concise, seems like a win. I'm looking forward to testing this.
Are you using the QSV decoder? If not, I wonder how it would reduce GPU usage.
Similar to the last patch, this caused a performance regression for me. 1080p Big Buck Bunny.
Before: [20:18:40] work: average encoding speed for job is 44.418655 fps
After: [20:22:03] work: average encoding speed for job is 23.167715 fps
It seems it can be faster in some cases, but significantly slower in others. Mode 4 FPS is all over the place: it randomly peaks at 2x faster, then hits lows of 2-3x slower, but overall ends up faster. Mode 6 is consistently 2x slower. There is definitely some kind of stalling going on here. You may have adjusted too far the other way.
I wonder if this has something to do with your respective caches; the cache is very large on the 12900K (14 + 30 MB), so CPUs with bigger caches may gain more from this if the pools are bigger?
#5257 might change the way the pool is used.
Thank you. I will start over with testing the increase in pool buffer sizes, with that pull request incorporated.
@stickz any updates?
I made a mistake; it's fixed now if you want to run the workflow again, @sr55.
Some really basic initial testing to begin with has before/after within a margin of error on my 5900X, so whatever issue was showing up before seems to have disappeared. Maybe a slight bias of + a few fps in some cases. I suspect the impact will be very CPU/memory dependent in cases where it was an issue.
So mode 6 is now at least as fast as it was (within margin of error)? Regardless, can we close #2848?
Any change with QSV decoding enabled on an Intel CPU? I can do more testing if required to confirm the benefit.
Some statistics on how much the fifo is used would be nice, if used at all. Enabling HB_BUFFER_DEBUG on fifo.c:37 will provide some info in the activity logs.
Not had a chance to test further, but it is on my todo list still. I'll also check QSV when I get a moment.
I increased the size of the first pool to `1 << 12` and increased the element size to `512`.
Enabling `HB_BUFFER_DEBUG` didn't print any usage logs for me.
@sr55 I would run the workflow again before testing. I had to restructure since it only cares about the first pool.
The log level needs to be set to extended (3 I think) to make it print the usage logs.
Alright, I got usage logs. I increased `1 << 13` to `1 << 14` from 32 elements to 64 elements.
The first buffer pool is going into the L3 CPU cache. I can't justify going higher than 512 elements, even though it's faster in some situations; it could cause cache misses. Not everyone has a 30 MB L3 CPU cache.
I don't see how that would influence the L3 cache; the only difference is that the buffers are re-allocated each time in main memory, which shouldn't influence the L3 cache.
Yes, I'm testing with QSV decode enabled. There is a big gain in doing so.
Okay, this makes sense. I increased the first pool from 512 to 1024 elements. Let's see if this is faster on @sr55's laptop.
Do you want to try this again with 1024 elements in the first pool? This should help on your laptop where the fps is lower.
1024 elements doesn't appear to have made any discernible difference.
Yeah your laptop has 45 watts base power. No wonder it's swinging. It can go up to 115 watts based on the "temperature". I think we should test this on a desktop, so we don't get these swings. The results are much more consistent. The 5900X results show a consistent performance gain in every situation. Can someone test this on a desktop, with an Intel CPU using both QSV decoding enabled and disabled? I'm seeing 8% performance gains with QSV decoding on SVT-AV1 Mode 4. The higher element size is beneficial for QSV decoding and it apparently makes no discernible difference otherwise.
The buffer pool is not used as much as before. Now we keep the buffers that libavcodec or libavfilter provides. Software decoding will use only the smallest pool to get some hb_buffer to wrap the avframes, unless you have some filters that use the buffer pool.
Okay, here, give this a test run then @sr55. It should produce the same results on your 5900X. Memory usage was only increased from 1.31 MB to 3.15 MB for the first few pools. Perhaps we can improve memory usage by less than 2 MB if we rewrite some code, but is it really worth it? We have a solution here that improves performance by at least 2-3%. I'm seeing higher... FYI, the reason why 2 pools have 64 elements is based on testing I performed on 4K content; they needed more elements.
Maybe something like this: https://stackoverflow.com/questions/28339980/ffmpeg-while-decoding-video-is-possible-to-generate-result-to-users-provided?rq=1 ?