libhb: Improve buffer pool sizes #5232
Conversation
Nice. If it works as well as it is clean and concise, seems like a win. I'm looking forward to testing this.
Are you using the QSV decoder? If not, I wonder how it would reduce GPU usage.
Similar to the last patch, this caused a performance regression for me. 1080p Big Buck Bunny.
Before: [20:18:40] work: average encoding speed for job is 44.418655 fps
After: [20:22:03] work: average encoding speed for job is 23.167715 fps
It seems it can be faster in some cases, but significantly slower in others. Mode 4 FPS is all over the place: it randomly peaks at 2x faster, then hits lows of 2-3x slower, but overall ends up faster. Mode 6 is consistently 2x slower. There is definitely some kind of stalling going on here. You may have adjusted too far the other way.
I wonder if this has something to do with your respective caches; the cache is very large on the 12900K (14 + 30 MB), so CPUs with bigger caches may gain more from this if the pools are bigger?
#5257 might change the way the pool is used.
Thank you. I will start over with testing the increase in pool buffer sizes, with that pull request incorporated.
@stickz any updates?
I made a mistake; it's fixed now if you want to run the workflow again, @sr55.
Some really basic initial testing to begin with has before/after within a margin of error on my 5900X, so whatever issue was showing up before seems to have disappeared. Maybe a slight bias of + a few fps in some cases. I suspect the impact will be very CPU/memory dependent in cases where it was an issue.
So mode 6 is now at least as fast as it was (within margin of error)? Regardless, can we close #2848?
Any change with QSV decoding enabled on an Intel CPU? I can do more testing if required to confirm the benefit.
Some statistics on how much the fifo is used would be nice, if used at all. Enabling HB_BUFFER_DEBUG on fifo.c:37 will provide some info in the activity logs.
Not had a chance to test further, but it is on my todo list still. I'll also check QSV when I get a moment.
I increased the size of the first pool to `1 << 12` and increased the element size to `512`.
Enabling `HB_BUFFER_DEBUG` didn't print any usage logs for me.
@sr55 I would run the workflow again before testing. I had to restructure since it only cares about the first pool.
The log level needs to be set to extended (3 I think) to make it print the usage logs.
Alright, I got usage logs. I increased `1 << 13` to `1 << 14` from 32 elements to 64 elements.
The first buffer pool is going into the L3 CPU cache. I can't justify going higher than 512 elements, even though it's faster in some situations; it could cause cache misses. Not everyone has a 30 MB L3 CPU cache.
I don't see how that would influence the L3 cache; the only difference is that the buffers are re-allocated each time in main memory, which shouldn't influence the L3 cache.
Yes, I'm testing with QSV decode enabled. There is a big gain in doing so.
Okay, this makes sense. I increased the first pool from 512 to 1024 elements. Let's see if this is faster on @sr55's laptop.
Do you want to try this again with 1024 elements in the first pool? This should help on your laptop where the fps is lower.
1024 elements doesn't appear to have made any discernible difference.
Yeah your laptop has 45 watts base power. No wonder it's swinging. It can go up to 115 watts based on the "temperature". I think we should test this on a desktop, so we don't get these swings. The results are much more consistent. The 5900X results show a consistent performance gain in every situation. Can someone test this on a desktop, with an Intel CPU using both QSV decoding enabled and disabled? I'm seeing 8% performance gains with QSV decoding on SVT-AV1 Mode 4. The higher element size is beneficial for QSV decoding and it apparently makes no discernible difference otherwise.
The buffer pool is not used as much as before. Now we keep the buffers that libavcodec or libavfilter provides. Software decoding will use only the smallest pool to get some hb_buffer to wrap the avframes, unless you have some filters that use the buffer pool.
Okay, here, give this a test run then @sr55. It should produce the same results on your 5900X. Memory usage was only increased from 1.31 MB to 3.15 MB for the first few pools. Perhaps we can improve memory usage by less than 2 MB if we rewrite some code, but is it really worth it? We have a solution here that improves performance by at least 2-3%. I'm seeing higher... FYI, the reason why 2 pools have 64 elements is based on testing I performed on 4K content; they needed more elements.
Maybe something like this: https://stackoverflow.com/questions/28339980/ffmpeg-while-decoding-video-is-possible-to-generate-result-to-users-provided?rq=1 ?