[Bug]: pagination can break with certain page_sizes #746

Open
2 tasks done
HarryFreeMyLand opened this issue Jun 10, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@HarryFreeMyLand

HarryFreeMyLand commented Jun 10, 2024

I've read the documentation

Operating System

Linux (TrueNAS k3s)

Your Bug Report

Describe the bug

There appears to be a bug in pagination: depending on how page_size is set (called "archive view" in /settings/user), the request sent to ES for the last page can exceed the ES result window limit.

Below is the resulting error, on a channel with 9981 videos and a view size of 150. Pagination reaches a final page 67, which returns an error 500 and the following output in the logs.
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Result window is too large, from + size must be less than or equal to: [10000] but was [10050]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"ta_video","node":"6V50WepGQHS9j3s32-DcFQ","reason":{"type":"illegal_argument_exception","reason":"Result window is too large, from + size must be less than or equal to: [10000] but was [10050]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."}}],"caused_by":{"type":"illegal_argument_exception","reason":"Result window is too large, from + size must be less than or equal to: [10000] but was [10050]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.","caused_by":{"type":"illegal_argument_exception","reason":"Result window is too large, from + size must be less than or equal to: [10000] but was [10050]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."}}},"status":400}

lamusmaser found the specific details in a Discord thread; his findings are below:

OK, I think I figured it out.
Page 66 has the correct configurations that are expected to come through from the Pagination class: https://github.com/tubearchivist/tubearchivist/blob/master/tubearchivist/home/src/index/generic.py#L79-L145
Page 67 doesn't, because page_size = 150 and page_from = 9900 (9750 + 150), so from + size = 10050 and the request goes beyond ElasticSearch's maximum result window, as noted at https://github.com/tubearchivist/tubearchivist/blob/master/tubearchivist/home/src/index/generic.py#L132. This would normally be handled by the max_pages calculation here: https://github.com/tubearchivist/tubearchivist/blob/master/tubearchivist/home/src/index/generic.py#L130, but 10000 does not divide evenly by 150, so any page_size that lets the last page run past 10000 causes this issue. The reason 24 works is that 9981/24 = 415.875 and 415 * 24 = 9960, so the last page covers 9960 through 9984, which doesn't exceed the ES result window limit. 200 also works because 10000 divides evenly by it.
If you'd open up a GitHub Issue, I can link this work and put in a recommended fix for handling the page_size when it needs to be adjusted; if we decrease the page_from, then the last page will have duplicates from the previous page.
[11:13 AM]
If there were four more videos in the results, 24 also wouldn't work for the last page, since that page would start at 9984 and run through 10008, which would cause the error to come back from ES.
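The arithmetic in the findings above can be checked with a short sketch. It assumes 1-indexed pages with page_from = (page - 1) * page_size, which matches the numbers quoted for pages 66 and 67 but is not taken from the actual Pagination class; `last_page_window` and `exceeds_window` are hypothetical helper names used only for illustration.

```python
# Default ES limit, as reported in the error message: from + size <= 10000.
ES_MAX_RESULT_WINDOW = 10_000


def last_page_window(total_hits, page_size):
    """Return (last_page, page_from, page_from + page_size) for the final page.

    Assumption: pages are 1-indexed and page_from = (page - 1) * page_size.
    """
    # Ceiling division: number of pages needed to show every hit.
    last_page = -(-total_hits // page_size)
    page_from = (last_page - 1) * page_size
    return last_page, page_from, page_from + page_size


def exceeds_window(total_hits, page_size, limit=ES_MAX_RESULT_WINDOW):
    """True if the final page would request a window beyond the ES limit."""
    _, _, window_end = last_page_window(total_hits, page_size)
    return window_end > limit


if __name__ == "__main__":
    for size in (24, 150, 200):
        page, start, end = last_page_window(9981, size)
        print(size, page, start, end, exceeds_window(9981, size))
```

With 9981 videos, only the page_size of 150 produces a last page whose from + size (10050) passes the limit; 24 ends at 9984 and 200 ends exactly at 10000.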

Relevant log output

{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Result window is too large, from + size must be less than or equal to: [10000] but was [10050]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"ta_video","node":"6V50WepGQHS9j3s32-DcFQ","reason":{"type":"illegal_argument_exception","reason":"Result window is too large, from + size must be less than or equal to: [10000] but was [10050]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."}}],"caused_by":{"type":"illegal_argument_exception","reason":"Result window is too large, from + size must be less than or equal to: [10000] but was [10050]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.","caused_by":{"type":"illegal_argument_exception","reason":"Result window is too large, from + size must be less than or equal to: [10000] but was [10050]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."}}},"status":400}

Anything else?

No response

@lamusmaser
Collaborator

Just confirming that the above statements are full and correct.

I will take a look into the Pagination class to determine what should be done to fix this - obviously, if we modify pagination requests to ES, this can be handled differently, but fixing this overall issue should be relatively easy. It'll get a PR when I'm ready to submit a fix, unless someone else gets to it before me.
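As a rough illustration of the kind of adjustment discussed above (shrinking page_size on the last page rather than moving page_from, so no results are duplicated), a minimal sketch might look like the following. This is not the actual fix and does not reflect the real Pagination class; `clamp_to_window` is a hypothetical helper and the 10000 limit is the default quoted in the error.

```python
def clamp_to_window(page_from, page_size, limit=10_000):
    """Return a (from, size) pair whose sum never exceeds `limit`.

    page_from is left untouched whenever it is inside the window, so the
    last page does not repeat results from the previous page; only the
    requested size shrinks.
    """
    # A start offset past the window cannot return anything from ES.
    page_from = min(page_from, limit)
    # Request at most the room that is left inside the window.
    size = min(page_size, limit - page_from)
    return page_from, size


if __name__ == "__main__":
    # Page 67 with page_size 150: from stays 9900, size shrinks to 100.
    print(clamp_to_window(9900, 150))
```

For the reported case, the request becomes from = 9900 and size = 100, which still covers the remaining 81 videos (9900 through 9980) while keeping from + size at 10000.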

@lamusmaser lamusmaser added the bug Something isn't working label Jun 11, 2024
@gautamkrishnar

Any update on this issue?

@bbilly1
Member

bbilly1 commented Jul 20, 2024

> Any update on this issue?

You are at the right place to see any updates.

@gautamkrishnar

gautamkrishnar commented Jul 20, 2024

@bbilly1 I was about to debug this and try to open a PR to fix it. Unfortunately, after I switched my deployment to an older version, the problem went away, and I was not able to reproduce it anymore when I moved back to a newer release. Hopefully, someone else will fix it soon.
