
fix: subsequent requests cannot be sent until 'num_concurrent_requests' requests have all finished in non-block mode #59

Merged 2 commits into ray-project:main on Dec 9, 2024

Conversation

llsj14 (Contributor) commented on Jul 1, 2024

Issues: #43, #56

Summary

  • In non-blocking mode, subsequent requests cannot be sent until all in-flight requests have finished.
  • Fixing the request launcher directly was challenging because of its dependency on Ray, so I instead used multiple threads and request launchers, each holding one client and handling a single request at a time (see the sketch below).

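A minimal sketch of that thread-per-launcher idea (illustrative only; `make_client`, `client.send_request`, and `run_nonblocking` are assumed names, not the actual llmperf API):

```python
import queue
import threading

def _launch_one(client, prompts, results, lock):
    # Each thread owns a single client and sends one request at a time,
    # so a slow response only blocks this thread, not the whole batch.
    while True:
        try:
            prompt = prompts.get_nowait()
        except queue.Empty:
            return
        metrics = client.send_request(prompt)  # hypothetical client call
        with lock:
            results.append(metrics)

def run_nonblocking(make_client, all_prompts, num_concurrent_requests):
    prompts = queue.Queue()
    for p in all_prompts:
        prompts.put(p)
    results, lock = [], threading.Lock()
    threads = [
        threading.Thread(
            target=_launch_one,
            args=(make_client(), prompts, results, lock),
        )
        for _ in range(num_concurrent_requests)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because each thread blocks only on its own request, a new request is dispatched as soon as any in-flight one completes, rather than waiting for the whole batch of `num_concurrent_requests` to drain.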
fix: subsequent requests cannot be sent until 'num_concurrent_requests' requests have all finished in non-blocking mode

Signed-off-by: Sungjae Lee <[email protected]>
llsj14 force-pushed the fix/get-next-ready branch from d75db69 to 570f780 on July 3, 2024
jouDance commented on Nov 10, 2024

Hey, was there any progress on this issue? It seems detrimental to the correct operation of llmperf, yet it isn't getting any attention.
@gracehonv

llsj14 (Contributor, Author) commented on Nov 12, 2024

I think it might be better to move this into a separate file, as an overlay, rather than modifying token_benchmark_ray.py directly.

In any case, I have been using llmperf with this commit applied and haven't hit any errors.

gracehonv (Contributor) commented

I made a commit 5 months ago that moves prompt construction outside the send loop so the benchmark isn't slowed down. I didn't realize it affected the num_concurrent_requests mode. Let me know if I can help in some way.
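
A minimal sketch of that idea (the helpers here are stand-ins, not the actual llmperf code):

```python
import random

def build_prompt(mean_tokens, stddev_tokens):
    # Stand-in for llmperf's prompt sampler (illustrative only).
    n = max(1, int(random.gauss(mean_tokens, stddev_tokens)))
    return "hello " * n

def send_request(prompt):
    # Stand-in for the actual client call; only its timing matters here.
    pass

# Pay the prompt-construction cost up front...
prompts = [build_prompt(550, 150) for _ in range(100)]

# ...so the timed loop does nothing but dispatch requests.
for prompt in prompts:
    send_request(prompt)
```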

llsj14 (Contributor, Author) commented on Nov 13, 2024

@gracehonv
I rebased my code onto your commit, and it didn't slow down.

gracehonv (Contributor) commented

It looks like your commit prepares all of the prompts before launching the concurrent requests, so there shouldn't be any slowdown. Thanks for fixing!

cpwan commented on Dec 7, 2024

[Figure: number of concurrent requests over time (y-axis: concurrent requests). Left: the current token_benchmark_ray; right: with this fix.]

Thanks for the fix!

llsj14 (Contributor, Author) commented on Dec 8, 2024

@cpwan,
Thank you for your analysis and visualization related to this PR.

llsj14 (Contributor, Author) commented on Dec 8, 2024

@avnishn @rickyyx
Would it be possible to get this PR reviewed, please? Thank you.

rickyyx (Collaborator) left a comment

Thank you!

rickyyx merged commit f1d6bed into ray-project:main on Dec 9, 2024