
Blocking on pending requests despite block == false #43

Open
dacorvo opened this issue Apr 3, 2024 · 1 comment

Comments


dacorvo commented Apr 3, 2024

I am using the litellm client to benchmark a HuggingFace TGI server.

In token_benchmark_ray.py, req_launcher.get_next_ready() is called periodically to fetch pending results, with the block parameter set to False.

However, the call is actually blocking until all pending requests are complete, which can be very long if I set a high number of concurrent requests (typically 128).

The result is that instead of continuously injecting new requests as they complete, the benchmark script instead sends a batch of max_concurrent_requests, waits for them to complete, then sends another batch.

Is this the expected behaviour? I double-checked why the call blocks, and from the code in the request launcher this seems to be the intended behaviour, because it only checks whether there are still requests in the ray actor pool.
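For what it's worth, here is a minimal thread-based sketch of the pattern I think is causing this. `FakePool` and its methods are hypothetical stand-ins for the ray actor pool, not the actual llmperf code: even with `block=False`, the loop keeps calling a blocking `get_next_unordered()` while any request is pending, so the call only returns once the whole batch is done.

```python
import time
from concurrent.futures import ThreadPoolExecutor

class FakePool:
    """Hypothetical stand-in for ray.util.ActorPool (analogy only)."""

    def __init__(self, futures):
        self._futures = list(futures)

    def has_next(self):
        # True while *any* submitted request is still unread.
        return bool(self._futures)

    def get_next_unordered(self):
        # Blocks until some pending request completes -- this is what
        # makes the supposedly non-blocking path block in practice.
        while True:
            for f in self._futures:
                if f.done():
                    self._futures.remove(f)
                    return f.result()
            time.sleep(0.01)

def get_next_ready(pool, block=False):
    # Mirrors the launcher's logic as I read it: block is effectively
    # ignored, because the loop drains every pending request.
    results = []
    while pool.has_next():
        results.append(pool.get_next_unordered())
    return results

def work(i):
    time.sleep(0.05 * i)
    return i

with ThreadPoolExecutor(max_workers=4) as ex:
    pool = FakePool([ex.submit(work, i) for i in range(4)])
    results = get_next_ready(pool, block=False)  # returns only once all 4 finish

print(sorted(results))  # [0, 1, 2, 3]
```

With 128 concurrent requests, that drain-everything loop is why new requests are only injected batch by batch.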

llsj14 (Contributor) commented Jun 18, 2024

I ran into the same issue (#56).
In non-blocking mode, I think the get_next_ready function should return each result as soon as its request finishes.
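For illustration, here is a sketch of a non-blocking variant that returns whatever is ready right now and lets the caller keep injecting new requests. Again this uses `concurrent.futures` rather than ray, so the names and shape are an analogy under my assumptions, not the actual fix:

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

def work(i):
    time.sleep(0.05 * i)
    return i

def get_next_ready(pending, block=False):
    # With timeout=0, wait() returns immediately with only the futures
    # that are already done; with block=True it waits for all of them.
    done, still_pending = wait(pending, timeout=None if block else 0)
    return [f.result() for f in done], list(still_pending)

with ThreadPoolExecutor(max_workers=4) as ex:
    pending = [ex.submit(work, i) for i in range(4)]
    collected = []
    while pending:
        ready, pending = get_next_ready(pending, block=False)
        collected.extend(ready)  # new requests could be submitted here
        time.sleep(0.01)

print(sorted(collected))  # [0, 1, 2, 3]
```

If I read the ray docs correctly, the ActorPool equivalent would be calling get_next_unordered(timeout=0) and treating a TimeoutError as "nothing ready yet", instead of spinning on has_next().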
