Subsequent requests cannot be sent until 'num_concurrent_requests' requests have all finished #56
I found that the previous revision isn't sufficient to solve the problem. To send a request asynchronously right after the previous one finishes, many parts need fixing. Here is the link to the relevant code: ray/util/actor_pool.py, lines 311-326. I think there are two potential approaches for the change.
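For context, the linked section of ActorPool is typically driven by the pattern below. This is only a generic illustration of Ray's public ActorPool API, not this project's code; the Client actor and the prompts are invented for the example:

```python
import ray
from ray.util import ActorPool

@ray.remote
class Client:
    def send(self, prompt: str) -> int:
        return len(prompt)  # stand-in for a real LLM request

ray.init()
pool = ActorPool([Client.remote() for _ in range(4)])
for prompt in ["a", "bb", "ccc"]:
    # submit() hands the task to a free actor, or queues it if all
    # actors are busy.
    pool.submit(lambda actor, p: actor.send.remote(p), prompt)
while pool.has_next():
    # get_next_unordered() blocks until some in-flight task finishes.
    print(pool.get_next_unordered())
```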
Hey @llsj14, I'm facing the same issue. Without issuing concurrent requests at a set rate, it's no longer a proper load-testing framework. Do you have plans to fix this?
Hello @ashutoshsaboo, I've made some changes to the load-testing code so that it continuously sends requests without waiting for all 'num_concurrent_requests' to finish. Since modifying the core part related to Ray was challenging, I used multiple threads and request launchers, each holding a single client; a sketch of that approach follows below.
Code branch:
Commit:
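A minimal sketch of what that thread-per-client workaround could look like. The launcher objects and their launch() method are hypothetical stand-ins for illustration, not the actual branch code:

```python
import threading
from queue import Empty, Queue

def worker(launcher, prompts: Queue, results: list) -> None:
    # Each thread drives exactly one client, so it can send its next
    # request the moment its previous one returns.
    while True:
        try:
            prompt = prompts.get_nowait()
        except Empty:
            return
        results.append(launcher.launch(prompt))  # hypothetical launcher API

def run_benchmark(launchers, all_prompts, num_concurrent_requests: int) -> list:
    prompts: Queue = Queue()
    for p in all_prompts:
        prompts.put(p)
    results: list = []  # list.append is atomic under CPython's GIL
    threads = [
        threading.Thread(target=worker, args=(launchers[i], prompts, results))
        for i in range(num_concurrent_requests)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```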
@llsj14 Saw your branch code, I'll test it. Can you please rebase your branch onto the latest commit in this repo? It seems to be one commit behind the main branch of this project. I think the commit missing from your branch (llsj14/llmperf@main...ray-project:llmperf:main) is useful for the dynamic prompts it generates.
@ashutoshsaboo |
Hello,
I've encountered an issue where the request launcher does not allow the next requests to be sent until all requests specified by 'num_concurrent_requests' have finished. This behavior seems counterproductive for accurately benchmarking TTFT and throughput in continuous batching systems, as it can block subsequent requests even when the serving system is capable of handling them.
To address this, I believe the 'get_next_ready' function should be modified so that it returns results as soon as each individual request is completed; a sketch of such a change follows below. I am prepared to submit a pull request with this change and would appreciate your feedback.
Thank you.
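The proposed code did not survive extraction. The following is a minimal sketch, assuming the launcher holds a ray.util.ActorPool in a field named _llm_client_pool (the field name and surrounding class are assumptions), of a get_next_ready that drains only the results that have already finished:

```python
from typing import Any, List

def get_next_ready(self, block: bool = False) -> List[Any]:
    """Return results for every request that has already finished,
    instead of waiting for all in-flight requests to complete."""
    results = []
    # Drain everything that is ready right now without blocking.
    while self._llm_client_pool.has_next():
        try:
            # timeout=0 polls: get_next_unordered() raises TimeoutError
            # when no task has finished yet, so we stop draining.
            results.append(self._llm_client_pool.get_next_unordered(timeout=0))
        except TimeoutError:
            break
    if block and not results and self._llm_client_pool.has_next():
        # Block for exactly one completion, so the caller can refill
        # the pool with a new request immediately afterwards.
        results.append(self._llm_client_pool.get_next_unordered())
    return results
```

With this shape, the benchmarking loop can submit a replacement request as soon as any single request completes, rather than stalling until all 'num_concurrent_requests' in-flight requests have returned.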