Divide by zero: request_metrics[common_metrics.REQ_OUTPUT_THROUGHPUT] = num_output_tokens / request_metrics[common_metrics.E2E_LAT] #55

Open
yaronr opened this issue Jun 13, 2024 · 7 comments

Comments


yaronr commented Jun 13, 2024

Running the benchmark script against llama-3-8b-inst on Inferentia 2 (djl-serving) results in:

python3.10 token_benchmark_ray.py \
--model "openai/llama3-8b-inst" \
--mean-input-tokens 550 \
--stddev-input-tokens 150 \
--mean-output-tokens 150 \
--stddev-output-tokens 10 \
--max-num-completed-requests 1 \
--timeout 600 \
--num-concurrent-requests 1 \
--results-dir "result_outputs" \
--llm-api "openai" \
--additional-sampling-params '{}'
Traceback (most recent call last):
  File "/Users/yaron/projects/llmperf/token_benchmark_ray.py", line 456, in <module>
    run_token_benchmark(
  File "/Users/yaron/projects/llmperf/token_benchmark_ray.py", line 297, in run_token_benchmark
    summary, individual_responses = get_token_throughput_latencies(
  File "/Users/yaron/projects/llmperf/token_benchmark_ray.py", line 116, in get_token_throughput_latencies
    request_metrics[common_metrics.REQ_OUTPUT_THROUGHPUT] = num_output_tokens / request_metrics[common_metrics.E2E_LAT]
ZeroDivisionError: division by zero
e3oroush pushed a commit to e3oroush/llmperf that referenced this issue Aug 5, 2024
Makes sure the `REQ_OUTPUT_THROUGHPUT` computation won't divide by zero in case of a server failure.
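
For anyone hitting this before a fix lands: the division fails because a request that errors out (see the 422 responses below) leaves `E2E_LAT` at zero. A minimal guard around the failing line could look like the sketch below; it reuses the names from the traceback, and reporting zero throughput on failure is just one possible choice, not necessarily what the referenced commit does.

# Sketch of a guard for the failing line in get_token_throughput_latencies.
# Assumes num_output_tokens, request_metrics and common_metrics as in the traceback;
# the fallback value of 0 is an illustrative choice, not the referenced commit verbatim.
e2e_latency = request_metrics[common_metrics.E2E_LAT]
if e2e_latency > 0:
    request_metrics[common_metrics.REQ_OUTPUT_THROUGHPUT] = num_output_tokens / e2e_latency
else:
    # The request failed before producing output, so report zero throughput
    # instead of crashing the whole benchmark run.
    request_metrics[common_metrics.REQ_OUTPUT_THROUGHPUT] = 0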

sadrafh commented Aug 5, 2024

Same problem here, what did you do to fix this?


ericg108 commented Aug 6, 2024

Same problem here when testing an API deployed in-house.

2 similar comments
@changqingla

Same problem here when testing an API deployed in-house.


hwzhuhao commented Sep 6, 2024

Same problem here when testing an API deployed in-house.

@Eviltuzki

Same problem:
(OpenAIChatCompletionsClient pid=164504) 422
(OpenAIChatCompletionsClient pid=164507) 422
(OpenAIChatCompletionsClient pid=164510) 422
(OpenAIChatCompletionsClient pid=164506) 422
(OpenAIChatCompletionsClient pid=164500) 422
(OpenAIChatCompletionsClient pid=164502) 422
(OpenAIChatCompletionsClient pid=164498) 422
Traceback (most recent call last):
  File "/home/llmperf/token_benchmark_ray.py", line 462, in <module>
    run_token_benchmark(
  File "/home/llmperf/token_benchmark_ray.py", line 303, in run_token_benchmark
    summary, individual_responses = get_token_throughput_latencies(
  File "/home/llmperf/token_benchmark_ray.py", line 122, in get_token_throughput_latencies
    request_metrics[common_metrics.REQ_OUTPUT_THROUGHPUT] = num_output_tokens / request_metrics[common_metrics.E2E_LAT]
ZeroDivisionError: division by zero
(OpenAIChatCompletionsClient pid=164508) 422
(OpenAIChatCompletionsClient pid=164508) Warning Or Error: 422 Client Error: Unprocessable Content for url: http://x.x.x.x:1025/v1/chat/completions

@kylin-zhou

Same problem here, how do I fix this?

@Eviltuzki

> Same problem here, how do I fix this?

Try printing the response status code and body.
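
For example, a rough sketch assuming the endpoint is called with the requests library; the URL and payload below are placeholders taken from the log above, not what llmperf sends verbatim:

import requests

# Placeholder URL and payload; substitute what your deployment actually receives.
resp = requests.post(
    "http://x.x.x.x:1025/v1/chat/completions",
    json={
        "model": "openai/llama3-8b-inst",
        "messages": [{"role": "user", "content": "hello"}],
    },
)
print(resp.status_code)  # 422 means the server rejected the request body
print(resp.text)         # the response body usually says which field is invalid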
