[Serve] Ray Serve TracingConfig Integration#61955
[Serve] Ray Serve TracingConfig Integration#61955suppagoddo wants to merge 4 commits intoray-project:masterfrom
Conversation
…guration Adds a TracingConfig class that replaces env-var-only tracing setup with a proper configuration object, consistent with HTTPOptions/gRPCOptions. TracingConfig is global (cluster-level) and static (set once at startup). It supports serve.start(), YAML ServeDeploySchema, and falls back to existing env vars for full backward compatibility. Resolves ray-project#61788 Signed-off-by: uditagrawal <[email protected]>
13 tests for TracingConfig validation: - Default values from module-level constants - Explicit enabled True/False - Dict parsing and roundtrip - Auto-fill default exporter when enabled=True with empty exporter - Custom exporter preserved when explicitly set - sampling_ratio bounds validation (0.0-1.0) - Extra fields forbidden - ServeDeploySchema integration (with, without, invalid) 3 tests for is_tracing_enabled mutable flag: - Default disabled state - Flag set behavior (with/without opentelemetry) - Flag unset returns False No mocks used — follows existing test patterns. Signed-off-by: uditagrawal <[email protected]>
4 integration tests added to test_tracing_utils.py: - test_tracing_config_enabled: TracingConfig(enabled=True) produces spans - test_tracing_config_disabled: TracingConfig(enabled=False) produces no spans - test_tracing_config_custom_sampling_ratio: 0% sampling produces no spans - test_tracing_config_from_dict: dict passed as tracing_config works Tests follow existing e2e patterns (serve.start, serve.run, make requests, verify span files). No mocks used. Signed-off-by: uditagrawal <[email protected]>
When apply_config() is called with a ServeDeploySchema that includes tracing_config, update the stored config on the controller, proxy state manager, application state manager, and deployment state manager. Existing actors are NOT reconfigured (tracing is static), but newly created proxies and replicas will use the updated config. Signed-off-by: uditagrawal <[email protected]>
There was a problem hiding this comment.
Code Review
This PR introduces TracingConfig to allow users to configure OpenTelemetry tracing in Ray Serve. The changes are well-structured, propagating the new configuration from the top-level API down to the individual components like replicas and proxies. The new TracingConfig schema is well-designed with sensible defaults and validation. The accompanying unit and end-to-end tests are comprehensive and cover various scenarios, ensuring the new feature is robust. I have one minor suggestion to improve the clarity of a test case.
|
|
||
| # Should have replica spans, proxy spans, and upstream spans | ||
| replica_filename = None | ||
| proxy_filename = None or RAY_SERVE_ENABLE_HA_PROXY |
There was a problem hiding this comment.
This line is a bit confusing. None or X evaluates to X, so if RAY_SERVE_ENABLE_HA_PROXY is True, proxy_filename is initialized to True. While the test might pass because this variable is overwritten if a proxy file is found, this pattern is hard to reason about and could be brittle.
To improve clarity, I suggest initializing proxy_filename to None and then adding an explicit assertion after the loop to check for the proxy span file's existence based on whether HAProxy is enabled. For example:
# After the loop
assert replica_filename, "No replica span file found"
assert upstream_filename, "No upstream span file found"
if not RAY_SERVE_ENABLE_HA_PROXY:
assert proxy_filename, "No proxy span file found"| proxy_filename = None or RAY_SERVE_ENABLE_HA_PROXY | |
| proxy_filename = None |
Adding a TracingConfig class for configuring OpenTelemetry tracing in Ray Serve, replacing the env only approach with a proper first-class configuration object.
Issue:
#61788