[DND-172] [BE] perf: optimize opik-python-backend Dockerfile by GuySaar8 · Pull Request #7012 · comet-ml/opik

GuySaar8 · 2026-06-10T02:20:28Z

Details

Applies the same Dockerfile optimizations as #7010 (DND-171, sandbox-executor) to the opik-python-backend image — faster builds, smaller bytecode-only dependency layer, and safer gating — scoped to this server's runtime constraints.

Image size: bytecode-compile the venv (compileall -o 2 -b, PYTHONNODEBUGRANGES=1), delete *.py/*.pyi/__pycache__, and strip pip/setuptools/wheel/ensurepip. *.dist-info is intentionally kept — opik/litellm/tiktoken read their own version via importlib.metadata at import time.
Runtime: consolidated env into one layer — LITELLM_MODE=PRODUCTION (avoids litellm's load_dotenv() frame inspection, which breaks under a .pyc-only layout), PYTHONNODEBUGRANGES=1, PYTHONDONTWRITEBYTECODE=1. Added # syntax=docker/dockerfile:1 and stage banners.
Toolchain: unchanged build deps (rust/cargo for native wheels) stay in the builder stage only.
Flaky test fix (separate commit 027799d273): test_time_shift_distances::test_demo_spans_all_shifted_to_latest_trace_date asserted shifted_span_start.date() == shifted_latest_trace_end.date(). Since the shift moves the latest trace to now(), an earlier-in-day span lands a fixed delta before now — i.e. the previous calendar day whenever the run is just after UTC midnight — failing the .date() equality despite the spans being <4h apart. Now asserts the preserved gap (shifted_latest_trace_end - shifted_span_start ≤ 1 day) instead of comparing calendar dates, making it immune to the rollover. Unrelated to the Dockerfile but folded in here because it was blocking this PR's CI.

Divergence from DND-171

This image is a long-running Flask/gunicorn server (Alpine + Docker CLI), not a one-shot scorer, so two parts of the sandbox pattern do not carry over:

Strip scope = venv site-packages only. src/ keeps its .py: optimizer_runner.py is executed as a subprocess script and config.py is read via Flask from_pyfile — deleting app source would break optimizer jobs.
Runs as root, no selftest.sh gate. The server needs the Docker socket to spawn sandbox-executor containers, so it can't drop to the unprivileged USER 1001 the sandbox uses; the scoring selftest.sh build-gate is sandbox-specific and N/A here.

Change checklist

User facing
Documentation update

Issues

Resolves DND-172

AI-WATERMARK

AI-WATERMARK: yes

Tools: Claude Code
Model(s): Claude Opus 4.8 (1M context)
Scope: Dockerfile refactor (bytecode strip of venv, env consolidation, syntax header), flaky-test fix, and PR authoring.
Human verification: Author built and ran the image locally (see Testing) before requesting review.

Testing

Built and verified locally with docker buildx (colima, arm64):

docker buildx build --load apps/opik-python-backend — builds green.
Dep imports from the .pyc-only stripped venv succeed: flask, gunicorn, docker, litellm, opik, tiktoken, redis, rq, pydantic, opentelemetry + opik.evaluation.metrics.BaseMetric → DEPS_OK. This is the main risk in the bytecode-only layout (litellm under .pyc-only), confirmed working.
Layout assertions in the built image: venv .py=0, .pyc=10453, .pyi=0, __pycache__=0, .dist-info=129 kept; pip/setuptools stripped; src/*.py=35 kept; optimizer_runner.py present.
test_time_shift_distances (both tests) run locally inside the same post-midnight UTC window that failed CI earlier → 2 passed; the full backend-tests run on the fix commit is green.
Not run locally: full gunicorn boot + end-to-end request (blocks on Redis/RQ + Docker daemon wiring) — covered in CI / a real deploy.
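The layout assertions listed above could be automated with a small checker along these lines. This is only a sketch: the helper names `layout_counts` and `layout_problems` are hypothetical, the demo tree is a throwaway stand-in for the image's site-packages, and the PR does not say how the author actually produced the counts.

```python
import tempfile
from collections import Counter
from pathlib import Path


def layout_counts(root):
    """Count source, bytecode, stub, cache and metadata entries under root."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_dir():
            if path.name == "__pycache__":
                counts["__pycache__"] += 1
            elif path.name.endswith(".dist-info"):
                counts[".dist-info"] += 1
        elif path.suffix in {".py", ".pyc", ".pyi"}:
            counts[path.suffix] += 1
    return counts


def layout_problems(root):
    """Return human-readable violations of the bytecode-only layout."""
    counts = layout_counts(root)
    problems = [
        f"{counts[key]} unexpected {key} entries"
        for key in (".py", ".pyi", "__pycache__")
        if counts[key]
    ]
    # Bytecode and package metadata must survive the strip.
    for key in (".pyc", ".dist-info"):
        if not counts[key]:
            problems.append(f"no {key} entries found")
    return problems


# Demo on a throwaway tree (hypothetical package name "pkg").
with tempfile.TemporaryDirectory() as tmp:
    site = Path(tmp)
    (site / "pkg").mkdir()
    (site / "pkg" / "__init__.pyc").write_bytes(b"")
    (site / "pkg-1.0.dist-info").mkdir()
    (site / "pkg-1.0.dist-info" / "METADATA").write_text("Name: pkg\n")
    clean_report = layout_problems(site)

    # A leftover source file should be reported.
    (site / "pkg" / "leftover.py").write_text("x = 1\n")
    dirty_report = layout_problems(site)
```

Running a check like this as a build step would fail the build on a regression, for example if a dependency's sources stopped being stripped.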

Documentation

No documentation changes — internal build optimization only.

Apply the DND-171 sandbox-executor optimizations to the python-backend image, scoped to this server's constraints:

- Bytecode-compile the venv deps (compileall -o 2 -b, PYTHONNODEBUGRANGES=1), delete .py/.pyi/__pycache__, keep *.dist-info for importlib.metadata.
- Strip pip/setuptools/wheel/ensurepip from the venv.
- Consolidate runtime ENV (LITELLM_MODE=PRODUCTION, PYTHONNODEBUGRANGES=1, PYTHONDONTWRITEBYTECODE=1); add syntax header + stage banners.

Diverged from DND-171 where the runtime differs: strip scope is the venv site-packages only; src/ keeps its .py because optimizer_runner.py is exec'd as a subprocess script and config.py is read via Flask from_pyfile.

Kept root user + Alpine + tini + dockerd entrypoint since the server needs the Docker socket to spawn sandbox-executor containers.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
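To illustrate the bytecode-only layout this commit describes, here is a minimal Python sketch of the same idea using the standard-library `compileall` API, where `optimize=2` and `legacy=True` correspond to the `-o 2` and `-b` flags. It is not the PR's Dockerfile step: the helper `compile_and_strip` and the module name `pyc_only_demo` are hypothetical, and the demo runs on a temporary directory rather than a real venv.

```python
import compileall
import importlib
import shutil
import sys
import tempfile
from pathlib import Path


def compile_and_strip(root):
    """Compile every module under root to a sibling .pyc, then drop sources."""
    root = Path(root)
    # legacy=True writes foo.pyc next to foo.py (like -b); optimize=2 is -o 2.
    if not compileall.compile_dir(str(root), optimize=2, legacy=True, quiet=1):
        raise RuntimeError("byte-compilation failed; sources left in place")
    removed = 0
    for pattern in ("*.py", "*.pyi"):
        for src in list(root.rglob(pattern)):
            src.unlink()
            removed += 1
    for cache in list(root.rglob("__pycache__")):
        shutil.rmtree(cache, ignore_errors=True)
    return removed


# Demo: a one-module "dependency" that must still import once its source is gone.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "pyc_only_demo.py").write_text("VALUE = 41 + 1\n")
    removed_sources = compile_and_strip(tmp)
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        module = importlib.import_module("pyc_only_demo")
        loaded_value = module.VALUE
        loaded_suffix = Path(module.__file__).suffix
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("pyc_only_demo", None)
```

Compiling before deleting, and aborting if compilation reports a failure, avoids ending up with a module that has neither source nor bytecode.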

github-actions · 2026-06-10T02:25:06Z

Python Backend Tests Results

196 tests: 193 passed ✅, 3 skipped 💤, 0 failed ❌
1 suite, 1 file, duration 3m 0s ⏱️

Results for commit 027799d.

♻️ This comment has been updated with latest results.

…istances test_demo_spans_all_shifted_to_latest_trace_date asserted shifted_span_start.date() == shifted_latest_trace_end.date(). The shift moves the latest trace end to now(), so a span originally earlier in the day shifts to a fixed delta before now — which lands on the previous calendar day whenever the run happens just after UTC midnight, failing the .date() equality despite the spans being <4h apart. Assert the preserved gap (shifted_latest_trace_end - shifted_span_start) stays within one day instead of comparing calendar dates. Real demo data max gap is ~3h46m, so the 1-day bound holds with margin and is immune to the rollover. Verified passing locally inside the same post-midnight UTC window that failed CI attempts 1 and 2. Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
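The rollover described in this commit message can be reproduced with plain `datetime` arithmetic. The sketch below is not the project's test or shift code: `shift_to` is a hypothetical stand-in for the demo-data shift, the timestamps are invented, and only the ~3h46m gap is taken from the message above.

```python
from datetime import datetime, timedelta, timezone


def shift_to(now, span_start, latest_trace_end):
    """Shift both timestamps by one offset so latest_trace_end lands on now."""
    offset = now - latest_trace_end
    return span_start + offset, latest_trace_end + offset


# Invented demo data: the span starts 3h46m before the latest trace ends.
latest_trace_end = datetime(2024, 1, 1, 18, 0, tzinfo=timezone.utc)
span_start = latest_trace_end - timedelta(hours=3, minutes=46)

# Simulate a test run shortly after UTC midnight.
now = datetime(2026, 6, 10, 0, 30, tzinfo=timezone.utc)
shifted_span_start, shifted_latest_trace_end = shift_to(
    now, span_start, latest_trace_end
)

# Old assertion: calendar dates must match. This breaks across midnight.
same_calendar_day = shifted_span_start.date() == shifted_latest_trace_end.date()

# New assertion: the preserved gap stays within one day, whatever the clock says.
gap = shifted_latest_trace_end - shifted_span_start
gap_within_one_day = timedelta(0) <= gap <= timedelta(days=1)
```

Because both timestamps move by the same offset, the gap is unchanged by the shift, which is why a bound on the gap does not depend on when the test runs.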

andrescrz

Hi @GuySaar8

Before you move forward, can you please explain the goal of this PR?
This service already runs quite optimally, and I don't see any relevant information here about what we're trying to accomplish:

https://comet-ml.atlassian.net/browse/DND-172

In addition, can you provide some benchmarking results?

Finally, this could be a sensitive change. Have you regression-tested the main functionality of this service?

1. Python Online Evaluations.
   a. With docker executor.
   b. With process executor.
2. Demo data generation.
3. Optimisations execution.

GuySaar8 requested a review from a team as a code owner June 10, 2026 02:20

github-actions Bot assigned GuySaar8 Jun 10, 2026

github-actions Bot added Backend Infrastructure labels Jun 10, 2026

baz-reviewer Bot approved these changes Jun 10, 2026

github-actions Bot added python and tests labels Jun 10, 2026

obezpalko approved these changes Jun 10, 2026

andrescrz requested changes Jun 10, 2026

GuySaar8 wants to merge 2 commits into main from guys/DND-172-optimize-python-backend-dockerfile


3 participants
