[DND-172] [BE] perf: optimize opik-python-backend Dockerfile #7012

Open
GuySaar8 wants to merge 2 commits into main from guys/DND-172-optimize-python-backend-dockerfile

Conversation

GuySaar8 (Contributor) commented Jun 10, 2026

Details

Applies the same Dockerfile optimizations as #7010 (DND-171, sandbox-executor) to the opik-python-backend image — faster builds, smaller bytecode-only dependency layer, and safer gating — scoped to this server's runtime constraints.

  • Image size: bytecode-compile the venv (compileall -o 2 -b, PYTHONNODEBUGRANGES=1), delete *.py/*.pyi/__pycache__, and strip pip/setuptools/wheel/ensurepip. *.dist-info is intentionally kept — opik/litellm/tiktoken read their own version via importlib.metadata at import time.
  • Runtime: consolidated env into one layer — LITELLM_MODE=PRODUCTION (avoids litellm's load_dotenv() frame inspection, which breaks under a .pyc-only layout), PYTHONNODEBUGRANGES=1, PYTHONDONTWRITEBYTECODE=1. Added # syntax=docker/dockerfile:1 and stage banners.
  • Toolchain: unchanged build deps (rust/cargo for native wheels) stay in the builder stage only.
  • Flaky test fix (separate commit 027799d273): test_time_shift_distances::test_demo_spans_all_shifted_to_latest_trace_date asserted shifted_span_start.date() == shifted_latest_trace_end.date(). Since the shift moves the latest trace to now(), an earlier-in-day span lands a fixed delta before now — i.e. the previous calendar day whenever the run is just after UTC midnight — failing the .date() equality despite the spans being <4h apart. Now asserts the preserved gap (shifted_latest_trace_end - shifted_span_start ≤ 1 day) instead of comparing calendar dates, making it immune to the rollover. Unrelated to the Dockerfile but folded in here because it was blocking this PR's CI.
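
The rollover described in the flaky-test bullet can be reproduced with plain `datetime` arithmetic. This is a minimal sketch, not the actual test: `shift_to_now` and the timestamps are hypothetical stand-ins, and only the ~3h46m gap is taken from the commit message.

```python
from datetime import datetime, timedelta, timezone


def shift_to_now(span_start, latest_trace_end, now):
    """Shift both timestamps by one delta so latest_trace_end lands on now.

    Hypothetical helper: it only mimics the behaviour described in the PR.
    """
    delta = now - latest_trace_end
    return span_start + delta, latest_trace_end + delta


# Assumed demo data: the span starts 3h46m before the latest trace ends.
latest_trace_end = datetime(2026, 1, 5, 18, 0, tzinfo=timezone.utc)
span_start = latest_trace_end - timedelta(hours=3, minutes=46)

# Simulate a test run shortly after UTC midnight.
now = datetime(2026, 6, 10, 0, 30, tzinfo=timezone.utc)
shifted_span_start, shifted_latest_trace_end = shift_to_now(
    span_start, latest_trace_end, now
)

# Old assertion: calendar dates differ because the span falls on the previous day.
same_day = shifted_span_start.date() == shifted_latest_trace_end.date()

# New assertion: the preserved gap is bounded and ignores the date boundary.
gap = shifted_latest_trace_end - shifted_span_start
gap_ok = timedelta(0) <= gap <= timedelta(days=1)

print(same_day, gap_ok)  # False True
```

Because both timestamps move by the same delta, the gap is invariant under the shift, which is why bounding it is stable regardless of when the test runs.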

Divergence from DND-171

This image is a long-running Flask/gunicorn server (Alpine + Docker CLI), not a one-shot scorer, so two parts of the sandbox pattern do not carry over:

  • Strip scope = venv site-packages only. src/ keeps its .py: optimizer_runner.py is executed as a subprocess script and config.py is read via Flask from_pyfile — deleting app source would break optimizer jobs.
  • Runs as root, no selftest.sh gate. The server needs the Docker socket to spawn sandbox-executor containers, so it can't drop to the unprivileged USER 1001 the sandbox uses; the scoring selftest.sh build-gate is sandbox-specific and N/A here.

Change checklist

  • User facing
  • Documentation update

Issues

  • Resolves DND-172

AI-WATERMARK

AI-WATERMARK: yes

  • Tools: Claude Code
  • Model(s): Claude Opus 4.8 (1M context)
  • Scope: Dockerfile refactor (bytecode strip of venv, env consolidation, syntax header), flaky-test fix, and PR authoring.
  • Human verification: Author built and ran the image locally (see Testing) before requesting review.

Testing

Built and verified locally with docker buildx (colima, arm64):

  • docker buildx build --load apps/opik-python-backend — builds green.
  • Dep imports from the .pyc-only stripped venv succeed: flask, gunicorn, docker, litellm, opik, tiktoken, redis, rq, pydantic, opentelemetry + opik.evaluation.metrics.BaseMetric → DEPS_OK. This is the main risk in the bytecode-only layout (litellm under .pyc-only), confirmed working.
  • Layout assertions in the built image: venv .py=0, .pyc=10453, .pyi=0, __pycache__=0, .dist-info=129 kept; pip/setuptools stripped; src/*.py=35 kept; optimizer_runner.py present.
  • test_time_shift_distances (both tests) run locally inside the same post-midnight UTC window that failed CI earlier → 2 passed; the full backend-tests run on the fix commit is green.
  • Not run locally: full gunicorn boot + end-to-end request (blocks on Redis/RQ + Docker daemon wiring) — covered in CI / a real deploy.
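
The strip-then-verify flow behind the import and layout checks above can be sketched in Python. This is an illustration under assumptions, not the Dockerfile's actual commands: `strip_to_bytecode`, `count_suffixes`, and the `demo_dep` module are hypothetical, and a temporary directory stands in for the venv's site-packages.

```python
import compileall
import importlib
import shutil
import sys
import tempfile
from collections import Counter
from pathlib import Path


def strip_to_bytecode(root: Path) -> None:
    """Compile every module under root to a sibling .pyc, then drop sources."""
    # legacy=True mirrors `compileall -b`: foo.pyc is written next to foo.py
    # (not into __pycache__), which lets Python import it without the source.
    # optimize=2 mirrors `-o 2`: asserts and docstrings are removed.
    ok = compileall.compile_dir(str(root), legacy=True, optimize=2, quiet=1)
    if not ok:
        raise RuntimeError("compileall reported failures")
    for path in list(root.rglob("*")):
        if path.is_file() and path.suffix in {".py", ".pyi"}:
            path.unlink()
    for cache in list(root.rglob("__pycache__")):
        shutil.rmtree(cache, ignore_errors=True)


def count_suffixes(root: Path) -> Counter:
    """Count files by suffix, similar in spirit to the layout assertions."""
    return Counter(p.suffix for p in root.rglob("*") if p.is_file())


# Stand-in for site-packages with one hypothetical dependency.
site = Path(tempfile.mkdtemp())
(site / "demo_dep.py").write_text(
    'def answer():\n    """Docstring."""\n    return 42\n'
)

strip_to_bytecode(site)
counts = count_suffixes(site)

# The import must still succeed from the .pyc-only directory.
sys.path.insert(0, str(site))
importlib.invalidate_caches()
demo_dep = importlib.import_module("demo_dep")

print(counts[".py"], counts[".pyc"], demo_dep.answer())  # 0 1 42
```

The sketch leaves everything except `.py`, `.pyi`, and `__pycache__` in place, which is consistent with the PR keeping `*.dist-info` for `importlib.metadata` lookups.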

Documentation

No documentation changes — internal build optimization only.

Apply the DND-171 sandbox-executor optimizations to the python-backend
image, scoped to this server's constraints:

- Bytecode-compile the venv deps (compileall -o 2 -b, PYTHONNODEBUGRANGES=1),
  delete .py/.pyi/__pycache__, keep *.dist-info for importlib.metadata.
- Strip pip/setuptools/wheel/ensurepip from the venv.
- Consolidate runtime ENV (LITELLM_MODE=PRODUCTION, PYTHONNODEBUGRANGES=1,
  PYTHONDONTWRITEBYTECODE=1); add syntax header + stage banners.

Diverged from DND-171 where the runtime differs: strip scope is the venv
site-packages only — src/ keeps its .py because optimizer_runner.py is exec'd
as a subprocess script and config.py is read via Flask from_pyfile. Kept root
user + Alpine + tini + dockerd entrypoint since the server needs the Docker
socket to spawn sandbox-executor containers.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
github-actions Bot (Contributor) commented Jun 10, 2026

Python Backend Tests Results

196 tests   193 ✅  3m 0s ⏱️
  1 suites    3 💤
  1 files      0 ❌

Results for commit 027799d.

♻️ This comment has been updated with latest results.

…istances

test_demo_spans_all_shifted_to_latest_trace_date asserted
shifted_span_start.date() == shifted_latest_trace_end.date(). The shift moves
the latest trace end to now(), so a span originally earlier in the day shifts
to a fixed delta before now — which lands on the previous calendar day whenever
the run happens just after UTC midnight, failing the .date() equality despite
the spans being <4h apart.

Assert the preserved gap (shifted_latest_trace_end - shifted_span_start) stays
within one day instead of comparing calendar dates. Real demo data max gap is
~3h46m, so the 1-day bound holds with margin and is immune to the rollover.
Verified passing locally inside the same post-midnight UTC window that failed
CI attempts 1 and 2.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
github-actions Bot added the python and tests labels Jun 10, 2026

andrescrz (Member) left a comment

Hi @GuySaar8

Before you move forward, can you please explain the goal of this PR?
This service already runs quite efficiently, and I don't see any relevant information here about what we're trying to accomplish.

In addition, can you provide some benchmarking results?

Finally, this could be a sensitive change. Have you regression-tested the main functionality of this service:

  1. Python Online Evaluations.
    1.a With docker executor.
    1.b With process executor.
  2. Demo data generation.
  3. Optimisations execution.

Labels

Backend Infrastructure python Pull requests that update Python code tests Including test files, or tests related like configuration.
