fix(ci): isolate test files into separate processes to stop kuzu SIGSEGV #317

Merged
HumanBean17 merged 1 commit into master from fix/ci-segfault-process-isolation on Jun 14, 2026
@HumanBean17

Owner

Root cause

CI segfaulted (exit 139) about 53% of the way through the test run, inside ladybug/kuzu's NodeTableScanState::scanNext: a memcpy (glibc AVX) from a bad pointer during the find_by_name_or_fqn MATCH (s:Symbol) scan. ladybug 0.17.1 is kuzu, re-vendored as lbug.
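For readers unfamiliar with the exit status: shells conventionally report a process killed by a signal as 128 plus the signal number, so 139 corresponds to signal 11 (SIGSEGV on Linux). A minimal sketch of that decoding, where `describe_exit` is a hypothetical helper and not part of this repository:

```python
import signal


def describe_exit(code: int) -> str:
    """Translate a shell exit status into a signal name when it encodes one."""
    if code > 128:
        try:
            # Shell convention: status = 128 + signal number.
            return signal.Signals(code - 128).name
        except ValueError:
            pass  # not a signal number known on this platform
    return f"exit {code}"


print(describe_exit(139))  # SIGSEGV on Linux
```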

Diagnosis runs (real x86, via a temporary gdb workflow on debug/segfault-gdb):

| Experiment | Result |
| --- | --- |
| Full suite under gdb | 💥 SIGSEGV at ~53%, backtrace → lbug::storage::NodeTableScanState::scanNext |
| test_ladybug_queries.py alone | ✅ 39/39 pass, so not a kuzu query bug |
| Full suite + OMP_NUM_THREADS=1 RAYON_NUM_THREADS=1 | 💥 still crashes (~52%), so not the OMP/rayon pools |
| Full suite + pytest-xdist --dist loadfile | ✅ 771 passed / 9 skipped, no segfault |

So the crash comes from native state that accumulates across libraries inside a single process: by ~53% of the run that process has ~280 threads (cocoindex and lancedb each run their own Tokio runtime, plus kuzu's TaskScheduler and torch). That corrupts the heap, and kuzu's parallel scanner later reads a Symbol string property through a bad pointer. Process isolation prevents the accumulation.
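A thread count like the ~280 above can be sampled from inside the test process. The following is a minimal sketch, assuming a Linux runner where `/proc/self/task` lists one entry per OS thread; `native_thread_count` is a hypothetical helper, not something this PR adds:

```python
import os
import threading


def native_thread_count() -> int:
    """Count OS-level threads in this process, including ones started by native libraries."""
    try:
        # Linux: one directory entry per thread of the current process.
        return len(os.listdir("/proc/self/task"))
    except OSError:
        # Elsewhere: fall back to threads started via Python's threading module only,
        # which misses threads created by Tokio, torch, or kuzu.
        return threading.active_count()


print(f"threads in this process: {native_thread_count()}")
```

Calling such a helper at the start and end of each test file would show whether the count keeps growing across files.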

Fix

  • Add pytest-xdist to dev deps.
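  • Run pytest tests -n auto --dist loadfile -v. xdist spreads the suite across several worker processes and keeps all tests from one file on the same worker, so no single process accumulates the whole suite's native state.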
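If stricter isolation is ever needed (exactly one fresh interpreter per test file), a small driver script can launch pytest once per file. This is a sketch under assumptions, not what this PR does: `per_file_commands` and `run_isolated` are hypothetical names, and the `test_*.py` glob is assumed to match the repository's test layout.

```python
import subprocess
import sys
from pathlib import Path


def per_file_commands(test_dir: str) -> list[list[str]]:
    """Build one pytest command per test file (hypothetical helper)."""
    files = sorted(Path(test_dir).rglob("test_*.py"))
    return [[sys.executable, "-m", "pytest", str(f), "-v"] for f in files]


def run_isolated(test_dir: str = "tests") -> int:
    """Run each test file in a new interpreter; return the first non-zero exit code, or 0."""
    first_failure = 0
    for cmd in per_file_commands(test_dir):
        result = subprocess.run(cmd)
        # On POSIX a negative return code means the child was killed by a signal
        # (for example -11 for SIGSEGV).
        if result.returncode != 0:
            print(f"FAILED ({result.returncode}): {' '.join(cmd)}")
            first_failure = first_failure or result.returncode
    return first_failure
```

This trades speed for isolation: every file pays the full import cost of the native stack, whereas the xdist approach reuses a handful of workers.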

⚠️ Production caveat

This fixes CI/tests. The underlying native-stack corruption under sustained single-process load is a potential concern for the MCP server (it loads the same cocoindex + lancedb + kuzu + torch stack). Worth a follow-up: investigate the Tokio-runtime proliferation in cocoindex/lancedb and/or report upstream.

Cleanup

The throwaway debug workflow + branch (debug/segfault-gdb) can be deleted after this merges.

🤖 Generated with Claude Code

Running the full suite in one process accumulated native runtimes (cocoindex + lancedb Tokio, kuzu scheduler, torch) that corrupted the heap, crashing kuzu's NodeTableScanState::scanNext with a SIGSEGV at ~53%. pytest-xdist --dist loadfile gives each test file a fresh worker process so no cross-file native state accumulates. Verified on real x86 CI: 771 passed / 9 skipped, no segfault.

Co-Authored-By: Claude <[email protected]>
HumanBean17 merged commit 8efb7d1 into master on Jun 14, 2026
1 check passed