Skip to content

Conversation

@ever0de
Copy link
Contributor

@ever0de ever0de commented Nov 21, 2025

Fixed #6287

reference

Summary by CodeRabbit

  • Refactor
    • Improved concurrency and initialization safety for the built-in SQLite connection: core connection state now uses concurrency-safe primitives with an explicit initialization guard and atomic state operations.
    • Revised lifecycle and reinitialization flow to open/close safely, enforce thread confinement when enabled, reset row/text factories to defaults on reinit, and ensure failed opens leave the connection uninitialized.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 21, 2025

Walkthrough

Refactors Connection in crates/stdlib/src/sqlite.rs to use concurrency-safe primitives: adds initialized: PyAtomic<bool>, converts detect_types, check_same_thread to atomics, changes thread_ident to a mutex, and updates init/reinit, close, and thread-checking flows plus helper functions drop_db and reset_factories.

Changes

Cohort / File(s) Summary
SQLite Connection concurrency & init changes
crates/stdlib/src/sqlite.rs
Added initialized: PyAtomic<bool>; converted detect_types: c_intPyAtomic<c_int>, check_same_thread: boolPyAtomic<bool>, and thread_ident: ThreadIdPyMutex<ThreadId>. Updated py_new/init and reinitialization flow (open DB before mutating state, ensure uninitialized on failure), added drop_db and reset_factories, and replaced direct field accesses with atomic loads/stores and mutex usage across connection, cursor, and statement code paths.

Sequence Diagram(s)

sequenceDiagram
    participant U as User
    participant C as Connection
    participant S as SQLiteDB
    participant A as Atomics

    U->>C: __init__(params)
    C->>A: initialized.store(false)
    C->>S: open database
    alt DB open succeeds
        C->>A: set detect_types / check_same_thread
        C->>C: set thread_ident (PyMutex)
        C->>A: initialized.store(true)
        C-->>U: return
    else DB open fails
        C->>A: ensure initialized stays false
        C-->>U: raise Error
    end

    U->>C: execute / executemany
    C->>A: if !initialized.load() -> raise ProgrammingError
    alt initialized == true
        C->>A: check_same_thread.load()
        C->>C: lock thread_ident if needed
        C->>S: execute statement
        C-->>U: result
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Areas needing extra attention:
    • Atomic ordering and correctness for initialized, detect_types, and check_same_thread.
    • Reinitialization failure paths ensuring initialized remains false on error.
    • All call sites (cursor/statement/description) updated to use new atomics.
    • thread_ident mutex use and potential contention/latency impacts.

Possibly related PRs

Suggested reviewers

  • youknowone
  • ShaharNaveh

Poem

🐇 I hopped through atomics, soft and bright,

initialized set to guard the night,
mutex cradles the creator thread tight,
re-init falters — we keep state right,
now connections rest secure and light.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 36.36% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Fix sqlite connection reinitialization' directly and clearly summarizes the main change, addressing the core issue of handling state invalidation during failed Connection reinitialization.
Linked Issues check ✅ Passed The implementation addresses all coding requirements from #6287: introduces initialized flag tracking via PyAtomic, ensures state is marked uninitialized on failed reinitialization, and supports proper ProgrammingError raising on subsequent operations.
Out of Scope Changes check ✅ Passed All changes are directly scoped to fixing sqlite connection reinitialization: atomic state management, thread confinement, and initialization flow align with the issue requirements without introducing unrelated modifications.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ece048d and c9fc8a7.

📒 Files selected for processing (1)
  • crates/stdlib/src/sqlite.rs (9 hunks)
🔇 Additional comments (5)
crates/stdlib/src/sqlite.rs (5)

906-936: Reinitialization logic looks correct.

The initialization flow properly ensures that if initialize_db (line 917) fails, the connection remains in an uninitialized state (initialized=false, db=None), which will cause subsequent operations to raise ProgrammingError via _db_lock. This matches the PR objectives.

The memory ordering changes (AcqRel at line 906, Release at line 934) address the past review comment about establishing proper happens-before relationships.

Minor observation: The Release store at line 934 publishes all prior Relaxed stores (lines 926-930), but only threads that Acquire-load initialized would see these updates through that synchronization path. Since most code paths use the db mutex for synchronization instead, this is likely acceptable.


941-949: Helper methods look good.

The drop_db and reset_factories helpers correctly encapsulate resource cleanup and state reset. The reset_factories implementation matches CPython's behavior as noted in the comment at line 908.


1484-1495: Thread checking logic is correct.

The atomic load of check_same_thread with Relaxed ordering is appropriate since it's a boolean safety flag. The mutex lock on thread_ident properly serializes access to the thread ID.


1668-1669: Atomic reads of detect_types are appropriate.

Using Relaxed ordering for reading detect_types is acceptable here since:

  • It's a configuration parameter that affects parsing behavior, not safety
  • Even if a thread briefly sees a stale value during reinitialization, it won't cause corruption
  • The database mutex provides the primary synchronization for correctness

Also applies to: 1717-1718, 1883-1884


836-843: The initialized field is redundant; db.is_some() already guards operations.

The initialized flag tracks reinitialization state but is never checked to guard database operations. The actual guard is db.is_some() in the _db_lock method (line 969), which raises "Base Connection.init not called." when the connection is uninitialized.

If init() fails, both db and initialized remain false/None, so the behavior is correct. However, the initialized field appears unused for operation safety—consider removing it unless it's explicitly needed for other purposes (e.g., observability or future optimization noted in the TODO on thread_ident).


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
crates/stdlib/src/sqlite.rs (1)

906-906: Consider stronger memory ordering for the initialized flag.

Using Ordering::Relaxed for the initialized flag swap may not provide sufficient synchronization guarantees across threads. When one thread sets initialized to true (line 934), another thread reading it with Relaxed ordering might not see the updated value promptly, or might observe it out of order with respect to other field updates.

Consider using Ordering::AcqRel for the swap and Ordering::Release when storing (line 934) to ensure proper synchronization.

Apply this change for stronger guarantees:

-        let was_initialized = zelf.initialized.swap(false, Ordering::Relaxed);
+        let was_initialized = zelf.initialized.swap(false, Ordering::AcqRel);

And at line 934:

-            zelf.initialized.store(true, Ordering::Relaxed);
+            zelf.initialized.store(true, Ordering::Release);
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f7ddcd2 and 216ab89.

⛔ Files ignored due to path filters (1)
  • Lib/test/test_sqlite3/test_dbapi.py is excluded by !Lib/**
📒 Files selected for processing (1)
  • crates/stdlib/src/sqlite.rs (9 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: Run tests under miri
  • GitHub Check: Check the WASM package and demo
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Check Rust code with clippy
🔇 Additional comments (6)
crates/stdlib/src/sqlite.rs (6)

836-840: LGTM! Thread-safe field wrappers are appropriate.

The atomic wrappers for detect_types and check_same_thread, plus the mutex for thread_ident, correctly enable thread-safe access to these fields that are read outside the db mutex protection. The initialized flag properly tracks initialization state for re-initialization handling.


869-880: LGTM! Constructor initialization is correct.

The initialization logic properly sets initialized to true for base classes (where db is initialized immediately) and false for subclasses (where __init__ must be called). The use of Radium::new for atomic fields is appropriate.


905-936: Potential race condition with concurrent re-initialization.

If two threads concurrently call __init__ on the same Connection object, they could interleave their operations unpredictably:

  • Both threads swap initialized to false
  • Both threads reset factories and drop the database
  • Both threads open new databases
  • Both threads try to store their database handle, with the second overwriting the first (leaking the first connection)

While concurrent __init__ calls are uncommon in typical Python usage, this is technically a data race. Consider documenting that __init__ must not be called concurrently, or add protection against this scenario.


941-949: LGTM! Helper methods are correct.

The drop_db and reset_factories helper methods correctly encapsulate their respective cleanup operations. Note that close() calls drop_db but doesn't reset the initialized flag, though this doesn't cause incorrect behavior since _db_lock checks db.is_some() rather than the flag directly.


1484-1495: LGTM! Atomic field access is correct.

The check_thread method correctly uses load() for the atomic check_same_thread field and lock() for the mutex-protected thread_ident. The Relaxed ordering is acceptable here since these are configuration checks.


1668-1669: LGTM! Atomic detect_types access is correct.

All sites correctly read detect_types using load(Ordering::Relaxed) from the atomic field. The Relaxed ordering is appropriate for reading this configuration field.

Also applies to: 1717-1718, 1883-1884

Copy link
Member

@youknowone youknowone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@youknowone
Copy link
Member

Could you check this comment? #6288 (review)

This sounds reasonable. And because it is initialization step, we don't have much disadvantage to use more stronger ordering.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5150186 and b2f3adf.

📒 Files selected for processing (1)
  • crates/stdlib/src/sqlite.rs (9 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: Run tests under miri
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Check the WASM package and demo
  • GitHub Check: Check Rust code with clippy
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: auto_format
🔇 Additional comments (6)
crates/stdlib/src/sqlite.rs (6)

836-843: LGTM! Struct changes properly support reinitialization tracking.

The addition of the initialized flag and conversion of configuration fields to atomic types correctly enables tracking initialization state and thread-safe access. The use of PyMutex<ThreadId> is appropriate given the TODO comment and platform constraints.


869-880: LGTM! Initialization correctly distinguishes base vs. subclass construction.

The logic properly initializes the database immediately for the base Connection class while deferring initialization for subclasses until __init__ is called. The atomic field initialization using Radium::new is appropriate.


941-949: LGTM! Helper methods correctly manage connection state.

The drop_db and reset_factories methods appropriately encapsulate state management. The use of unsafe swap operations is consistent with the existing codebase patterns for PyAtomicRef.


1483-1495: LGTM! Thread checking correctly uses atomic flag and mutex.

The implementation properly loads the check_same_thread configuration flag atomically and locks the mutex to safely access thread_ident. The use of Ordering::Relaxed for the configuration flag is acceptable since it doesn't require synchronization with other state.


1668-1669: LGTM! Consistent atomic loads for configuration values.

The atomic loads of detect_types with Ordering::Relaxed are appropriate throughout the codebase. Since this is a configuration value that doesn't require synchronization with other mutable state, relaxed ordering is sufficient and provides good performance.

Also applies to: 1717-1718, 1883-1884


908-917: No action required—factory reset timing matches CPython behavior.

CPython's sqlite3.Connection does reset row_factory and text_factory during init reinitialization. The code correctly resets factories before the database open attempt, aligning with CPython's reinitialization approach. CPython's own reinitialization is documented as having inconsistent timing between text_factory and row_factory, and the implementation here matches that behavior.

Apply code review feedback to use Ordering::AcqRel for the swap operation
and Ordering::Release for the store operation to ensure proper
synchronization across threads when setting the initialized flag.
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
crates/stdlib/src/sqlite.rs (2)

1483-1495: Thread safety check correctly updated to use atomic primitives.

The method properly reads check_same_thread atomically and uses the mutex for thread_ident. While check_same_thread uses Relaxed ordering, this is acceptable because:

  • The subsequent db_lock() call provides additional synchronization via the database mutex
  • Thread checks happen before every operation, so any transient staleness during reinitialization is brief
  • Reinitialization is rare (typically tests or error recovery)

If you want stricter guarantees, consider using Acquire ordering for the check_same_thread load to synchronize with the Release store in init at line 934. However, this is optional since the current approach is likely sufficient for practical usage.

-            if self.check_same_thread.load(Ordering::Relaxed) {
+            if self.check_same_thread.load(Ordering::Acquire) {

1668-1669: LGTM: detect_types correctly read atomically in cursor operations.

The consistent use of Ordering::Relaxed for detect_types loads is acceptable because:

  • These reads occur within operations that have already called db_lock(), providing synchronization via the database mutex
  • Type detection behavior is not safety-critical
  • Reinitialization-induced staleness would be transient and rare

Similar to check_same_thread, you could optionally use Acquire ordering for these loads to establish a stronger happens-before relationship with the Release store during initialization (line 934), but this is not necessary for correctness in typical usage patterns.

Also applies to: 1717-1718, 1883-1884

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b2f3adf and ece048d.

📒 Files selected for processing (1)
  • crates/stdlib/src/sqlite.rs (9 hunks)
🔇 Additional comments (5)
crates/stdlib/src/sqlite.rs (5)

834-843: LGTM: Struct fields correctly updated for thread-safe reinitialization.

The addition of the initialized atomic flag and conversion of detect_types and check_same_thread to atomics enables proper state tracking during reinitialization. The TODO comment on thread_ident is acknowledged in the PR comments.


854-883: LGTM: Proper initialization for both base and subclass Connection instances.

The initialization correctly handles the base class (immediate DB setup) vs. subclass (deferred setup) distinction, and properly initializes all atomic fields.


902-937: LGTM: Reinitialization logic correctly implements CPython-compatible behavior.

The initialization sequence properly handles the requirements from issue #6287:

  1. Atomically marks connection as uninitialized (line 906 with AcqRel ordering)
  2. Resets factories to defaults (line 909), matching CPython behavior
  3. Opens DB before mutating other state (line 917), so failures leave the connection uninitialized and subsequent operations raise ProgrammingError
  4. Only marks initialized on success (line 934 with Release ordering)

The memory ordering has been improved from the previous version, now using AcqRel for the swap and Release for the final store, which is stronger than the past review's suggestion and aligns with the maintainer's comment that "using stronger ordering has little downside" for initialization.

Based on learnings from past review comments.


941-949: LGTM: Helper methods correctly encapsulate reinitialization concerns.

drop_db and reset_factories cleanly separate the resource cleanup and default restoration logic, matching CPython's connection reinitialization behavior.


1039-1043: LGTM: Refactored to use the drop_db helper for consistency.

@github-actions
Copy link
Contributor

Code has been automatically formatted

The code in this PR has been formatted using cargo fmt.
The changes have been committed with [skip ci] to avoid triggering another CI run.

Triggered by commit: ece048d0be2079e2f9a808c634f744769dbb1d17
Last formatted: 2025-11-22T12:04:20Z

You may need to pull the latest changes before pushing again:

git pull origin fixed-issue-6287

@youknowone youknowone merged commit a9469a2 into RustPython:main Nov 22, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sqlite: Handle state invalidation when Connection.__init__ fails during re-initialization

2 participants