Skip to content

Avoid re-locking reused UTXO futures#4673

Merged
TheBlueMatt merged 1 commit into
lightningdevkit:mainfrom
tnull:2026-06-utxo-same-future-deadlock
Jun 17, 2026
Merged

Avoid re-locking reused UTXO futures#4673
TheBlueMatt merged 1 commit into
lightningdevkit:mainfrom
tnull:2026-06-utxo-same-future-deadlock

Conversation

@tnull

@tnull tnull commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

UtxoLookup implementations may cache and return the same async future for repeated requests for a short channel id. When a replacement channel announcement arrived for an in-flight lookup, the async path held the future state while comparing the existing pending entry, which could point to that same state.

Drop the state guard before checking or replacing the pending entry so repeated lookups can update the pending announcement without re-entering the mutex.

Co-Authored-By: HAL 9000

This finding was discovered by Project Loupe

@tnull tnull requested a review from TheBlueMatt June 10, 2026 11:51
@ldk-reviews-bot

ldk-reviews-bot commented Jun 10, 2026

Copy link
Copy Markdown

👋 Thanks for assigning @TheBlueMatt as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

@ldk-claude-review-bot

ldk-claude-review-bot commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator

No issues found.

The implementation changed since my prior review pass (it now compares through the already-held guard via Arc::ptr_eq rather than dropping/re-locking the guard), but the new logic is correct:

  • The deadlock is avoided by not re-locking future.state when the pending entry is Arc::ptr_eq to it; the comparison reads replacement_messages (the held guard) directly.
  • At compare time async_messages.channel_announce still holds the previous announcement (the new one is written after the check_replace_previous_entry call at utxo.rs:475), so it correctly compares the incoming message against the existing pending one.
  • The else branch's unsafe_well_ordered_double_lock_self only runs for a genuinely different mutex, preserving the new→old lock order.
  • resolve_single_future runs only under the internal lock held throughout, so no interleaving can lose updates.
  • The regression test is valid (shared SCID 0 and fixed bitcoin keys make good_script match the replacement).

One non-blocking observation (not a regression, and strictly better than the deadlock being fixed): with a reused future for the same SCID but a different announcement, there is a single channel_announce slot, so the second announcement overwrites the first and resolve_single_future (utxo.rs:538) processes only the survivor. The nearby comment at utxo.rs:367 ("both results will be handled") only holds when the two requests have distinct future states. This is acceptable for the same-SCID case since the funding output is fixed and only one set of bitcoin keys can match the on-chain script.

@tnull tnull moved this to Goal: Merge in Weekly Goals Jun 11, 2026
@tnull tnull force-pushed the 2026-06-utxo-same-future-deadlock branch from fa1de66 to 0f8072f Compare June 11, 2026 16:42
@tnull

tnull commented Jun 11, 2026

Copy link
Copy Markdown
Contributor Author

Updated to line-wrap commit messages.

@TheBlueMatt TheBlueMatt left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not entirely sure anyone would ever implement it this way, but it seems the correct fix is to handle the duplicate with an Arc::ptr_eq check in check_replace_previous_entry, not unlock+relock.

@ldk-reviews-bot

Copy link
Copy Markdown

👋 The first review has been submitted!

Do you think this PR is ready for a second reviewer? If so, click here to assign a second reviewer.

UtxoLookup implementations may cache and return the same async future
for repeated requests for a short channel id. When a replacement channel
announcement arrives while that future is in-flight, the pending-entry
comparison may point back to the future state already held by the async
path.

Detect that case with Arc::ptr_eq inside check_replace_previous_entry
and compare against the held messages instead of taking the mutex again.
This keeps duplicate-announcement filtering intact while letting
replacement announcements update the pending entry without re-entering
the lock.

Co-Authored-By: HAL 9000

This finding was discovered by Project Loupe
@tnull tnull force-pushed the 2026-06-utxo-same-future-deadlock branch from 0f8072f to dce31b7 Compare June 17, 2026 09:04
@tnull

tnull commented Jun 17, 2026

Copy link
Copy Markdown
Contributor Author

Not entirely sure anyone would ever implement it this way, but it seems the correct fix is to handle the duplicate with an Arc::ptr_eq check in check_replace_previous_entry, not unlock+relock.

Now updated with this approach. Let me know if you find that preferable, or if we should even consider not fixing this?

@tnull tnull requested a review from TheBlueMatt June 17, 2026 09:07

@TheBlueMatt TheBlueMatt left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks

@TheBlueMatt TheBlueMatt merged commit 00d9065 into lightningdevkit:main Jun 17, 2026
@github-project-automation github-project-automation Bot moved this from Goal: Merge to Done in Weekly Goals Jun 17, 2026
@TheBlueMatt

Copy link
Copy Markdown
Collaborator

IMO we should skip backporting this. I don't believe any of our implementations of the lookup interface will ever return the same future and implementing that interface by caching pending lookups and returning copies of the future seems like a lot of work that implementers aren't gonna do. LMK if you disagree @tnull

@tnull

tnull commented Jun 18, 2026

Copy link
Copy Markdown
Contributor Author

IMO we should skip backporting this. I don't believe any of our implementations of the lookup interface will ever return the same future and implementing that interface by caching pending lookups and returning copies of the future seems like a lot of work that implementers aren't gonna do. LMK if you disagree @tnull

Fine by me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants