Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reimplement Interruptible #417

Merged
merged 4 commits into from
Dec 4, 2024
Merged

Reimplement Interruptible #417

merged 4 commits into from
Dec 4, 2024

Conversation

hms
Copy link
Contributor

@hms hms commented Nov 18, 2024

Details for each of the commits are in the separate commit messages

  • Some minor housekeeping with .gitignore, .rubocop.ymlt and solid_queue.gemspec
  • Proposed update to the tests to silence successful tests from outputting backtraces (keep the test output clean where possible and only output for the unexpected
  • Proposed reimplementation of the Interruptible concern to help mitigate some mysql database connectivity issues.

Copy link
Member

@rosa rosa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for this PR! I need a little bit of time to review it and I'm on-call this week, so I hope to get to it next week 🙏

@rosa
Copy link
Member

rosa commented Dec 2, 2024

Thanks again, @hms! Would you mind elaborating a bit more on:

Proposed reimplementation of the Interruptible concern to help mitigate some mysql database connectivity issues.

What where the connectivity issues you found? Was this only in tests, or did you observe these in a production environment?

@hms
Copy link
Contributor Author

hms commented Dec 2, 2024

@rosa

Hi Rosa,

I don't run MySQL in production. The issues I've come across are strictly in a testing environment. What I get are random disconnections (and miles long backtraces) at the driver level that are frequently paired with broken pipe errors. Extra rescues on the pipes side didn't help. It's always more database errors than pipe errors, leading me to assume the root cause was the database driver.

Since you folks are MySQL based and haven't seen any of these issues, I'm sure this is some weird combination of Mac, my local builds via brew, and/or how I'm configured to build. I did some digging, and the mysql2 gem had a window of time with similar issues some years ago right after an undocumented MySQL upgrade on the Oracle side, so maybe I'm a cannery in the coal mine?

Relative to a justification for the change, I do think using a pure Ruby solution as long as it's 100% in compliance with the requirements is worth considering. My understanding for why the self-pipe is to avoid signal processing races that can be triggered by the anything using one of the signal aggregation APIs (database drivers) and signal traps. The Thread::Queue makes those guarantees with the same performance profile and a very simply to understand implementation.

hms

@rosa
Copy link
Member

rosa commented Dec 2, 2024

My understanding for why the self-pipe is to avoid signal processing races that can be triggered by the anything using one of the signal aggregation APIs (database drivers) and signal traps.

Yes, exactly! Thanks for the extra justification; that's super helpful. I do like this implementation much more, so I'm going to run this in production for a couple of days and see if we run into anything, and then I'll merge afterwards. Thank you!

@rosa
Copy link
Member

rosa commented Dec 2, 2024

Oops, would you mind rebasing against the last main and resolving the conflicts? Sorry!

@hms
Copy link
Contributor Author

hms commented Dec 2, 2024

No problem. But I won't be able to do for a couple of hours. Just about to step out.

hms

@rosa
Copy link
Member

rosa commented Dec 2, 2024

Oh, of course, no rush at all! 🙏

hms added 3 commits December 2, 2024 09:04
* Adds to RVM .ruby-gemset to .gitignore for RVM and gemset holdouts.
* Updates Gemfile.lock to include arm64-darwin-24 (weird that it's
  missing)
* Adds 4 development dependencies to quiet deprecation warnings
* Updates rubocop to allow Ruby 3.3 syntax, matching .ruby-version
SolidQueue has excellent built-in error reporting. While this is
fantastic for SQ users, it is less than ideal for testing
SolidQueue because any test that deliberately uses or triggers an
exception produces voluminous error reporting.

This error reporting is hugely valuable when the exception is
not expected, but distracting and of limited value for expected
use-cases, especially when the test confirms the correct outcomes
via assertions.

This commit adds:
* A generic test-specific Exception class: ExpectedTestError
  This allows testing for specific exceptions while retaining
  all error reporting infrastructure for unexpected exceptions.

* Two helper methods for silencing on_thread_error output
  These methods accept an Exception or Array(Exception)
  and simply does not call the output mechanism if the
  exception passed to on_thread_error matches.
  This way, any unexpected error during test still
  reports in a highly visible manner while the exceptions
  being tested are validated via assertions.

* Replaces the stock on_thread_error with one that ignores
  ExpectedTextError.  Updated several tests from using
  the ruby stock RuntimeError to ExpectedTestError.

* Configures tests to run with YJIT enabled
  This is to test under likely production deployment configuration,
  not for performance reasons.

  Note: With the very recent reporting on M4's crashing on Mac's
  with YJIT enabled, we might want to either defer this change
  or add a conditional to opt in until the problem is resolved.
* Replaces a little Unix cleverness with a standard Ruby class.
  This pushes the responsibiity for meeting the SQ requirements
  from SQ to stock Ruby

* Delivers equivelent performance, identical API, and API behaviors
  with the original implementation (see note below on Futures)

* Mostly fixes a *platform / version dependent* issue with
  MySQL (see below)

* Meets 100% of SQ's functional requirements:

  * interruptible_sleep: a potentially blocking operation
    interruptible via either a "wake_event" (possibly requested
    prior to entering interruptible_sleep) or blocking until a
    timeout.

  * wake_up / interrupt: a Signal#trap and thread-safe method that
    does not require user-level synchronization (with the risk of
    not fully understanding all of the complexities required) code
    that either interrupts an inflight-interruptible_sleep or enqueues
    the event to processed in the invocation of interruptible_sleep

* Interruptible's API is trivially reproduceable via Thread::Queue
  * interruptible_sleep => Queue.pop(timeout:) where pushing anything
    into the queue acts as the interrupt event and timeout is reliable
    without any extra code or exception handling.

  * wake_up / interrupt => Queue.push(Anything) is thread, fiber, and
    Signal.trap safe (can be called from anywhere) and captures
    all wake_up events whenever requested, automaticall caching any
    "event" not processed by a currently executing interruptible_sleep
    matching existing functionality exactly.

Why the Future in #interruptible_sleep?

While Thread::Queue micro benchmarks as having the same performance on
the main thread Vs. any form of a sub-thread (or Fiber) and self-pipe,
when running the SQ test suite we see a 35% slow down Vs. the original
self-pipe implenentation.  One assumes this slowdown would manifest
in production. By moving the just the Queue#pop into a separate thread via
Concurrent::Promises.future we get +/- identical performance to the original
self-pipe implementation.

I'm assuming this root causes to Ruby main-thread only housekeeping and/or
possibly triggering a fast/slow path issue.

Why a Future Vs. Thread#new for each interruptible_sleep call?

Every other threaded operation in SQ is implemented using Concurrent
Ruby. Using a Future is for code and architectual consistency. There is
no difference in performance or functionality between the two.

MySQL *only* issues:

There seems to be a *platform specific* or *version specific* problem
with MySQL database connectivity and/or broken self-pipes leading to
randomly failing tests and a stream of distracting backtraces *even
with successful* tests.  Adding to the complexity sometimes, the lost
database connection can self-heal -- HOWEVER -- this takes time and given
how much of the test suite has time based assertions, leads to
additional random test failures.

These, or similar, issues have been observed in the past when changes to
the MySQL client library forced changes in the mysql2 gem.

With the Thread::Queue based implementation of the Interruptible concern,
the random failures and amount of spurious output are dramatically
improved (but not eliminated).
@hms
Copy link
Contributor Author

hms commented Dec 2, 2024

@rosa

Rebased.

@rosa
Copy link
Member

rosa commented Dec 3, 2024

This is working great so far, so I'm going to merge it. Thanks a lot @hms!

@rosa
Copy link
Member

rosa commented Dec 3, 2024

Oops, so sorry, conflicts again! It needs rebasing again 🙈 Also, I realised I missed a comment for a small change, that I'll add now.

solid_queue.gemspec Outdated Show resolved Hide resolved
* Streamlined a comment in the code
* Removed unneeded gemspec developer_dependencies that we get handled
  via the upgrade to Rails 7.1.4.1
* Added a gemspec version dependency required for keyword argument
  support
* Rebase to main
@hms
Copy link
Contributor Author

hms commented Dec 3, 2024

@rosa

Rebased.

Thank you for all of your feedback and help to get this finished.

@rosa rosa merged commit 4f29fc8 into rails:main Dec 4, 2024
4 checks passed
@rosa
Copy link
Member

rosa commented Dec 4, 2024

Thank you!

@hms hms deleted the interruptible branch December 18, 2024 20:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants