bpo-25942: make subprocess more graceful on ^C#5026
Conversation
bpo-25942: Do not allow receiving a SIGINT to cause the subprocess module to trigger a SIGKILL of the child process. SIGINT is normally sent to the child process by the OS at the same time already as was the established normal behavior in 2.7 and 3.2. this behavior change was introduced during the fix to https://bugs.python.org/issue12494 and is generally surprising to command line tool users who expect other tools launched in child processes to get their own SIGINT and do their own cleanup.
In Python 3.3-3.6 subprocess.call and subprocess.run would immediately SIGKILL the child process upon receiving a SIGINT (which raises a KeyboardInterrupt). We now give the child a small amount of time to exit gracefully before resorting to a SIGKILL. This is also the case for subprocess.Popen.__exit__ which would previously block indefinitely waiting for the child to die. This was hidden from many users by virtue of subprocess.call and subprocess.run sending the signal immediately. Behavior change: subprocess.Popen.__exit__ will not block indefinitely when the exiting exception is a KeyboardInterrupt. This is done for user friendliness as people expect their ^C to actually happen. This could cause occasional orphaned Popen objects when not using `call` or `run` with a child process that hasn't exited. Refactoring involved: The Popen.wait method deals with the KeyboardInterrupt second chance, existing platform specific internals have been renamed to _wait().
| # How long to resume waiting on a child after the first ^C. | ||
| # There is no right value for this. The purpose is to be polite | ||
| # yet remain good for interactive users trying to exit a tool. | ||
| self._sigint_wait_secs = 0.25 # 1/xkcd221.getRandomNumber() |
There was a problem hiding this comment.
Original work at https://www.random.org/analysis/dilbert.jpg :-)
| # If we have not already waited a brief amount of time in | ||
| # an interrupted .wait() or .communicate() call, do so here | ||
| # for consistency. | ||
| if self._sigint_wait_secs > 0: |
There was a problem hiding this comment.
Since this code is repeated is several places (wait, communicate), how about factoring it out?
There was a problem hiding this comment.
exit and the Popen instance have no knowledge of timeouts or whether the child process should be killed and in what circumstances. I didn't want to move that into the Popen instance.
If I do, it would be internal-only APIs. I don't want to complicate the public Popen API surface further.
There was a problem hiding this comment.
You could simply have a self._handle_keyboard_interrupt(timeout) helper doing the _sigint_wait_secs dance. You would also centralize the comment there.
Lib/subprocess.py
Outdated
| # exit under the common assumption that it also received the ^C | ||
| # generated SIGINT and will exit rapidly. A second ^C will | ||
| # abort this immediately. Matching user patterns of hitting | ||
| # ^C more than once before resorting to ^\ (SIGKILL). |
There was a problem hiding this comment.
This is not very clear to me. You should mention where the SIGKILL is emitted.
There was a problem hiding this comment.
A second ^C will interrupt this exception handler and not be caught by anything.
SIGKILL comes from the user hitting ^\ when they are impatient at ^C not having terminated the command line tool they were trying to exit.
make sense? I could just get rid of everything after the first sentence.
There was a problem hiding this comment.
Uh, I see. The "resorting to SIGKILL" sounded like it was done by subprocess itself (especially as p.kill is called elsewhere).
Lib/subprocess.py
Outdated
| except: # Including KeyboardInterrupt, wait handled that. | ||
| p.kill() | ||
| p.wait() | ||
| # We don't call p.wait() as p.__exit__ does that for us. |
There was a problem hiding this comment.
But you're calling p.kill before p.__exit__ has a chance of calling wait(0.25)... am I missing something?
(same comment for run() below)
There was a problem hiding this comment.
The p.wait(timeout=timeout) above where this exception was caught from already did a short wait upon receiving KeyboardInterrupt.
There is always a race condition here though (even before this change):
Popen launches the process in its constructor. KeyboardInterrupt can occur while that is happening, even before it has set self._child_created and exited the constructor. Or before enter is called, or during enter, and within the context manager during all code up and until the try: within wait() itself. That is impossible to handle and falls back to destructor del or exit based cleanup with no explicit kill() ever happening at all.
The same is just as true within run() (which uses communicate(), which similarly handle KeyboardInterrupt).
There was a problem hiding this comment.
Oops, you're right, sorry... Perhaps spell that "We don't call p.wait() again", then? :-)
In Python 3.3-3.6
subprocess.callandsubprocess.runwould immediatelySIGKILLthe child process upon receiving aSIGINT(which raises aKeyboardInterrupt). We now give the child a small amount of time toexit gracefully before resorting to a
SIGKILL.This is also the case for
subprocess.Popen.__exit__which wouldpreviously block indefinitely waiting for the child to die. This was
hidden from many users by virtue of
subprocess.callandsubprocess.runsending the signal immediately.
Behavior change:
subprocess.Popen.__exit__will not block indefinitelywhen the exiting exception is a KeyboardInterrupt. This is done for
user friendliness as people expect their ^C to actually happen. This
could cause occasional orphaned Popen objects when not using
callorrunwith a child process that hasn't exited.Internal refactoring: The Popen.wait method deals with the
KeyboardInterrupt. The existing platform specific internal
methods have been renamed to
_wait()which it calls.https://bugs.python.org/issue25942