Skip to content

bpo-25942: make subprocess more graceful on ^C#5026

Merged
gpshead merged 9 commits intopython:masterfrom
gpshead:issue25942
Jan 30, 2018
Merged

bpo-25942: make subprocess more graceful on ^C#5026
gpshead merged 9 commits intopython:masterfrom
gpshead:issue25942

Conversation

@gpshead
Copy link
Member

@gpshead gpshead commented Dec 28, 2017

In Python 3.3-3.6 subprocess.call and subprocess.run would immediately
SIGKILL the child process upon receiving a SIGINT (which raises a
KeyboardInterrupt). We now give the child a small amount of time to
exit gracefully before resorting to a SIGKILL.

This is also the case for subprocess.Popen.__exit__ which would
previously block indefinitely waiting for the child to die. This was
hidden from many users by virtue of subprocess.call and subprocess.run
sending the signal immediately.

Behavior change: subprocess.Popen.__exit__ will not block indefinitely
when the exiting exception is a KeyboardInterrupt. This is done for
user friendliness as people expect their ^C to actually happen. This
could cause occasional orphaned Popen objects when not using call or
run with a child process that hasn't exited.

Internal refactoring: The Popen.wait method deals with the
KeyboardInterrupt. The existing platform specific internal
methods have been renamed to _wait() which it calls.

https://bugs.python.org/issue25942

bpo-25942: Do not allow receiving a SIGINT to cause the subprocess
module to trigger a SIGKILL of the child process.  SIGINT is normally
sent to the child process by the OS at the same time already as was the
established normal behavior in 2.7 and 3.2.  this behavior change was
introduced during the fix to https://bugs.python.org/issue12494 and is
generally surprising to command line tool users who expect other tools
launched in child processes to get their own SIGINT and do their own
cleanup.
In Python 3.3-3.6 subprocess.call and subprocess.run would immediately
SIGKILL the child process upon receiving a SIGINT (which raises a
KeyboardInterrupt).  We now give the child a small amount of time to
exit gracefully before resorting to a SIGKILL.

This is also the case for subprocess.Popen.__exit__ which would
previously block indefinitely waiting for the child to die.  This was
hidden from many users by virtue of subprocess.call and subprocess.run
sending the signal immediately.

Behavior change: subprocess.Popen.__exit__ will not block indefinitely
when the exiting exception is a KeyboardInterrupt.  This is done for
user friendliness as people expect their ^C to actually happen.  This
could cause occasional orphaned Popen objects when not using `call` or
`run` with a child process that hasn't exited.

Refactoring involved: The Popen.wait method deals with the
KeyboardInterrupt second chance, existing platform specific internals
have been renamed to _wait().
# How long to resume waiting on a child after the first ^C.
# There is no right value for this. The purpose is to be polite
# yet remain good for interactive users trying to exit a tool.
self._sigint_wait_secs = 0.25 # 1/xkcd221.getRandomNumber()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

# If we have not already waited a brief amount of time in
# an interrupted .wait() or .communicate() call, do so here
# for consistency.
if self._sigint_wait_secs > 0:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this code is repeated is several places (wait, communicate), how about factoring it out?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exit and the Popen instance have no knowledge of timeouts or whether the child process should be killed and in what circumstances. I didn't want to move that into the Popen instance.

If I do, it would be internal-only APIs. I don't want to complicate the public Popen API surface further.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could simply have a self._handle_keyboard_interrupt(timeout) helper doing the _sigint_wait_secs dance. You would also centralize the comment there.

# exit under the common assumption that it also received the ^C
# generated SIGINT and will exit rapidly. A second ^C will
# abort this immediately. Matching user patterns of hitting
# ^C more than once before resorting to ^\ (SIGKILL).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not very clear to me. You should mention where the SIGKILL is emitted.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A second ^C will interrupt this exception handler and not be caught by anything.

SIGKILL comes from the user hitting ^\ when they are impatient at ^C not having terminated the command line tool they were trying to exit.

make sense? I could just get rid of everything after the first sentence.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh, I see. The "resorting to SIGKILL" sounded like it was done by subprocess itself (especially as p.kill is called elsewhere).

except: # Including KeyboardInterrupt, wait handled that.
p.kill()
p.wait()
# We don't call p.wait() as p.__exit__ does that for us.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But you're calling p.kill before p.__exit__ has a chance of calling wait(0.25)... am I missing something?

(same comment for run() below)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The p.wait(timeout=timeout) above where this exception was caught from already did a short wait upon receiving KeyboardInterrupt.

There is always a race condition here though (even before this change):

Popen launches the process in its constructor. KeyboardInterrupt can occur while that is happening, even before it has set self._child_created and exited the constructor. Or before enter is called, or during enter, and within the context manager during all code up and until the try: within wait() itself. That is impossible to handle and falls back to destructor del or exit based cleanup with no explicit kill() ever happening at all.

The same is just as true within run() (which uses communicate(), which similarly handle KeyboardInterrupt).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, you're right, sorry... Perhaps spell that "We don't call p.wait() again", then? :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants