Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 15 additions & 2 deletions Doc/library/subprocess.rst
Original file line number Diff line number Diff line change
Expand Up @@ -76,6 +76,14 @@ compatibility with older versions, see the :ref:`call-function-trio` section.
The *universal_newlines* argument is equivalent to *text* and is provided
for backwards compatibility. By default, file objects are opened in binary mode.

If the current process encounters an exception (e.g. :exc:`KeyboardInterrupt`)
while waiting for the child process and *cleanup_timeout* is non-zero,
the child process is given additional time to finish before it is killed,
to allow potential clean-up operations in the child to complete.
During this time, a second exception will kill the child immediately.
If *cleanup_timeout* is not specified, then any remaining time from the original
*timeout* is used, or 1 second if no original *timeout* was specified.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too complicated.


Examples::

>>> subprocess.run(["ls", "-l"]) # doesn't capture output
Expand All @@ -97,9 +105,11 @@ compatibility with older versions, see the :ref:`call-function-trio` section.
Added *encoding* and *errors* parameters

.. versionchanged:: 3.7

Added the *text* parameter, as a more understandable alias of *universal_newlines*

.. versionchanged:: 3.7
*cleanup_timeout* was added.

.. class:: CompletedProcess

The return value from :func:`run`, representing a process that has finished.
Expand Down Expand Up @@ -860,7 +870,7 @@ Prior to Python 3.5, these three functions comprised the high level API to
subprocess. You can now use :func:`run` in many cases, but lots of existing code
calls these functions.

.. function:: call(args, *, stdin=None, stdout=None, stderr=None, shell=False, cwd=None, timeout=None)
.. function:: call(args, *, stdin=None, stdout=None, stderr=None, shell=False, cwd=None, timeout=None, cleanup_timeout=None)

Run the command described by *args*. Wait for command to complete, then
return the :attr:`~Popen.returncode` attribute.
Expand All @@ -886,6 +896,9 @@ calls these functions.
.. versionchanged:: 3.3
*timeout* was added.

.. versionchanged:: 3.7
*cleanup_timeout* was added.

.. function:: check_call(args, *, stdin=None, stdout=None, stderr=None, shell=False, cwd=None, timeout=None)

Run command with arguments. Wait for command to complete. If the return
Expand Down
89 changes: 73 additions & 16 deletions Lib/subprocess.py
Original file line number Diff line number Diff line change
Expand Up @@ -255,22 +255,23 @@ def _args_from_interpreter_flags():
return args


def call(*popenargs, timeout=None, **kwargs):
def call(*popenargs, timeout=None, cleanup_timeout=None, **kwargs):
"""Run command with arguments. Wait for command to complete or
timeout, then return the returncode attribute.

The arguments are the same as for the Popen constructor. Example:
This is equivalent to:

retcode = call(["ls", "-l"])
"""
with Popen(*popenargs, **kwargs) as p:
try:
return p.wait(timeout=timeout)
except:
p.kill()
p.wait()
raise
run(...).returncode

(except that the input and check parameters are not supported).

See run() for argument details.

Example:
retcode = call(["ls", "-l"])
"""
return run(*popenargs, timeout=timeout, cleanup_timeout=cleanup_timeout,
**kwargs).returncode

def check_call(*popenargs, **kwargs):
"""Run command with arguments. Wait for command to complete. If
Expand Down Expand Up @@ -371,7 +372,7 @@ def check_returncode(self):
self.stderr)


def run(*popenargs, input=None, timeout=None, check=False, **kwargs):
def run(*popenargs, input=None, timeout=None, check=False, cleanup_timeout=None, **kwargs):
"""Run command with arguments and return a CompletedProcess instance.

The returned instance will have attributes args, returncode, stdout and
Expand All @@ -386,6 +387,17 @@ def run(*popenargs, input=None, timeout=None, check=False, **kwargs):
If timeout is given, and the process takes too long, a TimeoutExpired
exception will be raised.

If the current process encounters any exception (e.g. KeyboardInterrupt) and
cleanup_timeout is non-zero, the child process is given some additional
time to finish before it is killed, to allow potential clean-up
operations in the child to complete. During this time, a second exception
will kill the child immediately.

If cleanup_timeout is not specified, then a default value is chosen:
If timeout was provided, then any remaining time from the original timeout
is used. Otherwise, if the exception is a KeyboardInterrupt, the child is
given 1 second to clean up. Otherwise, the child is killed immediately.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If cleanup timeout is not specified, there should be no cleanup timeout.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If cleanup timeout is not specified, there should be no cleanup timeout.

I assume you chose that phrasing to be pointedly simple, but, unfortunately, I think your phrasing is not as simple as it appears. In the subprocess API and docs, "no timeout" (i.e. timeout=None) actually means "timeout=infinity". That's probably not what you mean.

I think you meant to propose: "If cleanup timeout is not specified, there should be a cleanup timeout of 0.0".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stuarteberg I agree that "no cleanup timeout" is not clear enough, but it is not the same as cleanup timeout of 0 seconds.

How about:

If kill_timeout is not specified, the child process will be killed immediately when the timeout expire. If specified, the child process will be terminated gracefully using SIGTERM. If the process did not terminate within kill_timeout seconds, it will be killed.

This should be modified if graceful termination should be the default behavior. I'm not sure what should be the default, but this can be left for another PR.


There is an optional argument "input", allowing you to
pass bytes or a string to the subprocess's stdin. If you use this argument
you may not also use the Popen constructor's "stdin" argument, as
Expand All @@ -404,6 +416,8 @@ def run(*popenargs, input=None, timeout=None, check=False, **kwargs):
raise ValueError('stdin and input arguments may not both be used.')
kwargs['stdin'] = PIPE

start_time = _time()

with Popen(*popenargs, **kwargs) as process:
try:
stdout, stderr = process.communicate(input, timeout=timeout)
Expand All @@ -412,10 +426,53 @@ def run(*popenargs, input=None, timeout=None, check=False, **kwargs):
stdout, stderr = process.communicate()
raise TimeoutExpired(process.args, timeout, output=stdout,
stderr=stderr)
except:
process.kill()
process.wait()
raise
except BaseException as ex:
try:
if cleanup_timeout is None:
# Choose a reasonable default
if timeout:
remaining_timeout = timeout - (_time() - start_time)
cleanup_timeout = max(0.0, remaining_timeout)
elif isinstance(ex, KeyboardInterrupt):
# KeyboardInterrupts are:
# 1. Typically automatically triggered in the child, too.
# Therefore, the child is likely to be already cleaning up
# (if we give it the chance).
# 2. Generated interactively by humans at a keyboard.
# If the child is well-written (exits in response to SIGINT),
# a 1 second timeout is usually enough time for it to clean up.
# Othwerwise, 1 second seems like a tolerable delay in response to Ctrl+C,
# and a resonable price to pay for correct behavior in the well-written case.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You cannot assume that a child will get SIGINT, this is true only for a foreground process in the shell. The parent process can be a daemon and nobody will generate SIGINT for its children.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Disclaimer: I cannot claim to be an expert in POSIX conventions, so forgive me if you have expertise in this area and know the following to be incorrect.)

If I understand correctly, the primary distinction betweenSIGTERM and SIGINT is that the latter is specifically intended to be used in interactive terminal programs. For instance, that's why a POSIX-compliant shell automatically forwards SIGINT to all processes in the foreground group. (The same goes for the windows equivalent, CTRL_C_EVENT.) Of course, that's also why Python chose to name it's SIGINT-triggered exception KeyboardInterrupt.

It is reasonable to assume that receiving aSIGINT is a strong clue to the program that it is running in the foreground process group, and therefore its child is, too.

IMHO, programmers who would send a SIGINT programmatically shouldn't be surprised by slightly odd behavior. But the good news is that the "odd behavior" in this case is quite benign: it's merely a 1 second delay.

On the other hand, a default cleanup timeout of 0.0 would mean that the child is killed prematurely, and the user experiences incorrect/incomplete cleanup behavior when hitting Ctrl+C. That's the situation Python 3 users are in today.

But what about a default cleanup delay of 1.0 second? If the child responds to SIGINT then the user experiences no extra delay. But for those (very few) children that don't actually respond to SIGINT, the consequences are relatively minor. The user would merely have to do one of the following:

  1. Hit Ctrl+C again, OR
  2. Do nothing... for 1 second.

The idea with this PR is to help Python programmers write code that "just works" in the default case. After all, the run() function is a convenience API. It's meant to help avoid boilerplate and spare the user from the details of the lower-level Popen API, if they don't want to think about those details. But right now, Python does the wrong thing by default, which is very un-Pythonic, if you ask me. :-P

(BTW, in Python 2, the behavior wasn't perfect, but it happened to be better-behaved in this one instance.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You cannot assume that a child will get SIGINT, this is true only for a foreground process in the shell.

Question: After thinking about your comment some more, I'm now wondering: What do you think about explicitly sending SIGINT to the child, assuming we're handling a KeyboardInterrupt? (For other exceptions, we would send SIGTERM, as you suggested below.)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should TERM (or, worse, KILL) it, not INTerrupt it.

Firstly, it is reasonable to assume the child already got the INT. More importantly, a TERM better fits the longstanding semantic conventions of signal use here.

While it's true that a SIGINT could be generated by some circumstance other than a terminal driver catching Ctrl-C and signaling the whole foreground process group (not individual PIDs), in practice that would be an exceedingly rare violation of longstanding convention. POSIX refers to SIGINT as the "terminal interrupt signal", for example. Indeed, the fact that python by default turns a SIGINT into a KeyboardInterrupt tells you something about its expected use.

I don't see benefit in second-guessing system behavior when an INT rolls in. If we're going to terminate the child, TERM the child — and please let devs like me suppress that behavior entirely. :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Command lines tools are used both interactively and run by other programs. They are expected to handle both SIGINT and SIGTERM, and typically both are handled in the same way.

We can assume that a child process is handling both signals. If you run a program that needs special treatment, use Popen directly instead of the high level easy to use call() or run().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are expected to handle both SIGINT and SIGTERM, and typically both are handled in the same way.

I completely agree that they should. But as Python programmers, it would be particularly odd for us to assume that they do. After all, tools written in Python itself will respond differently for SIGINT and SIGTERM -- unless the author was kind enough to register a handler for SIGTERM. Most authors do not bother.

For example, on a whim, I just checked a few popular utilities that are implemented in Python. Frankly, these results shocked me:

  • httpie handlesSIGINT gracefully, but not SIGTERM. (I expected better from a project project that has 32,518 stars on github!)
  • Another example seems to be the boto suite of AWS utilities. (I didn't verify via experiment, I can find no mention of SIGTERM in it's source code.)
  • Holy crap, you guys aren't gonna believe this: Even pip doesn't handle SIGTERM gracefully. Apparently, if you send it SIGTERM, it does no cleanup at all -- you just end up with a partially installed python package (garbage, in other words). I'm legitimately surprised!

Again, I completely agree regarding what the proper POSIX conventions ought to be. But now we have to decide: we can either be POSIX purists, or be considerate to the typical user. I can see arguments in favor of either, but we should acknowledge the trade-off we're making.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The subprocess module is not the place to fix bad signal handling in other programs.

Maybe send a PR to httpie/pip?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for making a PR to those projects. But of course there will always be others with the same issue, and we can't fix all of them.

I'm just wondering if we have an opportunity to apply the Zen of Python maxim: "practicality beats purity". I'm honestly not sure. But doing the most "friendly" thing (if we can figure out what that is) seems spiritually compatible with a convenience API like run(), even if it's less pure -- or maybe even a little ugly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot fix the issue in subprocess. For example we can have a correct program that block SIGINT when running without a controlling terminal. The only way to terminate the program is SIGTERM. We must support such program by sending SIGTERM. If another program misbehaves by not handling SIGTERM, the program author should fix the program.

cleanup_timeout = 1.0
else:
# The above reasoning does not apply to arbitrary exceptions,
# so kill the child immediately.
cleanup_timeout = 0.0

if cleanup_timeout == 0.0:
raise # Skip cleanup

# Close streams to/from child
streams = (process.stdin, process.stdout, process.stderr)
for f in filter(None, streams):
try:
f.close()
except:
pass # Ignore EBADF or other errors.

try:
process.wait(timeout=cleanup_timeout)
except:
# Ignore cleanup errors;
# original exception is always re-raised below
pass
finally:
# Kill the child and re-raise original exception,
# regardless of successful/failed cleanup
process.kill()
process.wait()
raise ex

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need much simple solution.

except:
    if kill_timeout is not None:
        process.terminate()
        try:
            process.wait(timeout=kill_timoeut)
         except:
             pass
    process.kill()
    process.wait()
    raise

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Something like kill_timeout=None is the key here, letting the developer suppress the (in my view misguided) SIGKILL. That lets someone with my crusty POSIX sensibilities use an easy subprocess abstraction, and backport it to earlier python.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regardless of the default timeout value, I do like this proposal to explicitly terminate() the child, rather than merely hoping it will just exit gracefully on its own after we close its stdin/stdout.

One open question is whether or not it is bad to send a SIGTERM to a process that has (probably) already received a SIGINT.

For example, if the child is a Python program, then the default behavior of SIGTERM is to end the process without any cleanup. If it has already received a SIGINT, then the process will likely end during cleanup.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the perspective of the child process, a TERM is at worst equivalent to a KILL, and possibly much more polite, since it can be caught. So, if we are committed to killing the child process, I'd encourage a TERM first.

(It's tempting to full-on generalize this and allow something like timeout_handler=some_func_accepting_the_process_as_arg or err_kill=( (0.5, SIGTERM), (0.5, SIGQUIT), (1.0, SIGKILL) ) Those args are just illustrative, but you get the idea.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd encourage a TERM first.

👍

It's tempting to full-on generalize this

Agree with the sentiment. I also completely agree with what you implied by using the word "tempting" (instead of, say, "justified" 😄 ). The run() API is meant for easy, general usage, with reasonable default behavior. If a programmer cares about nuanced error-handling, she should just use the Popen API.

retcode = process.poll()
if check and retcode:
raise CalledProcessError(retcode, process.args,
Expand Down
76 changes: 76 additions & 0 deletions Lib/test/test_subprocess.py
Original file line number Diff line number Diff line change
Expand Up @@ -1476,6 +1476,82 @@ def test_run_kwargs(self):
env=newenv)
self.assertEqual(cp.returncode, 33)

def test_run_interrupted_cleanup_succeeds(self):
# In the event of KeyboardInterrupt, run() should allow the
# child time to cleanup (if cleanup_timeout was given).
# In this case, child's cleanup completes before cleanup_timeout runs out.
self._test_run_interrupted(0.2, 1.0)

def test_run_interrupted_cleanup_timed_out(self):
# In the event of KeyboardInterrupt, run() should allow the
# child time to cleanup (if cleanup_timeout was given).
# In this case, child's cleanup takes too long for the cleanup_timout.
self._test_run_interrupted(3.0, 0.1)

@unittest.skipIf(mswindows, "cannot use 'ps' command windows")
@unittest.skipIf(threading is None, "threading required")
def _test_run_interrupted(self, cleanup_time, cleanup_timeout):
# In the event of KeyboardInterrupt, run() should allow the
# child time to cleanup (if cleanup_timeout was given).

parent_pid = os.getpid()
def generate_sigint():
# Send SIGINT to the current process and its child after a short delay
time.sleep(0.5)
children = self._find_children(parent_pid)
for (pid, command) in children:
if 'python' in command:
os.kill(pid, signal.SIGINT)
os.kill(parent_pid, signal.SIGINT)

interrupter_thread = threading.Thread(target=generate_sigint)

tf = tempfile.NamedTemporaryFile(delete=False)
tf.close()

try:
with self.assertRaises(KeyboardInterrupt):
interrupter_thread.start()
subprocess.run([sys.executable, "-c",
textwrap.dedent(f"""\
import time
try:
time.sleep(3.0)
except KeyboardInterrupt:
time.sleep({cleanup_time})
with open('{tf.name}', 'w') as f:
f.write('cleaned up')
""")],
timeout=None, cleanup_timeout=cleanup_timeout,
check=True)

with open(tf.name, 'r') as tf:
if cleanup_time < cleanup_timeout:
self.assertEqual(tf.read(), 'cleaned up')
else:
# child should have been killed before the cleanup executed
self.assertEqual(tf.read(), '')
finally:
os.unlink(tf.name)
interrupter_thread.join()

def _find_children(self, parent_pid):
"""Return a list of tuples (child_pid, command)
for all direct children of the given parent_id.
(Unix only.)
Note: Output may include the 'ps' command itself.
"""
children = []
ps_output = subprocess.check_output(['ps', '-o', 'pid,ppid,command'])
for line in ps_output.decode().strip().split('\n')[1:]:
words = line.split()
pid = int(words[0])
ppid = int(words[1])
command = ' '.join(words[2:])
if ppid == parent_pid:
children.append( (pid, command) )
return children


@unittest.skipIf(mswindows, "POSIX specific tests")
class POSIXProcessTestCase(BaseTestCase):
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Added new parameter, ``cleanup_timeout``, to ``subprocess.run()``
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kill_timeout may be more clear, the time we give child process to terminate before we kill it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am ambivalent regarding the parameter name.