Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ssh: Specifically designate a master multiplex connection #5537

Merged
merged 1 commit into from
Oct 11, 2023

Conversation

KyleFromKitware
Copy link
Contributor

SSHTransfer.Shutdown() was attempting to first shut down the first connection it created (which happened to be the master connection), but this deadlocked because the master connection was waiting for the extra connections to shut down. Designate the first connection as the master connection, make sure that it truly is the master connection, and shut it down after shutting down all extra connections.

Issue: #5535

@KyleFromKitware KyleFromKitware requested a review from a team as a code owner October 4, 2023 14:58
@KyleFromKitware KyleFromKitware force-pushed the fix-ssh-multiplex-deadlock branch 7 times, most recently from 16fd24b to 418c0f4 Compare October 4, 2023 19:40
Copy link
Member

@bk2204 bk2204 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch. This looks like it's going in the right direction, but I have a few questions.

ssh/ssh.go Show resolved Hide resolved
ssh/ssh.go Outdated Show resolved Hide resolved
@KyleFromKitware KyleFromKitware force-pushed the fix-ssh-multiplex-deadlock branch 4 times, most recently from 5e45821 to 6d3fbe9 Compare October 6, 2023 18:59
@KyleFromKitware
Copy link
Contributor Author

One of the macOS tests failed. It looks unrelated to my changes. My guess is it's a spurious failure. Could you please run it again?

@KyleFromKitware KyleFromKitware force-pushed the fix-ssh-multiplex-deadlock branch from 6d3fbe9 to ce9ac13 Compare October 9, 2023 14:29
SSHTransfer.Shutdown() was attempting to first shut down the first
connection it created (which happened to be the master connection),
but this deadlocked because the master connection was waiting for
the extra connections to shut down. Designate the first connection
as the master connection, make sure that it truly is the master
connection, and shut it down after shutting down all extra
connections.

Issue: git-lfs#5535
@KyleFromKitware KyleFromKitware force-pushed the fix-ssh-multiplex-deadlock branch from ce9ac13 to 448b0c4 Compare October 9, 2023 18:24
@bk2204 bk2204 merged commit 63d04e2 into git-lfs:main Oct 11, 2023
10 checks passed
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Oct 22, 2024
In commit b44cbe4 of PR git-lfs#5136 the
"multiplex" argument was added to the FormatArgs() function in our
ssh/ssh.go source file.  Later, the "controlPath" argument was also
added, in commit 448b0c4 of PR git-lfs#5537.

Neither of these arguments is used in the function's body, though,
so we can remove them now.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Oct 22, 2024
In commit 448b0c4 of PR git-lfs#5537 our
lfs-ssh-echo test helper utility was updated to examine the value of
any provided ControlMaster command-line argument and simulate the action
of the OpenSSH program when it is passed a ControlMaster argument with
a value of either "yes" or "no".  (The other values OpenSSH supports
for its ControlMaster option are not accepted by our lfs-ssh-echo
utility at the moment.)

When a ControlMaster command-line argument is found, and has the value
"yes", the lfs-ssh-echo program attempts to create a temporary file at
the location given by the ControlPath command-line argument, which must
also be provided.  If the file already exists, or some other error
occurs while creating it, the program halts with a non-zero exit code.

If the ControlMaster argument's value is "no", and the ControlPath
argument is also defined, the program attempts to open the file at
the given path, and exits with a non-zero code if the file does not
already exist or some other error occurs.

This logic is designed to emulate the behaviour of OpenSSH, which
supports the use of ControlMaster and ControlPath arguments to multiplex
connections over a common control socket.  When the ControlMaster argument
is "yes", OpenSSH creates a socket, which may then be shared by other
invocations of OpenSSH by setting ControlMaster to "no" and passing
the socket's path in the ControlPath argument.

Our lfs-ssh-echo program, however, creates its temporary file with
no defined file permissions (i.e., a file mode argument of zero).
This results in a file which can not be opened by any other instances
of the lfs-ssh-echo program.  At the moment this does not cause any
problems with our test suite, because we have no tests which exercise
the SSH version of the Git LFS object transfer protocol with more
than a single object, so multiple SSH connections are not started.

We would like to create such tests, though, to help diagnose problems
such as those described in git-lfs#5880.  Therefore we change our lfs-ssh-echo
program to use the conventional file mode of 0666 when creating its
temporary file, which will allow subsequent invocations of the problem
with ControlMaster set to "no" to also open the same file.

In addition, we alter the way we handle errors when the file can not be
created or opened, so that we output distinct error messages in the
cases where the file already exists or does not exist, respectively,
as well as in the cases where some other type of error occurs.
Previously, we assumed any error must be due to a file either existing
or not existing, and did not report the actual error message, so problems
such as those caused by missing file permissions were obscured.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Oct 23, 2024
In commit b44cbe4 of PR git-lfs#5136 the
"multiplex" argument was added to the FormatArgs() function in our
ssh/ssh.go source file.  Later, the "controlPath" argument was also
added, in commit 448b0c4 of PR git-lfs#5537.

Neither of these arguments is used in the function's body, though,
so we can remove them now.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Oct 23, 2024
In commit 448b0c4 of PR git-lfs#5537 our
lfs-ssh-echo test helper utility was updated to examine the value of
any provided ControlMaster command-line argument and simulate the action
of the OpenSSH program when it is passed a ControlMaster argument with
a value of either "yes" or "no".  (The other values OpenSSH supports
for its ControlMaster option are not accepted by our lfs-ssh-echo
utility at the moment.)

When a ControlMaster command-line argument is found, and has the value
"yes", the lfs-ssh-echo program attempts to create a temporary file at
the location given by the ControlPath command-line argument, which must
also be provided.  If the file already exists, or some other error
occurs while creating it, the program halts with a non-zero exit code.

If the ControlMaster argument's value is "no", and the ControlPath
argument is also defined, the program attempts to open the file at
the given path, and exits with a non-zero code if the file does not
already exist or some other error occurs.

This logic is designed to emulate the behaviour of OpenSSH, which
supports the use of ControlMaster and ControlPath arguments to multiplex
connections over a common control socket.  When the ControlMaster argument
is "yes", OpenSSH creates a socket, which may then be shared by other
invocations of OpenSSH by setting ControlMaster to "no" and passing
the socket's path in the ControlPath argument.

Our lfs-ssh-echo program, however, creates its temporary file with
no defined file permissions (i.e., a file mode argument of zero).
This results in a file which can not be opened by any other instances
of the lfs-ssh-echo program.  At the moment this does not cause any
problems with our test suite, because we have no tests which exercise
the SSH version of the Git LFS object transfer protocol with more
than a single object, so multiple SSH connections are not started.

We would like to create such tests, though, to help diagnose problems
such as those described in git-lfs#5880.  Therefore we change our lfs-ssh-echo
program to use the conventional file mode of 0666 when creating its
temporary file, which will allow subsequent invocations of the problem
with ControlMaster set to "no" to also open the same file.

In addition, we alter the way we handle errors when the file can not be
created or opened, so that we output distinct error messages in the
cases where the file already exists or does not exist, respectively,
as well as in the cases where some other type of error occurs.
Previously, we assumed any error must be due to a file either existing
or not existing, and did not report the actual error message, so problems
such as those caused by missing file permissions were obscured.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Oct 29, 2024
In commit b44cbe4 of PR git-lfs#5136 the
"multiplex" argument was added to the FormatArgs() function in our
ssh/ssh.go source file.  Later, the "controlPath" argument was also
added, in commit 448b0c4 of PR git-lfs#5537.

Neither of these arguments is used in the function's body, though,
so we can remove them now.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Oct 29, 2024
In commit 448b0c4 of PR git-lfs#5537 our
lfs-ssh-echo test helper utility was updated to examine the value of
any provided ControlMaster command-line argument and simulate the action
of the OpenSSH program when it is passed a ControlMaster argument with
a value of either "yes" or "no".  (The other values OpenSSH supports
for its ControlMaster option are not accepted by our lfs-ssh-echo
utility at the moment.)

When a ControlMaster command-line argument is found, and has the value
"yes", the lfs-ssh-echo program attempts to create a temporary file at
the location given by the ControlPath command-line argument, which must
also be provided.  If the file already exists, or some other error
occurs while creating it, the program halts with a non-zero exit code.

If the ControlMaster argument's value is "no", and the ControlPath
argument is also defined, the program attempts to open the file at
the given path, and exits with a non-zero code if the file does not
already exist or some other error occurs.

This logic is designed to emulate the behaviour of OpenSSH, which
supports the use of ControlMaster and ControlPath arguments to multiplex
connections over a common control socket.  When the ControlMaster argument
is "yes", OpenSSH creates a socket, which may then be shared by other
invocations of OpenSSH by setting ControlMaster to "no" and passing
the socket's path in the ControlPath argument.

Our lfs-ssh-echo program, however, creates its temporary file with
no defined file permissions (i.e., a file mode argument of zero).
This results in a file which can not be opened by any other instances
of the lfs-ssh-echo program.  At the moment this does not cause any
problems with our test suite, because we have no tests which exercise
the SSH version of the Git LFS object transfer protocol with more
than a single object, so multiple SSH connections are not started.

We would like to create such tests, though, to help diagnose problems
such as those described in git-lfs#5880.  Therefore we change our lfs-ssh-echo
program to use the conventional file mode of 0666 when creating its
temporary file, which will allow subsequent invocations of the problem
with ControlMaster set to "no" to also open the same file.

In addition, we alter the way we handle errors when the file can not be
created or opened, so that we output distinct error messages in the
cases where the file already exists or does not exist, respectively,
as well as in the cases where some other type of error occurs.
Previously, we assumed any error must be due to a file either existing
or not existing, and did not report the actual error message, so problems
such as those caused by missing file permissions were obscured.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Nov 1, 2024
In commit b44cbe4 of PR git-lfs#5136 the
"multiplex" argument was added to the FormatArgs() function in our
ssh/ssh.go source file.  Later, the "controlPath" argument was also
added, in commit 448b0c4 of PR git-lfs#5537.

Neither of these arguments is used in the function's body, though,
so we can remove them now.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Nov 1, 2024
In commit 448b0c4 of PR git-lfs#5537 our
lfs-ssh-echo test helper utility was updated to examine the value of
any provided ControlMaster command-line argument and simulate the action
of the OpenSSH program when it is passed a ControlMaster argument with
a value of either "yes" or "no".  (The other values OpenSSH supports
for its ControlMaster option are not accepted by our lfs-ssh-echo
utility at the moment.)

When a ControlMaster command-line argument is found, and has the value
"yes", the lfs-ssh-echo program attempts to create a temporary file at
the location given by the ControlPath command-line argument, which must
also be provided.  If the file already exists, or some other error
occurs while creating it, the program halts with a non-zero exit code.

If the ControlMaster argument's value is "no", and the ControlPath
argument is also defined, the program attempts to open the file at
the given path, and exits with a non-zero code if the file does not
already exist or some other error occurs.

This logic is designed to emulate the behaviour of OpenSSH, which
supports the use of ControlMaster and ControlPath arguments to multiplex
connections over a common control socket.  When the ControlMaster argument
is "yes", OpenSSH creates a socket, which may then be shared by other
invocations of OpenSSH by setting ControlMaster to "no" and passing
the socket's path in the ControlPath argument.

Our lfs-ssh-echo program, however, creates its temporary file with
no defined file permissions (i.e., a file mode argument of zero).
This results in a file which can not be opened by any other instances
of the lfs-ssh-echo program.  At the moment this does not cause any
problems with our test suite, because we have no tests which exercise
the SSH version of the Git LFS object transfer protocol with more
than a single object, so multiple SSH connections are not started.

We would like to create such tests, though, to help diagnose problems
such as those described in git-lfs#5880.  Therefore we change our lfs-ssh-echo
program to use the conventional file mode of 0666 when creating its
temporary file, which will allow subsequent invocations of the problem
with ControlMaster set to "no" to also open the same file.

In addition, we alter the way we handle errors when the file can not be
created or opened, so that we output distinct error messages in the
cases where the file already exists or does not exist, respectively,
as well as in the cases where some other type of error occurs.
Previously, we assumed any error must be due to a file either existing
or not existing, and did not report the actual error message, so problems
such as those caused by missing file permissions were obscured.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Nov 2, 2024
In commit 448b0c4 of PR git-lfs#5537 our
lfs-ssh-echo test helper utility was updated to examine the value of
any provided ControlMaster command-line argument and simulate the action
of the OpenSSH program when it is passed a ControlMaster argument with
a value of either "yes" or "no".  (The other values OpenSSH supports
for its this option are not accepted by our lfs-ssh-echo utility at
the moment.)

When a ControlMaster command-line argument is found, and has the value
"yes", the lfs-ssh-echo program attempts to create a temporary file at
the location given by the ControlPath command-line argument, which must
also be provided.  If the file already exists, or some other error
occurs while creating it, the program halts with a non-zero exit code.

If the ControlMaster argument's value is "no", and the ControlPath
argument is also defined, the program attempts to open the file at
the given path, and exits with a non-zero code if the file does not
already exist or some other error occurs.

This logic is designed to emulate the behaviour of OpenSSH, which
supports the use of these arguments to multiplex connections over a
common control socket.  When the ControlMaster argument is "yes",
OpenSSH creates a socket, which may then be shared by other invocations
of OpenSSH by setting the argument to "no" and passing the socket's
path in the ControlPath argument.

Our lfs-ssh-echo program, however, creates its temporary file with
no defined file permissions (i.e., a file mode argument of zero).
This results in a file which can not be opened by any other instances
of the lfs-ssh-echo program.  At the moment this does not cause any
problems with our test suite, because we have no tests which exercise
the SSH version of the Git LFS object transfer protocol with more
than a single object, so multiple SSH connections are not started.

We would like to create such tests, though, to help diagnose problems
such as those described in git-lfs#5880.  Therefore we change our lfs-ssh-echo
program to use the conventional file mode of 0666 when creating its
temporary file, which will allow subsequent invocations of the problem
with ControlMaster set to "no" to also open the same file.

In addition, we alter the way we handle errors when the file can not be
created or opened, so that we output distinct error messages in the
cases where the file already exists or does not exist, respectively,
as well as in the cases where some other type of error occurs.
Previously, we assumed any error must be due to a file either existing
or not existing, and did not report the actual error message, so problems
such as those caused by missing file permissions were obscured.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Nov 3, 2024
In commit b44cbe4 of PR git-lfs#5136 the
"multiplex" argument was added to the FormatArgs() function in our
ssh/ssh.go source file.  Later, the "controlPath" argument was also
added, in commit 448b0c4 of PR git-lfs#5537.

Neither of these arguments is used in the function's body, though,
so we can remove them now.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Nov 3, 2024
In commit 448b0c4 of PR git-lfs#5537 our
lfs-ssh-echo test helper utility was updated to examine the value of
any provided ControlMaster command-line argument and simulate the action
of the OpenSSH program when it is passed a ControlMaster argument with
a value of either "yes" or "no".  (The other values OpenSSH supports
for its this option are not accepted by our lfs-ssh-echo utility at
the moment.)

When a ControlMaster command-line argument is found, and has the value
"yes", the lfs-ssh-echo program attempts to create a temporary file at
the location given by the ControlPath command-line argument, which must
also be provided.  If the file already exists, or some other error
occurs while creating it, the program halts with a non-zero exit code.

If the ControlMaster argument's value is "no", and the ControlPath
argument is also defined, the program attempts to open the file at
the given path, and exits with a non-zero code if the file does not
already exist or some other error occurs.

This logic is designed to emulate the behaviour of OpenSSH, which
supports the use of these arguments to multiplex connections over a
common control socket.  When the ControlMaster argument is "yes",
OpenSSH creates a socket, which may then be shared by other invocations
of OpenSSH by setting the argument to "no" and passing the socket's
path in the ControlPath argument.

Our lfs-ssh-echo program, however, creates its temporary file with
no defined file permissions (i.e., a file mode argument of zero).
This results in a file which can not be opened by any other instances
of the lfs-ssh-echo program.  At the moment this does not cause any
problems with our test suite, because we have no tests which exercise
the SSH version of the Git LFS object transfer protocol with more
than a single object, so multiple SSH connections are not started.

We would like to create such tests, though, to help diagnose problems
such as those described in git-lfs#5880.  Therefore we change our lfs-ssh-echo
program to use the conventional file mode of 0666 when creating its
temporary file, which will allow subsequent invocations of the problem
with ControlMaster set to "no" to also open the same file.

In addition, we alter the way we handle errors when the file can not be
created or opened, so that we output distinct error messages in the
cases where the file already exists or does not exist, respectively,
as well as in the cases where some other type of error occurs.
Previously, we assumed any error must be due to a file either existing
or not existing, and did not report the actual error message, so problems
such as those caused by missing file permissions were obscured.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Nov 4, 2024
In commit 448b0c4 of PR git-lfs#5537 our
lfs-ssh-echo test helper utility was updated to examine the value of
any provided ControlMaster command-line argument and simulate the action
of the OpenSSH program when it is passed a ControlMaster argument with
a value of either "yes" or "no".  (The other values OpenSSH supports
for this option are not accepted by our lfs-ssh-echo utility at the
moment.)

When a ControlMaster command-line argument is found, and has the value
"yes", the lfs-ssh-echo program attempts to create a temporary file at
the location given by the ControlPath command-line argument, which must
also be provided.  If the file already exists, or some other error
occurs while creating it, the program halts with a non-zero exit code.

If the ControlMaster argument's value is "no", and the ControlPath
argument is also defined, the program attempts to open the file at
the given path, and exits with a non-zero code if the file does not
already exist or some other error occurs.

This logic is designed to emulate the behaviour of OpenSSH, which
supports the use of these arguments to multiplex SSH sessions over
a common connection using a control socket.  When the ControlMaster
argument is "yes", OpenSSH creates a socket, which may then be shared
with other invocations of OpenSSH by setting the ControlMaster argument
to "no" and passing the socket's path in the ControlPath argument.

Our lfs-ssh-echo program, however, creates its temporary file with
no defined file permissions (i.e., with a file mode argument of zero).
This results in a file which can not be opened by any other instances
of the lfs-ssh-echo program.  At the moment this does not cause any
problems with our test suite, because we have no tests which exercise
the SSH version of the Git LFS object transfer protocol with more
than a single object, so multiple SSH sessions with the same
ControlPath argument are never started.

We would like to create additional tests which push and fetch multiple
Git LFS objects over SSH, though, to help diagnose issues such as
those described in git-lfs#5880.  Therefore we change our lfs-ssh-echo
program to use the conventional file mode of 0666 when creating its
temporary file, which will allow subsequent invocations of the program
with the ControlMaster argument set to "no" to also open the same file.

As well as fixing this bug, we also alter the way the lfs-ssh-echo
program handles errors when the file specified by the ControlPath
argument can not be created or opened.  We now output distinct error
messages in the cases where the file already exists or does not exist,
respectively, as well as in the cases where some other type of error
occurs.  Previously, we assumed any error must be due to a file either
existing or not existing, and did not report the actual error message,
so problems such as those caused by missing file permissions were
obscured.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Nov 4, 2024
In commit b44cbe4 of PR git-lfs#5136 the
"multiplex" argument was added to the FormatArgs() function in our
ssh/ssh.go source file.  Later, the "controlPath" argument was also
added, in commit 448b0c4 of PR git-lfs#5537.

Neither of these arguments is used in the function's body, though,
so we can remove them now.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Nov 6, 2024
Since commit 448b0c4 in PR git-lfs#5537
the GetExeAndArgs() function in our "ssh" package sets the default
value of our "lfs.ssh.autoMultiplex" configuration option to "false"
when running on Windows, and "true" otherwise.

This choice was made because the SSH clients available on Windows
may not support multiplexing SSH sessions over a single connection,
as OpenSSH does with its ControlMaster and ControlPath options.

Since some of these SSH clients may fail if they are passed the
ControlMaster and ControlPath options, we require Windows users who
want to use SSH multiplexing to explicitly enable it by setting the
"lfs.ssh.autoMultiplex" option to "true".  See also the discussion in:

  git-lfs#5537 (comment)

However, our git-lfs-config(5) manual page was not updated in PR git-lfs#5537
to reflect the change in the default value of the "lfs.ssh.autoMultiplex"
option on Windows, so we update it now.

Note that users with the Git for Windows project installed will
typically have a version of OpenSSH available which supports the
ControlMaster option.  However, the OpenSSH for Windows client may
not support multiplexing, as noted in PowerShell/Win32-OpenSSH#1328.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants