-
Notifications
You must be signed in to change notification settings - Fork 39.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Azure plugin for client auth #43987
Azure plugin for client auth #43987
Conversation
Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please follow instructions at https://github.com/kubernetes/kubernetes/wiki/CLA-FAQ to sign the CLA. Once you've signed, please reply here (e.g. "I signed it!") and we'll verify. Thanks.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
Hi @cosmincojocar. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
cc @kubernetes/sig-auth-pr-reviews @colemickens |
"sync" | ||
"time" | ||
|
||
"github.com/Azure/go-autorest/autorest" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a net/http wrapper. Kubernetes already has a ton of these in tree. Is this required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is required. This is the base of the azure sdk, which contains the authentication package which is used by this plugin.
|
||
* Replace `USER_NAME` with your user name. | ||
|
||
### Device Flow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there some docs on what the technical details of this flow? That'd be very helpful when reviewing this PR. e.g. why a user would prefer this over the OpenID Connect client auth plugin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately I couldn't find some reasonable documentation for this flow. The big difference between client credentials and device code flows is that the JWT token acquired with the former one contains user information. It is a user token not a client token. The claims form this token can be used later for role-based access control.
This flow is suited for CLI tools. e.g. Azure CLI makes use of it in order to acquire user tokens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be roughly this extension: https://tools.ietf.org/html/draft-ietf-oauth-device-flow-05
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyway, doesn't Google do something similar with a gcloud
specific plugin?
In Azure's case, the actual id_token
s produced by AADv1 (at least after refresh) are not signed and thus not usable for this scenario. Hence the use of the access_token presumably - but again, that's the same thing the Google plugin does when it reads the access token from the json response from gcloud config config-helper
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Google's GCP plugin signs a JWT but the format is somewhat propitiatory to GCP. They use webhook auth to process the token on their end, they don't use the OpenID Connect plugin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I forgot I learned that a couple weeks ago...
``` | ||
kubeclt get pods | ||
|
||
To sign in, use a web browser to open the page https://aka.ms/devicelogin and enter the code DEC7D48GA to authenticate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is DEC7D48GA
just an example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fake code number.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this prompt triggered by kubectl directly? What happens if it's not in an interactive session? I'd expect these kind of flows to be handled by az login
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two scenarios:
-
The user logs in with
az login
which stores the tokens in~/.azure/accessToken.json
file. The azure plugin will read the tokens from this file and refresh them if needed using the client_id also from the same file. -
If the plugin does not find any
az
file available. It will start the device flow by prompting to the use the device code retrieved from AAD and waiting for use to login. The user will login with AAD in a browser by providing the device code. Finally the plugin will receive the tokens directly from AAD.
Note that in both cases the tokens are stored in the kubernetes configuration. Next time when the kubectl is executed, it will read the token from kubernetes config first.
I signed it! |
This requires the user to auth every hour, right? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few typos + a question about using accessToken for AuthN
}, nil | ||
} | ||
|
||
func (ts *azureTokenSourceFromFile) isTeantAuthroity(authority string) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: Teant
|
||
func newAzureTokenSourceDeviceCode(clientID string, tenantID string, audience string) tokenSource { | ||
return &azureTokenSourceDeviceCode{ | ||
clinetID: clientID, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: clinet
2. Configure the `apiserver` to use the Azure Active Directory as an OIDC provider with following options | ||
|
||
``` | ||
--oidc-client-id="https://management.core.windows.net/" \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this also be parameterized as "APPLICATION_ID" in case the user chooses to use a different client_id below? Otherwise how does the token validation work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently the apiserver expects the client id to be the same like the audience. Actually apiserver which plays the role of a resource server does not need a valid client id. It only has to download the public keys of the identity provider in order to verify the signature of the token.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does azure embed this as the audience? Isn't that claim supposed to be set as the client-id?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The audience should be different than the client ID. The audience is typically a URI of a rest-endpoint or a UUID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the clean fix is to have two options, one for client id and other one for audience. The appiserver
should verify the audience from the token against the whitelisted value provided in the option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The audience should be different than the client ID.
Why do you say this? That's not my understanding or expectation. With real id_token, the audience is the client_id. See: https://github.com/colemickens/azure-ad-k8s-oidc-example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(and even though that example is AADv2, the same is true of id_tokens issued to AADv1). The example works as normal, with normal client_id (not the resource
value), etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's take an example. I have a resource application registered with AAD for apiserver which has the resourceAppId fdfb78c2-788c-4ab6-bbed-cfd89cd21025
. This will be provided in the --oidc-client-id
. Now if I need to acquire a token for this application, another client application must be registered wiht appId 319e3b43-c214-43e8-8d8d-8d33b8e47902
which has delegated permissions to the apiserver application.
I can request a token with the following values:
- applicatoinId:
319e3b43-c214-43e8-8d8d-8d33b8e47902
- tenantId:
- resource:
fdfb78c2-788c-4ab6-bbed-cfd89cd21025
The returned token will have these claims (many claims ignored for brevity):
{
"aud": "spn:fdfb78c2-788c-4ab6-bbed-cfd89cd21025",
"iss": "https://sts.windows.net/26f087ba-1a21-11e7-93ae-92361f002671/",
"appid": "319e3b43-c214-43e8-8d8d-8d33b8e47902"
}
azure cli does the same but the audience is an URL:
{
"aud": "https://management.core.windows.net/",
"iss": "https://sts.windows.net/72f988bf-86f1-41af-91ab-2d7cd011db47/",
"appid": "04b07795-8ddb-461a-bbee-02f9e1bf7b46"
}
As you see from the first example the audience can be almost the same like the client id if one uses the UUID of the resource when acquiring the token.
Note that any owner of an application can set delegated permissions
to any public resource application
registered in the same AAD. The conclusion is that the RBAC policy should be based on the user and group claims. The audience does not have strong security guarantees.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The audience should be different than the client ID.
No, this is totally against the OpenID Connect spec[0]
[0] https://openid.net/specs/openid-connect-core-1_0.html#IDTokenValidation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get you point a client needs to request a token for itself. In that case AAD will return:
{
"aud": "spn:319e3b43-c214-43e8-8d8d-8d33b8e47902",
"iss": "https://sts.windows.net/72f988bf-86f1-41af-91ab-2d7cd011db47/",
"appid": "319e3b43-c214-43e8-8d8d-8d33b8e47902",
}
Are you fine if the oidc-id is configured to spn:319e3b43-c214-43e8-8d8d-8d33b8e479021
?
@ericchiang @colemickens
I guess, the azure cli integration has to be removed because it adhere to OAuth2 actors model (http://nordicapis.com/api-security-oauth-openid-connect-depth/) requesting always a token for a resource server not for itself.
I can keep the device code and also remove the audience
configuration.
kubectl config set-credentials "USER_NAME" --auth-provider=azure \ | ||
--auth-provider-arg=client-id=APPLICATION_ID \ | ||
--auth-provider-arg=tenant-id=TENANT_ID \ | ||
--auth-provider-arg=audiance=https://management.core.windows.net/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: audiance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@colemickens Thank you for review! I fixed the typos.
req2.Header[k] = append([]string(nil), s...) | ||
} | ||
|
||
req2.Header.Set("Authorization", fmt.Sprintf("Bearer %s", token.token.AccessToken)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I right in understanding this is using an accessToken
as an OIDC token?
My understanding is that this was discouraged: https://oauth.net/articles/authentication/
But I think the Google auth works this same way, by taking the CLI access token and using it as an OIDC token...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc: @ericchiang can you share any thoughts on this too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In AAD the access token is equivalent with id token. It is a JWT token with all claims of a user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I'm still pretty confused as to how this conforms to OIDC, or is secure in its usage. To be clear, as far as I can tell, it seems to be the same pattern that Google uses... but it seems invalid with my understanding of OIDC.
Consider: If the client_id
is not validated, and any old access token can be used, it means that I can take accessTokens granted to me via any application and then use them to talk to your apiserver? That seems... potentially very bad. Or am I misunderstanding? Doesn't this mean my personal application that lets users do OAuth flow to let me create some things in their subscription... can't I now turn around and use those scoped access tokens to get access to some disparate Kubernetes clusters that just happen to be configured for the same tenant without the user's consent?
This is sort of the scenario discussed in the link above that discourages the use of accessTokens for AuthN...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider: If the client_id is not validated, and any old access token can be used, it means that I can take accessTokens granted to me via any application and then use them to talk to your apiserver? That seems... potentially very bad.
Yes, it is extremely weird that the oidc-client-id is not set to a unique client_id.
Why can't we just use the id_token directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AADv1 doesn't sign id_token
when they're refreshed - meaning user would have to do device auth every hour - regardless of if kubectl
was retrieving directly, or if we forced user back to az
.
AADv2 lacks this entire device flow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is right the id_token
issued by AAD cannot be used to verity the access, but as I said the access token contains the same claims.
It is a bit strange how the appiserver
validates the audience
. The identity provider will request consent to a user for a specific audience when issuing the token. In this case should be a dedicated audience for kubernetes apiserver
which should be registered in the identity provider as resource server.
The issue is in CorsOS's go-oidc
package which is used by the oidc plugin of apiserver. It property verifies the signature, expiration but it does the following assumption on audience.
https://github.com/coreos/go-oidc/blob/master/oidc/verification.go
Code snippet from VerifyClaims
function
// aud REQUIRED. Audience(s) that this ID Token is intended for.
// It MUST contain the OAuth 2.0 client_id of the Relying Party as an audience value.
// It MAY also contain identifiers for other audiences. In the general case, the aud
// value is an array of case sensitive strings. In the common special case when there
// is one audience, the aud value MAY be a single case sensitive string.
if aud, ok, err := claims.StringClaim("aud"); err == nil && ok {
if aud != clientID {
return fmt.Errorf("invalid claims, 'aud' claim and 'client_id' do not match, aud=%s, client_id=%s", aud, clientID)
}
} else if aud, ok, err := claims.StringsClaim("aud"); err == nil && ok {
if !containsString(clientID, aud) {
return fmt.Errorf("invalid claims, cannot find 'client_id' in 'aud' claim, aud=%v, client_id=%s", aud, clientID)
}
} else {
return errors.New("invalid claim value: 'aud' is required, and should be either string or string array")
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cosmincojocar the audience shouldn't be the API server, it should be whatever the client_id of the client that initiated the login flow to request an ID token. That's specifically called out in the docs. The API server is NOT a client. It's set up to only allow ID tokens issues for a single client.
As for that snippet, it's just implementing the spec[0]
- The Client MUST validate that the aud (audience) Claim contains its client_id value registered at the Issuer identified by the iss (issuer) Claim as an audience. The aud (audience) Claim MAY contain an array with more than one element. The ID Token MUST be rejected if the ID Token does not list the Client as a valid audience, or if it contains additional audiences not trusted by the Client.
The issue seems to be the aud
embedded in your access token, which is https://management.core.windows.net/
rather then your client_id.
[0] https://openid.net/specs/openid-connect-core-1_0.html#IDTokenValidation
@colemickens The user will authenticate only once, then the plugin will use underneath the refresh token every hour to fetch a new access token. I would say from usability perspective this is quite nice. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change looks fine, but I don't understand if OIDC is being used appropriately in K8s right now, I've left a comment about it. Maybe @ericchiang can fix my understanding of OIDC and/or why it's okay to use accessTokens.
Please do see the note about Authority URL. ACS-Engine is deploying working clusters in GermanyCloud/ChinaCloud, so it would be good if this integration were to "just work" there as well.
|
||
const ( | ||
azureTimeFormat = "\"2006-01-02 15:04:05.99999\"" | ||
azureAuthority = "https://login.microsoftonline.com" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you not hard code this? This will fail in Germany/China clouds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have the real (full tenant) authority URL in the tokens json anyway, I believe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the hardcoded authority URL. It is read anyhow from file. Also the Azure environment is now configurable for device code flow.
|
||
* Replace `USER_NAME` with your user name. | ||
|
||
### Device Flow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyway, doesn't Google do something similar with a gcloud
specific plugin?
In Azure's case, the actual id_token
s produced by AADv1 (at least after refresh) are not signed and thus not usable for this scenario. Hence the use of the access_token presumably - but again, that's the same thing the Google plugin does when it reads the access token from the json response from gcloud config config-helper
.
req2.Header[k] = append([]string(nil), s...) | ||
} | ||
|
||
req2.Header.Set("Authorization", fmt.Sprintf("Bearer %s", token.token.AccessToken)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I'm still pretty confused as to how this conforms to OIDC, or is secure in its usage. To be clear, as far as I can tell, it seems to be the same pattern that Google uses... but it seems invalid with my understanding of OIDC.
Consider: If the client_id
is not validated, and any old access token can be used, it means that I can take accessTokens granted to me via any application and then use them to talk to your apiserver? That seems... potentially very bad. Or am I misunderstanding? Doesn't this mean my personal application that lets users do OAuth flow to let me create some things in their subscription... can't I now turn around and use those scoped access tokens to get access to some disparate Kubernetes clusters that just happen to be configured for the same tenant without the user's consent?
This is sort of the scenario discussed in the link above that discourages the use of accessTokens for AuthN...
@colemickens @cosmincojocar I've added this to the agenda for the sig-auth call tomorrow (4/5) if either of you are able to join (I know it's not a convenient time in Zürich, sorry...); might be faster to talk through how this fits into kubectl and azure auth in person. |
@colemickens Not any token will be accepted by The access can be also narrowed down by using the role based access control provided by Kubernetes. The |
Per the discussion in sig-auth this morning, there is no precedent for having the kubectl auth plugin do an interactive flow. Instead, the Azure CLI needs to have a command that will return a properly refreshed token available to use. I've filed a feature request on |
I removed the Azure CLI integration and configured the client to fetch a token for itself. The device code message is promoted only if no tokens are already present in the kubecl configuration. This is a better user experience than manually fetching the id_token and storing it in the configuration. |
My understanding of the feedback in the sig-auth call is that there should be NO interactive dialogs in the kubectl plugin - so sort of the opposite of the change that you made. The point made was that all of the actual token retrieval mechanism is delegated such that If you look at a real @ericchiang Somewhat tangentially... Is there anyway we can get more information about the internals of the service that Google uses with the webhook authenticator? How are y'all safely validating |
I don't quite get this argument interactive vs non-interactive flow. What is the advantage of delegating the authentication to an external tool (e.g. az) which is called as a process from the plugin? This tool most like will execute the same kind of flow. The error handling looks a bit fragile to me, the plugin will rely only on the exit code of the tool and probably will have to parse the stderr/stdout in order to get more details about the error. Why not using directly the Go library to achieve the same? The user of this plugin has also the choice to set manually the tokens in the configuration and in that case the interactive flow will not be executed. The confusion regarding the This plugin requests tokens from AAD v1. I guess is more the question: Do we want to support AAD v1 which is OAuth 2.0 but not OIDC compatible? The |
@cosmincojocar I hadn't seen a comment addressing client_secret auth flow. Did you want to add it in this PR? |
@ericchiang I would prefer that we close this PR since it got quite long. I will follow with another PR for client_secret auth. |
Going to wait for @colemickens' comment. Other than that lgtm. |
bump. since freeze is friday I'm going to advocate we merge as is. any objections? |
/lgtm I'm fine with merging as is and adding client_secret auth later |
/assign @lavalamp |
/lgtm |
@k8s-bot pull-kubernetes-e2e-gce-etcd3 test this |
@k8s-bot test this |
Let me figure out if we made/how we make 1.7 for this. |
Any chance to get this merged? |
@cosmincojocar This wasn't approved when the 1.7 freeze came down. I think we're out of luck. |
/approve |
/approve I'm approving this for 1.7 since it was LGTM the day before code freeze. @dchen1107 for visibility. This is low risk since it is a plugin and off by default. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: brendandburns, colemickens, cosmincojocar, ericchiang Associated issue: 29 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
@cosmincojocar: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/retest |
Automatic merge from submit-queue (batch tested with PRs 46441, 43987, 46921, 46823, 47276) |
This is an Azure Active Directory plugin for client authentification. It provides an integration with Azure CLI 2.0 login command. It can also be used standalone, in that case it will use the device code flow to acquire an access token.
More details are provided in the README.md file.
kubernetes/kubectl#29
cc @brendandburns @colemickens