-
Notifications
You must be signed in to change notification settings - Fork 39.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Start recording cloud provider metrics for AWS #43477
Start recording cloud provider metrics for AWS #43477
Conversation
/assign @justinsb |
144ebe1
to
4b58f71
Compare
LGTM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only a small one, after that looks good
Name: "aws_attach_volume_duration_seconds", | ||
Help: "Latency of aws attach call", | ||
}, | ||
[]string{"kube_namespace"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kube_namespace
-> namespace
4b58f71
to
5220193
Compare
@brancz fixed. ptal |
/lgtm |
Looks good from an AWS perspective. My two concerns:
|
We don't account for exponential backoff here. There is separate feature request @rootfs is working on which considers total attach time, which possibly will include any exponential backoff incurred. |
requestTime := time.Now() | ||
resp, err := s.ec2.AttachVolume(request) | ||
timeTaken := time.Since(requestTime).Seconds() | ||
recordAwsMetric("aws_attach_volume", timeTaken, "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will record duration for successful and failed calls in the same metric. I'd prefer only recording them for succesfull requests, but I know some people have different preferences.
@@ -626,16 +635,25 @@ func (s *awsSdkEC2) DescribeVolumes(request *ec2.DescribeVolumesInput) ([]*ec2.V | |||
} | |||
request.NextToken = nextToken | |||
} | |||
|
|||
timeTaken := time.Since(requestTime).Seconds() | |||
recordAwsMetric("aws_describe_volume", timeTaken, "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here it's only recording in the success case. I think it should be consist either way.
@justinsb Regarding the naming, from a prometheus perspective it doesn't matter, it's the same number of timeseries it needs to handle. I personally slightly prefer having one metric as you suggested, mainly for easier aggregation. |
So I read our guidelines and the discussion on the design and it sounds like everything is OK @gnufied. While it would be nice to see the total requests, I think that might also be optimizing for our purposes as developers rather than end-users. I expect in most cases if someone is hitting a rate limit it would be pretty easy to send a canned query and observe that one line is much higher than the others. Splitting out success & failures sounds like we can either do it with more dimensions or maybe more names, but that we can do that later. So I'm going to approve so we can start collecting data :-) /approve |
@justinsb my plan is still to amend the metric names to have action as dimension. I opened the design PR for same - kubernetes/community#507 In the meanwhile @bowei has gone ahead and amended GCE metrics - #44510 I like the idea of recording errors and successes separately. I think we can do that here too. But there is too much disagreement on what dimensions to report. @bowei for example have dropped What I would like to do is have some consensus about this and then we can fix both GCE and AWS metrics to be on similar lines. |
5220193
to
181b23b
Compare
This is probably a good idea, we should adapt here if we decide to change it in the proposal. |
Lets start recording storage metrics for AWS.
181b23b
to
f2aa330
Compare
@brancz fixed the metric names as per suggestion. |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: brancz, gnufied, justinsb
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
Automatic merge from submit-queue |
What this PR does / why we need it:
This PR implements support for emitting metrics from AWS about storage operations.
Which issue this PR fixes
Fixes kubernetes/enhancements#182
Release note: