Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Start recording cloud provider metrics for AWS #43477

Merged

Conversation

gnufied
Copy link
Member

@gnufied gnufied commented Mar 21, 2017

What this PR does / why we need it:

This PR implements support for emitting metrics from AWS about storage operations.

Which issue this PR fixes

Fixes kubernetes/enhancements#182

Release note:

Add support for emitting metrics from AWS cloudprovider about storage operations.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 21, 2017
@k8s-reviewable
Copy link

This change is Reviewable

@k8s-github-robot k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note-label-needed labels Mar 21, 2017
@gnufied
Copy link
Member Author

gnufied commented Mar 21, 2017

/assign @justinsb

@gnufied
Copy link
Member Author

gnufied commented Mar 28, 2017

cc @justinsb @jingxu97 can we review this? the other PR for gce is almost ready for merge too.

@k8s-github-robot k8s-github-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-label-needed labels Mar 30, 2017
@gnufied
Copy link
Member Author

gnufied commented Mar 30, 2017

@piosz @brancz can you also review this one please, while we are at it?

/assign @brancz

@rootfs
Copy link
Contributor

rootfs commented Mar 30, 2017

LGTM

Copy link
Member

@brancz brancz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only a small one, after that looks good

Name: "aws_attach_volume_duration_seconds",
Help: "Latency of aws attach call",
},
[]string{"kube_namespace"},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kube_namespace -> namespace

@gnufied
Copy link
Member Author

gnufied commented Mar 31, 2017

@brancz fixed. ptal

@brancz
Copy link
Member

brancz commented Mar 31, 2017

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 31, 2017
@justinsb
Copy link
Member

Looks good from an AWS perspective.

My two concerns:

  1. naming (asking in sig-instrumentation)
  2. How do we account for exponential backoff

@gnufied
Copy link
Member Author

gnufied commented Mar 31, 2017

We don't account for exponential backoff here. There is separate feature request @rootfs is working on which considers total attach time, which possibly will include any exponential backoff incurred.

requestTime := time.Now()
resp, err := s.ec2.AttachVolume(request)
timeTaken := time.Since(requestTime).Seconds()
recordAwsMetric("aws_attach_volume", timeTaken, "")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will record duration for successful and failed calls in the same metric. I'd prefer only recording them for succesfull requests, but I know some people have different preferences.

@@ -626,16 +635,25 @@ func (s *awsSdkEC2) DescribeVolumes(request *ec2.DescribeVolumesInput) ([]*ec2.V
}
request.NextToken = nextToken
}

timeTaken := time.Since(requestTime).Seconds()
recordAwsMetric("aws_describe_volume", timeTaken, "")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here it's only recording in the success case. I think it should be consist either way.

@discordianfish
Copy link
Contributor

@justinsb Regarding the naming, from a prometheus perspective it doesn't matter, it's the same number of timeseries it needs to handle. I personally slightly prefer having one metric as you suggested, mainly for easier aggregation.

@justinsb
Copy link
Member

So I read our guidelines and the discussion on the design and it sounds like everything is OK @gnufied. While it would be nice to see the total requests, I think that might also be optimizing for our purposes as developers rather than end-users. I expect in most cases if someone is hitting a rate limit it would be pretty easy to send a canned query and observe that one line is much higher than the others.

Splitting out success & failures sounds like we can either do it with more dimensions or maybe more names, but that we can do that later.

So I'm going to approve so we can start collecting data :-)

/approve

@k8s-github-robot k8s-github-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Apr 19, 2017
@gnufied
Copy link
Member Author

gnufied commented Apr 21, 2017

@justinsb my plan is still to amend the metric names to have action as dimension. I opened the design PR for same - kubernetes/community#507

In the meanwhile @bowei has gone ahead and amended GCE metrics - #44510

I like the idea of recording errors and successes separately. I think we can do that here too. But there is too much disagreement on what dimensions to report. @bowei for example have dropped namespace in his GCE PR and replaced them with zone and node dimensions.

What I would like to do is have some consensus about this and then we can fix both GCE and AWS metrics to be on similar lines.

@k8s-github-robot k8s-github-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 26, 2017
@brancz
Copy link
Member

brancz commented Apr 27, 2017

This is probably a good idea, we should adapt here if we decide to change it in the proposal.

Lets start recording storage metrics for AWS.
@gnufied
Copy link
Member Author

gnufied commented Apr 28, 2017

@brancz fixed the metric names as per suggestion.

@brancz
Copy link
Member

brancz commented Apr 28, 2017

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 28, 2017
@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: brancz, gnufied, justinsb

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot
Copy link

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit 9afeabb into kubernetes:master Apr 28, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cloudprovider metrics for storage
9 participants