Kubelet should not report multiple stats for the same container in the summary endpoint #47853
How about the approach where, in case of duplicated container names, we keep only the one with the newest startTime?
That's what I suggested -- using some heuristics to filter.
I just wanted to say that IMO a very simple heuristic is enough.
What if kubelet actually creates two containers by mistake? Shouldn't we expose their stats to raise attention? I think it's worth being more conservative and only filtering out the containers we think are terminated.
/assign
I have added a small heuristic to Heapster that prevents it from reporting metrics multiple times for the same container. @yguo0905 Should we close this issue, or do you also want to change the behavior of kubelet?
Automatic merge from submit-queue

Remove the status of the terminated containers in the summary endpoint

Ref: #47853

- When building the summary, a container is considered terminated if it has an older creation time and no instantaneous CPU or memory RSS usage.
- We remove the terminated containers from the summary by grouping the containers with the same name in the same pod, sorting each group by creation time, and skipping the oldest entries with no usage in each group.

Let me know if there's a simpler way.

**Release note**:
```
None
```
/assign @yujuhong
Fixed by #48739
Automatic merge from submit-queue

Manual cherry-pick of #50636

#50636: Bumped Heapster version to 1.4.1

```release-note
Bumped Heapster version to 1.4.1:
- gracefully handle the problem where kubelet reports duplicated stats for the same container (see #47853) on the Heapster side
- fixed bugs and improved performance in the Stackdriver Sink
```
We've observed that container cgroups are sometimes left behind uncleaned even though the container has terminated. In those cases, cadvisor continues reporting stats for the terminated container, which are then shown in kubelet's /stats/summary endpoint. Once kubelet starts a new container instance, it reports both the terminated and the new container stats simultaneously.
The duplicated entries for the same container may confuse the monitoring components and cause loss of stats.
kubelet should do its due diligence and filter out the entries for dead containers. These entries can be (mostly) identified by their startTime (earlier than the new entry's) and their usage (zero). Alternatively, kubelet could check with the container runtime whether the container still exists and filter out entries accordingly. However, because of the gap between the stats collection time and the existence check, this may filter out more stats than desired. I propose we use the simpler heuristic to remove the extra stats.
/cc @dchen1107 @timstclair @piosz
/cc @kubernetes/sig-node-bugs