Kubelet should not report multiple stats for the same container in the summary endpoint #47853
How about the approach where, in case of duplicated container names, we keep only the one with the newest startTime?
That's what I suggested -- using some heuristics to filter.
I just wanted to say that IMO a very simple heuristic is enough.
What if kubelet actually creates two containers by mistake? Shouldn't we expose their stats to raise attention? I think it's worth being more conservative and only filtering out the containers we think are terminated.
/assign
I have added a small heuristic to Heapster that prevents it from reporting metrics multiple times for the same container. @yguo0905 Should we close this issue, or do you also want to change the behavior of kubelet?
Automatic merge from submit-queue

Remove the status of the terminated containers in the summary endpoint

Ref: #47853

- When building the summary, a container is considered terminated if it has an older creation time and no instantaneous CPU or memory RSS usage.
- We remove the terminated containers from the summary by grouping the containers with the same name in the same pod, sorting each group by creation time, and skipping the oldest entries with no usage in each group.

Let me know if there's a simpler way.

**Release note**:
```
None
```
/assign @yujuhong
Fixed by #48739
Automatic merge from submit-queue

Manual cherry-pick of #50636

#50636: Bumped Heapster version to 1.4.1

```release-note
Bumped Heapster version to 1.4.1:
- gracefully handle the problem where kubelet reports duplicated stats for the same container (see #47853) on the Heapster side
- fixed bugs and improved performance in the Stackdriver Sink
```
We've observed that container cgroups are sometimes left behind uncleaned even though the container has terminated. In those cases, cadvisor continues reporting stats for the terminated container, which are then shown in kubelet's /stats/summary endpoint. Once kubelet starts a new container instance, it reports both the terminated and the new container stats simultaneously.
The duplicated entries for the same container may confuse the monitoring components and cause loss of stats.
kubelet should do its due diligence and filter out the entries for dead containers. These entries can be (mostly) identified by their startTime (earlier than the new entry's) and their usage (zero). Alternatively, kubelet could check with the container runtime whether the container still exists and filter out entries accordingly. However, because of the gap between the stats collection time and the existence check, this may filter out more stats than desired. I propose we use the simpler heuristic to remove the extra stats.
/cc @dchen1107 @timstclair @piosz
/cc @kubernetes/sig-node-bugs