IPVS: kubelet, kube-proxy: unmark packets before masquerading … #92584
Conversation
/test pull-kubernetes-e2e-gci-gce-ipvs
/assign @andrewsykim @uablrek
This solves all the test failures reported in #92553 and gets us back to the original situation, with the flexvolume test failure reported in #91253.
@danwinship added this fix to solve the double-NAT problem. I tried to backtrace the problem, but it seems that IPVS uses ipset too and I wasn't able to find the root cause, so reverting the change only in IPVS solves the issue.
/priority urgent-critical
/test pull-kubernetes-e2e-kind-ipv6
@aojea: The label(s) In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@aojea catching up on this. So we're reverting the double-NAT fix for IPVS only, since it uses ipset and was not impacted?
I don't know if it was impacted; sorry, I didn't go so far as to understand the ipset/iptables relationship. However, with the fix, IPVS is broken per the test results. I think we should follow up and verify whether the double-NAT fix is needed for IPVS, but keep IPVS working until then.
/retest

/retest
test/e2e/storage/flexvolume.go (outdated)
@@ -182,7 +182,7 @@ var _ = utils.SIGDescribe("Flexvolumes", func() {
 	suffix = ns.Name
 })

-	ginkgo.It("should be mountable when non-attachable", func() {
+	ginkgo.It("should be mountable when non-attachable [Feature:Flexvolumes]", func() {
I know this is for a storage feature, but it still feels inappropriate to add this label directly.
This test is failing constantly and is breaking this job. It seems to be a leftover from #91329; the linked issue says these tests should have the feature tag to avoid breaking the e2e suite:
#78737 (comment)
I've added it as a separate commit so the git history is clean and can be reverted independently.
cc @msau42 is this okay to do here?
The switch to distroless should not have impacted the "non-attachable" test case. This test case is passing fine in the normal k8s job: https://k8s-testgrid.appspot.com/sig-storage-kubernetes#gce&include-filter-by-regex=Flexvolume
So it would be interesting to figure out why it is consistently failing in this job.
sorry about that, my mistake
/retest
@aojea I think part of the problem is because …
However, I think reverting the double-NAT fix outright is not a good idea; maybe move …
I agree that the best solution is to solve it, but right now IPVS mode is broken, and I don't know whether the double-NAT issue with the random-fully flag affects IPVS. If it does, it's only in some specific scenarios; we can see that all the tests succeed now. Honestly, I don't have more time to investigate 😅. It seems this logic was added in #54219. My suggestion is to merge this, fix the job and IPVS so we can have a good test signal, and open a new issue to understand and solve the double-NAT issue in IPVS.
Agree.
There were definitely people using IPVS in various places in the sprawling tangle of bug reports related to the random-fully/checksum problem.
er, yeah, my fix assumed that kubelet and kube-proxy were creating identical … This will hopefully eventually get fixed as part of #82125, but yes, it seems like for now just moving the …
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: andrewsykim, aojea
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files: …
Approvers can indicate their approval by writing …
/hold cancel
@Lion-Wei do you mind lgtm again?
/retest
@aojea Not at all.
/retest
I thought the flex volume tests were failing because quay.io was down, but that seems to be fixed now (#91242)?
Maybe we just need to update to use the staging CI image for quay? @BenTheElder might know?
/hold cancel
Removing the hold though, since only the flex volume tests are failing and this is a known issue.
/retest
I don't think the root cause of the flex volume test failing in this job is known. It's passing in normal k8s CI, so I suspect it is related to the configuration of this job. This line from the test log is potentially suspicious and indicates that the test is trying to use IPs.
/retest

/retest

/retest
@aojea: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
/retest
…upstream-release-1.18 Automated cherry pick of #92584: kube-proxy ipvs masquerade hairpin traffic
…upstream-release-1.17 Automated cherry pick of #92584: kube-proxy ipvs masquerade hairpin traffic
…upstream-release-1.16 Automated cherry pick of #92584: kube-proxy ipvs masquerade hairpin traffic
What type of PR is this?
/kind failing-test
What this PR does / why we need it:
The sig-network tests started to fail after the double-NAT fix.
The problem is that the IPVS proxier uses a POSTROUTING rule for hairpin traffic that needs to run before the
Kubernetes-specific postrouting rules. Prepending this IPVS-only masquerade rule solves the issue.
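To make the ordering argument concrete, here is a toy first-match model of chain traversal. This is plain Python, not the proxier's real code; the rule names (`KUBE-POSTROUTING`, `IPVS-HAIRPIN-MASQUERADE`) and the hairpin match condition are simplified assumptions for illustration only. It shows why prepending (`iptables -I`) instead of appending (`-A`) changes which rule handles hairpin traffic first:

```python
# Toy model of netfilter chain traversal: the first rule whose match
# accepts the packet decides the action. Names are illustrative, not
# the actual rules installed by kube-proxy.

def traverse(chain, packet):
    """Return the action of the first matching rule, or the chain policy."""
    for match, action in chain:
        if match(packet):
            return action
    return "ACCEPT"  # default chain policy

# A hairpin packet: a pod reaching itself through its own service IP
# (modeled crudely as src == dst).
hairpin = {"src": "10.0.0.5", "dst": "10.0.0.5"}

# Appended (-A): the generic kubernetes postrouting jump sits first,
# so it sees the hairpin packet before the IPVS-only masquerade rule.
appended = [
    (lambda p: True, "KUBE-POSTROUTING"),
    (lambda p: p["src"] == p["dst"], "IPVS-HAIRPIN-MASQUERADE"),
]

# Prepended (-I): the IPVS-only hairpin rule runs before the
# kubernetes-specific postrouting rules, which is the intended order.
prepended = [
    (lambda p: p["src"] == p["dst"], "IPVS-HAIRPIN-MASQUERADE"),
    (lambda p: True, "KUBE-POSTROUTING"),
]

print(traverse(appended, hairpin))   # KUBE-POSTROUTING
print(traverse(prepended, hairpin))  # IPVS-HAIRPIN-MASQUERADE
```

The model only captures the first-match semantics of iptables chains; the real fix additionally involves the mark/unmark handling described in the PR title.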
Which issue(s) this PR fixes:
Fixes #92553
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: