Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IPVS: kubelet, kube-proxy: unmark packets before masquerading … #92584

Merged
merged 1 commit into from
Jul 2, 2020

Conversation

aojea
Copy link
Member

@aojea aojea commented Jun 28, 2020

What type of PR is this?

Uncomment only one /kind <> line, hit enter to put that in a new line, and remove leading whitespace from that line:

/kind api-change
/kind bug
/kind cleanup
/kind deprecation
/kind design
/kind documentation

/kind failing-test

/kind feature
/kind flake

What this PR does / why we need it:

The sig-network tests started to fail after the double nat fix.
The problem is that the IPVS proxier uses a POSTORUTING for the hairpin traffic that needs to happen before the
kubernetes specific postrouting rules. Prepending this ipvs only masquerade rule solves the issue.

Which issue(s) this PR fixes:

Fixes #92553

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@aojea catching up on this. So we're reverting the double-nat fix for IPVS only since it used ipset and was not impacted?

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/ipvs sig/network Categorizes an issue or PR as relevant to SIG Network. labels Jun 28, 2020
@aojea
Copy link
Member Author

aojea commented Jun 28, 2020

/test pull-kubernetes-e2e-gci-gce-ipvs

@k8s-ci-robot k8s-ci-robot removed the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jun 28, 2020
@aojea
Copy link
Member Author

aojea commented Jun 28, 2020

/assign @andrewsykim @uablrek

This solves all the test failures reported in #92553

I0628 16:24:49.389] FAIL! -- 757 Passed | 1 Failed | 0 Pending | 4355 Skipped

and gives us back to the original situation reported in, with the flexvolume test failure #91253

I0628 16:24:49.388] [Fail] [sig-storage] Flexvolumes [It] should be mountable when non-attachable 
I0628 16:24:49.388] test/e2e/framework/volume/fixtures.go:502

@danwinship added this fix to solve the double natting problem, I tried to backtrace the problem but seems that IPVS uses ipset too and I wasn't able to find out the problem , so reverting the change only in IPVS will solve the issue

@aojea
Copy link
Member Author

aojea commented Jun 28, 2020

/priority urgent-critical
This breaks kube-proxy in IPVS mode

/test pull-kubernetes-e2e-kind-ipv6
/test pull-kubernetes-e2e-gci-gce-ipvs

@k8s-ci-robot
Copy link
Contributor

@aojea: The label(s) priority/urgent-critical cannot be applied, because the repository doesn't have them

In response to this:

/priority urgent-critical
This breaks kube-proxy in IPVS mode

/test pull-kubernetes-e2e-kind-ipv6
/test pull-kubernetes-e2e-gci-gce-ipvs

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@andrewsykim
Copy link
Member

@aojea catching up on this. So we're reverting the double-nat fix for IPVS only since it used ipset and was not impacted?

@k8s-ci-robot k8s-ci-robot added area/test sig/storage Categorizes an issue or PR as relevant to SIG Storage. sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Jun 28, 2020
@aojea
Copy link
Member Author

aojea commented Jun 28, 2020

@aojea catching up on this. So we're reverting the double-nat fix for IPVS only since it used ipset and was not impacted?

I don't know if it was impacted, sorry, didn't go so far understanding the ipset - iptables relationship, however, with the fix in IPVS is broken per the test results.

I think that we should follow up and verify if the double-nat fix is needed for IPVS, but have it working until then.

@aojea
Copy link
Member Author

aojea commented Jun 28, 2020

/retest
it works 😄

pull-kubernetes-e2e-gci-gce-ipvs — Job succeeded.

@aojea
Copy link
Member Author

aojea commented Jun 28, 2020

/retest

@@ -182,7 +182,7 @@ var _ = utils.SIGDescribe("Flexvolumes", func() {
suffix = ns.Name
})

ginkgo.It("should be mountable when non-attachable", func() {
ginkgo.It("should be mountable when non-attachable [Feature:Flexvolumes]", func() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is for storage feature, but still feel inappropriate to directly add this label.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is failing constantly, is breaking this job.

It seems a leftover from #91329, the linked issue says that these tests should have the feature tag to avoid breaking the e2e suite
#78737 (comment)

I've added it as a separate commit so the git history is clean and can be reverted independently.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @msau42 is this okay to do here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The switch the distroless should not have impacted the "non-attachable" test case. This test case is passing fine in the normal k8s job: https://k8s-testgrid.appspot.com/sig-storage-kubernetes#gce&include-filter-by-regex=Flexvolume

So it would be interesting to figure out why it is consistently failing in this job.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry about that, my mistake

@aojea
Copy link
Member Author

aojea commented Jun 29, 2020

/retest

@Lion-Wei
Copy link

@aojea I think part of the problem is because RETURN rule in KUBE-POSTROUTING break the loop back logic.

Chain KUBE-POSTROUTING (1 references)
target     prot opt source               destination
RETURN     all  --  0.0.0.0/0            0.0.0.0/0            mark match ! 0x4000/0x4000
MARK       all  --  0.0.0.0/0            0.0.0.0/0            MARK xor 0x4000
MASQUERADE  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service traffic requiring SNAT */
MASQUERADE  all  --  0.0.0.0/0            0.0.0.0/0            /* Kubernetes endpoints dst ip:port, source ip for solving hairpin purpose */ match-set KUBE-LOOP-BACK dst,dst,src

However, I think straightly revert the double-nat fix is not a good idea, maybe move KUBE-LOOP-BACK up could be an option.

@aojea
Copy link
Member Author

aojea commented Jun 29, 2020

However, I think straightly revert the double-nat fix is not a good idea, maybe move KUBE-LOOP-BACK up could be an option.

agree that the best solution is to solve it, but right now IPVS mode is broken and I don't know if the double-nat issue with the random-fully flag is affecting IPVS, but if it does affect, it's only in some specific scenarios, we can see that all the test succeed now.

Honestly, I don't have more time to investigate 😅 , seems this logic was added here #54219 , my suggestion is to merge this, fix the job and IPVS so we can have a good test signal, and open a new Issue to solve/understand the double-nat issue in IPVS.

@Lion-Wei
Copy link

However, I think straightly revert the double-nat fix is not a good idea, maybe move KUBE-LOOP-BACK up could be an option.

agree that the best solution is to solve it, but right now IPVS mode is broken and I don't know if the double-nat issue with the random-fully flag is affecting IPVS, but if it does affect, it's only in some specific scenarios, we can see that all the test succeed now.

Honestly, I don't have more time to investigate 😅 , seems this logic was added here #54219 , my suggestion is to merge this, fix the job and IPVS so we can have a good test signal, and open a new Issue to solve/understand the double-nat issue in IPVS.

Agree.

@danwinship
Copy link
Contributor

There were definitely people using IPVS in various places in the sprawling tangle of bug reports related to the random-fully/checksum problem.

I think part of the problem is because RETURN rule in KUBE-POSTROUTING break the loop back logic.

maybe move KUBE-LOOP-BACK up could be an option.

er, yeah, my fix assumed that kubelet and kube-proxy were creating identical KUBE-POSTROUTING rules.

This will hopefully eventually get fixed as part of #82125 but yes, it seems like for now just moving the KUBE-LOOP-BACK rule should fix this

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 1, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andrewsykim, aojea

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@aojea
Copy link
Member Author

aojea commented Jul 1, 2020

/hold cancel
flexvolume commit removed

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 1, 2020
@aojea
Copy link
Member Author

aojea commented Jul 1, 2020

@Lion-Wei do you mind lgtm again?

@aojea
Copy link
Member Author

aojea commented Jul 1, 2020

/retest
the ipvs job failure is because the flexvolume test only

@Lion-Wei
Copy link

Lion-Wei commented Jul 1, 2020

@aojea Not at all.
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 1, 2020
@aojea
Copy link
Member Author

aojea commented Jul 1, 2020

/retest
🤔

@andrewsykim
Copy link
Member

I thought the flex volume tests were failing because quay.io was down but that seemed to be fixed now #91242 ?

@andrewsykim
Copy link
Member

Maybe we just need to update to use the staging CI image for quay? @BenTheElder might know?

Copy link
Member

@andrewsykim andrewsykim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/hold cancel

Removing the hold though since only the flex volume tests are failing and this is a known issue

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@msau42
Copy link
Member

msau42 commented Jul 1, 2020

I don't think root cause of flex volume test failing in this job is known. It's passing in normal k8s ci, so I suspect it is related to the configuration of this job.

This line from the test log is potentially suspicious and indicates that the test is trying to use IPs.

Jul  1 15:50:26.384: INFO: Getting external IP address for e2e-bc0cae53a3-2e794-minion-group-mxnp

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

2 similar comments
@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@k8s-ci-robot
Copy link
Contributor

@aojea: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
pull-kubernetes-e2e-gci-gce-ipvs c40081b link /test pull-kubernetes-e2e-gci-gce-ipvs

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@k8s-ci-robot k8s-ci-robot merged commit 4d0ce2e into kubernetes:master Jul 2, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.19 milestone Jul 2, 2020
k8s-ci-robot added a commit that referenced this pull request Jul 9, 2020
…upstream-release-1.18

Automated cherry pick of #92584: kube-proxy ipvs masquerade hairpin traffic
k8s-ci-robot added a commit that referenced this pull request Jul 9, 2020
…upstream-release-1.17

Automated cherry pick of #92584: kube-proxy ipvs masquerade hairpin traffic
k8s-ci-robot added a commit that referenced this pull request Jul 10, 2020
…upstream-release-1.16

Automated cherry pick of #92584: kube-proxy ipvs masquerade hairpin traffic
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/ipvs area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. release-note-none Denotes a PR that doesn't merit a release note. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/storage Categorizes an issue or PR as relevant to SIG Storage. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Some IPVS e2e tests are failing
10 participants