Description
We have apparently been accidentally deleting the `-t nat -A KUBE-MARK-DROP -j MARK --set-xmark 0xXXXX` rule for a few weeks (#85527), and no one noticed. I suspect this is because `KUBE-MARK-DROP` is really only needed if the host accepts all incoming packets by default; if you have any sort of plausible firewall, then `KUBE-MARK-DROP` is redundant, and so the e2e tests that might otherwise catch `KUBE-MARK-DROP` failures don't actually catch them.
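For context, the mark-then-drop mechanism works in two stages: the nat-table rule above only tags the packet with a fwmark, and a separate filter-table rule drops marked packets later. A sketch of the usual rules (the `0x8000` mark value is the default and is configurable, so it may differ on a given cluster):

```
# nat table: KUBE-MARK-DROP only tags the packet with a fwmark
# (this is the rule that was being deleted)
iptables -t nat -A KUBE-MARK-DROP -j MARK --set-xmark 0x8000/0x8000

# filter table: kubelet's KUBE-FIREWALL chain drops anything carrying that mark
iptables -t filter -A KUBE-FIREWALL -m mark --mark 0x8000/0x8000 -j DROP
```

This is why deleting the MARK rule makes `KUBE-MARK-DROP` a silent no-op: the chain still exists and is still jumped to, but nothing gets marked, so the filter-table DROP never matches.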
The iptables proxier uses `KUBE-MARK-DROP` in two cases, both on cloud platforms where we create iptables rules for LoadBalancer IPs (eg, GCE but not AWS), when a service has a load balancer IP and endpoints, and a packet arrives on the node with the load-balancer IP as its destination:
- If the service specifies `spec.loadBalancerSourceRanges`, and the packet's source IP is not in the source ranges, then we call `KUBE-MARK-DROP` on the packet to drop it later.
  - This is theoretically tested by "It should only allow access from service loadbalancer source ranges". However, if the `KUBE-MARK-DROP` rule becomes a no-op, then the pod-to-LoadBalancer-IP connection will fall through the firewall chain, never hit the XLB chain, and eventually just get masqueraded and delivered to the LoadBalancer IP like it would for any other cluster-external IP. Since the cloud load balancer is also programmed with the source ranges, and the source range in this test is a single pod IP, the load balancer will then reject the packet (since it has the node's IP as its source at this point).
  - I think we can fix this test to fail in the absence of the drop rule by adding the node's IP to the source range. Then the expected-to-fail connection would (erroneously) not get dropped by the node, get passed to the cloud load balancer, which would accept it, and then get passed back to the service, causing the test case to fail.
- If the service has `ServiceExternalTrafficPolicyTypeLocal` and no local endpoints, then we call `KUBE-MARK-DROP` on the packet to drop it later.
  - This does not get tested by "It should only target nodes with endpoints", because if the load balancers are working correctly then they won't send any traffic to the nodes that are creating the drop rules anyway.
  - It also does not get tested by "It should work from pods", because a pod-to-LoadBalancer-IP connection will be rewritten to be pod-to-ClusterIP before the only-local check and will bypass the drop rule.
  - I think it should be possible to test this by trying to connect to an only-local LoadBalancer service from a `hostNetwork` pod on a node that has no endpoints for the service. The drop rule ought to cause that connection to fail, but if the drop rule were missing then it would make a connection directly to the LoadBalancer and then succeed.
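To make the first case concrete, the nat-table rules the iptables proxier builds for a LoadBalancer IP with source ranges look roughly like this (a simplified sketch; the chain hashes, addresses, and port are made up for illustration):

```
# Packets addressed to the LB IP are sent to a per-service firewall chain
-A KUBE-SERVICES -d 203.0.113.10/32 -p tcp --dport 80 -j KUBE-FW-ABCDEF0123456789
# Sources inside spec.loadBalancerSourceRanges proceed to the service chain
-A KUBE-FW-ABCDEF0123456789 -s 10.96.0.0/12 -j KUBE-SVC-ABCDEF0123456789
# Everything else is marked for dropping; if KUBE-MARK-DROP is a no-op,
# these packets instead fall through and are treated like any external IP
-A KUBE-FW-ABCDEF0123456789 -j KUBE-MARK-DROP
```

The trailing `KUBE-MARK-DROP` jump is the only thing standing between an out-of-range source and normal delivery, which is why its failure mode is invisible unless a test is specifically shaped to notice it.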
The ipvs proxier refers to the `KUBE-MARK-DROP` chain, but I think it doesn't actually use it...
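As a quick manual check, one way to tell whether a node is currently affected (assuming root and `iptables-save` on the node): if the first command prints the chain declaration but the second prints no MARK rule, `KUBE-MARK-DROP` exists but is a no-op.

```
iptables-save -t nat | grep ':KUBE-MARK-DROP'
iptables-save -t nat | grep -- '-A KUBE-MARK-DROP'
```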
/sig network
/priority important-soon