Troubleshoot configuration
This guide can help you solve common issues with Cloud NAT.
Common issues
VMs can reach the internet unexpectedly, without Cloud NAT
If your virtual machine (VM) instances or container instances can reach the internet without Cloud NAT, but you don't want them to, check for the following issues:
- Determine if the VM's network interface has an external IP address. If the network interface has an external IP address assigned to it, Google Cloud automatically performs one-to-one NAT for packets whose sources match the interface's primary internal IP address. For more information, see Cloud NAT specifications.
  To determine if a VM has an external IP address, see changing or assigning an external IP address to an existing instance.
- Ensure that your Google Kubernetes Engine (GKE) cluster is a private cluster. Each node VM in a non-private cluster has an external IP address, so each node can use routes in your Virtual Private Cloud (VPC) network whose next hop is the default internet gateway without relying on Cloud NAT. For more information, including how non-private clusters interact with Cloud NAT gateways, see Compute Engine interaction.
- List routes in your Virtual Private Cloud network, looking for ones that could provide internet connectivity through a next hop other than the default internet gateway. For example:
  - Static routes whose next hops are VMs, internal passthrough Network Load Balancers, or Cloud VPN tunnels might indirectly provide internet connectivity. For example, the next hop VMs or backend VMs for an internal passthrough Network Load Balancer might have external IP addresses themselves, or a Cloud VPN tunnel might connect to a network that offers internet access.
  - Dynamic routes learned from on-premises networks by Cloud Routers in your VPC network might connect to a network that offers internet access.
  Keep in mind that other custom routes in your VPC network might have higher priorities than routes whose next hops are default internet gateways. For information about how Google Cloud evaluates routes, see routing applicability and order.
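To reason about which route a packet would follow, the following Python sketch models a simplified version of route selection: among routes whose destination range contains the packet's destination, it prefers the most specific prefix and then the lowest priority value. The Route class, the route names, and the next hop labels are hypothetical, and the model deliberately ignores subnet routes, tie-breaking, and next hop health, so treat it as an illustration rather than Google Cloud's full evaluation order.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    name: str        # hypothetical route name
    dest_range: str  # destination CIDR, for example "0.0.0.0/0"
    priority: int    # lower value means higher priority
    next_hop: str    # free-form label for the next hop


def select_route(routes, destination_ip):
    """Return the route a packet to destination_ip would follow, or None."""
    ip = ipaddress.ip_address(destination_ip)
    candidates = [r for r in routes if ip in ipaddress.ip_network(r.dest_range)]
    if not candidates:
        return None
    # Prefer the longest (most specific) prefix, then the lowest priority value.
    return min(
        candidates,
        key=lambda r: (-ipaddress.ip_network(r.dest_range).prefixlen, r.priority),
    )


routes = [
    Route("default-igw", "0.0.0.0/0", 1000, "default-internet-gateway"),
    Route("via-appliance", "0.0.0.0/0", 900, "instance:nat-appliance"),
    Route("to-onprem", "10.20.0.0/16", 1000, "vpn-tunnel:tunnel-1"),
]

chosen = select_route(routes, "203.0.113.10")
print(chosen.name, chosen.next_hop)
```

In this example the custom route through the appliance VM wins over the default internet gateway route because its priority value is lower, which is the kind of route that can provide internet access without Cloud NAT.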
No logs are generated
- Verify that NAT logging is enabled.
- Double-check that your view of the logs isn't filtering out the logs that you are looking for. For instructions, see Viewing logs.
- Make sure that a firewall rule isn't blocking traffic. Firewall rules that block egress (outbound) traffic are applied before the traffic would have been sent to the NAT gateway. You can use Firewall Rules Logging to see if your custom egress rules are blocking outbound traffic.
- Review Types of Cloud NAT. The destination for your traffic might not be handled by NAT.
Certain logs are excluded
Verify that NAT logging is enabled and that your log filter is not excluding logs that you want to keep. You can clear a logs filter so that nothing is excluded.
Cloud NAT does not log every single event. During periods of heavy egress traffic, NAT logging is throttled, proportional to the machine type of the VM. Translation or error logs might be dropped, and it is not possible to determine what is omitted during throttling.
Packets dropped with reason: out of resources
If you see packet loss from VMs that use Cloud NAT, this might be because there are not enough available NAT source IP address and source port tuples for the VM to use at the time of the packet loss (port exhaustion). A 5-tuple (NAT source IP address, source port, and destination 3-tuple) cannot be reused within the TCP TIME_WAIT timeout.
If there aren't enough available NAT tuples, the dropped_sent_packets_count reason is OUT_OF_RESOURCES. For more information about metrics, see Using VM instance metrics.
See Reduce your port usage for ways to reduce port usage.
If you use dynamic port allocation, see the following section for ways to reduce packet drops when dynamic port allocation is used.
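As a rough way to see why port exhaustion happens, the following Python sketch estimates an upper bound on how fast one VM can open new TCP connections to a single destination 3-tuple. It assumes that every closed connection holds its NAT source IP address and source port tuple for the full TIME_WAIT timeout and that connections themselves are very short. The 120-second timeout and the port counts are example values chosen for illustration, not values read from your gateway.

```python
def max_new_connections_per_second(ports_per_vm, time_wait_seconds):
    """Upper bound on the sustained rate of new TCP connections from one VM
    to a single destination 3-tuple, assuming each closed connection keeps
    its NAT tuple reserved for the whole TIME_WAIT timeout."""
    if ports_per_vm <= 0 or time_wait_seconds <= 0:
        raise ValueError("both arguments must be positive")
    return ports_per_vm / time_wait_seconds


# Example values (assumptions): 64 and 1,024 ports per VM, 120 s TIME_WAIT.
for ports in (64, 1024):
    rate = max_new_connections_per_second(ports, 120)
    print(f"{ports} ports -> about {rate:.2f} new connections per second")
```

A workload that opens connections to the same destination faster than this bound runs out of tuples, even though the number of simultaneously open connections might look small.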
Packets dropped when dynamic port allocation is configured
Dynamic port allocation detects when a VM is close to being out of ports, and doubles the number of ports that are allocated to the VM. This helps ensure that ports aren't wasted, but can result in dropped packets while the number of allocated ports is increasing.
To reduce the number of dropped packets, consider the following:
- If you can ramp up connections more slowly, Cloud NAT has more time to allocate more ports.
- If VMs are making TCP connections, you can configure the VMs with a larger value for tcp_syn_retries, which gives the system more time to establish the connection and increases the chances for the connection to succeed.
  For example, for Linux VMs, you can view the current setting:
  sysctl net.ipv4.tcp_syn_retries
  If needed, you can increase the setting:
  sudo sysctl -w net.ipv4.tcp_syn_retries=NUM
- If you have bursty workloads and need to quickly allocate more ports, you might need to adjust the minimum number of ports per VM. View your port usage and determine an appropriate minimum number of ports per VM.
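The effect of the minimum number of ports on bursty workloads can be sketched with a small Python model of the doubling behavior described in this section. The function name and the example numbers are hypothetical, and the model only counts doubling steps; it doesn't model how long each step takes.

```python
def doublings_needed(min_ports, max_ports, ports_needed):
    """Return (steps, allocated): how many times the allocation doubles,
    starting at min_ports and capped at max_ports, before it covers
    ports_needed or reaches the cap."""
    if min_ports <= 0 or max_ports < min_ports:
        raise ValueError("require 0 < min_ports <= max_ports")
    allocated = min_ports
    steps = 0
    while allocated < ports_needed and allocated < max_ports:
        allocated = min(allocated * 2, max_ports)
        steps += 1
    return steps, allocated


# A burst that needs 1,000 ports, with different minimums (example values).
print(doublings_needed(64, 4096, 1000))   # starts low, needs several doublings
print(doublings_needed(512, 4096, 1000))  # higher minimum, fewer doublings
print(doublings_needed(64, 256, 1000))    # capped by the maximum, still short
```

Each doubling step is a window in which packets can be dropped, so a higher minimum reduces the number of such windows at the cost of reserving more ports for idle VMs.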
Packets dropped with reason: endpoint independent conflict
If you see packet loss from VMs that use Public NAT, and you have Endpoint-Independent Mapping turned on, the packet loss might be caused by an endpoint independent conflict. If it is, the dropped_sent_packets_count reason is ENDPOINT_INDEPENDENT_CONFLICT. For more information about metrics, see Using VM instance metrics.
You can reduce the chances of endpoint independent conflicts by using the following techniques:
- Turn off Endpoint-Independent Mapping. This allows the new connection from a given source IP address and port to use a different NAT source IP address and port than it used before. Disabling or enabling Endpoint-Independent Mapping does not interrupt established connections.
- Increase the minimum default number of NAT ports per VM instance, so that the port reservation procedure can assign more NAT source IP address and source port tuples to each client VM. This decreases the probability that two or more client IP address and ephemeral source port tuples are assigned the same NAT source IP address and source port tuple.
- Check how many ephemeral source ports are being used:
  - For Linux VMs:
    netstat -an | egrep 'ESTABLISHED|TIME_WAIT|CLOSE_WAIT' | wc -l
  - For Windows VMs:
    netstat -tan | findstr "ESTABLISHED TIME_WAIT CLOSE_WAIT" | find /c /v ""
- Configure your VM instances to use a larger set of ephemeral source ports:
  - For Linux VMs:
    You can view what port range is configured with this command:
    cat /proc/sys/net/ipv4/ip_local_port_range
    You can set ip_local_port_range to the maximum number of ephemeral source ports (64,512) with this command, run as root:
    echo 1024 65535 > /proc/sys/net/ipv4/ip_local_port_range
  - For Windows VMs:
    You can view what port ranges are configured with these commands:
    netsh int ipv4 show dynamicport tcp
    netsh int ipv4 show dynamicport udp
    You can set the number of ephemeral source TCP and UDP ports to the maximum possible (64,512) with these commands:
    netsh int ipv4 set dynamicport tcp start=1024 num=64512
    netsh int ipv4 set dynamicport udp start=1024 num=64512
  - On Google Kubernetes Engine nodes, you can automate this configuration by using a privileged DaemonSet.
- For GKE clusters, disable the source NAT performed on each node for packets sent to destinations of interest. You can do this in one of two ways:
  - By deploying the ip-masq-agent and adding the destinations of interest to the list of nonMasqueradeCIDRs.
  - By disabling SNAT for the default non-masquerade destinations with the --disable-default-snat flag when you create a cluster.
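To check how many ephemeral source ports a Linux port range setting provides, the following Python sketch parses text in the format of /proc/sys/net/ipv4/ip_local_port_range (two numbers separated by whitespace) and counts the ports in the inclusive range. The function name is hypothetical, and the "32768 60999" input is only an example of a range that is narrower than the maximum.

```python
def ephemeral_port_count(port_range_text):
    """Count the ports in a 'LOW HIGH' range, inclusive at both ends."""
    low, high = (int(value) for value in port_range_text.split())
    if not 0 < low <= high <= 65535:
        raise ValueError(f"invalid port range: {port_range_text!r}")
    return high - low + 1


print(ephemeral_port_count("32768\t60999"))  # example of a narrower range
print(ephemeral_port_count("1024 65535"))    # the maximum range set above
```

On a Linux VM, you could pass the contents of the file directly, for example ephemeral_port_count(open("/proc/sys/net/ipv4/ip_local_port_range").read()).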
Dropped received packets
A Cloud NAT gateway maintains a connection tracking table to store active connection details and IP address and port mappings—how VM IP addresses and ports translate to NAT IP addresses and ports. A Cloud NAT gateway drops an ingress data packet if the connection tracking table doesn't contain any entry for the connection.
The absence of the connection entry in the table can be due to any of the following reasons:
- An established TCP connection timed out because the TCP Established Connection Idle Timeout expired due to inactivity.
- An external endpoint fails to establish a new connection before the TCP Transitory Connection Idle Timeout expires. For example, a Google Cloud resource initiates a connection with TCP SYN, but the external endpoint fails to respond with a SYN ACK.
- An external endpoint, such as a prober, tries to connect to a NAT IP address and port. Cloud NAT doesn't accept unsolicited inbound connections. Entries for these types of connections aren't present in the connection tracking table, so any received packets are dropped.
- If you remove NAT IP addresses from your gateway while NAT connections are still active, then the NAT mappings become invalid, and these connections are immediately removed from the connection tracking table. Any return traffic is dropped.
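The reasons above can be illustrated with a toy connection tracking table in Python. This is a simplified model, not how Cloud NAT is implemented: the class, its method names, the tuple layout, and the single 1,200-second idle timeout are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ConnTrackTable:
    idle_timeout: float                          # seconds of allowed inactivity
    entries: dict = field(default_factory=dict)  # connection key -> last activity time

    def egress(self, key, now):
        """A VM sends a packet: create or refresh the connection entry."""
        self.entries[key] = now

    def ingress(self, key, now):
        """A packet arrives from outside: deliver only if a live entry exists."""
        last_seen = self.entries.get(key)
        if last_seen is None:
            return "dropped"  # unsolicited, or the entry was already removed
        if now - last_seen > self.idle_timeout:
            del self.entries[key]
            return "dropped"  # the idle timeout expired
        self.entries[key] = now
        return "delivered"

    def remove_nat_ip(self, nat_ip):
        """Removing a NAT IP address invalidates every entry that uses it."""
        self.entries = {k: v for k, v in self.entries.items() if k[0] != nat_ip}


# Key layout (assumed): (NAT IP, NAT port, destination IP, destination port, protocol)
key = ("198.51.100.7", 1024, "203.0.113.10", 443, "tcp")
table = ConnTrackTable(idle_timeout=1200)
print(table.ingress(key, now=0))     # no entry yet
table.egress(key, now=0)
print(table.ingress(key, now=10))    # reply within the timeout
print(table.ingress(key, now=1300))  # reply after a long idle period
```

In this model, a keepalive is simply a periodic call to egress that refreshes the entry before the idle timeout expires.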
Before you address the ingress packet drops, confirm if the drops actually impact your application. To confirm, check your application for errors whenever spikes in dropped ingress packets occur.
If the ingress packet drops do impact your application, try using the following techniques to address the issue:
- Use keepalive mechanisms in your application, so that long-running connections can stay open for a longer period.
- Increase the value for TCP Transitory Connection Idle Timeout, so that external endpoints that receive traffic (initiated by Google Cloud resources) through a Cloud NAT gateway get more time to respond and to establish the connection.
- Increase the value for TCP Established Connection Idle Timeout if you have significantly decreased the default value.
Need to allocate more IP addresses
Sometimes your VMs are unable to reach the internet because you don't have enough NAT IP addresses. Multiple factors can cause this problem. For more information, see the following table.
Root cause | Symptom | Solution |
---|---|---|
You've manually allocated addresses, but you haven't allocated enough of them, given your current port usage. | | Do one of the following: |
You have surpassed a hard limit for NAT IP addresses. | | |
To monitor failures caused by an insufficient number of IP addresses, create an alert for the nat_allocation_failed metric. This metric is set to true if Google Cloud is unable to allocate sufficient IP addresses for any VM in your NAT gateway. For information about alert policies, see Defining alerting policies.
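To estimate whether a set of manually allocated NAT IP addresses is large enough, the following Python sketch computes how many addresses a group of VMs needs for a given number of ports per VM. It rests on two assumptions that you should verify for your gateway: that each NAT IP address offers 64,512 usable source ports (65,536 minus the first 1,024), and that each VM's ports are reserved from a single address, so leftover ports on an address are not shared.

```python
import math

PORTS_PER_NAT_IP = 64512  # assumption: 65,536 ports minus the first 1,024


def nat_ips_needed(vm_count, ports_per_vm):
    """Estimate the number of NAT IP addresses needed for vm_count VMs."""
    if vm_count < 0 or ports_per_vm <= 0:
        raise ValueError("vm_count must be >= 0 and ports_per_vm must be > 0")
    vms_per_ip = PORTS_PER_NAT_IP // ports_per_vm
    if vms_per_ip == 0:
        raise ValueError("ports_per_vm exceeds the ports on one NAT IP address")
    return math.ceil(vm_count / vms_per_ip)


# Example: 2,000 VMs at two different per-VM port settings.
print(nat_ips_needed(2000, 64))
print(nat_ips_needed(2000, 1024))
```

The comparison shows how quickly the address requirement grows when the number of ports per VM is raised, which is why reducing port usage is often the alternative to allocating more addresses.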
Reduce your port usage
You can minimize the number of ports that each VM uses in situations where allocating more NAT IP addresses is not possible or desirable.
To reduce port usage, complete the following steps:
- Disable Endpoint-Independent Mapping.
- Enable dynamic port allocation. To use dynamic port allocation, you set a minimum number of ports per VM and a maximum number of ports per VM. Cloud NAT automatically allocates a number of NAT source IP address and source port tuples between the minimum and maximum number of ports, inclusive. Using a low number for the minimum number of ports avoids wasting NAT source IP address and source port tuples on VMs with fewer active connections. If you encounter connection timeouts while ports are being allocated, see Reduce packet drops with dynamic port allocation.
- Determine the lowest possible minimum number of ports to meet your needs. There are several methods to do this, and most rely on reviewing the number of used ports (compute.googleapis.com/nat/port_usage) as input to the decision-making process. For information about how to find port usage, see View port usage. The following are two example methods to determine a minimum number of ports:
  - Consider the average value of compute.googleapis.com/nat/port_usage over a representative time period for a representative number of VMs.
  - Consider the most frequently occurring value of compute.googleapis.com/nat/port_usage over a representative time period for a representative number of VMs.
- Determine the lowest possible maximum number of ports to meet your needs. Once again, review compute.googleapis.com/nat/port_usage as input to your decision-making process. Consider the maximum value of compute.googleapis.com/nat/port_usage over a representative time period for a representative number of VMs as a starting point for the maximum number of ports. Keep in mind that setting the maximum number too high can prevent other VMs from receiving NAT source IP address and source port tuples.
  Finding the right values for minimum and maximum ports involves iterative testing. For steps to change minimum and maximum port numbers, see Change minimum or maximum ports when dynamic port allocation is configured.
- Review the NAT timeouts, their meanings, and their default values. If you need to rapidly create a series of TCP connections to the same destination 3-tuple, consider reducing the TCP TIME_WAIT timeout so that Cloud NAT can more quickly reuse NAT source IP address and source port tuples. This allows Cloud NAT to more quickly use the same 5-tuple instead of needing to use a unique 5-tuple, which might require allocation of additional NAT source IP address and source port tuples for each sending VM. For steps to change NAT timeouts, see Change NAT timeouts.
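The methods described above for choosing minimum and maximum ports can be sketched in Python. The sketch takes a list of sampled compute.googleapis.com/nat/port_usage values that you have exported yourself, computes the average, the most frequent value, and the peak, and rounds each up to a power of two. The rounding step is an assumption about which values the gateway accepts, and the sample numbers are invented for illustration.

```python
import math
import statistics


def next_power_of_two(n):
    """Smallest power of two that is greater than or equal to n (n >= 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1 << (n - 1).bit_length()


def suggest_port_bounds(samples):
    """Derive candidate minimum and maximum ports from port_usage samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    return {
        "min_from_average": next_power_of_two(math.ceil(statistics.mean(samples))),
        "min_from_most_frequent": next_power_of_two(statistics.mode(samples)),
        "max_from_peak": next_power_of_two(max(samples)),
    }


# Invented port_usage samples for a group of VMs over a period of time.
samples = [40, 52, 52, 60, 75, 300, 52, 48]
print(suggest_port_bounds(samples))
```

The candidates are starting points for the iterative testing described above; a single outlier, such as the value 300 here, raises the average-based minimum well above the most frequent value.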
Frequently asked questions
Regional restriction for Cloud NAT
Can I use the same Cloud NAT gateway in more than one region?
No. A Cloud NAT gateway cannot be associated with more than one region, VPC network, or Cloud Router.
If you need to provide connectivity for other regions or VPC networks, create additional Cloud NAT gateways for them.
Are the external NAT IP addresses used by Cloud NAT gateways global or regional?
Cloud NAT gateways use regional external IP addresses as NAT IP addresses. Even though they are regional, they are publicly routable. For information about different ways that NAT IP addresses can be allocated or assigned, see NAT IP addresses.
When Cloud NAT can and cannot be used
Does Cloud NAT apply to instances, including GKE node VMs, that have external IP addresses?
Generally, no. If the network interface of a VM has an external IP address, Google Cloud always performs 1-to-1 NAT for packets sent from the primary internal IP address of the network interface without using Cloud NAT. However, Cloud NAT could still provide NAT services to packets sent from alias IP address ranges of that same network interface. For additional details, see Cloud NAT specifications and Compute Engine interaction.
Does Public NAT let a source VM whose network interface lacks an external IP address send traffic to a destination VM or load balancer that has an external IP address, even when the source and destination are in the same VPC network?
Yes. The network path involves sending traffic out of the VPC network through a default internet gateway, and then receiving it in the same network.
When the source VM sends a packet to the destination, Public NAT performs source NAT (SNAT) before delivering the packet to the second instance. Public NAT performs destination NAT (DNAT) for responses from the second instance to the first. For a step-by-step example, see Basic Public NAT configuration and workflow.
Can I use Private NAT for communication between VMs in the same VPC network?
No, Private NAT doesn't perform NAT on traffic between VMs in the same VPC network.
Unsolicited incoming connections not supported
Does Cloud NAT allow for inbound connections (for example, SSH) to instances without external IP addresses?
No, Cloud NAT does not support unsolicited incoming connections. For more information, see Cloud NAT specifications.
However, Google Cloud's network edge might respond to pings if the destination IP address is a Cloud NAT gateway external IP address that has active port mappings to at least one VM instance. To see IP addresses assigned to a Cloud NAT gateway, use the gcloud compute routers get-nat-ip-info command. External IP addresses marked as IN_USE might respond to pings.
If you need to connect to a VM that doesn't have an external IP address, see Choose a connection option for internal-only VMs. For example, as part of the Cloud NAT example Compute Engine setup, you connect to a VM without an external IP address by using Identity-Aware Proxy.
Cloud NAT and ports
Why does a VM have a fixed number of ports (64 by default)?
When a Cloud NAT gateway provides NAT for a VM, it reserves source address and source port tuples according to the port reservation procedure.
For more information, see port reservation examples.
Can I change the minimum number of ports reserved for a VM?
Yes. You can increase or decrease the minimum number of ports per VM when you create a new Cloud NAT gateway or by editing it later. Each Cloud NAT gateway reserves source address and source port tuples according to the port reservation procedure.
For additional information about decreasing the minimum number of ports, see the next question.
Can I decrease the minimum number of ports per VM after creating the Cloud NAT gateway?
Yes; however, decreasing the minimum number of ports could result in the port reservation procedure reserving a smaller number of ports per VM. When this happens, existing TCP connections might be reset and, if so, must be re-established.
When switching NAT mapping from Primary and Secondary ranges to Primary range only, are additional ports allocated to each instance immediately released?
No. Any additional ports used by secondary ranges are retained by instances until the minimum ports per VM setting is reduced. When Cloud NAT is configured to map Secondary (alias) ranges for subnets, Cloud NAT assigns a minimum of 1,024 ports per instance, based on the port reservation procedure.
When you switch to Primary ranges only, Cloud NAT keeps those additional allocated ports for instances that have already had those ports assigned. After changing the ranges for which Cloud NAT is applied to Primary only, the actual number of ports assigned to those instances is not changed until the minimum ports per VM setting is also reduced.
To reduce the number of ports allocated to those instances, after switching to primary ranges, the minimum ports per VM setting must be reduced. After that value is reduced, Cloud NAT automatically adjusts the number of ports allocated per instance down, which reduces port consumption.
Cloud NAT and other Google services
Does Cloud NAT enable access to Google APIs and services?
When you enable Cloud NAT for a subnet's primary IP range, Google Cloud automatically enables Private Google Access. For more information, see Private Google Access interaction.