So you think you understand IP fragmentation?
Posted Feb 7, 2024 18:36 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
Parent article: So you think you understand IP fragmentation?
In my perfect world, I'd have added two fields in the IP header, one for the forward MTU and one for the reflected MTU. Each router inspects the forward MTU and replaces it with its own MTU, if it's less than the one that is already there. Then the target host simply copies the resulting value into the "reflected MTU" field and sends it back with the next reply.
Done. MTU can be easily discovered within just one RTT.
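A minimal simulation of the forward/reflected MTU idea. It assumes two hypothetical header fields (`forward_mtu`, `reflected_mtu`) that do not exist in real IPv4 or IPv6, and it models each hop only by its egress link MTU:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    # Hypothetical fields from the proposal; not part of any real IP header.
    forward_mtu: int
    reflected_mtu: int = 0


def traverse(packet: Packet, link_mtus: List[int]) -> Packet:
    """Each hop lowers forward_mtu to its egress link MTU if that is smaller."""
    for mtu in link_mtus:
        packet.forward_mtu = min(packet.forward_mtu, mtu)
    return packet


def reply(received: Packet, local_mtu: int) -> Packet:
    """The destination copies the observed forward MTU into its next reply."""
    return Packet(forward_mtu=local_mtu, reflected_mtu=received.forward_mtu)


# Sender and receiver both sit on 9000-byte links; the forward path
# crosses a 1492-byte hop, the return path only a 1500-byte one.
request = traverse(Packet(forward_mtu=9000), [9000, 1500, 1492, 9000])
response = traverse(reply(request, local_mtu=9000), [9000, 1500, 9000])
print(response.reflected_mtu)  # 1492: path MTU from sender to receiver
print(response.forward_mtu)    # 1500: path MTU from receiver to sender
```

Because each direction carries its own forward field, an asymmetric path gets a separate answer for each direction after a single round trip.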
Also, instead of fragmenting the packet or sending an ICMP Packet Too Big message, the routers should just truncate the packet and let it reach the destination. No need for ICMP.
Sadly, this is now all just random musings. IPv6 is a failure set in stone.
Posted Feb 7, 2024 19:01 UTC (Wed)
by vadim (subscriber, #35271)
That sounds like a recipe for breaking almost everything.
Old-fashioned protocols like cleartext SMTP will suddenly have bizarre failures, and parts of emails will randomly vanish into the ether.
Other protocols like SSL and SSH will return obscure errors nobody has seen before, or exhibit behaviors like waiting forever for data that doesn't arrive.
Downloads will be randomly corrupted.
IoT and other restricted devices will malfunction in very hard to debug ways.
Posted Feb 7, 2024 20:30 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
On the other hand, in-band MTU signalling would allow VPN protocols to more easily identify the flow that needs corrective actions (MTU clamping).
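The corrective action itself is simple arithmetic. A sketch of TCP MSS clamping, assuming option-free IP and TCP headers; the 80-byte overhead in the example corresponds to WireGuard over IPv6 (40 IPv6 + 8 UDP + 32 WireGuard), and `clamped_mss` is a hypothetical helper:

```python
IPV4_HEADER = 20  # IPv4 header without options
IPV6_HEADER = 40  # fixed IPv6 header, no extension headers
TCP_HEADER = 20   # TCP header without options


def clamped_mss(link_mtu: int, tunnel_overhead: int, ipv6: bool = False) -> int:
    """MSS that fits inside the tunnel: inner MTU minus inner IP and TCP headers."""
    inner_mtu = link_mtu - tunnel_overhead
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return inner_mtu - ip_header - TCP_HEADER


print(clamped_mss(1500, 80))             # 1380 for inner IPv4 flows
print(clamped_mss(1500, 80, ipv6=True))  # 1360 for inner IPv6 flows
```

A clamping VPN rewrites the MSS option in TCP SYNs down to this value; an in-band MTU signal would tell it which flows actually need that.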
Posted Feb 7, 2024 22:04 UTC (Wed)
by shemminger (subscriber, #5739)
Posted Feb 7, 2024 22:07 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
Posted Feb 7, 2024 23:04 UTC (Wed)
by jengelh (subscriber, #33263)
And it so happens that IPv6 has done away not only with (v4-style auto)fragmentation but also with the IP-level checksum, so YMMV.
Posted Feb 8, 2024 0:44 UTC (Thu)
by pizza (subscriber, #46)
As long as you don't truncate an IPv6 packet to less than its header length, truncation isn't going to cause processing problems.
Meanwhile, the IP payload (e.g. TCP or UDP) already provides its own checksum that will fail if it gets truncated.
Either way, truncating the packet isn't going to allow the application to receive garbage data.
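A sketch of why truncation trips the transport checksum, using the RFC 1071 Internet checksum that TCP and UDP use. For brevity it checksums only the payload; the real TCP/UDP checksum also covers a pseudo-header containing the addresses and the upper-layer length:

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071: one's-complement sum of 16-bit big-endian words, complemented."""
    if len(data) % 2:
        data += b"\x00"  # odd-length input is padded with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


def verify(data: bytes, checksum: int) -> bool:
    """Receiver side: data plus its checksum must yield a checksum of 0."""
    if len(data) % 2:
        data += b"\x00"  # keep the appended checksum word-aligned
    return internet_checksum(data + checksum.to_bytes(2, "big")) == 0


payload = bytes(range(1, 201))
checksum = internet_checksum(payload)
print(verify(payload, checksum))        # True
print(verify(payload[:100], checksum))  # False: truncated in transit
```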
Posted Feb 9, 2024 13:52 UTC (Fri)
by smurf (subscriber, #17840)
However, the UDP header also contains … surprise … a length. As long as you don't send IPv6 jumbograms (length word: zero), you're thus still safe there.
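A sketch of that receiver-side check, assuming the truncating router has updated the IPv6 Payload Length so that only the UDP Length field still reflects the original size; `udp_truncated` is a hypothetical helper:

```python
import struct


def udp_truncated(datagram: bytes) -> bool:
    """True if fewer bytes arrived than the UDP Length field claims."""
    if len(datagram) < 8:
        return True  # not even a complete UDP header
    _src_port, _dst_port, length, _checksum = struct.unpack("!HHHH", datagram[:8])
    if length == 0:
        # RFC 2675 jumbogram: the real length lives in the Jumbo Payload
        # option of the Hop-by-Hop header, which this sketch does not parse.
        raise ValueError("jumbogram: UDP Length is zero")
    return len(datagram) < length


body = b"x" * 100
datagram = struct.pack("!HHHH", 5000, 53, 8 + len(body), 0) + body
print(udp_truncated(datagram))       # False
print(udp_truncated(datagram[:60]))  # True
```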
Posted Feb 9, 2024 14:20 UTC (Fri)
by farnz (subscriber, #17727)
Even IPv6 jumbograms have a length field, in the Jumbo Payload hop-by-hop header. This is a 32-bit number instead of a 16-bit number, but if you have the full IPv6 header, you'll get a length field to work with (either 16 bits, if not jumbo-sized, or 32 if jumbo-sized).
Posted Feb 8, 2024 8:55 UTC (Thu)
by intelfx (subscriber, #130118)
It’s a neat idea, but does it justify paying the extra space cost in each and every packet?
Posted Feb 8, 2024 13:13 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
Correctly implementing this mechanism would also unlock larger packets. We would no longer be limited to 1500 bytes.
Posted Feb 8, 2024 13:20 UTC (Thu)
by paulj (subscriber, #341)
Posted Feb 8, 2024 21:33 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
Posted Feb 9, 2024 11:09 UTC (Fri)
by paulj (subscriber, #341)
Posted Feb 9, 2024 11:50 UTC (Fri)
by intelfx (subscriber, #130118)
I thought that we are limited to 1500 bytes because the Internet equipment does not support / is not configured to support larger packets. Even if you implement perfect discovery, that won't magically fix the equipment, no?
Posted Feb 9, 2024 12:22 UTC (Fri)
by farnz (subscriber, #17727)
We have a chicken-and-egg situation; the equipment is not configured to support larger packets because path MTU detection is not reliable, fragmentation is not reliable, and the failure case if both of those don't work is random failures of higher level protocols like TCP. This means that there is no reason for anyone to support a larger MTU on Internet-facing equipment, since you're likely to have issues where a required path between two points drops large MTU packets.
In addition, if your MTU is too large, you will frequently experience an RTT delay where something on the path sends an ICMP Too Big your way, and you have to reduce the detected path MTU; in Cyberax's proposal, you can determine the current path MTU with small packets (like those used to establish a TCP connection), and thus not pay that penalty unless the path is changing from larger MTU to smaller MTU during the lifetime of a single connection.
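A toy model of that penalty, assuming classic ICMP-driven path MTU discovery in which every Packet Too Big message actually arrives and reports the reporting hop's MTU. Each message costs the sender an extra (partial) round trip, whereas the in-band scheme would need none; `pmtud` is a hypothetical helper:

```python
from typing import List, Tuple


def pmtud(sender_mtu: int, hop_mtus: List[int]) -> Tuple[int, int]:
    """Return (discovered path MTU, number of Packet Too Big messages received)."""
    mtu, too_big = sender_mtu, 0
    while True:
        # The first hop whose link cannot carry a packet of the current size
        # drops it and reports its own MTU back to the sender.
        bottleneck = next((h for h in hop_mtus if h < mtu), None)
        if bottleneck is None:
            return mtu, too_big
        mtu, too_big = bottleneck, too_big + 1


print(pmtud(9000, [9000, 4470, 1500, 1492]))  # (1492, 3)
```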
And it's worth noting that we already have examples of devices where there's a large MTU at PHY level, and we aggregate MAC level packets to fill a single PHY packet; it's called WiFi. Having a good way to handle variable MTU (which would include WiFi APs being able to change the path MTU on you, because the client has moved) would reduce overheads. But this needs not just Cyberax's idea, but also a change from switching Ethernet frames to routing IP packets everywhere, so that the WiFi AP is expected to know about per-device MTUs.
Posted Feb 9, 2024 16:06 UTC (Fri)
by paulj (subscriber, #341)
Posted Feb 9, 2024 17:12 UTC (Fri)
by farnz (subscriber, #17727)
I last saw fragmentation being reliable in the 1980s; my experience ever since then with IPv4 is that it's hugely unreliable, because all sorts of entities use middleboxes that drop all fragments (rather than forwarding them, or reassembling then forwarding), and thus it's basically useless.
Posted Feb 9, 2024 17:21 UTC (Fri)
by paulj (subscriber, #341)
The data-plane level support was fine, though, until the IETF moved to deprecate it, and then of course vendors followed.
Posted Feb 9, 2024 17:34 UTC (Fri)
by farnz (subscriber, #17727)
My experience was that they were always present, and became more and more of an issue throughout the 90s, until they were basically making fragmentation unusable unless the path was either between two academic institutions or between my ISP at the time and an academic institution.
Additionally, long before middleboxes became widespread, the dataplane support already sucked; there were plenty of Cisco routers that could do forwarding in hardware, but did fragmentation in software on a slow path. Not a problem from home, where my modem was the bottleneck, but a very noticeable issue when at an academic institution where the "wrong" MTU could bring speeds down from megabits per second to tens of kilobits per second.
Posted Feb 9, 2024 17:39 UTC (Fri)
by paulj (subscriber, #341)
Slow but working beats the mess we have today: we will never be able to default to MTUs above 1500, even at 1500 we still don't have reliable networking (VPNs, etc.), and because of that the awesome networking tool of encapsulation is restricted in utility.
Posted Feb 9, 2024 17:40 UTC (Fri)
by paulj (subscriber, #341)
Posted Feb 9, 2024 17:58 UTC (Fri)
by farnz (subscriber, #17727)
By 1990, it was already the case IME that communication was not possible if there were smaller MTUs in the path, unless you were lucky enough to have a path where everything was run by sensible netadmins (usually true of academia), or you were on dial-up (where you had the bottleneck MTU).
And one of the many issues back then was routers with multi-MTU paths that were configured explicitly to not fragment packets because it could overload the CPU; packets were either pre-fragmented, or were dropped. Add in people configuring routers to drop fragments "because security" (which got worse after the ping of death vulnerability was discovered, since that depended on buggy fragment handling), and fragmentation became useless.
The IETF, by limiting fragmentation to the endpoints, were reacting to the state of play in 1990, where many routers already didn't fragment, but dropped packets that were too big.
Posted Feb 9, 2024 16:11 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
Not necessarily. There are two ways this can be done without significant changes:
1. APs can just update the "forward MTU" field in IP packets, they don't need to be full routers for this. Yeah, it's a layering violation, but I doubt that people care too much about that sort of thing anymore.
2. MTU can be added to ARP/ND directly. So the sender will discover the L2 MTU of the destination when it does the initial L2 discovery. WiFi APs are responsible for ARP/ND already, so it even fits in well into the "proper" layered model.
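IPv6 Neighbor Discovery already defines an MTU option (RFC 4861, option type 5), though today it is carried in Router Advertisements and applies to the whole link; ARP has no such field. Per-neighbor use, as suggested above, would be an extension. A sketch of the existing wire format, with hypothetical helper names:

```python
import struct

ND_OPT_MTU = 5  # RFC 4861 MTU option type


def encode_mtu_option(mtu: int) -> bytes:
    # Type (1 byte), Length in 8-octet units (1 byte), Reserved (2 bytes), MTU (4 bytes)
    return struct.pack("!BBHI", ND_OPT_MTU, 1, 0, mtu)


def decode_mtu_option(opt: bytes) -> int:
    opt_type, length, _reserved, mtu = struct.unpack("!BBHI", opt[:8])
    if opt_type != ND_OPT_MTU or length != 1:
        raise ValueError("not an ND MTU option")
    return mtu


print(encode_mtu_option(9000).hex())  # 0501000000002328
```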
Also, how do APs handle jumbo frames? I need to do some experiments...
Posted Feb 9, 2024 17:03 UTC (Fri)
by pizza (subscriber, #46)
Based on my (admittedly not recent) experience, they don't handle it well. Indeed, despite WiFi nominally supporting 2300ish-byte MTUs, APs routinely fail with anything over 1500 bytes. Because reasons.
(That is the main reason why I reverted back to 1500-byte MTUs on my home networks....)
Posted Feb 9, 2024 17:22 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
I have 9k MTU within my home network (and I get 10500 megabits over 10GB connections), and so far my WiFi has been behaving pretty well. I can get 1.5GBps download and 2.5GBps upload.
Posted Feb 9, 2024 16:05 UTC (Fri)
by paulj (subscriber, #341)
See my blog link in another comment in this article. It has quotes from an early paper on TCP/IP from Kahn and Cerf explaining why it is important to have a reliable network mechanism to allow networks with different MTUs to interoperate. Unfortunately, we - collectively - failed to heed their wise words.
A reliable mechanism needs to be in-band. E.g., data-plane fragmentation. Side-band end-host solutions - i.e., relying on ICMP messages - have proven to be fragile. Pure end-host probing (i.e. Path-MTU Discovery, in protocol or out) is also inefficient, temporally unreliable, and fragile.
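A sketch of the in-network IPv4 fragmentation that routers historically performed. It tracks only the fragment offset and the More Fragments flag, ignoring IP options, the DF bit, and the other header fields; `fragment` and `reassemble` are hypothetical helpers:

```python
from typing import List, Tuple

Fragment = Tuple[int, bool, bytes]  # (offset in 8-byte units, More Fragments flag, data)


def fragment(payload: bytes, mtu: int, header_len: int = 20) -> List[Fragment]:
    """Split an IPv4 payload so that each fragment plus its header fits in mtu."""
    max_data = (mtu - header_len) // 8 * 8  # non-final fragments must be multiples of 8
    if max_data <= 0:
        raise ValueError("MTU too small for the IP header")
    fragments = []
    for start in range(0, len(payload), max_data):
        more = start + max_data < len(payload)
        fragments.append((start // 8, more, payload[start:start + max_data]))
    return fragments


def reassemble(fragments: List[Fragment]) -> bytes:
    """Rebuild the payload from fragments in any order (no gap or overlap checks)."""
    ordered = sorted(fragments, key=lambda f: f[0])
    if ordered[-1][1]:
        raise ValueError("last fragment still has More Fragments set")
    return b"".join(data for _, _, data in ordered)


payload = bytes(4000)
frags = fragment(payload, mtu=1500)
print([(off, more, len(data)) for off, more, data in frags])
# [(0, True, 1480), (185, True, 1480), (370, False, 1040)]
```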
Posted Feb 9, 2024 17:00 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
Except for WiFi. Its PHY MTU is just 2300 bytes.
Posted Feb 9, 2024 17:40 UTC (Fri)
by farnz (subscriber, #17727)
That's the MAC MTU; the PHY MTU can be as large as 2,097,148 bytes in 802.11ac networks (noting that the PHY MTU depends both on static parameters like channel width, but also dynamic parameters like time it will take to transmit the frame). For 802.11ax, the PHY MTU is permitted to go as high as 6,500,631 bytes. Even as early as 802.11n (in 2009), the PHY MTU was allowed to go as large as 65,536 bytes under good conditions.
This is made useful with a much smaller MAC MTU by having aggregation options, so that a single PHY frame contains many MAC frames; the downside is that there is overhead for each and every MAC frame in the PHY frame, which would go down if the MAC frames were larger. There would still be overhead mapping MAC frames into PHY frames, so you wouldn't have as large a MAC MTU as the PHY MTUs, but there would be large MTUs involved.
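A back-of-the-envelope sketch of that overhead argument, assuming a hypothetical fixed 40 bytes of overhead per MAC frame inside an aggregate and ignoring PHY preamble and airtime costs entirely; `aggregation_efficiency` is a hypothetical helper:

```python
import math


def aggregation_efficiency(total_payload: int, mac_mtu: int, per_frame_overhead: int) -> float:
    """Fraction of an aggregate's bytes that carry payload."""
    n_frames = math.ceil(total_payload / mac_mtu)
    return total_payload / (total_payload + n_frames * per_frame_overhead)


# 60,000 bytes of payload packed into one PHY frame with different MAC MTUs.
for mac_mtu in (1500, 2304, 9000):
    print(mac_mtu, round(aggregation_efficiency(60_000, mac_mtu, 40), 4))
```

Fewer, larger MAC frames mean fewer per-frame overheads in the same PHY frame, which is the saving described above.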
And truncated packets show up as other types of errors in counters.