I once had an issue where, on a Kubernetes cluster on a local machine, Alpine containers wouldn’t work at WeWork offices because:
WeWork had DHCP configured to do a .wework.com suffix check for DNS lookups; they’d presumably mostly fail as non-existent unless you were the unlucky person who decided on www or something as the hostname… (it may’ve been .internal.wework.com, been a while).
The global DNS server for wework.com did some weird thing where it returned an invalid empty IPv6 record or some nonsense.
This combined with the k8s internal DNS broke lookups when using musl.
The musl developers helped me debug this, and concluded (quite reasonably) that they won’t support every clearly broken configuration in the wild. Eventually WeWork fixed either problem 1 or 2, or both, but by then I’d just stopped using Alpine containers at all.
Given their design criteria, it seems like the musl developers are making reasonable decisions. At the same time, their design criteria mean you have much higher chances of having extra real-world problems. Since I generate enough problems with my own code, I’d rather just stick to glibc.
even if you manage to build an image that includes for example numpy, its size will be ~400MB, at which point using Alpine for its small size doesn’t really help much.
Maybe you should try properly isolating your build step and your runtime images like everyone else ^1? Most of that size is almost certainly coming from the build dependencies that aren’t necessary once you’ve compiled the package for your environment. I get that Python devs (and I’m predominantly a Python dev) aren’t used to thinking about compilation, but there are certainly tools to deal with this exact issue and it’s completely unfair to blame the maintainers of a distro and libc for your laziness or naivety.
I don’t think that’s the issue - you quoted part of it. Alpine image is small, but there are other deps that make it extra large, and then you’re not really saving that much. Yes, multi-stage builds help, but not always.
And the main reason I comment is that I don’t think the author is “blaming the maintainers of a distro and libc” for their laziness or naivety, as you put it. They say it themselves explicitly, “I won’t use it because of the DNS issue”. Not, “you suck because DNS over TCP”. The size is just a side problem, especially so for people and teams that potentially do not have the expertise to build multi-stage containers and optimize to the last byte.
Is the problem really Alpine, or musl? I mean, yeah, Alpine uses musl, but it’s even mentioned in the article that DNS over TCP isn’t enabled by design; why not explore that a little more in the article?
It’s a flaw in musl, but using musl outside Alpine is … extremely rare, as far as I can tell.
The real question in my mind is why people continue to use musl and Alpine when it has such a serious flaw in it. Are they unaware of the problem, or do they just not care?
I don’t know that I’d call it a “flaw” rather than a “design choice”.
The DNS APIs in libc (getaddrinfo and gethostbyname) are poorly designed for the task of resolving DNS names (they are blocking and implementation-defined). musl implements these in a simple manner for simple use cases, but for anything more involved the recommendation of the musl maintainers is to use a dedicated DNS resolver library.
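For concreteness, here is a rough sketch of what following that recommendation can look like; c-ares is my choice for the illustration (the musl maintainers don’t prescribe any particular library), the host name is a placeholder, and you would build with -lcares:

/* Non-blocking lookup with c-ares, driven by a plain select() loop. */
#include <ares.h>
#include <netdb.h>
#include <stdio.h>
#include <sys/select.h>
#include <sys/socket.h>

static void on_resolved(void *arg, int status, int timeouts, struct hostent *host) {
    (void)arg; (void)timeouts;
    if (status == ARES_SUCCESS)
        printf("resolved %s\n", host->h_name);
}

int main(void) {
    ares_channel channel;
    ares_library_init(ARES_LIB_INIT_ALL);
    ares_init(&channel);

    /* The query is issued here; the answer arrives via the callback above. */
    ares_gethostbyname(channel, "example.com", AF_INET, on_resolved, NULL);

    for (;;) {
        fd_set readers, writers;
        FD_ZERO(&readers);
        FD_ZERO(&writers);
        int nfds = ares_fds(channel, &readers, &writers);
        if (nfds == 0)
            break;                                  /* nothing left pending */
        struct timeval tv, *tvp = ares_timeout(channel, NULL, &tv);
        select(nfds, &readers, &writers, NULL, tvp);
        ares_process(channel, &readers, &writers);
    }

    ares_destroy(channel);
    ares_library_cleanup();
    return 0;
}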
This article goes into a bit more depth, but at the end of the day I think it’s a reflection of the different philosophy behind musl more generally (which is why I call it a “design choice” instead of a “flaw”).
“Better is different” means people will get mad at you for trying to make things better. :-)
Better is different doesn’t imply that different is better. The getaddrinfo function is the only moderately good way of mapping names to hosts without embedding knowledge of the lookup mechanism in the application. Perhaps a modern Linux system could have a DBUS service to do this, but that would add a lot more to containers (if containers had a sane security model, this is how it would work, code outside the container would do the lookup, and the container would not be able to create sockets except by asking this service, but I digress).
The suggestion to use a DNS library misses the point: DNS should be an implementation detail. The application should not know if the name is resolved via a hosts file, a DNS, WINS, or something custom for micro service deployments. The decision on Alpine means that you need to encode that as custom logic in every program.
The decision on Alpine means that you need to encode that as custom logic in every program.
I think that’s a bit dramatic. Most applications won’t do a query that returns a DNS response bigger than 512 bytes because setting up TCP takes at least three times as long as the UDP exchange, and that pisses off most users, so most sites try to make sure this isn’t necessary to show a website to sell people things, so very very few people outside of the containerverse will ever see it happen.
Most applications just do a gethostbyname and connect to whatever the first thing is. There’s no reason for that to take more than 512 bytes, and so it’s hard to lament: Yes yes, if you want 200 IP addresses for your service, you’ll need more than 512 byte packets, but 100 IP addresses will fit, and I absolutely wonder about the design of a system that wants to use gethostbyname to get more than 100 IP addresses.
The reason why is that gethostbyname isn’t parallel, so an application that wants to use it in a parallel service will need to use threads. Many NSS providers behave badly when threaded, so desktop applications that want to connect to multiple addresses in parallel (e.g. the happy eyeballs protocol used by Chrome, Firefox, curl, etc.) avoid the NSS API completely and either implement DNS directly or use a non-blocking DNS client library.
Most applications won’t do a query that returns a DNS response bigger than 512 bytes
Most software that I’ve written that does any kind of name lookup takes address inputs that are not hard coded into the binary. As a library or application developer, I don’t know the maximum size of a host or domain name that users of my code are going to use. I don’t know if they’re going to use DNS at all, or whether they’re going to use host files, WINS via Samba, or something else. And the entire point of NSS is that I don’t have to know or care. If they want to use some exciting Web3 Blockchain nonsense that was invented after I wrote my code for looking up hosts, they can as long as they provide an NSS plugin. If I have to care about how host names provided by the user are mapped to network addresses as a result of using your libc, your libc has a bug.
Most applications just do a gethostbyname and connect to whatever the first thing is.
Hopefully not, anything written in the last 20 years should be using getaddrinfo and then it doesn’t have to care what network protocol it’s using for the connection. It may be IPv6, it may be something legacy like IPX (in which case the lookup definitely won’t be DNS!), it may be something that hasn’t been invented yet.
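For reference, this is the protocol-agnostic pattern being described, as a minimal sketch (error handling trimmed; host and service are whatever the user supplied):

#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Resolve and connect without caring whether the answer came from DNS, a
   hosts file, or something else, and without caring whether it is IPv4,
   IPv6, or a protocol that doesn't exist yet. */
int dial(const char *host, const char *service) {
    struct addrinfo hints, *res, *ai;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;       /* any address family */
    hints.ai_socktype = SOCK_STREAM;   /* a stream, whatever the transport */

    if (getaddrinfo(host, service, &hints, &res) != 0)
        return -1;

    int fd = -1;
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                     /* connected */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;                         /* -1 if nothing was reachable */
}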
The reason why, is because gethostbyname isn’t parallel, so an application that wants to use it in parallel service will need to use threads.
That is a legitimate concern, and I’d love to see an asynchronous version of getaddrinfo.
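For what it’s worth, glibc does ship an asynchronous variant as a GNU extension, getaddrinfo_a; a minimal sketch, assuming a glibc system, linking with -lanl, and a placeholder host name:

#define _GNU_SOURCE
#include <netdb.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    struct gaicb req;
    memset(&req, 0, sizeof req);
    req.ar_name = "example.com";                         /* placeholder name */

    struct gaicb *list[] = { &req };
    if (getaddrinfo_a(GAI_NOWAIT, list, 1, NULL) != 0)   /* returns immediately */
        return 1;

    /* ... do other useful work here while the lookup runs ... */

    gai_suspend((const struct gaicb * const *)list, 1, NULL);  /* wait for it */
    if (gai_error(&req) == 0) {
        char host[NI_MAXHOST];
        getnameinfo(req.ar_result->ai_addr, req.ar_result->ai_addrlen,
                    host, sizeof host, NULL, 0, NI_NUMERICHOST);
        printf("resolved to %s\n", host);
        freeaddrinfo(req.ar_result);
    }
    return 0;
}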
As a library or application developer, I don’t know the maximum size of a host or domain name that users of my code are going to use.
Yes you do, because we’re talking about Alpine and Alpine use-cases, and in those use-cases DNS is tunnelled into the NSS API. RFC 1035 is clear on this. It’s 250 “bytes”.
There’s absolutely nothing you or any of your users who are using Alpine can do on a LAN serving a single A or AAAA record to get over 512 bytes.
It may be IPv6, it may be something legacy like IPX (in which case the lookup definitely won’t be DNS!),
No it won’t be IPX because we’re talking about Alpine and Alpine use-cases. Alpine users don’t use IPX.
it may be something that hasn’t been invented yet.
No it won’t. That’s not how anything works. First you write the code, then you can use it.
Yes you do, because we’re talking about Alpine and Alpine use-cases
I don’t write my code for Alpine; I write it to work on a variety of operating systems and in a variety of use cases. Alpine breaks it. I would hazard a guess that the amount of code written specifically targeting Alpine, rather than targeting Linux/POSIX and being run on Alpine, is a rounding error above zero.
I do not write my code assuming that the network is IPv4 or IPv6. I do not write my code assuming that the name lookup is a hosts file, DNS, WINS, or any other specific mechanism. I write my code over portable abstractions that let the user select the name resolution mechanism and let the name resolution mechanism select the transport protocol.
No it won’t. That’s not how anything works. First you write the code, then you can use it.
That is literally how the entire Berkeley socket API was designed: to allow code to be written without any knowledge of the network protocol and to move between them as required. This is how you wrote code 20-30 years ago that worked over DECNET, IPX, AppleTalk, or IP. The getaddrinfo function was only added about 20 years ago, so is relatively young, but added host resolution to this. Any code that was written using it and the rest of the sockets APIs was able to move to IPv6 with no modification (or recompile), to support mDNS when that was introduced, and so on.
These APIs were specifically designed to be future proof, so that when a new name resolution mechanism came along (e.g. DNS over TCP), or a new transport protocol, it could be transparently supported. If a new name lookup mechanism using a distributed consensus algorithm instead of hierarchical authority comes along, code using these APIs will work on any platform that decides that name resolution mechanism is sensible. If IPv7 comes along, as long as it offers stream and datagram connections, any code written using these APIs will be able to adopt it as soon as the kernel does, without a recompile.
Can you show a single example of a real-world environment that is broken by what Alpine is doing, and that isn’t some idiot trying to put more than one or two addresses in a response?
I don’t know if I agree or disagree with anything else you’re trying to say. I certainly would never say Alpine is “broken” because its telnet can’t reach IPX hosts on my lan, but you can’t be complaining about that because that’d be moronic. Some of the futuristic protocols you mention sound nice, but they can tunnel their responses in DNS too and will work on Alpine just fine. If you don’t want to use Alpine, don’t use Alpine, but switching to it saved me around 60gb of ram, so I was willing to make some changes to support Alpine. This is not one of the changes I had to make.
You have no good options for DNS on Linux. You can’t static link the glibc resolver, so you can either have your binaries break every time anything changes, or use musl and have very slightly broken DNS.
There are some standalone DNS libraries but they’re enormous and have nasty APIs and don’t seem worth using.
There are a great many things I dislike about glibc, but binary compatibility is one thing that they do exceptionally well. I think glibc was the first library to properly adopt ELF symbol versioning and their docs are what everyone else refers to. If they need to make an ABI-breaking change, they add new versioned symbols and keep the old ones. You can easily run a binary that was created with a 10-year-old glibc with the latest glibc shared object. As I recall, the last time glibc broke backwards binary compat was when they introduced symbol versioning.
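As a concrete illustration of the versioned-symbol mechanism, the GNU toolchain even lets a program pin an older symbol version explicitly, which is the usual trick for building on a new glibc while still producing a binary that runs on older ones. A hedged sketch (the exact version tag is an x86-64 assumption; check what your libc actually exports with objdump -T):

#include <string.h>

/* Bind this translation unit's memcpy references to the old symbol version
   instead of the default (memcpy@GLIBC_2.14 on x86-64). */
__asm__(".symver memcpy, memcpy@GLIBC_2.2.5");

void copy_header(char *dst, const char *src) {
    memcpy(dst, src, 16);   /* resolves to memcpy@GLIBC_2.2.5 at link time */
}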
The glibc resolver is site-specific, which means it’s designed to be customised by the system administrator, and static linking would defeat its primary use-case. It also has nothing to do with DNS except that a popular “fallback” configuration is to try looking up hosts on the Internet if they aren’t managed by the local administrator.
you can either have your binaries break every time anything changes
Nonsense: I upgrade glibc and my existing binaries still work. You’re doing something else wrong.
glibc has a quite stable ABI - it’s a major event when it breaks any sort of backwards compatibility. Sure, it’s not as stable as the Linux userspace ABI, but it’s still extremely rare to encounter breakage.
The real question in my mind is why people continue to use musl and Alpine when it has such a serious flaw in it. Are they unaware of the problem, or do they just not care?
I suppose I don’t care. I might even think of it as an anti-feature: I don’t want my Kubernetes containers asking the Internet where my nodes are. It’s slow, it’s risky, and there’s no point; Kubernetes already has perfect knowledge anyway:
If you bind-mount an /etc/hosts file (or hosts.dbm or hosts.sqlite or whatever) that would be visible instantly to every client. This is a trivial controller that anyone can put in their cluster and it solves this “problem” (if you think it’s a problem) and more:
DNS introduces extra failure-modes I don’t need and don’t want, and having /etc/resolv.conf point to trash allows me to effectively prevent containers from doing DNS. DNS can be used to smuggle information in-and-out of the container, so having a policy of no-DNS-but-Internet-DNS makes audit much easier.
I’ve seen people suggest that installing bind-tools with apk will magically solve the problem, but this doesn’t make sense to me… unless there’s some fallback to using host or dig for DNS lookups… ?
BUT, it’s really odd to me that seemingly so many people use Alpine for smaller containers, but no one has bothered to fix the issue. Have people “moved on”? Is there another workaround people use?
I have a shell script that whips up a hosts file and bind-mounts it into the container. This prevents all DNS lookups (and failure cases), is faster, and allows me to disable DNS on any container that doesn’t need access to the Internet (a cheap way to slow down attackers). It uses kubectl get -w to wait for updates so it isn’t spinning all the time.
I can’t think of a single advantage of Kubernetes abusing DNS for service discovery (maybe there is one with Windows containers or something else I don’t use), but there are substantial performance and security disadvantages, so I don’t even bother with it.
Would like to see some more technical exposition to understand why the DNS issue “can only happen in Kubernetes” and if it’s the fault of musl, or kubernetes, or the DNS nodes that for some reason require TCP. Natanael has a talk about how running musl can help make upstream code better, by catching things that depend on GNU-isms without being labeled as such.
I also wonder where the author gets the confidence to say “if your application requires CGO_ENABLED=1, you will obviously run into issue with Alpine.”
My application requires CGO_ENABLED=1, and I ran into this issue with Alpine: https://github.com/golang/go/issues/13492
TLDR: Cgo + musl + shared objects = a bad time
That’s really more of a reliance on glibc rather than a problem with musl. musl is explicitly not glibc.
Not sure if it’s fixed, but there were other gotchas in musl related to shared libraries last time I looked. Their dlclose implementation is a no-op, so destructors will not run when you think you’ve unloaded a library, which can cause subtly wrong behaviour including memory leaks and state corruption. I hacked a bit on musl for another project a couple of years ago and it felt like a perfect example of the 90:10 rule: they implement the easy 90% without really understanding why the remaining difficult 90% is there and why people need it.
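To make the dlclose behaviour concrete, here is a small sketch (two files with hypothetical names; on glibc the destructor fires at dlclose, while with musl’s no-op dlclose it only fires at process exit):

/* plugin.c -- build with: cc -shared -fPIC plugin.c -o plugin.so */
#include <stdio.h>

__attribute__((destructor)) static void on_unload(void) {
    puts("plugin destructor ran");
}

/* main.c -- build with: cc main.c -ldl */
#include <dlfcn.h>
#include <stdio.h>

int main(void) {
    void *h = dlopen("./plugin.so", RTLD_NOW);
    if (!h) {
        fprintf(stderr, "%s\n", dlerror());
        return 1;
    }
    dlclose(h);
    /* glibc: "plugin destructor ran" has already been printed here.
       musl:  dlclose is a no-op, so the destructor runs only at exit
              and the library stays mapped in the meantime. */
    puts("after dlclose");
    return 0;
}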
Oh, and on x86 platforms they ship a spectacularly bad assembly memcpy that performs worse than a moderately competent C one on any size under about 300 bytes (around 90% of memcpy calls, typically).
The result is the same; my app, which really isn’t doing anything unusual in its Go or C bits, can’t be built on a system that uses musl.
Yes, but I suspect you could more accurately say that it doesn’t work on a system that doesn’t use glibc.
It works fine on macOS, no glibc there.
Natanael has a talk about how running musl can help make upstream code better, by catching things that depend on GNU-isms without being labeled as such.
Expecting getaddrinfo to work reliably isn’t a GNUism, it’s a POSIXism. Code that uses it to look up hosts that require DNS over TCP to resolve will work on GNU/Linux, Android, Darwin, *BSD, and Solaris.
So is it in the POSIX standard?
They happen in Kubernetes if you use DNS for service discovery.
On the Internet, DNS uses UDP. RFC 1123 was really clear about that. It could use TCP, but Internet hosts typically didn’t because DNS responses that don’t fit in one packet require more than one packet, and that takes more time leading to a lower-quality experience, so people just turned it off. How much time depends mostly on the speed of light and the distance the packets need to travel, so we can use a random domain name to measure circuit length:
$ time host foiioj.google.com
Host foiioj.google.com not found: 3(NXDOMAIN)
real 0m0.103s
user 0m0.014s
sys 0m0.015s
Once being “off” was ubiquitous, DNS client implementations started showing up that didn’t bother with the TCP code that they would never use, and musl is one of these.
Kubernetes (ab)uses the DNS protocol for service discovery in most reference implementations, but the distance between nodes is typically much less than 1000 miles or so, so you aren’t going to notice the time-delay so much between one packet and five. As a result, when something goes wrong, people blame the one component that isn’t in most of those reference implementations (in this case, musl).
I use /etc/hosts for service discovery (and a shell script that builds it for all the containers from the output of kubectl get …) which is faster still, and reduces the number of partitions which can make tracking down some intermittent problems easier.
Natanael has a talk about how running musl can help make upstream code better, by catching things that depend on GNU-isms without being labeled as such.
This is a good point: If your application calls gethostbyname or something, what’s it going to do with more than 512 bytes of output? The most common reason seems to be people who use DNS to get everything implementing a service or sharing a label. Some of those are just displaying the list (on say a service dashboard), and for them, why not just ask the Kubernetes REST API? Who knows.
But others are doing this because they don’t know any better: If you get five responses and are only going to connect() to one you’ve made a design mistake and you might not notice unless you use Alpine!
depends mostly on the speed of light and the distance the packets need to travel
This reminded me of an absolute gem and must-read story from ancient computer history, of how people can’t send emails to people more than 520 miles away.
https://web.mit.edu/jemorris/humor/500-miles
That’s a fun story.
You can use TCP_INFO to extract the RTT between the parts of the TCP handshake and use it to make firewall rules that block connections from too far away.
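A minimal sketch of that idea (Linux-specific; the 25 ms cut-off is an arbitrary number for illustration, not something from this thread):

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Smoothed RTT of a connected TCP socket, in microseconds (-1 on error). */
static long rtt_micros(int fd) {
    struct tcp_info info;
    socklen_t len = sizeof info;
    if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &info, &len) != 0)
        return -1;
    return (long)info.tcpi_rtt;
}

/* Example policy: treat anything slower than 25 ms as "too far away". */
static int too_far(int fd) {
    long rtt = rtt_micros(fd);
    return rtt < 0 || rtt > 25 * 1000L;
}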
This works well for me since I generally know (geographically) where I am and where I will be, but people attacking my systems are going to be anywhere, probably on a VPN which hides their location (and makes their RTT longer).
This is a good opportunity to drop this for further reading: Using Alpine can make Python Docker builds 50× slower.
TL;DR Alpine is fine to use when running some code in a pipeline but beware of using it as a base.
Haha, @itamarst, are you really going to let someone else promote your blog and not do it yourself? :-)
Haha, I think Itamar knows I pass PythonSpeed.com articles around my communities frequently!
musl does support DNS-over-TCP: https://git.musl-libc.org/cgit/musl/commit/?id=51d4669fb97782f6a66606da852b5afd49a08001
The world appears split down the line of “Thrilled to understand the entire stack” and “Too busy to understand the entire stack.”
The complain-about-alpine posts all seem to rag on the topics of “My code won’t act the same on both platforms” forgetting that, yes, muslc is a different platform. Some stuff won’t work the same way, some proprietary stuff won’t either… one point of containerization is containing the “badness.” If you need to use glibc for some dependency… go ahead! Use a glibc-based container! This is not a problem and the solution is a feature.
But if you don’t actually need glibc, if you can take the time to figure out why it is your code has a dependency on an unrelated libc, or that libc’s non-standard behaviors (musl is unapologetic about its adherence to POSIX), then your code will likely be more portable and better for it. Realize that your code doesn’t work on the platform, not “the platform doesn’t work.”
This comment written using firefox on Void-Linux x86_64-musl.
I’m looking for a Linux distro to switch from increasingly commercial Ubuntu (ads in motd, ads in apt, apt forcibly replaced with the snap fiasco where I never know if I should use --classic without trial and error). Ideally, I’d like to have the same system on my laptop, VMs and containers. For me, systemd is a downside for Debian and I’ve tried using Alpine recently but it was too time consuming.
After creating a Dockerfile for an Alpine-based container with Python, PyTorch and a few other things, PyTorch would not import. There’s an over-a-year-old thread[1] about this on Stack Overflow with the same error. Apparently, pthread_attr_setaffinity_np is not available on Alpine. However, a Python module that attempts to use it exists.
I don’t need bleeding-edge but definitely want to avoid being forced to use outdated software when the newest versions work OK. For now I find Alpine inefficient for my work. I think I’ll end up with Debian.
[1] https://stackoverflow.com/questions/70740411/cannot-import-pytorch-in-alpine-docker-container
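For context, this is the GNU extension the wheel is tripping over; a small sketch of how it is normally used on glibc (pinning a new thread to CPU 0), and, per the comment above, the function is simply absent on Alpine/musl:

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static void *work(void *arg) { (void)arg; return NULL; }

int main(void) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);

    pthread_attr_t attr;
    pthread_attr_init(&attr);
    /* GNU-specific (the _np suffix means non-portable); not in POSIX. */
    pthread_attr_setaffinity_np(&attr, sizeof set, &set);

    pthread_t t;
    pthread_create(&t, &attr, work, NULL);
    pthread_join(t, NULL);
    pthread_attr_destroy(&attr);
    puts("thread ran pinned to CPU 0");
    return 0;
}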
What about GUIX? Being a GNU project, it’s decidedly non-commercial, and there’s plenty of active development so you won’t get outdated packages unless you pin them, and you can easily mix and match different versions. Unlike NixOS (which is tightly linked to systemd), it uses its own init system (GNU shepherd).
Best of all, if you use derivations in your dev projects you won’t even need containers and VMs. But it should run inside those just as well, if you want to use them.
Nit: it’s “Guix”, not “GUIX”. I’ve seen this mistake made a lot – curious, where did you pick up that spelling from? (I don’t mean to criticize you at all for this, FWIW.)
I don’t really know. Maybe it’s due to some link to UNIX or because GNU is all caps too?
What about Void? You can get glibc, it’s rolling, it’s not bleeding edge.
Disclaimer: I have the Void Maintainer hat.
Use Debian. Yes, it has systemd, so what? I’ve used Debian for many years, and I’m happy. (Sometimes there are some driver issues, wi-fi, hibernation, but this is the same for every distro.) Systemd is cool. It allows me to create a systemd unit file, start it, and then query systemd about whether the service is started. Systemd keeps state. Compare this with sysvinit.
If you really-really hate systemd, consider devuan.
Also, keep in mind one particular problem with Debian (possibly applies to all distros): when you run a new Debian release in Docker on an old Debian release, sometimes everything breaks, such as here: https://github.com/debuerreotype/docker-debian-artifacts/issues/122 . Just add "seccomp-profile": "/etc/docker/seccomp.json" to /etc/docker/daemon.json and put {} in /etc/docker/seccomp.json (I can add more details).
A reminder: you often don’t need much of the OS at all if you link statically or include the libraries you absolutely need in your container.
Buildah (unlike Docker) makes it trivial to start with an empty base image. This lets you put the absolute minimum that you need in the container. It also doesn’t create container layers implicitly, so you can do the pip command to build all of the things you need, then the pip command to remove the toolchain and any .o files, and not end up with a load of layers that add the temporary things. And, my personal favourite, it has a copy command that copies from another container, so you can create a container that contains the build tools, build the thing, and then create a new container and (with a single command) copy from the build container to the deployment one.
Isn’t that identical to what you’d do with multi-stage docker builds? FROM ubuntu AS builder, FROM scratch, COPY --from=builder source target?
Ah, you’re right. I misremembered, the missing thing in Docker is the opposite of that: copying out of a container. I don’t think that’s possible with the Docker model, where the Dockerfile is evaluated inside the new container. It is with buildah, where the equivalent is a script that runs outside. I use this a lot for building statically linked tools that I run outside of containers (especially ocaml things that scatter things everywhere during the build), so I can build in a clean environment and then throw it away at the end.
Yeah, Dockerfiles are made with the intention of the image being the artefact. To achieve this with docker you would have to actually run the container and use mounts to copy files out of it.
Afair, at one of my previous jobs we used docker, tini, multi stage builds and just copied the resulting binary plus any required shared libs. It was pretty easy and resulted in minimal images.
“why I will never use Alpine Linux ever again”: Because there is no possibility that this problem will ever get fixed
</s>
https://git.musl-libc.org/cgit/musl/commit/?id=51d4669fb97782f6a66606da852b5afd49a08001
Exactly! ah, looks like the </s> was stripped from my post…
Just add the glibc packages and most is well.
“Only an issue for 3.3 or earlier” but I had the problem with 3.16
Must be a typo there, either meant later or typo is in the version numbers.
16 > 3
Maybe it was a typo, and they meant 3.33?
The biggest appeal of Alpine is its small size, so if you really care about that, then Wolfi (e.g. cgr.dev/chainguard/wolfi-base is just 12MB) or Distroless are good choices.
I have never used either one, but just spent a little time evaluating whether I wanted to try distroless. I kind of felt like I was signing up for the same class of headache there that I get with alpine. The complete absence of a shell, for one thing in particular, feels like it could bite in a similar way to musl. I find alpine very appealing and find distroless similarly so, but I decided that paring down a debian-based image was a better use of my time that was less likely to result in hard-to-troubleshoot runtime behavior later.
#shuddausednix