LCE: The failure of operating systems and how we can fix it
The abstract of Glauber Costa's talk at LinuxCon Europe 2012 started with the humorous note "I once heard that hypervisors are the living proof of operating system's incompetence". Glauber acknowledged that hypervisors have indeed provided a remedy for certain deficiencies in operating system design. But the goal of his talk was to point out that, for some cases, containers may be an even better remedy for those deficiencies.
Operating systems and their limitations
Because he wanted to illustrate the limitations of traditional UNIX systems that hypervisors and containers have been used to address, Glauber commenced with a recap of some operating system basics.
In the early days of computing, a computer ran only a single program. The problem with that mode of operation is that valuable CPU time was wasted when the program was blocked because of I/O. So, Glauber noted, "whatever equivalent of Ingo Molnar existed back then wrote a scheduler" in order that the CPU could be shared among processes; thus, CPU cycles were no longer wasted when one process blocked on I/O.
A later step in the evolution of operating systems was the addition of virtual memory, so that (physical) memory could be more efficiently allocated to processes and each process could operate under the illusion that it had an isolated address space.
However, nowadays we can see that the CPU scheduling and virtual memory abstractions have limitations. For example, suppose you start a browser or another program that uses a lot of memory. As a consequence, the operating system will likely start paging out memory from processes. However, because the operating system makes memory-management decisions at a global scope, typically employing a least recently used (LRU) algorithm, it can easily happen that excessive memory use by one process will cause another process to suffer being paged out.
There is an analogous problem with CPU scheduling. The kernel allocates CPU cycles globally across all processes on the system. Processes tend to use as much CPU as they can. There are mechanisms to influence or limit CPU usage, such as setting the nice value of a process to give it a relatively greater or lesser share of the CPU. But these tools are rather blunt. The problem is that while it is possible to control the priority of individual processes, modern applications employ groups of processes to perform tasks. Thus, an application that creates more processes will receive a greater share of the CPU. In theory, it might be possible to address that problem by dynamically adjusting process priorities, but in practice this is too difficult, since processes may come and go quite quickly.
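By way of a rough sketch of how blunt those per-process knobs are, an administrator could manually deprioritize every process of a hypothetical application (the name "bigapp" is made up for illustration):
$ renice -n 10 -p $(pgrep bigapp)
But, as noted above, an application that simply creates more processes still ends up with a larger aggregate share, so the adjustment would have to be repeated every time the set of processes changes.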
The other side of the resource-allocation problem is denial-of-service attacks. With traditional UNIX systems, local denial-of-service attacks are relatively easy to perpetrate. As a first example, Glauber gave the following small script:
$ while true; do mkdir x; cd x; done
This script will create a directory structure that is as deep as possible. Each subdirectory "x" will create a dentry (directory entry) that is pinned in non-reclaimable kernel memory. Such a script can potentially consume all available memory before filesystem quotas or other filesystem limits kick in, and, as a consequence, other processes will not receive service from the kernel because kernel memory has been exhausted. (One can monitor the amount of kernel memory being consumed by the above script via the dentry entry in /proc/slabinfo.)
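For instance, the growth in pinned kernel objects can be watched from a second terminal while the loop runs; a minimal sketch (the field layout of /proc/slabinfo varies between kernel versions, and reading it may require root):
$ watch -n 1 "grep dentry /proc/slabinfo"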
Fork bombs create a similar kind of problem that affects unrelated processes on the system. As Glauber noted, when an application abuses system resources in these ways, then it should be the application's problem, rather than being everyone's problem.
Hypervisors
Hypervisors have been the traditional solution to the sorts of problems described above; they provide the resource isolation that is necessary to prevent those problems.
By way of an example of a hypervisor, Glauber chose KVM. Under KVM, the Linux kernel is itself the hypervisor. That makes sense, Glauber said, because all of the resource isolation that should be done by the hypervisor is already done by the operating system. The hypervisor has a scheduler, as does the kernel. So the idea of KVM is to simply re-use the Linux kernel's scheduler to schedule virtual machines. The hypervisor has to manage memory, as does the kernel, and so on; everything that a hypervisor does is also part of the kernel's duties.
There are many use cases for hypervisors. One is simple resource isolation, so that, for example, one can run a web server and a mail server on the same physical machine without having them interfere with one another. Another use case is to gather accurate service statistics. Thus, for example, the system manager may want to run top in order to obtain statistics about the mail server without seeing the effect of a database server on the same physical machine; placing the two servers in separate virtual machines allows such independent statistics gathering.
Hypervisors can be useful in conjunction with network applications. Since each virtual machine has its own IP address and port number space, it is possible, for example, to run two different web servers that each use port 80 inside different virtual machines. Hypervisors can also be used to provide root privilege to a user on one particular virtual machine. That user can then do anything they want on that virtual machine, without any danger of damaging the host system.
Finally, hypervisors can be used to run different versions of Linux on the same system, or even to run different operating systems (e.g., Linux and Windows) on the same physical machine.
Containers
Glauber noted that all of the above use cases can be handled by hypervisors. But, what about containers? Hypervisors handle these use cases by running multiple kernel instances. But, he asked, shouldn't it be possible for a single kernel to satisfy many of these use cases? After all, the operating system was originally designed to solve resource-isolation problems. Why can't it go further and solve these other problems as well by providing the required isolation?
From a theoretical perspective, Glauber asked, should it be possible for the operating system to ensure that excessive resource usage by one group of processes doesn't interfere with another group of processes? Should it be possible for a single kernel to provide resource-usage statistics for a logical group of processes? Likewise, should the kernel be able to allow multiple processes to transparently use port 80? Glauber noted that all of these things should be possible; there's no theoretical reason why an operating system couldn't support all of these resource-isolation use cases. It's simply that, historically, operating systems were not built with these requirements in mind. The only notable use case above that couldn't be satisfied is for a single kernel to run a different kernel or operating system.
The goal of containers is, of course, to add the missing pieces that allow a kernel to support all of the resource-isolation use cases, without the overhead and complexity of running multiple kernel instances. Over time, various patches have been made to the kernel to add support for isolation of various types of resources; further patches are planned to complete that work. Glauber noted that although all of those kernel changes were made with the goal of supporting containers, a number of other interesting uses had already been found (some of these were touched on later in the talk).
Glauber then looked at some examples of the various resource-isolation features ("namespaces") that have been added to the kernel. Glauber's first example was network namespaces. A network namespace provides a private view of the network for a group of processes. The namespace includes private network devices and IP addresses, so that each group of processes has its own port number space. Network namespaces also make packet filtering easier, since each group of processes has its own network device.
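What that looks like from userspace can be sketched with the ip(8) tool's netns support; the namespace name "web" below is arbitrary, and the commands require root:
# ip netns add web
# ip netns exec web ip link set lo up
# ip netns exec web ip addr show
A web server started under "ip netns exec web" would then bind its own port 80, independent of any server running in another namespace.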
Mount namespaces were one of the earliest namespaces added to the kernel. The idea is that a group of processes should see an isolated view of the filesystem. Before mount namespaces existed, some degree of isolation was provided by the chroot() system call, which could be used to limit a process (and its children) to a part of the filesystem hierarchy. However, the chroot() system call did not change the fact that the hierarchical relationship of the mounts in the filesystem was global to all processes. By contrast, mount namespaces allow different groups of processes to see different filesystem hierarchies.
User namespaces provide isolation of the "user ID" resource. Thus, it is possible to create users that are visible only within a container. Most notably, user namespaces allow a container to have a user that has root privileges for operations inside the container without being privileged on the system as a whole. (There are various other namespaces in addition to those that Glauber discussed, such as the PID, UTS, and IPC namespaces. One or two of those namespaces were also mentioned later in the talk.)
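With a sufficiently recent kernel and util-linux, the effect can be sketched with unshare(1); the flags below are an assumption about the reader's tool versions rather than something shown in the talk:
$ unshare --user --map-root-user id -u
0
Inside the new user namespace the process appears to run as root, while on the host it retains only the invoking user's privileges.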
Control groups (cgroups) provide the other piece of infrastructure needed to implement containers. Glauber noted that cgroups have received a rather negative response from some kernel developers, but he thinks that somewhat misses the point: cgroups have some clear benefits.
A cgroup is a logical grouping of processes that can be used for resource management in the kernel. Once a cgroup has been created, processes can be migrated in and out of the cgroup via a pseudo-filesystem API (details can be found in the kernel source file Documentation/cgroups/cgroups.txt).
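A minimal sketch of that API, using cgroup v1 as it existed at the time (the mount point and group name are arbitrary, and the mount step can be skipped if the controller is already mounted):
# mount -t cgroup -o cpu none /sys/fs/cgroup/cpu
# mkdir /sys/fs/cgroup/cpu/mygroup
# echo $$ > /sys/fs/cgroup/cpu/mygroup/tasks
The echo moves the current shell (and any processes it subsequently creates) into the new cgroup.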
Resource usage within cgroups is managed by attaching controllers to a cgroup. Glauber briefly looked at two of these controllers.
The CPU controller mechanism allows a system manager to control the percentage of CPU time given to a cgroup. The CPU controller can be used both to ensure that a cgroup receives a guaranteed minimum percentage of CPU on the system, regardless of other load, and to set an upper limit on the amount of CPU time used by a cgroup, so that a rogue process can't consume all of the available CPU time. CPU scheduling is done first at the cgroup level, and then across the processes within each cgroup. As with some other controllers, CPU cgroups can be nested, so that the percentage of CPU time allocated to a top-level cgroup can be further subdivided across cgroups under that top-level cgroup.
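Continuing the sketch above, a relative weight and a nested subdivision might look like this (the values and names are illustrative only):
# echo 2048 > /sys/fs/cgroup/cpu/mygroup/cpu.shares
# mkdir /sys/fs/cgroup/cpu/mygroup/child
# echo 512 > /sys/fs/cgroup/cpu/mygroup/child/cpu.shares
Here cpu.shares expresses the proportional weight under contention; the hard upper limit mentioned in the talk is set separately through the cpu.cfs_period_us and cpu.cfs_quota_us files discussed in the comments below.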
The memory controller mechanism can be used to limit the amount of memory that a process uses. If a rogue process runs over the limit set by the controller, the kernel will page out that process, rather than some other process on the system.
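A corresponding sketch for the memory controller (cgroup v1 again; the 512M figure is arbitrary):
# mount -t cgroup -o memory none /sys/fs/cgroup/memory
# mkdir /sys/fs/cgroup/memory/mygroup
# echo 512M > /sys/fs/cgroup/memory/mygroup/memory.limit_in_bytes
# echo $$ > /sys/fs/cgroup/memory/mygroup/tasks
Once the limit is exceeded, reclaim (and, ultimately, the OOM killer) is applied within the cgroup rather than to the system as a whole.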
The current status of containers
It is possible to run production containers today, Glauber said, but not with the mainline kernel. Instead, one can use the modified kernel provided by the open source OpenVZ project that is supported by Parallels, the company where Glauber is employed. Over the years, the OpenVZ project has been working on upstreaming all of its changes to the mainline kernel. By now, much of that work has been done, but some still remains. Glauber hopes that within a couple of years ("I would love to say months, but let's get realistic") it should be possible to run a full container solution on the mainline kernel.
But, by now, it is already possible to run subsets of container functionality on the mainline kernel, so that some people's use cases can already be satisfied. For example, if you are interested in just CPU isolation, in order to limit the amount of CPU time used by a group of processes, that is already possible. Likewise, the network namespace is stable and well tested, and can be used to provide network isolation.
However, Glauber said, some parts of the container infrastructure are still incomplete or need more testing. For example, fully functional user namespaces are quite difficult to implement. The current implementation is usable, but not yet complete, and consequently there are some limitations to its usage. Mount and PID namespaces are usable, but likewise still have some limitations. For example, it is not yet possible to migrate a process into an existing instance of either of those namespaces; that is a desirable feature for some applications.
Glauber noted some of the kernel changes that are yet to be merged to complete the container implementation. Kernel memory accounting is not yet merged; that feature is necessary to prevent exploits (such as the dentry example above) that consume excessive kernel memory. Patches to allow kernel-memory shrinkers to operate at the level of cgroups are still to be merged. Filesystem quotas that operate at the level of cgroups remain to be implemented; thus, it is not yet possible to specify quota limits on a particular user inside a user namespace.
There is already a wide range of tooling in place that makes use of container infrastructure, Glauber said. For example, the libvirt library makes it possible to start up an application in a container. The OpenVZ vzctl tool is used to manage full OpenVZ containers. It allows for rather sophisticated management of containers, so that it is possible to do things such as running containers using different Linux distributions on top of the same kernel. And "love it or hate it, systemd uses a lot of the infrastructure". The unshare command can be used to run a command in a separate namespace. Thus, for example, it is possible to fire up a program that operates in an independent mount namespace.
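For instance (a minimal sketch; it requires root, and recent versions of unshare make the new mount namespace private by default):
# unshare --mount bash
# mount -t tmpfs tmpfs /mnt
The tmpfs mount is visible to the shell started by unshare and its children, but not to the rest of the system.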
Glauber's overall point is that containers can already be used to
satisfy several of the use cases that have historically been served by
hypervisors, with the advantages that containers don't require the creation
of separate full-blown virtual machines and provide much finer granularity
when controlling what is or is not shared between the processes inside the
container and those outside the container. After many years of work, there
is by now a lot of container infrastructure that is already useful. One can
only hope that Glauber's "realistic" estimate of two years to complete the
upstreaming of the remaining container patches proves accurate, so that
complete container solutions can at last be run on top of the mainline
kernel.
Posted Nov 15, 2012 7:50 UTC (Thu)
by epa (subscriber, #39769)
[Link] (8 responses)
Often a large application will specify a particular Linux distribution or Windows version it is 'certified' to run on. The vendor may even insist that its application be the only thing running on the machine, if you want to get support. It may require particular versions of system libraries because those were the ones it was tested with. And yes, I am talking about big companies here, where stupid things are done for stupid big-organization reasons, and if you use free software and compile from source you are free of this nonsense, blah blah. But bear with me and assume that at least some of the time there is a legitimate reason to require an exact operating system version for running an application. (If you have ever worked on a support desk, you will find this reality easier to accept.)
So what we start to see are 'appliances' where the application is packaged up with its operating system ready to load into a virtual machine. Instead of supplying a program which calls the complex interface provided by the kernel, C library, and other system libraries, the vendor supplies one which expects the 'ABI' of an idealized x86-compatible computer. It has proved easier to agree on that than to agree on the higher level interfaces. Even though, somewhat absurdly, it means that TCP/IP and filesystems and virtual memory are all being reimplemented inside the 'appliance', it works out more robust this way.
Posted Nov 15, 2012 9:55 UTC (Thu)
by robert_s (subscriber, #42402)
[Link] (4 responses)
Only because so far, little communication & cooperation between these appliances has been sought or required. If "appliances" are our new "processes" the fun is going to come when the equivalent of IPC is required.
And let's not even start talking about efficiency.
Posted Nov 15, 2012 9:58 UTC (Thu)
by epa (subscriber, #39769)
[Link] (1 responses)
Posted Nov 15, 2012 16:04 UTC (Thu)
by k3ninho (subscriber, #50375)
[Link]
Posted Nov 22, 2012 6:08 UTC (Thu)
by HelloWorld (guest, #56129)
[Link] (1 responses)
Posted Nov 22, 2012 8:13 UTC (Thu)
by Fowl (subscriber, #65667)
[Link]
Plus people want to use containers for more serious "untrusted" isolation.
Posted Nov 15, 2012 15:26 UTC (Thu)
by raven667 (subscriber, #5198)
[Link] (2 responses)
Posted Nov 15, 2012 18:01 UTC (Thu)
by drag (guest, #31333)
[Link] (1 responses)
Posted Nov 15, 2012 20:09 UTC (Thu)
by glommer (guest, #15592)
[Link]
But while you are reusing all of the infrastructure from the OS - awesome - you still have two schedulers, two virtual memory subsystems, two IO dispatchers, etc.
Containers, OTOH, are basically the Operating System taking resource isolation one step further, and allowing you to do all that without resorting to all the resource duplication you have with hypervisors - be your hypervisor your own OS or not.
Which of them suits you better, is up to you, your use cases, and personal preferences.
Posted Nov 15, 2012 15:13 UTC (Thu)
by NAR (subscriber, #1313)
[Link] (4 responses)
Posted Nov 15, 2012 16:04 UTC (Thu)
by Jonno (subscriber, #49613)
[Link] (2 responses)
The CPU controller works slightly differently: you can set a "shares" value, and *when cpu contention occurs* the cpu resources will be assigned proportionally. As the root cgroup defaults to 1024 shares, if you assign a shares value of 4096 to your cgroup (assuming there are no other cgroups), it will be limited to 80% of cpu time when there is contention, but be allowed to use more if no other process wants to be scheduled.
Posted Nov 15, 2012 20:01 UTC (Thu)
by glommer (guest, #15592)
[Link] (1 responses)
However, there are many scenarios where you actually want to limit the maximum amount of cpu used, even without contention. An example of this, is cloud deployments where you pay for cpu time and value price predictability over performance.
The cpu cgroup *also* allows one to set a maximum quota through the combination of the following knobs:
cpu.cfs_period_us
cpu.cfs_quota_us
The quota defaults to -1, meaning "no upcap". If you define your quota as 50 % of your period, you will run for at most 50 % of the time. This is bandwidth based, in units of microseconds. So "use at most 2 cpus" is equivalent to 200 %. IOW, 2 seconds per second.
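A sketch of what that looks like with the cgroup v1 control files (the path and group name are assumed):
# echo 100000 > /sys/fs/cgroup/cpu/mygroup/cpu.cfs_period_us
# echo 50000 > /sys/fs/cgroup/cpu/mygroup/cpu.cfs_quota_us
A quota of 200000 against the same period would correspond to the "use at most 2 cpus" case.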
Equivalent mechanism exists for rt tasks: cpu.rt_quota_us, etc.
Cheers
Posted Nov 19, 2012 3:14 UTC (Mon)
by mathstuf (subscriber, #69389)
[Link]
Posted Nov 15, 2012 16:23 UTC (Thu)
by ottuzzi (guest, #74496)
[Link]
Solaris containers respect imposed limits only when there are competing demands on the resource. Say you have container A with a CPU limit of 80 and a container B with a CPU limit of 70: if container A is not using CPU, container B can use all the CPU available; likewise, if B is not doing anything on the CPU, A can use all the CPU. But when both A and B are using all the CPU available, then they are balanced proportionally to their weights. I think Linux is implemented the same way.
Hope I was clear.
Bye
Piero
Posted Nov 15, 2012 18:55 UTC (Thu)
by pbryan (guest, #3438)
[Link] (4 responses)
I was under the impression that it is possible to run production containers today with LXC. What functionality does OpenVZ provide that is not supported by LXC, i.e. via cgroups, clone(2) isolation flags, devpts isolation mechanisms?
Posted Nov 15, 2012 19:01 UTC (Thu)
by xxiao (guest, #9631)
[Link] (3 responses)
Posted Nov 15, 2012 19:48 UTC (Thu)
by glommer (guest, #15592)
[Link] (2 responses)
There are many things that mainline Linux lacks. One of them, is the kernel memory limitation described in the article, that allows the host to protect against abuse from potentially malicious containers. It is trivial for a container to fill the memory with non-reclaimable objects, so no one else can be serviced.
User namespaces are progressing rapidly, but they are not there yet. Eric Biederman is doing a great job with that, patches are flowing rapidly, but you still lack a fully isolated capability system.
The pseudo file-systems /proc and /sys will still leak a lot of information from the host.
Tools like "top" won't work, because it is impossible to grab per-group figures of CPU usage. And this is not an exhaustive list.
So if "production" for you relies on any of the above, then no, you can't run LXC. Otherwise, sure, you can run LXC.
Besides that, a lot of the kernel features that LXC relies on were contributed by the OpenVZ project. So it is not like we're trying to fork the kernel and keep people on our branch forever. It's just a quite big amount of work, the trade-offs are not always clear for upstream, etc. It is no different than Android, in essence.
The ultimate goal, as stated in the article, is to have all the kernel functionality in mainline, so people can use any userspace tool they want.
Cheers
Posted Nov 16, 2012 12:29 UTC (Fri)
by TRS-80 (guest, #1804)
[Link] (1 responses)
Having decent userspace tools is something else that's missing from the upstream kernel container implementation. The kernel has all these features now, but no coherent way of managing them nicely yet.
Posted Nov 22, 2012 15:12 UTC (Thu)
by mathstuf (subscriber, #69389)
[Link]
Posted Nov 15, 2012 21:31 UTC (Thu)
by naptastic (guest, #60139)
[Link] (2 responses)
The hosting company I work for (who shall remain nameless) uses OpenVZ for virtual servers. It is wholly inadequate and we have begun transitioning to KVM.
It's still trivial to forkbomb a container running the VZ kernel and take down the whole host. The container model also requires a number of really, really annoying limits (shared memory, locked memory, open files, total files, TCP sockets, total sockets, and on and on) that have to be there because of the fundamental weakness of the container model.
You can get around that by using a container just for one thing, but then you still have to have a full operating system just for that one thing. If I want to have memcache by itself in a container, I need a full filesystem and Linux install to support it. You lose some overhead by using a container instead of a hypervisor, then get it all right back, plus some, with the requirements of containment.
The only advantage I can see is that you can update the hardware configuration in realtime. Other than that, use cgroups or use full virtualization.
Posted Nov 16, 2012 9:11 UTC (Fri)
by kolyshkin (guest, #34342)
[Link] (1 responses)
> It's still trivial to forkbomb a container running the VZ kernel and take down the whole host.
If that would be true, all the hosting service providers using VZ (i.e. a majority of) would go out of business very soon. If CT resources (user beancounters) are configured in a sane way (and they are configured that way by default -- so unless host system administrator removes some very vital limits), this is totally and utterly impossible.
So, let's turn words into actions. I offer you $100 (one hundred US dollars) for demonstrating a way to bring the whole system down using a fork bomb in OpenVZ (or Virtuozzo, for that matter) container. A reproducible description of what to do to achieve it is sufficient.
> The container model also requires a number of really, really annoying limits
I can feel your pain. Have you ever heard of vswap? Here is some year-old news for you:
- https://plus.google.com/113376330521944789537/posts/5WEzA...
- http://wiki.openvz.org/VSwap
In a nutshell, you only need to set RAM and SWAP for container, and keep the rest of really, really annoying limits unconfigured.
> because of the fundamental weakness of the container model
Could you please enlighten us on what exactly is this fundamental weakness?
> You can get around that
Now, I won't be commenting on the rest of your entry because it is based on wrong assumptions.
Posted Nov 20, 2012 1:40 UTC (Tue)
by exel (guest, #87380)
[Link]
There is, at the core of OpenVZ, something that seems absolutely elegant and useful – a more isolating take on the FreeBSD jail concept, which itself was of course chroot on acid. Using it for advanced process isolation looks like a sensible application. The _typical_ application of OpenVZ, in the field right now, though, is that of a poor man's hypervisor. I think that a container technology is just the wrong approach for this.
The big elephant in the room, for me, is security isolation. Containers all run under the same kernel, which means that a kernel compromise is a compromise of all attached security domains. An actual hypervisor setup adds an extra privilege layer that has to be separately broken.
Again, this doesn't mean that OpenVZ cannot be tremendously useful. The most visible way Parallels is selling the technology, however, is not what people are looking for. This pans out in the market place.
Posted Nov 16, 2012 17:22 UTC (Fri)
by tbird20d (subscriber, #1901)
[Link] (2 responses)
If the kernel is lightweight, then re-using it in a recursive sort of way as a hypervisor, à la KVM, seems like the more tractable long-term approach, rather than adding lots of complexity to all these different code paths (basically, almost all of the major resource-management paths in the kernel).
Posted Nov 16, 2012 18:27 UTC (Fri)
by dlang (guest, #313)
[Link] (1 responses)
When you run a separate kernel to manage one app, you now have multiple layers of caching for example.
'fixing' this gets very quickly to where it's at least as much complexity.
Posted Nov 17, 2012 16:47 UTC (Sat)
by tom.prince (guest, #70680)
[Link]
http://www.openmirage.org/ and https://github.com/GaloisInc/HaLVM are tools for doing this that come to mind.
Posted Nov 22, 2012 9:15 UTC (Thu)
by hensema (guest, #980)
[Link] (1 responses)
Costa forgets important uses of virtualisation: Surely there's a lot to be won in process or user isolation in Linux itself. That'll be useful both when running on bare metal and when running on a HV. However Costa seems to want to go back to the heyday of mainframes and minis. That's not going to happen. Sorry.
Posted Nov 22, 2012 9:50 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link]
That is done acceptably by OpenVZ and Containers.
>Migration of VMs
Ditto. There's an article about checkpoint/restore for Linux containers in this week's LWN issue.
>Easy provisioning.
OpenVZ actually wins here. Provisioning an OpenVZ container is dead easy. Mass operations are also very easy.
Posted Nov 24, 2012 14:46 UTC (Sat)
by jch (guest, #51929)
[Link] (9 responses)
That's pretty much orthogonal to resource usage, isn't it?
The issue here is that IP addresses don't obey the usual permissions model: I can chown a directory, thereby giving a user the right to create files and subdirectories within this particular directory, but I cannot chown an IP address, thereby giving a user the right to bind port 80 on this particular address.
I'd be curious to know if I'm the only person feeling that many of the uses of containers and virtualisation would be avoided if the administrator could chown an IP address (or an interface).
-- jch
Posted Nov 24, 2012 18:07 UTC (Sat)
by dlang (guest, #313)
[Link] (8 responses)
it's not as trivial as a chown, but it's possible.
I think you are mixing up the purposes of using containers.
It's not the need to use port 80 that causes you to use containers, it's that so many processes that use port 80 don't play well with each other (very specific version dependencies that conflict) or are not trusted to be well written, and so you want to limit the damage that they can do if they run amok (either due to local bugs, or due to external attackers)
containers don't give you as much isolation as full virtualization, but they do give you a lot of isolation (and are improving fairly rapidly), and they do so at a fraction of the overhead (both CPU and memory) of full virtualization.
If you have a very heavy process you are running, you may not notice the overhead of the virtualization, but if you have fairly lightweight processes you are running, the overhead can be very significant.
I'm not just talking about the CPU hit for running in a VM, or the memory hit from each VM having its own kernel, but also things like the hit from each VM doing its own memory management, the hit (both CPU and memory) from each VM needing to run its own copy of all the basic daemons (systemd/init, syslog, dbus, udev, etc) and so on.
If you are running single digit numbers of VMs on one system, you probably don't care about these overheads, but if you are running dozens to hundreds of VMs on one system, these overheads become very significant.
Posted Nov 24, 2012 18:59 UTC (Sat)
by Cyberax (✭ supporter ✭, #52523)
[Link] (7 responses)
>it's not as trivial as a chown, but it's possible.
How? So far I have tried:
1) Iptables - simply DoesNotWork(tm), particularly for localhost.
2) Redirectors - PITA to setup and often no IPv6 support.
3) Capabilities - no way to make it work with Python scripts or Java apps.
For now I'm using nginx as a full-scale HTTP proxy.
That restriction for <1024 ports is by far the most moronic stupid imbecilic UNIX feature ever invented.
Posted Nov 24, 2012 19:09 UTC (Sat)
by bronson (subscriber, #4806)
[Link]
It's true that those days are long gone and it's time for this restriction to disappear.
Posted Nov 24, 2012 19:13 UTC (Sat)
by dlang (guest, #313)
[Link] (5 responses)
I agree that in the modern Internet, that really doesn't make sense, but going back, you had trusted admins (not just of your local box, but of the other boxes you were talking to), and in that environment it worked.
so think naive not moronic
remember, these are the same people who think that firewalls are evil because they break the unlimited end-to-end connectivity of the Internet. :-)
Posted Nov 24, 2012 19:21 UTC (Sat)
by Cyberax (✭ supporter ✭, #52523)
[Link] (4 responses)
There is no such record (I naively thought so too). You can check the Linux source.
>I agree that in the modern Internet, that really doesn't make sense, but going back, you had trusted admins (not just of your local box, but of the other boxes you were talking to), and in that environment it worked.
A good mechanism would have been to allow users access to a range of ports. Something simple like /etc/porttab with a list of port ranges and associated groups would suffice.
>remember, these are the same people who think that firewalls are evil because they break the unlimited end-to-end connectivity of the Internet. :-)
I happen to think the same. Security should not be done on the network's border; instead, all the systems should be secured by local firewalls.
Posted Nov 24, 2012 19:26 UTC (Sat)
by dlang (guest, #313)
[Link] (2 responses)
> There is no such record (I naively thought so too). You can check the Linux source.
Ok, I thought I remembered seeing it at some point in the past, I may have mixed it up with the ability to bind to IP addresses that aren't on the box <shrug>
I wonder how quickly someone could whip up a patch to add this ;-)
seriously, has this been discussed and rejected, or has nobody bothered to try and submit something like this?
Posted Nov 24, 2012 19:58 UTC (Sat)
by Cyberax (✭ supporter ✭, #52523)
[Link]
Posted Jun 29, 2014 8:50 UTC (Sun)
by stevenp129 (guest, #97662)
[Link]
if user BOB wrote a program to constantly monitor Apache and, the second its PID dies, fire up his own web server on port 80, he could steal sensitive information and passwords (with great ease).
on a shared hosting service (for example), if somebody neglected to update their CMS to the latest version, and the host runs their webserver without a Chroot... a simple bug or exploit in a website could, in turn, allow a rogue PHP or CGI Script to take over the entire server! not good!
or imagine your DNS server going down! due to a hostile take over... they could redirect traffic to their own off site server, and perform phishing attacks against you and all your clients this way!
Of course there are legitimate reasons to forbid those without privs to bind to ports less than 1024... I'm not sure what is so "stupid" about this idea?
Posted Jan 10, 2013 12:03 UTC (Thu)
by dps (guest, #5725)
[Link]
If a border firewall blocks some attack traffic, then a security bug on an internal system is not immediately fatal and there is time to fix it before the border firewall's security is breached. If that has not happened, it implies that nobody worthwhile has tried, or that you can't detect security breaches.
In an ideal world there would be no need for security because nobody would even think of doing a bad deed. The world has never been that way.
Posted Aug 4, 2013 3:44 UTC (Sun)
by rajneesh (guest, #92204)
[Link]