Container Basics

A container is like a lightweight VM. Containers offer a high degree of flexibility and portability and, much like VMs, let you run multiple OS environments on your host; only here each OS environment is a container sharing the host's kernel. So you could be on a Debian host and run Red Hat, Ubuntu and Alpine Linux in containers simultaneously.

You can log into a container, install applications, and power it off just like a VM. Containers can be easily moved across hosts, which makes applications installed in them portable. Portability and flexibility are the core advantages of containers.

Some container implementations like Docker have gone in a different direction, with single-process containers, layers and ephemeral data. This is a very specific use case of containers and introduces a certain amount of complexity and constraints that need to be understood.

How Do Containers Work?

Containers depend on Linux namespaces, added to the kernel starting in the 2.6 series. Launching a process in its own namespaces isolates it from other processes on the system. There are six main namespaces (mount, UTS, IPC, PID, network and user), all of which are used by containers.

Launching an OS init in a namespaced process essentially gives you what we know as a Linux container. The container is also launched in its own network namespace, so it gets its own network stack isolated from the host. Cgroups can additionally be used to limit the CPU and memory available to namespaced processes.
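Since namespaces are just kernel objects attached to processes, you can inspect them on any Linux machine: every process exposes its namespace memberships as symlinks under /proc/&lt;pid&gt;/ns.

```shell
# List the namespaces the current shell belongs to
# (mnt, uts, ipc, net, pid, user, and cgroup on newer kernels).
ls /proc/self/ns

# Each symlink resolves to a namespace identifier.
readlink /proc/self/ns/net
```

Two processes are in the same namespace exactly when these identifiers match, which is how you can tell from the host which processes live inside a given container.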

Namespaces are still being developed and a lot of kernel subsystems are not yet namespace aware. Cgroups, for instance, only became namespace aware in kernel 4.6.
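As an illustration of cgroups in practice, container managers usually expose cgroup limits through configuration. In LXC's case a container's config file can carry cgroup v1 style keys like the ones below (the values are purely illustrative):

```
# Fragment of an LXC container config, e.g. /var/lib/lxc/mycontainer/config
lxc.cgroup.memory.limit_in_bytes = 256M
lxc.cgroup.cpu.shares = 512
```

The container manager writes these values into the container's cgroup when the namespaced process is launched, so the kernel enforces the limits on everything running inside.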

Userland container managers like LXC provide the base OS templates and the tooling to launch and manage containers. They also provide out-of-the-box networking support by setting up a default NAT bridge that containers can connect to. This is similar to the NAT bridges provided by VM applications.
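For example, on distributions that ship LXC's lxc-net service, the default NAT bridge is typically named lxcbr0, and newly created containers attach to it via network keys in the default config (modern lxc.net.* key names shown; values are illustrative):

```
# /etc/lxc/default.conf - applied to newly created containers
lxc.net.0.type = veth
lxc.net.0.link = lxcbr0
lxc.net.0.flags = up
```

Each container then gets a veth pair, with one end plugged into lxcbr0 and the other appearing as eth0 inside the container's network namespace.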

Containers are basically folders on your host, and the base OS templates are simply OS file trees. When launching a container, the container manager changes root (via pivot_root) into the container folder and launches the container's OS init in a namespaced process, so you get a separate OS running as an isolated process. The container is also launched in its own network namespace and thus has its own isolated network stack.
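You can approximate this launch sequence with util-linux's unshare tool. The sketch below assumes a kernel that allows unprivileged user namespaces; the first process started in a new PID namespace sees itself as PID 1, just as a container's init does:

```shell
# Create new user, PID and mount namespaces, mount a fresh /proc,
# and run a shell as the first process of the new PID namespace.
unshare --user --map-root-user --pid --fork --mount-proc \
    sh -c 'echo "my pid: $$"'
# prints: my pid: 1
```

A real container manager does the same thing with more namespaces, a pivot_root into the container's file tree, and /sbin/init as the command.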

Docker is a bit different. Docker was initially based on LXC until it reimplemented its own container manager in Go. Docker does not launch the OS init but launches the application directly in the namespaced process, hence the lack of multi-process support and the non-standard OS environment.

VMs vs Containers

VMs are mature and offer a higher degree of isolation than containers, which are namespaced processes that share the same kernel as the host.

VMs have a wide user base with a rich and mature ecosystem of tooling, support and expertise.

Containers offer several advantages over VMs: they are lightweight, portable and simply a folder on your filesystem. They are still maturing, but in time will become the first choice for deploying applications, apart from use cases where a far higher degree of isolation or a specific kernel version is required, in which case VMs remain the only option.

VMs also remain the only choice for multi-tenant workloads run by hosting and cloud providers.

LXC vs Docker

LXC is an open source Linux container project in development since 2008. Serge Hallyn, one of the lead developers of LXC, has written a short history of containers.

LXC provides base OS container templates, a mechanism to launch a container in its own namespaces, container networking via a NAT bridge that containers connect to, and container management tooling. LXC also supports filesystems like Btrfs and ZFS for cloning and snapshots, layered containers via OverlayFS and AUFS, and advanced networking options.

Docker, then dotCloud, was using LXC containers for its PaaS platform. Around 2013 it launched its own container project based on LXC. A container is basically a process launched in its own namespaces via kernel support. While LXC launches an OS init in the namespaces, so you get a standard OS environment with multi-process support like a VM, Docker directly launches an application in the namespaces, so you get a single-process container and a non-standard OS environment. Docker also builds containers using layers and has ephemeral storage.
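A minimal Dockerfile illustrates both points: each instruction adds a layer, and the resulting image runs a single foreground process rather than an init (the image and package names here are only examples):

```dockerfile
# Base layer: a small Alpine Linux file tree
FROM alpine:3.19

# Adds a new layer containing the installed package
RUN apk add --no-cache nginx

# The container runs this one foreground process; there is no init
CMD ["nginx", "-g", "daemon off;"]
```

When the process named in CMD exits, the container stops, which is why anything needing multiple long-running services does not map cleanly onto this model.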

Privileged vs Unprivileged Containers

The difference is simple: privileged containers run as the root user, unprivileged containers run as a normal user. To recap, containers are basically namespaced processes, made possible by the addition of namespaces to the Linux kernel in the 2.6 series. A lot of Linux subsystems are still not namespace aware; for instance cgroups, which are used to limit access to resources like CPU or memory, were not namespace aware until kernel 4.6.

The use of unprivileged containers is made possible by the addition of user namespaces to the kernel in 3.8. This lets you launch the container's namespaced process as a normal user. There are still a large number of issues with running containers as a normal user, so a lot of work needs to be done for a seamless experience.
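If your kernel allows unprivileged user namespaces (some distributions disable them by default), you can see the mechanism directly from a normal shell:

```shell
# Outside the namespace you are a normal user; inside it you appear as
# uid 0. /proc/self/uid_map shows uid 0 mapped back to your real uid.
unshare --user --map-root-user sh -c 'id -u; cat /proc/self/uid_map'
```

The uid mapping is the whole trick: the process is "root" only within its own namespace, while the kernel still treats it as your unprivileged uid on the host.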

For instance, a lot of basic functions needed by containers, like creating network devices for container networking and mounting filesystems, can only be done by the root user in Linux. So if the container process runs as an unprivileged user, the container manager needs a number of workarounds for networking and file mounts.

It's important to distinguish unprivileged containers from simply having a normal user connect to a container manager that runs as a root daemon to launch containers. That is not an unprivileged container; the container is still running as the root user.

Unprivileged containers do not necessarily deliver more security. There have been a number of CVEs in this area, and distributions like Debian disable unprivileged user namespaces out of the box. The Linux kernel needs to address a container as an entity before some of these issues can be resolved. In the interim, container managers adopt workarounds, some of which tend to be leaky and involved.

Running a process as a normal user inside a privileged container also gives you a degree of security: as far as the kernel is concerned, that process is not privileged. The main use case for unprivileged containers is multi-tenancy, like VMs, which could be useful to cloud providers. But that will require far more isolation than Linux containers can currently provide.