Overlay Networks for Containers and VMs

This article explores various options for connecting containers across servers. On a single host, most containers and VMs are attached to an internal NAT network. We take a closer look at container networking basics here.

An internal NAT network is a private network that exists within the host. All containers can be accessed from the host but cannot be reached from the outside world.

This is why port forwarding is required. For any external system to reach a container in a NAT network, you must forward a host port to the container, so that when a request hits, say, port 80 of the host, it is forwarded to port 80 of the container. The limitation of this model is that each host port can be forwarded to only one container at a time, so you can't expose multiple containers on host port 80; the others have to be mapped to different host ports.
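To make the port-forwarding limitation concrete, here is a minimal sketch in POSIX shell. The function print_dnat_rules and its HOSTPORT:CONTAINER_IP:CONTAINER_PORT argument format are hypothetical; it only prints standard iptables DNAT rules (nothing is applied) and refuses any mapping that reuses a host port. The container IPs assume the 10.0.3.0/24 NAT network used as an example later in this article.

```shell
#!/bin/sh
# Hypothetical helper: print (not apply) iptables DNAT rules for port forwarding.
# Each argument is HOSTPORT:CONTAINER_IP:CONTAINER_PORT.
# Fails if two mappings claim the same host port, because a host port can be
# forwarded to only one container at a time.
print_dnat_rules() {
  local seen=" " mapping host_port target
  for mapping in "$@"; do
    host_port=${mapping%%:*}   # text before the first colon
    target=${mapping#*:}       # CONTAINER_IP:CONTAINER_PORT
    case "$seen" in
      *" $host_port "*)
        echo "error: host port $host_port is already forwarded" >&2
        return 1 ;;
    esac
    seen="$seen$host_port "
    echo "iptables -t nat -A PREROUTING -p tcp --dport $host_port -j DNAT --to-destination $target"
  done
}
```

For example, print_dnat_rules 80:10.0.3.10:80 8080:10.0.3.11:80 prints one rule per container, with the second container reachable on host port 8080; adding 80:10.0.3.12:80 to the list makes it fail.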

There are ways to design the network so containers are not in a private network. The simplest is a flat network, and if you control your infrastructure this should be the preferred approach.

Flat Networks

In a flat network, containers and hosts across servers are all on the same subnet, or layer 2 network (more on that below). If you control the router your servers connect to, then it's relatively simple to build a flat network.

For instance, if you bridge the host's network interface and connect containers to this bridge, the containers will be directly connected to the router the host is connected to, get their IPs from it, and be on the same subnet as the host. In this case no port forwarding is required, as the containers are on the same network as the host.
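A minimal sketch of the bridging step with iproute2, assuming the host's uplink is eth0 and naming the bridge br0 (both hypothetical; adjust to your system). It only prints the commands, because enslaving the interface you are connected through drops the session until the host's IP address has been moved to the bridge, a step this sketch does not cover.

```shell
#!/bin/sh
# Sketch: print the iproute2 commands that put a host NIC into a bridge, so
# containers attached to the bridge share the host's subnet and router.
# Assumed names: uplink eth0, bridge br0 (override with arguments).
# Not shown: moving the host's own IP address from the NIC to the bridge.
print_flat_bridge_cmds() {
  local nic="${1:-eth0}" br="${2:-br0}"
  echo "ip link add $br type bridge"
  echo "ip link set $nic master $br"
  echo "ip link set $br up"
}
```

Containers are then attached to br0 instead of the internal NAT bridge.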

But you need control of the router for this to work, and in the cloud and with most providers you will typically not have this.

A quick note on layer 2 and layer 3 networks. Hosts on the same subnet are in a layer 2 network. When you connect subnets to each other, the subnets are in a layer 3 network. Addressing in layer 2 networks happens via MAC addresses; addressing in layer 3 networks happens via IP addresses.

In short, hosts on the same subnet communicate over layer 2, and hosts in different subnets communicate over layer 3 via IP addresses, through a router.
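The same-subnet test behind this distinction can be sketched in shell: mask both addresses with the prefix length and compare the network parts. The helper names ip_to_int and same_subnet are hypothetical; this is an illustration, not a replacement for dedicated subnet tools.

```shell
#!/bin/sh
# Sketch: are two IPv4 addresses on the same subnet (direct layer 2 reach)
# or do they need a router (layer 3)? Expects plain decimal octets.

# ip_to_int A.B.C.D -> the address as a 32-bit integer
ip_to_int() {
  local ip="$1" a b c d
  a=${ip%%.*}; ip=${ip#*.}
  b=${ip%%.*}; ip=${ip#*.}
  c=${ip%%.*}; d=${ip#*.}
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# same_subnet IP1 IP2 PREFIXLEN -> exit status 0 if the network bits match
same_subnet() {
  local mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$1") & $mask )) -eq $(( $(ip_to_int "$2") & $mask )) ]
}
```

With the example hosts used later, same_subnet 192.168.1.10 192.168.1.20 24 succeeds, so they talk over layer 2, while same_subnet 10.0.3.10 10.0.4.10 24 fails, so those two addresses need a route.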

Overlay Networks

Overlay networks let you build networks on top of existing networks (underlays). You can choose to build layer 2 or layer 3 overlays.

So how should you choose? In most cases a layer 3 network is the most scalable and the easiest to set up and use. Stretching layer 2 overlay subnets across hosts should be avoided unless absolutely required.

Containers in a private network on a single host are in a layer 2 subnet. For instance, 10.0.3.0/24 is a typical private NAT network, and containers in it get IPs in that range, e.g. 10.0.3.10, 10.0.3.11 and so on.

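We won't go too deep into subnetting in this piece, but the number after the slash is the prefix length: how many of the address's 32 bits identify the network. The remaining bits number the hosts, so the prefix length sets how many unique IPs the network can accommodate. A /24 leaves 8 host bits, or 256 addresses, of which 254 are usable once the network and broadcast addresses are set aside; a /16 gives 65534 usable IP addresses.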
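The arithmetic is simple enough to check in shell. A minimal sketch, with hypothetical function names, of how many addresses a prefix length gives and how many of them are usable by hosts:

```shell
#!/bin/sh
# Sketch: address counts for an IPv4 prefix length N (the number after the slash).
# N bits identify the network, leaving 32 - N bits for hosts. Two addresses,
# the network and broadcast addresses, are reserved, hence the "- 2".
# Valid for /1 to /30; /31 and /32 are special cases.
subnet_addresses() { echo $(( 1 << (32 - $1) )); }
usable_hosts()     { echo $(( (1 << (32 - $1)) - 2 )); }
```

usable_hosts 24 prints 254 and usable_hosts 16 prints 65534, matching the figures above.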

The simplest way to connect containers in private NAT networks across hosts is with plain routing. This only works if the hosts themselves are on the same subnet, so that each host can reach the other directly as the next hop. A layer 3 network in this case simply creates routes between the different container subnets.

So if the container subnet on Host A is 10.0.3.0/24 and the container subnet on Host B is 10.0.4.0/24, you can simply create a route on each host, and the containers across both hosts will be able to ping each other. The important thing to note for layer 3 networks is that the container subnets must be different for routing to work. If both container subnets were 10.0.3.0/24, for instance, a host could not tell whether an address belongs to its own containers or to the other host's, and routing would not work.
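Before adding routes it helps to confirm the container subnets don't overlap. A minimal sketch with a hypothetical subnets_overlap helper; it relies on the fact that two CIDR blocks are either nested or disjoint, so they overlap exactly when their addresses match on the shorter prefix.

```shell
#!/bin/sh
# Sketch: check whether two IPv4 CIDR blocks overlap. If the container subnets
# on two hosts overlap (e.g. both 10.0.3.0/24), routing between them cannot work.

# ip_to_int A.B.C.D -> the address as a 32-bit integer
ip_to_int() {
  local ip="$1" a b c d
  a=${ip%%.*}; ip=${ip#*.}
  b=${ip%%.*}; ip=${ip#*.}
  c=${ip%%.*}; d=${ip#*.}
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# subnets_overlap CIDR1 CIDR2 -> exit status 0 if the blocks share any address
subnets_overlap() {
  local len1="${1#*/}" len2="${2#*/}" len mask
  # Compare both addresses under the shorter of the two prefixes.
  len=$len1
  if [ "$len2" -lt "$len1" ]; then len=$len2; fi
  mask=$(( (0xFFFFFFFF << (32 - $len)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "${1%/*}") & $mask )) -eq $(( $(ip_to_int "${2%/*}") & $mask )) ]
}
```

subnets_overlap 10.0.3.0/24 10.0.4.0/24 fails (safe to route), while subnets_overlap 10.0.3.0/24 10.0.3.0/24 succeeds (a conflict to fix first).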

Let's assume Host A's IP is 192.168.1.10 and Host B's is 192.168.1.20. The container subnet on Host A is 10.0.3.0/24 and the container subnet on Host B is 10.0.4.0/24. Since both container subnets are private networks inside their hosts, the containers cannot ping each other. However, we can add a simple route on both hosts.

On Host A

ip route add 10.0.4.0/24 via 192.168.1.20

On Host B

ip route add 10.0.3.0/24 via 192.168.1.10

Now containers on both hosts should be able to ping each other. This is a simple layer 3 network connecting containers across hosts. But add a third host, then a fourth, and it quickly gets unwieldy: every host needs a route to every other host's container subnet, and you have to add these routes manually and keep track of them. This won't scale.
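To see how quickly this grows, here is a hypothetical generator that prints the ip route commands for a full mesh of hosts. It assumes, as above, that all hosts share one subnet and that IP forwarding is enabled on each of them; each argument pairs a host IP with its container subnet.

```shell
#!/bin/sh
# Sketch: print the static routes every host needs for a full mesh of
# container subnets. Each argument is HOST_IP=CONTAINER_SUBNET.
# With N hosts each host needs N-1 routes, N*(N-1) in total.
# Assumes the hosts share one subnet (so each HOST_IP is a reachable next hop)
# and that IP forwarding is enabled on every host.
print_mesh_routes() {
  local self peer
  for self in "$@"; do
    echo "# on host ${self%%=*}"
    for peer in "$@"; do
      [ "$peer" = "$self" ] && continue
      echo "ip route add ${peer#*=} via ${peer%%=*}"
    done
  done
}
```

With the two hosts above it reproduces the two commands already shown; with three hosts it prints six routes, and with ten hosts ninety.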

BGP

What if we had an application that could distribute routes across systems for us and keep track of them? Luckily there is just such an application, called Quagga. Flockport uses Quagga to build layer 3 networks for containers.

Quagga uses the BGP protocol to distribute and keep track of routes. Once you install Quagga on all hosts and connect them, you declare the container or VM subnets in the Quagga configuration file on each host, and Quagga distributes these routes to all hosts in the network, so containers or VMs across hosts can talk to each other.

BGP operates on the concept of peers, which share subnets with each other. Container and VM hosts can therefore be peers and share their respective subnets, and Quagga ensures all the necessary routes are created.
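For illustration, here is a sketch of what sharing a subnet looks like in a Quagga bgpd.conf, generated by a hypothetical gen_bgpd_conf function. The private AS number 65000, the iBGP full mesh (every host lists every other host as a neighbor) and running Quagga's zebra daemon to install learned routes are assumptions for this sketch, not a description of Flockport's configuration.

```shell
#!/bin/sh
# Sketch: print a minimal Quagga bgpd.conf fragment for one container host.
# Assumptions (illustrative, not Flockport's setup): every host uses private
# AS 65000 (iBGP), every host lists every other host as a neighbor (full mesh),
# and Quagga's zebra daemon runs alongside bgpd to install learned routes.
# Usage: gen_bgpd_conf ROUTER_ID LOCAL_CONTAINER_SUBNET PEER_IP...
gen_bgpd_conf() {
  local router_id="$1" subnet="$2" asn=65000 peer
  shift 2
  echo "router bgp $asn"
  echo " bgp router-id $router_id"
  echo " network $subnet"          # announce this host's container subnet
  for peer in "$@"; do
    echo " neighbor $peer remote-as $asn"
  done
}
```

On Host A from the earlier example, gen_bgpd_conf 192.168.1.10 10.0.3.0/24 192.168.1.20 prints a router bgp 65000 block that announces 10.0.3.0/24 to Host B; Host B gets the mirror image.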

Of the approaches covered here, this is the most scalable and flexible way to build container networks across hosts.

Setting up BGP Networks with Flockport

Layer 2 Overlays

This involves putting containers across hosts in the same layer 2 subnet. If you use the flat network model mentioned earlier, they already are, so you can stop reading now.

If you are still reading: layer 3 networks are generally easier to build and manage and should be the first choice. But if for some reason you want containers across hosts to be on the same subnet, you can do it in a number of ways.

In this case we need to build a layer 2 overlay, and there are a number of technologies and tools we can use. One is Vxlan; another is a VPN application like PeerVPN or OpenVPN.

Vxlan

Vxlan is far more performant than the other options, and in our internal benchmarks it can operate at near line speed.

Vxlan is built into the Linux kernel and lets you build a layer 2 network over a layer 3 network. Once you connect the container hosts with Vxlan, any containers or VMs connected to the Vxlan bridge will be on the same layer 2 subnet across hosts.

You can run a DHCP agent like Dnsmasq on one of the hosts and this can take care of automatically assigning IPs to containers connecting to this network.

Here is how it works. On each host you create a Vxlan device attached to the host network interface over which the Vxlan network will be built. You then create a bridge on each host and add the Vxlan device to it. Containers connected to this bridge will be on the same layer 2 network across hosts.

ip link add vxlan0 type vxlan id 42 group 239.1.1.1 dev eth0 dstport 4789

This creates a Vxlan device vxlan0 with Vxlan network identifier (VNI) 42, bound to the host interface eth0. Here we assume eth0 is the interface through which this host reaches the other hosts that will be part of the Vxlan network. In this mode Vxlan uses multicast, and 239.1.1.1 is the multicast group address. VNIs are used to segment networks when required. The dstport 4789 option explicitly sets the IANA-assigned Vxlan UDP port (more on ports below).

Once the Vxlan device is created the next step is to create a bridge and add the vxlan device to it.

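ip link add vx0 type bridge
ip link set vxlan0 master vx0
ip link set vxlan0 up
ip link set vx0 up

This creates a new bridge device vx0, adds the vxlan0 device to it and brings both devices up. (With the older bridge-utils package, the first two steps are brctl addbr vx0 and brctl addif vx0 vxlan0.) Now any containers connected to the vx0 bridge will be on the same layer 2 network across hosts.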
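For the DHCP option mentioned above, a minimal sketch of what to run on one host only: give the vx0 bridge an address and start dnsmasq on it. The overlay subnet 10.0.5.0/24, the address 10.0.5.1 and the lease range are hypothetical; the function prints the commands so they can be reviewed before running them as root.

```shell
#!/bin/sh
# Sketch: commands for ONE host on the overlay to serve DHCP on the vx0 bridge.
# Assumptions (hypothetical): overlay subnet 10.0.5.0/24, this host takes
# 10.0.5.1, and leases are handed out from .100 to .200 for 12 hours.
# The commands are only printed; pipe them to a root shell to apply them.
print_overlay_dhcp_cmds() {
  local br="${1:-vx0}"
  echo "ip addr add 10.0.5.1/24 dev $br"
  echo "dnsmasq --interface=$br --bind-interfaces --dhcp-range=10.0.5.100,10.0.5.200,12h"
}
```

print_overlay_dhcp_cmds | sudo sh would apply them. Containers on any host that attach to vx0 and request an address by DHCP then get one from the 10.0.5.100 to 10.0.5.200 range.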

Multicast, which this Vxlan setup relies on, is not supported on most cloud networks, so it's best used on networks you control.

Setting up Vxlan Networks with Flockport

Kernel version 3.14 or later is the minimum recommended for Vxlan networks. A note on Vxlan ports: the IANA-assigned Vxlan port is UDP 4789, but the Linux kernel Vxlan driver predates that assignment and defaults to port 8472 when no port is given. If the Vxlan ports are not consistent across hosts the network will not work, so set dstport 4789 explicitly on every host rather than relying on the default. Also, if you are using a firewall, make sure UDP port 4789 is open.
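To check which port a Vxlan device actually uses, you can read it from the detailed link output of ip -d link show. A minimal sketch with hypothetical helper names that parses that text from stdin and warns when the port is not 4789:

```shell
#!/bin/sh
# Sketch: extract the UDP destination port from `ip -d link show <dev>` output
# (read on stdin) and warn if it is not the IANA-assigned Vxlan port 4789.
vxlan_dstport() {
  sed -n 's/.*dstport \([0-9][0-9]*\).*/\1/p'
}

check_vxlan_port() {
  local port
  port=$(vxlan_dstport)
  if [ "$port" != 4789 ]; then
    echo "warning: vxlan dstport is ${port:-unknown}, expected 4789" >&2
    return 1
  fi
}
```

Run it as ip -d link show vxlan0 | check_vxlan_port on each host. On the firewall side, a rule such as iptables -A INPUT -p udp --dport 4789 -j ACCEPT opens the port.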

Wireguard

Wireguard is a new open source VPN project that lets you build encrypted networks without the overhead and performance penalty of traditional VPNs such as OpenVPN.

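With Wireguard you list each peer and the subnets it is responsible for (its AllowedIPs), similar to sharing subnets with BGP; a common layout designates one host as a hub, or master, that the others connect to. The wg-quick tool then takes care of adding the routes. Wireguard also has advanced capabilities to segment and isolate networks.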
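As a sketch of how subnets are shared in Wireguard, here is a hypothetical generator for one [Peer] section of a wg-quick configuration file such as /etc/wireguard/wg0.conf. The tunnel address range 10.99.0.0/24, the listen port 51820 and the placeholder key are illustrative assumptions; real keys come from wg genkey and wg pubkey.

```shell
#!/bin/sh
# Sketch: print one [Peer] section for a wg-quick config file.
# AllowedIPs lists the addresses reachable through this peer: its tunnel IP
# plus the container subnet it serves; wg-quick adds routes for these entries
# when the interface comes up.
# Assumptions (illustrative): tunnel addresses in 10.99.0.0/24 and every peer
# listening on UDP port 51820.
# Usage: gen_wg_peer PUBLIC_KEY ENDPOINT_IP TUNNEL_IP CONTAINER_SUBNET
gen_wg_peer() {
  cat <<EOF
[Peer]
PublicKey = $1
Endpoint = $2:51820
AllowedIPs = $3/32, $4
EOF
}
```

For Host A from the earlier routing example, gen_wg_peer HOSTB_PUBLIC_KEY 192.168.1.20 10.99.0.2 10.0.4.0/24 produces the entry through which Host A reaches Host B's container subnet (HOSTB_PUBLIC_KEY and 10.99.0.2 are placeholders).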

We have a more detailed guide on setting up Wireguard networks here.

At the time of writing, the main downside with Wireguard was that it was not yet in the mainline kernel, and installation was a bit involved. The good news was that Linus Torvalds was impressed with the patch, and Wireguard has since been merged into the kernel, in Linux 5.6.

Flockport supports Vxlan, BGP and Wireguard and automates the process of setting up these networks. You can find individual guides on Vxlan, Wireguard and BGP in the Guides section. A more detailed guide on building networks with BGP using Quagga is in production, and it will be posted in the Guides section as soon as it's done.

