Jail Vnet by Examples
BY OLIVIER COCHARD-LABBÉ
To understand the Virtual Network feature (vnet), which should not be confused with the vtnet(4)
VirtIO Ethernet driver, let’s begin with an extract from the vnet(9) man page:
DESCRIPTION
vnet is the name of a technique to virtualize the network stack.
(...).
Each (virtual) network stack is attached to a prison, with vnet0
being the unrestricted default network stack of the base system.
Because each network stack is attached to a prison (jail), let’s also check the jail(8) man page
section about vnet:
vnet Create the jail with its own virtual network stack, with its own
network interfaces, addresses, routing table, etc. The kernel
must have been compiled with the VIMAGE option for this to be
available. Possible values are “inherit” to use the system
network stack, possibly with restricted IP addresses, and “new”
to create a new network stack.
To summarize, this feature gives each jail its own routing table, its own ARP and NDP caches,
and its own interfaces.
Vocabulary
• Host: The system hosting your jails
Examples
These examples use “empty” jails based on the host ‘/’ to focus only on the vnet feature. They
will all be in “persist” mode (because no processes are running).
Concerning the OS requirements:
• The shell used is /bin/sh.
• FreeBSD 12.1 minimum (12-STABLE or, even better, -head)
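As a minimal sketch, a jail matching this description could be created as follows. The jail name useless is the one this article removes later, and the jail(8) parameters are the same ones used in the later examples. The run helper is a hypothetical convenience, not part of FreeBSD: it only prints each command unless DRY_RUN=0 is set, so the sketch can be reviewed before being run as root.

```sh
#!/bin/sh
# Hypothetical helper: print each command by default, run it when DRY_RUN=0.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# "persist" keeps the jail alive although no process runs in it;
# "vnet" gives it a new network stack instead of sharing the host's.
create_vnet_jail() {
    run jail -c name="$1" host.hostname="$1" persist vnet
}

create_vnet_jail useless
# Look at the interfaces and routing table seen from inside the jail.
run jexec useless ifconfig -l
run jexec useless netstat -rn
```

Running it with DRY_RUN=0 as root would execute the three commands instead of printing them.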
Here is the default network interface assigned and the content of its routing table for this
new jail.
Internet:
Destination Gateway Flags Netif Expire
127.0.0.1 link#1 UH lo0
Internet6:
Destination Gateway Flags Netif Expire
::1 link#1 UH lo0
fe80::%lo0/64 link#1 U lo0
fe80::1%lo0 link#1 UHS lo0
This is a lot better! But we only have the loopback interface running. The next step is to create
a virtual Ethernet tap interface and assign it to the jail. Here is the ifconfig(8) man page extract:
vnet jail
Move the interface to the jail(8), specified by name or JID. If
the jail has a virtual network stack, the interface will
disappear from the current environment and become visible to the
jail.
Let’s do this:
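The steps could look like the sketch below. The interface name tap0 is an assumption (a bare "ifconfig tap create" prints whichever name it allocated), and the jail name useless matches the jail removed later in this article. The run helper is hypothetical and only prints the commands unless DRY_RUN=0.

```sh
#!/bin/sh
# Hypothetical helper: print each command by default, run it when DRY_RUN=0.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

TAP=tap0        # assumed name of the new tap interface
JAIL=useless    # jail that owns its own network stack

move_tap_to_jail() {
    run ifconfig "$TAP" create
    run ifconfig "$TAP" up
    run ifconfig "$TAP" vnet "$JAIL"   # the interface leaves the host stack
    run ifconfig -l                    # host view of the interface list
    run jexec "$JAIL" ifconfig -l      # jail view of the interface list
}

move_tap_to_jail
```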
What happened? Just after we enabled the interface and assigned it to the jail, it disappeared!
That is the expected behavior because this interface doesn’t belong to your host network stack
anymore. You can check its status in the jail and even assign an IP address to it:
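A sketch of those checks, run through jexec from the host, might look like this. The address 192.0.2.1/24 on tap0 is taken from the routing table shown further down; the run helper is a hypothetical dry-run wrapper that prints the commands unless DRY_RUN=0.

```sh
#!/bin/sh
# Hypothetical helper: print each command by default, run it when DRY_RUN=0.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

configure_tap_in_jail() {
    run jexec useless ifconfig tap0 inet 192.0.2.1/24 up
    run jexec useless ifconfig tap0          # status as seen by the jail
    run jexec useless ping -c 2 192.0.2.1    # the jail pings its own address
    run jexec useless arp -an                # ARP cache of the jail
    run arp -an                              # ARP cache of the host, for comparison
}

configure_tap_in_jail
```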
The jail can ping its own interface, its own ARP cache is populated with the corresponding
entry, and all of these are isolated from the host networking stack.
It’s the same for the routing table:
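One way to see this, sketched under the assumption that the jail is still named useless: add a static route inside the jail and compare both routing tables. The route 198.51.100.0/24 via 192.0.2.2 is the one visible in the jail output below. The run helper is hypothetical and prints the commands unless DRY_RUN=0.

```sh
#!/bin/sh
# Hypothetical helper: print each command by default, run it when DRY_RUN=0.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

show_route_isolation() {
    # The static route is added in the jail's routing table only.
    run jexec useless route add 198.51.100.0/24 192.0.2.2
    run jexec useless netstat -4rn   # jail: the new route is listed
    run netstat -4rn                 # host: the route should be absent
}

show_route_isolation
```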
Internet:
Destination Gateway Flags Netif Expire
127.0.0.1 link#1 UH lo0
192.0.2.0/24 link#2 U tap0
192.0.2.1 link#2 UHS lo0
198.51.100.0/24 192.0.2.2 UGS tap0
# netstat -4rn | grep 198.51.100.0
#
Before continuing to the next example, we will clean up the existing jail and destroy the
tap interface. We need to use the -R (upper case) option to remove a jail created without a
configuration file. With the -r (lower case) option, the vnet interface will not be returned to
the host automatically.
# jail -R useless
# ifconfig $TAP destroy
To solve this problem, the epair(4) interface (Ethernet pair) was created. This special network
interface provides two interfaces (epairXa and epairXb) that behave like two Ethernet interfaces
cross-connected to each other. When each side is assigned to a different vnet, they still
exchange frames.
[Figure: epair0a on the host cross-connected to epair0b in jail1]
The host is showing its two new interfaces: epair0a and epair0b.
Create a new jail, named “jvnet” and assign interface epair0b to it.
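The two steps just described could be sketched as follows. The jail(8) parameters are the same ones used elsewhere in this article; the run helper is a hypothetical dry-run wrapper that prints the commands unless DRY_RUN=0.

```sh
#!/bin/sh
# Hypothetical helper: print each command by default, run it when DRY_RUN=0.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

create_epair_jail() {
    run ifconfig epair0 create    # creates both epair0a and epair0b
    run ifconfig -g epair         # host lists the members of the epair group
    run jail -c name=jvnet host.hostname=jvnet persist vnet vnet.interface=epair0b
    run ifconfig -g epair         # listed again once epair0b has moved
}

create_epair_jail
```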
Interface epair0b no longer belongs to the host system network stack, but the other epair0a
still does! Let’s configure an IP address on epair0a.
# ifconfig -g epair
epair0a
# ifconfig epair0a inet 192.0.2.1/24 up
Then do the same on epair0b belonging to the jail and check their connectivity.
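A possible version of this step is sketched below. The jail-side address 192.0.2.2/24 is an assumption chosen to sit in the same subnet as epair0a; the run helper is hypothetical and prints the commands unless DRY_RUN=0.

```sh
#!/bin/sh
# Hypothetical helper: print each command by default, run it when DRY_RUN=0.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

check_epair_link() {
    run jexec jvnet ifconfig epair0b inet 192.0.2.2/24 up   # assumed address
    run ping -c 2 192.0.2.2                 # host to jail
    run jexec jvnet ping -c 2 192.0.2.1     # jail to host
}

check_epair_link
```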
By displaying their MAC addresses, you will notice that epair interfaces use specific MAC addresses.
Destroy this jail and the epair interfaces before continuing to the next example:
# jail -R jvnet
# ifconfig epair0a destroy
Now that we’ve got a basic setup working, let’s make it a little more complex, using multiple
jails connected in series with routing between them.
Then generate five jails with epair interfaces assigned to them, using a mix of loop and manual
assignment.
5 hop3 /
6 hop4 /
7 hop5 /
Now configure IP addresses, enable routing on some jails, and set the static routes.
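The whole chain can be sketched with a small generator that only prints the commands, so the plan can be reviewed first. The addressing is an assumption inferred from the traceroute output below: link k (0 to 4) is epair$k, its a side gets 192.0.2.(2k)/31 and stays upstream, and its b side gets 192.0.2.(2k+1)/31 and goes to jail hop(k+1). The function name chain_commands is hypothetical.

```sh
#!/bin/sh
# Dry-run sketch: prints the commands for a chain of jails, runs nothing.
chain_commands() {
    hops=$1
    k=0
    while [ "$k" -lt "$hops" ]; do
        jail="hop$((k + 1))"
        a_ip="192.0.2.$((2 * k))"        # upstream end of link k
        b_ip="192.0.2.$((2 * k + 1))"    # downstream end, inside $jail
        echo "ifconfig epair$k create"
        echo "jail -c name=$jail host.hostname=$jail persist vnet vnet.interface=epair${k}b"
        echo "jexec $jail ifconfig epair${k}b inet $b_ip/31 up"
        if [ "$k" -eq 0 ]; then
            # First link: the a side stays on the host.
            echo "ifconfig epair0a inet $a_ip/31 up"
            echo "route add 192.0.2.0/24 $b_ip"
        else
            # Other links: the a side moves to the previous jail, which routes.
            up="hop$k"
            echo "ifconfig epair${k}a vnet $up"
            echo "jexec $up ifconfig epair${k}a inet $a_ip/31 up"
            echo "jexec $up sysctl net.inet.ip.forwarding=1"
            # The previous jail reaches every deeper link through this one.
            j=$((k + 1))
            while [ "$j" -lt "$hops" ]; do
                echo "jexec $up route add 192.0.2.$((2 * j))/31 $b_ip"
                j=$((j + 1))
            done
        fi
        # Everything else is reached through the upstream neighbor.
        echo "jexec $jail route add default $a_ip"
        k=$((k + 1))
    done
}

chain_commands 5
```

Once the printed commands look right, piping the output to sh as root would apply them.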
Test your setup by pinging the fifth jail from the host network stack.
# ping -c 2 192.0.2.9
PING 192.0.2.9 (192.0.2.9): 56 data bytes
64 bytes from 192.0.2.9: icmp_seq=0 ttl=60 time=0.265 ms
64 bytes from 192.0.2.9: icmp_seq=1 ttl=60 time=0.482 ms
# traceroute -n 192.0.2.9
traceroute to 192.0.2.9 (192.0.2.9), 64 hops max, 40 byte packets
1 192.0.2.1 0.060 ms 0.243 ms 0.244 ms
2 192.0.2.3 0.180 ms 0.202 ms 0.263 ms
3 192.0.2.5 0.050 ms 0.159 ms 0.205 ms
4 192.0.2.7 0.194 ms 0.197 ms 0.191 ms
5 192.0.2.9 0.261 ms 0.201 ms 0.188 ms
Optionally, add a little more fun to this setup (with a big IPv6 range, it should be easy to
automate a text-to-traceroute script that populates a DNS configuration file).
Going Further
Connecting Jails with the Outside World
There are several options:
1. With an SR-IOV compliant NIC, generate multiple virtual NICs and assign them to the jails.
2. Virtual-Interface (driver specific).
3. Use VLANs and assign a VLAN interface to each jail; the limitation is one jail per VLAN
per Ethernet port.
4. Use an if_bridge interface (which can be mixed with VLANs, too). This is the easiest setup,
but if_bridge carries some performance penalty.
SR-IOV
This feature, initially designed for virtual machine use, creates multiple Virtual Functions
(VF = virtual NIC in our case). In the default non-passthrough mode, it presents multiple
virtual NICs to the host, each of which can be attached to a vnet-jail.
Here is an example using two Chelsio interfaces (cxl0 and cxl1) to create 10 VFs on each.
# cat > /etc/iovctl.cxl0.conf <<EOF
? PF {
? device : "cxl0";
? num_vfs : 10;
? }
? EOF
# cat > /etc/iovctl.cxl1.conf <<EOF
? PF {
? device : "cxl1";
? num_vfs : 10;
? }
? EOF
# iovctl -C -f /etc/iovctl.cxl0.conf
# iovctl -C -f /etc/iovctl.cxl1.conf
# kldload if_cxgbev
# tail /var/log/messages
(...) kernel: t5vf18: <Chelsio T540-CR VF> at device 0.41 on pci4
(...) kernel: cxlv18: <port 0> on t5vf18
(...) kernel: cxlv18: Ethernet address: 06:44:2e:e5:90:18
(...) kernel: cxlv18: 2 txq, 1 rxq (NIC)
(...) kernel: t5vf18: 1 ports, 2 MSI-X interrupts, 4 eq, 2 iq
(...) kernel: t5vf19: <Chelsio T540-CR VF> at device 0.45 on pci4
(...) kernel: cxlv19: <port 0> on t5vf19
(...) kernel: cxlv19: Ethernet address: 06:44:2e:e5:90:19
(...) kernel: cxlv19: 2 txq, 1 rxq (NIC)
(...) kernel: t5vf19: 1 ports, 2 MSI-X interrupts, 4 eq, 2 iq
# ifconfig -l
cxl0 cxl1 igb0 lo0 cxlv0 cxlv1 cxlv2 cxlv3 cxlv4 cxlv5 cxlv6 cxlv7 cxlv8 cxlv9 cxlv10
cxlv11 cxlv12 cxlv13 cxlv14 cxlv15 cxlv16 cxlv17 cxlv18 cxlv19
Now we can keep cxl0 (the physical interface) for the host, and each cxlvX interface can be
assigned to a different vnet-jail.
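As a sketch, handing one VF to each jail could be done with a loop like this. The jail names jail0 to jail19 are hypothetical; the cxlvX names come from the ifconfig -l output above. The run helper is a hypothetical dry-run wrapper that prints the commands unless DRY_RUN=0.

```sh
#!/bin/sh
# Hypothetical helper: print each command by default, run it when DRY_RUN=0.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create one vnet jail per VF, from cxlv$1 to cxlv$2 inclusive.
assign_vfs() {
    i=$1
    while [ "$i" -le "$2" ]; do
        run jail -c name="jail$i" host.hostname="jail$i" persist vnet \
            vnet.interface="cxlv$i"
        i=$((i + 1))
    done
}

assign_vfs 0 19
```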
Notice that:
1. Some NICs (like Intel) need more parameters (allow-promisc, allow-set-mac, mac-anti-spoof)
to allow specific usage like CARP with the VF.
2. On FreeBSD 12 with the Intel ix(4) driver, the VF driver could not attach to these VFs:
ixv0: <Intel(R) PRO/10GbE Virtual Function Network Driver> at device 0.128 on pci4
ixv0: ...reset_hw() failure: Reset Failed!
ixv0: IFDI_ATTACH_PRE failed 5
device_attach: ixv0 attach returned 5
After a reboot the new interfaces (vcxlX) will be available and show in dmesg as:
Now you can assign these virtual interfaces (vcxlX) to the vnet-jail.
VLAN
Without a NIC supporting the SR-IOV or Virtual-Interface features, another possibility is to
create multiple VLANs and assign each VLAN interface to a vnet-jail. The restriction is that
VLAN IDs are unique per interface, so if two vnet-jails need a VLAN sub-interface in the same
VLAN, you need to use two physical interfaces.
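A minimal sketch of creating such sub-interfaces follows, using the parent.ID naming of igb0.6, igb0.7, and igb1.6. The make_vlan and run helpers are hypothetical; run prints the commands unless DRY_RUN=0.

```sh
#!/bin/sh
# Hypothetical helper: print each command by default, run it when DRY_RUN=0.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 = parent interface, $2 = VLAN ID; the new interface is named parent.ID.
make_vlan() {
    run ifconfig "$1.$2" create vlan "$2" vlandev "$1"
}

make_vlan igb0 6
make_vlan igb0 7
make_vlan igb1 6   # VLAN 6 again, so it needs a second physical interface
```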
In this example, new interfaces igb0.6, igb0.7, and igb1.6 are available for the vnet-jails.
Bridge + epair
To remove the restriction of a unique VLAN ID per physical interface, there is still the classical
approach using a bridge and epair setup, at the cost of some performance.
[Figure: FreeBSD host with igb0, epair1a, epair2a, and epair3a attached to bridge0]
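Before members can be added, the bridge and the three epair pairs have to exist. A sketch of that preparation, assuming the names bridge0 and epair1 to epair3 used by the commands that follow; the prepare_bridge and run helpers are hypothetical, and run prints the commands unless DRY_RUN=0.

```sh
#!/bin/sh
# Hypothetical helper: print each command by default, run it when DRY_RUN=0.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create bridge0 and $1 epair pairs (epair1a/epair1b, epair2a/epair2b, ...).
prepare_bridge() {
    run ifconfig bridge0 create
    i=1
    while [ "$i" -le "$1" ]; do
        run ifconfig "epair$i" create up
        i=$((i + 1))
    done
}

prepare_bridge 3
```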
# ifconfig bridge0 inet 192.0.2.254/24 addm igb1 addm epair1a addm epair2a \
addm epair3a
# for i in $(jot 3); do jail -c name=jail$i host.hostname=jail$i persist vnet \
vnet.interface=epair${i}b; jexec jail$i ifconfig epair${i}b inet \
192.0.2.${i}/24 up; jexec jail$i ifconfig epair${i}b inet; done
epair1b: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=8<VLAN_MTU>
inet 192.0.2.1 netmask 0xffffff00 broadcast 192.0.2.255
epair2b: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=8<VLAN_MTU>
inet 192.0.2.2 netmask 0xffffff00 broadcast 192.0.2.255
epair3b: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=8<VLAN_MTU>
inet 192.0.2.3 netmask 0xffffff00 broadcast 192.0.2.255
# ping -c 2 192.0.2.1
PING 192.0.2.1 (192.0.2.1): 56 data bytes
64 bytes from 192.0.2.1: icmp_seq=0 ttl=64 time=0.158 ms
64 bytes from 192.0.2.1: icmp_seq=1 ttl=64 time=0.103 ms
Final Exercise
Now you should be able to set up this kind of lab on your own.
[Figure: lab topology; the surviving fragment shows bridge5 connected through epair5a and epair5b to Jail5]
This small shell script starts 480 jails, each running a bird OSPF daemon tuned for a large
number of neighbors on a shared link:
• MTU increased to 9000 to allow a large number of neighbors (you can only have about
350 OSPF neighbors at most with the default 1500-byte MTU)
• Hello and dead intervals increased to reduce multicast storms on the bridge interface
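The MTU figure can be checked with a little arithmetic. This is a sketch under stated assumptions: an OSPFv2 Hello must fit in one packet, it carries a 20-byte IPv4 header (no options), a 24-byte OSPF header, and 20 bytes of fixed Hello fields, and each listed neighbor costs 4 bytes. The function name max_neighbors is hypothetical.

```sh
#!/bin/sh
# Rough upper bound on the neighbors one OSPFv2 Hello can list for a given MTU.
# Assumed overhead: 20 (IPv4) + 24 (OSPF header) + 20 (fixed Hello fields) = 64.
max_neighbors() {
    mtu=$1
    echo $(( (mtu - 64) / 4 ))   # each neighbor is a 4-byte router ID
}

max_neighbors 1500   # prints 359
max_neighbors 9000   # prints 2234
```

The first result agrees with the "about 350" limit, and the second leaves ample room for 479 neighbors.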
#!/bin/sh
set -eu
dec2dot () {
# $1 is a decimal number
# output is pointed decimal (IP address format)
printf '%d.%d.%d.%d\n' $(printf "%08x\n" "$1" | sed 's/../0x& /g')
}
# Need to increase some network value a little bit
# to avoid "No buffer space available" messages
# maximum number of mbuf clusters allowed
sysctl kern.ipc.nmbclusters=1000000
sysctl net.inet.raw.maxdgram=16384
sysctl net.inet.raw.recvspace=16384
# Start addressing shared LAN at 192.0.2.0 (in decimal to easily increment it)
ipepairbase=3221225984
# start addressing loopbacks at 198.51.100.0
iplobase=3325256704
ifconfig bridge create name vnetdemobridge mtu 9000 up
for i in $(jot 480); do
ifconfig epair$i create mtu 9000 up
ifconfig vnetdemobridge addm epair${i}a edge epair${i}a
jail -c name=jail$i host.hostname=jail$i persist \
vnet vnet.interface=epair${i}b
ipdot=$( dec2dot $(( iplobase + i)) )
jexec jail$i ifconfig lo1 create inet ${ipdot}/32 up
ipdot=$( dec2dot $(( ipepairbase + i)) )
jexec jail$i ifconfig epair${i}b inet ${ipdot}/20 mtu 9000 up
cat > /tmp/bird.${i}.conf <<EOF
protocol device {}
protocol kernel { ipv4 { export all; }; }
protocol ospf {
area 0 {
interface "epair${i}b" {
hello 60;
dead 240;
};
interface "lo1" {
stub yes;
};
};
}
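EOF
# Assumed: start bird inside the jail with its own config, PID file, and
# control socket (the /tmp/bird.$i.* files removed by the cleanup script)
jexec jail$i bird -c /tmp/bird.${i}.conf -P /tmp/bird.${i}.pid -s /tmp/bird.${i}.ctl
done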
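The two decimal base values in the script can be double-checked with plain shell arithmetic. The following sketch converts in both directions; the function names ip2dec and dec2ip are hypothetical and independent of the script’s own dec2dot.

```sh
#!/bin/sh
# Dotted IPv4 address to decimal, using only shell arithmetic.
ip2dec() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split a.b.c.d into $1 $2 $3 $4
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Decimal back to dotted IPv4 address.
dec2ip() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

ip2dec 192.0.2.0                   # prints 3221225984
ip2dec 198.51.100.0                # prints 3325256704
dec2ip $(( 3221225984 + 480 ))     # prints 192.0.3.224, the address of jail480
```

The last line also shows why the shared LAN uses a /20 mask: 480 consecutive addresses starting at 192.0.2.1 run past the end of 192.0.2.0/24.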
Install net/bird2 and execute this script. After the OSPF DR/BDR election and database
synchronization, network traffic on the bridge interface should still be quite high, even with
only the OSPF keep-alives:
After a few minutes, check the number of neighbors detected (it should be 479) and the number
of learned routes. The DR/BDR election should have chosen jail480 as DR and jail479 as BDR.
The limit for this test on the current system is the 4GB of RAM consumed by all the bird processes.
PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
13529 root 1 20 0 32M 20M select 1 0:45 2.04% bird
13553 root 1 20 0 29M 17M select 3 0:41 0.71% bird
16459 root 1 20 0 14M 3568K CPU1 1 0:00 0.62% top
13512 root 1 20 0 20M 7372K select 2 0:03 0.39% bird
8003 root 1 20 0 20M 7316K select 0 0:03 0.38% bird
7913 root 1 20 0 20M 7260K select 0 0:03 0.38% bird
13466 root 1 20 0 20M 7172K select 0 0:03 0.34% bird
7887 root 1 20 0 20M 7288K select 1 0:03 0.33% bird
7832 root 1 20 0 20M 7260K select 2 0:03 0.33% bird
Here is the script to delete and clean up all the jails, but you should reboot instead because
your system will panic during this cleanup:
#!/bin/sh
set -eu
for i in $(jot 480); do
echo Deleting jail$i
jail -R jail$i
ifconfig epair${i}a destroy
rm /tmp/bird.$i.*
done
ifconfig vnetdemobridge destroy
Firewalls
pf and ipfw are vnet compliant, which allows the building of a multi-tenant firewall in an HA
scenario like the one presented here:
[Figure: multi-tenant HA firewall with pfsync between each pair of jails; Customer 1 has root
access on jails 11 and 21, Customer 2 has root access on jails 12 and 22]
Sept/Oct 2019 23