转自:http://140.120.15.179/Presentation/20150120/index.html
Virtual networking: TUN/TAP, MacVLAN, and MacVTap
Purpose
Macvtap is a new device driver meant to simplify virtualized bridged networking. It replaces the combination of the tun/tap and bridge drivers with a single module based on the macvlan device driver. A macvtap endpoint is a character device that largely follows the tun/tap ioctl interface and can be used directly by kvm/qemu and other hypervisors that support the tun/tap interface. The endpoint extends an existing network interface, the lower device, and has its own mac address on the same ethernet segment. Typically, this is used to make both the guest and the host show up directly on the switch that the host is connected to.
Category
- TUN/TAP
- MacVLan/MacVTap
- MacVLan/MacVTap working mode
- Setting up macvtap (macvlan)
- Demo: OpenvSwitch vs MacVTap
- Performance measures
- User-mode Linux(UML) with MacVTap
- User-mode Linux(UML) with OpenvSwitch and MacVTap
1.TUN/TAP
Physical NIC network stack
TUN dervice
Simulate an UDP VPN process.
TAP dervice
Like TUN device, here is a list of the main differences between tun and tap.
- /dev/tunX works on IP layer (ip_forward)
- /dev/tapX work on MAC layer (bridge, MAC broadcast)
2.MacVLAN/MacVTap
MacVLan
Macvlan working mode
Macvlan work with namespace
MacVTap
Use /dev/tapX instead of network stack.
MacVLan and MacVTap are both working on MAC layer.
3.MacVLan/MacVTap working mode
Both macvlan and macvtap can be in one of four modes, defining the communication between macvtap endpoints on a single lower device:
1.Virtual Ethernet Port Aggregator (VEPA), the default mode: data from one endpoint to another endpoint on the same lower device gets sent down the lower device to external switch. If that switch supports the hairpin mode, the frames get sent back to the lower device and from there to the destination endpoint. Most switches today do not support hairpin mode, so the two endpoints are not able to exchange ethernet frames, although they might still be able to communicate using an tcp/ip router. A linux host used as the adjacent bridge can be put into hairpin mode by writing to /sys/class/net/dev/brif/port/hairpin_mode. This mode is particularly interesting if you want to manage the virtual machine networking at the switch level. A switch that is aware of the VEPA guests can enforce filtering and bandwidth limits per MAC address without the Linux host knowing about it.
2.Bridge mode: this works almost like a traditional bridge, in that data received on a macvlan in bridge mode and destined for another macvlan of the same lower device is sent directly to the target (if the target macvlan is also in bridge mode), rather than being sent outside. This of course works well with non-hairpin switches, and inter-VM traffic has better performance than VEPA mode, since the external round-trip is avoided.
- 3.Private mode: this is essentially like VEPA mode, but with the added feature that no macvlans on the same lower device can communicate, regardless of where the packets come from (so even if inter-VM traffic is sent back by a hairpin switch or an IP router, the target macvlan is prevented from receiving it). I haven't tried, but I suppose that it is the operating mode of the target macvlan that determines whether it receives the traffic or not. This mode is useful, of course, if we really want macvlan isolation.
- 4.Passthru mode: this mode was added later, to work around some limitation of macvlans (more details here). I'm not 100% clear on what's the problem passthru mode tries to solve, as I was able to set promiscuous mode, create bridges, vlans and sub-macv{lan,tap} interfaces in KVM guests using a plain macvtap in VEPA mode for their networking (so no need for passthru). Since I'm surely missing something, more information (as usual) is welcome.
4.Setting up macvtap (or macvlan)
A macvtap interface is created an configured using the ip link command from iproute2, in the same way as we configure macvlan or veth interfaces.
Example:
$ sudo ip link add link eth0 name macvtap0 address 52:54:00:b8:9c:58 type macvtap mode bridge
$ sudo ip link set macvtap0 up
$ ip link show macvtap0
Qemu on macvtap
Qemu as of 0.12 does not have direct support for macvtap, so we have to (ab)use the tun/tap configuration interface. To start a guest on the interface from the above example, we need to pass the device node as an open file descriptor to qemu and tell it about the mac address. The scripts normally used for bridge configuration must be disabled. A bash redirect can be used to open the character device in read/write mode and pass it as file descriptor 3.
$ qemu -net nic,model=virtio,addr=1a:46:0b:ca:bc:7b -net tap,fd=3 3<>/dev/tap11
5.Demo: OpenvSwitch vs MacVTap
Define MacVTap network and Openvswitch in liabvirt
$ cat network.xml
<network>
<name>ovs-bridge-eth1</name>
<forward mode='bridge'/>
<bridge name='br1'/>
<virtualport type='openvswitch'>
<parameters interfaceid='bffd2747-4b84-44b5-bdf4-faede6e413c5'/>
</virtualport>
</network>
<network>
<name>macvtap-bridge-eth1</name>
<forward mode="bridge">
<interface dev="eth1"/>
</forward>
</network>
<network>
<name>macvtap-vepa-eth2</name>
<forward mode="vepa">
<interface dev="eth2"/>
</forward>
</network>
<network>
<name>macvtap-bridge-eth2</name>
<forward mode="bridge">
<interface dev="eth2"/>
</forward>
</network>
$ sudo virsh net-define network.xml
A part of libvirt XML definition of MacVTap
... <devices> <interface type='network'> <mac address='52:54:00:0a:8a:c1'/> <source network='ovs-bridge-eth1'/> <model type='rtl8139'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0' multifunction='on'/> </interface> <interface type='network'> <mac address='52:54:00:0a:8a:c2'/> <source network='macvtap-bridge-eth2'/> <model type='rtl8139'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x1'/> </interface> ... </devices> $ sudo virsh create VM0.xml
Connect host via macvlan
$ cat setmacvlan
#! /bin/bash
sudo ip link add link eth2 name vlan0 address 50:e5:49:45:76:db type macvlan mode bridge
sudo ifconfig eth2 0.0.0.0
sudo ifconfig vlan0 172.16.100.101/24
$ cat delmacvlan
#! /bin/bash
sudo ifconfig vlan0 down
sudo ifconfig eth2 172.16.100.101/24
sudo ip link del dev vlan0
OpenVSwitch hairping mode
$ cat hairpin-On.sh
#! /bin/bash
sudo ovs-ofctl add-flow brPrivate2 actions=all,in_port
sudo ovs-ofctl dump-flows brPrivate2
$ cat hairpin-Off.sh
#! /bin/bash
sudo ovs-ofctl del-flows brPrivate2
sudo ovs-ofctl add-flow brPrivate2 priority=0,actions=normal
sudo ovs-ofctl dump-flows brPrivate2
Trace libvirt commands
$ ps aux |grep kvm libvirt+ 4486 1.7 2.8 4676344 237544 ? Sl Jan19 12:32 qemu-system-x86_64 -enable-kvm -name VM2 -S -machine pc-i440fx-2.1,accel=kvm,usb=off -cpu qemu64 -m 2048 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 3adebedc-eb5c-4a26-96fe-5933861c4abd -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/VM2.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x5.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x5 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x5.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x5.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive file=/home/cloud/libvirt/vm2.img,if=none,id=drive-virtio-disk0,format=raw,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=24,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:0a:8a:c5,bus=pci.0,multifunction=on,addr=0x3 -netdev tap,fd=27,id=hostnet1 -device rtl8139,netdev=hostnet1,id=net1,mac=52:54:00:0a:8a:c6,bus=pci.0,addr=0x3.0x1 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -spice port=5902,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,bus=pci.0,addr=0x2 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1 -chardev spicevmc,id=charredir2,name=usbredir -device usb-redir,chardev=charredir2,id=redir2 -chardev spicevmc,id=charredir3,name=usbredir -device usb-redir,chardev=charredir3,id=redir3 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -msg timestamp=on
Performances measures
$ cat start-MacVTap0-AsDaemon
#! /bin/bash
MACaddr='52:54:00:b8:9c:58'
# Don't Edit, File automatically generated by Config-Kvm-vhoston script
if [ $EUID -ne 0 ]
then sudo echo "Super User passwd, please:"
if [ $? -ne 0 ]
then echo "Sorry, need su privilege!"
exit 1
fi
fi
sudo ip link add link eth0 name macvtap0 address ${MACaddr} type macvtap mode bridge
sleep 2
sudo ip link set dev macvtap0 up
TAPNUM=$(< /sys/class/net/macvtap0/ifindex)
sudo chmod 666 /dev/tap${TAPNUM}
vhostOn.sh
mkdir /src3/KVM/network-11586
echo "Starting VM: MacVTap0..., mem=${MEM}"
screen -S MacVTap0 -d -m run-MacVTap0-AsDaemon
$ cat run-MacVTap0-AsDaemon
#! /bin/bash
MEM=512M
MACaddr=$(< /sys/class/net/macvtap0/address)
TAPNUM=$(< /sys/class/net/macvtap0/ifindex)
qemu-system-x86_64 -name MacVTap0 -localtime -curses \
-m ${MEM} -enable-kvm \
-monitor unix:/src3/KVM/network-11586/MonSock,server,nowait \
-netdev tap,fd=3,id=hostnet0,vhost=on \
-net nic,vlan=0,netdev=hostnet0,macaddr=${MACaddr},model=virtio \
-drive index=0,media=disk,if=virtio,file=../img/MacVLan0.img 3<>/dev/tap${TAPNUM}
$ cat stop-MacVTap0-restore-lan
#! /bin/bash
# Don't Edit, File automatically generated by Config-Kvm-vhoston script
if [ $EUID -ne 0 ]
then sudo echo "Super User passwd, please:"
if [ $? -ne 0 ]
then echo "Sorry, need su privilege!"
exit 1
fi
fi
if [ -S /src3/KVM/network-11586/MonSock ]; then
echo "system_powerdown" | socat - unix-connect:/src3/KVM/network-11586/MonSock
echo "Please wait 10 seconds."
sleep 10
else
echo "Socket has been removed! Shutdown by ssh or resotre Lan only."
fi
ping -c 3 192.168.180.200
if [ $? -eq 0 ]; then
echo "MacVTap0 still alive, shut it down. Enter passwd twice!"
ssh -t jssu@192.168.180.200 'sudo init 0'
else
rm -rf /src3/KVM/network-11586
fi
echo "Restore lan..."
if [ -d /proc/sys/net/ipv4/conf/macvtap0 ]; then
sudo ip link set dev macvtap0 down
sudo ip link delete macvtap0
fi
7.User-mode Linux(UML) with MacVTap
$ cat startUML-MacVTap-AsDaemon
#! /bin/bash
MACaddr='50:e5:49:b8:9c:01'
#############################################################
IsThereTapDevice()
{
declare int i=0;
for devices in `find /sys/class/net -type l -name "vtap*"`
do
((i++));
done
if [ ${i} -gt 0 ]
then echo "Yes"
else echo "No"
fi
}
#############################################################
sudo echo Need SU passwd:
if [ `IsThereTapDevice` = "No" ]
then sudo iptables --flush
sudo iptables --table nat --flush
sudo iptables --delete-chain
sudo iptables --table nat --delete-chain
sudo iptables --table nat --append POSTROUTING --out-interface eth1 -j MASQUERADE
fi
#sudo tunctl -u jssu -t tap0
sudo ip link add link eth0 name vtap0 address ${MACaddr} type macvtap mode bridge
sleep 2
sudo ip link set dev vtap0 up
TAPNUM=$(< /sys/class/net/vtap0/ifindex)
sudo chmod 666 /dev/tap${TAPNUM}
sudo sysctl net.ipv4.ip_forward=1
#sudo arp -Ds 192.168.180.101 eth0 pub
sudo screen -S MacVTap -d -m linux.uml \
ubd0=DebJes-MacVTap.ext4 \
eth0=tuntap,tap${TAPNUM} \
mem=1024M \
con=pty con0=fd:0,fd:1 umid=MacVTap
sleep 30
sudo ifconfig tap${TAPNUM} 192.168.180.3 netmask 255.255.255.255 up
sudo sysctl net.ipv4.conf.tap${TAPNUM}.proxy_arp=1
sudo route add -host 192.168.180.101 dev tap${TAPNUM}
$ cat stop-uml-restore-lan-MacVTap
#############################################################
IsThereTapDevice()
{
declare int i=0;
for devices in `find /sys/class/net -type l -name "vtap*"`
do
((i++));
done
if [ ${i} -gt 0 ]
then echo "Yes"
else echo "No"
fi
}
#############################################################
sudo echo Need SU passwd:
sudo uml_mconsole MacVTap sysrq s
sudo uml_mconsole MacVTap sysrq u
sudo uml_mconsole MacVTap sysrq e
sudo uml_mconsole MacVTap halt
TAPNUM=$(< /sys/class/net/vtap0/ifindex)
sudo ifconfig tap${TAPNUM} 192.168.180.3 down
sudo sysctl net.ipv4.conf.tap${TAPNUM}.proxy_arp=0
sudo ip link set dev vtap0 down
sudo ip link delete vtap0
if [ `IsThereTapDevice` = "No" ]; then
sudo sysctl net.ipv4.ip_forward=0
sudo iptables --flush
sudo iptables --table nat --flush
sudo iptables --delete-chain
sudo iptables --table nat --delete-chain
fi
Known issues
- Use tap device rather than tun device.
- We can't configure the tap device until the VM booted on.
- Like uml-switch mode instead of bridge mode.
- Can not run in user mode.
TUNSETIFF failed, errno = 1
SIOCSIFFLAGS: Operation not permitted
- I have got the error message when transfer packets through VDE swtch.
uml_net_start_xmit: failed(-1)
8.User-mode Linux(UML) with OpenvSwitch MacVTap
$ cat startUML-OVS-MacVTap-AsDaemon
#! /bin/bash
MACaddr='50:e5:49:b8:9c:01'
sudo ip link add link brLAN name vtap0 address ${MACaddr} type macvtap mode bridge
sleep 2
sudo ip link set dev vtap0 up
TAPNUM=$(< /sys/class/net/vtap0/ifindex)
sudo chmod 666 /dev/tap${TAPNUM}
# sudo sysctl net.ipv4.ip_forward=1
# sudo arp -Ds 192.168.180.101 eth0 pub
sudo screen -S MacVTap -d -m linux.uml \
ubd0=DebJes-MacVTap.ext4 \
eth0=tuntap,tap${TAPNUM} mem=1024M \
con=pty con0=fd:0,fd:1 umid=MacVTap
sleep 30
sudo ovs-vsctl add-port brLAN tap${TAPNUM}
sudo ifconfig tap${TAPNUM} 0.0.0.0 up
$ cat stop-uml-restore-lan-OVS-MacVTap
#! /bin/bash
sudo echo Need SU passwd:
sudo uml_mconsole MacVTap sysrq s
sudo uml_mconsole MacVTap sysrq u
sudo uml_mconsole MacVTap sysrq e
sudo uml_mconsole MacVTap halt
TAPNUM=$(< /sys/class/net/vtap0/ifindex)
sudo ovs-vsctl del-port brLAN tap${TAPNUM}
sudo ip link set dev vtap0 down
sudo ip link delete vtap0
Known issues
- We couldn't configure the tap device until the VM booted on.
- Need root permission.