TUN/TAP Demystified

May 21, 2016

Usage Overview

In general, using TUN/TAP devices is surprisingly simple. The driver has plenty of advanced features, but for those you’ll need to read the kernel documentation; we’re only going to cover the basics here.

The general workflow looks like this:

Open /dev/net/tun.
Call the TUNSETIFF ioctl to select the device mode and options.
Read packets from, and write packets to, the file descriptor.
Close the device.
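
The steps above can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the helper names build_ifreq and open_tun are made up for illustration, the numeric TUNSETIFF value is the one found on common architectures such as x86 and ARM (it may differ elsewhere), and the ioctl needs root or CAP_NET_ADMIN to succeed.

```python
import fcntl
import os
import struct

# Constants from <linux/if_tun.h>. The TUNSETIFF value is an assumption
# that holds on x86 and ARM; other architectures may encode it differently.
TUNSETIFF = 0x400454CA
IFF_TUN = 0x0001
IFF_TAP = 0x0002
IFF_NO_PI = 0x1000


def build_ifreq(name: str, flags: int) -> bytes:
    """Pack the part of struct ifreq that TUNSETIFF looks at.

    That is a 16-byte interface name followed by a 16-bit flags field;
    the remainder is zero padding up to 40 bytes.
    """
    return struct.pack("16sH22x", name.encode("ascii"), flags)


def open_tun(name: str = "tun0", flags: int = IFF_TUN | IFF_NO_PI) -> int:
    """Open /dev/net/tun, select the mode, and return the file descriptor."""
    fd = os.open("/dev/net/tun", os.O_RDWR)
    try:
        fcntl.ioctl(fd, TUNSETIFF, build_ifreq(name, flags))
    except OSError:
        # Do not leak the descriptor if the ioctl is refused.
        os.close(fd)
        raise
    return fd
```

With a descriptor from open_tun(), os.read(fd, 2048) returns one packet per call, os.write(fd, packet) injects one, and os.close(fd) closes the device. Nothing will arrive until the interface has been brought up and given an address.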

It’s really that simple – but, as always, the devil is in the details.

TUN Mode

The first of the two modes for the TUN/TAP driver is TUN mode, of course.

When you place the TUN/TAP device in TUN mode, the data you read from the file descriptor arrives as network-layer (layer 3) packets, one packet per read. For example, if your network is based on IP, you’ll receive IPv4 or IPv6 packets. When you write data back to the device, it must also be in the form of valid protocol packets, one per write.
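
As a small illustration, here is one way to tell IPv4 from IPv6 in a buffer read from a TUN device. It assumes the device was opened with IFF_NO_PI (described under Device Options), so the packet starts at byte 0; the function name ip_version is hypothetical.

```python
def ip_version(packet: bytes) -> int:
    """Return 4 or 6, read from the top four bits of the first byte."""
    if not packet:
        raise ValueError("empty packet")
    version = packet[0] >> 4
    if version not in (4, 6):
        raise ValueError(f"not an IP packet (version nibble {version})")
    return version
```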

This is the primary mode in which VPN tunnels operate, for obvious reasons. It’s a fairly simple matter to take in a packet, ship it over some other network connection, and spit it out on a similar TUN device on a remote machine. The hard part of VPNs isn’t moving the packets around; it’s how to do so securely, which is left up to the developer.

I can also see some odder use cases for this, such as simulating real network traffic for debugging or load testing. If you had a good algorithm for it (not unlike an NPC AI in a video game, I’d guess), you could generate fake traffic and write it to a TUN device. It would look like real IP traffic as far as the host system is concerned.
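
To make the fake-traffic idea a little more concrete, here is a rough sketch that builds a single IPv4/UDP packet by hand. Everything about it is an assumption for illustration: the function names, the field choices (TTL 64, no IP options, UDP checksum left at zero, which IPv4 permits), and the idea that the result would be passed to os.write() on a TUN file descriptor.

```python
import socket
import struct


def checksum(data: bytes) -> int:
    """Internet checksum: one's-complement sum of 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def fake_udp_packet(src: str, dst: str, sport: int, dport: int,
                    payload: bytes) -> bytes:
    """Build an IPv4 packet carrying one UDP datagram."""
    udp_len = 8 + len(payload)
    # A UDP checksum of 0 means "not computed".
    udp = struct.pack("!HHHH", sport, dport, udp_len, 0) + payload
    header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45,                 # version 4, header length of 5 32-bit words
        0,                    # DSCP / ECN
        20 + udp_len,         # total length
        0,                    # identification
        0,                    # flags and fragment offset
        64,                   # TTL
        socket.IPPROTO_UDP,   # protocol
        0,                    # header checksum placeholder
        socket.inet_aton(src),
        socket.inet_aton(dst),
    )
    # Fill in the header checksum at byte offset 10.
    header = header[:10] + struct.pack("!H", checksum(header)) + header[12:]
    return header + udp
```

A traffic generator would call something like fake_udp_packet() in a loop with varying addresses, ports, and payloads, and write each result to the device.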

The nice thing about TUN interfaces is that there is no link layer involved, so you don’t have to worry about lower-level issues like ARP.

TAP Mode

TAP mode, on the other hand, is much more interesting to developers of virtualization solutions. It operates in much the same way as TUN mode, but with one major exception: you get raw Ethernet frames instead of protocol packets. As far as the kernel is concerned, your application is just another Ethernet device.

If you’re actually processing these frames instead of just shipping them somewhere else, you have to handle everything yourself, including things like ARP requests and replies. The applications that consume TAP devices most often implement their own protocol stacks, and they don’t want the host interfering with their interpretation of things. The TAP device provides the virtualized Ethernet connection.
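
As a sketch of what handling everything starts with, the snippet below splits a frame read from a TAP device into its Ethernet header fields, which is enough to spot ARP traffic. The function name parse_ethernet is hypothetical, the code assumes IFF_NO_PI is set, and it deliberately ignores VLAN tags.

```python
import struct

# Well-known EtherType values.
ETH_P_IP = 0x0800
ETH_P_ARP = 0x0806
ETH_P_IPV6 = 0x86DD


def parse_ethernet(frame: bytes):
    """Split a frame into (destination MAC, source MAC, EtherType, payload)."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return dst, src, ethertype, frame[14:]
```

An application with its own stack would branch on the EtherType: answer ARP requests itself when it sees ETH_P_ARP, and hand ETH_P_IP or ETH_P_IPV6 payloads to its IP layer.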

Device Options

When calling TUNSETIFF, there are several options that can be specified (and more besides, but they’re too advanced to cover in this article):

IFF_TUN - Allocate the device in TUN mode.
IFF_TAP - Allocate the device in TAP mode.
IFF_NO_PI - Do not prepend a protocol information header.

The first two are obvious; set IFF_TUN if you want a TUN device, or IFF_TAP if you want a TAP device (one or the other, not both). IFF_NO_PI, on the other hand, is a bit less obvious, and important to understand.

Without IFF_NO_PI, the driver prepends a four-byte header to every packet: two bytes of flags, then two bytes of protocol type (an EtherType value in network byte order, such as 0x0800 for IPv4), followed by the actual network packet that the header describes. Since those two values are largely redundant with what the packet itself tells you, most applications will probably want to set this flag. I could be wrong about this, however; documentation on the TUN/TAP driver is poor, to say the least.
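
If you do leave IFF_NO_PI unset, each read has to be split before the packet can be used. A minimal sketch, assuming the layout of struct tun_pi in <linux/if_tun.h> (flags in host byte order, protocol in network byte order); the function name split_pi is hypothetical.

```python
import struct


def split_pi(data: bytes):
    """Return (flags, proto, packet) from a read made without IFF_NO_PI."""
    if len(data) < 4:
        raise ValueError("buffer shorter than the 4-byte header")
    (flags,) = struct.unpack("=H", data[:2])    # host byte order
    (proto,) = struct.unpack("!H", data[2:4])   # network byte order
    return flags, proto, data[4:]
```

The same four bytes have to be prepended to anything written back to the device when the flag is unset.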

There are a number of other advanced features supported (such as IFF_MULTI_QUEUE), but the documentation is extremely slim, and they’re not needed for a basic use case, which is all I’m intending to cover here.