Basic Namespacing

The problem

Since I want a system in which programs interact via networking protocols (mainly HTTP), I need a way for a bunch of programs to include a server for the same protocol, ideally without having to add any extra requirements for the code and without exposing stuff to the internet. This might sound like a bit of a problem, but Linux has a few nice features that will make it possible.

First are namespaces. Namespaces allow users to isolate some aspect of processes from the rest of the system. There are many namespaces: User, PID, mount, time... Tools like Docker utilize them quite a bit, but for now, I will be focusing only on the network namespace.
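
To make that a bit more concrete: on Linux, each process exposes the namespaces it belongs to as symlinks under /proc/<pid>/ns/. The following is a minimal sketch that compares the network namespace of two processes. The function names netns_of and same_netns are made up for illustration and are not part of the repo, and it assumes /proc is mounted.

```shell
#!/bin/bash

# Print the network namespace identifier of a PID. The link target has the
# form net:[<inode number>]. Reading another user's process may need root.
netns_of() {
  readlink "/proc/$1/ns/net"
}

# Succeed if both PIDs live in the same network namespace.
same_netns() {
  local a b
  a="$(netns_of "$1")" || return 2
  b="$(netns_of "$2")" || return 2
  [ "$a" = "$b" ]
}

# Show the namespace of the current shell.
netns_of "$$"
```

Two processes that print the same identifier share one network stack: same interfaces, same addresses, same ports.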

Each process will be in its own network namespace with an IP address assigned based on its PID. I originally wanted to use IPv6 only, but since I also need each process to have access to the wider internet, both IPv6 and IPv4 will be used.

But how will I force new processes to get a namespace? Well, this is where the second tool comes into play. As I want the entire OS to be runnable from within Docker, I can't interfere with the kernel in any way, but luckily, I don't need to.

Introducing ptrace(2). Ptrace is a special syscall which allows you to, among other things, intercept syscalls of other programs. Its main use case is within debuggers, but I'll be using it for something a bit different.

When a user starts a new session, their login shell will actually be started by this program, let's call it 'overseer', which will put it inside a new network namespace. Overseer will also intercept any syscalls that start a new process, put each new process into its own namespace, and start monitoring its syscalls as well. I'm not sure how well it will play with systemd, but I'm sure I'll figure something out.

So what do I have

Well, I have a repo... Currently on this commit: '850bbd1f45866e1efff3fbd82cbc9f371491a3e2'

Currently, there is no overseer to speak of, just shell scripts. The overseer itself will be written in Go (as I've chosen it as the primary compiled language of the system), and some shell scripts (mainly the pid2ip) will probably be rewritten as well. But namespaces and networking are just best controlled via the shell.

The shell scripts currently need to be run as root, but that will eventually be solved via setuid bits.

So, where do I start... Probably in 'net-setup'.

#!/bin/bash

# requires in sudoers file
# Defaults !secure_path

PATH="$PATH:$(dirname "$0")"

sysctl net.ipv4.ip_forward=1
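# 'all' is what switches on IPv6 forwarding globally; 'default' only sets
# the per-interface flag for interfaces created later
sysctl net.ipv6.conf.all.forwarding=1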

ip link add name ns-bridge type bridge
ip link set ns-bridge up

# takes place of PID 1, which should not be included anyways

# 240.0.0.0/4 for experimental use
ip addr add 240.0.0.1/8 dev ns-bridge
# I THINK this falls under the deprecated IPv4-Compatible block...
ip addr add ::FFFF:1/96 dev ns-bridge

# add NAT, so that we can internet
iptables -P FORWARD ACCEPT
iptables -t nat -A POSTROUTING -s 240.0.0.0/8 -j MASQUERADE
ip6tables -P FORWARD ACCEPT
ip6tables -t nat -A POSTROUTING -s ::/96 -j MASQUERADE

Most of this is just routing setup, but the main part is the bridge. A bridge interface is a network interface that is used to connect multiple interfaces together. Think of it as a virtual switch.

The 'PATH="$PATH:$(dirname "$0")"' is in all the scripts just so that they can see each other without the need to install them.

Since I don't want to interfere with other networks, I couldn't use any of the local IP ranges, as they might be used by the user's home network. Instead, I decided to use a reserved IPv4 address space (240.0.0.0/4), as I don't think it will ever be officially used anymore, and a deprecated IPv6 block (::/96) originally used for IPv4 compatibility. This is the one used in Plan 9, but as far as I know, it has been replaced by a new one for some reason.

Also, I don't use '::1', because it's the loopback, and I don't use '::', because it's the unspecified address, so it could cause some problems.

And yes, the firewall rules are not final; this entire script is there mostly just for development purposes.
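
Since the script is meant for development, a counterpart that undoes it can be handy when iterating. The following is only a sketch: the name net_teardown, the run helper and the DRY_RUN variable are hypothetical and not in the repo. It deliberately leaves the forwarding sysctls and the FORWARD policies alone, since other things on the machine may rely on them.

```shell
#!/bin/bash

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical counterpart to 'net-setup': drop the NAT rules and the bridge.
net_teardown() {
  # -D deletes the rule that the matching -A in net-setup appended
  run iptables -t nat -D POSTROUTING -s 240.0.0.0/8 -j MASQUERADE
  run ip6tables -t nat -D POSTROUTING -s ::/96 -j MASQUERADE
  # deleting the bridge also drops the addresses assigned to it
  run ip link del ns-bridge
}

# Print what would be executed without changing anything.
DRY_RUN=1 net_teardown
```

Without DRY_RUN=1 the same function needs root, just like 'net-setup'.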

From there, 'pid2ip' sounds about right:

#!/bin/bash

pid=""
v4="false"

for arg in "$@"; do
  if [ "$arg" = "-4" ]; then
    v4="true"
  elif [ "$arg" = "-6" ]; then
    v4="false"
  else
    pid="$arg"
  fi
done

if [ "$pid" = "" ]; then
  echo "usage: $0 <pid> [-4|-6]"
  echo "  -4  print the IPv4 address"
  echo "  -6  print the IPv6 address (default)"
  exit 1
fi

pid_hex=$(printf "%06x" "$pid")
ip=""

if [ "$v4" = "true" ]; then
  # 240.0.0.0/4 for experimental use
  ip="240.$((0x${pid_hex:0:2})).$((0x${pid_hex:2:2})).$((0x${pid_hex:4:2}))"
else
  # I THINK this falls under the deprecated IPv4-Compatible block...
  ip="::${pid_hex:0:2}:${pid_hex:2:4}"
fi

echo "$ip"
exit 0

This generates a unique IP address based on the given PID. The PID is formatted as six hex digits; for IPv4, each pair of digits becomes one of the last three octets, and for IPv6, the first pair and the remaining four digits become the last two groups. Yes, this was mostly done with AI, as my shell skills are not nearly good enough for this, but it's easy to understand once written. It is used within the other scripts, and other processes will also use it to get the IP of the process they want to contact, which they will then do via resolving filesystems with simple file I/O operations.
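
The same mapping can also be written with shifts and masks instead of hex substrings, which makes the inverse direction easy too. This is a sketch, not the repo's code: the function names are hypothetical, and the IPv6 variant prints groups without zero padding (so '::0:1092' rather than '::00:1092'), which is the same address.

```shell
#!/bin/bash

# PID -> 240.x.y.z, one byte of the PID per octet.
pid_to_ipv4() {
  local pid=$1
  printf '240.%d.%d.%d\n' $(( (pid >> 16) & 0xff )) $(( (pid >> 8) & 0xff )) $(( pid & 0xff ))
}

# PID -> ::hh:llll, upper 8 bits in the seventh group, lower 16 in the eighth.
pid_to_ipv6() {
  local pid=$1
  printf '::%x:%x\n' $(( (pid >> 16) & 0xff )) $(( pid & 0xffff ))
}

# Inverse of pid_to_ipv4: recover the PID from a 240.x.y.z address.
ipv4_to_pid() {
  local rest=${1#*.}              # strip the leading "240."
  local b=${rest%%.*}; rest=${rest#*.}
  local c=${rest%%.*}
  local d=${rest#*.}
  echo $(( (b << 16) | (c << 8) | d ))
}

pid_to_ipv4 4242            # 240.0.16.146
pid_to_ipv6 4242            # ::0:1092
ipv4_to_pid 240.0.16.146    # 4242
```

Note that PID 1 would map to 240.0.0.1, which is the address 'net-setup' gives to the bridge.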

But there will also be a way for processes to register some form of domain name within the overseer filesystem. (Yes, I'm going real hard on filesystems with this.)

'ns-create':

#!/bin/bash

PATH="$PATH:$(dirname "$0")"

if [ "$#" -ne 1 ]; then
  echo "usage: $0 <pid>"
  exit 1
fi

pid="$1"
ns="ns-$pid"
ipv6="$(pid2ip "$pid")"
ipv4="$(pid2ip "$pid" -4)"

# create network namespace
if ! ip netns add "$ns"; then
  # report on stderr, since callers capture stdout as the namespace name
  echo "cannot create namespace for PID $pid" >&2
  exit 1
fi

# enable loopback
ip -n "$ns" link set dev lo up

# add a virtual ethernet between namespaces
veth_in="veth-in-$pid"
veth_out="veth-out-$pid"
ip link add "$veth_in" type veth peer name "$veth_out"
ip link set "$veth_in" netns "$ns"

ip link set dev "$veth_out" up
ip -n "$ns" link set dev "$veth_in" up

# bind to bridge
ip link set dev "$veth_out" master ns-bridge

# setup ip addressing
ip -n "$ns" addr add "$ipv6/96" dev "$veth_in"
ip -n "$ns" addr add "$ipv4/8" dev "$veth_in"

# and routing
ip -n "$ns" route add default via ::FFFF:1
ip -n "$ns" route add default via 240.0.0.1

echo "$ns"

This one is the main script of them all. It creates the namespace based on PID.

The first thing it does is create a new network namespace. Normally, namespaces are handled via unshare(2), but network namespaces are usually handled via ip(8).

After the namespace exists, I create a virtual ethernet with a peer. Think of it as a virtual network cable. I put one end into my virtual switch and the other into the new namespace.

Then just add the IP addresses and some routes and the namespace is ready for its single inhabitant. (Well, there will be an option to keep all children of a specific PID within its namespace, but that is not the point right now.)
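
One limit worth keeping in mind with this naming scheme: Linux interface names can be at most 15 characters long (IFNAMSIZ is 16, including the terminating NUL), and on 64-bit systems pid_max can be raised as high as 4194304. A small sketch to check whether both veth names fit for a given PID; the function name veth_names_fit is hypothetical.

```shell
#!/bin/bash

# Longest interface name the kernel accepts.
IFNAME_MAX=15

# Succeed if both veth names used by ns-create fit for the given PID.
veth_names_fit() {
  local pid=$1 name
  for name in "veth-in-$pid" "veth-out-$pid"; do
    if [ "${#name}" -gt "$IFNAME_MAX" ]; then
      echo "too long (${#name} chars): $name"
      return 1
    fi
  done
  return 0
}

veth_names_fit 32768   && echo "32768 fits"
veth_names_fit 4194304 || echo "4194304 does not fit"
```

With the default pid_max of 32768 everything fits; seven-digit PIDs push 'veth-out-<pid>' one character over the limit, so a shorter prefix would be needed there.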

Now just add the process via 'ns-launch':

#!/bin/bash

PATH="$PATH:$(dirname "$0")"

if [ "$#" -lt 2 ]; then
  echo "usage: $0 <user> <command> [command flags]"
  exit 1
fi

# stop here if the namespace could not be created
ns="$(ns-create $$)" || exit 1

exec ip netns exec "$ns" sudo -Eu "$@"

Yes, I use sudo(8) to launch the program as a specific user. Because of all the networking and namespacing, the script itself can only be run by root.

This is also why you need

Defaults !secure_path

in sudoers, so that your PATH survives sudo instead of being replaced. The -E flag takes care of the other environment variables.

Then there is just the humble 'ns-destroy':

#!/bin/bash

PATH="$PATH:$(dirname "$0")"

if [ "$#" -ne 1 ]; then
  echo "usage: $0 <pid>"
  exit 1
fi

pid="$1"
ns="ns-$pid"

# also destroys the virtual ethernet
ip netns del "$ns"

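Yes, this also deletes the virtual ethernet: once the namespace is gone, the veth end inside it is destroyed, and destroying one end of a veth pair removes its peer as well.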
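
Since nothing calls 'ns-destroy' automatically yet, namespaces of dead processes pile up during development. The following sketch finds them: it reads names in the format printed by 'ip netns list' and prints every 'ns-<pid>' whose PID no longer exists. The helper name stale_namespaces is hypothetical, and it assumes /proc is mounted.

```shell
#!/bin/bash

# Read namespace names (one per line, optionally followed by "(id: N)")
# on stdin and print those named ns-<pid> whose PID is gone.
stale_namespaces() {
  local name rest pid
  while read -r name rest; do
    case "$name" in
      ns-|ns-*[!0-9]*) continue ;;   # not a pure ns-<number> name
      ns-*) pid=${name#ns-} ;;
      *) continue ;;                 # some unrelated namespace
    esac
    [ -d "/proc/$pid" ] || echo "$name"
  done
}

# Demo with canned input: this shell is alive, PID 99999999 cannot exist.
printf 'ns-%s (id: 0)\nns-99999999\nother\n' "$$" | stale_namespaces
```

On a real system the input would come from 'ip netns list', and each printed name could be passed (minus the 'ns-' prefix) to 'ns-destroy'. PIDs get reused, so this is only a rough heuristic.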

That's basically all I have right now, but it's a good start. Now I just need to actually tie it to ptrace, add some filesystems, and do all the other things.

Resources

These are some nice resources that helped me understand network namespaces and their operation:

gilesthomas - fun-with-network-namespaces

frfahim - network-namespaces