This document introduces a library operating system approach for using the Linux network stack in userspace. Some key points:
- It describes building the Linux network stack (including components like ARP, TCP/IP, Qdisc, etc.) as a library that can be loaded and used in userspace.
- This allows flexible experimentation with and testing of new network stack ideas without modifying the kernel. Code can be added and tested through the library interface.
- Implementations described include Direct Code Execution (DCE), which integrates the stack with a network simulator, and a Network Stack in Userspace (NUSE), which provides a full-featured POSIX-like platform for the network stack in user
Library Operating System for Linux #netdev01
1. Library Operating System with Mainline Linux Network Stack
Hajime Tazaki, Ryo Nakamura, Yuji Sekiya
netdev0.1, Feb. 2015
2. Motivation
Why kernel space?
Packets were expensive in the 1970s
Why not userspace?
Matured over decades; costs have dropped
Obtain network stack personalization
Controllable by userspace utilities
3. Userspace network stacks
Many userspace network stacks exist
From scratch: mTCP, Mirage, lwIP
Ported: OSv, Sandstorm, libuinet (FreeBSD), Arrakis (lwIP), OpenOnload (lwIP?)
Motivated by their own problems (specialized NICs, cloud, high-speed apps)
Writing a network stack is a 1-week DIY project,
but writing an operable network stack takes decades (which is not DIY)
4. Questions
How to benefit from a mature network stack in userspace?
How to trivially introduce your own idea into the network stack?
xxTCP, IPvX, etc.
How to flexibly test your code with a complex scenario?
5. The answers
Use the Linux network stack as-is,
as a userspace library (library operating system)
6. This talk is about
an introduction to a library operating system for Linux
and its implementation,
with a couple of useful use cases
9. Kernel glue code
https://github.com/libos-nuse/net-next-nuse/blob/nuse/arch/lib/sched.c
void schedule(void)
{
    lib_task_wait();
}
signed long schedule_timeout(signed long timeout)
{
    u64 ns;
    struct SimTask *self;

    /* No timeout requested: just wait. */
    if (timeout == MAX_SCHEDULE_TIMEOUT) {
        lib_task_wait();
        return MAX_SCHEDULE_TIMEOUT;
    }
    lib_assert(timeout >= 0);
    /* Convert the timeout from jiffies to nanoseconds. */
    ns = ((__u64)timeout) * (1000000000 / HZ);
    self = lib_task_current();
    lib_event_schedule_ns(ns, &trampoline, self);
    lib_task_wait();
    /* we know that we are always perfectly on time. */
    return 0;
}
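The tick-to-nanosecond conversion in schedule_timeout() can be tried in isolation. The following is a minimal sketch, not the LibOS code: the names SKETCH_HZ and sketch_jiffies_to_ns are hypothetical, and the tick rate of 250 is an assumed value chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: an illustrative tick rate; the real HZ is a kernel config value. */
#define SKETCH_HZ 250

/*
 * Convert a non-negative timeout in ticks (jiffies) to nanoseconds,
 * using the same expression as the slide: timeout * (1000000000 / HZ).
 * The per-tick length is computed with integer division first.
 */
static uint64_t sketch_jiffies_to_ns(long timeout)
{
    assert(timeout >= 0);
    return (uint64_t)timeout * (1000000000ULL / SKETCH_HZ);
}
```

Because the division happens before the multiplication, a tick rate that does not divide 1000000000 evenly (300, for example) rounds each tick down to a whole number of nanoseconds (3333333).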
10. POSIX glue code
https://github.com/libos-nuse/net-next-nuse/blob/nuse/arch/lib/nuse-glue.c
int nuse_socket(int domain, int type, int protocol)
{
    lib_update_jiffies();
    struct socket *kernel_socket = malloc(sizeof(struct socket));
    int ret, real_fd;

    memset(kernel_socket, 0, sizeof(struct socket));
    ret = lib_sock_socket(domain, type, protocol, &kernel_socket);
    /* Kernel convention: a negative return value is -errno. */
    if (ret < 0)
        errno = -ret;
(snip)
    lib_softirq_wakeup();
    return real_fd;
}
weak_alias(nuse_socket, socket);
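The error handling in the glue above follows the kernel convention of returning a negative errno value, which the POSIX-facing wrapper turns into errno. Below is a minimal, self-contained sketch of that translation. Both function names are hypothetical stand-ins, the fake kernel call does no real work, and returning -1 on failure is the usual POSIX convention assumed here (the snipped part of the original is not shown on the slide).

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical stand-in for a kernel-side entry point: returns a
 * non-negative handle on success or a negative errno value on failure.
 */
static int sketch_kernel_socket(int domain)
{
    if (domain < 0)
        return -EAFNOSUPPORT;
    return domain + 3; /* arbitrary non-negative handle for the sketch */
}

/* POSIX-style wrapper: translate "negative errno" into errno plus -1. */
int sketch_socket(int domain)
{
    int ret = sketch_kernel_socket(domain);

    /* Assumption: Linux keeps error codes within 1..4095 (MAX_ERRNO). */
    assert(ret >= -4095);
    if (ret < 0) {
        errno = -ret;
        return -1;
    }
    return ret;
}
```

Keeping the translation in one thin wrapper means the kernel code underneath can stay unmodified while callers see ordinary POSIX behavior.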
11. Implementations (Instances)
Direct Code Execution (DCE)
network simulator integration (ns-3)
for more testing
Network Stack in Userspace (NUSE)
gives a new platform for the Linux network stack
for ad-hoc network stacks
12. Direct Code Execution
ns-3 integration
deterministic scheduler
single-process model virtualization
dlmopen(3)-like virtualization
full control over multiple network stacks
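To illustrate what a deterministic scheduler means in this setting, here is a minimal sketch of a virtual-time event queue: events fire strictly in order of their scheduled time, with insertion order breaking ties, so repeated runs produce the same sequence. This is not the DCE or ns-3 implementation; every name prefixed with sketch_ is hypothetical, and the fixed-size array is a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_EVENTS 16

struct sketch_event {
    uint64_t at_ns;        /* virtual time at which the event fires */
    uint64_t seq;          /* insertion order, used to break ties */
    void (*fn)(void *);
    void *arg;
};

struct sketch_sched {
    struct sketch_event ev[SKETCH_MAX_EVENTS];
    size_t n;              /* number of pending events */
    uint64_t now_ns;       /* current virtual time */
    uint64_t next_seq;
};

/* Queue fn(arg) to run delay_ns after the current virtual time. */
static int sketch_schedule_ns(struct sketch_sched *s, uint64_t delay_ns,
                              void (*fn)(void *), void *arg)
{
    assert(fn != NULL);
    if (s->n == SKETCH_MAX_EVENTS)
        return -1;
    s->ev[s->n].at_ns = s->now_ns + delay_ns;
    s->ev[s->n].seq = s->next_seq++;
    s->ev[s->n].fn = fn;
    s->ev[s->n].arg = arg;
    s->n++;
    return 0;
}

/* Run every pending event in (time, insertion) order, advancing the clock. */
static void sketch_run(struct sketch_sched *s)
{
    while (s->n > 0) {
        size_t best = 0;

        for (size_t i = 1; i < s->n; i++) {
            const struct sketch_event *a = &s->ev[i];
            const struct sketch_event *b = &s->ev[best];

            if (a->at_ns < b->at_ns ||
                (a->at_ns == b->at_ns && a->seq < b->seq))
                best = i;
        }
        struct sketch_event e = s->ev[best];

        s->ev[best] = s->ev[--s->n]; /* remove before running the callback */
        s->now_ns = e.at_ns;         /* time jumps; nothing actually sleeps */
        e.fn(e.arg);
    }
}

/* Example callback: append a one-character tag to a global trace. */
static struct {
    char buf[SKETCH_MAX_EVENTS + 1];
    size_t len;
} sketch_trace;

static void sketch_record(void *arg)
{
    if (sketch_trace.len < SKETCH_MAX_EVENTS)
        sketch_trace.buf[sketch_trace.len++] = *(const char *)arg;
}
```

Scheduling tag a at 30 ns and tags b and c at 10 ns, then calling sketch_run(), records b, c, a and leaves the virtual clock at 30 ns, independent of wall-clock timing.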
16. Network Stack in Userspace
Userspace network stack running on a Linux (POSIX) platform
Network stack personalization
Full features by design (full stack)
ARP/ND, UDP/TCP (all congestion control algorithms), SCTP, DCCP, QDISC, XFRM, netfilter, etc.
30. Limitations
Ad-hoc kernel glue code is required
when a member of a kernel struct changes, LibOS needs to follow it
Performance drawbacks in NUSE
adapt known techniques (mTCP)
31. (not) Conclusions
An abstraction with multiple benefits
Conservative
Reuse as much of the past decades' effort as possible
with a small amount of effort
Planning to post an RFC for upstreaming
34. Bug reproducibility
[Figure: network topology with a Home Agent, two Wi-Fi access points (AP1, AP2), a mobile node performing a handoff, and a correspondent node; labeled with ping6]
(gdb) b mip6_mh_filter if dce_debug_nodeid()==0
Breakpoint 1 at 0x7ffff287c569: file net/ipv6/mip6.c, line 88.
<continue>
(gdb) bt 4
#0 mip6_mh_filter
(sk=0x7ffff7f69e10, skb=0x7ffff7cde8b0)
at net/ipv6/mip6.c:109
#1 0x00007ffff2831418 in ipv6_raw_deliver
(skb=0x7ffff7cde8b0, nexthdr=135)
at net/ipv6/raw.c:199
#2 0x00007ffff2831697 in raw6_local_deliver
(skb=0x7ffff7cde8b0, nexthdr=135)
at net/ipv6/raw.c:232
#3 0x00007ffff27e6068 in ip6_input_finish
(skb=0x7ffff7cde8b0)
at net/ipv6/ip6_input.c:197
35. Debugging
Memory error detection among distributed nodes in a single process, using Valgrind
==5864== Memcheck, a memory error detector
==5864== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
==5864== Using Valgrind-3.6.0.SVN and LibVEX; rerun with -h for copyright in
==5864== Command: ../build/bin/ns3test-dce-vdl --verbose
==5864==
==5864== Conditional jump or move depends on uninitialised value(s)
==5864== at 0x7D5AE32: tcp_parse_options (tcp_input.c:3782)
==5864== by 0x7D65DCB: tcp_check_req (tcp_minisocks.c:532)
==5864== by 0x7D63B09: tcp_v4_hnd_req (tcp_ipv4.c:1496)
==5864== by 0x7D63CB4: tcp_v4_do_rcv (tcp_ipv4.c:1576)
==5864== by 0x7D6439C: tcp_v4_rcv (tcp_ipv4.c:1696)
==5864== by 0x7D447CC: ip_local_deliver_finish (ip_input.c:226)
==5864== by 0x7D442E4: ip_rcv_finish (dst.h:318)
==5864== by 0x7D2313F: process_backlog (dev.c:3368)
==5864== by 0x7D23455: net_rx_action (dev.c:3526)
==5864== by 0x7CF2477: do_softirq (softirq.c:65)
==5864== by 0x7CF2544: softirq_task_function (softirq.c:21)
==5864== by 0x4FA2BE1: ns3::TaskManager::Trampoline(void*) (task-manage
==5864== Uninitialised value was created by a stack allocation
==5864== at 0x7D65B30: tcp_check_req (tcp_minisocks.c:522)
==5864==