Kernel
Rami Rosen
ramirose@gmail.com
Haifux, August 2007
Disclaimer
Everything in this lecture shall not, under any
circumstances, hold any legal liability whatsoever.
Any usage of the data and information in this document
shall be solely on the responsibility of the user.
This lecture is not given on behalf of any company
or organization.
Warning
The layers that we will deal with (based on the 7 layers model) are:
Link Layer (L2) (ethernet)
Network Layer (L3) (ip)
Transport Layer (L4) (udp,tcp...)
Networking Data Structures
struct dst_entry *dst the route for this sk_buff; this route is
determined by the routing subsystem.
In the usual case, there is only one dst_entry for every skb.
When using IPsec, there is a linked list of dst_entries and only the
last one is for routing; all other dst_entries are for IPsec
transformers; these other dst_entries have the DST_NOHASH
flag set.
Important members:
net_device - contd
Each protocol has its own MTU; the default is 1500 for Ethernet.
You can change the MTU with ifconfig, for example: ifconfig eth0 mtu <value>
unsigned int flags (which you see or set using the ifconfig utility):
for example, RUNNING or NOARP.
Most NICs are PCI devices; there are also some USB
network devices.
The drivers for network PCI devices use the generic PCI calls, like
pci_register_driver() and pci_enable_device().
For more info on NIC drivers, see the article "Writing Network
Device Driver for Linux" (link no. 9 in links) and chapter 17 in LDD3.
For a NIC to work in polling mode, its driver should be built
with the proper flag.
When working with NAPI under a very high load,
packets are lost, but this occurs before they are fed into the
network stack (in a non-NAPI driver, they do pass into the stack).
The routing table and the routing cache enable us to find the net
device and the address of the host to which a packet will be sent.
There are two routing tables by default (non-Policy Routing case):
the local table (ip_fib_local_table) and the main table (ip_fib_main_table).
See: include/net/ip_fib.h.
Routing Subsystem - contd.
Routes can be added into the main routing table in one of 3 ways:
By routing daemons.
In case it does not find an entry, it looks in the main FIB table
(ip_fib_main_table).
(instead of CONFIG_IP_FIB_HASH)
...
Policy Routing: add/delete a rule - example
tb_lookup()
tb_insert()
tb_delete()
(Figure: FIB hash data structures: fib_table, 33 struct fn_zone entries (fz_hash, fz_divisor), hlist_head buckets, struct fib_node (fn_key, fn_alias), struct fib_alias (fa_info), struct fib_info, fib_nh.)
Routing Tables
See: net/ethernet/eth.c
This division of methods into two stages (where the second has
the same name with the suffix finish or slow) is typical for
networking kernel code.
ip_rcv_finish() implementation:
if (skb->dst == NULL) {
int err = ip_route_input(skb, iph->daddr, iph->saddr, iph->tos,
skb->dev);
...
}
...
return dst_input(skb);
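The two-stage split (ip_rcv() hands ip_rcv_finish() to NF_HOOK as its continuation) can be mimicked in user space. This is a minimal sketch, not kernel code: fake_hook(), the verdict enum and struct pkt are hypothetical stand-ins for NF_HOOK, the netfilter verdicts and struct sk_buff.

```c
/* Hypothetical stand-ins for the netfilter verdicts. */
enum verdict { VERDICT_DROP = 0, VERDICT_ACCEPT = 1 };

/* Hypothetical, minimal stand-in for struct sk_buff. */
struct pkt {
    unsigned int daddr;   /* destination address, host order for simplicity */
    int delivered;        /* set by the second ("_finish") stage */
};

typedef int (*okfn_t)(struct pkt *);
typedef enum verdict (*hookfn_t)(const struct pkt *);

/* Second stage: runs only if the hook accepted the packet. */
static int rcv_finish(struct pkt *p)
{
    p->delivered = 1;
    return 0;
}

/* Example hook: drop anything addressed to 0.0.0.0. */
static enum verdict drop_zero_daddr(const struct pkt *p)
{
    return p->daddr == 0 ? VERDICT_DROP : VERDICT_ACCEPT;
}

/* NF_HOOK-like helper: get a verdict, then call the continuation
 * (okfn) only on accept. */
static int fake_hook(hookfn_t hook, struct pkt *p, okfn_t okfn)
{
    if (hook(p) != VERDICT_ACCEPT)
        return -1;        /* dropped: okfn is never called */
    return okfn(p);
}

/* First stage: header checks would go here, then the continuation
 * is handed to the hook, as ip_rcv() does with ip_rcv_finish(). */
int rcv(struct pkt *p)
{
    return fake_hook(drop_zero_daddr, p, rcv_finish);
}
```

Passing the second stage as a function pointer lets the hook decide whether, and when, the packet continues; that is one reason the kernel splits such functions in two.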
Receiving a packet - contd
ip_route_input():
First performs a lookup in the routing cache to see if there is a
match. If there is no match (cache miss), calls
ip_route_input_slow() to perform a lookup in the routing table.
(This lookup is done by calling fib_lookup()).
If the frame is for local delivery, we will set the input() function
pointer of the route to ip_local_deliver():
rth->u.dst.input= ip_local_deliver;
Prototype:
ip_local_deliver(struct sk_buff *skb) (net/ipv4/ip_input.c).
- calls NF_HOOK(PF_INET, NF_IP_LOCAL_IN, skb, skb->dev,
NULL,ip_local_deliver_finish);
Prototype:
(net/ipv4/ip_forward.c)
dst->neighbour->output(skb)
dst->output = ip_output
dst->input = ip_local_deliver
See: net/ipv4/route.c
Multipath routing
It enables:
Filtering
Connection Tracking
Netfilter rule - example
Short example:
ip_rcv()
See /net/netfilter/core.c.
ICMP redirect message
/proc/sys/net/ipv4/conf/all/send_redirects should be 1.
Example:
ICMP redirect message - contd.
ARP table.
Type (2 bytes).
see: include/linux/if_ether.h
Neighboring Subsystem - contd
You can delete and add entries to the arp table; see man arp.
Bridging Subsystem
Example:
brctl show
Bridging Subsystem - contd.
RFC2401
Transformation bundles.
example: pptp
ipsec verify: check your system to see if IPsec got installed and
started correctly.
Checking iph->daddr == 0x0A00A8C0 (on a little-endian machine)
means checking if the address is 192.168.0.10 (C0=192, A8=168,
00=0, 0A=10).
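A user-space check of the byte-order point above, as a minimal sketch (the function names are hypothetical): inet_pton() stores the address in network byte order, the same way iph->daddr holds it, so the first byte in memory is 0xC0. The portable comparison converts the host-order constant with htonl().

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Return the address the way iph->daddr holds it: a 32-bit value whose
 * bytes are in network order (C0 A8 00 0A for 192.168.0.10). */
uint32_t daddr_as_stored(const char *dotted)
{
    struct in_addr a;
    if (inet_pton(AF_INET, dotted, &a) != 1)
        return 0;                 /* not a valid dotted quad */
    return a.s_addr;
}

/* Portable test: correct on both little- and big-endian CPUs,
 * unlike a comparison with the raw constant 0x0A00A8C0. */
int is_192_168_0_10(uint32_t daddr)
{
    return daddr == htonl(0xC0A8000AU);
}
```

On a little-endian (x86) machine, daddr_as_stored("192.168.0.10") also equals 0x0A00A8C0, which is what the raw comparison relies on.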
Tips for hacking - Contd.
echo 1 >/proc/sys/net/ipv4/icmp_echo_ignore_all
Disable arp: ip link set eth0 arp off (the NOARP flag will be set)
I/OAT.
TCP Offloading.
RDMA.
http://linux-net.osdl.org/index.php/Main_Page
5) netdev mailing list: http://www.spinics.net/lists/netdev/
Links and more info
6) Removal of multipath routing cache from kernel code:
http://lists.openwall.net/netdev/2007/03/12/76
http://lwn.net/Articles/241465/
7) Linux Advanced Routing & Traffic Control :
http://lartc.org/
8) ebtables: a filtering tool for bridging:
http://ebtables.sourceforge.net/
9) Writing Network Device Driver for Linux: (article)
http://app.linux.org.mt/article/writing-netdrivers?locale=en
10) Netconf: a yearly networking conference; the first was in 2004.
http://vger.kernel.org/netconf2004.html
http://vger.kernel.org/netconf2005.html
http://vger.kernel.org/netconf2006.html
David S. Miller, James Morris, Rusty Russell, Jamal Hadi Salim, Stephen Hemminger,
Harald Welte, Hideaki YOSHIFUJI, Herbert Xu, Thomas Graf, Robert Olsson, Arnaldo
Carvalho de Melo and others
11) Policy Routing With Linux - Online Book Edition
http://www.policyrouting.org/PolicyRoutingBook/
12) TRASH - A dynamic LC-trie and hash data structure:
Robert Olsson, Stefan Nilsson, August 2006
http://www.csc.kth.se/~snilsson/public/papers/trash/trash.pdf
13) IPSec howto:
http://www.ipsec-howto.org/t1.html
14) Openswan: Building and Integrating Virtual Private
Networks, by Paul Wouters, Ken Bantoft
http://www.packtpub.com/book/openswan/mid/061205jqdnh2by
publisher: Packt Publishing.
Linux Kernel Networking-
advanced topics:
Neighboring and IPsec
Rami Rosen
ramirose@gmail.com
Haifux, January 2008
www.haifux.org
Contents
Neighboring Subsystem
struct neighbour
arp
arp_bind_neighbour() method
Neighbour states
IPsec
Scope
We will not deal with multicast, IPv6, or wireless.
04-Dec-2007
The layers that we will deal with (based on the 7 layers model)
are:
Transport Layer (L4) (udp,tcp...)
Network Layer (L3) (ip)
Link Layer (L2) (ethernet)
Short rehearsal (4 slides)
L4 confirmation.
Receiving an ARP reply for the first time, or receiving an ARP reply
in response to an ARP request when in the NUD_PROBE state.
arp_queue
This is configurable: /proc/sys/net/ipv4/neigh/default/unres_qlen
struct neigh_table
(/include/net/neighbour.h)
You can delete and add entries to the ARP table; see man arp / man ip.
When using ip neigh add, you can specify the state of the entry
which you are adding (like permanent, stale, reachable, etc.).
Neighboring Subsystem arp table
Example :
ip neigh show
192.168.0.254 dev eth0 lladdr 00:03:27:f1:a1:31 REACHABLE
192.168.0.152 dev eth0 lladdr 00:00:00:cc:bb:aa STALE
192.168.0.121 dev eth0 lladdr 00:10:18:1b:1c:14 PERMANENT
192.168.0.54 dev eth0 lladdr aa:ab:ac:ad:ae:af STALE
192.168.0.98 dev eth0 INCOMPLETE
Neighboring Subsystem arp
net/ipv4/arp.c
Why ?
arp_send(ARPOP_REPLY, ETH_P_ARP, sip, dev, tip, sha,
         dev->dev_addr, sha);
We also update our arp table with the sender entry (ip/mac).
CONFIG_IP_ACCEPT_UNSOLICITED_ARP
Neighboring Subsystem lookup
neigh_table (arp_tbl)
dev (net_device)
__neigh_lookup()
and __neigh_lookup_errno()
Neighboring Subsystem static entries
arp_bind_neighbour(): net/ipv4/arp.c
dst->neighbour=NULL, so it calls __neigh_lookup_errno().
/etc/sysconfig/network-scripts/ifcfg-eth1
...
IPADDR=192.168.0.122
...
and then run:
ifup eth1
Changing IP address - contd.
we will get:
Error, some other host already uses address
192.168.0.122.
But:
works OK!
Why is that?
Garbage Collection
neigh_periodic_timer()
neigh_timer_handler()
http://www.linuxvirtualserver.org/
Integrated into the Linux kernel (in 2.4 kernel it was a patch).
Example: 3 Real Servers and the Director all have the same
Virtual IP (VIP).
(Figure: clients; the Linux Director and Real Servers 1, 2 and 3, each holding the VIP (Virtual IP).)
LVS and ARP
Solutions
1) Set ARP_IGNORE to 1:
NF_ARP_FORWARD ( in br_nf_forward_arp(),
net/bridge/br_netfilter.c)
LVS and ARP
http://ebtables.sourceforge.net/download.html
ipvsadm -A -t DirectorIPAddress:80
ARPD has support for negative entries and for dead hosts.
/proc/sys/net/ipv4/neigh/eth0/app_solicit
Activation:
On some distros, you will get the error "db_open: No such file
or directory" unless you first run mkdir /var/lib/arpd/
(for the arpd.db file).
In this case, ARP packets are still caught by the arpd daemon:
get_arp_pkt() (misc/arpd.c)
Allocated by IANA.
Technion (?)
(net/irda/irlan/irlan_eth.c)
neighbour states
(Figure: neighbour states None (after neigh_alloc()), Incomplete, Reachable, Stale, Delay and Probe.)
Neighboring Subsystem states
NUD states
NUD_NONE
NUD_REACHABLE
NUD_STALE
NUD_DELAY
NUD_PROBE
NUD_FAILED
NUD_INCOMPLETE
Neighboring Subsystem states
Is it a (latent) bug ?
if (!(state & NUD_IN_TIMER)) {
#ifndef CONFIG_SMP
printk(KERN_WARNING "neigh: timer & !nud_in_timer\n");
#endif
goto out;
}
Neighboring Subsystem states
Special states:
NUD_NOARP
NUD_PERMANENT
NUD_IN_TIMER (NUD_INCOMPLETE|NUD_REACHABLE|
NUD_DELAY|NUD_PROBE)
NUD_VALID (NUD_PERMANENT|NUD_NOARP|
NUD_REACHABLE|NUD_PROBE|NUD_STALE|NUD_DELAY)
NUD_CONNECTED (NUD_PERMANENT|NUD_NOARP|
NUD_REACHABLE)
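The composite states above are plain bit masks. The sketch below reproduces them in user space to show how a state is tested; the single-state values are our reading of include/linux/neighbour.h and should be checked against your kernel tree, and the helpers nud_in_timer(), nud_valid() and nud_connected() are hypothetical.

```c
/* Single-state bits, as we read them in include/linux/neighbour.h. */
#define NUD_NONE       0x00
#define NUD_INCOMPLETE 0x01
#define NUD_REACHABLE  0x02
#define NUD_STALE      0x04
#define NUD_DELAY      0x08
#define NUD_PROBE      0x10
#define NUD_FAILED     0x20
#define NUD_NOARP      0x40
#define NUD_PERMANENT  0x80

/* Composite masks, exactly as listed above. */
#define NUD_IN_TIMER  (NUD_INCOMPLETE|NUD_REACHABLE|NUD_DELAY|NUD_PROBE)
#define NUD_VALID     (NUD_PERMANENT|NUD_NOARP|NUD_REACHABLE|NUD_PROBE|\
                       NUD_STALE|NUD_DELAY)
#define NUD_CONNECTED (NUD_PERMANENT|NUD_NOARP|NUD_REACHABLE)

/* A neighbour in one of these states has a pending timer. */
int nud_in_timer(unsigned state)  { return (state & NUD_IN_TIMER) != 0; }

/* The cached link-layer address may be used (possibly after reconfirming). */
int nud_valid(unsigned state)     { return (state & NUD_VALID) != 0; }

/* The address is considered good right now, with no confirmation needed. */
int nud_connected(unsigned state) { return (state & NUD_CONNECTED) != 0; }
```

With these masks, the timer check shown earlier is just !(state & NUD_IN_TIMER), i.e. a timer fired for a state such as NUD_STALE that should have no timer.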
Neighbour states
net/core/neighbour.c
Transformation bundles.
RFC2401
IPSec-cont.
example: pptp
ipsec verify Check your system to see if IPsec got installed and
started correctly.
iph->daddr == 0x0A00A8C0 or
means checking if the address is 192.168.0.10 (C0=192,A8=168, 00=0,0A=10).
echo 1 >/proc/sys/net/ipv4/icmp_echo_ignore_all
Disable arp: ip link set eth0 arp off (the NOARP flag will be set)
Or:
printk("***neigh->primary_key= %u.%u.%u.%u\n",
NIPQUAD(*(u32*)neigh->primary_key));
cat /proc/net/stat/arp_cache
entries allocs destroys hash_grows lookups hits res_failed
rcv_probes_mcast rcv_probes_ucast periodic_gc_runs
forced_gc_runs
periodic_gc_runs: statistics of how many times the
neigh_periodic_timer() is called.
Links and more info
1) Linux Network Stack Walkthrough (2.4.20):
http://gicl.cs.drexel.edu/people/sevy/network/Linux_network_stack_walkthrough.html
2) Understanding the Linux Kernel, Second Edition
By Daniel P. Bovet, Marco Cesati
Second Edition December 2002
chapter 18: networking.
- Understanding Linux Network Internals, Christian Benvenuti
O'Reilly, First Edition.
3) Linux Device Driver, by Jonathan Corbet, Alessandro Rubini, Greg
Kroah-Hartman
Third Edition February 2005.
http://linux-net.osdl.org/index.php/Main_Page
5) netdev mailing list: http://www.spinics.net/lists/netdev/
6) Removal of multipath routing cache from kernel code:
http://lists.openwall.net/netdev/2007/03/12/76
http://lwn.net/Articles/241465/
7) Linux Advanced Routing & Traffic Control :
http://lartc.org/
8) ebtables: a filtering tool for bridging:
http://ebtables.sourceforge.net/
9) Writing Network Device Driver for Linux: (article)
http://app.linux.org.mt/article/writing-netdrivers?locale=en
10) Netconf: a yearly networking conference; the first was in 2004.
http://vger.kernel.org/netconf2004.html
http://vger.kernel.org/netconf2005.html
http://vger.kernel.org/netconf2006.html
David S. Miller, James Morris, Rusty Russell, Jamal Hadi Salim, Stephen
Hemminger, Harald Welte, Hideaki YOSHIFUJI, Herbert Xu, Thomas Graf, Robert
Olsson, Arnaldo Carvalho de Melo and others
11) Policy Routing With Linux - Online Book Edition
http://www.policyrouting.org/PolicyRoutingBook/
12) TRASH - A dynamic LC-trie and hash data structure:
Robert Olsson, Stefan Nilsson, August 2006
http://www.csc.kth.se/~snilsson/public/papers/trash/trash.pdf
13) IPSec howto:
http://www.ipsec-howto.org/t1.html
14) Openswan: Building and Integrating Virtual Private
Networks, by Paul Wouters, Ken Bantoft
http://www.packtpub.com/book/openswan/mid/061205jqdnh2by
publisher: Packt Publishing.
15) A book including chapters about LVS:
The Linux Enterprise Cluster- Build a Highly Available Cluster
with Commodity Hardware and Free Software, By Karl
Kopper.
http://www.nostarch.com/frameset.php?startat=cluster
15) http://www.vyatta.com - Open-Source Networking
16) Address Resolution Protocol (ARP)
http://linux-ip.net/html/ether-arp.html
17) ARPWatch: a tool to monitor incoming ARP traffic.
Lawrence Berkeley National Laboratory -
ftp://ftp.ee.lbl.gov/arpwatch.tar.gz
18) arptables:
http://ebtables.sourceforge.net/download.html
19) TCP/IP Illustrated, Volume 1: The Protocols
By W. Richard Stevens
http://www.informit.com/store/product.aspx?isbn=0201633469
20) Unix Network Programming, Volume 1: The Sockets
Networking API (3rd Edition) (Addison-Wesley Professional
Computing Series) (Hardcover)
by W. Richard Stevens (Author), Bill Fenner (Author), Andrew M.
Rudoff (Author)
Questions
Questions ?
Thank You !
IPV6
Linux Kernel Networking (3)-
advanced topics
Rami Rosen
ramirose@gmail.com
Haifux, April 2008
www.haifux.org
Note:
http://www.haifux.org/lectures/172/
slides:http://www.haifux.org/lectures/172/netLec.pdf
http://www.haifux.org/lectures/180/
slides:http://www.haifux.org/lectures/180/netLec2.pdf
Contents
IPV6
General
ICMPV6
Radvd
Autoconfiguration
Network Namespaces
Bridging Subsystem
Tips
Using mobile IPv6 devices which are not behind a NAT can
avoid the need to send Keep-Alive messages.
At the end of 1997, IBM's AIX 4.3 was the first commercial
platform that supported IPv6.
http://www.m6bone.net/
Freenet6: http://go6.net/4105/freenet.asp
IPV6 in the Linux Kernel
The IPv6 kernel part was started long ago, in 1996 (by Pedro
Roque), based on the BSD API; it was in Linux kernel 2.1.8.
From time to time, the main networking tree pulls this git tree.
http://clarinet.u-strasbg.fr/~hoerdt/
http://www.kame.net/
Mobile IPV6:
http://www.mobile-ipv6.org/
IPV6 -General
Hitachi
Nortel Networks
http://www.ietf.org/IESG/Implementations/ipv6-implementations.txt
Drawbacks of IPV6
xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx
Localhost address:
0000:0000:0000:0000:0000:0000:0000:0001
::1
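The full form and ::1 are the same 128-bit value. A minimal sketch using the standard inet_pton()/inet_ntop() calls (the wrapper names ipv6_canonical() and ipv6_is_loopback() are hypothetical) shows the round trip to the compressed form:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/* Parse an IPv6 literal and print it back in canonical (compressed)
 * form. Returns 0 on success, -1 if the literal is not valid IPv6. */
int ipv6_canonical(const char *in, char *out, socklen_t outlen)
{
    struct in6_addr a;
    if (inet_pton(AF_INET6, in, &a) != 1)
        return -1;
    return inet_ntop(AF_INET6, &a, out, outlen) ? 0 : -1;
}

/* True if the literal denotes the loopback address ::1. */
int ipv6_is_loopback(const char *in)
{
    struct in6_addr a;
    return inet_pton(AF_INET6, in, &a) == 1 &&
           memcmp(&a, &in6addr_loopback, sizeof(a)) == 0;
}
```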
IPV6 Addresses - contd
Caveat:
Run: ls /proc/net/if_inet6
Managing IP address:
ip -6 addr
tcpdump ip6
or , for example:
tethereal -R ipv6
IPV6 -General
route -A inet6
netstat -A inet6
Special Addresses:
see http://www.iana.org/assignments/ipv6-address-space
http://www.litech.org/radvd/
We can't !
In ipv6_add_addr():
Caveat:
FF02::1
Caveat:
Caveat: You will not notice it, unless your syslog prints
KERN_DEBUG messages (see man syslog.conf)
Caveat 2:
Radvd also sends its own MAC address as part of the options.
Unless /proc/sys/net/ipv6/conf/eth0/accept_ra_defrtr is 0.
This default router has a limited lifetime. It will expire after the
value specified for AdvDefaultLifetime.
This is not the MTU you see in ifconfig, but you see it
in /proc/sys/net/ipv6/conf/eth0/mtu.
http://www.quagga.net/
Supports IPV6
At the end of the process, the host will have two (or more)
IPv6 addresses:
/proc/sys/net/ipv6/conf/eth0/router_solicitations
ICMPV6_PKT_TOOBIG messages.
(net/ipv6/ip6_fib.c)
The parameters for the lookup are the root of the table and
the source and destination IPV6 address. (struct in6_addr)
Enable forwarding:
/proc/sys/net/ipv6/conf/all/mc_forwarding
IPV6 header: 40 bytes
include/linux/ipv6.h
Version (4) | Priority/Traffic Class (4) | Flow Label (24)
Payload Length (16) | Next Header (8) | Hop Limit (8)
Source Address (128 bits => 16 bytes)
Destination Address (128 bits => 16 bytes)
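The 4/4/24 split above reflects how struct ipv6hdr declares the first 32 bits (a 4-bit priority, a 4-bit version and a 3-byte flow_lbl array); on the wire (RFC 2460) the same word is Version (4), Traffic Class (8) and Flow Label (20). A minimal user-space sketch decoding the wire view; struct ip6_fixed_hdr and the accessor names are hypothetical.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Fixed IPv6 header (hypothetical name), laid out as on the wire. */
struct ip6_fixed_hdr {
    uint32_t ver_tc_flow;    /* version(4) | traffic class(8) | flow label(20) */
    uint16_t payload_len;    /* network byte order */
    uint8_t  next_header;
    uint8_t  hop_limit;
    uint8_t  saddr[16];
    uint8_t  daddr[16];
};

/* No padding: 4 + 2 + 1 + 1 + 16 + 16 = 40 bytes. */
_Static_assert(sizeof(struct ip6_fixed_hdr) == 40, "IPv6 header is 40 bytes");

unsigned ip6_version(const struct ip6_fixed_hdr *h)
{
    return ntohl(h->ver_tc_flow) >> 28;
}

unsigned ip6_traffic_class(const struct ip6_fixed_hdr *h)
{
    return (ntohl(h->ver_tc_flow) >> 20) & 0xff;
}

unsigned ip6_flow_label(const struct ip6_fixed_hdr *h)
{
    return ntohl(h->ver_tc_flow) & 0xfffff;
}
```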
IPV6 header - contd.
IPV6_DEFAULT_HOPLIMIT is 64.
(ICMPV6_TIME_EXCEED, ICMPV6_EXC_HOPLIMIT...)
See include/linux/in6.h
There are some types of Next Headers which cannot have a Next Header
field. For example, ICMPV6, TCP, UDP, no next header (IPPROTO_NONE).
Extension Headers - contd.
Projects:
Dibbler:
http://klub.com.pl/dhcpv6/
https://fedorahosted.org/dhcpv6/
No mailing list...
WIDE-DHCPv6
Unicast
Multicast
net/ipv6/ip6_input.c
ip -6 maddr show
http://www.benedikt-stockebrand.net/hacks_e.html
MLD - contd.
(net/ipv6/mcast.c)
(igmp6_leave_group() in /net/ipv6/mcast.c)
NEXTHDR_HOP in include/net/ipv6.h
OS virtualization.
OS virtualization:
Xen
Example:
brctl show
Simple example
You can add a tap device (but not a tun device) (?)
br_pass_frame_up() calls:
NF_HOOK(PF_BRIDGE, NF_BR_LOCAL_IN,
skb, indev, NULL,netif_receive_skb);
Bridging Subsystem-contd
Links and more info
http://www.linux-ipv6.org/
http://gsyc.es/~eva/IPv6-web/ipv6.html
Books:
Pages: 436
http://www.benedikt-stockebrand.net/books_e.html
1) IPv6 Advanced Protocols Implementation (2007)
2) IPv6 Core Protocols Implementation (2006)
Both books were written by Qing Li, Tatuya Jinmei and Keiichi
Shima
- published by Morgan Kaufmann Series in Networking.
http://www.ipv6.org/
Keio University
USAGI/WIDE Project
http://mirror.linux.org.au/pub/linux.conf.au/2008/slides/131-200801-LCA2008-LinuxIPv6.pdf
Html: http://www.linux-ipv6.org/materials/200801-LCA2008/
Linux Network Stack Walkthrough (2.4.20):
http://gicl.cs.drexel.edu/people/sevy/network/Linux_network_stack_walkthrough.html
Understanding the Linux Kernel, Second Edition
By Daniel P. Bovet, Marco Cesati
Second Edition December 2002
chapter 18: networking.
- Understanding Linux Network Internals, Christian Benvenuti
O'Reilly, First Edition.
Linux Device Driver, by Jonathan Corbet, Alessandro Rubini, Greg
Kroah-Hartman
Third Edition February 2005.
http://www.linux-foundation.org/en/Net:Main_Page
http://app.linux.org.mt/article/writing-netdrivers?locale=en
Netconf: a yearly networking conference; the first was in 2004.
http://vger.kernel.org/netconf2004.html
http://vger.kernel.org/netconf2005.html
http://vger.kernel.org/netconf2006.html
David S. Miller, James Morris, Rusty Russell, Jamal Hadi Salim, Stephen
Hemminger, Harald Welte, Hideaki YOSHIFUJI, Herbert Xu, Thomas Graf, Robert
Olsson, Arnaldo Carvalho de Melo and others
Policy Routing With Linux - Online Book Edition
http://www.policyrouting.org/PolicyRoutingBook/
TRASH - A dynamic LC-trie and hash data structure:
Robert Olsson, Stefan Nilsson, August 2006
http://www.csc.kth.se/~snilsson/public/papers/trash/trash.pdf
IPSec howto:
http://www.ipsec-howto.org/t1.html
Openswan: Building and Integrating Virtual Private Networks,
by Paul Wouters, Ken Bantoft
http://www.packtpub.com/book/openswan/mid/061205jqdnh2by
publisher: Packt Publishing.
A book including chapters about LVS:
The Linux Enterprise Cluster- Build a Highly Available Cluster
with Commodity Hardware and Free Software, By Karl
Kopper.
http://www.nostarch.com/frameset.php?startat=cluster
http://www.vyatta.com - Open-Source Networking
Address Resolution Protocol (ARP)
http://linux-ip.net/html/ether-arp.html
ARPWatch: a tool to monitor incoming ARP traffic.
Lawrence Berkeley National Laboratory -
ftp://ftp.ee.lbl.gov/arpwatch.tar.gz
arptables:
http://ebtables.sourceforge.net/download.html
TCP/IP Illustrated, Volume 1: The Protocols
By W. Richard Stevens
http://www.informit.com/store/product.aspx?isbn=0201633469
Unix Network Programming, Volume 1: The Sockets Networking
API (3rd Edition) (Addison-Wesley Professional Computing Series)
(Hardcover)
by W. Richard Stevens (Author), Bill Fenner (Author), Andrew M.
Rudoff (Author)
Linux Ethernet Bridging mailing list:
http://www.spinics.net/lists/linux-ethernet-bridging/
Questions
Questions ?
Thank You !
Linux Wireless -
Linux Kernel Networking (4)-
advanced topics
Rami Rosen
ramirose@gmail.com
Haifux, March 2009
www.haifux.org
Note:
http://www.haifux.org/lectures/172/
slides:http://www.haifux.org/lectures/172/netLec.pdf
2) Advanced Linux Kernel Networking -
Neighboring Subsystem and IPSec lecture
http://www.haifux.org/lectures/180/
slides:http://www.haifux.org/lectures/180/netLec2.pdf
3) Advanced Linux Kernel Networking -
IPv6 in the Linux Kernel lecture
http://www.haifux.org/lectures/187/
Slides: http://www.haifux.org/lectures/187/netLec3.pdf
Contents:
General.
IEEE 802.11 specs.
Infrastructure mode.
Association.
Scanning.
Hostapd
Tips.
Glossary.
Links.
Images
http://www.haifux.org/lectures/138/
http://www.haifux.org/lectures/125/
http://devicescape.com/pub/home.do
Open-system authentication
(WLAN_AUTH_OPEN) is the only mandatory
authentication method required by 802.11.
Trying this:
http://hostap.epitest.fi/hostapd/
Launching hostapd:
./hostapd hostapd.conf
Hostapd manages:
Association/Disassociation requests.
Authentication/deauthentication requests.
Management (IEEE80211_FTYPE_MGMT)
(WLAN_CAPABILITY_ESS/WLAN_CAPABILITY_IBSS in ieee80211.h)
All in /include/linux/ieee80211.h.
IEEE80211_STYPE_BEACON
Hostapd - contd.
Control (IEEE80211_FTYPE_CTL)
Data (IEEE80211_FTYPE_DATA)
See: include/linux/ieee80211.h
$ttp(--www.>Beek.com-static-index.$tml
http://www.lesswatts.org/
PowerTOP util:
firmware, or
http://www.hpl.hp.com/personal/Jean_Tourrilhes/Linux/Tools.html
From net/mac80211/tx.c:
ieee80211_tx_h_multicast_ps_buf() {
...
/* no buffering for ordered frames */
if (ieee80211_has_order(hdr->frame_control))
    return TX_CONTINUE;
...
}
net/mac80211/mlme.c
Joining an IBSS:
ieee80211_sta_create_ibss() (mlme.c)
Open80211s
Sponsors:
OLPC project.
Nortel
(mesh_path_fix_nexthop() in mesh_pathtbl.c)
Advantages:
Rapid deployment.
Disadvantage:
You can set a wireless device to work in mesh mode only with
the iw command (you cannot perform this with the wireless
tools).
multiple antennas.
Run: iwconfig
Monitor mode
Mesh
see: include/linux/nl80211.h:
enum nl80211_iftype {
    NL80211_IFTYPE_UNSPECIFIED,
    NL80211_IFTYPE_ADHOC,
    NL80211_IFTYPE_STATION,
    NL80211_IFTYPE_AP,
    NL80211_IFTYPE_AP_VLAN,
    NL80211_IFTYPE_WDS,
    NL80211_IFTYPE_MONITOR,
    NL80211_IFTYPE_MESH_POINT,
};
cfg80211 and nl80211
Wireless-testing
wireless-next-2.6
Wireless-2.6
http://www.orbit-lab.org/kernel/compat-wireless-2.6/
Most drivers define a struct for this private area, like
lbtf_private (Marvell) or iwl_priv (iwlwifi of Intel) or
mac80211_hwsim_data in mac80211_hwsim.
For example,
hw->wiphy->interface_modes =
    BIT(NL80211_IFTYPE_STATION) |
    BIT(NL80211_IFTYPE_AP);
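interface_modes is a bit mask indexed by enum nl80211_iftype. A user-space sketch, assuming the enum ordering listed above and a local BIT() macro equivalent to the kernel's; example_interface_modes() and mode_supported() are hypothetical helpers.

```c
/* Same idea as the kernel's BIT() macro. */
#define BIT(nr) (1UL << (nr))

/* Assumed to match enum nl80211_iftype as listed above. */
enum nl80211_iftype {
    NL80211_IFTYPE_UNSPECIFIED,
    NL80211_IFTYPE_ADHOC,
    NL80211_IFTYPE_STATION,
    NL80211_IFTYPE_AP,
    NL80211_IFTYPE_AP_VLAN,
    NL80211_IFTYPE_WDS,
    NL80211_IFTYPE_MONITOR,
    NL80211_IFTYPE_MESH_POINT,
};

/* What a driver advertising station and AP support would store. */
unsigned long example_interface_modes(void)
{
    return BIT(NL80211_IFTYPE_STATION) | BIT(NL80211_IFTYPE_AP);
}

/* Check whether a given interface type is in the advertised mask. */
int mode_supported(unsigned long modes, enum nl80211_iftype type)
{
    return (modes & BIT(type)) != 0;
}
```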
ieee80211_tx_status_irqsafe()
Data frames
Addr3 - DS info
Management frames
Addr3 - DS info
Firmware
Firmware:
http://www.ing.unibs.it/openfwwf/
Written in assembler.
http://linuxwimax.org/
Two parts:
http://www.spinics.net/lists/netdev/msg81902.html
http://www.qbik.ch/usb/devices/
Israel regdomain:
http://wireless.kernel.org/en/developers/Regulatory/Database?alpha2=IL
cat /sys/class/ieee80211/phy*/macaddress
$ttp(--wiki.wires$ark.org-1apture4etup-WL5NR$ead->>0+/+e!4'&+!e'da%>0+/<++./%;<4.!>.ad+%d
wlan.fc.type_subtype eq 8
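The display filter above matches beacons: Wireshark's type_subtype combines the frame type and subtype from the Frame Control field as (type << 4) | subtype, and a beacon is type 0 (management), subtype 8. A sketch with the mask values as we read them in include/linux/ieee80211.h; the function names are hypothetical, and fc is assumed to be already converted to host order.

```c
#include <stdint.h>

/* Frame Control masks and values, as in include/linux/ieee80211.h. */
#define IEEE80211_FCTL_FTYPE    0x000c
#define IEEE80211_FCTL_STYPE    0x00f0
#define IEEE80211_FCTL_ORDER    0x8000
#define IEEE80211_FTYPE_MGMT    0x0000
#define IEEE80211_FTYPE_CTL     0x0004
#define IEEE80211_FTYPE_DATA    0x0008
#define IEEE80211_STYPE_BEACON  0x0080

/* 0 = management, 1 = control, 2 = data. */
unsigned fc_type(uint16_t fc)    { return (fc & IEEE80211_FCTL_FTYPE) >> 2; }
unsigned fc_subtype(uint16_t fc) { return (fc & IEEE80211_FCTL_STYPE) >> 4; }

/* Wireshark-style combined value used by wlan.fc.type_subtype. */
unsigned fc_type_subtype(uint16_t fc)
{
    return (fc_type(fc) << 4) | fc_subtype(fc);
}

int fc_is_beacon(uint16_t fc)
{
    return (fc & (IEEE80211_FCTL_FTYPE | IEEE80211_FCTL_STYPE)) ==
           (IEEE80211_FTYPE_MGMT | IEEE80211_STYPE_BEACON);
}

/* The test ieee80211_has_order() applies, here on a host-order value. */
int fc_has_order(uint16_t fc)
{
    return (fc & IEEE80211_FCTL_ORDER) != 0;
}
```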
DS = Distribution System
MIMO = Multiple-Input-Multiple-Output
PS = Power Saving.
http://standards.ieee.org/getieee802/802.11.html
http://www.kernel.org/pub/linux/kernel/people/mcgrof/presentations/linux-wireless-status.pdf
http://wireless.kernel.org/
or http://linuxwireless.org/
5) A book:
Publisher: O'Reilly
6) http://www.lesswatts.org/
wlan.fc.type_subtype eq 8
Note:
http://www.haifux.org/lectures/172/
slides:http://www.haifux.org/lectures/172/netLec.pdf
2) Advanced Linux Kernel Networking -
Neighboring Subsystem and IPSec lecture
http://www.haifux.org/lectures/180/
slides:http://www.haifux.org/lectures/180/netLec2.pdf
Linux Kernel Networking (5)-
advanced topics
3) Advanced Linux Kernel Networking -
IPv6 in the Linux Kernel lecture
http://www.haifux.org/lectures/187/
Slides: http://www.haifux.org/lectures/187/netLec3.pdf
4) Wireless in Linux
http://www.haifux.org/lectures/206/
Slides: http://www.haifux.org/lectures/206/wirelessLec.pdf
UDP protocol
Control Messages
Appendixes
RETURN VALUE
For the open() system call (for files), we also get a file descriptor
as the return value
From hostapd:
Type:
sock"dEsocket($=B-N>*# S?9KBS*R>$:#!!+,-,.S/-!)<
For Bluetooth/RFCOMM:
socket(AF_BLUETOOTH, SOCK_STREAM,
       BTPROTO_RFCOMM);
struct socket has only 8 members; struct sock has more than 20,
and is one of the biggest structures in the networking stack. You
can easily be confused between them. So the convention is this:
sk always denotes a struct sock, and sock denotes a struct socket.
SS_FREE
SS_UNCONNECTED
SS_CONNECTING
SS_CONNECTED
SS_DISCONNECTING
in inet_dgram_ops it is udp_poll()
in inet_sockraw_ops, it is datagram_poll().
Diagram:
struct inet_sock
struct sock (sk)
struct ip_options *opt;
__u8 tos;
__u8 recverr:1;
__u8 hdrincl:1;
skb_peek()
recvfrom()
recvmsg()
L<
icmp_send(skb, ICMP_DEST_UNREACH,
          ICMP_PORT_UNREACH, 0);
udp_rcv()
udp_rcv() - contd
(Figure: udp_rcv() calls __udp4_lib_rcv(). Multicast packets go to __udp4_lib_mcast_deliver(); unicast packets go to __udp4_lib_lookup_skb() to find a sock in udptable. If a sock is found, udp_queue_rcv_skb() and then sock_queue_rcv_skb() are called; if no sock is found, icmp_send() is called with ICMP_DEST_UNREACH, ICMP_PORT_UNREACH.)
udp_recvmsg():
SOL_IP is 0
When calling recvmsg(), we will parse the msghdr like this:
for (cmptr = CMSG_FIRSTHDR(&msg); cmptr != NULL;
     cmptr = CMSG_NXTHDR(&msg, cmptr))
{
    if (cmptr->cmsg_level == SOL_IP && cmptr->cmsg_type ==
        IP_PKTINFO)
    {
        pktinfo = (struct in_pktinfo *)CMSG_DATA(cmptr);
        printf("destination=%s\n", inet_ntop(AF_INET, &pktinfo->ipi_addr,
               str, sizeof(str)));
    }
}
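The loop above can be exercised end to end without a second host. A sketch under stated assumptions: loopback_pktinfo_dst() and struct pktinfo_copy are hypothetical names; the struct copies the layout of Linux's struct in_pktinfo so the example does not depend on _GNU_SOURCE. It enables IP_PKTINFO on a UDP socket, sends itself one datagram over loopback, and parses the control message the same way.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef SOL_IP
#define SOL_IP 0                 /* equals IPPROTO_IP, as noted above */
#endif
#ifndef IP_PKTINFO
#define IP_PKTINFO 8             /* Linux value (include/linux/in.h) */
#endif

/* Local copy of the layout of Linux's struct in_pktinfo. */
struct pktinfo_copy {
    int            ipi_ifindex;
    struct in_addr ipi_spec_dst;
    struct in_addr ipi_addr;     /* header destination address */
};

/* Returns 0 and fills dst with the datagram's destination address,
 * or -1 on failure. */
int loopback_pktinfo_dst(char *dst, socklen_t dstlen)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);
    char data[16];
    union { char buf[256]; struct cmsghdr align; } control;  /* aligned */
    struct iovec iov;
    struct msghdr msg;
    struct cmsghdr *cmptr;
    int one = 1, rc = -1;
    int s = socket(AF_INET, SOCK_DGRAM, 0);

    if (s < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* port 0: kernel picks */
    if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getsockname(s, (struct sockaddr *)&addr, &alen) < 0 ||
        setsockopt(s, SOL_IP, IP_PKTINFO, &one, sizeof(one)) < 0 ||
        sendto(s, "x", 1, 0, (struct sockaddr *)&addr, sizeof(addr)) != 1)
        goto out;

    iov.iov_base = data;
    iov.iov_len = sizeof(data);
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof(control.buf);
    if (recvmsg(s, &msg, 0) < 0)
        goto out;

    for (cmptr = CMSG_FIRSTHDR(&msg); cmptr != NULL;
         cmptr = CMSG_NXTHDR(&msg, cmptr)) {
        if (cmptr->cmsg_level == SOL_IP && cmptr->cmsg_type == IP_PKTINFO) {
            struct pktinfo_copy pi;
            memcpy(&pi, CMSG_DATA(cmptr), sizeof(pi));  /* no unaligned reads */
            if (inet_ntop(AF_INET, &pi.ipi_addr, dst, dstlen) != NULL)
                rc = 0;
        }
    }
out:
    close(s);
    return rc;
}
```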
sendto()
sendmsg()
Or adding in /etc/sysctl.conf:
net.ipv4.ip_nonlocal_bind=1
What will happen if, in the above udp client example, we try
setting a broadcast address as the destination (instead of
192.168.0.121), thus:
inet_aton("255.255.255.255", &target.sin_addr);
#define UDP_CORK 1
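UDP_CORK (set with setsockopt() at level IPPROTO_UDP) makes the kernel accumulate successive writes into one pending datagram until the option is cleared. A minimal loopback sketch (corked_datagram_len() is a hypothetical name) that checks two 2-byte writes arrive as a single 4-byte datagram:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef UDP_CORK
#define UDP_CORK 1               /* value given above */
#endif

/* Cork a connected UDP socket, write two pieces, uncork, and return the
 * length of the single datagram the receiver sees (-1 on error). */
ssize_t corked_datagram_len(void)
{
    struct sockaddr_in a;
    socklen_t alen = sizeof(a);
    int on = 1, off = 0;
    char buf[64];
    ssize_t n = -1;
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);

    if (rx < 0 || tx < 0)
        goto out;
    memset(&a, 0, sizeof(a));
    a.sin_family = AF_INET;
    a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(rx, (struct sockaddr *)&a, sizeof(a)) < 0 ||
        getsockname(rx, (struct sockaddr *)&a, &alen) < 0 ||
        connect(tx, (struct sockaddr *)&a, sizeof(a)) < 0)
        goto out;

    if (setsockopt(tx, IPPROTO_UDP, UDP_CORK, &on, sizeof(on)) < 0 ||
        send(tx, "ab", 2, 0) != 2 ||     /* queued, not yet sent */
        send(tx, "cd", 2, 0) != 2 ||     /* appended to the same datagram */
        setsockopt(tx, IPPROTO_UDP, UDP_CORK, &off, sizeof(off)) < 0)
        goto out;                        /* clearing the cork flushes it */

    n = recv(rx, buf, sizeof(buf), 0);
out:
    if (rx >= 0)
        close(rx);
    if (tx >= 0)
        close(tx);
    return n;
}
```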
Adding:
SO_SNDBUF
int val=1;
sk->sk_state == TCP_ESTABLISHED
Example:
struct sockaddr_in source;
struct sigaction handler;
source.sin_family = AF_INET;
source.sin_port = htons(888);
source.sin_addr.s_addr = htonl(INADDR_ANY);
servSocket = socket(AF_INET, SOCK_DGRAM, 0);
bind(servSocket, (struct sockaddr *)&source,
     sizeof(struct sockaddr_in));
handler.sa_handler = SIGIOHandler;   /* user-defined signal handler */
sigfillset(&handler.sa_mask);
handler.sa_flags = 0;
sigaction(SIGIO, &handler, 0);
fcntl(servSocket, F_SETOWN, getpid());   /* deliver SIGIO to this process */
fcntl(servSocket, F_SETFL, O_NONBLOCK | FASYNC);   /* async, non-blocking */
AF_UNIX:
UDP:
RAW:
ICMP_FILTER
TCP:
TCP_CORK
TCP_DEFER_ACCEPT
TCP_INFO
TCP_KEEPCNT
TCP_KEEPIDLE
TCP_KEEPINTVL
TCP_LINGER2
TCP_MAXSEG
TCP_NODELAY
TCP_QUICKACK
TCP_SYNCNT
TCP_WINDOW_CLAMP
AF_PACKET:
PACKET_ADD_MEMBERSHIP
PACKET_DROP_MEMBERSHIP
Socket options for socket level:
SO_DEBUG
SO_REUSEADDR
SO_TYPE
SO_ERROR
SO_DONTROUTE
SO_BROADCAST
SO_SNDBUF
SO_RCVBUF
SO_SNDBUFFORCE
SO_RCVBUFFORCE
SO_KEEPALIVE
SO_OOBINLINE
SO_NO_CHECK
SO_PRIORITY
SO_LINGER
SO_BSDCOMPAT
Appendix C: tcp client
#include <fcntl.h>
#include <stdlib.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <arpa/inet.h>
int main()
{
    struct sockaddr_in sa;
    int sd = socket(PF_INET, SOCK_STREAM, 0);
    if (sd < 0) {
        perror("socket");
        exit(1);
    }
    memset(&sa, 0, sizeof(struct sockaddr_in));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(853);
    inet_aton("192.168.0.121", &sa.sin_addr);
    if (connect(sd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        perror("connect");
        exit(1);
    }
    close(sd);
    return 0;
}
This simple example demonstrates how to set and get an IP layer option:
#include <stdio.h>
#include <arpa/inet.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <string.h>
#include <unistd.h>
int main()
{
    int s;
    int opt;
    int res;
    int one = 1;
    socklen_t size = sizeof(opt);
    s = socket(AF_INET, SOCK_DGRAM, 0);
    if (s < 0) {
        perror("socket");
        return 1;
    }
    res = setsockopt(s, SOL_IP, IP_RECVERR, &one, sizeof(one));
    if (res == -1)
        perror("setsockopt");
    res = getsockopt(s, SOL_IP, IP_RECVERR, &opt, &size);
    if (res == -1)
        perror("getsockopt");
    printf("opt = %d\n", opt);
    close(s);
    return 0;
}
Example: record route option
Incremented in udp_sendmsg()
Another metric:
cat /proc/net/udp
Represents sk->sk_drops
Incremented in __udp_queue_rcv_skb()
net/ipv4/udp.c
skb->truesize is the size (in bytes) allocated for the data of
the skb plus the size of the sk_buff structure itself
atomic_add(skb->truesize, &sk->sk_rmem_alloc);
see: include/net/sock.h
sock_rfree()
atomic_sub(skb->truesize, &sk->sk_rmem_alloc);
It is equal to the
/proc/sys/net/core/rmem_default entry
In skb_set_owner_w(), we have:
...
atomic_add(skb->truesize, &sk->sk_wmem_alloc);
...
When the packet is freed by kfree_skb(), we decrement
sk_wmem_alloc in the sock_wfree() method:
sock_wfree()
...
atomic_sub(skb->truesize, &sk->sk_wmem_alloc);
...
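The charge/uncharge pairs above form a simple budget against the socket's receive buffer. The model below is hypothetical user-space code whose names only echo the kernel's (model_sock, model_queue_rcv(), model_rfree()); its admission test follows the idea of sock_queue_rcv_skb() refusing a packet when sk_rmem_alloc plus truesize would reach sk_rcvbuf, and its drop counter plays the role of sk_drops.

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical model of a socket's receive-side accounting. */
struct model_sock {
    atomic_uint rmem_alloc;   /* bytes charged, like sk->sk_rmem_alloc */
    unsigned    rcvbuf;       /* limit, like sk->sk_rcvbuf */
    atomic_uint drops;        /* refused packets, like sk->sk_drops */
};

/* Hypothetical stand-in for an sk_buff: only its truesize matters here. */
struct model_skb {
    unsigned truesize;        /* data + sk_buff overhead, in bytes */
};

/* Queue a packet if the budget allows, charging its truesize
 * (the atomic_add() step); otherwise count a drop. */
int model_queue_rcv(struct model_sock *sk, const struct model_skb *skb)
{
    if (atomic_load(&sk->rmem_alloc) + skb->truesize >= sk->rcvbuf) {
        atomic_fetch_add(&sk->drops, 1);
        return -ENOMEM;
    }
    atomic_fetch_add(&sk->rmem_alloc, skb->truesize);
    return 0;
}

/* The reader consumed the packet: uncharge it (the sock_rfree() step). */
void model_rfree(struct model_sock *sk, const struct model_skb *skb)
{
    atomic_fetch_sub(&sk->rmem_alloc, skb->truesize);
}
```

In this sketch the check and the charge are two separate atomic operations, so concurrent callers could overshoot the limit slightly; it is a model of the bookkeeping, not a lock-free design.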
Tips
netstat -ae
Thank you!
ramirose@gmail.com
Linux Kernel Networking
advanced topics (6)
Sockets in the kernel
Rami Rosen
ramirose@gmail.com
Haifux, August 2009
www.haifux.org
All rights reserved
Note:
This lecture is a sequel to the following 5 lectures
I gave in Haifux:
1) Linux Kernel Networking lecture
http://www.haifux.org/lectures/172/
slides:http://www.haifux.org/lectures/172/netLec.pdf
2) Advanced Linux Kernel Networking -
Neighboring Subsystem and IPSec lecture
http://www.haifux.org/lectures/180/
slides:http://www.haifux.org/lectures/180/netLec2.pdf
3) Advanced Linux Kernel Networking -
IPv6 in the Linux Kernel lecture
http://www.haifux.org/lectures/187/
Slides: http://www.haifux.org/lectures/187/netLec3.pdf
4) Wireless in Linux
http://www.haifux.org/lectures/206/
Slides: http://www.haifux.org/lectures/206/wirelessLec.pdf
5) Sockets in the Linux Kernel
http://www.haifux.org/lectures/217/
Slides: http://www.haifux.org/lectures/217/netLec5.pdf
Note
Note: This is the second part of the "Sockets in
the Linux Kernel" lecture, which was given in
Haifux in 2009. You may find some
background material for this lecture in its slides:
http://www.haifux.org/lectures/217/netLec5.pdf
TOC
TOC:
RAW Sockets
UNIX Domain Sockets
Netlink sockets
SCTP sockets
Appendices
Note: All code examples in this lecture refer to
the recent 2.6.30 version of the Linux kernel.
RAW Sockets
There are cases when there is no interface to create
sockets of a certain protocol (ICMP protocol, NETLINK
protocol) => use Raw sockets.
Raw socket creation is done thus, for example:
sd = socket(AF_INET, SOCK_RAW, 0);
sd = socket(AF_INET, SOCK_RAW, IPPROTO_UDP);
sd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_IP));
ETH_P_IP tells the socket to handle all IP packets.
When using the AF_INET family, as in the first two cases, the
socket is added to the kernel RAW sockets hash table (the hash
key is the protocol number). This is done by raw_hash_sk()
(net/ipv4/raw.c), which is invoked by inet_create() when
creating the socket.
When using the AF_PACKET family, as in the third case, a socket
is not added to the kernel RAW sockets hash table.
See Appendix F for an example of using a packet raw socket.
Raw socket creation MUST be done as a super user
(more precisely, by a process with the CAP_NET_RAW capability).
In case an ordinary user tries to create a raw socket,
you will get:
"error: socket: Operation not permitted" (EPERM).
You can set the CAP_NET_RAW capability to enable
non-root users to create raw sockets:
setcap cap_net_raw=+ep rawserver
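To see which case applies to the current process, a minimal sketch (try_raw_socket() is a hypothetical name) can attempt the same socket() call that ping uses and report the errno:

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Try to open an ICMP raw socket. Returns 0 if it worked (the process
 * has CAP_NET_RAW, e.g. root), otherwise -errno (typically -EPERM). */
int try_raw_socket(void)
{
    int sd = socket(AF_INET, SOCK_RAW, IPPROTO_ICMP);
    if (sd < 0)
        return -errno;
    close(sd);
    return 0;
}
```

Run as an ordinary user, it should return -EPERM; as root, or after setcap cap_net_raw=+ep on the binary, it should return 0.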
Usage of RAW socket: ping
You do not specify ports with RAW sockets; RAW
sockets do not work with ports.
When the kernel receives a raw packet, it
delivers it to all matching raw sockets (those whose protocol matches).
Ping in fact is sending an ICMP packet.
The type of this ICMP packet is ICMP ECHO
REQUEST.
Send a ping -
implementation (simplified)
#define BUFSIZE 1500
char sendbuf[BUFSIZE];
struct icmp *icmp;
int sockfd;
struct sockaddr_in target;
int datalen = 56;
int len, res;
target.sin_family = AF_INET;
inet_aton("192.168.0.121", &target.sin_addr);
icmp = (struct icmp *)sendbuf;
icmp->icmp_type = ICMP_ECHO;
icmp->icmp_code = 0;
icmp->icmp_id = getpid();
memset(icmp->icmp_data, 0xa5, datalen);
icmp->icmp_cksum = 0;
len = 8 + datalen;   /* ICMP header + data */
sockfd = socket(AF_INET, SOCK_RAW, IPPROTO_ICMP);
res = sendto(sockfd, sendbuf, len, 0, (struct sockaddr *)&target,
             sizeof(struct sockaddr_in));
- Missing here are the sequence number and the checksum computation.
- The default number of data bytes to be sent is 56; the ICMP
header is 8 bytes. So we get 64 bytes (or 84 bytes, if we include
the IP header of 20 bytes).
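The two missing pieces can be filled in as follows. This is a minimal sketch, not the ping source: in_cksum() and build_echo_request() are hypothetical names, and the header is written byte by byte to avoid alignment and byte-order surprises. It uses the standard Internet checksum (RFC 1071), whose defining property is that re-summing a packet that already contains its checksum yields 0.

```c
#include <netinet/ip_icmp.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* RFC 1071 Internet checksum over len bytes, returned as a host-order
 * 16-bit value of the big-endian (network order) checksum word. */
uint16_t in_cksum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;

    while (len > 1) {                       /* add 16-bit big-endian words */
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len == 1)                           /* pad an odd trailing byte */
        sum += (uint32_t)p[0] << 8;
    while (sum >> 16)                       /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Build an ICMP echo request in buf: 8-byte header (type, code, checksum,
 * id, sequence) followed by datalen bytes of 0xa5, as above.
 * buf must hold 8 + datalen bytes. Returns the packet length. */
size_t build_echo_request(uint8_t *buf, uint16_t id, uint16_t seq,
                          size_t datalen)
{
    size_t len = 8 + datalen;
    uint16_t ck;

    buf[0] = ICMP_ECHO;                     /* type 8 */
    buf[1] = 0;                             /* code 0 */
    buf[2] = buf[3] = 0;                    /* checksum zeroed before summing */
    buf[4] = id >> 8;
    buf[5] = id & 0xff;
    buf[6] = seq >> 8;
    buf[7] = seq & 0xff;
    memset(buf + 8, 0xa5, datalen);
    ck = in_cksum(buf, len);
    buf[2] = ck >> 8;                       /* store in network order */
    buf[3] = ck & 0xff;
    return len;
}
```

The resulting buffer and its length can then be handed to the sendto() call above.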
Receive a ping -
implementation (simplified)
__u8 *buf;
char addrbuf[128];
struct iovec iov;
struct iphdr *iphdr;
int sockfd;
struct icmphdr *icmphdr;
char recvbuf[BUFSIZE];
char controlbuf[BUFSIZE];
struct msghdr msg;
int n;
sockfd = socket(AF_INET, SOCK_RAW, IPPROTO_ICMP);
iov.iov_base = recvbuf;
iov.iov_len = sizeof(recvbuf);
memset(&msg, 0, sizeof(msg));
msg.msg_name = addrbuf;
msg.msg_namelen = sizeof(addrbuf);
msg.msg_iov = &iov;
msg.msg_iovlen = 1;
msg.msg_control = controlbuf;
msg.msg_controllen = sizeof(controlbuf);
n = recvmsg(sockfd, &msg, 0);
buf = msg.msg_iov->iov_base;
iphdr = (struct iphdr *)buf;
icmphdr = (struct icmphdr *)(buf + (iphdr->ihl * 4));
if (icmphdr->type == ICMP_ECHOREPLY)
    printf("ICMP_ECHOREPLY\n");
if (icmphdr->type == ICMP_DEST_UNREACH)
    printf("ICMP_DEST_UNREACH\n");
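The offset computation above trusts the received length. A slightly more defensive sketch (icmp_type_of() is a hypothetical name) performs the same iphdr->ihl * 4 step with bounds checks:

```c
#include <netinet/ip_icmp.h>
#include <stddef.h>
#include <stdint.h>

/* Given a datagram read from an IPPROTO_ICMP raw socket (which includes
 * the IPv4 header), return the ICMP type, or -1 if the buffer is too
 * short or the header length is invalid. */
int icmp_type_of(const uint8_t *buf, size_t n)
{
    size_t hlen;

    if (n < 20)
        return -1;                          /* shorter than a minimal IPv4 header */
    hlen = (size_t)(buf[0] & 0x0f) * 4;     /* ihl counts 32-bit words */
    if (hlen < 20 || n < hlen + 8)
        return -1;                          /* bad ihl or truncated ICMP header */
    return buf[hlen];
}
```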
The only SOL_RAW option a Raw socket can get
is ICMP_FILTER.
This can be done thus:
#define ICMP_FILTER 1
struct icmp_filter {
    __u32 data;
};
struct icmp_filter filt;
filt.data = 1 << ICMP_DEST_UNREACH;
res = setsockopt(sockfd, SOL_RAW, ICMP_FILTER,
                 (char *)&filt, sizeof(filt));
Adding this code to the receive ping application
above will prevent Destination Unreachable
ICMP messages from being received in user space
by recvmsg.
There are quite a lot more ICMP types; by
default, we do NOT filter any ICMP messages.
Among the other ICMP types you can filter with
setsockopt are:
ICMP_ECHO (echo request)
ICMP_ECHOREPLY (echo reply)
ICMP_TIME_EXCEEDED
And more (see Appendix D for a full list).
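The filter is a 32-bit mask indexed by ICMP type: a set bit means "drop this type before it reaches the socket". A user-space sketch of that test, modeled on our reading of icmp_filter() in net/ipv4/raw.c (where types 32 and above are never filtered); the function names are hypothetical:

```c
#include <netinet/ip_icmp.h>
#include <stddef.h>
#include <stdint.h>

/* Nonzero if a raw socket with this filter mask would drop the type. */
int icmp_filter_blocks(uint32_t mask, unsigned type)
{
    if (type >= 32)
        return 0;                  /* outside the mask: never blocked */
    return ((1U << type) & mask) != 0;
}

/* Build a mask blocking every ICMP type listed in types[0..n-1]. */
uint32_t icmp_filter_mask(const unsigned *types, size_t n)
{
    uint32_t mask = 0;
    for (size_t i = 0; i < n; i++)
        if (types[i] < 32)
            mask |= 1U << types[i];
    return mask;
}
```

The resulting mask is what goes into filt.data in the setsockopt(..., SOL_RAW, ICMP_FILTER, ...) call above.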
Traceroute also uses raw sockets.
Traceroute changes the TTL field in the IP header.
This is done by IP_TTL and control messages in the
current Linux traceroute implementation (Dmitry
Butskoy).
In the original traceroute (by Van Jacobson) it was
done with the IP_HDRINCL socket option:
setsockopt(sndsock, IPPROTO_IP, IP_HDRINCL, ...)
The IP_HDRINCL option tells the IP layer not to prepare an IP header
when sending a packet.
IP_HDRINCL is also applicable in IPv6.
When receiving a packet on an IPv4 raw socket, the IP header is always
included in the packet.
When sending a packet, by specifying the IP_HDRINCL
option you tell the kernel that the IP header is already included
in the packet, so there is no need to prepare it in the kernel.
raw_send_hdrinc() in net/ipv4/raw.c
The IP_HDRINCL option is applied only to the SOCK_RAW
type of protocol.
See Lawrence Berkeley National Laboratory traceroute:
ftp://ftp.ee.lbl.gov/traceroute.tar.gz
If a raw socket was created with protocol type of
IPPROTO_RAW, this implies enabling IP_HDRINCL.
Thus, this call from user space:
socket(AF_INET, SOCK_RAW, IPPROTO_RAW)
invokes this code in the kernel:
if (SOCK_RAW == sock->type) {
    inet->num = protocol;
    if (IPPROTO_RAW == protocol)
        inet->hdrincl = 1;