Socket Layer COMS W6998 Spring 2010 Erich Nahum

Post on 20-Dec-2015


Socket Layer

COMS W6998

Spring 2010

Erich Nahum

Outline

Sockets API Refresher
Linux Sockets Architecture
Interface between BSD sockets and AF_INET
Interface between AF_INET and TCP/UDP
Receive Path
Send Path

BSD Socket API

Originally developed by UC Berkeley at the dawn of time
Used by 90% of network-oriented programs
Standard interface across operating systems
Simple, well understood by programmers

User Space Socket API

socket() / bind() / accept() / listen(): initialization, addressing, and handshaking
select() / poll() / epoll_wait(): waiting for events
send() / recv(): stream-oriented (e.g. TCP) Rx / Tx
sendto() / recvfrom(): datagram-oriented (e.g. UDP) Rx / Tx
close() / shutdown(): closing down an association

Standard Socket Sequence

[Figure: call sequences of the two endpoints and the exchanges between them.]
The 'server' application: socket() -> bind() -> listen() -> accept() -> read() -> write() -> close()
The 'client' application: socket() -> bind() -> connect() -> write() -> read() -> close()
Between the two: 3-way handshake, data flow to server, data flow to client, 4-way handshake
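The sequence can be exercised in a single process over the loopback interface. The following is a minimal sketch, not code from the course: the function name loopback_roundtrip is made up for illustration, the port is chosen by the kernel, and the client's optional bind() is skipped because connect() binds an ephemeral port implicitly.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: runs the server and client call sequences in one
 * process over 127.0.0.1. Returns the number of bytes the server side
 * read into out, or -1 on error. */
static ssize_t loopback_roundtrip(const char *msg, char *out, size_t outlen)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);
    ssize_t n = -1;
    size_t got = 0;
    int lfd = -1, cfd = -1, afd = -1;

    assert(msg != NULL && out != NULL);

    /* Server: socket() -> bind() -> listen() */
    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                       /* let the kernel pick a port */
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &alen) < 0)
        goto out;

    /* Client: socket() -> connect() (3-way handshake) -> write() */
    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0 || connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;
    if (write(cfd, msg, strlen(msg)) < 0)
        goto out;
    shutdown(cfd, SHUT_WR);                  /* no more data from the client */

    /* Server: accept() -> read() until end of stream */
    afd = accept(lfd, NULL, NULL);
    if (afd < 0)
        goto out;
    while (got < outlen) {
        ssize_t r = read(afd, out + got, outlen - got);
        if (r < 0)
            goto out;
        if (r == 0)
            break;
        got += (size_t)r;
    }
    n = (ssize_t)got;
out:
    if (afd >= 0) close(afd);
    if (cfd >= 0) close(cfd);
    if (lfd >= 0) close(lfd);
    return n;
}
```

Calling connect() before accept() works in one thread because the kernel completes the handshake against the listen backlog; accept() only dequeues the established connection.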

socket() System Call

Creating a socket from user space is done by the socket() system call:
    int socket(int family, int type, int protocol);
On success, a file descriptor for the new socket is returned.
The open() system call (for files) also returns a file descriptor.
This follows the "everything is a file" Unix paradigm.
The first parameter, family, is also sometimes referred to as "domain".

socket(): Family

A family is a suite of protocols
Each family is a subdirectory of linux/net
  E.g., linux/net/ipv4, linux/net/decnet, linux/net/packet
IPv4: PF_INET
IPv6: PF_INET6
Packet sockets: PF_PACKET
  Operate at the device driver layer.
  The pcap library for Linux uses PF_PACKET sockets.
  The pcap library is used by sniffers such as tcpdump.
Protocol Family == Address Family
  PF_INET == AF_INET (in include/linux/socket.h)

Address/Protocol Families

    /* Supported address families. */
    #define AF_UNSPEC       0
    #define AF_UNIX         1   /* Unix domain sockets */
    #define AF_LOCAL        1   /* POSIX name for AF_UNIX */
    #define AF_INET         2   /* Internet IP Protocol */
    #define AF_AX25         3   /* Amateur Radio AX.25 */
    #define AF_IPX          4   /* Novell IPX */
    #define AF_APPLETALK    5   /* AppleTalk DDP */
    #define AF_NETROM       6   /* Amateur Radio NET/ROM */
    #define AF_BRIDGE       7   /* Multiprotocol bridge */
    #define AF_ATMPVC       8   /* ATM PVCs */
    #define AF_X25          9   /* Reserved for X.25 project */
    #define AF_INET6        10  /* IP version 6 */
    #define AF_ROSE         11  /* Amateur Radio X.25 PLP */
    #define AF_DECnet       12  /* Reserved for DECnet project */
    #define AF_NETBEUI      13  /* Reserved for 802.2LLC project */
    #define AF_SECURITY     14  /* Security callback pseudo AF */
    #define AF_KEY          15  /* PF_KEY key management API */
    ..
    #define AF_ISDN         34  /* mISDN sockets */
    #define AF_PHONET       35  /* Phonet sockets */
    #define AF_IEEE802154   36  /* IEEE802154 sockets */
    #define AF_MAX          37  /* For now.. */

include/linux/socket.h

socket(): Type

SOCK_STREAM and SOCK_DGRAM are the most commonly used types.
SOCK_STREAM for TCP, SCTP
SOCK_DGRAM for UDP.
SOCK_RAW for RAW sockets.
There are cases where the type can be either SOCK_STREAM or SOCK_DGRAM within one family; for example, Unix domain sockets (AF_UNIX).
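The AF_UNIX case can be checked from user space with socketpair() and the SO_TYPE socket option. A minimal sketch; the helper name unix_pair_type is an assumption made for this example.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: creates an AF_UNIX socket pair of the requested type
 * and returns the type the kernel reports via SO_TYPE, or -1 on error. */
static int unix_pair_type(int type)
{
    int sv[2];
    int got = -1;
    socklen_t len = sizeof(got);

    assert(type == SOCK_STREAM || type == SOCK_DGRAM);

    if (socketpair(AF_UNIX, type, 0, sv) < 0)
        return -1;
    if (getsockopt(sv[0], SOL_SOCKET, SO_TYPE, &got, &len) < 0)
        got = -1;
    close(sv[0]);
    close(sv[1]);
    return got;
}
```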

socket(): Protocol

Protocol is the protocol number within a family.
Internet protocol numbers are assigned by IANA
  http://www.iana.org/assignments/protocol-numbers/
For AF_INET, it's usually 0.
  IPPROTO_IP is 0, see: include/linux/in.h.
For SCTP: protocol is IPPROTO_SCTP (132)
    sockfd = socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP);
For UDP-Lite: protocol is IPPROTO_UDPLITE (136)

Socket Layer Architecture

[Figure: layered diagram of the socket stack.]
User: Application
Kernel, Socket Interface: BSD Socket Layer
Kernel, Protocol Layers:
  PF_INET (SOCK_STREAM, SOCK_DGRAM, SOCK_RAW) over TCP, UDP, and IPV4
  PF_PACKET (SOCK_RAW, SOCK_DGRAM)
  PF_UNIX, PF_IPX, ...
Kernel, Device Layer: Network Device Layer (Ethernet, Token Ring, PPP, SLIP, FDDI)
Hardware: Intel E1000

Key Concepts

Function pointer tables ("ops")
  In-kernel interfaces for socket functions
  Binding between BSD sockets and AF_XXX families
  Binding between AF_INET and transports (TCP, UDP)
Socket data structures
  struct socket (BSD socket)
  struct sock (protocol family socket, network state)
    struct packet_sock (PF_PACKET)
    struct inet_sock (PF_INET)
      struct udp_sock
      struct tcp_sock
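The nesting of the socket structures can be modeled in user space by embedding the more general structure as the first member of the more specific one, so that one pointer is valid at every level. The toy_ types and fields below are invented for illustration and are far smaller than the kernel structures they stand in for.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins: each level embeds the previous one as its FIRST member. */
struct toy_sock      { int family; int state; };
struct toy_inet_sock { struct toy_sock sk; unsigned short sport; };
struct toy_udp_sock  { struct toy_inet_sock inet; int pending; };

/* Downcasts are plain pointer casts because the embedded member is at offset 0. */
static struct toy_inet_sock *toy_inet_sk(struct toy_sock *sk)
{
    return (struct toy_inet_sock *)sk;
}

static struct toy_udp_sock *toy_udp_sk(struct toy_sock *sk)
{
    return (struct toy_udp_sock *)sk;
}

/* Generic code passes a toy_sock pointer; UDP code recovers its own fields. */
static int toy_udp_pending(struct toy_sock *sk)
{
    assert(offsetof(struct toy_inet_sock, sk) == 0);
    assert(offsetof(struct toy_udp_sock, inet) == 0);
    return toy_udp_sk(sk)->pending;
}
```

The cast is only valid when the object really was allocated as the more specific type, which is why each protocol records the size of its own socket structure (see .obj_size in udp_prot).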

Socket Data Structures

For every socket created by a user space application, there is a corresponding struct socket and struct sock in the kernel.
The two are easy to confuse.
struct socket: include/linux/net.h
  Data common to the BSD socket layer
  Has only 8 members
  Any variable "sock" always refers to a struct socket
struct sock: include/net/sock.h
  Data common to the network protocol layer (i.e., AF_INET)
  Has more than 30 members, and is one of the biggest structures in the networking stack.
  Any variable "sk" always refers to a struct sock.

struct socket

    struct socket {
        socket_state            state;         // SS_CONNECTING etc.
        short                   type;          // SOCK_STREAM etc.
        unsigned long           flags;
        struct fasync_struct   *fasync_list;
        wait_queue_head_t       wait;          // tasks waiting
        struct file            *file;          // back ptr to inode
        struct sock            *sk;            // AF specific state
        const struct proto_ops *ops;           // AF specific operations
    };

include/linux/net.h

Socket State

    typedef enum {
        SS_FREE = 0,        /* not allocated */
        SS_UNCONNECTED,     /* unconnected to any socket */
        SS_CONNECTING,      /* in process of connecting */
        SS_CONNECTED,       /* connected to socket */
        SS_DISCONNECTING    /* in process of disconnecting */
    } socket_state;

These states are not layer 4 states (like TCP_ESTABLISHED or TCP_CLOSE).

include/linux/net.h


Socket Types

enum sock_type {

SOCK_STREAM = 1,

SOCK_DGRAM = 2,

SOCK_RAW = 3,

SOCK_RDM = 4,

SOCK_SEQPACKET = 5,

SOCK_DCCP = 6,

SOCK_PACKET = 10,

};

include/linux/net.h


Comment in include/net/sock.h

/*

* This structure really needs to be cleaned up.

* Most of it is for TCP, and not used by any of

* the other protocols.

*/

struct sock_common

    /* minimal network layer representation of sockets */
    struct sock_common {
        /*
         * first fields are not copied in sock_copy()
         */
        union {
            struct hlist_node       skc_node;           // main hash linkage for lookup
            struct hlist_nulls_node skc_nulls_node;     // main hash for TCP/UDP
        };
        atomic_t                skc_refcnt;
        int                     skc_tx_queue_mapping;   // tx queue for this connection
        union {
            unsigned int        skc_hash;               // hash value for lookup
            __u16               skc_u16hashes[2];
        };
        unsigned short          skc_family;             // network address family
        volatile unsigned char  skc_state;              // Connection state
        unsigned char           skc_reuse;              // SO_REUSEADDR setting
        int                     skc_bound_dev_if;       // bound if !=0
        union {
            struct hlist_node       skc_bind_node;      // bind hash linkage
            struct hlist_nulls_node skc_portaddr_node;  // bind hash for UDP/Lite
        };
        struct proto           *skc_prot;               // protocol handlers in a net family
    };

include/net/sock.h

Outline

Sockets API Refresher
Linux Sockets Architecture
Interface between BSD sockets and AF_INET
Interface between AF_INET and TCP/UDP
Receive Path
Send Path

BSD Socket AF Interface

Main data structures
  struct net_proto_family
  struct proto_ops
Key function
  sock_register(struct net_proto_family *ops)
Each address family:
  Implements the struct net_proto_family.
  Calls the function sock_register() when the protocol family is initialized.
  Implements the struct proto_ops for binding the BSD socket layer and protocol family layer.

net_proto_family

Describes each of the supported protocol families

    struct net_proto_family {
        int family;
        int (*create)(struct net *net, struct socket *sock, int protocol, int kern);
        struct module *owner;
    };

Specifies the handler for socket creation
  The create() function is called whenever a new socket of this family is created
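The registration scheme can be modeled in user space as a table indexed by family number whose entries supply a create() callback. This is a toy sketch, not the kernel code: every toy_ name, the table size, and the simplified signatures are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_NPROTO 8                        /* hypothetical table size */

struct toy_socket { int family; int type; };

struct toy_proto_family {
    int family;
    int (*create)(struct toy_socket *sock, int protocol);
};

/* One slot per family number, filled in by toy_sock_register(). */
static const struct toy_proto_family *toy_families[TOY_NPROTO];

static int toy_sock_register(const struct toy_proto_family *pf)
{
    assert(pf != NULL && pf->create != NULL);
    if (pf->family < 0 || pf->family >= TOY_NPROTO)
        return -ENOBUFS;
    if (toy_families[pf->family] != NULL)
        return -EEXIST;                     /* family already registered */
    toy_families[pf->family] = pf;
    return 0;
}

/* Generic layer: look up the family, then let it build the socket. */
static int toy_sock_create(int family, int type, int protocol,
                           struct toy_socket *sock)
{
    if (family < 0 || family >= TOY_NPROTO || toy_families[family] == NULL)
        return -EAFNOSUPPORT;
    sock->type = type;
    return toy_families[family]->create(sock, protocol);
}

/* A toy family occupying slot 2 (the value of AF_INET). */
static int toy_inet_create(struct toy_socket *sock, int protocol)
{
    (void)protocol;
    sock->family = 2;
    return 0;
}

static const struct toy_proto_family toy_inet_family = {
    .family = 2,
    .create = toy_inet_create,
};
```

The generic layer never names a concrete family; adding one means providing a structure and registering it, which is what lets families live in loadable modules.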

BSD Socket Layer

AF Socket Layer

INET and PACKET proto_family

    static const struct net_proto_family inet_family_ops = {
        .family = PF_INET,
        .create = inet_create,
        .owner  = THIS_MODULE,      /* af_inet.c */
    };

    static const struct net_proto_family packet_family_ops = {
        .family = PF_PACKET,
        .create = packet_create,
        .owner  = THIS_MODULE,      /* af_packet.c */
    };

AF Socket Layer

BSD Socket Layer

proto_ops

Defines the binding between the BSD socket layer and address family (AF_*) layer.
The proto_ops tables contain the functions exported by the AF socket layer to the BSD socket layer.
Each table consists of the address family type and a set of pointers to socket operation routines specific to that address family.
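A user-space model of such an operations table follows. It is a sketch with invented toy_ names and only two operations; the stub returning -EOPNOTSUPP plays the role that the kernel's sock_no_* helpers play for operations a socket type does not support.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_socket;

/* Toy operations table: family number plus per-operation function pointers. */
struct toy_proto_ops {
    int family;
    int (*bind)(struct toy_socket *sock, int port);
    int (*listen)(struct toy_socket *sock, int backlog);
};

struct toy_socket {
    int port;
    int backlog;
    const struct toy_proto_ops *ops;    /* chosen when the socket is created */
};

static int toy_bind(struct toy_socket *sock, int port)
{
    sock->port = port;
    return 0;
}

static int toy_listen(struct toy_socket *sock, int backlog)
{
    sock->backlog = backlog;
    return 0;
}

/* Stub for socket types that cannot listen. */
static int toy_no_listen(struct toy_socket *sock, int backlog)
{
    (void)sock;
    (void)backlog;
    return -EOPNOTSUPP;
}

static const struct toy_proto_ops toy_stream_ops = {
    .family = 2, .bind = toy_bind, .listen = toy_listen,
};
static const struct toy_proto_ops toy_dgram_ops = {
    .family = 2, .bind = toy_bind, .listen = toy_no_listen,
};

/* Generic layer: dispatch through whatever table the socket carries. */
static int toy_sys_bind(struct toy_socket *sock, int port)
{
    assert(sock != NULL && sock->ops != NULL);
    return sock->ops->bind(sock, port);
}

static int toy_sys_listen(struct toy_socket *sock, int backlog)
{
    assert(sock != NULL && sock->ops != NULL);
    return sock->ops->listen(sock, backlog);
}
```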

BSD Socket Layer

AF Socket Layer

struct proto_ops

    struct proto_ops {
        int             family;
        struct module  *owner;
        int             (*release);
        int             (*bind);
        int             (*connect);
        int             (*socketpair);
        int             (*accept);
        int             (*getname);
        unsigned int    (*poll);
        int             (*ioctl);
        int             (*compat_ioctl);
        int             (*listen);
        int             (*shutdown);
        int             (*setsockopt);
        int             (*getsockopt);
        int             (*compat_setsockopt);
        int             (*compat_getsockopt);
        int             (*sendmsg);
        int             (*recvmsg);
        int             (*mmap);
        ssize_t         (*sendpage);
        ssize_t         (*splice_read);
    };

(Argument lists are omitted on the slide.)

include/linux/net.h

BSD Socket Layer

AF Socket Layer

PF_PACKET proto_ops

    static const struct proto_ops packet_ops = {
        .family     = PF_PACKET,
        .owner      = THIS_MODULE,
        .release    = packet_release,
        .bind       = packet_bind,
        .connect    = sock_no_connect,
        .socketpair = sock_no_socketpair,
        .accept     = sock_no_accept,
        .getname    = packet_getname,
        .poll       = packet_poll,
        .ioctl      = packet_ioctl,
        .listen     = sock_no_listen,
        .shutdown   = sock_no_shutdown,
        .setsockopt = packet_setsockopt,
        .getsockopt = packet_getsockopt,
        .sendmsg    = packet_sendmsg,
        .recvmsg    = packet_recvmsg,
        .mmap       = packet_mmap,
        .sendpage   = sock_no_sendpage,
    };

net/packet/af_packet.c

AF Socket Layer

BSD Socket Layer

PF_INET proto_ops

Field         inet_stream_ops (TCP)    inet_dgram_ops (UDP)     inet_sockraw_ops (RAW)
.family       PF_INET                  PF_INET                  PF_INET
.owner        THIS_MODULE              THIS_MODULE              THIS_MODULE
.release      inet_release             inet_release             inet_release
.bind         inet_bind                inet_bind                inet_bind
.connect      inet_stream_connect      inet_dgram_connect       inet_dgram_connect
.socketpair   sock_no_socketpair       sock_no_socketpair       sock_no_socketpair
.accept       inet_accept              sock_no_accept           sock_no_accept
.getname      inet_getname             inet_getname             inet_getname
.poll         tcp_poll                 udp_poll                 datagram_poll
.ioctl        inet_ioctl               inet_ioctl               inet_ioctl
.listen       inet_listen              sock_no_listen           sock_no_listen
.shutdown     inet_shutdown            inet_shutdown            inet_shutdown
.setsockopt   sock_common_setsockopt   sock_common_setsockopt   sock_common_setsockopt
.getsockopt   sock_common_getsockopt   sock_common_getsockopt   sock_common_getsockopt
.sendmsg      tcp_sendmsg              inet_sendmsg             inet_sendmsg
.recvmsg      sock_common_recvmsg      sock_common_recvmsg      sock_common_recvmsg
.mmap         sock_no_mmap             sock_no_mmap             sock_no_mmap
.sendpage     tcp_sendpage             inet_sendpage            inet_sendpage
.splice_read  tcp_splice_read          --                       --

net/ipv4/af_inet.c

BSD Socket Layer

AF Socket Layer

Outline

Sockets API Refresher
Linux Sockets Architecture
Interface between BSD sockets and AF_INET
Interface between AF_INET and TCP/UDP
  Binding between IP and TCP/UDP (upcall)
  Binding between AF_INET and TCP (downcall)
Receive Path
Send Path

AF_INET Transport API

inet_protos[] (array of struct net_protocol)
  Interface between IP and the transport layer
  Is the upcall binding from IP to transport
  Method for demultiplexing IP packets to the proper transport
struct proto
  Defines the interface for individual protocols (TCP, UDP, etc.)
  Is the downcall binding from AF_INET to transport
  Transport-specific functions for the socket API
struct inet_protosw
  Describes the PF_INET protocols
  Defines the different SOCK types for PF_INET
  SOCK_STREAM (TCP), SOCK_DGRAM (UDP), SOCK_RAW

AF_INET Layer

Transport Layer

Recall IP's inet_protos

[Figure: inet_protos[MAX_INET_PROTOS], an array with slots 0, 1, ..., MAX_INET_PROTOS. Each occupied slot points to a struct net_protocol with the fields handler, err_handler, gso_send_check, gso_segment, gro_receive, and gro_complete. Example entries shown: udp_rcv() and udp_err(); igmp_rcv() with a Null err_handler.]

Receive binding from the IP layer to the transport layer.
inet_init() calls inet_add_protocol(p) to add each protocol to the table.

AF Socket Layer
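The upcall binding amounts to a table indexed by the IP protocol number, where each registered entry provides a receive handler. The following is a toy user-space model; the toy_ names, the return conventions, and the single-field structure are assumptions for illustration, not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_PROTOS 256                  /* one slot per 8-bit protocol number */

/* Toy stand-in for struct net_protocol: only the receive handler is kept. */
struct toy_net_protocol {
    int (*handler)(const unsigned char *payload, size_t len);
};

static const struct toy_net_protocol *toy_protos[TOY_MAX_PROTOS];

/* Registers a transport for a protocol number; fails if the slot is taken. */
static int toy_add_protocol(const struct toy_net_protocol *p, unsigned char num)
{
    assert(p != NULL && p->handler != NULL);
    if (toy_protos[num] != NULL)
        return -1;
    toy_protos[num] = p;
    return 0;
}

/* What the IP layer would do on receive: pick the handler by protocol number. */
static int toy_ip_deliver(unsigned char num, const unsigned char *payload,
                          size_t len)
{
    const struct toy_net_protocol *p = toy_protos[num];

    if (p == NULL)
        return -1;                          /* no transport registered */
    return p->handler(payload, len);
}

/* Toy UDP receive routine: reports how many payload bytes it was handed. */
static int toy_udp_rcv(const unsigned char *payload, size_t len)
{
    (void)payload;
    return (int)len;
}

static const struct toy_net_protocol toy_udp_protocol = {
    .handler = toy_udp_rcv,
};
```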

BSD Socket Layer

struct proto

    /* Networking protocol blocks we attach to sockets.
     * socket layer -> transport layer interface
     */
    struct proto {
        void            (*close);
        int             (*connect);
        int             (*disconnect);
        struct sock *   (*accept);
        int             (*ioctl);
        int             (*init);
        void            (*destroy);
        void            (*shutdown);
        int             (*setsockopt);
        int             (*getsockopt);
        int             (*sendmsg);
        int             (*recvmsg);
        int             (*sendpage);
        int             (*bind);
        int             (*backlog_rcv);
        void            (*hash);
        void            (*unhash);
        int             (*get_port);
    };

(Argument lists are omitted on the slide.)

include/net/sock.h

BSD Socket Layer

AF Socket Layer

udp_prot

    struct proto udp_prot = {
        .name              = "UDP",
        .owner             = THIS_MODULE,
        .close             = udp_lib_close,
        .connect           = ip4_datagram_connect,
        .disconnect        = udp_disconnect,
        .ioctl             = udp_ioctl,
        .destroy           = udp_destroy_sock,
        .setsockopt        = udp_setsockopt,
        .getsockopt        = udp_getsockopt,
        .sendmsg           = udp_sendmsg,
        .recvmsg           = udp_recvmsg,
        .sendpage          = udp_sendpage,
        .backlog_rcv       = __udp_queue_rcv_skb,
        .hash              = udp_lib_hash,
        .unhash            = udp_lib_unhash,
        .get_port          = udp_v4_get_port,
        .memory_allocated  = &udp_memory_allocated,
        .sysctl_mem        = sysctl_udp_mem,
        .sysctl_wmem       = &sysctl_udp_wmem_min,
        .sysctl_rmem       = &sysctl_udp_rmem_min,
        .obj_size          = sizeof(struct udp_sock),
        .slab_flags        = SLAB_DESTROY_BY_RCU,
        .h.udp_table       = &udp_table,
    #ifdef CONFIG_COMPAT
        .compat_setsockopt = compat_udp_setsockopt,
        .compat_getsockopt = compat_udp_getsockopt,
    #endif
    };

net/ipv4/udp.c

BSD Socket Layer

AF Socket Layer

inet_protosw

    static struct inet_protosw inetsw_array[] =
    {
        {
            .type     = SOCK_STREAM,
            .protocol = IPPROTO_TCP,
            .prot     = &tcp_prot,
            .ops      = &inet_stream_ops,
            .no_check = 0,
            .flags    = INET_PROTOSW_PERMANENT | INET_PROTOSW_ICSK,
        },
        {
            .type     = SOCK_DGRAM,
            .protocol = IPPROTO_UDP,
            .prot     = &udp_prot,
            .ops      = &inet_dgram_ops,
            .no_check = UDP_CSUM_DEFAULT,
            .flags    = INET_PROTOSW_PERMANENT,
        },
        {
            .type     = SOCK_RAW,
            .protocol = IPPROTO_IP,     /* wild card */
            .prot     = &raw_prot,
            .ops      = &inet_sockraw_ops,
            .no_check = UDP_CSUM_DEFAULT,
            .flags    = INET_PROTOSW_REUSE,
        }
    };

net/ipv4/af_inet.c

On startup (inet_init()), the TCP, UDP, and raw socket entries of inetsw_array[] are registered.
Other protocols call inet_register_protosw().
inet_unregister_protosw() will not remove protocols with PERMANENT set.
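How a (type, protocol) pair from socket() selects one of these entries can be sketched as a table scan with two special cases: the caller passing 0 accepts the default for the type, and an entry whose protocol is IPPROTO_IP acts as a wild card. The code below is a simplified user-space model with invented toy_ names, not the kernel's lookup code.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Toy stand-in for struct inet_protosw: only the two lookup keys are kept. */
struct toy_protosw {
    int type;
    int protocol;
};

static const struct toy_protosw toy_inetsw[] = {
    { SOCK_STREAM, IPPROTO_TCP },
    { SOCK_DGRAM,  IPPROTO_UDP },
    { SOCK_RAW,    IPPROTO_IP  },           /* wild card */
};

/* Returns the matching entry for socket(AF_INET, type, protocol), or NULL. */
static const struct toy_protosw *toy_lookup(int type, int protocol)
{
    size_t i;

    assert(protocol >= 0);
    for (i = 0; i < sizeof(toy_inetsw) / sizeof(toy_inetsw[0]); i++) {
        const struct toy_protosw *p = &toy_inetsw[i];

        if (p->type != type)
            continue;
        if (protocol == p->protocol)        /* exact match */
            return p;
        if (protocol == IPPROTO_IP)         /* caller passed 0: take the default */
            return p;
        if (p->protocol == IPPROTO_IP)      /* entry accepts any protocol */
            return p;
    }
    return NULL;
}
```

With this table, (SOCK_STREAM, 0) resolves to the TCP entry, (SOCK_STREAM, IPPROTO_UDP) resolves to nothing, and any protocol number with SOCK_RAW resolves to the raw entry.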

BSD Socket Layer

AF Socket Layer

Relationships

[Figure: how the socket structures point to one another.]
struct socket: state, type, flags, fasync_list, wait, file, sk, proto_ops
  sk points to the struct sock
  proto_ops points to a struct proto_ops
struct proto_ops (af_inet.c, PF_INET): inet_release, inet_bind, inet_accept, ...
struct sock: sk_common, sk_lock, sk_backlog, ..., (*sk_prot_creator), sk_socket, sk_send_head, ...
  sk_common is the embedded struct sock_common
  sk_socket points back to the struct socket
struct sock_common: skc_node, skc_refcnt, skc_hash, ..., skc_prot, skc_net
  skc_prot points to a struct proto
struct proto: udp_lib_close, ip4_datagram_connect, udp_sendmsg, udp_recvmsg, ...

Example: inet_accept()

    int inet_accept(struct socket *sock, struct socket *newsock, int flags)
    {
        struct sock *sk1 = sock->sk;
        int err = -EINVAL;
        struct sock *sk2 = sk1->sk_prot->accept(sk1, flags, &err);

        if (!sk2)
            goto do_err;

        lock_sock(sk2);

        WARN_ON(!((1 << sk2->sk_state) &
                  (TCPF_ESTABLISHED | TCPF_CLOSE_WAIT | TCPF_CLOSE)));

        sock_graft(sk2, newsock);

        newsock->state = SS_CONNECTED;
        err = 0;
        release_sock(sk2);
    do_err:
        return err;
    }
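The shape of inet_accept() is a two-level delegation: the address-family function asks the transport, through the socket's protocol table, for a new lower-level socket and then attaches it to the new BSD-level socket. Below is a toy user-space model of that shape; the toy_ types, the single pending-connection slot, and the -EAGAIN result are simplifications invented for this sketch, and locking is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_sock;

/* Toy transport table: only accept is modeled. */
struct toy_proto {
    struct toy_sock *(*accept)(struct toy_sock *sk, int *err);
};

struct toy_sock {
    const struct toy_proto *prot;
    struct toy_sock *pending;       /* at most one queued connection */
};

struct toy_socket {
    int connected;                  /* stands in for SS_CONNECTED */
    struct toy_sock *sk;
};

/* Transport layer: hand over the queued connection, if there is one. */
static struct toy_sock *toy_tcp_accept(struct toy_sock *sk, int *err)
{
    struct toy_sock *child = sk->pending;

    if (child == NULL) {
        *err = -EAGAIN;
        return NULL;
    }
    sk->pending = NULL;
    *err = 0;
    return child;
}

static const struct toy_proto toy_tcp_prot = { .accept = toy_tcp_accept };

/* Address-family layer: delegate to the transport, then attach the result. */
static int toy_inet_accept(struct toy_socket *sock, struct toy_socket *newsock)
{
    int err = -EINVAL;
    struct toy_sock *sk2;

    assert(sock->sk != NULL && sock->sk->prot != NULL);
    sk2 = sock->sk->prot->accept(sock->sk, &err);
    if (sk2 == NULL)
        return err;
    newsock->sk = sk2;              /* plays the role of sock_graft() */
    newsock->connected = 1;
    return 0;
}
```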


Backup