CSE 124 Networked Services
Fall 2009
Lecture 2: Networking architectures and Network Software APIs
B. S. Manoj, Ph.D.
http://cseweb.ucsd.edu/classes/fa09/cse124
9/29/2009 UCSD CSE 124 Networked Services
Some of these slides are adapted from various sources and individuals, including but not limited to Prof. Amin Vahdat, Prof. James Kurose, Prof. Keith Ross, and UNIX/Linux software documentation projects and associated sources. Use of these slides for anything other than pedagogical purposes in CSE 124 may require explicit permission from the respective sources.
Networking architecture
• There are two popular network architectural models
– The TCP/IP architecture
– The OSI (Open Systems Interconnection) reference model from the International Organization for Standardization (ISO)
A comparison of the two models/architectures
ISO-OSI | TCP/IP
More a reference model than a successful architecture | A successful architecture
The model was defined before any prototypes existed | A model was retrofitted to the popular TCP/IP protocol suite
No working systems existed while modeling (lessons from the TCP/IP model have contributed to the design) | No real model was intended in the original version, but it was later split into TCP and IP, making it closer to a model
Some example systems exist (e.g., X.25) | The Internet is a successful and growing working system
Some layers are not essential (session layer) and some important functions are missing (security) | Some important functions are not defined (e.g., security)
Design is influenced by administrative bodies | Design is influenced by the popular Internet technology development culture ("We reject kings, presidents, and voting. We believe in rough consensus and running code.")
Layer-wise comparison between OSI model and TCP/IP suite
Today’s popular 5-layer protocol stack
Application: FTP, HTTP, SSH, TFTP, RTP
Transport: TCP, UDP
Network: IP
Datalink: 802.11, 802.3, ATM
PHY: DSSS/OFDM, Ethernet, SONET
[Figure: the stack drawn as an hourglass, with IP at the narrow waist]
Hourglass model
• The hourglass model highlights the critical role of IP as the key integrator of
– a variety of diverse applications and
– heterogeneous networks
[Figure: the five layers (Application, Transport, Network, Datalink, PHY) mapped onto their implementation: application software at the top, network software (kernel modules) for TCP/UDP and the network layer, and hardware at the bottom]
• Network software is usually implemented as a set of functions in the OS kernel
• A part of the MAC and PHY resides in the hardware
• In most cases, a part of the network and transport layers is implemented in the kernel
• Real implementations are not strictly layer-wise
• The sequence of function calls makes up a layered operation
• An application can also communicate directly with the network layer
Network Software APIs
• Network software is usually part of the OS
• A common set of APIs, called network APIs, is provided
• Applications use network APIs for accessing network services
• These network software APIs are called socket APIs
The main socket APIs are
• int socket(int domain, int type, int protocol)
• int bind(int socketfd, struct sockaddr* addr, int addr_len)
• int listen(int socketfd, int backlog)
• int accept(int socket, struct sockaddr* addr, socklen_t* addr_len)
• int connect(int socket, struct sockaddr *addr, int addr_len)
• int send(int socket, char* message, int msg_len, int flags)
• int recv(int socket, char *buffer, int buf_len, int flags)
• int select(int n, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout)
• int close(int socket)
socket APIs in detail
• int socket(int domain, int type, int protocol)
• domain
– The domain parameter specifies a communication domain
– helps to have a single socket() API for a number of protocol families
– this selects the protocol family which will be used for communication. Sometimes called Address Family (AF_xxxx in Unix systems)
– PF_INET for Internet IPv4; PF_INET6 for Internet IPv6
– PF_UNIX/PF_LOCAL for local communication between processes on the same host (Unix domain sockets)
– PF_PACKET for direct network access
    • Packet sockets are used to receive or send raw packets at the device driver (OSI Layer 2) level.
    • They allow the user to implement protocol modules in user space on top of the physical layer.
socket APIs in detail
• int socket(int domain, int type, int protocol)
• type (usually defined in <sys/socket.h>)
– defines the type of communication, i.e., the end-to-end communication semantics
– SOCK_STREAM Provides sequenced, reliable, two-way, connection-based byte streams (usually used for TCP-like reliable transport protocols)
– SOCK_DGRAM Supports datagrams: connectionless, unreliable messages of a fixed maximum length (usually used for UDP-like connectionless transport protocols)
– SOCK_RAW Provides raw network protocol access (for example, raw IP with PF_INET, or link-level access with PF_PACKET)
– SOCK_RDM Provides a reliable datagram layer that does not guarantee ordering.
– Some socket types may not be implemented by all protocol families; for example, SOCK_SEQPACKET is not implemented for PF_INET.
socket APIs in detail
• int socket(int domain, int type, int protocol)
• protocol
– defines the protocol to be used for communication
– Normally only a single protocol exists to support a particular socket type within a given protocol family, in which case protocol can be specified as 0 (unspecified)
– usually unused, as the protocol to be used for the socket is determined by the domain and type parameters
– e.g., PF_INET and SOCK_STREAM imply the use of TCP
– PF_INET and SOCK_DGRAM imply the use of UDP, etc.
– However, many protocols may exist in a certain protocol family, in which case a particular protocol must be specified using the protocol field.
• Return value
– On success, a file descriptor for the new socket is returned.
– On error, -1 is returned, and errno is set appropriately.
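The socket() parameters described above can be tried out with a small helper. This is a minimal sketch, not part of the original slides: the function name open_inet_socket is made up for illustration, and it assumes a POSIX system where PF_INET sockets are available.

```c
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper: open an IPv4 socket of the given type
 * (SOCK_STREAM or SOCK_DGRAM).
 * Passing protocol 0 lets the system pick the default protocol for the
 * (domain, type) pair: TCP for SOCK_STREAM, UDP for SOCK_DGRAM.
 * Returns the new descriptor, or -1 with errno set. */
int open_inet_socket(int type)
{
    int fd = socket(PF_INET, type, 0);
    if (fd == -1)
        perror("socket");
    return fd;
}
```

Calling open_inet_socket(SOCK_STREAM) would give a descriptor suitable for TCP, and open_inet_socket(SOCK_DGRAM) one suitable for UDP; the caller is expected to close() it when done.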
socket API in detail
• int bind(int socket_fd, struct sockaddr* addr, int addr_len)
– The bind() call binds (attaches) the newly created socket to the local address addr
– It is necessary to assign a local address using bind() before a SOCK_STREAM socket may receive connections
– A server that listens for incoming connections must call bind() before it can accept connection requests
• int socket_fd
– bind() applies to the socket identified by socket_fd
• struct sockaddr* addr
– this structure contains the address
– The actual structure passed for the addr argument will depend on the address family
– The sockaddr structure is defined as something like:
    struct sockaddr {
        sa_family_t sa_family;
        char        sa_data[14];
    };
• int addr_len specifies the length of the addr structure; the length depends on the protocol family
bind in a code example
……
int sfd;
struct sockaddr_un addr;

sfd = socket(AF_UNIX, SOCK_STREAM, 0);   /* socket is opened */
if (sfd == -1) {
    perror("socket");
    exit(EXIT_FAILURE);
}

memset(&addr, 0, sizeof(struct sockaddr_un));   /* Clear structure */
addr.sun_family = AF_UNIX;
/* MY_SOCK_PATH is assumed to be defined elsewhere as the socket's filesystem path */
strncpy(addr.sun_path, MY_SOCK_PATH, sizeof(addr.sun_path) - 1);

/* address binding */
if (bind(sfd, (struct sockaddr *) &addr, sizeof(struct sockaddr_un)) == -1) {
    perror("bind");
    exit(EXIT_FAILURE);
}
……
socket API in detail
• int listen(int socketfd, int backlog)
• listen() signals the willingness to accept incoming connections and sets a queue limit for incoming connections on a newly created socket
• required for server-side sockets
• int socketfd
– the socket on which listen() is to be carried out
• int backlog
– The backlog parameter defines the maximum length to which the queue of pending connections may grow.
– If a connection request arrives when the queue is full, the client may receive an error with an indication of ECONNREFUSED
• returns 0 on success, otherwise -1 with errno set to an appropriate error code
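A server typically calls socket(), bind(), and listen() in that order. The sketch below shows one way to combine them for an IPv4 TCP socket on the loopback address. The helper name make_listener and the choice of 127.0.0.1 are assumptions made for illustration, not something the slides prescribe.

```c
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket bound to 127.0.0.1:port and
 * put it in the listening state. A port of 0 asks the kernel to choose
 * a free port. Returns the listening descriptor, or -1 with errno set. */
int make_listener(unsigned short port, int backlog)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);                    /* network byte order */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* 127.0.0.1 */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1 ||
        listen(fd, backlog) == -1) {
        int saved = errno;   /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

A call such as make_listener(8080, 5) would then return a descriptor that is ready to be passed to accept().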
socket API in detail
• int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);
– this system call is used with connection-based socket types (e.g. SOCK_STREAM)
– It extracts the first connection request on the queue of pending connections
– Creates a new connected socket
– Returns a new file (socket) descriptor referring to that socket
– The original socket sockfd is unaffected by this call
– The newly created socket is not in the listening state
• The argument int sockfd
– is a socket that has been created with socket(.), bound to a local address with bind(.), and is listening for connections after a listen(.) socket API call.
socket API in detail
• int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);
• The argument struct sockaddr *addr
– a pointer to a sockaddr structure.
– This structure is filled in with the address of the peer (remote host’s) socket that is accepted into the communication session
• The argument socklen_t *addrlen
– The addrlen argument is a value-result argument
– it should initially contain the size of the structure pointed to by addr
– on return it will contain the actual length (in bytes) of the address returned
– When addr is NULL nothing is filled in
socket API in detail
• int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);
– a socket can be either blocking or non-blocking
– a blocking socket API call does not return until the call is completed with a result
– e.g., accept(.) can block the caller until a connection is present (which sometimes can result in a long wait)
– the caller can avoid blocking in accept(.) by first waiting with select(..) until the listening socket is readable, or by marking the socket non-blocking with fcntl(.)
– If the socket is marked non-blocking and no pending connections are present on the queue, accept(.) fails (returns) with the error EAGAIN
– the caller then need not wait indefinitely for the accept(.) call to return
• Return values
– On success, accept(.) returns a non-negative integer that is a descriptor for the accepted socket
– On error, -1 is returned, and errno is set appropriately
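The value-result behaviour of addrlen is easiest to see in code. The following sketch accepts one connection and reports the peer's address. The name accept_peer is hypothetical, and the code assumes the listening socket is an IPv4 (AF_INET) socket.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical helper: accept one connection on listen_fd and report the
 * peer's IPv4 address as text in ip (ip_len bytes) and its port in *port.
 * Returns the connected descriptor, or -1 with errno set by accept(). */
int accept_peer(int listen_fd, char *ip, size_t ip_len, unsigned short *port)
{
    struct sockaddr_in peer;
    socklen_t len = sizeof(peer);   /* in: size of the buffer we provide */
    int cfd = accept(listen_fd, (struct sockaddr *)&peer, &len);
    /* out: len now holds the number of bytes actually filled in */
    if (cfd == -1)
        return -1;

    if (ip != NULL && ip_len > 0 &&
        inet_ntop(AF_INET, &peer.sin_addr, ip, (socklen_t)ip_len) == NULL)
        ip[0] = '\0';               /* address could not be formatted */
    if (port != NULL)
        *port = ntohs(peer.sin_port);
    return cfd;
}
```

The listening socket stays open and can be passed to accept_peer() again for the next client; the returned descriptor is used for send() and recv() with this one peer.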
socket API in detail
• int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout)
– select() allows a program to monitor multiple file descriptors, waiting until one or more of them become "ready" for some class of I/O operation (e.g., a read that will not block)
– nfds is the highest-numbered file descriptor in any of the three sets, plus 1
– readfds
    • The file descriptors listed in readfds will be watched to see if characters become available for reading (more precisely, to see if a read will not block)
    • in particular, a file descriptor is also ready on end-of-file
– writefds
    • File descriptors will be watched to see if a write will not block
– exceptfds
    • File descriptors in this set will be watched for exceptions.
– On exit, the sets are modified in place to indicate which file descriptors actually changed status.
– Each of the three file descriptor sets may be specified as NULL if no file descriptors are to be watched for the corresponding class of events.
– Four macros are provided to manipulate the file descriptor sets.
    • FD_ZERO() clears a set.
    • FD_SET() and FD_CLR() respectively add and remove a given file descriptor from a set.
    • FD_ISSET() tests to see if a file descriptor is part of the set (this is useful after select() returns).
select() example
fd_set read_sockfds;
struct timeval tv;
int retval;

/* Watch file descriptor 0 (standard input) to see when it has input. */
FD_ZERO(&read_sockfds);
FD_SET(0, &read_sockfds);

tv.tv_sec = 0;
tv.tv_usec = 1000;   /* Wait up to 1 millisecond. */

/* First argument: highest-numbered descriptor in the sets (0) plus 1. */
retval = select(1, &read_sockfds, NULL, NULL, &tv);
/* Don't rely on the value of tv now! */

if (retval == -1)
    perror("select()");
else if (retval)
    printf("Data is available now.\n");   /* FD_ISSET(0, &read_sockfds) will be true. */
else
    printf("No data within one millisecond.\n");

The timeout argument has this layout:
struct timeval {
    time_t      tv_sec;    /* seconds */
    suseconds_t tv_usec;   /* microseconds */
};
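The same pattern can be wrapped in a reusable function that waits for a single descriptor and then reads from it. This is only a sketch: the name read_with_timeout and the READ_TIMED_OUT return code are invented here, and the descriptor is assumed to be smaller than FD_SETSIZE.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

#define READ_TIMED_OUT (-2)   /* hypothetical return code for a timeout */

/* Hypothetical helper: wait up to timeout_ms for fd to become readable,
 * then read up to len bytes into buf.
 * Returns the byte count from read() (0 means end-of-file), -1 on error,
 * or READ_TIMED_OUT if nothing became readable in time.
 * fd must be smaller than FD_SETSIZE. */
ssize_t read_with_timeout(int fd, void *buf, size_t len, long timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    int ready;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    ready = select(fd + 1, &rfds, NULL, NULL, &tv);  /* nfds = highest fd + 1 */
    if (ready == -1)
        return -1;
    if (ready == 0)
        return READ_TIMED_OUT;
    return read(fd, buf, len);   /* select() reported fd readable */
}
```

Because a descriptor also counts as readable at end-of-file, a return value of 0 means the other side has closed, not that the wait timed out.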
socket API details
• There are other ways to avoid blocking on a socket
– pselect(.)
    • Similar to select(.), except that its timeout is a struct timespec (seconds and nanoseconds), it does not modify the timeout argument, and it takes an additional sigmask parameter.
– fcntl(.)
    • int fcntl(int fd, int cmd, long arg);
    • Setting the O_NONBLOCK flag in arg (with the F_SETFL command) makes a socket non-blocking
– recv(.) with appropriate flags set
    • The MSG_DONTWAIT flag makes a single call non-blocking
    • May not work on all implementations of the network socket API
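The fcntl(.) and recv(.) approaches can be sketched as follows. The helper names set_nonblocking, recv_nowait, and would_block are made up for illustration, and MSG_DONTWAIT is assumed to be available (it is on Linux and the BSDs).

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: mark fd non-blocking for all later calls.
 * Existing file status flags are preserved. Returns 0 or -1. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Hypothetical helper: one non-blocking receive on an otherwise blocking
 * socket, using the per-call MSG_DONTWAIT flag. */
ssize_t recv_nowait(int fd, void *buf, size_t len)
{
    return recv(fd, buf, len, MSG_DONTWAIT);
}

/* True if err (a saved errno value) only means "nothing available yet". */
int would_block(int err)
{
    return err == EAGAIN || err == EWOULDBLOCK;
}
```

After a call fails with -1, checking would_block(errno) distinguishes "try again later" from a real error.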
socket API details
• send(), sendto(), and sendmsg()
– The system calls send(), sendto(), and sendmsg() are used to transmit a message to another socket.
– The send() call may be used only when the socket is in a connected state (so that the intended recipient is known)
– ssize_t send(int socket_fd, const void *buf, size_t len, int flags);
    • socket_fd is the socket on which send is to be carried out
    • buf points to the data to be sent
    • len is the length of the message
    • flags defines special options to be considered for transmission (e.g., a bitwise OR of MSG_DONTROUTE, MSG_MORE, MSG_OOB)
– If the message does not fit into the socket's send buffer, send() normally blocks; in non-blocking mode it fails with the error EAGAIN instead.
– Return values
    • On success, these calls return the number of bytes sent
    • On error, -1 is returned, and errno is set appropriately
– ssize_t sendto(int socketfd, const void *buf, size_t len, int flags, const struct sockaddr *to, socklen_t tolen);
– ssize_t sendmsg(int socketfd, const struct msghdr *msg, int flags);
    • Preferred for UDP-like connectionless services
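Since send() returns the number of bytes actually sent, which on a stream socket can be fewer than requested, callers often loop until the whole buffer has gone out. A minimal sketch of that loop, with the hypothetical name send_all and assuming a blocking, connected socket:

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: send exactly len bytes from buf on a connected,
 * blocking socket, retrying after partial sends and interrupted calls.
 * Returns 0 on success, -1 on error with errno set by send(). */
int send_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);
        if (n == -1) {
            if (errno == EINTR)
                continue;        /* interrupted by a signal: retry */
            return -1;
        }
        p += n;                  /* skip the bytes already sent */
        len -= (size_t)n;
    }
    return 0;
}
```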
Socket API details
• recv(), recvfrom() and recvmsg() are used for receiving data from a socket
– ssize_t recv(int socketfd, void *buf, size_t len, int flags);
– The recv() call is normally used only on a connected socket where the remote address is known
– recv() is a blocking call unless explicitly made non-blocking
– A blocking recv() can wait indefinitely until data arrives on the socket
– In certain implementations, flags can request a non-blocking call
– recvfrom() and recvmsg() are mainly for message-based communications such as UDP
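On a stream socket, a single recv() may return fewer bytes than were asked for, so fixed-size messages are usually read in a loop. The sketch below illustrates this; the name recv_exact is hypothetical and a blocking, connected socket is assumed.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: keep calling recv() until len bytes have arrived
 * or the peer has closed its side of the connection.
 * Returns the number of bytes received (less than len only if the peer
 * closed early), or -1 on error with errno set by recv(). */
ssize_t recv_exact(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t got = 0;

    while (got < len) {
        ssize_t n = recv(fd, p + got, len - got, 0);
        if (n == 0)
            break;               /* orderly shutdown by the peer */
        if (n == -1) {
            if (errno == EINTR)
                continue;        /* interrupted by a signal: retry */
            return -1;
        }
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```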
socket API details
• int connect(int sockfd, const struct sockaddr *serv_addr, socklen_t addrlen)
– Requests a connection, using the socket referred to by the file descriptor sockfd, to the address specified by serv_addr
– Usually called by a client host to get connected to a server host
– Can be used for both STREAM and DGRAM sockets
– For DGRAM sockets, serv_addr is the default remote address to which datagrams are sent
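The DGRAM case is the less familiar one, so here is a small sketch of it: connect() on a UDP socket sends no packets and only records the default destination, after which plain send() can be used. The helper name udp_connect is an assumption for illustration, and the address is expected in dotted-quad form.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a UDP socket whose default destination is
 * ip:port (ip in dotted-quad text form such as "127.0.0.1").
 * Returns the descriptor, or -1 on failure. */
int udp_connect(const char *ip, unsigned short port)
{
    struct sockaddr_in peer;
    int fd;

    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &peer.sin_addr) != 1)
        return -1;               /* not a valid IPv4 address string */

    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd == -1)
        return -1;

    /* No handshake happens here; the kernel just remembers the peer. */
    if (connect(fd, (struct sockaddr *)&peer, sizeof(peer)) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

For a STREAM socket the same connect() call would instead start the TCP handshake with the server.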
Socket API details
• Two ways to close a socket
– close() and shutdown()
• int close(int socketfd)
– closes a socket descriptor, so that it no longer refers to any socket and may be reused.
– Not checking the return value of close() is a serious programming error.
• int shutdown(int socketfd, int how);
– The shutdown() call causes all or part of a full-duplex connection on the socket associated with socketfd to be shut down
– The argument how determines which direction of communication is shut down
    • If how is SHUT_RD, further receptions will be disallowed.
    • If how is SHUT_WR, further transmissions will be disallowed.
    • If how is SHUT_RDWR, further receptions and transmissions will be disallowed.
– Return values
    • On success, zero is returned.
    • On error, -1 is returned, and errno is set appropriately.
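The difference between the two calls shows up in a half-close: shutdown() with SHUT_WR tells the peer that no more data is coming while the socket can still receive, and close() releases the descriptor. A minimal sketch, with the hypothetical helper names finish_sending and close_checked:

```c
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: stop sending on fd but keep it open for receiving.
 * The peer will see end-of-file (recv() returning 0) once it has read
 * everything that was sent before this call. */
int finish_sending(int fd)
{
    return shutdown(fd, SHUT_WR);
}

/* Hypothetical helper: close fd and report a failure instead of ignoring
 * it. Returns 0 on success, -1 on error. */
int close_checked(int fd)
{
    if (close(fd) == -1) {
        perror("close");
        return -1;
    }
    return 0;
}
```

A client could send its request, call finish_sending(), read the reply until recv() returns 0, and then call close_checked().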
What happens when you click on a web link?
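Encapsulation
[Figure: a message travels from the source host through a switch and a router to the destination host; each layer at the source adds its own header, and each layer at the destination removes it]
Layer | Data unit | Contents
application | message | M
transport | segment | Ht M
network | datagram | Hn Ht M
link | frame | Hl Hn Ht M
(Ht, Hn, Hl: transport, network, and link headers)
• Source and destination hosts implement all five layers (application, transport, network, link, physical)
• A switch implements the link and physical layers
• A router implements the network, link, and physical layers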
Major steps in downloading a web page
• Extract hostname from URL
– http://www.google.com/index.html to www.google.com
• Use DNS to translate www.google.com to an IP address
– Used for Internet routing
• Establish a TCP (socket) connection to the IP address (e.g., 66.102.7.104)
– Protocol agreement for browser and server to speak HTTP
– TCP handles network problems (drops, corruption, etc.)
– TCP is layered on top of IP/Ethernet
• Internet routers determine an efficient path to 66.102.7.104
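The first two steps, extracting the hostname and translating it to an IP address, can be sketched with the standard getaddrinfo() resolver call. The helper names url_host and resolve_ipv4 are invented for this example, and the URL parsing is deliberately simple: it assumes the form scheme://host[:port]/path and does not handle user names or IPv6 literals.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: copy the host part of "scheme://host[:port]/path"
 * into out. Returns 0 on success, -1 if the host is empty or too long. */
int url_host(const char *url, char *out, size_t out_len)
{
    const char *start = strstr(url, "://");
    size_t n;

    start = start ? start + 3 : url;   /* skip the scheme if present */
    n = strcspn(start, ":/");          /* host ends at ':' or '/' */
    if (n == 0 || n >= out_len)
        return -1;
    memcpy(out, start, n);
    out[n] = '\0';
    return 0;
}

/* Hypothetical helper: resolve host to its first IPv4 address, written
 * as dotted-quad text into out. Returns 0 on success, -1 on failure. */
int resolve_ipv4(const char *host, char *out, size_t out_len)
{
    struct addrinfo hints, *res;
    const struct sockaddr_in *sin;
    int ok;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;         /* IPv4 only, to keep the sketch short */
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, NULL, &hints, &res) != 0)
        return -1;

    sin = (const struct sockaddr_in *)res->ai_addr;
    ok = inet_ntop(AF_INET, &sin->sin_addr, out, (socklen_t)out_len) != NULL;
    freeaddrinfo(res);
    return ok ? 0 : -1;
}
```

With these, url_host() applied to http://www.google.com/index.html yields www.google.com, and passing that name to resolve_ipv4() asks the system resolver (and, through it, DNS) for an address that connect() can use.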
Address types and their context
• Domain name (e.g. www.google.com)
– Global, human readable
• IP address (e.g. 66.102.7.104)
– Global, works across all networks
• Ethernet address (e.g. 08-00-2b-18-bc-65)
– Local, works on a particular network
Name to address translation
Address resolution for finding the local address
Protocol stack efficiency
• Efficiency of a protocol stack depends on its implementation
• There are two ways to organize program execution in a protocol stack
– Process-per-protocol
– Process-per-message
• Process-per-protocol
– According to this strategy, every protocol in a layer is implemented as a separate process
– One process per protocol in a layer
– A process is an abstraction mechanism that enables concurrent execution of tasks under the OS
– A certain amount of resources, such as address and data space and CPU cycles, is reserved for every process
– Most applications are executed as a single process
Process-per-protocol model
• When a message moves up/down the protocol stack
– Many context switches happen
– Context switches happen between two layers/protocols
– Many times a memory copy is required
– Severe performance degradation can result
[Figure: the five layers (Application, Transport, Network, Datalink, PHY), each with its own protocol process]
Process-per-message model
• In the process-per-message model, a process is associated with a message
• Each protocol becomes a static piece of code
• At each protocol/layer, the single process responsible for the message calls the layer-specific procedures
[Figure: one message process for transmission passing down through Application, Transport, Network, Datalink, and PHY, and one message process for reception passing up through them]
Protocol stack efficiency
• Modern network protocol stacks prefer process-per-message
– Because it causes fewer context switches
– Because memory is slower than the processor
– Memory access is very expensive
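As a rough illustration of the process-per-message idea, the sketch below carries one message down a toy stack using nothing but procedure calls, so no context switch occurs between layers. Each layer prepends its header into space reserved in front of the payload, so the payload itself is never copied again. Everything here is hypothetical: the struct msg layout, the HEADROOM size, and the two-character header tags "Ht", "Hn", and "Hl" stand in for real protocol headers.

```c
#include <stddef.h>
#include <string.h>

#define HEADROOM 16   /* hypothetical space reserved in front of the payload */

struct msg {
    char   buf[HEADROOM + 64];
    char  *data;      /* start of the current packet inside buf */
    size_t len;       /* number of bytes in the current packet */
};

/* Place the payload after the headroom so headers can be prepended in place. */
int msg_init(struct msg *m, const char *payload, size_t n)
{
    if (n > sizeof(m->buf) - HEADROOM)
        return -1;
    m->data = m->buf + HEADROOM;
    memcpy(m->data, payload, n);
    m->len = n;
    return 0;
}

/* Prepend a header by moving the data pointer back; the payload stays put. */
static int push_header(struct msg *m, const char *hdr, size_t n)
{
    if ((size_t)(m->data - m->buf) < n)
        return -1;               /* not enough headroom left */
    m->data -= n;
    memcpy(m->data, hdr, n);
    m->len += n;
    return 0;
}

/* Each layer is an ordinary procedure called by the one thread of control
 * that is responsible for this message. */
static int link_send(struct msg *m)
{
    return push_header(m, "Hl", 2);
}

static int network_send(struct msg *m)
{
    if (push_header(m, "Hn", 2) == -1)
        return -1;
    return link_send(m);
}

int transport_send(struct msg *m)
{
    if (push_header(m, "Ht", 2) == -1)
        return -1;
    return network_send(m);
}
```

After msg_init() with the payload "M" and a call to transport_send(), the packet starting at data is laid out as link header, network header, transport header, payload. A process-per-protocol design would instead hand the message from one process to the next at each layer boundary.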
Summary
• Network protocol stack
• Protocol stack API
• What happens when you click on a web link?
• Efficiency issues