CS 208 f21 — Network Programming

ARPANET 1969:

arpanet-1969.jpg

ARPANET 1970:

arpanet-1970.png

ARPANET 1977:

arpanet-1977.png

These days: >1 billion internet hosts (https://www.statista.com/statistics/264473/number-of-internet-hosts-in-the-domain-name-system/).

1 Client-Server Transaction

  • Most network applications are based on the client-server model:
    • A server process and one or more client processes
    • Server manages some resource
    • Server provides service by manipulating resource for clients
    • Server is activated by a request from a client (analogy: a vending machine does nothing until a customer places an order)

client-server.png

  • Data is sent over the network in units called packets
  • A packet consists of a payload (the content the client is sending to the server or vice versa) and metadata to tell the network where to send the packet
  • How packets are sent and received is determined by the network protocol

2 Global IP Internet

internet-animation.gif

submarine-cables.png

  • An internet is an interconnected set of networks
    • Carleton campus is an example of a network, and it has connections to other networks (i.e., it is connected to the Global IP Internet)
  • The Global IP Internet is based on the TCP/IP protocol family
    • IP (Internet Protocol)
      • Provides basic naming scheme and unreliable delivery capability of packets (datagrams) from host-to-host
    • UDP (User Datagram Protocol)
      • Uses IP to provide unreliable datagram delivery from process-to-process
    • TCP (Transmission Control Protocol)
      • Uses IP to provide reliable byte streams from process-to-process over connections

network-stack.png

2.1 Programmer's View

  • Hosts are mapped to a set of 32-bit IP addresses (see footnote 1)
    • 128.2.203.179
    • 127.0.0.1 (always the local machine, or localhost)
  • The set of IP addresses is mapped to a set of identifiers called Internet domain names
    • 35.227.227.189 is mapped to www.carleton.edu
  • By convention, each byte in a 32-bit IP address is represented by its decimal value and separated by a period
    • IP address: 0x8002C2F2 = 128.2.194.242
  • A process on one Internet host can communicate with a process on another Internet host over a connection
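The dotted-decimal convention above can be sketched in a few lines of C. The first function does the byte-by-byte conversion by hand; the second gets the same result from the standard inet_ntop library call. The function names format_ipv4 and format_ipv4_lib are made up for this illustration, and the code assumes the address is given as an ordinary (host-order) integer such as 0x8002C2F2.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write a 32-bit IPv4 address in dotted-decimal form.
 * Each of the four bytes, most significant first, is printed as a decimal
 * number. out should hold at least INET_ADDRSTRLEN (16) bytes. */
void format_ipv4(uint32_t addr, char *out, size_t outlen)
{
    snprintf(out, outlen, "%u.%u.%u.%u",
             (unsigned)((addr >> 24) & 0xFF),
             (unsigned)((addr >> 16) & 0xFF),
             (unsigned)((addr >> 8) & 0xFF),
             (unsigned)(addr & 0xFF));
}

/* Same conversion using the library. struct in_addr stores the address in
 * network (big-endian) byte order, so htonl converts it first.
 * Returns 0 on success, -1 on failure. */
int format_ipv4_lib(uint32_t addr, char *out, socklen_t outlen)
{
    struct in_addr in;

    in.s_addr = htonl(addr);
    return inet_ntop(AF_INET, &in, out, outlen) != NULL ? 0 : -1;
}
```

Calling either function with 0x8002C2F2 fills the buffer with 128.2.194.242, matching the example in the list above.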

domain-hierarchy.png

3 Anatomy of a Connection

  • Clients and servers communicate by sending streams of bytes over connections. Each connection is:
    • Point-to-point: connects a pair of processes.
    • Full-duplex: data can flow in both directions at the same time.
    • Reliable: stream of bytes sent by the source is eventually received by the destination in the same order it was sent.
  • A socket is an endpoint of a connection
    • Socket address is an IPaddress:port pair
  • A port is a 16-bit integer that identifies a process:
    • Ephemeral port: Assigned automatically by client kernel when client makes a connection request.
    • Well-known port: Associated with some service provided by a server (e.g., port 80 is associated with Web servers)
  • A connection is uniquely identified by the socket addresses of its endpoints (socket pair)
    • (cliaddr:cliport, servaddr:servport)
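As a rough illustration of the socket pair notation, the sketch below formats two IPv4 socket addresses as (cliaddr:cliport, servaddr:servport). The helper name format_socket_pair is hypothetical. In a live program the two structs would typically be filled in by getsockname and getpeername on a connected socket; here they are simply passed in.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>

/* Hypothetical helper: render a connection's socket pair as text.
 * Ports are stored in network byte order inside struct sockaddr_in,
 * so ntohs converts them before printing.
 * Returns 0 on success, -1 if conversion fails or out is too small. */
int format_socket_pair(const struct sockaddr_in *cli,
                       const struct sockaddr_in *srv,
                       char *out, size_t outlen)
{
    char cliaddr[INET_ADDRSTRLEN], servaddr[INET_ADDRSTRLEN];

    if (inet_ntop(AF_INET, &cli->sin_addr, cliaddr, sizeof(cliaddr)) == NULL ||
        inet_ntop(AF_INET, &srv->sin_addr, servaddr, sizeof(servaddr)) == NULL)
        return -1;

    int n = snprintf(out, outlen, "(%s:%u, %s:%u)",
                     cliaddr, (unsigned)ntohs(cli->sin_port),
                     servaddr, (unsigned)ntohs(srv->sin_port));
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

For a client at 128.2.194.242 using an ephemeral port (say 51213, a made-up value) talking to a Web server at 35.227.227.189 on well-known port 80, the result would be (128.2.194.242:51213, 35.227.227.189:80).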

network-connection.png

  • A client can use the port to indicate what service (process) they are sending a request to:

    port-use.png

3.1 Sockets Interface

  • Set of system-level functions used in conjunction with Unix I/O to build network applications.
  • Created in the early 1980s as part of the original Berkeley distribution of Unix that contained an early version of the Internet protocols.
  • Available on all modern systems
    • Unix variants, Windows, OS X, iOS, Android, ARM
  • What is a socket?
    • To the kernel, a socket is an endpoint of communication
    • To an application, a socket is a file descriptor that lets the application read/write from/to the network
    • Remember: All Unix I/O devices, including networks, are modeled as files
  • Clients and servers communicate with each other by reading from and writing to socket descriptors

socket-fds.png

The main distinction between regular file I/O and socket I/O is how the application "opens" the socket descriptors
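To make the "a socket is a file descriptor" point concrete, here is a minimal sketch that sends a short message across a pair of connected sockets and echoes it back using only the ordinary Unix I/O calls read and write. It assumes a POSIX system and uses socketpair to get two already-connected local sockets, which stands in for the network setup that a real client and server would perform. The function name roundtrip_over_sockets is invented for this example.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo helper: send msg from one socket to the other and echo
 * it back. Only the "open" step (socketpair) is socket-specific; the data
 * transfer uses plain read and write on the descriptors.
 * Returns the number of bytes echoed back, or -1 on error. */
ssize_t roundtrip_over_sockets(const char *msg, char *reply, size_t replylen)
{
    int sv[2];                 /* sv[0] plays the client, sv[1] the server */
    char buf[256];
    ssize_t n = -1;
    size_t len = strlen(msg);

    if (len == 0 || len >= sizeof(buf) || len >= replylen)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    if (write(sv[0], msg, len) == (ssize_t)len) {          /* client sends */
        ssize_t got = read(sv[1], buf, sizeof(buf));       /* server receives */
        if (got > 0 && write(sv[1], buf, (size_t)got) == got) { /* echo back */
            n = read(sv[0], reply, replylen - 1);          /* client reads */
            if (n >= 0)
                reply[n] = '\0';
        }
    }
    close(sv[0]);
    close(sv[1]);
    return n;
}
```

A single read is enough here only because the message is short. Code that reads from a real network connection has to cope with short counts, which is what the buffered Rio functions used in these notes take care of.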

3.1.1 Echo Server Example

Consists of an echo server and client

  • Server
    • Accepts connection request
    • Repeats back lines as they are typed
  • Client
    • Requests connection to server
    • Repeatedly:
      • Read line from terminal
      • Send to server
      • Read reply from server
      • Print line to terminal

echoserver-structure.png

For a more detailed version, see footnote 2.

Echo client:

#include "csapp.h"

int main(int argc, char **argv)
{
    int clientfd;
    char *host, *port, buf[MAXLINE];
    rio_t rio;

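    if (argc != 3) { // expect a host and a port on the command line
        fprintf(stderr, "usage: %s <host> <port>\n", argv[0]);
        exit(1);
    }
    host = argv[1];
    port = argv[2];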

    clientfd = Open_clientfd(host, port); // open a connection to host:port
    Rio_readinitb(&rio, clientfd); // initialize rio struct to be ready for buffered reading from clientfd

    while (Fgets(buf, MAXLINE, stdin) != NULL) { // read in up to MAXLINE characters from stdin
        Rio_writen(clientfd, buf, strlen(buf)); // write that string to the socket connected to the server
        Rio_readlineb(&rio, buf, MAXLINE); // read a line from the socket
        Fputs(buf, stdout); // print what we read to stdout
    }
    Close(clientfd); 
    exit(0);
}

Echo server, iterative main routine:

#include "csapp.h"
void echo(int connfd);

int main(int argc, char **argv)
{
    int listenfd, connfd;
    socklen_t clientlen;
    struct sockaddr_storage clientaddr; /* Large enough to accommodate all supported protocol-specific address structures */
    char client_hostname[MAXLINE], client_port[MAXLINE];

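    if (argc != 2) { // expect a port on the command line
        fprintf(stderr, "usage: %s <port>\n", argv[0]);
        exit(1);
    }
    // create a listening socket on the port passed in as a command-line argument
    listenfd = Open_listenfd(argv[1]);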
    // loop forever, accepting connections from clients
    while (1) {
        clientlen = sizeof(struct sockaddr_storage); /* Need to tell Accept how big the clientaddr struct is */
        connfd = Accept(listenfd, (SA *)&clientaddr, &clientlen);

        // use getnameinfo library function to convert from clientaddr struct
        // to hostname and port strings
        Getnameinfo((SA *) &clientaddr, clientlen, 
                    client_hostname, MAXLINE, client_port, MAXLINE, 0);
        printf("Connected to (%s, %s)\n", client_hostname, client_port);

        // call echo function, passing file descriptor for the socket connected to client
        echo(connfd);
        // close the connection
        Close(connfd);
    }
    exit(0);
}

Echo server, echo function:

void echo(int connfd)
{
    size_t n;
    char buf[MAXLINE];
    rio_t rio;

    Rio_readinitb(&rio, connfd); // initialize the rio struct for buffered reading from connfd
    while((n = Rio_readlineb(&rio, buf, MAXLINE)) != 0) { // read one line of up to MAXLINE characters from client
        printf("server received %d bytes\n", (int)n);
        Rio_writen(connfd, buf, n); // write the input read from client back to the client
    }
}

4 Web Servers

  • Clients and servers communicate using the HyperText Transfer Protocol (HTTP)
    • Client and server establish TCP connection
    • Client requests content
    • Server responds with requested content
    • Client and server close connection (eventually)

http.png

  • Web servers return content to clients
    • content: a sequence of bytes with an associated MIME (Multipurpose Internet Mail Extensions) type
    • Content is identified by its URL (Uniform Resource Locator)

Example MIME types:

MIME type     Meaning
text/html     HTML document
text/plain    Unformatted text
image/gif     Binary image encoded in GIF format
image/png     Binary image encoded in PNG format
image/jpeg    Binary image encoded in JPEG format
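One simple way a server might pick a MIME type is to look at the file name extension. The sketch below covers only the types in the table above. The function name mime_type_for, the extension list, and the fallback to text/plain are assumptions made for illustration, not a rule from HTTP.

```c
#include <string.h>

/* Hypothetical helper: choose a MIME type from a file name's extension.
 * Anything unrecognized (or with no extension) is treated as plain text. */
const char *mime_type_for(const char *filename)
{
    const char *dot = strrchr(filename, '.');   /* last '.' in the name */

    if (dot == NULL)
        return "text/plain";
    if (strcmp(dot, ".html") == 0)
        return "text/html";
    if (strcmp(dot, ".gif") == 0)
        return "image/gif";
    if (strcmp(dot, ".png") == 0)
        return "image/png";
    if (strcmp(dot, ".jpg") == 0 || strcmp(dot, ".jpeg") == 0)
        return "image/jpeg";
    return "text/plain";
}
```

A server would send the returned string as the value of the Content-Type response header.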

4.1 URLs

  • Unique name for a file: URL (Uniform Resource Locator)
  • Example URL: http://www.carleton.edu:80/index.html
  • Clients use prefix (http://www.carleton.edu:80) to infer:
    • What kind (protocol) of server to contact (HTTP)
    • Where the server is (www.carleton.edu)
    • What port it is listening on (80)
  • Servers use suffix (/index.html) to:
    • Determine if request is for static or dynamic content.
      • No hard and fast rules for this
      • One convention: executables reside in cgi-bin directory
    • Find file on file system
      • Initial "/" in suffix denotes home directory for requested content.
      • Minimal suffix is "/", which server expands to configured default filename (usually, index.html)
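The prefix/suffix split described above can be sketched as a small parsing function. This is a simplified illustration, not a full URL parser: it assumes an http:// URL, assumes port 80 when none is given, and does not handle things like IPv6 literals or user info. The names split_http_url and copy_span are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Copy the n bytes starting at src into dst as a C string; -1 if too long. */
static int copy_span(char *dst, size_t dstlen, const char *src, size_t n)
{
    if (n >= dstlen)
        return -1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return 0;
}

/* Hypothetical helper: split an http:// URL into host, port, and suffix.
 * Returns 0 on success, -1 on a malformed URL or a too-small buffer. */
int split_http_url(const char *url, char *host, size_t hostlen,
                   char *port, size_t portlen, char *path, size_t pathlen)
{
    static const char prefix[] = "http://";
    size_t plen = sizeof(prefix) - 1;

    if (strncmp(url, prefix, plen) != 0)
        return -1;                               /* only http:// handled here */

    const char *hostbeg = url + plen;
    const char *slash = strchr(hostbeg, '/');    /* start of the suffix, if any */
    const char *hostend = slash ? slash : hostbeg + strlen(hostbeg);
    const char *colon = memchr(hostbeg, ':', (size_t)(hostend - hostbeg));
    const char *nameend = colon ? colon : hostend;

    if (nameend == hostbeg ||
        copy_span(host, hostlen, hostbeg, (size_t)(nameend - hostbeg)) < 0)
        return -1;

    if (colon) {
        size_t n = (size_t)(hostend - (colon + 1));
        if (n == 0 || copy_span(port, portlen, colon + 1, n) < 0)
            return -1;
    } else if (copy_span(port, portlen, "80", 2) < 0) {
        return -1;                               /* default HTTP port */
    }

    const char *suffix = slash ? slash : "/";    /* minimal suffix is "/" */
    return copy_span(path, pathlen, suffix, strlen(suffix));
}
```

For the example URL http://www.carleton.edu:80/index.html this yields host www.carleton.edu, port 80, and suffix /index.html. The host and port are what a client would hand to a connection routine, and the suffix is what it would send as the URI in the request line.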

4.2 HTTP Requests

  • HTTP request is a request line, followed by zero or more request headers
  • Request line: <method> <uri> <version>
    • <method> is one of GET, POST, OPTIONS, HEAD, PUT, DELETE, or TRACE
    • <uri> is the content being requested
      • A URL is a type of URI (Uniform Resource Identifier)
    • <version> is HTTP version of request (HTTP/1.0 or HTTP/1.1)
  • Request headers: <header name>: <header data>
    • Provide additional information to the server
  • A blank line ("\r\n") indicates the end of the request headers (for a request with no body, such as a GET, this is also the end of the request)
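Putting the pieces of a request together, the following sketch builds a minimal GET request as a string: a request line, one Host header, and the terminating blank line. The function name build_get_request is made up for this example, and the choice of HTTP/1.0 with a single header is an assumption to keep the request short.

```c
#include <stdio.h>

/* Hypothetical helper: format "GET <uri> HTTP/1.0" plus a Host header and
 * the blank line that ends the headers. Every line ends in "\r\n".
 * Returns the request length, or -1 if buf is too small. */
int build_get_request(char *buf, size_t buflen,
                      const char *host, const char *uri)
{
    int n = snprintf(buf, buflen,
                     "GET %s HTTP/1.0\r\n"   /* request line */
                     "Host: %s\r\n"          /* one request header */
                     "\r\n",                 /* blank line: end of headers */
                     uri, host);

    if (n < 0 || (size_t)n >= buflen)
        return -1;
    return n;
}
```

A client along the lines of the echo client would write the resulting bytes to its connected socket descriptor and then read the response from the same descriptor.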

4.3 HTTP Responses

  • HTTP response is a response line followed by zero or more response headers, possibly followed by content, with blank line ("\r\n") separating headers from content.
  • Response line: <version> <status code> <status msg>
    • <version> is HTTP version of the response
    • <status code> is numeric status
    • <status msg> is corresponding English text

      code  msg        description
      200   OK         Request was handled without error
      301   Moved      Provide alternate URL
      404   Not found  Server couldn’t find the file
  • Response headers: <header name>: <header data>
    • Provide additional information about response
    • Content-Type: MIME type of content in response body
    • Content-Length: Length of content in response body
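On the receiving side, a client has to pick the response line apart. The sketch below parses <version> <status code> <status msg> with sscanf. The struct and function names are hypothetical, the field sizes are arbitrary limits chosen for the example, and real clients generally validate more than this.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the three parts of a response line. */
struct status_line {
    char version[16];   /* e.g. "HTTP/1.0" */
    int code;           /* e.g. 200 */
    char msg[64];       /* e.g. "OK"; may be empty */
};

/* Parse a response line such as "HTTP/1.0 200 OK\r\n".
 * Returns 0 on success, -1 if the line does not look like a response line. */
int parse_status_line(const char *line, struct status_line *out)
{
    out->msg[0] = '\0';

    /* %15s reads the version, %d the numeric code, and %63[^\r\n] reads the
     * rest of the line (spaces included) up to the line terminator. */
    int fields = sscanf(line, "%15s %d %63[^\r\n]",
                        out->version, &out->code, out->msg);

    if (fields < 2)
        return -1;                      /* need at least version and code */
    if (strncmp(out->version, "HTTP/", 5) != 0)
        return -1;
    return 0;
}
```

Given "HTTP/1.1 404 Not found\r\n", this fills in version HTTP/1.1, code 404, and msg Not found.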

4.4 Proxies

  • A proxy is an intermediary between a client and an origin server
    • To the client, the proxy acts like a server
    • To the server, the proxy acts like a client

proxy1.png

  • Can perform useful functions as requests and responses pass by
    • Examples: Caching, logging, anonymization, filtering, transcoding

proxy2.png

Footnotes:

1
  • The original Internet Protocol, with its 32-bit addresses, is known as Internet Protocol Version 4 (IPv4)
  • 1996: Internet Engineering Task Force (IETF) introduced Internet Protocol Version 6 (IPv6) with 128-bit addresses
    • Intended as the successor to IPv4
  • Majority of Internet traffic still carried by IPv4 (https://www.google.com/intl/en/ipv6/statistics.html)

    ipv6-adoption.png

  • We will focus on IPv4, but will show you how to write networking code that is protocol-independent.
2

open_clientfd and open_listenfd use the collection of sockets system calls to set up a client or a listening socket. We don't have time to go through this interface in detail—the specific calls are shown in the diagram below, and the textbook has more information.

echoserver-structure-sockets.png
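For a rough idea of what happens inside a helper like open_clientfd, here is a minimal sketch built on the standard getaddrinfo, socket, and connect calls. It is not the csapp.c source: the name open_clientfd_sketch is made up, error reporting is reduced to returning -1, and the hint flags are kept to a minimum. Because getaddrinfo returns a list of candidate addresses (IPv4 or IPv6), the code never hard-codes an address family, which is what makes it protocol-independent.

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Sketch of a client-side connection helper (hypothetical name).
 * Returns a connected socket descriptor, or -1 on failure. */
int open_clientfd_sketch(const char *host, const char *port)
{
    struct addrinfo hints, *list, *p;
    int fd = -1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      /* accept IPv4 or IPv6 addresses */
    hints.ai_socktype = SOCK_STREAM;  /* TCP connections only */

    /* Translate host and port strings into a list of socket addresses. */
    if (getaddrinfo(host, port, &hints, &list) != 0)
        return -1;

    /* Walk the list until one address accepts a connection. */
    for (p = list; p != NULL; p = p->ai_next) {
        fd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (fd < 0)
            continue;                 /* could not create socket; try next */
        if (connect(fd, p->ai_addr, p->ai_addrlen) == 0)
            break;                    /* connected */
        close(fd);                    /* connect failed; try next address */
        fd = -1;
    }

    freeaddrinfo(list);
    return fd;
}
```

The server-side counterpart follows the same pattern but calls bind and listen instead of connect, and then uses accept to obtain a connected descriptor for each client.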