Thursday, August 5, 2010

Socket Programming In C

Today we will discuss the Sockets programming paradigm, the elements of Sockets applications, and the Sockets API. The Sockets API allows you to develop applications that communicate over a network. The network can be a local private network or the public Internet. An important point about Sockets programming is that it's neither operating system specific nor language specific. Sockets applications can be written in the Ruby scripting language on a GNU/Linux host or in C on an embedded controller. This freedom and flexibility are the reasons that the BSD4.4 Sockets API is so popular.


Layered Model of Networking



Sockets programming uses the layered model of packet communication as shown in the figure below. At the top is the application layer, which is where the applications exist (those that utilize Sockets for communication). Below the application layer is the Sockets layer. This isn't actually a layer, but it is shown to illustrate where the API is located. The Sockets layer sits on top of the transport layer, which provides the transport protocols (such as TCP and UDP). Next is the network layer, which provides, among other things, routing over the Internet. This layer is occupied by the Internet Protocol, or IP. Finally, there is the physical layer driver, which provides the means to introduce packets onto the physical network.



Sockets API Summary



The networking API for C provides a mixed set of functions for the development of client and server applications. Some functions are used by only server-side sockets, whereas others are used solely by client-side sockets (most are available to both).

Creating and Destroying Sockets

The first step of any Sockets-based application is to create a socket. The socket function has the following prototype:

Code:
int socket( int domain, int type, int protocol );
The socket object is represented as a simple integer and is returned by the socket function. Three parameters must be passed to define the type of socket to be created. Right now, you are interested primarily in stream (TCP) and datagram (UDP) sockets, but many other types of sockets can be created. In addition to stream and datagram, a raw socket is also illustrated by the following code snippets:

Code:
myStreamSocket = socket( AF_INET, SOCK_STREAM, 0 );
myDgramSocket = socket( AF_INET, SOCK_DGRAM, 0 );
myRawSocket = socket( AF_INET, SOCK_RAW, IPPROTO_RAW );
The AF_INET symbolic constant indicates that we are using the IPv4 Internet protocol. After this, the second parameter (type) defines the semantics of communication. For stream communication (using TCP), you use the SOCK_STREAM type, and for datagram communication (using UDP), you specify SOCK_DGRAM. The third parameter can select a particular protocol, but because only one protocol exists for each of the stream and datagram types (TCP and UDP, respectively), this parameter is left as zero in those cases.

When we've finished with a socket, we must close it. The close prototype is defined as follows:

Code:
int close( int sock );
After close is called, no further data can be received through the socket. Any data queued for transmission is given some amount of time to be sent before the connection physically closes.
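A minimal sketch of the create-and-destroy cycle described above, with the return values checked. The function name open_and_close_stream_socket is hypothetical and exists only for illustration; a real application would use the socket between the two calls.

```c
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* Create a TCP (stream) socket and close it again.
 * Returns 0 on success, -1 if either call fails.
 * (Hypothetical helper, not part of the Sockets API.) */
int open_and_close_stream_socket(void)
{
    int sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock < 0) {
        perror("socket");
        return -1;
    }

    /* ... the socket would be used here ... */

    if (close(sock) < 0) {
        perror("close");
        return -1;
    }
    return 0;
}
```

Both socket and close return -1 on failure and set errno, which perror turns into a readable message.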

Socket Addresses

For socket communication over the Internet (domain AF_INET), we use the sockaddr_in structure for naming purposes.

Code:
struct in_addr
{
    uint32_t s_addr;
};
struct sockaddr_in
{
    sa_family_t sin_family;
    uint16_t sin_port;
    struct in_addr sin_addr;
    char sin_zero[8];
};
For Internet communication, we use AF_INET solely for sin_family. Field sin_port defines your specified port number in network byte order. Therefore, we must use htons to load the port and ntohs to read it from this structure. Field sin_addr is, through s_addr, a 32-bit field that represents an IPv4 Internet address, also stored in network byte order. IPv4 addresses are 4-byte addresses. Often the sin_addr is set to INADDR_ANY, which is the wildcard. When you're accepting connections (server socket), this wildcard accepts connections from any available interface on the host. For client sockets, the local address is commonly left blank. If a client does set sin_addr to the IP address of a local interface, outgoing connections are restricted to that interface.
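The byte-order rules above can be sketched as a small helper. The name make_ipv4_addr is hypothetical. The sketch uses inet_pton rather than inet_addr because inet_pton reports an invalid address string unambiguously, whereas inet_addr signals errors with a value that is also a valid address (255.255.255.255).

```c
#include <string.h>
#include <stdint.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Fill *out with an IPv4 socket address for the dotted-quad string ip
 * and the host-order port. Returns 0 on success, -1 if ip is not a
 * valid IPv4 address. (Hypothetical helper for illustration.) */
int make_ipv4_addr(const char *ip, uint16_t port, struct sockaddr_in *out)
{
    memset(out, 0, sizeof(*out));
    out->sin_family = AF_INET;
    out->sin_port = htons(port);        /* host -> network byte order */

    /* inet_pton stores the address in network byte order already. */
    if (inet_pton(AF_INET, ip, &out->sin_addr) != 1)
        return -1;
    return 0;
}
```

Reading the fields back goes the other way: ntohs(addr.sin_port) and ntohl(addr.sin_addr.s_addr) return host-order values.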

Now let us take a look at a quick example of addressing for both a client and a server. First, in this example we create the socket address (later to be bound to your server socket) that permits incoming connections on any interface and port 48000.
Code:
int servsock;
struct sockaddr_in servaddr;
servsock = socket( AF_INET, SOCK_STREAM, 0);
memset( &servaddr, 0, sizeof(servaddr) );
servaddr.sin_family = AF_INET;
servaddr.sin_port = htons( 48000 );
servaddr.sin_addr.s_addr = htonl( INADDR_ANY );
Next, we create a socket address that permits a client socket to connect to your previously created server socket.

Code:
int clisock;
struct sockaddr_in servaddr;
clisock = socket(AF_INET, SOCK_STREAM, 0);
memset(&servaddr, 0, sizeof(servaddr));
servaddr.sin_family = AF_INET;
servaddr.sin_port = htons(48000);
servaddr.sin_addr.s_addr = inet_addr("192.168.1.1");
Note the similarities between these two code segments. The difference is that the server binds the address to itself as an advertisement, whereas the client uses the address to define to whom it wants to connect.

Socket Primitives



Now we will look at a number of other important socket control primitives, most of them used on the server side.

bind

The bind function provides a local naming capability to a socket. This can be used to name either client or server sockets, but it is used most often in the server case.

The bind function is provided by the following prototype:

Code:
int bind( int sock, const struct sockaddr *addr, socklen_t addrLen );
The socket to be named is provided by the sock argument, and the address structure is provided by addr. Note that the structure type here (sockaddr) differs from the address structure discussed previously (sockaddr_in). The bind function can be used with a variety of different protocols, but when we are using a socket created with AF_INET, we must use sockaddr_in. Therefore, as shown in the following example, we cast a pointer to our sockaddr_in structure to a pointer to sockaddr.

Code:
err = bind( servsock, (struct sockaddr *)&servaddr,sizeof(servaddr));
Using the address structure created in the server example in the previous address section, we bind the name defined by servaddr to our server socket servsock.

Recall that a client application can also call bind to name the client socket. This isn't used often, because the Sockets API dynamically assigns a port to us.

listen

Before a server socket can accept incoming client connections, it must call the listen function to declare this willingness. The listen function is provided by the following function prototype:

Code:
int listen( int sock, int backlog );
The sock argument represents the previously created server socket, and the backlog argument represents the number of outstanding client connections that might be queued. Within GNU/Linux, the backlog parameter (post 2.2 kernel version) represents the number of established connections pending on accept for the application layer protocol. Other operating systems might treat this differently.
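The socket, bind, and listen steps are usually performed together on a server. The following is a minimal sketch under that assumption; make_listener is a hypothetical name, and passing port 0 (an assumption made here to keep the example easy to try) asks the stack to choose a free port.

```c
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

/* Create a TCP socket, bind it to the given port on all interfaces,
 * and mark it as listening. Returns the socket, or -1 on failure.
 * (Hypothetical helper for illustration.) */
int make_listener(uint16_t port, int backlog)
{
    struct sockaddr_in servaddr;
    int servsock = socket(AF_INET, SOCK_STREAM, 0);
    if (servsock < 0) {
        perror("socket");
        return -1;
    }

    memset(&servaddr, 0, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_port = htons(port);
    servaddr.sin_addr.s_addr = htonl(INADDR_ANY);   /* any interface */

    if (bind(servsock, (struct sockaddr *)&servaddr, sizeof(servaddr)) < 0) {
        perror("bind");
        close(servsock);
        return -1;
    }
    if (listen(servsock, backlog) < 0) {
        perror("listen");
        close(servsock);
        return -1;
    }
    return servsock;
}
```

A server for the earlier example might call make_listener(48000, 5) and then wait for clients on the returned socket.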

accept

The accept call is the final call made by servers to accept incoming client connections. Before accept can be called, the server socket must be created, a name must be bound to it, and listen must be called. The accept function returns a socket descriptor for a client connection and is provided by the following function prototype:

Code:
int accept( int sock, struct sockaddr *addr, socklen_t *addrLen );
In practice, two examples of accept are commonly seen. The first represents the case in which we need to know who connected to us. This requires the creation of an address structure, which need not be initialized, and a length variable set to the size of that structure.
Code:
struct sockaddr_in cliaddr;
socklen_t cliLen;
cliLen = sizeof( struct sockaddr_in );
clisock = accept( servsock, (struct sockaddr *)&cliaddr, &cliLen );
The call to accept blocks until a client connection is available. Upon return, the clisock return value contains the value of the new client socket, and cliaddr represents the address for the client peer (host address and port number).

The alternate example is commonly found when the server application isn't interested in the client information. This one typically appears as follows:

Code:
clisock = accept( servsock, (struct sockaddr *)NULL, NULL );
In this case, NULL is passed for the address structure and length. The accept function then ignores these parameters.
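To show what can be done with the peer address that accept fills in, here is a minimal sketch that converts it to printable form. The name accept_with_peer is hypothetical, and the use of inet_ntop for formatting is a choice made for this example.

```c
#include <stdio.h>
#include <stdint.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

/* Accept one connection on a listening socket. Writes the peer's
 * dotted-quad address into ip (at least INET_ADDRSTRLEN bytes) and
 * its host-order port into *port. Returns the connected socket,
 * or -1 on failure. (Hypothetical helper for illustration.) */
int accept_with_peer(int servsock, char *ip, uint16_t *port)
{
    struct sockaddr_in cliaddr;
    socklen_t cliLen = sizeof(cliaddr);  /* in: buffer size, out: address size */

    int clisock = accept(servsock, (struct sockaddr *)&cliaddr, &cliLen);
    if (clisock < 0) {
        perror("accept");
        return -1;
    }

    if (inet_ntop(AF_INET, &cliaddr.sin_addr, ip, INET_ADDRSTRLEN) == NULL) {
        perror("inet_ntop");
        close(clisock);
        return -1;
    }
    *port = ntohs(cliaddr.sin_port);
    return clisock;
}
```

The returned socket is the one used to talk to this client; the listening socket stays open for further calls to accept.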

connect

The connect function is used by client Sockets applications to connect to a server. Clients must have created a socket and then defined an address structure containing the host and port number to which they want to connect. The connect function is provided by the following function prototype:

Code:
int connect( int sock, const struct sockaddr *servaddr, socklen_t addrLen );
The sock argument represents the client socket, created previously with the socket function. The servaddr structure names the server peer to which we want to connect. Finally, we must pass in the length of the servaddr structure so that connect knows we are passing in a sockaddr_in structure.

The following code shows a complete example of connect:

Code:
int clisock;
struct sockaddr_in servaddr;
clisock = socket( AF_INET, SOCK_STREAM, 0);
memset( &servaddr, 0, sizeof(servaddr) );
servaddr.sin_family = AF_INET;
servaddr.sin_port = htons( 48000 );
servaddr.sin_addr.s_addr = inet_addr( "192.168.1.1" );
connect( clisock, (struct sockaddr *)&servaddr, sizeof(servaddr) );
The connect function blocks until either an error occurs or the three-way handshake with the server finishes. On error, connect returns -1 and sets errno.
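A variant of the client sequence with every return value checked might look like the sketch below. The name connect_to is hypothetical, and inet_pton is used in place of inet_addr so that a malformed address string is detected before any socket is created.

```c
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

/* Connect a new TCP socket to ip:port (port in host byte order).
 * Returns the connected socket, or -1 on failure.
 * (Hypothetical helper for illustration.) */
int connect_to(const char *ip, uint16_t port)
{
    struct sockaddr_in servaddr;
    int clisock;

    memset(&servaddr, 0, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &servaddr.sin_addr) != 1) {
        fprintf(stderr, "invalid IPv4 address: %s\n", ip);
        return -1;
    }

    clisock = socket(AF_INET, SOCK_STREAM, 0);
    if (clisock < 0) {
        perror("socket");
        return -1;
    }
    if (connect(clisock, (struct sockaddr *)&servaddr, sizeof(servaddr)) < 0) {
        perror("connect");
        close(clisock);     /* do not leak the descriptor on failure */
        return -1;
    }
    return clisock;
}
```

For the server in the earlier example, the call would be connect_to("192.168.1.1", 48000).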

Sockets I/O



A variety of API functions exist to read data from a socket or write data to a socket. Two of the API functions (recv, send) are used by sockets that are connected (such as stream sockets), whereas an alternative pair (recvfrom, sendto) is typically used by sockets that are unconnected (such as datagram sockets).

Connected Socket Functions

The send and recv functions are used to send a message to the peer socket endpoint and to receive a message from the peer socket endpoint. These functions have the following prototypes:

Code:
ssize_t send( int sock, const void *msg, size_t len, int flags );
ssize_t recv( int sock, void *buf, size_t len, int flags );
The send function takes as its first argument the socket descriptor from which to send the msg. The msg is defined as a (const void *) because the object referenced by msg is not altered by the send function. The number of bytes to be sent in msg is contained by the len argument. Finally, a flags argument can alter the behavior of the send call. An example of sending a string through a previously created stream socket is shown as follows:

Code:
strcpy( buf, "Hello\n");
send( sock, (void *)buf, strlen(buf), 0);
In this example, our character array is initialized by the strcpy function. This buffer is then sent through sock to the peer endpoint, with a length defined by the string length function, strlen. To see the flags argument in use, let us take a look at one side effect of the send call. When send is called, it can block until all of the data contained within buf has been placed on the socket's send queue. If not enough space is available to do this, the send function blocks until space is available. If we want to avoid this blocking behavior and instead want the send call to simply return if sufficient space is not available, we can set the MSG_DONTWAIT flag, as follows:

Code:
send( sock, (void *)buf, strlen(buf), MSG_DONTWAIT);
The return value from send represents either an error (less than 0) or the number of bytes that were queued to be sent. Completion of the send function does not imply that the data was actually transmitted to the host, only that it is queued on the socket's send queue waiting to be transferred.
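Because the return value is the number of bytes actually queued, it can be smaller than the length requested. A common way to handle this is to call send in a loop, as in the following minimal sketch. The name send_all is hypothetical, and retrying on EINTR is a design choice of this example.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Keep calling send until all len bytes are queued or an error occurs.
 * Returns 0 on success, -1 on error (errno is set by send).
 * (Hypothetical helper for illustration.) */
int send_all(int sock, const void *msg, size_t len)
{
    const char *p = msg;

    while (len > 0) {
        ssize_t n = send(sock, p, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        p += n;                 /* skip past the bytes already queued */
        len -= (size_t)n;
    }
    return 0;
}
```

With this helper, the earlier example becomes send_all( sock, buf, strlen(buf) ).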
The recv function mirrors the send function in terms of its argument list. Instead of sending the data pointed to by msg, the recv function fills the buf argument with the bytes read from the socket. We must define the size of the buffer, through the len argument, so that the network protocol stack doesn't write past the end of it. Finally, we can alter the behavior of the read call using the flags argument. The value returned by the recv function is the number of bytes now contained in buf, 0 if the peer has closed a stream connection, or -1 on error. An example of the recv function is as follows:

Code:
#define MAX_BUFFER_SIZE        50
char buffer[MAX_BUFFER_SIZE+1];
...
numBytes = recv( sock, buffer, MAX_BUFFER_SIZE, 0);
At completion of this example, numBytes contains the number of bytes that are contained within the buffer argument.
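On a stream socket, numBytes can be smaller than the amount requested, because recv returns whatever has arrived so far. When a fixed number of bytes is expected, a loop is one way to collect them. The sketch below assumes that requirement; recv_exact is a hypothetical name.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Read exactly len bytes from a stream socket into buf.
 * Returns 1 when all bytes were read, 0 if the peer closed the
 * connection first, and -1 on error.
 * (Hypothetical helper for illustration.) */
int recv_exact(int sock, void *buf, size_t len)
{
    char *p = buf;

    while (len > 0) {
        ssize_t n = recv(sock, p, len, 0);
        if (n == 0)
            return 0;           /* orderly shutdown by the peer */
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 1;
}
```

This is useful for protocols with fixed-size headers, where the header must be complete before it can be interpreted.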
We can peek at the data that's available to read by using the MSG_PEEK flag. This performs a read, but it doesn't consume the data at the socket. This requires another recv to actually consume the available data. An example of this type of read is illustrated as follows:

Code:
numBytes = recv( sock, buffer, MAX_BUFFER_SIZE, MSG_PEEK);
This call requires an extra copy (the first to peek at the data, and the second to actually read and consume it). More often than not, this behavior is handled instead at the application layer by actually reading the data and then determining what action to take.
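As a small illustration of the peek behavior, the following sketch looks at the first waiting byte without consuming it, for example to decide which message parser to use. The name peek_first_byte and its return convention are hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Look at the first byte waiting on sock without consuming it.
 * Returns the byte (0..255), -1 on error, or -2 if the peer has
 * closed the connection. (Hypothetical helper for illustration.) */
int peek_first_byte(int sock)
{
    unsigned char b;
    ssize_t n = recv(sock, &b, 1, MSG_PEEK);

    if (n < 0)
        return -1;
    if (n == 0)
        return -2;
    return b;
}
```

Calling it twice in a row returns the same byte both times; only a later recv without MSG_PEEK removes the data from the socket.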

Unconnected Socket Functions

The sendto and recvfrom functions are used to send a message to a specified socket endpoint and to receive a message together with the address of its sender. These functions have the following prototypes:

Code:
ssize_t sendto( int sock, const void *msg, size_t len, int flags, const struct sockaddr *to, socklen_t tolen );
ssize_t recvfrom( int sock, void *buf, size_t len, int flags, struct sockaddr *from, socklen_t *fromlen );
The sendto function is used by an unconnected socket to send a datagram to a destination defined by an initialized address structure. The sendto function is similar to the previously discussed send function, except that the recipient is defined by the to structure. An example of the sendto function is shown in the following code:

Code:
struct sockaddr_in destaddr;
int sock;
char *buf;
...
memset( &destaddr, 0, sizeof(destaddr) );
destaddr.sin_family = AF_INET;
destaddr.sin_port = htons(581);
destaddr.sin_addr.s_addr = inet_addr("192.168.1.1");
sendto( sock, buf, strlen(buf), 0, (struct sockaddr *)&destaddr, sizeof(destaddr) );
In this example, the datagram (contained within buf) is sent to an application on host 192.168.1.1, port number 581. The destaddr structure defines the intended recipient for the datagram.
As with the send function, the number of characters queued for transmission is returned, or -1 if an error occurs.

The recvfrom function provides the ability for an unconnected socket to receive datagrams. The recvfrom function is again similar to the recv function, but an address structure and length are provided. The address structure is used to return the sender of the datagram to the function caller. This information can be used with the sendto function to return a response datagram to the original sender.

An example of the recvfrom function is shown in the following code:

Code:
#define MAX_LEN 100
struct sockaddr_in fromaddr;
int sock, len;
socklen_t fromlen;
char buf[MAX_LEN+1];
...
fromlen = sizeof(fromaddr);
len = recvfrom( sock, buf, MAX_LEN, 0, (struct sockaddr *)&fromaddr, &fromlen );
This blocking call returns when either an error occurs (represented by a -1 return) or a datagram is received (return value of 0 or greater). The datagram is contained within buf and has a length of len. The fromaddr contains the datagram sender, specifically the host address and port number of the originating application.
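Putting the two calls together, a datagram can be returned to whoever sent it by passing the address filled in by recvfrom straight to sendto. The following is a minimal sketch of that idea; udp_echo_once is a hypothetical name, and the 100-byte buffer simply mirrors MAX_LEN from the example above.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define MAX_LEN 100

/* Receive one datagram on sock and send the same bytes back to its
 * sender. Returns the number of bytes echoed, or -1 on error.
 * (Hypothetical helper for illustration.) */
ssize_t udp_echo_once(int sock)
{
    struct sockaddr_in fromaddr;
    socklen_t fromlen = sizeof(fromaddr);
    char buf[MAX_LEN];

    ssize_t len = recvfrom(sock, buf, sizeof(buf), 0,
                           (struct sockaddr *)&fromaddr, &fromlen);
    if (len < 0) {
        perror("recvfrom");
        return -1;
    }

    /* fromaddr now names the sender, so it serves as the destination. */
    if (sendto(sock, buf, (size_t)len, 0,
               (struct sockaddr *)&fromaddr, fromlen) < 0) {
        perror("sendto");
        return -1;
    }
    return len;
}
```

A simple UDP echo server would bind a SOCK_DGRAM socket to a port and call this function in a loop.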

Socket Options

Socket options permit an application to change some of the modifiable behaviors of sockets and the functions that manipulate them. For example, an application can modify the sizes of the send or receive socket buffers or the size of the maximum segment used by the TCP layer for a given socket.

The functions for setting or retrieving options for a given socket are provided by the following function prototypes:

Code:
int getsockopt( int sock, int level, int optname,void *optval, socklen_t *optlen );
int setsockopt( int sock, int level, int optname,const void *optval, socklen_t optlen );
First, we define the socket of interest using the sock argument. Next, we must define the level of the socket option that is being applied.

The level argument can be :
  • SOL_SOCKET for socket-layer options,
  • IPPROTO_IP for IP layer options, and
  • IPPROTO_TCP for TCP layer options.
The specific option within the level is applied using the optname argument. Arguments optval and optlen define the specifics of the value of the option. optval is used to get or set the option value, and optlen defines the length of the option. This slightly complicated structure is used because structures can be used to define options.

Now let us take a look at an example for both setting and retrieving an option. In the first example, we retrieve the size of the send buffer for a socket.

Code:
int sock, size;
socklen_t len = sizeof(size);
...
getsockopt( sock, SOL_SOCKET, SO_SNDBUF, (void *)&size, &len );
printf( "Send buffer size is %d\n", size );
Now let us take a look at a slightly more complicated example. In this case, we're going to set the socket linger option. Socket linger allows you to change the behavior of a stream socket when the socket is closed and data is remaining to be sent. After close is called, the stack keeps trying to send any remaining data for some amount of time. If the data cannot be sent within that time, it is abandoned. The time after the close when the data is removed from the send queue is defined as the linger time. This can be set using a special structure called linger, as shown in the following example:

Code:
struct linger ling;
int sock;
...
ling.l_onoff = 1; /* Enable */
ling.l_linger = 10; /* 10 seconds */
setsockopt( sock, SOL_SOCKET, SO_LINGER, (void *)&ling, sizeof(struct linger) );
After this call is performed, close waits up to 10 seconds for any queued data to be sent before aborting the send.
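One way to confirm that an option took effect is to read it back with getsockopt. The sketch below wraps both directions for the linger option; the names set_linger and get_linger are hypothetical.

```c
#include <sys/socket.h>

/* Enable SO_LINGER on sock with the given timeout in seconds.
 * Returns 0 on success, -1 on error. (Hypothetical helper.) */
int set_linger(int sock, int seconds)
{
    struct linger ling;

    ling.l_onoff = 1;           /* enable lingering on close */
    ling.l_linger = seconds;    /* linger time in seconds */
    return setsockopt(sock, SOL_SOCKET, SO_LINGER, &ling, sizeof(ling));
}

/* Read the current SO_LINGER setting of sock into *out.
 * Returns 0 on success, -1 on error. (Hypothetical helper.) */
int get_linger(int sock, struct linger *out)
{
    socklen_t len = sizeof(*out);   /* in: buffer size, out: bytes written */
    return getsockopt(sock, SOL_SOCKET, SO_LINGER, out, &len);
}
```

Note that optlen is passed by value to setsockopt but by pointer to getsockopt, which uses it to report how many bytes it wrote.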

Other Miscellaneous Functions

Now it's time to look at a few miscellaneous functions from the Sockets API and the capabilities they provide. The three function prototypes discussed in this section are shown in the following code:

Code:
struct hostent *gethostbyname( const char *name );
int getsockname( int sock, struct sockaddr *name, socklen_t *namelen );
int getpeername( int sock, struct sockaddr *name, socklen_t *namelen );
Function gethostbyname provides the means to resolve a host and domain name (otherwise known as a fully qualified domain name, or FQDN) to an IP address. For example, the FQDN of www.microsoft.com might resolve to the IP address 64.4.31.252. Converting an FQDN to an IP address is important because all of the Sockets API functions work with numeric IP addresses (32-bit addresses) rather than FQDNs.

An example of the gethostbyname function is shown below:

Code:
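struct hostent *hptr;
struct in_addr addr;
hptr = gethostbyname( "www.microsoft.com" );
if (hptr == NULL)
{
    /* can't resolve... */
}
else
{
    /* h_addr_list[0] points to the raw 4-byte address in network byte order */
    memcpy( &addr, hptr->h_addr_list[0], sizeof(addr) );
    printf( "Binary address is %x\n", ntohl(addr.s_addr) );
}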
On success, function gethostbyname returns a pointer to a structure that contains the numeric IP address for the FQDN (hptr->h_addr_list[0]). Otherwise, gethostbyname returns NULL, which means that the FQDN could not be resolved by the local resolver. This call blocks while the local resolver communicates with the configured DNS servers.
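Newer code often uses getaddrinfo for the same job, since POSIX has marked gethostbyname as obsolescent. The following is a minimal sketch of that alternative, not the approach shown above; resolve_ipv4 is a hypothetical name, and the sketch is restricted to IPv4 to match the rest of this discussion.

```c
/* Request the POSIX declarations when compiling in strict ISO C mode. */
#define _POSIX_C_SOURCE 200112L

#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Resolve host to its first IPv4 address and write it as a dotted
 * quad into out (at least INET_ADDRSTRLEN bytes). Returns 0 on
 * success, or a nonzero getaddrinfo error code.
 * (Hypothetical helper for illustration.) */
int resolve_ipv4(const char *host, char *out)
{
    struct addrinfo hints, *res;
    struct sockaddr_in *sin;
    const char *ok;
    int err;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;          /* IPv4 only */
    hints.ai_socktype = SOCK_STREAM;

    err = getaddrinfo(host, NULL, &hints, &res);
    if (err != 0)
        return err;                     /* gai_strerror(err) describes it */

    sin = (struct sockaddr_in *)res->ai_addr;
    ok = inet_ntop(AF_INET, &sin->sin_addr, out, INET_ADDRSTRLEN);
    freeaddrinfo(res);                  /* the result list is heap-allocated */
    return ok ? 0 : EAI_SYSTEM;
}
```

Like gethostbyname, this call can block while the resolver contacts DNS servers. A numeric string such as "127.0.0.1" is converted directly.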

Function getsockname permits an application to retrieve information about the local socket endpoint. This function, for example, can identify the dynamically assigned ephemeral port number for the local socket.

An example of its use is shown in the following code:

Code:
int sock;
struct sockaddr_in localaddr;
socklen_t laddrlen = sizeof(localaddr);
// Socket for sock created and connected.
...
getsockname( sock, (struct sockaddr *)&localaddr, &laddrlen );
printf( "local port is %d\n", ntohs(localaddr.sin_port) );
The reciprocal function of getsockname is getpeername. This permits you to gather addressing information about the connected peer socket. An example, similar to the getsockname example, is shown in the following code:

Code:
int sock;
struct sockaddr_in remaddr;
socklen_t raddrlen = sizeof(remaddr);
// Socket for sock created and connected.
...
getpeername( sock, (struct sockaddr *)&remaddr, &raddrlen );
printf( "remote port is %d\n", ntohs(remaddr.sin_port) );
In both examples, the address can also be extracted using the sin_addr field of the sockaddr_in structure.
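As a minimal sketch of the ephemeral-port case mentioned earlier, the helper below returns the port that the stack assigned to a socket. The name local_port is hypothetical.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Return the local port of sock in host byte order, or -1 on error.
 * (Hypothetical helper for illustration.) */
int local_port(int sock)
{
    struct sockaddr_in localaddr;
    socklen_t laddrlen = sizeof(localaddr);   /* must be set before the call */

    if (getsockname(sock, (struct sockaddr *)&localaddr, &laddrlen) < 0)
        return -1;
    return ntohs(localaddr.sin_port);
}
```

This is handy after binding to port 0 or after connect, when the application did not choose the local port itself.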
