3-1. Overview
3-2. Creating, configuring, and destroying sockets
3-2-1. Interface routines
3-2-2. Socket callbacks
3-3. Establishing and breaking connections
3-3-1. Outgoing connections
3-3-2. Incoming connections
3-3-3. Disconnecting
3-4. Sending and receiving data
3-4-1. Sending data
3-4-2. Receiving data
3-4-3. Muting sockets
3-5. Retrieving socket information
3-6. The socket polling routine
Previous section: Core Services functionality | Table of Contents | Next section: The module system
In Services, all network communication is performed through the socket subsystem, defined in the files sockets.c and sockets.h. This subsystem provides routines for managing sockets, which have a type of Socket *, connecting to or accepting connections from the network, and sending and receiving data.
All network communication in Services is performed asynchronously. Send (write) operations buffer the data given and return immediately; receive (read) operations are performed via callbacks, functions called by the central polling routine when particular events occur, and the actual reading routines operate on a read buffer which is filled by the polling routine as data is received from the remote host. The polling routine itself is called by the Services main loop.
The socket subsystem is designed so that it can be used in other programs as well with minimal changes. (In fact, I currently use it in the HTTP server I wrote for my personal domain.) Note, however, that only TCP connections over IPv4 are supported. The socket subsystem uses the following functions from other Services source files, so replacements or stubs will need to be provided: log(), log_perror(), log_debug(), log_perror_debug(), pack_ip().
Below, section 3-2 discusses the creation and management of socket objects, including a list of callbacks; section 3-3 discusses connecting to (and listening for connections from) remote hosts; section 3-4 discusses sending and receiving data; section 3-5 discusses functions for retrieving information about sockets; and section 3-6 discusses the socket polling routine in detail.
All socket routines which can fail return a valid error code in the system variable errno on failure. The error codes returned are generally the same as those returned by the equivalent system calls or standard library functions; special note of particular error codes is included where appropriate.
There are a few preprocessor constants and macros used in the source that are worth mentioning:
Before any other operations can be performed, a socket object must be created, either explicitly or by accepting a connection from a listener socket (see section 3-3-2). Socket objects are the means by which the socket subsystem keeps track of sockets, and are used in place of system socket descriptors with all relevant functions.
Before actually establishing a connection, certain aspects of socket behavior can be configured. In particular, it is important to set up proper callback routines (see section 3-2-2) or you will be unable to do anything useful with the socket! Sockets can also be set to be blocking (applying only to write operations occurring when the write buffer is full), and a write timeout can be set, causing the socket to be automatically disconnected if a certain amount of time passes without being able to send any data to the remote host. All configuration routines can be called regardless of whether the socket is connected or not.
When a socket object is no longer needed, it should be destroyed in order to free resources used by the socket. If a connected socket is destroyed, it is automatically disconnected first.
The following routines are used for creating, configuring, and destroying sockets:
There are two other routines which operate on the socket subsystem as a whole, rather than on a particular socket:
As mentioned above, the socket subsystem makes use of callbacks for socket activity processing. Rather than having the caller code explicitly read from the socket, potentially blocking if no data is available, a polling routine (discussed in section 3-6) monitors all sockets for activity, "calling back" to the caller when certain events occur. The functions to be called for each event can be set independently for each socket, using the aforementioned sock_setcb() routine; events to which no function is assigned (the default state) are ignored. Two parameters are passed to the callback function: a Socket * parameter indicating the socket on which the event occurred, and a void * parameter whose meaning depends on the particular callback (some callbacks do not use it at all). The exact callback function signature is:
This type (a pointer to it, rather) is defined as SocketCallback in sockets.h.
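The per-event callback table described above can be sketched as follows. This is only an illustration under stated assumptions: DemoSocket, DemoCallback, the DEMO_CB_* constants, demo_setcb(), and demo_fire() are hypothetical stand-ins rather than the definitions in sockets.h, and the void return type of the callback is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real Socket type, SocketCallback type, and
 * SCB_* constants are defined in sockets.h and may differ. */
typedef struct DemoSocket DemoSocket;
typedef void (*DemoCallback)(DemoSocket *s, void *param);

enum { DEMO_CB_CONNECT, DEMO_CB_DISCONNECT, DEMO_CB_READ, DEMO_CB_COUNT };

struct DemoSocket {
    DemoCallback cb[DEMO_CB_COUNT];  /* NULL means "ignore this event" */
    int events_seen;                 /* used only by the example handler */
};

/* Assign (or clear, with NULL) the handler for one event.
 * Returns 0 on success, -1 on a bad argument. */
int demo_setcb(DemoSocket *s, int which, DemoCallback fn)
{
    if (s == NULL || which < 0 || which >= DEMO_CB_COUNT)
        return -1;
    s->cb[which] = fn;
    return 0;
}

/* Deliver an event; events with no handler assigned are ignored. */
void demo_fire(DemoSocket *s, int which, void *param)
{
    assert(s != NULL && which >= 0 && which < DEMO_CB_COUNT);
    if (s->cb[which] != NULL)
        s->cb[which](s, param);
}

/* Example handler: counts the events it receives and ignores param. */
void demo_count_event(DemoSocket *s, void *param)
{
    (void)param;
    s->events_seen++;
}
```

A caller would register demo_count_event for DEMO_CB_READ and leave the other slots empty; firing DEMO_CB_CONNECT then does nothing, matching the "ignored by default" behavior described above.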
The following events (listed by the name of the constant used as the which parameter to sock_setcb()) can have callback functions attached to them:
Internally, callback functions are called using the local do_callback() routine. This routine sets the SF_CALLBACK flag on the socket while the callback is in progress, to ensure that the socket is not destroyed while it is still in use. The routine also checks after the callback returns whether any flags were set indicating that the connection was broken (SF_BROKEN) or that the socket should be destroyed (SF_DELETEME) or disconnected locally (SF_DISCONNECT). The function returns 0 if the socket was disconnected, 1 otherwise. It also returns 1 if either the socket or the callback is unspecified (NULL or zero), to simplify the caller's logic.
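The flag handling around a callback can be sketched roughly as below. The DEMO_SF_* values, the DemoSock structure, and the "teardown" step are hypothetical; in particular, treating all three post-callback flags as "the socket was disconnected" is an assumption made for the sketch, not a statement about the real do_callback().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values; the real SF_* constants are in the source. */
enum {
    DEMO_SF_CALLBACK   = 1 << 0,  /* a callback is currently running */
    DEMO_SF_BROKEN     = 1 << 1,  /* connection broke during the callback */
    DEMO_SF_DELETEME   = 1 << 2,  /* destruction requested during the callback */
    DEMO_SF_DISCONNECT = 1 << 3   /* local disconnect requested */
};

typedef struct DemoSock { unsigned flags; int connected; } DemoSock;
typedef void (*DemoCb)(DemoSock *s, void *param);

/* Returns 0 if the socket ended up disconnected, 1 otherwise (including
 * the case where the socket or the callback is NULL). */
int demo_do_callback(DemoSock *s, DemoCb cb, void *param)
{
    const unsigned deferred =
        DEMO_SF_BROKEN | DEMO_SF_DELETEME | DEMO_SF_DISCONNECT;

    if (s == NULL || cb == NULL)
        return 1;
    s->flags |= DEMO_SF_CALLBACK;            /* protect the socket */
    cb(s, param);
    s->flags &= ~(unsigned)DEMO_SF_CALLBACK;
    if (s->flags & deferred) {
        s->flags &= ~deferred;
        s->connected = 0;                    /* stand-in for the real teardown */
        return 0;
    }
    return 1;
}

/* Example callbacks. */
void demo_noop(DemoSock *s, void *param)
{
    (void)param;
    assert(s->flags & DEMO_SF_CALLBACK);     /* flag is visible while running */
}

void demo_request_disconnect(DemoSock *s, void *param)
{
    (void)param;
    s->flags |= DEMO_SF_DISCONNECT;          /* acted on after we return */
}
```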
From the point of view of any particular host, there are two general ways in which a connection to a remote host can be established: either by actively connecting to the remote host (outgoing), or by waiting for the remote host to request a connection (incoming). The socket subsystem supports both of these, the former via the conn() routine and the latter via listener sockets. Outgoing connections are handled asynchronously, like all other socket operations, and the SCB_CONNECT callback is used to inform the caller when the connection has completed.
Once a connection has been established, it can also be broken at either the local or the remote side. The socket subsystem provides the disconn() routine for deliberately closing a connection, and notifies the caller of a connection closed by the remote host through the SCB_DISCONNECT callback.
To establish a connection to a remote host, the caller should first create a socket, configure it as appropriate (including, at minimum, the SCB_CONNECT and SCB_DISCONNECT callbacks), then call the conn() routine:
This routine attempts to establish a TCP connection to the host specified by host on TCP port number port. The host may be specified either as a hostname, which is passed to the system's gethostbyname() address lookup function (if the call returns multiple addresses, the first one returned by the system will be used), or as a numeric IPv4 address, which is parsed directly. The address to be used for the local side of the socket can also be specified with the lhost and lport parameters; values of NULL and zero, respectively, leave the choice of the corresponding parameter to the system.
If an error is encountered in setting up the connection, such as an invalid host name or port number, the conn() routine returns -1 and sets errno appropriately; the value in errno may be negative, indicating a hostname resolution failure (pass the negative of this value to hstrerror() to obtain an error message string). Otherwise, the routine returns zero, signifying that the connection is in progress.
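A caller that wants a printable message for either kind of failure might use a small helper along these lines. The helper name demo_conn_strerror() is hypothetical; it relies on hstrerror() from <netdb.h>, which glibc and the BSDs provide although current POSIX no longer lists it.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE  /* asks glibc to declare hstrerror() */
#endif
#include <assert.h>
#include <errno.h>
#include <netdb.h>
#include <string.h>

/* Map an errno value left by a failed connection attempt to a message.
 * Negative values are negated resolver (h_errno-style) codes, as the text
 * describes; positive values are ordinary errno codes. */
const char *demo_conn_strerror(int err)
{
    assert(err != 0);            /* only meaningful after a failure */
    if (err < 0)
        return hstrerror(-err);  /* hostname resolution failure */
    return strerror(err);        /* ordinary system error */
}
```

After a failed call, the caller would pass the saved errno value to this helper before making any other library call that could overwrite errno.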
When the connection completes, the socket subsystem calls the socket's SCB_CONNECT callback function to notify the caller that the connection is ready for use. Alternatively, the connection may be refused by the remote host, in which case the socket's SCB_DISCONNECT callback is called with a parameter of DISCONN_CONNFAIL.
Note that when a connection is made to the local host or a sufficiently close remote host, the connection may complete or be rejected immediately. If the connection is rejected, conn() will return an error, but if it is accepted, the SCB_CONNECT callback will be called immediately, before conn() returns; the caller must therefore perform all necessary setup for the callback function before calling conn().
Internally, conn() sets the SF_CONNECTING flag on the socket while the connection is being processed by the system; the socket polling routine then watches for the connection to complete or fail, setting the SF_CONNECTED flag in the former case and calling the appropriate callback. If the connection completes immediately, conn() itself sets the SF_CONNECTED flag.
In order to accept connections from remote hosts, a listener socket, which "listens" for connections on a specified port, must first be created. This is done with the open_listener() routine:
This function turns the given socket into a listener socket which will accept connections on the given TCP port. If host is not NULL, connections will only be accepted on the corresponding IP address (processed the same way as for conn()—note in particular that if the hostname has multiple addresses associated with it, only one will be used). The backlog value is passed directly to the system's listen() function, and indicates how many connections the system should allow to be pending (recognized by the system but not yet accepted by the program).
In order to actually accept connections, a function must be assigned to the socket's SCB_ACCEPT callback (if no function is assigned, any connections received by the socket will be dropped immediately). The void * parameter passed to this function is a new socket object that has been created for the connection, already connected to the remote host. The new socket is initialized in the same manner as sockets created with sock_new(), so the accept callback will need to configure the socket appropriately (setting callback functions and so on). The new socket can be used in the same manner as sockets explicitly created with sock_new(), except that the socket will be automatically destroyed when disconnected (thus the caller must be careful not to continue using the socket after the connection is closed).
Internally, the SF_LISTENER flag is used to mark an active listener socket. When a read event occurs on such a socket, the do_accept() internal routine accepts the connection, creates a new socket object (with the SF_SELFCREATED flag set, to indicate that the socket was created internally rather than by an external sock_new() call and should be destroyed upon disconnection), sets up the new socket with the accepted connection, and calls the listener socket's SCB_ACCEPT callback function.
When a connection is no longer needed, the disconn() routine can be called to disconnect a connected socket from the remote host:
This routine first flushes the socket's write buffer if any unsent data remains, then calls the socket's SCB_DISCONNECT callback with a parameter of DISCONN_LOCAL. The return value is zero on success, -1 on error (such as an invalid or listener socket). If there is unsent data remaining in the write buffer and it cannot be immediately sent to the remote host, however, then disconn() will return successfully without calling the SCB_DISCONNECT callback; in this case, the callback will be called by the socket polling routine once all the data has been sent.
A connection may also be closed by the remote host; in this case, the socket subsystem will automatically return the socket to an unconnected state, calling the SCB_DISCONNECT callback with a parameter of DISCONN_REMOTE. In this case, there is no point in attempting to send any data remaining in the write buffer, and it is simply discarded. If the disconnection is detected while a socket callback is being processed (for example, when an attempt to send data to the remote host fails), the disconnection will be postponed until the callback completes, so callback functions do not need to worry about checking the socket's status after every send operation.
Listener sockets cannot be closed with disconn(); a separate function, close_listener(), is used to stop the socket from accepting new connections and return it to an unused (unconnected) state.
Internally, disconnection is handled by the do_disconn() routine, which is called for all types of disconnections—local, remote, and connection failures—with the code appropriate to the disconnection reason. The code may be bitwise OR'd with DISCONN_RESUME_FLAG, a local flag (masked out before calling the disconnect callback) indicating that the call to do_disconn() is being made to continue a disconnection in progress. The routine performs the following operations:
Note that the values used for the disconnection codes do not include zero; this is to avoid unexpected consequences when the value is converted to a pointer for use as the void * argument to the callback function. (Theoretically, conversions both ways should handle the zero and NULL values appropriately, but there's always the possibility of a broken compiler . . .)
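The conversion between a disconnection code and the void * callback argument can be illustrated as follows. The numeric values and the DEMO_ names are hypothetical (the real DISCONN_* values are defined in the source); going through intptr_t is simply one common way to make the round trip without compiler warnings.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical nonzero codes, so that none of them maps to NULL. */
enum DemoDisconnCode {
    DEMO_DISCONN_LOCAL    = 1,
    DEMO_DISCONN_REMOTE   = 2,
    DEMO_DISCONN_CONNFAIL = 3
};

/* Pack a code into the void * argument handed to the callback. */
void *demo_code_to_param(int code)
{
    assert(code != 0);  /* zero would be indistinguishable from "no value" */
    return (void *)(intptr_t)code;
}

/* Recover the code inside the callback. */
int demo_param_to_code(void *param)
{
    return (int)(intptr_t)param;
}
```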
The socket subsystem includes several routines for sending data to and receiving data from remote systems, which more or less mimic the standard system and library functions for reading and writing data. These routines never block, however; send operations store the given data in the socket's write buffer and return immediately (with the exception discussed under sock_set_blocking() in section 3-2-1), while receive operations return an end-of-file condition if there is not enough data in the read buffer to satisfy the operation.
There are two main families of data sending routines: string-based (stdio-like) routines and buffer-based routines. The interfaces are as follows:
The first three functions perform the actual writing using the internal buffered_write() routine. This function first resets the timestamp used for detecting send timeouts if there is no data waiting to be sent (either in the write buffer or in mapped buffers, as described below), then copies the caller's data into the socket's write buffer up to the current buffer size and calls the flush_write_buffer() routine to send out any buffered data that can be sent without blocking; these two steps are repeated until all of the caller's data has been buffered (and possibly sent).
The flush_write_buffer() routine, in turn, first checks for mapped write buffers and write triggers, as described below, then calls the system's send() function to send a single contiguous block of data to the remote host. The data actually sent (which may be none at all, if the system is not ready to accept any more data for the socket) is removed from the write buffer, and the number of bytes sent is returned. The routine returns -1 on system error (or invalid parameter), and -2 if the socket was in the middle of a disconnect and there is no more data to send. flush_write_buffer() also manages the set of file descriptors to watch for write-ready events, used in the polling routine, and shrinks the socket's buffers if a send operation removes all pending data from the write buffer.
If an attempt to flush the write buffer fails when the buffer is full, buffered_write() attempts to expand the buffer. This is done by calling resize_how_much() to find out how much the buffer should be expanded by (the current implementation uses a constant 10%, rounded up to the next multiple of SOCK_MIN_BUFSIZE), and then performing the actual resize operation. If resize_how_much() returns zero, meaning that trying to expand the buffer would exceed the per-connection or total buffer size limit, or if the attempt to resize the buffer fails, then buffered_write() either returns an EAGAIN error or blocks until some buffer space can be freed, depending on whether the socket has been set blocking via sock_set_blocking() or not.
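The growth rule can be sketched as below. This is one reading of "10%, rounded up to the next multiple of SOCK_MIN_BUFSIZE": the value 4096 for DEMO_MIN_BUFSIZE is an assumption, a single limit stands in for the separate per-connection and total limits, and demo_resize_how_much() is a hypothetical name, not the real routine.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MIN_BUFSIZE 4096u  /* assumed value; stands in for SOCK_MIN_BUFSIZE */

/* Return how many bytes to grow a buffer of `current` bytes by, or 0 if
 * growing would push it past `limit`. */
size_t demo_resize_how_much(size_t current, size_t limit)
{
    size_t want, units, grow;

    assert(current <= limit);
    want  = current / 10;                                    /* 10% of current size */
    units = (want + DEMO_MIN_BUFSIZE - 1) / DEMO_MIN_BUFSIZE; /* round up */
    if (units == 0)
        units = 1;                                           /* always grow by one unit */
    grow = units * DEMO_MIN_BUFSIZE;
    if (grow > limit - current)
        return 0;                                            /* would exceed the limit */
    return grow;
}
```

For example, a 100000-byte buffer asks for 10000 more bytes, which rounds up to three units of 4096, or 12288 bytes.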
In order to avoid the overhead of moving data around on every send operation, the write buffer is used circularly, through the use of four pointers in the Socket structure:
The wbuf and wtop pointers remain constant (except for changes in the location or size of the buffer itself) for the life of the socket, while the wptr and wend pointers advance circularly through the buffer space as data is added and removed. The amount of data in the buffer can be computed as wend - wptr, modulo the buffer size; note that this difference will be negative if wend has wrapped around to the beginning of the buffer but wptr has not, so the buffer size (wtop - wbuf) must be added to the result, as is done in the write_buffer_len() function. Thus an empty buffer is indicated by wend == wptr, while a full buffer is indicated by wend == wptr-1 (again, modulo the buffer size), leaving a one-byte pad to avoid a full buffer being mistakenly treated as an empty one.
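The pointer arithmetic can be illustrated with a small stand-alone ring buffer. The field names follow the text, but DemoWriteBuf and the demo_ functions are hypothetical and handle one byte at a time for clarity, which the real code has no reason to do.

```c
#include <assert.h>
#include <stddef.h>

typedef struct DemoWriteBuf {
    char *wbuf;  /* start of the buffer space */
    char *wptr;  /* next byte to be sent */
    char *wend;  /* where the next byte will be stored */
    char *wtop;  /* one past the end of the buffer space */
} DemoWriteBuf;

/* Bytes currently buffered: wend - wptr, modulo the buffer size. */
size_t demo_write_buffer_len(const DemoWriteBuf *b)
{
    ptrdiff_t len;

    assert(b->wtop > b->wbuf);
    len = b->wend - b->wptr;
    if (len < 0)                   /* wend has wrapped, wptr has not */
        len += b->wtop - b->wbuf;
    return (size_t)len;
}

/* Append one byte. Returns 0 if the buffer is full (one byte of the
 * space is always left unused), 1 on success. */
int demo_put_byte(DemoWriteBuf *b, char c)
{
    size_t size = (size_t)(b->wtop - b->wbuf);

    if (demo_write_buffer_len(b) == size - 1)
        return 0;
    *b->wend++ = c;
    if (b->wend == b->wtop)
        b->wend = b->wbuf;         /* wrap around */
    return 1;
}

/* Remove one byte. Returns 0 if the buffer is empty, 1 on success. */
int demo_take_byte(DemoWriteBuf *b, char *out)
{
    if (b->wptr == b->wend)
        return 0;
    *out = *b->wptr++;
    if (b->wptr == b->wtop)
        b->wptr = b->wbuf;         /* wrap around */
    return 1;
}
```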
The last two functions for sending data, swritemap() and swrite_trigger(), record their data in the write-map list, a singly-linked list of structures containing information on mapped buffers and write triggers. The structure is struct wmapinfo, defined within the Socket structure definition in sockets.c, and contains the following fields:
The head of the list is stored in the writemap field of the socket object; the tail of the list is also recorded in the writemap_tail field for efficiency reasons. If the list is empty, both fields are NULL.
When swritemap() or swrite_trigger() is called, a new structure is created and appended to the list, with the map and maplen fields set to the buffer pointer and length (or the trigger data and zero), the pos field set to zero, and the wait field set to the current length of the write buffer. The flush_write_buffer() routine then checks at the beginning of each call whether the first write-map structure (if one exists) is ready for processing (has a wait value of zero). If so, data is sent from the mapped buffer instead of the socket's write buffer; in the case of a write trigger, the SF_WTRIGGER flag is set, to cause the polling routine to call the socket's write trigger callback function and to prevent any further buffer flushes from occurring before then. Otherwise, data is sent from the socket's write buffer as usual, and every write-map structure's wait field is decremented by the number of bytes sent. Implementation note: This roundabout method is a result of adding the swritemap() and swrite_trigger() routines after the initial socket subsystem design was complete (in fact, Services does not use them at all; I added them for my HTTP server). A more intelligent design would use a write-map or similar structure for every block of data to be sent, possibly pointing into a common buffer like the current write buffer.
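The bookkeeping on the wait field can be sketched like this. The structure mirrors the fields named in the text, but struct demo_wmapinfo, DemoMapList, and the demo_wmap_ functions are hypothetical; the sketch clamps wait at zero, and it leaves out the actual sending of mapped data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical mirror of struct wmapinfo; field names follow the text. */
struct demo_wmapinfo {
    struct demo_wmapinfo *next;
    const char *map;  /* mapped buffer, or trigger data */
    size_t maplen;    /* length of the mapped buffer; zero for a trigger */
    size_t pos;       /* bytes of the mapped buffer already sent */
    size_t wait;      /* write-buffer bytes that must go out first */
};

typedef struct DemoMapList {
    struct demo_wmapinfo *head, *tail;  /* both NULL when the list is empty */
} DemoMapList;

/* Append an entry; wbuf_len is the current length of the write buffer.
 * Returns 0 on success, -1 if memory could not be allocated. */
int demo_wmap_append(DemoMapList *l, const char *map, size_t maplen,
                     size_t wbuf_len)
{
    struct demo_wmapinfo *info;

    assert(l != NULL);
    info = calloc(1, sizeof(*info));  /* pos and next start out zero/NULL */
    if (info == NULL)
        return -1;
    info->map = map;
    info->maplen = maplen;
    info->wait = wbuf_len;
    if (l->tail != NULL)
        l->tail->next = info;
    else
        l->head = info;
    l->tail = info;
    return 0;
}

/* Record that nsent bytes left the ordinary write buffer. */
void demo_wmap_note_sent(DemoMapList *l, size_t nsent)
{
    struct demo_wmapinfo *p;

    for (p = l->head; p != NULL; p = p->next)
        p->wait = (p->wait > nsent) ? p->wait - nsent : 0;
}

/* True when the first entry should be processed before buffer data. */
int demo_wmap_head_ready(const DemoMapList *l)
{
    return l->head != NULL && l->head->wait == 0;
}

/* Release every entry and reset the list to empty. */
void demo_wmap_free(DemoMapList *l)
{
    while (l->head != NULL) {
        struct demo_wmapinfo *next = l->head->next;
        free(l->head);
        l->head = next;
    }
    l->tail = NULL;
}
```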
Like send operations, there are two main groups of receive routines, string- (or character-) based and buffer-based:
All four of these functions operate on the socket's read buffer. This buffer is filled by the socket polling routine (see section 3-6) when data has been received by the system and is available for reading from the socket. The routines are intended to be called from the SCB_READ or SCB_READLINE callbacks, which are called by the polling routine when data is available.
The use of two distinct callbacks for reading data is to facilitate the processing of both binary and textual data. When data has been received on the socket's connection and stored in the buffer, the socket subsystem first calls the SCB_READ callback function, passing the number of bytes available for reading (an integer value, cast to void * for the call). If the callback function leaves some data in the buffer (or no function is assigned), and if at least one newline character is present in the buffer, the SCB_READLINE callback function is then called, again with the number of bytes available for reading. If there is still data left in the buffer, both callbacks are called again in order, repeating until either all data has been consumed or the SCB_READ/SCB_READLINE pair does not read any data from the socket's buffer.
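The alternation between the two read callbacks can be sketched as a loop like the one below. To keep it short, the sketch uses a flat buffer rather than the circular one, and passes the byte count as a size_t instead of casting it to void *; DemoReadBuf and the demo_ functions are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct DemoReadBuf {
    char data[256];
    size_t len;        /* bytes currently buffered */
    int lines_seen;    /* used only by the example callback */
} DemoReadBuf;

typedef void (*DemoReadCb)(DemoReadBuf *b, size_t avail);

/* Remove the first n bytes from the buffer. */
void demo_consume(DemoReadBuf *b, size_t n)
{
    assert(n <= b->len);
    memmove(b->data, b->data + n, b->len - n);
    b->len -= n;
}

/* Call the "read" callback, then the "readline" callback if data remains
 * and contains a newline; repeat until the buffer is empty or a full
 * pass consumes nothing. */
void demo_dispatch_reads(DemoReadBuf *b, DemoReadCb on_read,
                         DemoReadCb on_readline)
{
    while (b->len > 0) {
        size_t before = b->len;

        if (on_read != NULL)
            on_read(b, b->len);
        if (b->len > 0 && on_readline != NULL
            && memchr(b->data, '\n', b->len) != NULL)
            on_readline(b, b->len);
        if (b->len == before)
            break;     /* nobody consumed anything; wait for more data */
    }
}

/* Example line callback: consumes exactly one line per call. */
void demo_take_line(DemoReadBuf *b, size_t avail)
{
    char *nl = memchr(b->data, '\n', avail);

    if (nl != NULL) {
        demo_consume(b, (size_t)(nl - b->data) + 1);
        b->lines_seen++;
    }
}
```

With no read callback assigned and a buffer holding two complete lines followed by a partial one, the line callback runs twice and the partial line stays in the buffer until more data arrives.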
The read buffer is handled in the same manner as the write buffer; four fields (rbuf, rptr, rend, and rtop) are used to manage data insertion and removal, and like the write buffer, the read buffer is used circularly to avoid overhead from moving data around.
If the caller does not want to receive socket events for a certain socket, the socket can be muted. The socket subsystem will not attempt to receive any data from the remote host for a muted socket, and will not call the SCB_READ or SCB_READLINE callbacks. Listener sockets can also be muted, causing the socket subsystem to not accept any connections from remote hosts or call the SCB_ACCEPT callback (connection attempts will be left waiting in the operating system's queue). Note that muting a socket does not have any effect on the operating system's low-level data processing; if the OS automatically accepts connection attempts at the protocol level, for example, the remote host will still see the connection established. Also note that a socket in the process of connecting or disconnecting will still call the SCB_CONNECT or SCB_DISCONNECT callback when the operation completes, and the write buffer will still be flushed normally, causing the SCB_TRIGGER callback to be called if a write trigger is encountered.
The functions for muting and unmuting sockets are:
The following routines can be used to retrieve information on sockets:
The final socket interface routine, check_sockets(), is the workhorse of the socket subsystem. The routine waits for activity to occur, reading data from any sockets on which new data has been received, flushing the write buffer when a socket becomes ready for sending more data, and accepting connections from listener sockets on which a new connection request is received. If a timeout has been set with sock_set_rto() (see section 3-2-1), the routine returns control to the caller if no activity occurs within that time (the name "rto", for "read timeout", is something of a misnomer, since it applies to all types of activity: read-ready, write-ready, and new connection).
When called, check_sockets() first sets up the file descriptor sets and timeout value used in the select() system call (the file descriptor sets are actually initialized and modified by other routines as appropriate; check_sockets() makes a copy of them so the originals are not modified by select()). The timeout is ordinarily the value given to sock_set_rto(), but if one or more sockets with a write timeout set has data pending in the write buffer, the timeout is reduced to the time until the earliest timeout (with second resolution). Then select() is invoked; signals are enabled only for the duration of the select() call, and disabled immediately after select() returns. (It is assumed that signals are disabled when check_sockets() is called; init_signals() disables signals before it returns, so this prerequisite is fulfilled.) The select() call is made in a loop to ensure that a received signal is not interpreted as an error.
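The copy-then-select pattern and the retry on interruption can be sketched as follows. The function names are hypothetical, the signal enabling and disabling is omitted, and restarting with the full timeout after an interruption is a simplification of the sketch rather than a description of check_sockets().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Pick the select() timeout in seconds: the general timeout, shortened if
 * a write timeout is due sooner. A negative secs_until_write_timeout
 * means that no write timeout is pending. */
long demo_poll_timeout(long rto_sec, long secs_until_write_timeout)
{
    if (secs_until_write_timeout >= 0 && secs_until_write_timeout < rto_sec)
        return secs_until_write_timeout;
    return rto_sec;
}

/* Wait for activity on the descriptors in the master sets. The masters
 * are copied into the output sets before every select() call, so
 * select() never modifies them. Returns what select() returns. */
int demo_wait_for_activity(const fd_set *read_master,
                           const fd_set *write_master,
                           int maxfd, long timeout_sec,
                           fd_set *read_out, fd_set *write_out)
{
    int n;

    assert(read_master != NULL && write_master != NULL);
    assert(read_out != NULL && write_out != NULL);
    do {
        struct timeval tv;

        *read_out = *read_master;    /* work on copies */
        *write_out = *write_master;
        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;
        n = select(maxfd + 1, read_out, write_out, NULL, &tv);
    } while (n < 0 && errno == EINTR);  /* a signal is not an error */
    return n;
}
```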
After select() returns (and if it does not return an error), check_sockets() then loops through each file descriptor used in the select() call. The Socket structure corresponding to each file descriptor is found from the sockets[] array, maintained separately from the linked list of sockets for this purpose.
A write-ready event on a socket indicates either that a deferred connection has completed or failed, or that a connected socket is ready to accept data for sending. In the latter case, check_sockets() simply calls flush_write_buffer() to send data out; flush_write_buffer() takes care of removing the file descriptor from the set to check for write-ready events if all data is flushed from the socket's write buffer. In the former case, check_sockets() retrieves the value of the socket's SO_ERROR option, which indicates the status of the connection attempt. A value of zero indicates success, while nonzero is an errno-style error number; the appropriate callback (SCB_CONNECT or SCB_DISCONNECT) is called, the socket's descriptor is removed from the write-ready set, and if the connection was successful and the socket is not muted, the descriptor is then added to the read-ready set so that it is checked on the next call to check_sockets().
After processing any write-ready event for a socket, the socket is checked for write trigger events as indicated by the SF_WTRIGGER socket flag, and if the flag is set, the SCB_TRIGGER callback is called repeatedly until the flag is no longer set. (The flag is cleared before each call to the callback function, but the callback function may set a new write trigger which is triggered before the function returns.)
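Retrieving the outcome of a deferred connection uses the standard getsockopt() call with SO_ERROR. A minimal sketch follows; the wrapper name demo_connect_status() is hypothetical, and returning errno when getsockopt() itself fails is a choice made for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>

/* Return the pending error on a socket descriptor: 0 if the deferred
 * connection succeeded, otherwise an errno-style code. If getsockopt()
 * itself fails, its errno value is returned instead. */
int demo_connect_status(int fd)
{
    int err = 0;
    socklen_t len = sizeof(err);

    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return errno;
    assert(len == sizeof(err));  /* SO_ERROR is an int-sized option */
    return err;
}
```

The polling code would call this once the descriptor is reported write-ready, then choose between the connect and disconnect callbacks based on whether the result is zero.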
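The clear-then-call loop for write triggers can be sketched as follows. DEMO_SF_WTRIGGER, DemoTrigSock, and the demo_ functions are hypothetical; the example callback re-arms the flag directly, standing in for a callback that queues a new trigger which becomes ready before it returns.

```c
#include <assert.h>
#include <stddef.h>

enum { DEMO_SF_WTRIGGER = 1 << 0 };  /* hypothetical flag value */

typedef struct DemoTrigSock {
    unsigned flags;
    int pending;  /* how many more triggers the example callback will arm */
    int fired;    /* how many times the callback has run */
} DemoTrigSock;

typedef void (*DemoTrigCb)(DemoTrigSock *s, void *data);

/* Call the trigger callback until the flag stays clear. The flag is
 * cleared before each call so the callback can set it again. */
void demo_run_write_triggers(DemoTrigSock *s, DemoTrigCb cb, void *data)
{
    assert(s != NULL);
    while (s->flags & DEMO_SF_WTRIGGER) {
        s->flags &= ~(unsigned)DEMO_SF_WTRIGGER;
        if (cb != NULL)
            cb(s, data);
    }
}

/* Example callback: re-arms the trigger while work remains. */
void demo_chain_trigger(DemoTrigSock *s, void *data)
{
    (void)data;
    s->fired++;
    if (s->pending > 0) {
        s->pending--;
        s->flags |= DEMO_SF_WTRIGGER;
    }
}
```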
A read-ready event on a socket indicates either that a connected socket has received data or a disconnection event, or that a listener socket received a connection request. The latter case is simple; check_sockets() calls do_accept() to accept the connection, then proceeds to the next file descriptor. In the former case, the read buffer is first expanded if it is currently full, then fill_read_buffer() is called to actually receive data from the socket into the read buffer. fill_read_buffer() returns the number of bytes read from the socket, or -1 on error, in which case the socket is disconnected with the DISCONN_REMOTE code. Implementation note: As documented in the code, if data arrives on a connection but the socket's read buffer is full and has reached the per-socket or total buffer size limit, the data will be left alone, causing select() to return immediately the next time it is called; this results in the program "busy-waiting" until either data is removed from the read buffer or space is made available to expand the buffer.
After reading data from the socket, check_sockets() calls the SCB_READ and SCB_READLINE callbacks in turn, as described in section 3-4-2. Even if the socket was not returned in the read-ready set from select(), the callbacks are still called if the SF_UNMUTED flag is set; this flag is set by sock_unmute() to indicate that a socket has just been unmuted, and cleared by check_sockets() before calling the read callbacks.
Finally, check_sockets() checks whether a write timeout has occurred on sockets that have a timeout set. If so, the socket is disconnected with the DISCONN_REMOTE code, on the assumption that the remote host is no longer reachable.