IRC Services Technical Reference Manual

3. Communication (socket) handling

3-1. Overview
3-2. Creating, configuring, and destroying sockets
    3-2-1. Interface routines
    3-2-2. Socket callbacks
3-3. Establishing and breaking connections
    3-3-1. Outgoing connections
    3-3-2. Incoming connections
    3-3-3. Disconnecting
3-4. Sending and receiving data
    3-4-1. Sending data
    3-4-2. Receiving data
    3-4-3. Muting sockets
3-5. Retrieving socket information
3-6. The socket polling routine


3-1. Overview

In Services, all network communication is performed through the socket subsystem, defined in the files sockets.c and sockets.h. This subsystem provides routines for managing sockets, which have a type of Socket *, connecting to or accepting connections from the network, and sending and receiving data.

All network communication in Services is performed asynchronously. Send (write) operations buffer the data given and return immediately; receive (read) operations are performed via callbacks, functions called by the central polling routine when particular events occur, and the actual reading routines operate on a read buffer which is filled by the polling routine as data is received from the remote host. The polling routine itself is called by the Services main loop.

The socket subsystem is designed so that it can be used in other programs as well with minimal changes. (In fact, I currently use it in the HTTP server I wrote for my personal domain.) Note, however, that only TCP connections over IPv4 are supported. The socket subsystem uses the following functions from other Services source files, so replacements or stubs will need to be provided: log(), log_perror(), log_debug(), log_perror_debug(), pack_ip().

Below, section 3-2 discusses the creation and management of socket objects, including a list of callbacks; section 3-3 discusses connecting to (and listening for connections from) remote hosts; section 3-4 discusses sending and receiving data; section 3-5 discusses functions for retrieving information about sockets; and section 3-6 discusses the socket polling routine in detail.

All socket routines which can fail return a valid error code in the system variable errno on failure. The error codes returned are generally the same as those returned by the equivalent system calls or standard library functions; special note of particular error codes is included where appropriate.

There are a few preprocessor constants and macros used in the source that are worth mentioning:

SOCK_MIN_BUFSIZE: Sets both the minimum socket buffer size and the increment by which the buffer size is increased when necessary. Defined in sockets.h.
WARN_ON_BUFSIZE: If defined, causes a warning message to be logged when a socket's read or write buffer cannot be expanded due to the per-connection or total buffer size limit (as set by sock_set_buflimits()). Only one warning is logged per socket. Optionally defined in sockets.c.
DISABLE_SWRITEMAP: If defined, disables the swritemap() function (see section 3-4), removing a dependency on the munmap() system call and the sys/mman.h header file. Optionally defined in sockets.c.
TRACE_CALLS: If defined, enables tracing of function calls (entry and exit) via log_debug(). Optionally defined in sockets.c.
ENTER(fmt,...), ENTER_WITH(retfmt, fmt,...): Used in function call tracing to log entry to a function, along with the values of any parameters. fmt is a printf()-style format string (which must be a string literal) for formatting the function parameter values (passed as extra parameters to the macro), and should cause each parameter to be written in an appropriate format separated by commas. ENTER_WITH() is used for functions which return a value, and takes an extra parameter, retfmt, which is the format string to use to display the return value, such as "%d" or "%p" (the parameter is passed here to avoid having to include it at every return point). These macros do nothing if function call tracing is not enabled. Defined in sockets.c.
RETURN, RETURN_WITH(val): Used in function call tracing to log exit from a function (and perform the return as well). RETURN is used for void functions, while RETURN_WITH() is used for functions that return a value. If function call tracing is not enabled, these macros simply perform a return. Defined in sockets.c.
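To make the tracing macros more concrete, here is a minimal sketch of how ENTER_WITH() and RETURN_WITH() could be written. This is not the code from sockets.c: log_debug_stub(), trace_log, trace_retfmt_, and add_traced() are hypothetical names invented for the example, and the log output format is an assumption.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define TRACE_CALLS 1           /* remove this line to disable tracing */

static char trace_log[256];     /* hypothetical stand-in for the debug log */

/* Hypothetical stand-in for log_debug(): appends to trace_log. */
static void log_debug_stub(const char *fmt, ...)
{
    va_list args;
    size_t used = strlen(trace_log);

    assert(used < sizeof(trace_log));
    va_start(args, fmt);
    vsnprintf(trace_log + used, sizeof(trace_log) - used, fmt, args);
    va_end(args);
}

#ifdef TRACE_CALLS
/* Must come first in the function body: declares a local holding the
 * return-value format, then logs the function name and parameters.
 * Both format arguments must be string literals so they can be pasted. */
# define ENTER_WITH(retfmt, fmt, ...) \
    const char *trace_retfmt_ = "%s -> " retfmt "\n"; \
    log_debug_stub("%s(" fmt ")\n", __func__, __VA_ARGS__)
/* Logs the return value, then returns it.  Note that val is evaluated
 * twice in this sketch, so it must not have side effects. */
# define RETURN_WITH(val) \
    do { \
        log_debug_stub(trace_retfmt_, __func__, (val)); \
        return (val); \
    } while (0)
#else
# define ENTER_WITH(retfmt, fmt, ...) ((void)0)
# define RETURN_WITH(val)             return (val)
#endif

/* Example of a traced function. */
static int add_traced(int a, int b)
{
    ENTER_WITH("%d", "%d,%d", a, b);
    RETURN_WITH(a + b);
}
```

Passing retfmt once at function entry, as the text describes, means each return point only needs to name the value being returned.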

3-2. Creating, configuring, and destroying sockets

Before any other operations can be performed, a socket object must be created, either explicitly or by accepting a connection from a listener socket (see section 3-3-2). Socket objects are the means by which the socket subsystem keeps track of sockets, and are used in place of system socket descriptors with all relevant functions.

Before actually establishing a connection, certain aspects of socket behavior can be configured. In particular, it is important to set up proper callback routines (see section 3-2-2) or you will be unable to do anything useful with the socket! Sockets can also be set to be blocking (applying only to write operations occurring when the write buffer is full), and a write timeout can be set, causing the socket to be automatically disconnected if a certain amount of time passes without being able to send any data to the remote host. All configuration routines can be called regardless of whether the socket is connected or not.

When a socket object is no longer needed, it should be destroyed in order to free resources used by the socket. If a connected socket is destroyed, it is automatically disconnected first.

3-2-1. Interface routines

The following routines are used for creating, configuring, and destroying sockets:

Socket *sock_new(): Creates and returns a new, unconnected socket object. The socket has no callbacks associated with it, and defaults to no write timeout and non-blocking mode. The socket's internal read and write buffers are each created with a size of SOCK_MIN_BUFSIZE bytes.
void sock_setcb(Socket *s, SocketCallbackID which, SocketCallback func): Sets the function for the callback event selected by which (one of the SCB_* constants) to the function func for the given socket. If func is NULL, then no function will be called for the selected callback event.
sock_set_wto(Socket *s, int seconds): Sets the write timeout for the given socket to seconds seconds. If data is buffered for sending and no data can be sent within the specified interval, the socket will automatically be disconnected, as if the remote host had closed the connection. A value of zero for seconds clears the timeout, allowing data to be buffered indefinitely.
sock_set_blocking(Socket *s, int blocking): Sets whether write operations should block or return an error (EAGAIN) if no buffer space is available for data passed to one of the send routines and the remote host is not ready to accept more data. Note that if sufficient buffer space is available, write operations will always return immediately regardless of this setting.
void sock_free(Socket *s): Destroys the given socket, freeing all resources used by the socket. If the socket is connected to a remote host, the connection is automatically terminated, as if disconn() (see section 3-3-3) had been called.

There are two other routines which operate on the socket subsystem as a whole, rather than on a particular socket:

sock_set_buflimits(uint32 per_conn, uint32 total): Sets the maximum combined send and receive buffer size allowed for a single socket (first parameter) and for all sockets as a whole (second parameter), in bytes. Any data that arrives when one of these limits has been reached will not be received until buffer space becomes available; any send operations performed will fail if no space is available. A value of zero for either parameter disables the respective limit; any other value is rounded down to a multiple of SOCK_MIN_BUFSIZE (values smaller than SOCK_MIN_BUFSIZE are rounded up).
sock_set_rto(int msec): Sets the read timeout used by the socket polling routine, in milliseconds. If no socket activity occurs during this interval, the polling routine will return control to its caller. Zero is a valid value, and tells the polling routine to return immediately if no data has been received by the system. A negative value disables the timeout, causing the polling routine to wait indefinitely for socket activity.
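The rounding rule described for sock_set_buflimits() can be sketched as follows. The helper name round_buflimit() and the value chosen for SOCK_MIN_BUFSIZE are assumptions made for this example only; the real constant is defined in sockets.h.

```c
#include <assert.h>
#include <stdint.h>

#define SOCK_MIN_BUFSIZE 4096u  /* assumed value, for illustration only */

/* Hypothetical helper: normalizes one limit value the way the text
 * describes for the parameters of sock_set_buflimits(). */
static uint32_t round_buflimit(uint32_t limit)
{
    if (limit == 0)
        return 0;                               /* zero disables the limit */
    if (limit < SOCK_MIN_BUFSIZE)
        return SOCK_MIN_BUFSIZE;                /* too small: round up */
    return limit - limit % SOCK_MIN_BUFSIZE;    /* otherwise round down */
}
```

With the assumed 4096-byte unit, a requested limit of 10000 becomes 8192, while a requested limit of 100 becomes 4096.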

3-2-2. Socket callbacks

As mentioned above, the socket subsystem makes use of callbacks for socket activity processing. Rather than having the caller code explicitly read from the socket, potentially blocking if no data is available, a polling routine (discussed in section 3-6) monitors all sockets for activity, "calling back" to the caller when certain events occur. The functions to be called for each event can be set independently for each socket, using the aforementioned sock_setcb() routine; events to which no function is assigned (the default state) are ignored. Two parameters are passed to the callback function: a Socket * parameter indicating the socket on which the event occurred, and a void * parameter whose meaning depends on the particular callback (some callbacks do not use it at all). The exact callback function signature is:

void callback(Socket *s, void *param)

This type (a pointer to it, rather) is defined as SocketCallback in sockets.h.

The following events (listed by the name of the constant used as the which parameter to sock_setcb()) can have callback functions attached to them:

SCB_CONNECT

Called when a connection initiated by calling the conn() routine (see section 3-3-1) completes successfully. The void * parameter is not used.

SCB_DISCONNECT

Called when an existing connection is broken, either by the remote host or by calling the disconn() routine (see section 3-3-3), or when a connection initiated by calling the conn() routine fails. The void * parameter indicates the type of disconnection:

DISCONN_LOCAL: The connection was broken locally by calling the disconn() routine.
DISCONN_REMOTE: The connection was broken by the remote host.
DISCONN_CONNFAIL: The connection attempt was rejected by the remote host or otherwise failed.

For remote disconnection events (DISCONN_REMOTE and DISCONN_CONNFAIL), the global errno variable indicates the cause of the disconnection if known, zero otherwise.

SCB_ACCEPT

Called when a remote host connects to a listener socket (see section 3-3-2). The primary socket parameter to the callback is the listener socket, and the void * parameter is the newly-created socket (of type Socket *). Note that listener sockets will immediately drop all incoming connections if no function is assigned to this callback.

SCB_READ

Called when data has been received on the socket and is available for reading (see section 3-4-2). The void * parameter, cast to uint32, is the number of bytes of data available for reading.

SCB_READLINE

Called when data has been received on the socket and is available for reading, much like SCB_READ; however, this callback is only called when a full line of data (containing a newline) is available to be read. If both this callback and SCB_READ have functions assigned to them, both functions will be called in turn until there is no more data to process; see section 3-4-2 for details.

SCB_TRIGGER

Called when a write trigger is encountered on the socket. Write triggers are created using the swrite_trigger() routine, described in section 3-4-1. A write trigger causes this callback to be called when all data before the trigger has been successfully sent to the remote host, but before any data beyond the trigger has been sent. The void * parameter is the arbitrary value passed to swrite_trigger().

Internally, callback functions are called using the local do_callback() routine. This routine sets the SF_CALLBACK flag on the socket while the callback is in progress, to ensure that the socket is not destroyed while it is still in use. The routine also checks after the callback returns whether any flags were set indicating that the connection was broken (SF_BROKEN) or that the socket should be destroyed (SF_DELETEME) or disconnected locally (SF_DISCONNECT). The function returns 0 if the socket was disconnected, 1 otherwise. It also returns 1 if either the socket or the callback is unspecified (NULL or zero), to simplify the caller's logic.
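The return-value convention of do_callback() can be modeled with a small stand-alone sketch. MiniSocket, mini_do_callback(), the two sample callbacks, and all numeric flag values are hypothetical; only the flag and callback names come from the text, and the real routine also carries out the pending disconnect or destroy operation, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>

/* Callback identifiers and flag bits; the numeric values are assumed. */
enum { SCB_CONNECT, SCB_DISCONNECT, SCB_ACCEPT,
       SCB_READ, SCB_READLINE, SCB_TRIGGER, SCB_COUNT };
enum { SF_CALLBACK = 0x01, SF_BROKEN = 0x02,
       SF_DELETEME = 0x04, SF_DISCONNECT = 0x08 };

typedef struct MiniSocket MiniSocket;
typedef void (*MiniCallback)(MiniSocket *s, void *param);
struct MiniSocket {
    int flags;
    MiniCallback cb[SCB_COUNT];
};

/* Returns 0 if the socket was disconnected during the callback,
 * 1 otherwise (including when there is nothing to call). */
static int mini_do_callback(MiniSocket *s, int which, void *param)
{
    if (!s || !s->cb[which])
        return 1;
    s->flags |= SF_CALLBACK;        /* protect the socket while in use */
    s->cb[which](s, param);
    s->flags &= ~SF_CALLBACK;
    if (s->flags & (SF_BROKEN | SF_DELETEME | SF_DISCONNECT))
        return 0;                   /* real code finishes the teardown here */
    return 1;
}

/* Sample callbacks: one does nothing, one simulates a broken connection. */
static void cb_noop(MiniSocket *s, void *param)  { (void)s; (void)param; }
static void cb_break(MiniSocket *s, void *param) { (void)param; s->flags |= SF_BROKEN; }
```

Returning 1 for a missing socket or callback lets a caller write "if (!do_callback(...)) return;" without first checking whether a callback is installed.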

3-3. Establishing and breaking connections

From the point of view of any particular host, there are two general ways in which a connection to a remote host can be established: either by actively connecting to the remote host (outgoing), or by waiting for the remote host to request a connection (incoming). The socket subsystem supports both of these, the former via the conn() routine and the latter via listener sockets. Outgoing connections are handled asynchronously, like all other socket operations, and the SCB_CONNECT callback is used to inform the caller when the connection has completed.

Once a connection has been established, it can also be broken at either the local or the remote side. The socket subsystem provides the disconn() routine for deliberately closing a connection, and notifies the caller of a connection closed by the remote host through the SCB_DISCONNECT callback.

3-3-1. Outgoing connections

To establish a connection to a remote host, the caller should first create a socket, configure it as appropriate (including, at minimum, the SCB_CONNECT and SCB_DISCONNECT callbacks), then call the conn() routine:

int conn(Socket *s, const char *host, int port, const char *lhost, int lport)

This routine attempts to establish a TCP connection to the host specified by host on TCP port number port. The host may be specified as either a hostname, which is passed to the system's gethostbyname() address lookup function (if the call returns multiple addresses, the first one returned by the system will be used), or a numeric IPv4 address, which is parsed directly. The address to be used for the local side of the socket can also be specified with the lhost and lport parameters; values of NULL and zero, respectively, leave the choice of the corresponding parameter to the system.

If an error is encountered in setting up the connection, such as an invalid host name or port number, the conn() routine returns -1 and sets errno appropriately; the value in errno may be negative, indicating a hostname resolution failure (pass the negative of this value to hstrerror() to obtain an error message string). Otherwise, the routine returns zero, signifying that the connection is in progress.

When the connection completes, the socket subsystem calls the socket's SCB_CONNECT callback function to notify the caller that the connection is ready for use. Alternatively, the connection may be refused by the remote host, in which case the socket's SCB_DISCONNECT callback is called with a parameter of DISCONN_CONNFAIL.

Note that when a connection is made to the local host or a sufficiently close remote host, the connection may complete or be rejected immediately. If the connection is rejected, conn() will return an error, but if it is accepted, the SCB_CONNECT callback will be called immediately, before conn() returns; the caller must therefore perform all necessary setup for the callback function before calling conn().

Internally, conn() sets the SF_CONNECTING flag on the socket while the connection is being processed by the system; the socket polling routine then watches for the connection to complete or fail, setting the SF_CONNECTED flag in the former case and calling the appropriate callback. If the connection completes immediately, conn() itself sets the SF_CONNECTED flag.

3-3-2. Incoming connections

In order to accept connections from remote hosts, a listener socket, which "listens" for connections on a specified port, must first be created. This is done with the open_listener() routine:

int open_listener(Socket *s, const char *host, int port, int backlog)

This function turns the given socket into a listener socket which will accept connections on the given TCP port. If host is not NULL, connections will only be accepted on the corresponding IP address (processed the same way as for conn()—note in particular that if the hostname has multiple addresses associated with it, only one will be used). The backlog value is passed directly to the system's listen() function, and indicates how many connections the system should allow to be pending (recognized by the system but not yet accepted by the program).

In order to actually accept connections, a function must be assigned to the socket's SCB_ACCEPT callback (if no function is assigned, any connections received by the socket will be dropped immediately). The void * parameter passed to this function is a new socket object that has been created for the connection, already connected to the remote host. The new socket is initialized in the same manner as sockets created with sock_new(), so the accept callback will need to configure the socket appropriately (setting callback functions and so on). The new socket can be used in the same manner as sockets explicitly created with sock_new(), except that the socket will be automatically destroyed when disconnected (thus the caller must be careful not to continue using the socket after the connection is closed).

Internally, the SF_LISTENER flag is used to mark an active listener socket. When a read event occurs on such a socket, the do_accept() internal routine accepts the connection, creates a new socket object (with the SF_SELFCREATED flag set, to indicate that the socket was created internally rather than by an external sock_new() call and should be destroyed upon disconnection), sets up the new socket with the accepted connection, and calls the listener socket's SCB_ACCEPT callback function.

3-3-3. Disconnecting

When a connection is no longer needed, the disconn() routine can be called to disconnect a connected socket from the remote host:

int disconn(Socket *s)

This routine first flushes the socket's write buffer if any unsent data remains, then calls the socket's SCB_DISCONNECT callback with a parameter of DISCONN_LOCAL. The return value is zero on success, -1 on error (such as an invalid or listener socket). If there is unsent data remaining in the write buffer and it cannot be immediately sent to the remote host, however, then disconn() will return successfully without calling the SCB_DISCONNECT callback; in this case, the callback will be called by the socket polling routine once all the data has been sent.

A connection may also be closed by the remote host; in this case, the socket subsystem will automatically return the socket to an unconnected state, calling the SCB_DISCONNECT callback with a parameter of DISCONN_REMOTE. In this case, there is no point in attempting to send any data remaining in the write buffer, and it is simply discarded. If the disconnection is detected while a socket callback is being processed (for example, when an attempt to send data to the remote host fails), the disconnection will be postponed until the callback completes, so callback functions do not need to worry about checking the socket's status after every send operation.

Listener sockets cannot be closed with disconn(); a separate function, close_listener(), is used to stop the socket from accepting new connections and return it to an unused (unconnected) state.

Internally, disconnection is handled by the do_disconn() routine, which is called for all types of disconnections—local, remote, and connection failures—with the code appropriate to the disconnection reason. The code may be bitwise OR'd with DISCONN_RESUME_FLAG, a local flag (masked out before calling the disconnect callback) indicating that the call to do_disconn() is being made to continue a disconnection in progress. The routine performs the following operations:

Ensures that the socket pointer is not NULL and does not reference a listener socket, returning an error (-1 with EINVAL) in either case.
Checks whether the socket's SF_DISCONNECTING flag, indicating a disconnection operation in progress, is set. If the flag is set and the DISCONN_RESUME_FLAG flag is not set in the disconnection code, the disconnection request is ignored and success (zero) is returned.
Checks whether the socket's SF_DISCONN_REQ flag, indicating a disconnection request in progress, is set. If so, and if the disconnection code is DISCONN_LOCAL, the request is ignored. (Remote disconnects and connection failures are passed through, overriding local disconnects, so that if a remote disconnect is detected while disconn() is flushing the write buffer, for example, the disconnect callback will see the code DISCONN_REMOTE rather than DISCONN_LOCAL.)
Checks the socket's SF_CONNECTING and SF_CONNECTED flags. If neither flag is set, the socket is not connected, so there is nothing to do, and the request is ignored.
Sets the socket's SF_DISCONN_REQ flag.
Clears the socket's file descriptor from the set of descriptors to check for read status (see section 3-6).
If the disconnection code is DISCONN_LOCAL and there is unsent data in the write buffer, calls flush_write_buffer() (see section 3-4-1) to send the data out. If the data cannot be sent immediately, the socket's SF_DISCONNECT flag is set, and success is returned; do_disconn() must be called again by the socket polling routine with the DISCONN_RESUME_FLAG flag set in the code once all data has been sent.
Sets the socket's SF_DISCONNECTING flag.
Shuts down communications on the socket at the system level, by calling shutdown() and close(). (The actual closing of the file descriptor is handled by sock_closefd(), an internal routine that takes care of clearing socket object fields and file descriptor set bits appropriately.)
Clears out the socket's write map list (see section 3-4-1).
Calls the socket's SCB_DISCONNECT callback function, if one is set, passing the disconnection code with the DISCONN_RESUME_FLAG internal flag cleared.
Clears the socket's SF_DISCONNECTING flag.
If the socket's file descriptor is no longer unset (meaning that the disconnect callback function reconnected the socket), aborts further processing and returns success.
Clears the socket's SF_CONNECTING and SF_CONNECTED flags.
If the socket's SF_SELFCREATED flag, indicating a socket created by accepting a connection, or SF_DELETEME flag, indicating a delayed destroy operation, is set, destroys the socket; otherwise frees all buffer space used by the socket.
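The three "ignore the request" checks near the top of the list above can be condensed into a single predicate. This is only a sketch of one reading of those steps: disconn_should_proceed() is a hypothetical name, every numeric value is assumed, and the NULL and listener socket checks are left out. It assumes that a resumed local disconnect (DISCONN_LOCAL with DISCONN_RESUME_FLAG set) is not treated as a plain DISCONN_LOCAL request, since otherwise the resume call described above could never get through.

```c
#include <assert.h>

/* Flag and code names follow the text; all numeric values are assumed. */
enum { SF_CONNECTING = 0x01, SF_CONNECTED = 0x02,
       SF_DISCONN_REQ = 0x04, SF_DISCONNECTING = 0x08 };
enum { DISCONN_LOCAL = 1, DISCONN_REMOTE = 2, DISCONN_CONNFAIL = 3,
       DISCONN_RESUME_FLAG = 0x100 };

/* Returns 1 if processing should continue (set SF_DISCONN_REQ and so
 * on), or 0 if the request should be ignored. */
static int disconn_should_proceed(int flags, int code)
{
    /* A disconnect is already being carried out and this is not the
     * continuation of it. */
    if ((flags & SF_DISCONNECTING) && !(code & DISCONN_RESUME_FLAG))
        return 0;
    /* A request is already pending; only remote disconnects, connection
     * failures, and resumed requests may pass. */
    if ((flags & SF_DISCONN_REQ) && code == DISCONN_LOCAL)
        return 0;
    /* Not connected and not connecting: nothing to do. */
    if (!(flags & (SF_CONNECTING | SF_CONNECTED)))
        return 0;
    return 1;
}
```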

Note that the values used for the disconnection codes do not include zero; this is to avoid unexpected consequences when the value is converted to a pointer for use as the void * argument to the callback function. (Theoretically, conversions both ways should handle the zero and NULL values appropriately, but there's always the possibility of a broken compiler . . .)
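As an illustration of passing a disconnection code through the void * callback parameter, the following sketch converts a code to a pointer and back. The helper names and the numeric values of the codes are assumptions (the text only guarantees that the codes are nonzero), and the actual source may perform the cast differently.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed values; the text only says that none of the codes is zero. */
enum { DISCONN_LOCAL = 1, DISCONN_REMOTE, DISCONN_CONNFAIL };

/* Hypothetical helpers: go through intptr_t so the conversion is
 * well-defined on platforms where int and pointers differ in size. */
static void *code_to_param(int code)   { return (void *)(intptr_t)code; }
static int   param_to_code(void *param) { return (int)(intptr_t)param; }

/* What an SCB_DISCONNECT callback might do with its parameter. */
static const char *disconn_reason(void *param)
{
    switch (param_to_code(param)) {
    case DISCONN_LOCAL:    return "closed locally";
    case DISCONN_REMOTE:   return "closed by remote host";
    case DISCONN_CONNFAIL: return "connection attempt failed";
    default:               return "unknown";
    }
}
```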

3-4. Sending and receiving data

The socket subsystem includes several routines for sending data to and receiving data from remote systems, which more or less mimic the standard system and library functions for reading and writing data. These routines never block, however; send operations store the given data in the socket's write buffer and return immediately (with the exception discussed under sock_set_blocking() in section 3-2-1), while receive operations return an end-of-file condition if there is not enough data in the read buffer to satisfy the operation.

3-4-1. Sending data

There are two main families of data sending routines: string-based (stdio-like) routines and buffer-based routines. The interfaces are as follows:

int sputs(const char *str, Socket *s): Sends the given null-terminated string to the given socket, like fputs(). (Does not write a trailing newline.) Returns the number of bytes written, or -1 on failure.
int sockprintf(Socket *s, const char *fmt, ...), int vsockprintf(Socket *s, const char *fmt, va_list args): Formats the variadic argument list according to the format string fmt and sends it to the given socket, like fprintf(). Returns the number of bytes written, or -1 on failure.
int32 swrite(Socket *s, const char *buf, int32 len): Sends data from buffer buf of length len to the given socket, like write(). Returns the number of bytes written, or -1 on failure.
int32 swritemap(Socket *s, const char *buf, int32 len): Sends data to the given socket, like swrite(); the buffer buf is taken to be a region of memory (of length len) mapped with mmap(), and will be freed automatically with munmap() when all of the data has been sent to the socket. If DISABLE_SWRITEMAP (see section 3-1) is defined, returns the error ENOSYS.
int swrite_trigger(Socket *s, void *data): Inserts a write trigger at the socket's current write buffer position, causing the socket's SCB_TRIGGER callback function to be called when all data written prior to the swrite_trigger() call has been successfully sent to the remote host, and before any data written subsequently has been sent. The value passed as the data parameter is passed on unmodified to the callback function.

The first three functions perform the actual writing using the internal buffered_write() routine. This function first resets the timestamp used for detecting send timeouts if there is no data waiting to be sent (either in the write buffer or in mapped buffers, as described below), then copies the caller's data into the socket's write buffer up to the current buffer size and calls the flush_write_buffer() routine to send out any buffered data that can be sent without blocking; these two steps are repeated until all of the caller's data has been buffered (and possibly sent).

The flush_write_buffer() routine, in turn, first checks for mapped write buffers and write triggers, as described below, then calls the system's send() function to send a single contiguous block of data to the remote host. The data actually sent (which may be none at all, if the system is not ready to accept any more data for the socket) is removed from the write buffer, and the number of bytes sent is returned. The routine returns -1 on system error (or invalid parameter), and -2 if the socket was in the middle of a disconnect and there is no more data to send. flush_write_buffer() also manages the set of file descriptors to watch for write-ready events, used in the polling routine, and shrinks the socket's buffers if a send operation removes all pending data from the write buffer.

If an attempt to flush the write buffer fails when the buffer is full, buffered_write() attempts to expand the buffer. This is done by calling resize_how_much() to find out how much the buffer should be expanded by (the current implementation uses a constant 10%, rounded up to the next multiple of SOCK_MIN_BUFSIZE), then performs the actual resize operation. If resize_how_much() returns zero, meaning that trying to expand the buffer would exceed the per-connection or total buffer size limit, or if the attempt to resize the buffer fails, then buffered_write() either returns an EAGAIN error or blocks until some buffer space can be freed, depending on whether the socket has been set blocking via sock_set_blocking() or not.
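The buffer growth calculation might look something like the sketch below. It is not the real resize_how_much(): the function name, parameter list, and SOCK_MIN_BUFSIZE value are hypothetical, and it assumes that the 10% is taken of the current buffer size and that a limit of zero means "no limit".

```c
#include <assert.h>
#include <stdint.h>

#define SOCK_MIN_BUFSIZE 4096u  /* assumed value, for illustration only */

/* Hypothetical model of the growth calculation.
 *   cursize:     current size of the buffer being expanded
 *   conn_used:   buffer bytes currently allocated to this socket
 *   total_used:  buffer bytes currently allocated to all sockets
 *   conn_limit, total_limit: limits as set by sock_set_buflimits()
 * Returns the number of bytes to grow by, or 0 if growing would exceed
 * one of the limits. */
static uint32_t resize_how_much_sketch(uint32_t cursize,
                                       uint32_t conn_used, uint32_t conn_limit,
                                       uint32_t total_used, uint32_t total_limit)
{
    uint32_t more = cursize / 10;       /* constant 10% */

    /* Round up to a multiple of SOCK_MIN_BUFSIZE, growing by at least
     * one unit. */
    more = (more + SOCK_MIN_BUFSIZE - 1) / SOCK_MIN_BUFSIZE * SOCK_MIN_BUFSIZE;
    if (more == 0)
        more = SOCK_MIN_BUFSIZE;

    if (conn_limit && conn_used + more > conn_limit)
        return 0;
    if (total_limit && total_used + more > total_limit)
        return 0;
    return more;
}
```

A zero return is what leads buffered_write() to fall back on EAGAIN or blocking, as described above.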

In order to avoid the overhead of moving data around on every send operation, the write buffer is used circularly, through the use of four pointers in the Socket structure:

wbuf: Points to the base address of the buffer.
wptr: Points to the first byte of valid data in the buffer.
wend: Points to the first byte after the last byte of valid data in the buffer.
wtop: Points to one byte beyond the last byte of the buffer (for convenience).

The wbuf and wtop pointers remain constant (except for changes in the location or size of the buffer itself) for the life of the socket, while the wptr and wend pointers advance circularly through the buffer space as data is added and removed. The amount of data in the buffer can be computed as wend - wptr, modulo the buffer size; note that this difference will be negative if wend has wrapped around to the beginning of the buffer but wptr has not, so the buffer size (wtop - wbuf) must be added to the result, as is done in the write_buffer_len() function. Thus an empty buffer is indicated by wend == wptr, while a full buffer is indicated by wend == wptr-1 (again, modulo the buffer size), leaving a one-byte pad to avoid a full buffer being mistakenly treated as an empty one.
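The length computation just described can be sketched in a few lines. The WriteBuf structure below is a stand-in containing only the four pointers named in the text, and the function names are hypothetical; the real write_buffer_len() operates on a Socket.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the four write buffer pointers in the Socket structure. */
typedef struct {
    char *wbuf;   /* base address of the buffer */
    char *wptr;   /* first byte of valid data */
    char *wend;   /* first byte after the valid data */
    char *wtop;   /* one byte beyond the end of the buffer */
} WriteBuf;

/* Number of bytes of data currently in the buffer. */
static ptrdiff_t write_buffer_len_sketch(const WriteBuf *b)
{
    ptrdiff_t len = b->wend - b->wptr;

    if (len < 0)                    /* wend has wrapped around, wptr has not */
        len += b->wtop - b->wbuf;
    return len;
}

/* Number of bytes that can still be stored; one byte always stays
 * unused so that a full buffer never looks like an empty one. */
static ptrdiff_t write_buffer_free_sketch(const WriteBuf *b)
{
    return (b->wtop - b->wbuf) - 1 - write_buffer_len_sketch(b);
}
```

For a 16-byte buffer with wptr at offset 12 and wend at offset 4, the raw difference is -8, and adding the buffer size gives 8 bytes of data with 7 bytes free.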

The last two functions for sending data, swritemap() and swrite_trigger(), record their data in the write-map list, a singly-linked list of structures containing information on mapped buffers and write triggers. The structure is struct wmapinfo, defined within the Socket structure definition in sockets.c, and contains the following fields:

next: A pointer to the next structure in the list.
wait: The number of bytes left to send from the write buffer before processing this structure.
map, maplen: The buffer pointer and buffer length (for write triggers, the data for the callback function and zero).
pos: The current position within the buffer (the number of bytes sent from the buffer so far).

The head of the list is stored in the writemap field of the socket object; the tail of the list is also recorded in the writemap_tail field for efficiency reasons. If the list is empty, both fields are NULL.

When swritemap() or swrite_trigger() is called, a new structure is created and appended to the list, with the map and maplen fields set to the buffer pointer and length (or the trigger data and zero), the pos field set to zero, and the wait field set to the current length of the write buffer. The flush_write_buffer() routine then checks at the beginning of each call whether the first write-map structure (if one exists) is ready for processing (has a wait value of zero). If so, data is sent from the mapped buffer instead of the socket's write buffer; in the case of a write trigger, the SF_WTRIGGER flag is set, to cause the polling routine to call the socket's write trigger callback function and to prevent any further buffer flushes from occurring before then. Otherwise, data is sent from the socket's write buffer as usual, and every write-map structure's wait field is decremented by the number of bytes sent.

Implementation note: This roundabout method is a result of adding the swritemap() and swrite_trigger() functions after the initial socket subsystem design was complete (in fact, Services does not use them at all; I added them for my HTTP server). A more intelligent design would use a write-map or similar structure for every block of data to be sent, possibly pointing into a common buffer like the current write buffer.
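The bookkeeping on the wait field can be illustrated with a reduced model of the write-map list. The structure below copies the field names given above, but the types, the MapList wrapper, and the three helper functions are hypothetical; entries are supplied by the caller rather than allocated, and the sketch assumes that a single flush never sends more bytes from the write buffer than the first entry's wait count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reduced model of struct wmapinfo; field types are assumed. */
struct wmapinfo_sketch {
    struct wmapinfo_sketch *next;
    int32_t wait;       /* write buffer bytes to send before this entry */
    void *map;          /* mapped region, or trigger data */
    int32_t maplen;     /* region length, or zero for a trigger */
    int32_t pos;        /* bytes of the region sent so far */
};

typedef struct {
    struct wmapinfo_sketch *writemap;       /* head of the list */
    struct wmapinfo_sketch *writemap_tail;  /* tail of the list */
} MapList;

/* Appends an entry; buffered is the current write buffer length. */
static void wmap_append(MapList *l, struct wmapinfo_sketch *info,
                        void *map, int32_t maplen, int32_t buffered)
{
    info->next = NULL;
    info->wait = buffered;
    info->map = map;
    info->maplen = maplen;
    info->pos = 0;
    if (l->writemap_tail)
        l->writemap_tail->next = info;
    else
        l->writemap = info;
    l->writemap_tail = info;
}

/* True if the first entry should be processed before any more data is
 * taken from the write buffer. */
static int wmap_head_ready(const MapList *l)
{
    return l->writemap != NULL && l->writemap->wait == 0;
}

/* Records that sent bytes left the write buffer. */
static void wmap_note_sent(MapList *l, int32_t sent)
{
    struct wmapinfo_sketch *p;

    for (p = l->writemap; p != NULL; p = p->next)
        p->wait -= sent;
}
```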

3-4-2. Receiving data

Like send operations, there are two main groups of receive routines, string- (or character-) based and buffer-based:

int sgetc(Socket *s): Reads a single byte (character) from the socket, returning the value of the byte read or EOF if no data is available. Assumes the socket passed in is valid.
char *sgets(char *buf, int32 len, Socket *s): Reads a line of data (ending with an LF character, value 0x0A) from the given socket, storing it in buf. If the string (including the null terminator) requires more than len bytes, it is truncated to len-1 bytes, but the entire string (to the newline) is removed from the socket's buffer. Returns buf, or NULL if no complete line is available to read or an error occurs.
char *sgets2(char *buf, int32 len, Socket *s): Reads a line of data from the given socket and stores it in buf, like sgets(); however, a trailing LF or CR/LF pair will be stripped from the string before it is returned.
int sread(Socket *s, char *buf, int32 len): Reads a block of data from the socket, returning the number of bytes successfully read (which may be less than len, or zero, if insufficient data is available in the socket's buffer to satisfy the request). Returns -1 on error.

All four of these functions operate on the socket's read buffer. This buffer is filled by the socket polling routine (see section 3-6) when data has been received by the system and is available for reading from the socket. The routines are intended to be called from the SCB_READ or SCB_READLINE callbacks, which are called by the polling routine when data is available.
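The difference between sgets() and sgets2() comes down to the handling of the line terminator. The following sketch shows the stripping step on its own; strip_eol() is a hypothetical helper written for this example, not a routine from sockets.c.

```c
#include <assert.h>
#include <string.h>

/* Removes a trailing LF or CR/LF pair from line, in place.  A CR that
 * is not followed by an LF is left alone.  Returns line. */
static char *strip_eol(char *line)
{
    size_t len = strlen(line);

    if (len > 0 && line[len - 1] == '\n') {
        line[--len] = '\0';
        if (len > 0 && line[len - 1] == '\r')
            line[--len] = '\0';
    }
    return line;
}
```

Since IRC lines are normally terminated with CR/LF, a caller that uses sgets() instead of sgets2() would need to do something like this itself before parsing a line.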

The use of two distinct callbacks for reading data is to facilitate the processing of both binary and textual data. When data has been received on the socket's connection and stored in the buffer, the socket subsystem first calls the SCB_READ callback function, passing the number of bytes available for reading (an integer value, cast to void * for the call). If the callback function leaves some data in the buffer (or no function is assigned), and if at least one newline character is present in the buffer, the SCB_READLINE callback function is then called, again with the number of bytes available for reading. If there is still data left in the buffer, both callbacks are called again in order, repeating until either all data has been consumed or the SCB_READ/SCB_READLINE pair does not read any data from the socket's buffer.
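The calling order described above can be sketched as a small dispatch loop. Everything here is a stand-in: ModelSock, dispatch_read_callbacks(), and example_readline() are hypothetical names, the buffer is linear, and the real polling routine does considerably more. The sketch shows the two-callback order, the newline check before the line callback, and the rule that the loop stops once a full pass consumes nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct ModelSock ModelSock;
typedef void (*ModelCallback)(ModelSock *s, void *param);

/* Hypothetical, simplified socket: a linear read buffer and two hooks. */
struct ModelSock {
    char buf[256];
    size_t len;
    ModelCallback on_read;      /* stands in for SCB_READ */
    ModelCallback on_readline;  /* stands in for SCB_READLINE */
};

/* Call the read callbacks in order until the buffer is empty or a
 * full pass leaves the buffer length unchanged. */
void dispatch_read_callbacks(ModelSock *s)
{
    assert(s != NULL);
    while (s->len > 0) {
        size_t before = s->len;

        if (s->on_read)
            s->on_read(s, (void *)(uintptr_t)s->len);

        /* The line callback runs only if data remains and it
         * contains at least one newline. */
        if (s->len > 0 && s->on_readline
            && memchr(s->buf, '\n', s->len) != NULL)
            s->on_readline(s, (void *)(uintptr_t)s->len);

        if (s->len == before)
            break;              /* nothing consumed: stop looping */
    }
}

/* Example line callback: consumes one line and counts it. */
int model_lines_seen = 0;

void example_readline(ModelSock *s, void *param)
{
    (void)param;                /* byte count is not needed here */
    char *nl = memchr(s->buf, '\n', s->len);
    if (nl == NULL)
        return;
    size_t linelen = (size_t)(nl - s->buf) + 1;
    memmove(s->buf, s->buf + linelen, s->len - linelen);
    s->len -= linelen;
    model_lines_seen++;
}
```

With only the line callback installed and "a\nbb\nccc" in the buffer, the loop invokes the callback twice and then stops, leaving the incomplete "ccc" in place until more data arrives.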

The read buffer is handled in the same manner as the write buffer; four fields (rbuf, rptr, rend, and rtop) are used to manage data insertion and removal, and like the write buffer, the read buffer is used circularly to avoid overhead from moving data around.
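As an illustration of circular use with four pointers, the sketch below borrows the field names from the text but assigns them meanings that are assumptions: rbuf is the start of the allocation, rptr the next byte to read, rend the next free slot, and rtop one past the end of the allocation. One slot is kept free so that a full buffer can be told apart from an empty one; the real code may resolve that ambiguity differently. ModelRing and the ring_*() functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>   /* for EOF */

/* Hypothetical circular buffer using the field names from the text. */
typedef struct {
    char *rbuf;   /* start of the allocated buffer (assumed meaning) */
    char *rptr;   /* next byte to read (assumed meaning) */
    char *rend;   /* next free slot (assumed meaning) */
    char *rtop;   /* one past the end of the buffer (assumed meaning) */
} ModelRing;

void ring_init(ModelRing *r, char *storage, size_t size)
{
    assert(r != NULL && storage != NULL && size >= 2);
    r->rbuf = r->rptr = r->rend = storage;
    r->rtop = storage + size;
}

/* Number of bytes currently stored. */
size_t ring_len(const ModelRing *r)
{
    if (r->rend >= r->rptr)
        return (size_t)(r->rend - r->rptr);
    return (size_t)(r->rtop - r->rptr) + (size_t)(r->rend - r->rbuf);
}

/* Append one byte. Returns 0 on success, -1 if the buffer is full. */
int ring_put(ModelRing *r, char c)
{
    char *next = r->rend + 1;
    if (next == r->rtop)
        next = r->rbuf;          /* wrap around to the start */
    if (next == r->rptr)
        return -1;               /* full: one slot is kept free */
    *r->rend = c;
    r->rend = next;
    return 0;
}

/* Remove one byte. Returns its value, or EOF if the buffer is empty. */
int ring_get(ModelRing *r)
{
    if (r->rptr == r->rend)
        return EOF;
    int c = (unsigned char)*r->rptr++;
    if (r->rptr == r->rtop)
        r->rptr = r->rbuf;       /* wrap around to the start */
    return c;
}
```

No bytes are ever moved: inserting advances rend, removing advances rptr, and both wrap from rtop back to rbuf.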

3-4-3. Muting sockets

If the caller does not want to receive socket events for a certain socket, the socket can be muted. The socket subsystem will not attempt to receive any data from the remote host for a muted socket, and will not call the SCB_READ or SCB_READLINE callbacks. Listener sockets can also be muted, causing the socket subsystem to not accept any connections from remote hosts or call the SCB_ACCEPT callback (connection attempts will be left waiting in the operating system's queue). Note that muting a socket does not have any effect on the operating system's low-level data processing; if the OS automatically accepts connection attempts at the protocol level, for example, the remote host will still see the connection established. Also note that a socket in the process of connecting or disconnecting will still call the SCB_CONNECT or SCB_DISCONNECT callback when the operation completes, and the write buffer will still be flushed normally, causing the SCB_TRIGGER callback to be called if a write trigger is encountered.

The functions for muting and unmuting sockets are:

void sock_mute(Socket *s): Mutes the given socket, disabling all read and accept events. Does nothing if the socket was already muted.
void sock_unmute(Socket *s): Unmutes the given socket, allowing read and accept events to occur. Does nothing if the socket was not muted. If any data is present in the socket's read buffer, the SCB_READ and SCB_READLINE callbacks will be called once regardless of whether any new data has arrived on the socket.

3-5. Retrieving socket information

The following routines can be used to retrieve information on sockets:

int sock_isconn(const Socket *s): Returns whether the socket is currently connected to a remote host (nonzero if connected, else zero).
int sock_remote(const Socket *s, struct sockaddr *sa, int *lenptr): Retrieves the remote host's address if the socket is connected, returning 0 on success, -1 on failure. Equivalent to the system call getpeername() on ordinary sockets.
int sock_get_blocking(const Socket *s): Returns whether the given socket is in blocking mode or not. The return value is positive if the socket is in blocking mode, zero if it is in non-blocking mode, or -1 if the socket parameter is invalid.
uint32 read_buffer_len(const Socket *s), uint32 write_buffer_len(const Socket *s): Return the given socket's read or write buffer length (the amount of data received but not processed, or buffered but not yet sent, respectively), in bytes. The socket parameter must point to a valid socket (it is not checked for validity).
int sock_rwstat(const Socket *s, uint64 *read_ret, uint64 *written_ret): Returns the amount of data received and sent on the given socket. The amount of data received is stored in the location pointed to by read_ret, and the amount sent is stored in the location pointed to by written_ret, both in bytes; the amount of data sent does not include data stored in the write buffer but not yet sent to the remote host. Either pointer can be NULL if the corresponding value is not needed. The routine returns 0 on success, -1 on failure.
int sock_bufstat(const Socket *s, uint32 *socksize_ret, uint32 *totalsize_ret, int *ratio1_ret, int *ratio2_ret): Returns buffer size information about the given socket (if not NULL) and about the socket subsystem as a whole. socksize_ret is set to the number of bytes allocated for the given socket's read and write buffers, and ratio1_ret is set to the ratio of this value to the per-socket buffer size limit set with sock_set_buflimits(), expressed as a percentage rounded up to the nearest integer; if a NULL value is passed for the socket, socksize_ret will be left unmodified, and ratio1_ret will be set to zero. Likewise, totalsize_ret is set to the total number of bytes allocated for socket buffers, and ratio2_ret is set to the percentage ratio of this value to the total buffer size limit. The function's return value is the larger of the two ratios given above, also as a percentage. If the per-socket or total buffer size limit is disabled, the corresponding ratio will be set to zero. Any of the return value parameters can be set to NULL, in which case the corresponding value will not be returned.
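The "percentage rounded up" calculation used for the sock_bufstat() ratios can be written with integer arithmetic alone. The helper below is a hypothetical illustration, not the routine's actual code, and it assumes that a limit of zero is how a disabled limit is represented.

```c
#include <stdint.h>

/* Ratio of used to limit as a percentage, rounded up to the nearest
 * integer. Assumption: limit == 0 means the limit is disabled, in
 * which case the ratio is reported as zero. */
int buffer_ratio_percent(uint32_t used, uint32_t limit)
{
    if (limit == 0)
        return 0;
    /* 64-bit intermediate avoids overflow in used * 100. */
    return (int)(((uint64_t)used * 100 + limit - 1) / limit);
}

/* The return value described in the text: the larger of two ratios. */
int larger_ratio(int ratio1, int ratio2)
{
    return ratio1 > ratio2 ? ratio1 : ratio2;
}
```

Rounding up means that any nonzero usage reports at least 1 percent, so a caller that treats a nonzero return as "buffers in use" is not misled by small allocations.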

3-6. The socket polling routine

The final socket interface routine, check_sockets(), is the workhorse of the socket subsystem. The routine waits for activity to occur, reading data from any sockets on which new data has been received, flushing the write buffer when a socket becomes ready for sending more data, and accepting connections from listener sockets on which a new connection request is received. If a timeout has been set with sock_set_rto() (see section 3-2-1), the routine returns control to the caller if no activity occurs within that time (the name "rto", for "read timeout", is something of a misnomer, since it applies to all types of activity: read-ready, write-ready, and new connection).

When called, check_sockets() first sets up the file descriptor sets and timeout value used in the select() system call. (The file descriptor sets are actually initialized and modified by other routines as appropriate; check_sockets() makes a copy of them so the originals are not modified by select().) The timeout is ordinarily the value given to sock_set_rto(), but if one or more sockets with a write timeout set have data pending in the write buffer, the timeout is reduced to the time until the earliest timeout (with second resolution). Then select() is invoked; signals are enabled only for the duration of the select() call, and disabled immediately after select() returns. (It is assumed that signals are disabled when check_sockets() is called; init_signals() disables signals before it returns, so this prerequisite is fulfilled.) The select() call is made in a loop to ensure that a received signal is not interpreted as an error.
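The copy-then-select pattern and the retry loop can be sketched as follows. wait_for_activity() is a hypothetical helper, not part of the socket subsystem; it omits the signal enabling and disabling described above, and it restarts with the full timeout after an interruption, which is a simplification.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Wait for activity on the descriptors in the master sets without
 * modifying them. Returns the select() result: the number of ready
 * descriptors, 0 on timeout, or -1 on a genuine error. */
int wait_for_activity(const fd_set *master_rd, const fd_set *master_wr,
                      int maxfd, long timeout_ms,
                      fd_set *ready_rd, fd_set *ready_wr)
{
    int n;

    assert(master_rd != NULL && master_wr != NULL);
    assert(ready_rd != NULL && ready_wr != NULL && timeout_ms >= 0);

    do {
        struct timeval tv;

        /* select() overwrites its arguments, so work on copies. */
        *ready_rd = *master_rd;
        *ready_wr = *master_wr;
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;

        n = select(maxfd + 1, ready_rd, ready_wr, NULL, &tv);
    } while (n < 0 && errno == EINTR);  /* a signal is not an error */

    return n;
}
```

After the call, the caller tests each descriptor with FD_ISSET() on ready_rd and ready_wr, while the master sets remain available for the next call.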

After select() returns (and if it does not return an error), check_sockets() then loops through each file descriptor used in the select() call. The Socket structure corresponding to each file descriptor is found from the sockets[] array, maintained separately from the linked list of sockets for this purpose.

A write-ready event on a socket indicates either that a deferred connection has completed or failed, or that a connected socket is ready to accept data for sending. In the latter case, check_sockets() simply calls flush_write_buffer() to send data out; flush_write_buffer() takes care of removing the file descriptor from the set to check for write-ready events if all data is flushed from the socket's write buffer. In the former case, check_sockets() retrieves the value of the socket's SO_ERROR option, which indicates the status of the connection attempt. A value of zero indicates success, while nonzero is an errno-style error number; the appropriate callback (SCB_CONNECT or SCB_DISCONNECT) is called, the socket's descriptor is removed from the write-ready set, and if the connection was successful and the socket is not muted, the descriptor is then added to the read-ready set so that it is checked on the next call to check_sockets().
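Retrieving SO_ERROR after a deferred connection becomes write-ready uses the standard getsockopt() call. The wrapper below is a hypothetical helper that shows the call itself; how check_sockets() wraps it is not shown in the text.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Return the pending error on a socket descriptor: 0 if the
 * connection attempt succeeded, a positive errno-style value if it
 * failed, or -1 if the option could not be read at all. */
int deferred_connect_status(int fd)
{
    int err = 0;
    socklen_t len = sizeof(err);

    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return -1;
    return err;   /* reading SO_ERROR also clears the pending error */
}
```

A caller would typically invoke the connect callback when the result is zero and the disconnect callback otherwise, passing along the error value for logging.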

After processing any write-ready event for a socket, the socket is checked for write trigger events as indicated by the SF_WTRIGGER socket flag, and if the flag is set, the SCB_TRIGGER callback is called repeatedly until the flag is no longer set. (The flag is cleared before each call to the callback function, but the callback function may set a new write trigger which is triggered before the function returns.)
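The clear-then-call ordering matters because the callback may raise the flag again. A minimal model, with hypothetical names and a made-up flag value throughout, might look like this:

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_SF_WTRIGGER 0x01u   /* hypothetical flag value */

typedef struct ModelTrigSock ModelTrigSock;

/* Hypothetical, simplified socket carrying only what the loop needs. */
struct ModelTrigSock {
    unsigned flags;
    void (*on_trigger)(ModelTrigSock *s);  /* stands in for SCB_TRIGGER */
    int pending;   /* example state: triggers the callback will re-raise */
    int calls;     /* example state: how often the callback has run */
};

/* Call the trigger callback until the flag stays clear. */
void run_write_triggers(ModelTrigSock *s)
{
    assert(s != NULL);
    while (s->flags & MODEL_SF_WTRIGGER) {
        s->flags &= ~MODEL_SF_WTRIGGER;   /* clear before the call */
        if (s->on_trigger)
            s->on_trigger(s);             /* may set the flag again */
    }
}

/* Example callback: re-raises the trigger while work is pending. */
void example_trigger(ModelTrigSock *s)
{
    s->calls++;
    if (s->pending > 0) {
        s->pending--;
        s->flags |= MODEL_SF_WTRIGGER;
    }
}
```

If the flag were cleared after the call instead, a trigger raised inside the callback would be lost; clearing first lets the loop notice it and call the callback again.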

A read-ready event on a socket indicates either that a connected socket has received data or a disconnection event, or that a listener socket received a connection request. The latter case is simple; check_sockets() calls do_accept() to accept the connection, then proceeds to the next file descriptor. In the former case, the read buffer is first expanded if it is currently full, then fill_read_buffer() is called to actually receive data from the socket into the read buffer. fill_read_buffer() returns the number of bytes read from the socket, or -1 on error, in which case the socket is disconnected with the DISCONN_REMOTE code. Implementation note: As documented in the code, if data arrives on a connection but the socket's read buffer is full and has reached the per-socket or total buffer size limit, the data will be left alone, causing select() to return immediately the next time it is called; this results in the program "busy-waiting" until either data is removed from the read buffer or space is made available to expand the buffer.

After reading data from the socket, check_sockets() calls the SCB_READ and SCB_READLINE callbacks in turn, as described in section 3-4-2. Even if the socket was not returned in the read-ready set from select(), the callbacks are still called if the SF_UNMUTED flag is set; this flag is set by sock_unmute() to indicate that a socket has just been unmuted, and cleared by check_sockets() before calling the read callbacks.

Finally, check_sockets() checks whether a write timeout has occurred on sockets that have a timeout set. If so, the socket is disconnected with the DISCONN_REMOTE code, on the assumption that the remote host is no longer reachable.
