IRC Services Technical Reference Manual

2. Core Services functionality

2-1. How does Services work?
2-2. Utility headers and functions
    2-2-1. Header file overview
    2-2-2. Compatibility functions
    2-2-3. Memory allocation
    2-2-4. List and array macros
    2-2-5. Generic hash tables
    2-2-6. Other utility functions
2-3. Program startup and termination
    2-3-1. Initialization
    2-3-2. Configuration files
    2-3-3. The main loop
    2-3-4. Signals
    2-3-5. Termination
2-4. Logging
2-5. Message sending and receiving
    2-5-1. Sending messages
    2-5-2. Receiving messages
    2-5-3. Processing messages
    2-5-4. The ignore list
2-6. Servers, clients, and channels
    2-6-1. Servers
    2-6-2. Clients
    2-6-3. Channels
    2-6-4. Client and channel modes
    2-6-5. High-level actions
2-7. Timed events
2-8. Multilingual support
    2-8-1. Overview
    2-8-2. Using multilingual strings
    2-8-3. Modifying the string table at runtime
    2-8-4. The language file compiler
2-9. Module interfaces
    2-9-1. Encryption
    2-9-2. Database storage
2-10. Module command list maintenance


2-1. How does Services work?

Services is, at its simplest, an IRC server with built-in "bots" (called "pseudoclients", that is, fake clients, in this documentation). The core of Services consists of code to connect to a given IRC server and register in the same way as an ordinary IRC server would, then process IRC messages that arrive from the remote server. Instead of listening for client connections and mediating client-to-client conversation, however, Services passes received messages to its pseudoclients, which take appropriate action based on the message. For example, sending a /msg containing "REGISTER mypassword" to NickServ, the nickname registration pseudoclient, causes NickServ to register the nickname of the user who sent the message. In this sense, Services can be considered an extension of the traditional IRC bot, but since its many capabilities require knowledge of the state of the entire IRC network (information not available to clients), it is implemented as a server instead.

Services is composed of a core set of functionality, discussed in this section and in sections 3 and 4, on top of which sit modules implementing features such as pseudoclients and database storage, discussed in sections 5 through 8. This section discusses the overall flow of execution, and the implementation details of each set of core functions.

The source code for the core functionality is located in the top source directory. The style guidelines used in writing the Services code can be found in Appendix D; one point that should be noted in particular is that each source file contains a trailer instructing the Emacs and Vim text editors to indent properly and not use tab characters, and this trailer should be appended to any new source files created:

    /*
     * Local variables:
     *   c-file-style: "stroustrup"
     *   c-file-offsets: ((case-label . *) (statement-case-intro . *))
     *   indent-tabs-mode: nil
     * End:
     *
     * vim: expandtab shiftwidth=4:
     */

2-2. Utility headers and functions

Before beginning a discussion of the code itself, it is worth noting the common header files used by most of the source code. The various utility routines implemented in Services are also mentioned, as these are often used in place of traditional C library functions.

2-2-1. Header file overview

While many of the core function groups have their own header files, as noted below, some of the more common routines and structure definitions are collected into a few main header files to reduce file clutter. These files are:

services.h: The main Services header file, included by every source file. This file includes the following header files automatically:
- config.h
- defs.h
- memory.h
- list-array.h
- log.h
- sockets.h
- send.h
- modes.h
- users.h
- channels.h
- servers.h
- extern.h
services.h also declares type names for several common data structures, as well as constants for the clear_channel() function (see section 2-6-5).
config.h: Contains basic information about the compilation and execution environment, along with compilation options selected by the user. Generated automatically by the configure script (see section 10-2). Among the definitions in this file are types for integers of specific sizes: int8, int16, and int32 for signed 8-bit, 16-bit, and 32-bit integers respectively, and uint8, uint16, and uint32 for unsigned integers. (See section 11-1 for why the standard int8_t and similar types were not used.)
defs.h: Contains basic constants and macros used by Services. The top part of this file, through the line that reads "There should be no need to modify anything below this line.", contains settings that can be edited by users with special needs but are considered esoteric enough that they do not warrant an extra option to the configure script. The NICKMAX, CHANMAX, and PASSMAX constants, in particular, set the size of buffers (including the trailing null on strings) used for nicknames, channels, and passwords; the databases rely on these remaining constant for a given set of data, and changing them will result in the data files becoming unusable. For the convert-db utility (see section 9), they are defined to values large enough to handle any data found in other programs' data files, and should never be changed. The latter half of this file consists of including proper system header files, based on the contents of config.h, and ensuring that some basic system constants are defined. A few other simple macros are defined as well:
- sizeof(v) is redefined to return a signed int value, to avoid unnecessary warnings about signed/unsigned conversion.
- lenof(a) gives the length of an array in elements (this must be a C array, not a pointer).
- sgn(n) returns -1 if its parameter is negative, 1 if positive, and 0 if zero; note that n may be evaluated twice.
- FORMAT(type,fmt,start) encapsulates GCC's "format" attribute, used for checking arguments to functions that take format strings, without causing errors on other compilers.
- FUNCPTR and E_FUNCPTR are used to get around an apparent GCC problem in attributes on function pointers.
- PTR_INVALID is a pointer value that can be used when an invalid pointer value other than NULL is required.
extern.h: Contains extern declarations for core source files which do not have their own separate header files. Also defines E as an abbreviation for extern.
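
As a rough illustration of the helper macros described for defs.h, the following sketch shows one way lenof() and sgn() could be written. These are plausible stand-ins, not the definitions from defs.h, and the sketch_ names are hypothetical helpers introduced only for this example.

```c
#include <assert.h>

/* Illustrative stand-ins; the actual definitions in defs.h may differ. */
#define lenof(a)  ((int)(sizeof(a) / sizeof(*(a))))     /* arrays only */
#define sgn(n)    ((n) < 0 ? -1 : ((n) > 0 ? 1 : 0))    /* n may be evaluated twice */

/* Hypothetical table used to demonstrate lenof(). */
static const char *const sketch_units[] = { "s", "m", "h", "d" };

/* Returns the number of entries in sketch_units.  This works because
 * sketch_units is a true array; applied to a pointer, lenof() would
 * silently give a meaningless result. */
int sketch_unit_count(void)
{
    return lenof(sketch_units);
}

/* Three-way integer comparison built on sgn().  The subtraction is done
 * in a wider type so that it cannot overflow. */
int sketch_compare_ints(int a, int b)
{
    assert(sizeof(long long) > sizeof(int));
    return sgn((long long)a - (long long)b);
}
```

Because sgn() may evaluate its argument twice, expressions with side effects (such as sgn(i++)) should not be passed to it.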

2-2-2. Compatibility functions

While most modern compilation environments have a fairly wide range of standard functions included, such functions may not be available on some platforms, or their implementations may contain bugs. To work around such problems, Services includes local versions of several common functions in compat.c, and enables them as necessary based on the configuration results stored in config.h. These functions are:

hstrerror() (strerror() for hostname resolution)
snprintf() / vsnprintf() (defined in vsnprintf.c)
strtok()
stricmp() / strnicmp()
strdup()
strspn() / strcspn()
strerror()
strsignal()

strtok(), in particular, bears mentioning as its behavior in certain cases does not seem to be well-defined by the standard (see, for example, the definition in IEEE Std 1003.1-2001 [www.opengroup.org]). The Services pseudoclients use strtok() to parse commands from clients; in some cases, where a final parameter may contain space characters, this results in the following sequence of calls:

    char *param1 = strtok(NULL, " ");
    char *param2 = strtok(NULL, "");

For conciseness, Services does not check the value of each strtok() call, assuming that if at some point the end of the string is reached, all subsequent calls will return NULL. However, if the remainder of the string contains multiple space characters, some implementations will return the remaining whitespace for the second call despite returning NULL for the first (others, such as old versions of glibc, have been known to crash on the second call). I have not confirmed whether this difference in behavior still has an effect on Services, but it did cause problems at one point; hence this behavior is checked for, and the compatibility strtok() is enabled if the system strtok() does not behave as Services expects.
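
For comparison, the parsing pattern can be written defensively so that it does not depend on what strtok() returns after the end of the string has been reached. The sketch below is not taken from the Services source; struct sketch_cmd and sketch_parse() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical holder for a parsed command line. */
struct sketch_cmd {
    char *name;     /* command word, e.g. "REGISTER" */
    char *param1;   /* first parameter, or NULL if absent */
    char *rest;     /* remainder of the line (may contain spaces), or NULL */
};

/* Splits buf in place using strtok().  buf must be writable.  Each call
 * is made only if the previous one returned a token, so the result never
 * relies on strtok()'s behavior once the string has been exhausted. */
void sketch_parse(char *buf, struct sketch_cmd *out)
{
    assert(buf != NULL && out != NULL);
    out->name   = strtok(buf, " ");
    out->param1 = out->name   ? strtok(NULL, " ") : NULL;
    out->rest   = out->param1 ? strtok(NULL, "")  : NULL;
}
```

The price of this variant is the extra checks that the original code omits for conciseness; the compatibility strtok() makes those checks unnecessary.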

Also, the stricmp() and strnicmp() functions are alternate names for the POSIX strcasecmp() and strncasecmp() functions (the "i" is for "case-insensitive"). I prefer the former pair of names because I find them to be both concise and clearer about the function's purpose—to me, "case" says "case-sensitive", and I have to recall that strcmp() itself is case-sensitive to avoid confusion. Some compilation environments do in fact provide stricmp() and strnicmp() functions, and they are used if present; if the strcasecmp() pair is instead found, stricmp and strnicmp are defined to be aliases for them.

2-2-3. Memory allocation

Services implements wrappers for the four primary memory allocation functions:

smalloc(long size)
scalloc(long els, long elsize)
srealloc(void *oldptr, long newsize)
sstrdup(const char *s)

The "s" prefix in these function names is short for "safe": if one of these functions fails to allocate memory, it will abort the program by generating a SIGUSR1 signal (see section 2-3-4) rather than returning NULL, so the caller can safely assume that if the memory size requested was not zero, the return value will not be NULL. (This concept is carried over from the earliest days of Services development, when it was known that memory allocation would never fail barring a program bug; however, it is arguably a bad design and could be improved. See section 11-1.) These functions are implemented in the file memory.c, with declarations in memory.h (included by services.h).
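
The "safe" allocation idea can be sketched as follows. This is a minimal illustration, not the code in memory.c: the sketch_ names are hypothetical, returning NULL for a zero-byte request is an assumption, and the call to abort() is a fallback added here in case the SIGUSR1 handler returns.

```c
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>

/* Allocates size bytes, never returning NULL for a nonzero size.
 * On failure, raises SIGUSR1 (assumed to log the error and terminate). */
void *sketch_smalloc(long size)
{
    void *p;

    if (size <= 0)
        return NULL;            /* assumption: zero-size requests yield NULL */
    p = malloc((size_t)size);
    if (!p) {
        raise(SIGUSR1);
        abort();                /* fallback if the signal handler returns */
    }
    return p;
}

/* Duplicates a string using the safe allocator; the caller frees it. */
char *sketch_sstrdup(const char *s)
{
    size_t len;
    char *copy;

    assert(s != NULL);
    len = strlen(s) + 1;        /* include the trailing null */
    copy = sketch_smalloc((long)len);
    memcpy(copy, s, len);
    return copy;
}
```

Callers of such wrappers can use the returned pointer without a NULL check, which is the convenience the "s" prefix is meant to signal.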

Services also has a simple memory misuse checker, activated by the -memchecks option to configure; this code is not very thorough, but can detect some cases of access to unallocated memory, such as trying to free an already-free block of memory, and report the source code file and line where the problem occurred via macros in memory.h. In addition, if the -showallocs option is given to configure, these functions will log every memory allocation and release to the log file, again with the relevant source code file and line, and report on exit whether any memory was leaked. If a leak is found, the log file can be parsed to find allocations which were not reallocated or freed.

The FILELINE macro used in the definitions of smalloc() and related functions is used to add filename and line number parameters only when memory checking is enabled; if so, the actual functions receive an extra two parameters, const char *file and int line, which are passed to the corresponding allocation function (MCmalloc(), etc.). Macros are used in memory.h to pass the current file and line (__FILE__, __LINE__) in these parameters, so that the external interface does not change.
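
The file-and-line technique can be sketched as follows. All names here (SKETCH_MEMCHECKS, SKETCH_FILELINE, sketch_alloc, and the two tracking variables) are hypothetical; the real memory.h and memory.c are organized differently, and this sketch only records the most recent call site rather than performing any checking.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_MEMCHECKS 1      /* stand-in for the configure-time switch */

#if SKETCH_MEMCHECKS
# define SKETCH_FILELINE  , const char *file, int line
#else
# define SKETCH_FILELINE  /* expands to nothing */
#endif

/* Call site of the most recent allocation (illustration only). */
static const char *sketch_last_file;
static int sketch_last_line;

/* The function itself gains two extra parameters only when checking
 * is enabled. */
void *sketch_alloc(long size SKETCH_FILELINE)
{
#if SKETCH_MEMCHECKS
    sketch_last_file = file;
    sketch_last_line = line;
#endif
    assert(size > 0);
    return malloc((size_t)size);
}

#if SKETCH_MEMCHECKS
/* Callers keep writing sketch_alloc(n); the macro supplies the rest.
 * A macro is not re-expanded inside its own expansion, so the inner
 * name refers to the function defined above. */
# define sketch_alloc(size)  sketch_alloc((size), __FILE__, __LINE__)
#endif
```

With this arrangement, code that calls sketch_alloc(16) compiles unchanged whether or not checking is enabled, which is the property the text describes for the external interface.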

2-2-4. List and array macros

The header file list-array.h defines several macros useful in handling lists and variable-length arrays, with macros for adding and removing elements, iterating over lists and arrays, and searching for an element with a given key (either scalar or complex).

The list-related macros implement a doubly-linked list. The list parameter to each of these macros is assumed to be an lvalue (that is, a variable, structure field, pointer indirection, etc.) with no side effects of the same type as the individual list nodes; this parameter is modified by the insertion and removal macros. The nodes are assumed to be (pointers to) structures containing at least next and prev fields, which are used by these macros to implement the list. The macros are:

LIST_INSERT(node, list): Inserts node into the beginning of list. Insertion is performed in constant time.
LIST_APPEND(node, list): Appends node to the end of list. Insertion is performed in linear time with the length of the list.
LIST_INSERT_ORDERED(node, list, compare, field): Inserts node into list so that list maintains its order as determined by the function compare called on the field field of each node. field must be a field of node, and compare must be a function that takes two field values and returns -1, 0, or 1 indicating whether the first argument is ordered before, equal to, or after the second (strcmp(), for example). If an equal node is found, node is inserted after it. Insertion is performed in linear time with the length of the list, disregarding the execution time of the comparison function.
LIST_REMOVE(node, list): Removes node from list. node is assumed to already be a part of list. Removal is performed in constant time.
LIST_FOREACH(iter, list): Iterates over every element in list, using iter as the iterator. The macro has the same properties as a for() loop; see the implementation of LIST_SEARCH for an example of usage. iter must be an lvalue.
LIST_FOREACH_SAFE(iter, list, temp): Iterates over list using an extra variable (temp) to hold the next element, ensuring proper operation even when the current element is deleted. iter and temp must be lvalues.
LIST_SEARCH(list, field, target, compare, result): Searches list for a node with field equal to target (as evaluated by compare) and places a pointer to the node found, or NULL if none found, in result. field must be a field of the nodes in list; target must be an expression of the type of field with no side effects; result must be an lvalue; and compare must be a strcmp()-like function (see LIST_INSERT_ORDERED). The search is performed in linear time, disregarding the execution time of the comparison function.
LIST_SEARCH_SCALAR(list, field, target, result): Searches list as LIST_SEARCH does, but for a scalar value. The search is performed in linear time.
LIST_SEARCH_ORDERED(list, field, target, compare, result): Searches list as LIST_SEARCH does, but for a list known to be ordered. The search is performed in linear time, disregarding the execution time of the comparison function.
LIST_SEARCH_ORDERED_SCALAR(list, field, target, result): Searches list as LIST_SEARCH_ORDERED does, but for a scalar value. The search is performed in linear time.
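
To make the behavior of the list macros concrete, here is a simplified re-creation of four of them. The SK_ macros below are written for this example and are not the definitions from list-array.h; sketch_node and sketch_find() are likewise hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Node layout the macros rely on: next and prev pointers. */
typedef struct sketch_node_ sketch_node;
struct sketch_node_ {
    sketch_node *next, *prev;
    const char *name;
};

/* Insert at the head of the list: constant time. */
#define SK_LIST_INSERT(node, list) do {     \
    (node)->next = (list);                  \
    (node)->prev = NULL;                    \
    if (list)                               \
        (list)->prev = (node);              \
    (list) = (node);                        \
} while (0)

/* Unlink a node already in the list: constant time. */
#define SK_LIST_REMOVE(node, list) do {     \
    if ((node)->next)                       \
        (node)->next->prev = (node)->prev;  \
    if ((node)->prev)                       \
        (node)->prev->next = (node)->next;  \
    else                                    \
        (list) = (node)->next;              \
} while (0)

/* Behaves like the head of a for() loop. */
#define SK_LIST_FOREACH(iter, list) \
    for ((iter) = (list); (iter); (iter) = (iter)->next)

/* Linear search; leaves NULL in result if nothing matches. */
#define SK_LIST_SEARCH(list, field, target, compare, result) do {  \
    SK_LIST_FOREACH((result), (list)) {                            \
        if (compare((result)->field, (target)) == 0)               \
            break;                                                 \
    }                                                              \
} while (0)

/* Example use: look up a node by its name field. */
sketch_node *sketch_find(sketch_node *head, const char *wanted)
{
    sketch_node *found;

    assert(wanted != NULL);
    SK_LIST_SEARCH(head, name, wanted, strcmp, found);
    return found;
}
```

Note how the list argument is assigned to inside SK_LIST_INSERT and SK_LIST_REMOVE; this is why the text requires it to be an lvalue without side effects.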

The variable-length array macros are similar in nature; however, since arrays require both a pointer and an element count, the base macros take two arguments designating the array, array (the pointer) and count (the count of elements), both of which must be lvalues. These macros are named ARRAY2_*, indicating that the array to be operated on is specified by two arguments. A shorthand form of each macro, named ARRAY_*, is also available; this form assumes that the element count is stored in a variable (or field, etc.) named with the name of the array suffixed with "_count". Thus, for example, ARRAY_EXTEND(mystruct->some_array) is exactly equivalent to ARRAY2_EXTEND(mystruct->some_array, mystruct->some_array_count). Note that this implies that if the array pointer is itself an array element (with the element counts presumably stored in a separate array), then the two-argument forms of the macros must be used. As with lists, the array pointer and element count must be lvalues. The macros (only the one-argument forms are shown for conciseness) are as follows:

ARRAY_EXTEND(array): Extends a variable-length array by one entry. Execution time is no greater than linear with the length of the array (depending on whether realloc() has to move the array data).
ARRAY_INSERT(array, index): Inserts a slot at position index in a variable-length array. Execution time is linear with the length of the array.
ARRAY_REMOVE(array, index): Deletes entry number index from a variable-length array. Execution time is linear with the length of the array.
ARRAY_FOREACH(array, iter): Iterates over every element in a variable-length array.
ARRAY_SEARCH(array, field, target, compare, result): Searches a variable-length array for a value. Operates like LIST_SEARCH. result must be an integer lvalue. If nothing is found, result will be set equal to the array's element count (array_count). The search is performed in linear time, disregarding the execution time of the comparison function.
ARRAY_SEARCH_PLAIN(array, target, compare, result): Searches a variable-length array for a value, when the array elements do not have fields. The search is performed in linear time, disregarding the execution time of the comparison function.
ARRAY_SEARCH_SCALAR(array, field, target, result): Searches a variable-length array for a scalar value. The search is performed in linear time.
ARRAY_SEARCH_PLAIN_SCALAR(array, target, result): Searches a variable-length array for a scalar value, when the array elements do not have fields. The search is performed in linear time.
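
The relationship between the two-argument and one-argument forms can be sketched with token pasting. The SK_ macros below are simplified stand-ins written for this example, not the definitions from list-array.h; they call realloc() directly and abort on failure, which is an assumption. struct sketch_chan and sketch_add_id() are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Two-argument forms: array pointer and element count, both lvalues. */
#define SK_ARRAY2_EXTEND(array, count) do {                             \
    (count)++;                                                          \
    (array) = realloc((array), sizeof(*(array)) * (size_t)(count));     \
    if (!(array))                                                       \
        abort();                                                        \
} while (0)

#define SK_ARRAY2_REMOVE(array, count, index) do {                      \
    (count)--;                                                          \
    if ((index) < (count))                                              \
        memmove(&(array)[index], &(array)[(index) + 1],                 \
                sizeof(*(array)) * (size_t)((count) - (index)));        \
} while (0)

/* One-argument forms: "_count" is pasted onto the last token of the
 * array expression, so c->ids becomes c->ids_count. */
#define SK_ARRAY_EXTEND(array) \
    SK_ARRAY2_EXTEND(array, array##_count)
#define SK_ARRAY_REMOVE(array, index) \
    SK_ARRAY2_REMOVE(array, array##_count, index)

/* Hypothetical structure following the naming convention. */
struct sketch_chan {
    int *ids;
    int ids_count;
};

/* Appends one value to the variable-length array. */
void sketch_add_id(struct sketch_chan *c, int id)
{
    assert(c != NULL);
    SK_ARRAY_EXTEND(c->ids);
    c->ids[c->ids_count - 1] = id;
}
```

Token pasting only works on a single identifier at the end of the expression, which is why an array pointer that is itself an array element (such as arrays[i]) needs the two-argument form.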

2-2-5. Generic hash tables

The header file hash.h defines macros that can be used to implement a simple hash table, and is used by the core code to maintain the network client, channel, and server lists, as well as by modules such as NickServ and ChanServ for in-memory databases. The file is set up so that a hash table can be defined with a single macro, DEFINE_HASH (for a string key) or DEFINE_HASH_SCALAR (for a scalar key), using these formats:

    DEFINE_HASH(name, type, keyfield)
    DEFINE_HASH_SCALAR(name, type, keyfield, keytype)

The name parameter to the macros gives the name to be used in the hash table's access functions (see below). type gives the data type of the nodes to be stored in the hash table, which must be a structured type containing at least next and prev fields (for maintaining the hash table's collision lists), and keyfield specifies which field of type contains each node's key value. For scalar keys, the additional parameter keytype gives the type of keyfield (string keys are always of type char *).

These macros each define the following functions (in the function names and signatures below, name, type, and keytype stand for the corresponding parameters to the DEFINE_HASH or DEFINE_HASH_SCALAR macros, not for function parameters):

void add_name(type *node): Adds the given node to the hash table.
void del_name(type *node): Removes the given node from the hash table.
type *get_name(const char *key) type *get_name(keytype *key): If an element with the given key is stored in the hash table, returns a pointer to that element; otherwise, returns NULL. The first format is used for hashes with string keys, while the second is used for hashes with scalar keys.
type *first_name() type *next_name(): Iterate over all elements in the hash table. For hashes with string keys, elements are returned in lexical order by key if HASH_SORTED is defined (see below). first_name() initializes the iterator to the first element in the hash table and returns it; next_name() returns subsequent elements, one at a time, until all elements have been returned, at which point it returns NULL until first_name() is called again. If there are no elements in the hash table, first_name() will return NULL (as will next_name()). It is safe to delete elements, including the current element, while iterating. If an element is added while iterating, it is undefined whether that element will be returned by next_name() before the end of the hash table is reached.

The following preprocessor macros can be defined to modify the behavior of the hash table functions. Except as otherwise noted, these macros take effect when the DEFINE_HASH and DEFINE_HASH_SCALAR macros are invoked.

EXPIRE_CHECK(node): Returns a boolean value (zero if false, nonzero if true) indicating whether the given node has expired. If the macro evaluates to a true (nonzero) value, the get_name(), first_name(), and next_name() functions will ignore the corresponding node when processing. (This is used, for example, by NickServ and ChanServ to automatically delete nicknames and channels which have expired; in these cases, EXPIRE_CHECK is set to a function which deletes the record and returns nonzero if the record has expired.) Defaults to 0, i.e., no expiration.
HASH_STATIC: Controls whether the hash table functions are defined as static or global functions. This macro is prefixed directly to the function definitions, so it should be defined to either static or an empty value (not an empty string). Defaults to nothing, making the functions globally visible.
HASHFUNC(key): Hashes the given key to a value used as an index into the hash table. Defaults to DEFAULT_HASHFUNC(key), defined by hash.h.
HASHSIZE: Sets the size of the hash table. Should be set to the range of values returned by HASHFUNC(). Defaults to DEFAULT_HASHSIZE, defined by hash.h.
HASH_SORTED: Controls whether the first_name() and next_name() functions return elements in lexical order, as described above; if defined to a nonzero value, lexical sorting for hash tables with string keys is enabled. This macro affects the DEFAULT_HASHFUNC() and DEFAULT_HASHSIZE macros, and must be defined before hash.h is included. Ordinarily, this is set in config.h by the -sorted-lists option to the configure script (see section 10-2).

Internally, the hash table itself is stored in an array defined as type *hashtable_name[HASHSIZE], with each element pointing to a list of elements that hash to the value of the array index. This array is defined by the DEFINE_HASHTABLE macro, invoked via DEFINE_HASH or DEFINE_HASH_SCALAR. The add, del, get, and first/next functions are likewise defined by the DEFINE_HASH_ADD (or DEFINE_HASH_ADD_SCALAR), DEFINE_HASH_DEL, DEFINE_HASH_GET (or DEFINE_HASH_GET_SCALAR), and DEFINE_HASH_ITER macros; the iterator functions (first/next) are defined first, so that the del function can advance the iterator if the element pointed to by the iterator is removed.

The add, del, and get functions are fairly straightforward, adding to, removing from, or searching the appropriate list as given by the hash value of the relevant element's key. The first and next functions are implemented in terms of a common iterator subfunction, _next_name(), which advances the iterator (stored as a hash value in hashpos_name and a pointer within that hash value's list in hashiter_name) to the next element in the hash, leaving the pointer to that element in hashiter_name. The first function initializes the iterator's hash value to -1 and pointer to NULL (a NULL pointer triggers the iterator to advance to the next hash value), calls the iterator function once to load the first element into the iterator, then returns the return value of the next function. The next function saves the current pointer value of the iterator, advances the iterator, then returns the saved pointer value.

The default hash function works only for string keys, and varies depending on whether HASH_SORTED is set. If it is, the hash function uses an internal lookup table (__hashlookup[]) to convert the first two characters of the key to 5-bit values and concatenates those to form a 10-bit value, with the first character's hash value in the upper five bits. The lookup table uses values that increase from 0 to 31 in lexical order, as modified by the RFC 1459 case-treatment rules; since the string-key add function keeps the hash table lists in order, this ensures that the iterator returns all elements in lexical order. The hash table size in this case is 1024, the range of the 10-bit hash value. If HASH_SORTED is not set, the function instead uses a hash table of 65537 (2¹⁶+1) entries, and computes a hash value over all characters of the key string using an internal lookup table (__hashlookup_unsorted[]) based on the irc_lowertable[] array in misc.c (the same table used by the irc_tolower() function, as described in section 2-2-6). This provides more balanced usage of hash table entries, but loses the ability to iterate through the elements in lexical order.

2-2-6. Other utility functions

The remainder of the utility functions are defined in misc.c, and can be broken down into several groups:

String functions

unsigned char irc_tolower(char c): Returns the lower-case version of the given character, like tolower(), but follows IRC protocol rules; unless modified by the protocol module, the three characters [ \ ] are translated to { | }, as required by RFC 1459.
int irc_stricmp(const char *s1, const char *s2) int irc_strnicmp(const char *s1, const char *s2, int max): Versions of stricmp() and strnicmp() that use IRC protocol rules for upper/lower case conversion.
char *strscpy(char *d, const char *s, size_t len): Copies a string safely (the "s" in strscpy) into a buffer. Similar to strncpy(), except that the string is always null-terminated (so that at most len-1 characters of s are copied to d), and if s is shorter than len-1 characters, d is not padded with nulls. Returns d.
char *strbcpy(char *d, const char *s): A shortcut macro for using strscpy() with a buffer declared as a character array (char buffer[N]), intended to reduce the potential for buffer overflows due to size mismatches. Equivalent to strscpy(d, s, sizeof(d)).
char *strmove(char *d, const char *s): A version of strcpy() that can handle overlapping memory regions (for example, deleting characters from the beginning of a string). Returns d.
char *stristr(const char *s1, const char *s2): A case-insensitive version of strstr(). Searches case-insensitively for s2 inside s1, returning the first match found or NULL if no match is found.
char *strupper(char *s): Converts the given string to upper case. Returns s.
char *strlower(char *s): Converts the given string to lower case. Returns s.
char *strnrepl(char *s, int32 size, const char *old, const char *new): Replaces all occurrences of old with new within s. Stops replacing if the result would exceed size-1 bytes. Returns s.
char *strtok_remaining(): Returns any remaining text in the string currently being processed by strtok(), like strtok(NULL,""), with any leading or trailing whitespace stripped.
char *merge_args(int argc, char **argv): Joins the arguments in the given argument array with spaces, and returns the result in a static buffer.
int match_wild(const char *pattern, const char *str) int match_wild_nocase(const char *pattern, const char *str): Returns whether the given string str matches the wildcard pattern (case-sensitively or case-insensitively, respectively). The * (match zero or more characters) and ? (match one character) wildcards are recognized.
int valid_nick(const char *str) int valid_chan(const char *str) int valid_domain(const char *str) int valid_email(const char *str) int valid_url(const char *str): Checks whether the given string is a valid nickname, channel name, domain name, E-mail address, or URL, respectively. Nickname and channel checking behavior default to the behavior defined by the reference IRC server implementation (note that this differs slightly from RFC 1459 for nicknames; the reference implementation is treated as canonical), but may be modified by protocol modules.
int rejected_email(const char *email): Checks whether the given E-mail address matches any address masks given with the RejectEmail configuration directive.
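
Two of the string functions above are easy to illustrate. The sketch below shows one possible implementation of the documented strscpy() and strbcpy() behavior, and a simple recursive matcher for the * and ? wildcards. None of this is the code from misc.c; the sketch_ names are hypothetical, and the matcher shown is case-sensitive only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies at most len-1 characters of s into d and always terminates d.
 * Unlike strncpy(), does not pad the rest of d with nulls.  Returns d. */
char *sketch_strscpy(char *d, const char *s, size_t len)
{
    char *start = d;

    assert(d != NULL && s != NULL);
    if (len == 0)
        return d;
    while (--len > 0 && *s)
        *d++ = *s++;
    *d = '\0';
    return start;
}

/* Only valid when d is a true character array, so that sizeof(d) is the
 * buffer size rather than the size of a pointer. */
#define sketch_strbcpy(d, s)  sketch_strscpy((d), (s), sizeof(d))

/* Returns nonzero if str matches pattern; * matches zero or more
 * characters and ? matches exactly one. */
int sketch_match_wild(const char *pattern, const char *str)
{
    for (; *pattern; pattern++, str++) {
        if (*pattern == '*') {
            while (*pattern == '*')
                pattern++;
            if (!*pattern)
                return 1;               /* trailing * matches the rest */
            for (; *str; str++) {
                if (sketch_match_wild(pattern, str))
                    return 1;
            }
            return 0;
        }
        if (!*str || (*pattern != '?' && *pattern != *str))
            return 0;
    }
    return *str == '\0';
}
```

The recursive matcher is simple to read but can take a long time on patterns with many * wildcards; a production implementation would likely use an iterative approach with backtracking limits.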

Time-related functions

uint32 time_msec(): Returns the current time to millisecond resolution. The epoch is arbitrary, so returned values can only be used to measure time differences.
time_t strtotime(const char *str, char **endptr): Converts a string to a time_t value, assuming base 10, and sets *endptr to the first character after the parsed time value, as for strtol() and similar functions. Sets errno to ERANGE if the parsed value cannot be represented in a time_t.
int dotime(const char *s): Returns the number of seconds represented by the given time string, which is an integer followed by a unit specifier: "s" for seconds, "m" for minutes, "h" for hours, or "d" for days. Multiple time strings can be concatenated, such as "1h30m". Returns -1 if the string is not a valid time string.
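
The time string format accepted by dotime() can be illustrated with a small parser. This sketch follows the description above but is not the code from misc.c: sketch_dotime() is a hypothetical name, rejecting an empty string or a number without a unit is an assumption, and arithmetic overflow is not checked.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Parses strings such as "90s", "2d", or "1h30m" into seconds.
 * Returns -1 if the string is not a valid time string. */
int sketch_dotime(const char *s)
{
    long total = 0;

    assert(s != NULL);
    if (!*s)
        return -1;                      /* assumption: empty is invalid */
    while (*s) {
        char *end;
        long n;

        if (!isdigit((unsigned char)*s))
            return -1;                  /* each part starts with a digit */
        n = strtol(s, &end, 10);
        switch (*end) {
          case 's':              break;
          case 'm': n *= 60;     break;
          case 'h': n *= 3600;   break;
          case 'd': n *= 86400;  break;
          default:  return -1;          /* missing or unknown unit */
        }
        total += n;
        s = end + 1;                    /* continue after the unit */
    }
    return (int)total;
}
```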

IP address-related functions

uint8 *pack_ip(const char *ipaddr): Converts an IPv4 address string into a 4-byte binary address, and returns a pointer to the packed address (stored in a static buffer), or NULL if the given string does not represent a valid IPv4 address.
char *unpack_ip(const uint8 *ip): Converts a packed IPv4 address into an address string, and returns a pointer to that string (stored in a static buffer).
uint8 *pack_ip6(const char *ipaddr) char *unpack_ip6(const uint8 *ip): IPv6 versions of pack_ip() and unpack_ip().
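
The packed IPv4 representation can be sketched as follows. The sketch_ functions are hypothetical and only mirror the documented interface (static result buffers, NULL on invalid input); the validation rules used by the real pack_ip() may be stricter or looser.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned char sketch_uint8;     /* stand-in for uint8 */

/* Converts "a.b.c.d" to four bytes in a static buffer, or returns NULL
 * if the string is not a valid dotted-quad address. */
sketch_uint8 *sketch_pack_ip(const char *ipaddr)
{
    static sketch_uint8 packed[4];
    int i;

    assert(ipaddr != NULL);
    for (i = 0; i < 4; i++) {
        char *end;
        long n;

        if (!isdigit((unsigned char)*ipaddr))
            return NULL;
        n = strtol(ipaddr, &end, 10);
        if (n > 255)
            return NULL;
        if (*end != (i < 3 ? '.' : '\0'))
            return NULL;                /* wrong separator or trailing text */
        packed[i] = (sketch_uint8)n;
        if (i < 3)
            ipaddr = end + 1;
    }
    return packed;
}

/* Converts four packed bytes back to a string in a static buffer. */
char *sketch_unpack_ip(const sketch_uint8 *ip)
{
    static char buf[16];                /* "255.255.255.255" plus null */

    assert(ip != NULL);
    snprintf(buf, sizeof(buf), "%u.%u.%u.%u",
             (unsigned)ip[0], (unsigned)ip[1],
             (unsigned)ip[2], (unsigned)ip[3]);
    return buf;
}
```

As with any function returning a static buffer, the result is overwritten by the next call, so callers that need to keep a value must copy it first.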

Base64 encoding and decoding

int encode_base64(const void *in, int insize, char *out, int outsize): Encodes the buffer in of size insize bytes into the buffer out as a base64 string, truncating the result at outsize-1 bytes and appending a null terminator. Returns the number of bytes needed to encode the entire input buffer. The required output buffer size can be determined with encode_base64(in, insize, NULL, 0).
int decode_base64(const char *in, void *out, int outsize): Decodes the base64 string in into the buffer out of size outsize, truncating the output if necessary. Returns the number of bytes needed to store the entire decoded output. The required output buffer size can be determined with decode_base64(in, NULL, 0).
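
The size-query convention used by these functions can be sketched with a small encoder. sketch_encode_base64() is a hypothetical re-implementation of the documented interface, not the code from misc.c. In particular, it assumes that the returned size includes the trailing null; the text above does not state this explicitly.

```c
#include <assert.h>
#include <string.h>

static const char sketch_b64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encodes insize bytes from in as base64 into out, writing at most
 * outsize-1 characters plus a null.  Returns the buffer size needed for
 * the complete result (assumption: including the null).  Passing
 * out == NULL or outsize <= 0 only queries the size. */
int sketch_encode_base64(const void *in, int insize, char *out, int outsize)
{
    const unsigned char *p = in;
    int needed, o = 0, i;

    assert(in != NULL && insize >= 0);
    needed = ((insize + 2) / 3) * 4 + 1;
    if (!out || outsize <= 0)
        return needed;
    for (i = 0; i < insize; i += 3) {
        unsigned long v = (unsigned long)p[i] << 16;
        char quad[4];
        int k;

        if (i + 1 < insize)
            v |= (unsigned long)p[i + 1] << 8;
        if (i + 2 < insize)
            v |= p[i + 2];
        quad[0] = sketch_b64[(v >> 18) & 63];
        quad[1] = sketch_b64[(v >> 12) & 63];
        quad[2] = (i + 1 < insize) ? sketch_b64[(v >> 6) & 63] : '=';
        quad[3] = (i + 2 < insize) ? sketch_b64[v & 63] : '=';
        for (k = 0; k < 4 && o < outsize - 1; k++)
            out[o++] = quad[k];         /* stop when the buffer is full */
    }
    out[o] = '\0';
    return needed;
}
```

A caller would typically make one call with a NULL buffer to learn the required size, allocate that many bytes, and then call again to perform the encoding.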

Other functions

int process_numlist(const char *numstr, int *count_ret, range_callback_t callback, ...): Processes a number list of the form "n1[-n2][,n3[-n4]...]", calling the given callback function once for each number contained in the list. Returns the sum of all values returned from the callback function, and stores the number of times the callback function was called in count_ret if it is not NULL. If the callback routine returns -1, process_numlist() aborts processing and returns immediately (the -1 is not included in the sum of the callback return values). The list is sorted so that the values passed to the callback function for a particular list are in strictly increasing order, with no duplicates. Values outside the range 0 through 65536 are discarded to avoid excessive consumption of resources. The callback function type is defined in extern.h as:
int (*range_callback_t)(int num, va_list args)
where num is the number currently being processed and args are the additional arguments passed to process_numlist().
long atolsafe(const char *s, long min, long max): Converts a string in base 10 to a long value, ensuring that the string contains no invalid characters and that it is within the inclusive range min through max. On error, sets errno to EINVAL if the string contains invalid characters or ERANGE if the value is outside of the specified range, and returns min-1.
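
The error-reporting convention of atolsafe() can be sketched on top of strtol(). sketch_atolsafe() is a hypothetical name, and this version inherits strtol()'s acceptance of leading whitespace and a sign, which the real function may not allow. It also assumes min is greater than LONG_MIN, since min-1 would otherwise overflow.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Converts s (base 10) to a long in the inclusive range min..max.
 * On error, sets errno to EINVAL (bad characters) or ERANGE (out of
 * range) and returns min-1. */
long sketch_atolsafe(const char *s, long min, long max)
{
    char *end;
    long v;

    assert(s != NULL && min > LONG_MIN && min <= max);
    errno = 0;
    v = strtol(s, &end, 10);
    if (end == s || *end != '\0') {
        errno = EINVAL;                 /* empty string or trailing junk */
        return min - 1;
    }
    if (errno == ERANGE || v < min || v > max) {
        errno = ERANGE;
        return min - 1;
    }
    return v;
}
```

Since min-1 can never be a valid result, a caller can test the return value alone to detect failure and consult errno only when it needs to distinguish the two error cases.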

2-3. Program startup and termination

As with most C programs, Services starts execution at the main() routine, in main.c. This routine performs program initialization (see section 2-3-1), executes the main program loop (see section 2-3-3), and performs cleanup when the main loop terminates (see section 2-3-5). main() takes three parameters from the operating system: ac, the command-line argument count (called argc by some programs); av, the command-line argument vector (called argv by some programs); and envp, the environment pointer.

2-3-1. Initialization

The bulk of initialization is performed by the init() routine, located in init.c. This routine:

Initializes the logging subsystem, by calling open_memory_log() (the log file itself is not opened at this time; any log messages are saved in a memory buffer until a log file is available).
Initializes the memory subsystem (see section 2-2-3).
Parses basic command-line options by calling parse_options(ac,av,0), exiting if an error occurs.
Changes the current directory to the Services data directory, as specified by the -dir command-line option or set by the configure script.
Reads the primary configuration file, ircservices.conf in the data directory (see section 2-3-2).
Re-parses basic command-line options, to override configuration file settings.
Opens the log file, as specified in the configuration file or by the -log command-line option, writing a warning to standard error if the file cannot be opened.
Writes a greeting message to the log file.
Records the current time in the start_time variable (this variable is used in responding to IRC INFO and STATS requests).
If Services was started in read-only mode (with the -readonly option), closes the log file again.
If configured to (with configure -dumpcore), attempts to remove any core dump size limits to ensure that a core dump is written when a segmentation fault occurs.
Initializes the system pseudo-random number generator, using a seed value based on the current time and the process IDs of the Services process and its parent process.
Initializes the socket subsystem (see section 3).
Initializes the module subsystem (see section 4).
Registers callbacks used in init.c and main.c (callbacks are described in section 4-5):
- command line
- introduce_user
- connect
- save data complete
Calls other subsystems' initialization routines.
Initializes multilingual support (see section 2-8), and loads any external language files specified by the LoadLanguageText configuration directive, in the order they were encountered in the configuration file.
Loads all modules specified by the LoadModule configuration directive, in the order they were encountered in the configuration file.
Checks for unrecognized command-line options and passes them to modules via the "command line" callback.
Checks that a protocol module has been loaded, writing an error message to standard error and exiting if not.
If the -nofork option was not specified, writes a message to standard error indicating that initialization succeeded.
If the -nofork option was not specified, calls the system's fork() function to spawn a new process; if successful, the parent process immediately exits with code 0 (success), while execution continues in the child process. Note that from here on down, no fatal errors can occur; any unexpected conditions are reported to the log file.
Writes the process ID of the Services process to the file specified in the PIDFilename configuration directive.
Initializes signal handling (see section 2-3-4).
Creates a socket for communication with the remote server, and initiates the server connection (the connection is performed asynchronously, with the connect_callback() and disconnect_callback() functions in main.c handling connection success and failure, respectively).

Once init() has successfully completed its work, main() initializes three timestamp variables used as second-resolution timers: last_send (defined in send.c), indicating when data was last sent to the server; last_update, indicating when the databases were last written to persistent storage; and last_check, indicating when timed events (see section 2-7) were last checked for timeouts.

Finally, main() initializes an error trap via sigsetjmp(), to which signal handlers can return (via siglongjmp()) when a signal causing program termination is received. Ideally, this call would be located in the do_sigsetjmp() function in signals.c, along with the rest of the signal handling code. However, since sigsetjmp() is not guaranteed to work if the function that called it returns, main() instead invokes the DO_SIGSETJMP macro, also located in main.c; this macro sets up a context buffer, calls sigsetjmp(), then calls do_sigsetjmp() to pass the context buffer pointer to the signal code.

2-3-2. Configuration files

Configuration files are handled by code in conffile.c. This file has two external interfaces: configure(), which reads in settings from a configuration file, and deconfigure(), which restores the settings to their default values. (A third exported function, config_error(), is available for configuration directive handlers, described below, to call in order to print warning or error messages.)

In order to avoid leaving configuration variables in an inconsistent state if an error is found, configuration files are processed in two passes: first all settings are read into temporary storage, then, if no errors were found, the new values are assigned to the appropriate configuration variables. These two steps are both performed by configure(), with the third parameter (action) indicating which step is to be performed: CONFIGURE_READ to read in the configuration file, CONFIGURE_SET to store the new values in the configuration variables, or both (CONFIGURE_READ|CONFIGURE_SET) to perform both steps in one call. Additionally, when configuring settings for the first time, the original value of each configuration variable is saved, allowing deconfigure() to restore those values later.

Configuration information is stored in one of two text files: ircservices.conf for core configuration and modules.conf for module configuration (these filenames cannot be changed at runtime, but can be changed at compilation time via the IRCSERVICES_CONF and MODULES_CONF constants in defs.h). The file to be used is implied by the modulename parameter passed to configure(); if a non-NULL name is given, modules.conf is used, otherwise ircservices.conf is used.

Both files use the same format: one configuration directive per line, with comments delimited by # (blank and comment-only lines are permitted). The configuration directive and its parameters are each separated by a nonzero amount of whitespace. If whitespace is needed inside a string parameter, the parameter can be enclosed in double quotes; in this case, double quotes must be used around the entire parameter—they are treated as ordinary characters in the middle of a parameter. Configuration directives are treated case-insensitively.
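
As an illustration of the line format just described, the following sketch splits one line into a directive name and its parameters. It is not the parser in conffile.c; split_config_line() is a hypothetical name, and two behaviors are assumptions made for the sketch: a # begins a comment only where a new token would start, and an unterminated quoted parameter simply runs to the end of the line.

```c
#include <ctype.h>
#include <stddef.h>

/* Splits line in place into whitespace-separated tokens, storing pointers
 * to them in tokens[].  A token that begins with a double quote extends to
 * the next double quote; quotes elsewhere in a token are ordinary
 * characters.  Returns the number of tokens found (0 for a blank or
 * comment-only line). */
int split_config_line(char *line, char **tokens, int max_tokens)
{
    int count = 0;
    char *p = line;

    while (count < max_tokens) {
        while (*p != '\0' && isspace((unsigned char)*p))
            p++;
        if (*p == '\0' || *p == '#')
            break;                      /* end of line or start of comment */
        if (*p == '"') {
            tokens[count++] = ++p;      /* skip the opening quote */
            while (*p != '\0' && *p != '"')
                p++;
        } else {
            tokens[count++] = p;
            while (*p != '\0' && !isspace((unsigned char)*p))
                p++;
        }
        if (*p == '\0')
            break;
        *p++ = '\0';                    /* terminate the token */
    }
    return count;
}
```

For example, a line such as `ExampleDirective "two words" # comment` yields two tokens: the directive name and the single parameter `two words`.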

Aside from caller-specified configuration directives, three meta-directives are recognized in configuration files. The IncludeFile directive allows nesting of configuration files, taking a single parameter which specifies the file to insert. (Nesting is internally limited to 100 levels to avoid recursion loops.) The Module directive is only recognized in modules.conf, and specifies the beginning of a particular module's section, with the module name given as a parameter to the directive; it must be matched by an EndModule directive (taking no parameters). When the modulename parameter to configure() is not NULL, only the configuration directives in the section matching the given module name are processed.

Configuration directives to be processed are specified by an array of ConfigDirective structures, passed as the directives parameter to configure(). This structure is defined in conffile.h, and consists of a name parameter (const char *name) followed by a fixed-size array of parameter substructures (the array is 8 elements long by default, which can be changed via the CONFIG_MAXPARAMS constant in defs.h).

The string in the name field gives the name of the directive, which should consist of only alphanumeric characters, no punctuation or spaces. Directives are treated case-insensitively, and if more than one entry has the same name, only the first will be used. A value of NULL for the name field is used to terminate the array.

The parameter substructure of ConfigDirective contains the following fields:

int type

Specifies both the data type of the variable which holds the parameter value and the format in which it is expressed in the configuration file, using one of the CD_* constants defined in conffile.h:

CD_NONE: No parameter present (processing of the directive is terminated). This constant is not used in actual parameter definitions, but as it has a value of zero, any parameters not specified in the ConfigDirective structure will automatically end the parameter list, so no explicit fencepost is required.
CD_INT: An integer parameter. Any 32-bit integer value is accepted, and the value is stored as type int32.
CD_POSINT: A positive integer parameter. As for CD_INT, but zero and negative values are not accepted.
CD_PORT: A TCP/UDP port number. Integers from 1 through 65535 inclusive are accepted, and the value is stored as type int32.
CD_STRING: A string parameter. A pointer to the string value is stored in the char * configuration variable. Strings are internally allocated with malloc() and freed when necessary by configure() and deconfigure(); they should be treated as read-only by the caller.
CD_TIME: A time parameter, stored as type time_t. The parameter can be either an integer number of seconds or a string of one or more value-unit pairs, where units are "d" (days), "h" (hours), "m" (minutes), or "s" (seconds). For example, a time of 1 hour and 30 minutes could be specified as "1h30m", "90m", or "5400".
CD_TIMEMSEC: A time parameter, parsed as a decimal number of seconds and converted to milliseconds. The value is stored as type int32.
CD_FUNC: A parameter which is handled by an external function (see below).
CD_SET: A pseudo-parameter (does not use up an actual parameter to the directive), which sets a variable of type int to 1 if the directive is seen.
CD_DEPRECATED: A pseudo-parameter, which causes a warning to be written to the log that the directive is deprecated if it is encountered in a configuration file.
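
The CD_TIME format described above can be parsed with a short loop over value-unit pairs. The following is a sketch only, not the parser in conffile.c; parse_time_sketch() is a hypothetical name, a number with no unit is assumed to mean seconds wherever it appears, and overflow is not checked.

```c
#include <ctype.h>

/* Converts a string such as "1h30m", "90m", or "5400" to a number of
 * seconds.  Returns -1 if the string is empty or malformed. */
long parse_time_sketch(const char *s)
{
    long total = 0;

    if (*s == '\0')
        return -1;
    while (*s != '\0') {
        long value = 0;

        if (!isdigit((unsigned char)*s))
            return -1;                  /* every pair must start with a number */
        while (isdigit((unsigned char)*s))
            value = value * 10 + (*s++ - '0');
        switch (*s) {
        case 'd': value *= 86400; s++; break;
        case 'h': value *= 3600;  s++; break;
        case 'm': value *= 60;    s++; break;
        case 's': s++; break;
        case '\0': break;               /* bare number: taken as seconds */
        default: return -1;             /* unknown unit character */
        }
        total += value;
    }
    return total;
}
```

All three spellings given in the CD_TIME description produce the same result, 5400 seconds.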

int flags

Specifies zero or more flags for the parameter:

CF_OPTIONAL: Indicates that the parameter is optional, not required. If the parameter is missing from the configuration file, the variable's value is not changed.
CF_DIRREQ: Indicates that the directive is required; if the directive is not found when reading the configuration file, an error will be generated. This flag is only valid when used with the first parameter.
CF_SAVED: Used internally. Indicates that the original value of the variable has been saved in the prev field.
CF_WASSET: Used internally. Indicates that the new field has been set.
CF_ALLOCED: Used internally. Indicates that the current value of the string variable was allocated by the configuration file parser.
CF_ALLOCED_NEW: Used internally. Indicates that the string value in the new field was allocated by the configuration file parser.

void *ptr

Points to the variable into which the value read from the configuration file is to be written (except for types CD_FUNC, described below, and CD_DEPRECATED, which does not set a value). Note that for string variables, this is a pointer to the char * variable that will receive the new string pointer, so the effective type is char **.

CDValue prev
CDValue new

Used internally to hold the variable's original value and the new value read in from the configuration file, respectively. Callers should not attempt to access these fields.
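
To show how these fields fit together, the following sketch declares a small directive table. It uses a simplified stand-in for the structure rather than the real definition from conffile.h: ConfigDirectiveSketch, the directive names, and the variables are all hypothetical, the numeric values of the constants are assumed (apart from CD_NONE being zero, as stated above), and the internal prev and new fields are omitted.

```c
#include <stddef.h>
#include <stdint.h>

#define CONFIG_MAXPARAMS 8

/* Assumed values for illustration; only CD_NONE == 0 is documented. */
enum { CD_NONE = 0, CD_INT, CD_POSINT, CD_PORT, CD_STRING, CD_SET };
enum { CF_OPTIONAL = 0x01, CF_DIRREQ = 0x02 };

/* Simplified stand-in for ConfigDirective (prev/new fields omitted). */
typedef struct {
    const char *name;
    struct {
        int type;
        int flags;
        void *ptr;
    } params[CONFIG_MAXPARAMS];
} ConfigDirectiveSketch;

/* Variables that would receive the configured values. */
static char *example_host;
static int32_t example_port;
static char *example_password;
static int example_flag;

/* A required directive with two mandatory parameters and one optional
 * one, followed by a flag-style directive, then the NULL terminator.
 * Unlisted parameter slots are zero-filled, i.e. CD_NONE. */
static ConfigDirectiveSketch example_directives[] = {
    { "ExampleServer", { { CD_STRING, CF_DIRREQ,   &example_host },
                         { CD_PORT,   0,           &example_port },
                         { CD_STRING, CF_OPTIONAL, &example_password } } },
    { "ExampleFlag",   { { CD_SET,    0,           &example_flag } } },
    { NULL }
};

/* Counts the parameter entries of a directive, stopping at CD_NONE. */
int count_params(const ConfigDirectiveSketch *d)
{
    int n = 0;

    while (n < CONFIG_MAXPARAMS && d->params[n].type != CD_NONE)
        n++;
    return n;
}
```

Note that the string entries pass the address of a char * variable, matching the effective char ** type described for the ptr field.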

If the processing required for a parameter is more complex than the basic types listed above, an external function can be specified to process the parameter. To do this, set the parameter's type to CD_FUNC, and in the ptr field, place a pointer to a handler function that takes three parameters and returns an int:

int function(const char *filename, int linenum, char *param);

The first two parameters, filename and linenum, serve two purposes. One is to provide the filename and line number currently being processed, when reading the file; these can then be passed to config_error() if a warning or error message needs to be printed. The other is to indicate the action requested of the function. (Note: This is poor design. Ideally, the action requested should be specified by a separate parameter to the function.) If filename is not NULL, then the function is being called to process the parameter string and store any resulting values in a temporary location; if it is NULL, then the function should perform some other action as specified by linenum:

CDFUNC_INIT: Prepare for processing a new value (e.g., save variables' original values and clear any variables used for temporary value storage).
CDFUNC_SET: Copy any temporary values to their final locations.
CDFUNC_DECONFIG: Restore configuration variables' original values.

The final parameter, param, is the parameter string read from the configuration file, and may be modified or destroyed (the parser will not make any further use of it).

The function should return nonzero on success, zero on error. However, errors are only checked for when reading/processing the parameter; the CDFUNC_INIT, CDFUNC_SET, and CDFUNC_DECONFIG operations are assumed to succeed.
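
A handler following this protocol might look like the sketch below, which accepts a percentage from 0 through 100. Everything here is illustrative: do_percent() and its variables are hypothetical, and the numeric values given to the CDFUNC_* constants are assumptions (the real values come from conffile.h). A real handler would also report parse failures through config_error(), which is omitted here.

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumed values for illustration only. */
enum { CDFUNC_INIT = 0, CDFUNC_SET = 1, CDFUNC_DECONFIG = 2 };

static int percent_value = 50;  /* the live configuration variable */
static int percent_saved;       /* original value, kept for deconfigure */
static int percent_have_saved;  /* nonzero once percent_saved is valid */
static int percent_new;         /* temporary value read from the file */
static int percent_seen;        /* nonzero if the directive was read */

int do_percent(const char *filename, int linenum, char *param)
{
    char *end;
    long v;

    if (filename == NULL) {
        /* No file position: linenum selects the action instead. */
        switch (linenum) {
        case CDFUNC_INIT:
            if (!percent_have_saved) {
                percent_saved = percent_value;
                percent_have_saved = 1;
            }
            percent_seen = 0;
            break;
        case CDFUNC_SET:
            if (percent_seen)
                percent_value = percent_new;
            break;
        case CDFUNC_DECONFIG:
            if (percent_have_saved) {
                percent_value = percent_saved;
                percent_have_saved = 0;
            }
            break;
        }
        return 1;
    }

    /* Reading the file: parse into temporary storage only. */
    v = strtol(param, &end, 10);
    if (end == param || *end != '\0' || v < 0 || v > 100)
        return 0;       /* a real handler would call config_error() here */
    percent_new = (int)v;
    percent_seen = 1;
    return 1;
}
```

Keeping the parsed value in percent_new until CDFUNC_SET arrives mirrors the two-pass design of configure(): a later error elsewhere in the file leaves percent_value untouched.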

Configuration file processing details

The calling pattern of configure() and deconfigure() looks roughly like the following:

configure(..., CONFIGURE_READ) -> read_config_file() -> do_read_config_file() [ -> do_read_config_file()... ] -> parse_config_line()
configure(..., CONFIGURE_SET) -> do_all_directives(ACTION_COPYNEW)
deconfigure() -> do_all_directives(ACTION_RESTORESAVED)

configure() with the CONFIGURE_SET flag and deconfigure() have basically the same function: to store a value into each configuration variable. This is handled by the internal function do_all_directives, which uses the action parameter to select whether to copy the new value read in from the configuration file (ACTION_COPYNEW) or the saved original value (ACTION_RESTORESAVED).

configure() with the CONFIGURE_READ flag calls read_config_file(), which opens the proper configuration file, initializes all configuration parameters for reading, calls do_read_config_file() to actually process the file, and checks that all required directives were seen; the function returns nonzero on success, zero if an error was detected.

do_read_config_file() iterates through each line of the given file, calling parse_config_line() for each non-empty line (except that the meta-directives IncludeFile, Module, and EndModule are processed directly by do_read_config_file()). parse_config_line(), in turn, splits the given line into directive and parameters, locates the entry in the ConfigDirective array corresponding to the directive (generating an error if none is found), and processes each of the directive's parameters, returning success (nonzero) or failure (zero) to do_read_config_file().

2-3-3. The main loop

Once initialization has completed, Services enters the main loop. This fairly simple loop performs the following functions:

Saves the current databases to persistent storage, if the save_data global flag has been set to a nonzero value or the time specified by the UpdateTimeout configuration directive has passed since the last time the databases were saved. After saving, the save_data flag is cleared and the last database save time is stored in last_update.
Exits the loop if the delayed_quit global flag has been set to a nonzero value. (This flag is used to cause Services to save its databases before exiting, and can be set by the SIGTERM signal, as described below in section 2-3-4, or by the OperServ SHUTDOWN and RESTART commands, as described in section 4 of the user's manual.)
Sends a PING message to the remote server if the connection to the server is active and periodic pinging has been enabled via the PingFrequency configuration directive.
Checks active timers for timeout events (see section 2-7), if the time specified by the TimeoutCheck configuration directive has passed since the last such check.
Checks sockets for activity (see section 3-6).
Flushes out all accumulated channel mode changes, if channel mode merging has not been enabled by the MergeChannelModes configuration directive (see set_cmode() in section 2-6-5).

The main loop terminates when the quitting global flag becomes nonzero, and can also abort via the delayed_quit flag as described above.

2-3-4. Signals

Services makes use of the following signals to perform certain actions:

SIGTERM: Causes Services to save its databases and terminate, as if the OperServ SHUTDOWN command had been given.
SIGINT, SIGQUIT: Causes Services to terminate immediately without saving its databases, as if the OperServ QUIT command had been given.
SIGHUP: Causes Services to save its databases and reread the configuration file, as if the OperServ UPDATE and REHASH commands had been given. Rehashing is done via the reconfigure() function provided by init.c.
SIGUSR2: Causes Services to close the log file and reopen it immediately; this can be useful if the log file has been moved.

To catch program or system faults, the SIGSEGV, SIGBUS, SIGILL, SIGTRAP, SIGFPE, and (if defined) SIGIOT signals are directed to a generic termination handler (weirdsig_handler(), also used for SIGINT and SIGQUIT). However, if the -dumpcore option was given to configure, SIGSEGV is instead left alone, causing the operating system to abort the program and dump a core file if a segmentation fault occurs.

In addition, SIGUSR1 is used by the memory subsystem to cause the program to abort if an out-of-memory condition is detected by the smalloc(), scalloc(), or srealloc() functions.

All signals not mentioned above are set to "ignore" by the init_signals() function. (However, the SIGPROF and SIGCHLD signals are left alone: SIGPROF so that profiling can be performed, and SIGCHLD so that any child processes can be properly reaped.)

The handlers for each of these signals, as well as the init_signals() function which sets them up, are in signals.c. This source file has two additional external interfaces, enable_signals() and disable_signals(), which enable and disable, respectively, processing of the SIGTERM, SIGHUP, and SIGUSR2 signals, to prevent any action being taken when Services' internal data may be in an inconsistent state. The signals are not ignored, only blocked, so if one of these signals is received between a disable_signals() call and the corresponding enable_signals() call, it will be processed immediately when enable_signals() is called.

2-3-5. Termination

Once the main loop terminates, Services calls the cleanup() function (defined in init.c), which performs the following actions—more or less the reverse of init():

Writes a log entry with the quit message (stored in the global variable quitmsg); a default message is written if none has been provided, but any code that causes Services to exit should provide an appropriate quit message.
Flushes out any unsent mode changes.
Unloads all modules.
If the remote server socket has been opened, sends an SQUIT to the remote server (if connected) and closes the socket.
Calls lang_cleanup() to free all memory used by multilingual support.
Calls other subsystems' cleanup routines.
Unregisters callbacks registered by init().
Calls the module subsystem's cleanup routine.
Calls deconfigure() to reset all configuration variables to their original values and free any strings allocated by the configuration file parser. However, the value of LogFilename is preserved in a static buffer, in case the log needs to be reopened (as can happen when restarting; see below).
Closes the log file.

Finally, Services terminates by returning from main(); however, if a restart has been requested (by setting the restart global flag to a nonzero value), Services first attempts to re-execute itself via the execve system call. If this fails, the log file is reopened and an error message is written (regardless of whether read-only mode was enabled or not).

2-4. Logging

Logging functionality is provided by the functions in log.c (and the corresponding log.h header file, included by services.h).

The logging subsystem does not include explicit initialization or cleanup routines; all necessary processing is carried out in the open_log() and close_log() functions, which are (as the names imply) used to open and close the log file; open_log() in particular relies on the LogFilename global variable (which reflects the same-named configuration directive) for the name of the file to open (see below). There is also an open_memory_log() function, which can be used to set up a memory buffer to hold log messages when the log filename is not yet known; a reopen_log() function, to close and reopen the log file (in case the file has been moved and needs to be recreated, or if LogFilename has changed, for example); and a log_is_open() function, which returns whether a log file or buffer is currently open.

The log filename specified in the LogFilename configuration directive is taken to be a template, which may contain any of the following tokens:

%y: The current year (4 digits).
%m: The current month, 01-12 (2 digits).
%d: The current day of the month, 01-31 (2 digits).
%%: A literal "%" character.

The template is processed by the gen_log_filename() function, called by open_log() and reopen_log(); the function replaces the tokens with their appropriate values and returns the resulting filename.
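
The token substitution can be sketched as follows. This is not the gen_log_filename() implementation; expand_log_template() is a hypothetical name, the caller supplies the date as a struct tm, and an unrecognized token is assumed to be copied through unchanged.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Expands %y, %m, %d, and %% in tmpl into buf (of the given size).
 * Returns 0 on success, or -1 if the result does not fit. */
int expand_log_template(const char *tmpl, const struct tm *tm,
                        char *buf, size_t size)
{
    size_t used = 0;

    if (size == 0)
        return -1;
    while (*tmpl != '\0') {
        char piece[16];
        size_t len;

        if (*tmpl == '%' && tmpl[1] != '\0') {
            switch (tmpl[1]) {
            case 'y':
                snprintf(piece, sizeof(piece), "%04d", tm->tm_year + 1900);
                break;
            case 'm':
                snprintf(piece, sizeof(piece), "%02d", tm->tm_mon + 1);
                break;
            case 'd':
                snprintf(piece, sizeof(piece), "%02d", tm->tm_mday);
                break;
            case '%':
                strcpy(piece, "%");
                break;
            default:            /* unknown token: copied as-is (assumption) */
                piece[0] = '%';
                piece[1] = tmpl[1];
                piece[2] = '\0';
                break;
            }
            tmpl += 2;
        } else {
            piece[0] = *tmpl++;
            piece[1] = '\0';
        }
        len = strlen(piece);
        if (used + len >= size)
            return -1;          /* no room for the piece plus terminator */
        memcpy(buf + used, piece, len);
        used += len;
    }
    buf[used] = '\0';
    return 0;
}
```

Since the expanded name changes whenever the date does, comparing it against the name of the currently open file is enough to detect that a date-based log file is due for rotation.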

Actual logging of messages is done via a set of ten functions, depending on the particulars of the message:

log(format, ...)
log_debug(debuglevel, format, ...)
log_perror(format, ...)
log_perror_debug(debuglevel, format, ...)
module_log(format, ...)
module_log_debug(debuglevel, format, ...)
module_log_perror(format, ...)
module_log_perror_debug(debuglevel, format, ...)
fatal(format, ...)
fatal_perror(format, ...)

Of these, the first eight (all but fatal() and fatal_perror()) are implemented as macros defined in log.h which call the do_log() function. The module_ functions are intended for use in modules, and insert the module name before the log message (using the MODULE_NAME macro, as described in section 4-2-2); the _perror functions append a system error string to the end of the message, like perror(); and the _debug functions allow a minimum debug level for the message to be specified (and also cause the string "debug:" to be inserted at the beginning of the message if the given level is greater than zero). To illustrate the format of messages written by these functions, a sample message from module_log_perror_debug(1,"message") might be:

[Jan 01 12:34:56 2000] debug: (module_name) message: System error

When running in debug mode, the time is printed to microsecond resolution.

Implementation note: One reason these are implemented as macros is to avoid a GCC warning about log() conflicting with the built-in mathematical function log(); another is to spare modules from having to specify MODULE_NAME manually when calling the module logging functions.

The last two logging functions, fatal() and fatal_perror(), are intended for conditions under which a catastrophic failure cannot be avoided; they write the given message to the log file (prefixed by a timestamp and the string "FATAL:"), send a WALLOPS message to the remote server if connected, and call exit(1) to abort the program, without performing any of the ordinary cleanup procedure.

All of the logging functions above first check whether the log file needs to be rotated, by calling check_log_rotate() if the log file is open. This function calls gen_log_filename() and compares the result against the name of the currently open log file; if the filenames differ, then the current log file is closed, and a new one is opened with the new name.

Actual writing to the log file is performed by the vlogprintf() function, which can be thought of as a vfprintf() with an implied file parameter (the log file). There are also logprintf() and logputs() functions, which similarly function as fprintf() and fputs() do. (Note that logputs() does not output a trailing newline, like fputs() and unlike puts().) These functions first write the given string to standard error if the program is running in no-fork mode (from the -nofork command-line option). The string is then written to the currently open log file; if no log file is open but the memory buffer is available (and not full), the string is written to the buffer instead.

2-5. Message sending and receiving

In order to operate, Services must be able to send messages to and receive messages from the remote IRC server. While each IRC server has its own idiosyncrasies (which are handled by IRC protocol modules, as described in section 5), all share the same text-based, line-oriented format described in RFC 1459, and the Services core includes several functions for handling basic message sending and receiving.

2-5-1. Sending messages

Message sending routines are located in send.c and send.h. The most basic of these is vsend_cmd() (and its companion send_cmd()), which takes an optional message source, a message format string, and format arguments, and formats them into an IRC message which it then sends to the remote server. (These functions would have been better named [v]send_msg(), but c'est la vie.) For example, a PING message could be sent to the remote server with:

send_cmd(NULL, "PING :%s", ServerName);

Since it would be overly repetitive to write out the entire message format every time a message was to be sent, and since some protocols use different message formats for some messages, there are a number of shortcut routines which send a certain type of message to the remote server. For example, PRIVMSG, NOTICE, and WALLOPS messages can be sent using the routines of the same names:

void privmsg(const char *source, const char *dest, const char *fmt, ...): Sends a PRIVMSG message from source to dest.
void notice(const char *source, const char *dest, const char *fmt, ...): Sends a NOTICE message from source to dest.
void wallops(const char *source, const char *fmt, ...): Sends a WALLOPS message (or an equivalent message for the protocol in use) to the network.

Of these, NOTICE is used most commonly by far, and it has several variations of its own:

void notice_list(const char *source, const char *dest, const char **text): Sends each string in the NULL-terminated array text as a NOTICE message from source to dest.
void notice_all(const char *source, const char *fmt, ...): Sends a NOTICE message from source to all clients on the network.
void notice_lang(const char *source, const User *dest, int *index, ...): Sends a NOTICE message from source to dest, taking into account the language preference of the target client and splitting the text into separate messages at newline boundaries.
void notice_help(const char *source, const User *dest, int *index, ...): Sends a NOTICE message from source to dest like notice_lang(), but also replaces each occurrence of %S (with an upper-case "S") in the format string with the value of source.

The latter two functions, notice_lang() and notice_help(), take advantage of multilingual support (see section 2-8-2) to send messages in the user's selected language; in this case the destination is passed as a User structure (see section 2-6-2) rather than a string. These functions also process printf()-style formatting tokens in the specified message.

Other sending functions include:

void send_channel_cmd(const char *source, const char *fmt, ...): Sends a message that changes a channel's status. Some protocols do not allow pseudoclients to change channel status directly, and substitute the server name for the nickname given in the source parameter.
void send_cmode_cmd(const char *source, const char *channel, const char *fmt, ...): Sends a MODE message for a channel. The format string should start with the mode parameter ("+..." or "-...").
void send_error(const char *fmt, ...): Sends an ERROR message to the remote server, and disconnects.
void send_namechange(const char *nick, const char *newname): Sends a message to change the "real name" of a pseudoclient. Not supported by some protocols.
void send_nick(const char *nick, const char *user, const char *host, const char *server, const char *name, const char *modes): Sends messages to introduce a client to the network.
void send_nickchange(const char *nick, const char *newnick): Sends a message to change the nickname of a pseudoclient.
void send_nickchange_remote(const char *nick, const char *newnick): Sends a message to change the nickname of a client on another server. Not supported by some protocols.
void send_pseudo_nick(const char *nick, const char *realname, int flags): Introduces a pseudoclient to the network.
void send_server(): Sends the initial messages required upon connection to the remote server.
void send_server_remote(const char *server, const char *desc): Sends a message to introduce a new (fake) server to the network.

Note that some of these functions are actually implemented by protocol modules, as described in section 5. This means that they may not have exactly the same result on different protocols (for example, send_nickchange_remote() won't do anything if the protocol doesn't support remote nickname changing), and that the functions cannot be used before a protocol module is loaded (any attempt to do so will cause the program to abort).

send.c also defines several variables used to indicate characteristics of the protocol in use:

protocol_name: A string describing the protocol.
protocol_version: A string describing the versions of the protocol supported by the module.
protocol_features: A bitmask of features supported by the protocol.
protocol_nickmax: The maximum nickname length supported by the protocol.

These variables are set by the protocol module in its initialization routine; see section 5-2 for details. send.c hooks into the "load module" callback to watch for a protocol module being loaded, and ensures that the protocol has set all protocol variables and functions properly. Implementation note: since there is nothing to specify that a particular module is a protocol module, the function simply assumes that the first module loaded is a protocol module.

2-5-2. Receiving messages

Message reception is handled by the socket callbacks readfirstline_callback() and readline_callback(), which are called when a line of text is available to be read from the network (see section 3 for a description of how socket processing works). When the socket is created, its READLINE callback is set to readfirstline_callback(); this routine reads the first line of data from the network, sets the linked global flag (if the first line was not an ERROR message) to indicate that Services has connected to the network, introduces all pseudoclients (using the introduce_user() function in init.c—see section 7-1 for details), calls the "connect" callback, and changes the socket's callback function to readline_callback() to read subsequent messages.

Each line of data read in this way is sent to the process() function, discussed in section 2-5-3 below, for parsing and processing.

2-5-3. Processing messages

Once a message has been received from the remote server, it is passed to the process() function, defined in process.c, for processing. (Technically, process() takes no arguments, and reads the message from the global inbuf variable; this approach is taken to allow signal handlers to log the current buffer in case the program crashes during processing of a message.) process() first extracts the sender, if any, from the beginning of the buffer and splits the rest of the buffer into fields in the RFC 1459 style—note that there is no facility for handling protocols which do not use RFC 1459-style messages. process() then calls the "receive message" callback, which is the lowest-level method of hooking into input messages; if no callback function handles the message, it is then processed in the ordinary manner, which involves looking up the message name using find_message() and calling the message's handler function if one exists. The handler receives the message's source as a string, and the message's parameters as a count/vector pair (the command itself is not passed to the function).
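
The RFC 1459-style splitting described above (an optional sender introduced by a colon, a command, then parameters, with a final parameter introduced by a colon allowed to contain spaces) can be sketched as below. This is not the code in process.c, which reads from the global inbuf; split_message(), ParsedMessage, and MAX_PARAMS are hypothetical names, and the buffer is passed explicitly to keep the example self-contained.

```c
#include <stddef.h>
#include <string.h>

#define MAX_PARAMS 16

typedef struct {
    char *source;           /* sender, or NULL if the line had no prefix */
    char *command;          /* message name, e.g. "PRIVMSG" */
    int ac;                 /* parameter count */
    char *av[MAX_PARAMS];   /* parameter vector */
} ParsedMessage;

/* Splits buf in place.  Returns 0 on success, -1 if no command is found.
 * If there are more than MAX_PARAMS parameters, the excess text is left
 * as part of the last parameter. */
int split_message(char *buf, ParsedMessage *msg)
{
    char *p = buf;

    msg->source = NULL;
    msg->command = NULL;
    msg->ac = 0;

    if (*p == ':') {                    /* optional sender prefix */
        msg->source = ++p;
        p = strchr(p, ' ');
        if (p == NULL)
            return -1;
        *p++ = '\0';
        while (*p == ' ')
            p++;
    }
    if (*p == '\0')
        return -1;
    msg->command = p;

    p = strchr(p, ' ');
    while (p != NULL && msg->ac < MAX_PARAMS) {
        *p++ = '\0';                    /* terminate the previous field */
        while (*p == ' ')
            p++;
        if (*p == '\0')
            break;
        if (*p == ':') {                /* trailing parameter: rest of line */
            msg->av[msg->ac++] = p + 1;
            break;
        }
        msg->av[msg->ac++] = p;
        p = strchr(p, ' ');
    }
    return 0;
}
```

For the line `:nick PRIVMSG NickServ :REGISTER mypassword`, this yields the source `nick`, the command `PRIVMSG`, and two parameters, `NickServ` and `REGISTER mypassword`; a handler would then receive the source and the count/vector pair.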

find_message() is located in messages.c, along with other routines for managing messages. Message handlers are organized into tables of Message structures, each of which is a pair of a message name and a handler for that message. These tables can be registered with the message processing code using the register_messages() function, and removed again with the unregister_messages() function (the default handlers are installed by the messages_init() function). This is the method by which protocol modules (see section 5) typically handle protocol-specific messages, though in some cases it is necessary to hook into the "receive message" callback instead.

When called, find_message() searches all registered tables for a handler for the given message (case-insensitive), and returns it if found. If two or more tables have handlers for the same message, the one in the most recently registered table is used, allowing previously-installed handlers to be overridden (however, there is no facility for calling the overridden handler).

Internally, find_message() uses a doubly-linked list of message names and associated Message structures to locate messages; this list is created by init_message_list(), which is called each time a message table is registered or unregistered. When a message is found, it is shifted one element toward the head of the list (if it is not already at the head), so that subsequent searches can find that message faster. This allows frequently-seen messages to "bubble" up to the top of the list, reducing the time spent looking up each message. (A decent hash table would probably be more efficient still, if more complex.)
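The "shift one element toward the head" behavior can be sketched as follows. The MsgNode type and the function names are hypothetical stand-ins, not the structures used in messages.c; the sketch only shows a case-insensitive search over a doubly-linked list followed by a swap with the preceding node.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical node type for this sketch; not the actual Services layout. */
typedef struct MsgNode {
    struct MsgNode *next, *prev;
    const char *name;
} MsgNode;

/* Case-insensitive string equality. */
static int names_equal(const char *a, const char *b)
{
    while (*a && tolower((unsigned char)*a) == tolower((unsigned char)*b)) {
        a++;
        b++;
    }
    return *a == *b;
}

/* Look up a name and, if it is found and is not already at the head,
 * swap the node with its predecessor so that frequently-seen names
 * drift toward the head of the list. */
MsgNode *lookup_and_promote(MsgNode **head, const char *name)
{
    MsgNode *n;

    for (n = *head; n; n = n->next) {
        if (names_equal(n->name, name))
            break;
    }
    if (n && n->prev) {
        MsgNode *p = n->prev;           /* node that n will jump over */
        p->next = n->next;
        if (n->next)
            n->next->prev = p;
        n->prev = p->prev;
        n->next = p;
        if (p->prev)
            p->prev->next = n;
        else
            *head = n;                  /* n is the new head */
        p->prev = n;
    }
    return n;
}
```

Moving a node only one position per hit, rather than straight to the head, keeps a single rare message from displacing the most common ones.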

The default message handlers are also defined in messages.c, and cover the basic set of IRC messages, such as PRIVMSG (which calls the "m_privmsg" callback for processing), JOIN, and SERVER. There are also entries in the table for ignored messages, such as NOTICE and PONG, with no handler specified; these are present to prevent process() from logging an "unknown message" warning.

Noticeably absent from the message table are NICK and USER. Client registration is one of the greatest points of difference between IRC protocols, and attempting to use the default RFC 1459 method does not work on most modern protocols, so handling of these messages is left entirely to the protocol module. As a corollary, if the protocol module does not handle these (or whatever other message may be used for introducing clients to the network), Services will not be able to recognize any clients.

2-5-4. The ignore list

In order to provide some measure of protection against users "spamming" Services with messages in order to cause a denial of service, the default PRIVMSG handler includes logic which keeps track of how much load each client is placing on Services, and ignores PRIVMSG messages from clients who exceed a certain threshold. The ignore data is not kept in a "list" per se, but is instead stored as part of each User structure (see section 2-6-2); the ignore_init() routine initializes the fields in this structure used for ignore data, and the ignore_update() routine updates the fields after work has been done on behalf of the client. Both of these routines are contained in ignore.c.

The "ignore value" for a client, stored in the ignore field of the User structure, is calculated roughly as the average over time of a function whose value is 1 when Services is executing code on behalf of the client and 0 at all other times, with recent values given more weight. Rather than keep track of the exact time values, however, a decaying average is computed, with the value decaying by half every time interval specified by the IgnoreDecay configuration directive. If this average value exceeds the threshold specified by IgnoreThreshold, the PRIVMSG handler will ignore the client's message.
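The decay rule can be stated compactly as a formula. The symbols are introduced here for illustration and do not appear in the source: v is the ignore value, D is the IgnoreDecay interval, and \Delta t is the time that has elapsed with no work done for the client. This covers only the decay step; how ignore_update() folds new work into the average is not shown.

```latex
v(t_0 + \Delta t) = v(t_0) \cdot 2^{-\Delta t / D}
```

For example, with D = 10 seconds, a value of 0.8 falls to 0.4 after 10 idle seconds and to 0.2 after 20.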

This approach is obviously limited in its effectiveness—for example, it cannot deal with a botnet or other large group of clients attacking Services if each client stays below the ignore threshold—but it can serve as a first line of defense against malicious users.

2-6. Servers, clients, and channels

Since Services connects to the IRC network as a server, it must keep track of the IRC network's state—the servers, clients, and channels on the network—just as other servers do. This section discusses the way in which this state is stored and the routines used for managing it.

2-6-1. Servers

Server management routines are contained in servers.c and servers.h. The important routines (other than initialization and cleanup) are do_server() and do_squit(), which are called from their respective message handlers to add and remove servers. The server records themselves are stored in a hash table, created using the hash.h header file. Each record is a Server structure, containing the server's name, network join time, and client list (see section 2-6-2 below); there is also a flag indicating whether the record represents a real server on the network or a fake server created by Services (such as with the OperServ JUPE command). Services also creates a fake server record for itself, using the empty string for the server name.

When adding a server, parent/child links are established between the originating server and the new server. This allows easy removal of an entire subtree of servers when an SQUIT message is received: do_squit() calls squit_server() on the quitting server, which calls recursive_squit() to delete any child servers (from Services' point of view) before deleting the quitting server itself. In turn, recursive_squit(), as its name suggests, recursively calls squit_server() for each child server. Since the IRC protocol mandates a tree structure for the network—cycles are not permitted—there is no danger of infinite recursion.
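The subtree removal can be sketched with a simplified record type. Everything here is hypothetical: the Srv structure keeps only parent/child/sibling links, and the two mutually recursive routines described above are collapsed into a single function for brevity.

```c
#include <stdlib.h>

/* Hypothetical, simplified server record: tree links and a name only. */
typedef struct Srv {
    struct Srv *parent, *child, *sibling;
    const char *name;
} Srv;

/* Create a server record and link it under "parent" (NULL for the root). */
Srv *add_server(Srv *parent, const char *name)
{
    Srv *s = calloc(1, sizeof(*s));
    if (!s)
        return NULL;
    s->name = name;
    s->parent = parent;
    if (parent) {
        s->sibling = parent->child;
        parent->child = s;
    }
    return s;
}

/* Remove a server and everything behind it.  Children are deleted first,
 * then the server is unlinked from its parent and freed.  Returns the
 * number of records deleted.  Because the links form a tree, the
 * recursion always terminates. */
int squit(Srv *s)
{
    int count = 1;
    Srv **p;

    while (s->child)
        count += squit(s->child);       /* each call unlinks that child */
    if (s->parent) {
        for (p = &s->parent->child; *p != s; p = &(*p)->sibling)
            ;
        *p = s->sibling;
    }
    free(s);
    return count;
}
```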

In addition, there are some protocols which do not explicitly send QUIT messages for clients on disconnecting servers, a bandwidth-saving feature commonly known as "NOQUIT" from the token used in protocol negotiation to indicate that the feature is available. If the protocol module signals, via the PF_NOQUIT protocol feature flag (see section 5-2), that it supports this feature, squit_server() will take care of removing all clients on each deleted server before deleting the server itself.

2-6-2. Clients

Clients are also called "users" in IRC. It is common to use the term "client" to refer to any program which connects to an IRC server and "user" to refer in particular to a human operating a client (or a client operated by a human), but through an unfortunate choice of terminology, clients are called "users" in the Services source code. (In this documentation, the term "client" is used to refer to an IRC client on the network, while "user" refers more generally to the human controlling a client.)

Be that as it may, each client (user) that connects to the network is given a User record, as defined in users.h, which is managed by code in users.c; as with server management, users.c uses a hash table to hold the client records. Each client record contains:

The client's nickname, username, hostname, and "real name" strings. "Fake hostname" (the hostname shown to non-operator clients) and IP address are also recorded for protocols which support them.
Next/previous links for linking all clients on the same server (the snext and sprev fields).
Nickname registration data (see section 7-3-1-1).
The client's connection timestamp as passed from the remote server, as well as the local timestamp when Services received the client registration message. There is also a "Services stamp" field, a unique integer assigned by Services and maintained by the IRC servers (if supported) to identify clients across netsplits; this is used by the NickServ pseudoclient in nickname authentication.
The client's IRC modes, and flags used by Services.
The client's ignore data (see section 2-5-4).
Various counters and timers, such as for counting bad passwords.
An array of registered nickname group IDs for which the client has identified (used by NickServ).
A list of channels the client is currently in.
A list of channels the client has identified for (used by ChanServ).

As can be seen above, the User structure contains a few fields which are used only by modules. From a design standpoint, these fields would ideally be stored in separate tables set up by those modules, but they are aggregated into the User structure for the sake of convenience. The core code does not access any of the module-based data, with the exception of the multilingual code, which uses the language setting stored in the nickname's registration data, if any (see section 2-8-2).

The primary interface into the client management code is through the IRC message processing functions, as with servers. In the case of clients, there are several messages that need to be handled: NICK, MODE (for a client target), KILL, QUIT, JOIN, PART, and KICK. (The last three are technically channel-related messages, but as they also operate on client data, they are handled here and call functions from the channel management subsystem to do their work.) As some of these messages have large parts in common, they often call local subroutines to perform their work: QUIT and KILL both use quit_user() to clean up after the departing client, and PART and KICK call part_channel() to remove the client from the specified channel. (The infamous "JOIN 0" message, which causes a client to leave all channels it is currently in, is similarly implemented.)

In addition to the processing functions, users.c provides a number of client-related utility functions. These include informational functions that return a client's status on the network or on a particular channel:

int is_oper(const User *user): Returns whether the client is an IRC operator.
int is_on_chan(const User *user, const char *chan): Returns whether the client is on the specified channel.
int is_chanop(const User *user, const char *chan): Returns whether the client is a channel operator on the specified channel.
int is_voiced(const User *user, const char *chan): Returns whether the client is voiced on the specified channel.

Functions for handling user/host masks:

int match_usermask(const char *mask, const User *user): Returns whether the client's username and hostname information match the given user@host mask.
void split_usermask(const char *mask, char **nick, char **user, char **host): Returns the nickname, username, and hostname parts of a mask as separate strings.
char *create_mask(const User *user, int use_fakehost): Creates a new mask based on the client's username and hostname information, returning it as a malloc()'d string.

And functions related to guest nicknames:

char *make_guest_nick(): Creates a new guest nickname. The returned nickname is stored in a static buffer.
int is_guest_nick(const char *nick): Returns whether the given nickname is a guest nickname.

2-6-3. Channels

Channels are managed by the channels.c and channels.h files, which, like servers and clients, store records for each channel in a hash table. Channel records contain:

The channel's name.
Channel registration data (see section 7-4-1-1).
The channel's creation timestamp.
The channel's topic, along with the nickname of the client which set the topic and the time the topic was set.
Mode information for the channel, including a bitmask of binary modes and fields for non-binary modes and ban/exception/invite lists. Not all fields are used by all protocols; also, the meaning of the bits in the modes field changes with the protocol (see section 2-6-4 below).
A list of clients on the channel, with channel user modes (such as +o and +v) and local flags for each client.
Fields used to check for "bouncy modes" (see below).

As discussed in section 2-6-2, the channel messages JOIN, PART, and KICK are processed by the client management subsystem; the handlers for those messages call chan_adduser() and chan_deluser() for the channel side of the processing. The only messages processed entirely by channels.c are the pure channel messages MODE (for a channel target) and TOPIC.

One problem that can occur with channels is that, possibly due to a misconfiguration, the remote server (or another server on the network) does not allow Services to change the channel's modes; this can result in an infinite loop of channel mode changes taking place. For example, ChanServ might request that the mode +s be set on a channel, but when Services sends that mode change out, the remote server immediately counters it with a -s. ChanServ, believing that its +s actually took effect before the -s was sent, sends out a new +s, and the cycle continues. This can flood the link between Services and its remote server and make the channel unusable due to the never-ending mode changes.

To avoid this problem, the MODE message handler watches for identical mode changes from any server, and counts the number of changes that occur per second. ChanServ uses this in its mode-setting routine (check_modes(), described in section 7-4-1-3) to decide whether to attempt to change modes on the channel. Implementation note: While the count of mode changes per second and the "bouncy modes" flag are kept on a per-channel basis, the mode string itself is stored in a single static variable, so bouncy modes may not be detected if they occur on multiple channels at the same time.
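The following sketch illustrates the idea and the limitation mentioned in the implementation note. It is not the actual MODE handler: the BounceState type, the BOUNCE_LIMIT threshold, and the function name are all hypothetical, and the real code may count or reset differently.

```c
#include <string.h>
#include <time.h>

#define BOUNCE_LIMIT 3          /* hypothetical threshold */

/* Hypothetical per-channel state. */
typedef struct {
    time_t last_second;         /* second in which changes were last counted */
    int count;                  /* identical changes seen in that second */
    int bouncy;                 /* set once the limit is reached */
} BounceState;

/* One buffer shared by all channels, mirroring the single static
 * variable described in the implementation note. */
char last_modes[64];

/* Record a mode change seen on a channel at time "now". */
void note_mode_change(BounceState *st, const char *modes, time_t now)
{
    if (now != st->last_second || strcmp(modes, last_modes) != 0) {
        /* New second or a different mode string: start counting again. */
        st->last_second = now;
        st->count = 0;
        strncpy(last_modes, modes, sizeof(last_modes) - 1);
        last_modes[sizeof(last_modes) - 1] = '\0';
    }
    if (++st->count >= BOUNCE_LIMIT)
        st->bouncy = 1;
}
```

Because last_modes is shared, two channels bouncing different modes in the same second keep resetting each other's counts, which is how simultaneous bounces can go undetected in this sketch.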

The MODE message handler also keeps track of multiple channel user mode changes for the same client within the same MODE message, and aggregates them so that the "channel umode change" callback is only called once per client per message.

Aside from message processing, channels.c also provides the function chan_has_ban(), which returns whether a given ban mask (case-insensitive) exists on a particular channel.

2-6-4. Client and channel modes

Services keeps track of binary modes for clients and channels using flag bitmasks rather than strings, for efficiency. In order to convert between the mode characters used by the IRC protocol and these flag values, the modes.c source file (along with its companion header, modes.h) provides several utility routines for handling client and channel modes:

void mode_setup(): Initializes internal tables (see below).
int32 mode_char_to_flag(char c, int which): Converts a mode character to the corresponding flag value.
char mode_flag_to_char(int32 f, int which): Converts a single mode flag to the corresponding character.
int32 mode_string_to_flags(const char *s, int which): Converts a string of mode characters to a set of flags.
char *mode_flags_to_string(int32 flags, int which): Converts a set of mode flags to a string of mode characters (the returned string is stored in a static buffer).
int mode_char_to_params(char c, int which): Returns the number of parameters used when setting or unsetting a mode, as plus_params<<8 | minus_params (see the ModeData structure description below).
int32 cumode_prefix_to_flag(char c): Converts a nickname prefix character to the corresponding channel user mode flag.

The which parameter passed to most of these functions indicates what type of mode is being used: MODE_USER for client modes (such as +o for IRC operator status), MODE_CHANNEL for channel modes (such as +s for secret channels), and MODE_CHANUSER for "channel user modes"—channel modes that are applied to clients on the channel rather than the channel itself, such as +o for channel operator privileges. (Channel modes and channel user modes share the same set of mode characters in the IRC protocol, but Services treats them separately, hence there is a separate mode selection constant for them.)

In the case of channel modes, not every mode is a simple binary on/off flag; for example, +l (limit) takes an integer parameter that specifies the client limit on the channel, and +b can be specified multiple times. These are handled by additional fields in the channel record for each of these "special" modes. When there is a distinction between "set" and "unset" for the mode (such as +l/-l), it is also given a flag value, but modes such as +b that can be set multiple times are not given a flag at all.

While these functions are able to handle the standard IRC mode characters (client modes oiw, channel modes biklmnpst, and channel user modes ov) by default, many IRC protocols introduce additional modes specific to that protocol. In order to support these modes as well, modes.c exports three arrays into which mode information can be written: usermodes[], chanmodes[], and chanusermodes[], for user (client), channel, and channel user modes, respectively. These are each arrays of 256 elements, where each element of the array contains information about the character with the value of the corresponding array index (for example, the element with index 65 has information about the mode character "A"). The type of each element is a ModeData structure, containing:

int32 flag

The flag value assigned to the mode (must be unique among all modes of the same type). Can be 0x80000000 (MODE_INVALID) if no flag is to be used for the mode, as is generally the case for modes like the channel mode +b which can be set multiple times.

uint8 plus_params

The number of parameters used when adding the mode.

uint8 minus_params

The number of parameters used when removing the mode.

char prefix

For channel user modes, the nickname prefix character for the mode (such as "@" for "+o").

uint32 info

Zero or more informational flags about the mode:

MI_MULTIPLE: The mode can be set multiple times (like +b on channels).
MI_REGISTERED: The mode should be set on clients with registered nicknames, or on registered channels.
MI_OPERS_ONLY: The mode causes a channel to be limited to operators only.
MI_REGNICKS_ONLY: The mode causes a channel to be limited to clients with registered nicknames only.

The upper eight bits of the info field are reserved for local use by modules, which can take advantage of this to decouple behavioral logic from actual mode characters. See the description of the Unreal protocol module in section 5-6-14 for an example of using such local information flags.

Whenever any changes are made to the mode arrays, the mode_setup() routine must be called to update the internal lookup tables used by the conversion functions. (mode_setup() must also be called once before using any of the functions, even when the arrays are not modified; but since init() takes care of this call, it is not a concern in practice.)
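A minimal sketch of the table-driven approach is shown below. The ModeData fields follow the description above, but the flag values, the value of MI_MULTIPLE, and the helper names are assumptions made for illustration; the real conversion functions use lookup tables built by mode_setup() rather than indexing the array directly.

```c
#include <stdint.h>
#include <string.h>

#define MODE_INVALID INT32_MIN  /* same bit pattern as 0x80000000 */
#define MI_MULTIPLE  0x01       /* value assumed for this sketch */

typedef struct {
    int32_t flag;
    uint8_t plus_params, minus_params;
    char prefix;
    uint32_t info;
} ModeData;

/* One entry per possible mode character, indexed by character value. */
ModeData chanmodes[256];

/* Fill in a few channel modes; the flag values are arbitrary examples. */
void setup_example_modes(void)
{
    memset(chanmodes, 0, sizeof(chanmodes));
    chanmodes['s'] = (ModeData){0x00000001, 0, 0, 0, 0};
    chanmodes['l'] = (ModeData){0x00000002, 1, 0, 0, 0};
    chanmodes['k'] = (ModeData){0x00000004, 1, 1, 0, 0};
    chanmodes['b'] = (ModeData){MODE_INVALID, 1, 1, 0, MI_MULTIPLE};
}

/* Flag for a mode character (0 for characters with no entry). */
int32_t char_to_flag(char c)
{
    return chanmodes[(unsigned char)c].flag;
}

/* Parameter counts packed as plus_params<<8 | minus_params. */
int char_to_params(char c)
{
    const ModeData *m = &chanmodes[(unsigned char)c];
    return m->plus_params << 8 | m->minus_params;
}
```

A caller can unpack the result of char_to_params() with `>> 8` for the "+" count and `& 0xFF` for the "-" count; +l, for instance, takes a parameter when set but none when removed.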

2-6-5. High-level actions

The actions.c source file provides several routines which implement common operations on clients and channels. While the core code does not make use of these routines, they are provided to simplify pseudoclient code and reduce redundancy. The routines are:

int bad_password(const char *service, User *u, const char *what)

Logs a bad password attempt for a client. The client is sent a "password incorrect" message, and the bad password count is incremented, first being cleared if at least the interval specified by BadPassTimeout has passed. If the bad password count reaches the limit specified by BadPassLimit, the client will be killed (disconnected from the network), and a warning will be sent when the count reaches one less than the limit. Returns 1 if the client was warned, 2 if killed, 0 otherwise.
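The counting logic alone can be sketched as below; message sending and the actual kill are left out. The constants stand in for the BadPassLimit and BadPassTimeout directives, the PwState type is hypothetical, and the sketch assumes the timeout is measured from the most recent bad password, which the description above does not state explicitly.

```c
#include <time.h>

/* Hypothetical stand-ins for the BadPassLimit and BadPassTimeout
 * configuration directives. */
enum { BAD_PASS_LIMIT = 5, BAD_PASS_TIMEOUT = 3600 };

/* Hypothetical per-client state. */
typedef struct {
    int bad_pw_count;
    time_t bad_pw_time;         /* time of the most recent bad password */
} PwState;

/* Count one bad password at time "now".  Returns 1 if the client should
 * be warned, 2 if it should be killed, and 0 otherwise, mirroring the
 * return values described above. */
int count_bad_password(PwState *st, time_t now)
{
    if (st->bad_pw_count > 0 && now - st->bad_pw_time >= BAD_PASS_TIMEOUT)
        st->bad_pw_count = 0;   /* earlier failures have expired */
    st->bad_pw_time = now;
    st->bad_pw_count++;
    if (st->bad_pw_count >= BAD_PASS_LIMIT)
        return 2;
    if (st->bad_pw_count == BAD_PASS_LIMIT - 1)
        return 1;
    return 0;
}
```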

void clear_channel(Channel *chan, int what, const void *param)

Clears modes and/or clients from a channel, depending on the what and param parameters. what can be any of the following flags:

CLEAR_MODES: Remove all regular channel modes, like +s and +l. param is ignored.
CLEAR_BANS: Remove all channel bans. If param is not NULL, it is treated as a User *, and only bans matching that client are cleared.
CLEAR_EXCEPTS: Remove all channel ban exceptions. param is treated as for CLEAR_BANS.
CLEAR_UMODES: Remove channel user modes. param is cast to uint32 and treated as the set of mode flags to clear.
CLEAR_USERS: Kick all clients from the channel. param is treated as a char * containing the reason to use in the kick message, and must not be NULL.

More than one of these flags can be combined, but for obvious reasons, this only works when the same param value can be used for all flags; CLEAR_UMODES cannot be used with CLEAR_BANS or CLEAR_EXCEPTS, for example. There is also a callback provided by this function, named "clear channel", which takes precedence over the standard processing (CLEAR_EXCEPTS in particular must be implemented via callback since it is not an RFC-standard feature).

const char *set_clear_channel_sender(const char *newsender)

Sets the name (typically a pseudoclient nickname) used as the sender in messages generated by clear_channel(), and returns the old sender name. If newsender is NULL, the server name is used as the sender, which is the default behavior. newsender can also be specified as PTR_INVALID to retrieve the current sender name without changing it.

void kill_user(const char *source, const char *user, const char *reason)

Kills the specified client from the IRC network. source is used as the message sender, and reason is the reason string for the KILL message.

void set_topic(const char *source, Channel *c, const char *topic, const char *setter, time_t t)

Sets the topic on the specified channel. This function calls a callback ("set topic") to perform its work, since different protocols have different methods for setting channel topics; in particular, some protocols ignore channel topic changes depending on the timestamp used in the TOPIC message.

void set_cmode(const char *sender, Channel *channel, ...)

Sets modes on a channel, including channel user modes. This function accumulates multiple mode changes for the same channel in a buffer, only sending out a MODE message when instructed to or when necessary (when the number of parameters to the message exceeds the maximum of 6, or when the number of channels with cached mode changes exceeds the limit set by MERGE_CHANMODES_MAX in defs.h). The routine also tries to be "smart" about multiple changes for the same mode, and if it sees a mode change that renders a previous change meaningless, it will remove that previous change without ever sending it to the network. There are two ways to call the function:

set_cmode(sender, channel, modes, param0, param1, ...): Ordinary operation, with a non-NULL value for sender. modes is passed as an ordinary mode change string, such as "+nst-il+k", and any parameters for the modes are passed in the order the mode characters appear in modes, as for an IRC MODE message. Note that all parameters, including numeric parameters, must be passed as strings.
set_cmode(NULL, channel): Flush out accumulated mode changes for the given channel. If channel is also NULL, accumulated mode changes for all channels are flushed out.

Internally, set_cmode() uses a modedata structure to keep track of each set of mode changes for a channel. Simple binary modes are stored as sets of flags to add and remove (binmodes_on and binmodes_off), while modes that take parameters are accumulated in two arrays: opmodes[], containing each mode character prefixed by a + or -, and params[], containing the parameters for each mode (space-separated if the mode takes two or more parameters). For mode-with-parameters number n, opmodes[2*n] is either "+" or "-", opmodes[2*n+1] is the mode character itself, and params[n] is the string holding the mode's parameters.

set_cmode() processes each mode character in turn, watching for "+" or "-" to indicate whether the modes are being added or removed. Binary modes are simply added to the appropriate binmodes field; modes with parameters are appended to the opmodes[] and params[] arrays, with a flush being performed if the number of mode parameters exceeds the RFC-defined limit of 6 or the total length of the parameters exceeds a maximum derived as the RFC line length limit of 510 characters less the maximum length of the other parts of the MODE message.

If the MergeChannelModes option is set, set_cmode() will also set a timeout for flushing the modes out to the network (see section 2-7 for details on timeouts).

When modes need to be flushed out, whether due to a full message, a timeout, or a manual flush with sender==NULL, the flush_cmode() routine is called. This routine collects all accumulated modes into a string: binary modes are written first (with mode removals before mode additions, since at least some IRC server software ignores a +s when sent after a -p), followed by modes with parameters in the order they were accumulated. The generated MODE message is then sent out to the network, and the modedata structure is cleared so that it can be reused.
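The ordering used when building the mode string can be sketched as follows. The flag-to-character table, the function names, and the merging of consecutive identical signs are assumptions for illustration; only the order (binary removals, binary additions, then modes with parameters) and the opmodes[] layout are taken from the description above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag assignments for this sketch only. */
static const struct { char c; int32_t flag; } flagmap[] = {
    {'n', 0x01}, {'p', 0x02}, {'s', 0x04}, {'t', 0x08}, {'i', 0x10},
};
#define NFLAGS (sizeof(flagmap) / sizeof(flagmap[0]))

/* Append the sign and one character per flag set in "flags" (which is
 * assumed to contain only bits listed in flagmap); returns the new end
 * of the string. */
static char *put_flags(char *out, char sign, int32_t flags)
{
    size_t i;

    if (!flags)
        return out;
    *out++ = sign;
    for (i = 0; i < NFLAGS; i++) {
        if (flags & flagmap[i].flag)
            *out++ = flagmap[i].c;
    }
    return out;
}

/* Build the mode string for a flush: binary removals, then binary
 * additions, then the accumulated modes with parameters.  opmodes holds
 * 2*nopmodes characters (sign, mode character, sign, ...).  "out" must
 * have room for 2*NFLAGS + 2*nopmodes + 3 bytes. */
void build_mode_string(char *out, int32_t on, int32_t off,
                       const char *opmodes, int nopmodes)
{
    char last_sign;
    int i;

    out = put_flags(out, '-', off);
    out = put_flags(out, '+', on);
    last_sign = on ? '+' : (off ? '-' : 0);
    for (i = 0; i < nopmodes; i++) {
        if (opmodes[2*i] != last_sign)
            *out++ = last_sign = opmodes[2*i];
        *out++ = opmodes[2*i + 1];
    }
    *out = '\0';
}
```

With these example flags, adding n and s, removing p, and then applying +k and -l yields a string in which the -p precedes the +s, matching the ordering concern noted above.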

2-7. Timed events

While most events in Services happen synchronously, such as in response to a message from the IRC network, it is sometimes necessary to schedule actions to occur at a later time. This is accomplished through the use of timed events, called timeouts. Timeouts are implemented by the source files timeout.c and timeout.h.

To create a new timeout, call add_timeout() or add_timeout_ms(), both of which function identically except that the former uses units of seconds while the latter uses milliseconds (in fact, add_timeout() is implemented in terms of add_timeout_ms()). These functions take the delay, a pointer to the routine to be called when the timeout expires, and a repeat flag which, if nonzero, causes the timeout to restart at the given delay value when it expires (normally, the timeout is removed after it expires and the timeout routine is called). The functions return a pointer to a Timeout structure, which is also passed to the timeout routine when it is called. If the caller needs to pass extra data to the timeout routine, the data field can be set to an arbitrary value; this must be done before the routine that added the timeout returns (more precisely, before the next time check_timeouts() is called) in order to avoid a race condition. Implementation note: The maximum delay on a timeout is 2^31-1 milliseconds, or about 25 days. This is due to the 32-bit width of the time_msec() return value; any time 2^31 milliseconds or more in the future will be treated as in the past due to signed difference comparison.
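The limit in the implementation note follows directly from comparing 32-bit times by signed difference, which can be sketched as below. The function name is hypothetical, and treating the clock as an unsigned 32-bit value is an assumption; the conversion to int32_t relies on the two's-complement wrapping that mainstream compilers provide.

```c
#include <stdint.h>

/* Returns nonzero if "expire" is at or before "now", treating both as
 * 32-bit millisecond counters that may wrap around.  The difference is
 * interpreted as a signed value, so expiry times up to 2^31-1 ms ahead
 * count as "future" and anything further ahead looks like the past. */
int has_expired(uint32_t now, uint32_t expire)
{
    return (int32_t)(expire - now) <= 0;
}
```

The same comparison keeps working when the counter wraps: an expiry time of 0x10 is still in the future when the current time is 0xFFFFFFF0.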

Since Services is a single-threaded program, timeouts have to be checked for manually at periodic intervals. This is done by calling the check_timeouts() function; the main loop of Services calls this at intervals specified by the TimeoutCheck configuration directive. (As a consequence, timeouts will only have a resolution equal to TimeoutCheck—if TimeoutCheck is specified as 3 seconds, for example, a timeout specified for 1 millisecond can still take up to 3 seconds to be executed.) Upon being called, check_timeouts() iterates through the linked list of timeouts, checking for any that have expired. For such timeouts, the timeout's expiration routine is called, passing a pointer to the Timeout structure as a parameter, then the timeout is either restarted (if it is specified as a repeating timeout) or deleted.

The timeout routines guarantee that a new timeout added from a timeout routine will not be checked during that run of check_timeouts(). Internally, this is ensured by always adding new timeouts at the head of the timeout list.

If a timeout needs to be deleted before it expires, the del_timeout() function can be used. This function can also be used to delete a repeating timeout from the timeout's own expiration routine (or even a non-repeating timeout, though since the timeout would be deleted anyway it would serve no purpose). To avoid dangling pointers, del_timeout() does not actually delete any timeouts during a check_timeouts() run; instead, it clears the timeout's internal timeout field (which holds the time in milliseconds when the timeout is set to expire) to zero, causing check_timeouts() to delete the timeout when it is reached during list iteration.
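The interaction between head insertion and deferred deletion can be sketched as follows. All names are hypothetical (Tmo, tmo_add(), tmo_del(), tmo_check() stand in loosely for the Timeout structure and its routines), the clock is a plain variable rather than time_msec(), and the sketch ignores the corner case of an expiry time that happens to equal zero.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct Tmo {
    struct Tmo *next, *prev;
    uint32_t timeout;            /* expiry time in ms; 0 marks "deleted" */
    uint32_t repeat_ms;          /* nonzero: restart with this delay */
    void (*code)(struct Tmo *);
    void *data;
} Tmo;

Tmo *tmo_list;                   /* head of the timeout list */
uint32_t tmo_now = 1;            /* stand-in for a millisecond clock */
int tmo_checking;                /* nonzero while tmo_check() is running */

Tmo *tmo_add(uint32_t delay_ms, void (*code)(Tmo *), int repeat)
{
    Tmo *t = calloc(1, sizeof(*t));
    if (!t)
        return NULL;
    t->timeout = tmo_now + delay_ms;
    t->repeat_ms = repeat ? delay_ms : 0;
    t->code = code;
    t->next = tmo_list;          /* always insert at the head */
    if (tmo_list)
        tmo_list->prev = t;
    tmo_list = t;
    return t;
}

static void tmo_free(Tmo *t)
{
    if (t->prev)
        t->prev->next = t->next;
    else
        tmo_list = t->next;
    if (t->next)
        t->next->prev = t->prev;
    free(t);
}

void tmo_del(Tmo *t)
{
    if (tmo_checking)
        t->timeout = 0;          /* defer: tmo_check() frees it later */
    else
        tmo_free(t);
}

void tmo_check(uint32_t now)
{
    Tmo *t, *next;

    tmo_now = now;
    tmo_checking = 1;
    for (t = tmo_list; t; t = next) {
        if (t->timeout && (int32_t)(t->timeout - now) <= 0) {
            t->code(t);
            if (t->timeout && t->repeat_ms)
                t->timeout = now + t->repeat_ms;
            else
                t->timeout = 0;
        }
        next = t->next;          /* read after the callback has run */
        if (!t->timeout)
            tmo_free(t);
    }
    tmo_checking = 0;
}

/* Example callbacks used to exercise the sketch. */
int tmo_fired;
void cb_count(Tmo *t)  { (void)t; tmo_fired++; }
void cb_spawn(Tmo *t)  { (void)t; tmo_fired++; tmo_add(0, cb_count, 0); }
void cb_cancel(Tmo *t) { tmo_fired++; tmo_del(t); }
```

Since iteration only moves toward the tail, a timeout added at the head by cb_spawn() is not visited until the next tmo_check() call, and a timeout that deletes itself from cb_cancel() is freed by the loop rather than by tmo_del().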

There is also a function send_timeout_list(), intended for debugging (and only available if DEBUG_COMMANDS is defined), which sends the current timeout list to the given client as NOTICE messages.

2-8. Multilingual support

In order to provide a more natural interface for users who speak different languages, Services includes the ability to send messages in any of several languages. This functionality is implemented in the language.c and language.h source files, and the actual text used for each language is stored in the lang subdirectory.

2-8-1. Overview

The multilingual support system in Services uses tables of strings to accomplish "translation" of messages. These tables are indexed by named constants, listed in the automatically-generated langstrs.h header file (see section 10-3-3 for details on how this file is generated); a routine that wants to take advantage of multilingual support can then use the appropriate message index to obtain a translated message.

There is no on-the-fly translation, of course; all messages used with multilingual support must be prepared ahead of time. These messages are stored in data files in the lang subdirectory, which are precompiled for efficiency and loaded into Services at runtime. It is also possible to add new strings to the string tables, or modify the precompiled strings, on the fly; however, all such strings must likewise be prepared ahead of time (or otherwise generated by the code calling the multilingual routines).

2-8-2. Using multilingual strings

Before being used, the multilingual support subsystem must first be initialized by calling lang_init(). This routine sets up the internal string tables, then loads each language's precompiled data file from the languages subdirectory of the data directory; the filenames are hardcoded in the filenames[] array. The lang_cleanup() function takes care of freeing these resources.

The precompiled data files consist of a string count, a table of offsets to the strings, and the strings themselves separated by null bytes. For the sake of efficiency, the entire string data is loaded into a single block of memory, a pointer to which is stored at index 0 of each language's string table; pointers to individual strings are then calculated from the string offset table in the file, and the pointer to the text for string index n is stored at array index n+1. Implementation note: This is admittedly a rather confusing approach, and probably came about from reluctance to introduce yet another file-scope array.
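The table layout can be sketched with a loader that works on an in-memory image of a data file. The field widths and byte order (32-bit big-endian count and offsets, with offsets relative to the start of the string data) are assumptions made for this sketch, as are the function names; only the count/offsets/strings structure and the index-0 convention come from the description above.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Read a 32-bit big-endian value (byte order assumed for this sketch). */
static uint32_t get_be32(const unsigned char *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16
         | (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/* Build a string table from a file image of "len" bytes.  Returns a
 * malloc()'d array of count+1 pointers, or NULL on error: table[0] is
 * the single block holding all string data, and table[n+1] points at
 * string n inside that block. */
char **load_strings(const unsigned char *image, size_t len,
                    uint32_t *count_ret)
{
    uint32_t count, i;
    size_t hdr, datalen;
    char **table, *block;

    if (len < 4)
        return NULL;
    count = get_be32(image);
    if (count > (len - 4) / 4)
        return NULL;                    /* offset table would not fit */
    hdr = 4 + (size_t)count * 4;
    datalen = len - hdr;
    if (datalen == 0 || image[len - 1] != '\0')
        return NULL;                    /* data must end with a null byte */
    table = malloc(((size_t)count + 1) * sizeof(*table));
    block = malloc(datalen);
    if (!table || !block) {
        free(table);
        free(block);
        return NULL;
    }
    memcpy(block, image + hdr, datalen);
    table[0] = block;
    for (i = 0; i < count; i++) {
        uint32_t off = get_be32(image + 4 + 4 * (size_t)i);
        if (off >= datalen) {
            free(table);
            free(block);
            return NULL;
        }
        table[i + 1] = block + off;
    }
    *count_ret = count;
    return table;
}
```

Keeping the block pointer at index 0 means the whole table can be released with free(table[0]) followed by free(table), without a separate variable for the block.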

Once the language data has been loaded, several functions are available to make use of multilingual strings. The simplest of these are getstring() and its companion getstring_lang(), which simply return the text associated with the selected string index. The getstring() routine uses a nickname group record, NickGroupInfo *, to select the language, while getstring_lang() uses the language index directly. (Language index values can be obtained with the lookup_language() function, which takes the name of a language (the same as the precompiled data file name; see also section 2-8-4) and returns its index; alternatively, the LANG_* constants in language.h can be used directly.)

Implementation note: With the exception of getstring_lang(), all of the string retrieval functions use nickname group records rather than language index values to select the language. This stems from the fact that users' language preferences are recorded as part of the nickname group record for the user's registered nickname, but is arguably poor design, with core code relying on the internal data structure of a module.

The strings returned by getstring() are generally used in messages sent to users, so the routines notice_lang() and notice_help() are provided by the send.c source file to simplify the operation of retrieving a string in the user's selected language and sending it as a NOTICE. The routines can handle newlines in the strings as well, breaking the strings up into separate NOTICE messages at the newlines. The difference between the functions is that notice_help() replaces the "%S" token in the string (a capital S, not a lowercase s) by the nickname used to send the messages; this makes it easy to refer to a pseudoclient's own nickname in help messages.

Both notice_lang() and notice_help() treat the strings returned by getstring() as printf()-style format strings, and accept variadic argument lists in the same style as the basic notice() function. This does, however, present a problem for multilingual support: different languages have different word orders, while the order of format parameters cannot be changed around to match the language. Translators must therefore be careful to maintain the same token order when translating strings, though this may result in slightly unnatural text.

One other shortcut function for sending messages to users is the syntax_error() function. This function takes a command name and a string index containing a syntax message, and sends a two-line NOTICE of the form:

-Service- Syntax: COMMAND parameter-1 parameter-2...
-Service- Type /msg Service HELP COMMAND for more information.

The first line uses the SYNTAX_ERROR string, inserting the syntax string passed to the routine, and the second uses MORE_HELP, inserting the pseudoclient nickname and command name.

In addition, there are three functions which operate on time values. strftime_lang() is a multilingual version of the standard strftime() function, generating a string describing a specific date and time using the language specified by the nickname group record passed in; the format string, of course, is specified by a string index rather than a literal string pointer. The %a, %A, %b, and %B tokens are treated specially: rather than passing them directly to strftime() (which would always return the same weekday and month names), they are translated by strftime_lang(), using the STRFTIME_DAYS_SHORT, STRFTIME_DAYS_LONG, STRFTIME_MONTHS_SHORT, and STRFTIME_MONTHS_LONG strings, respectively. Each of these strings is assumed to be a newline-separated list of weekday or month names, and the appropriate name is selected based on the time passed to the function.
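Selecting a name from one of these newline-separated lists can be sketched as below. The helper pick_list_entry() is hypothetical, and indexing the weekday list by the tm_wday field (so that the list starts with Sunday) is an assumption rather than a documented property of the language files.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy entry `index` (counting from 0) of a
 * newline-separated list into `buf`.  Returns 1 on success, or 0 if
 * the list has too few entries or the buffer has zero size. */
static int pick_list_entry(const char *list, int index,
                           char *buf, size_t size)
{
    const char *start = list;
    size_t len;

    if (size == 0)
        return 0;
    /* Skip over `index` complete entries */
    while (index > 0) {
        const char *nl = strchr(start, '\n');
        if (nl == NULL)
            return 0;
        start = nl + 1;
        index--;
    }
    /* Copy up to the next newline (or end of string), truncating if needed */
    len = strcspn(start, "\n");
    if (len >= size)
        len = size - 1;
    memcpy(buf, start, len);
    buf[len] = '\0';
    return 1;
}
```

Under those assumptions, a caller would pass the tm_wday or tm_mon value from a struct tm as the index.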

maketime() takes a time interval (the parameter type is time_t, but it is simply a count of seconds, not a timestamp) and generates a "human-readable" string describing that time interval: for example, a value of 3716 seconds might be described simply as "1 hour", or as "1 hour, 2 minutes" (using the relevant language strings, as listed below). By default, maketime() does not display seconds, so an interval of less than 60 seconds will be rounded up to "1 minute"; this behavior can be changed by specifying the MT_SECONDS flag when calling the function. Also, maketime() normally only describes the interval using the largest relevant unit (days, hours, minutes, or seconds), but a second unit can be added, as in the "1 hour, 2 minutes" example above, by specifying the MT_DUALUNIT flag. The following strings are used by this function; note in particular the inclusion of spaces before the unit names, since some languages do not use them:

STR_SECOND: " second" (for exactly 1 second)
STR_SECONDS: " seconds" (for other numbers of seconds)
STR_MINUTE: " minute" (for exactly 1 minute)
STR_MINUTES: " minutes" (for other numbers of minutes)
STR_HOUR: " hour" (for exactly 1 hour)
STR_HOURS: " hours" (for other numbers of hours)
STR_DAY: " day" (for exactly 1 day)
STR_DAYS: " days" (for other numbers of days)
STR_TIMESEP: the separator between units, such as ", " (comma and space)

maketime() returns its result in a static buffer, so the string should be copied elsewhere if it will be needed at a later time, or if two or more consecutive calls are made.
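The behavior described above can be approximated by the following sketch. It is not the actual maketime() code: the function name, the flag values, and the hard-coded English unit strings are placeholders, and rounding up to whole minutes when seconds are not shown is an assumption chosen to reproduce the "1 hour, 2 minutes" example for 3716 seconds.

```c
#include <stdio.h>

/* Placeholder flag values; the real MT_* values are defined by Services */
enum { SKETCH_MT_DUALUNIT = 1, SKETCH_MT_SECONDS = 2 };

static const char *maketime_sketch(long secs, int flags)
{
    static char buf[128];  /* static buffer, overwritten by each call */
    static const long unit_size[4] = { 86400, 3600, 60, 1 };
    static const char *const one[4] =
        { " day", " hour", " minute", " second" };
    static const char *const many[4] =
        { " days", " hours", " minutes", " seconds" };
    int last = (flags & SKETCH_MT_SECONDS) ? 3 : 2;  /* smallest unit shown */
    int i, len;
    long n, m;

    if (secs < 1)
        secs = 1;
    if (!(flags & SKETCH_MT_SECONDS))
        secs = (secs + 59) / 60 * 60;  /* assumed: round up to whole minutes */

    /* Find the largest unit that fits, stopping at the smallest shown */
    for (i = 0; i < last && secs < unit_size[i]; i++)
        ;
    n = secs / unit_size[i];
    len = snprintf(buf, sizeof(buf), "%ld%s", n, n == 1 ? one[i] : many[i]);

    /* Optionally append the next smaller unit, if it is nonzero */
    if ((flags & SKETCH_MT_DUALUNIT) && i < last) {
        m = (secs % unit_size[i]) / unit_size[i + 1];
        if (m > 0)
            snprintf(buf + len, sizeof(buf) - len, ", %ld%s",
                     m, m == 1 ? one[i + 1] : many[i + 1]);
    }
    return buf;
}
```

As with maketime() itself, the sketch returns a pointer to a static buffer, so a caller that needs two results at once must copy the first before making the second call.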

The last function, expires_in_lang(), takes an expiration timestamp and returns a string describing how long it will be before that expiration time arrives. Generally, this is just the result of maketime() called with MT_DUALUNIT on the difference between the current time and the given expiration time, but if the timestamp is zero (meaning no expiration), then the EXPIRES_NONE string ("never expires") is returned instead. If the given time has already passed, it is treated as an interval of 1 second, unless expiration has been disabled via the -noexpire command-line option, in which case the EXPIRES_NOW string ("expired") is returned.

2-8-3. Modifying the string table at runtime

There are three ways to modify the string table while Services is running: string mapping, explicit string setting, and external language files.

The simplest way to change multilingual strings is through string mapping, which causes requests for one string index to return a different string. The advantage of string mapping is that a single operation changes the contents of the string for all languages; however, mapping only works with strings which are already present in the string tables.

String mapping is performed with the mapstring() routine. After calling this routine with the string index to modify (old) and the new string to return for that index (new), all requests to getstring() or other text retrieval functions for old will instead return the text of new in the same language. mapstring() returns the previous mapping of old (which will be equal to old if no previous mapping had been performed), which can be used to cancel the mapping:

    static int old_STRING1 = -1;

    int my_init()
    {
        // ...
        old_STRING1 = mapstring(STRING1, STRING2);
        return 1;
    }

    void my_cleanup()
    {
        if (old_STRING1 >= 0) {
            mapstring(STRING1, old_STRING1);
            old_STRING1 = -1;
        }
        // ...
    }

Mapping is used most often to change the text of pseudoclient replies or help messages based on configuration settings or protocol features, by writing two or more versions of the message ahead of time and mapping the appropriate one to the base string as necessary; this allows a single string index to be used without having to check the relevant options or protocol features every time.

Implementation note: Unexpected results can occur if unmaps are not performed in the proper order. A solution might be to have mapstring() keep track of all maps internally and return a "mapping ID" to the caller; calling a new function unmapstring() with the mapping ID would then remove that particular mapping from the mapping stack.
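The ordering problem can be seen in a small model of the mapping table. Everything here is a stand-in: the three-entry string table, the single-level mapping array, and the *_sketch function names are assumptions for illustration and do not reflect the real data structures in language.c.

```c
/* Stand-in string table with one base string and two alternates */
enum { STR_BASE, STR_ALT_A, STR_ALT_B, NUM_SKETCH_STRINGS };

static const char *const sketch_text[NUM_SKETCH_STRINGS] = {
    "base text", "alternate A", "alternate B"
};

/* mapping[i] is the index whose text is returned for requests for i */
static int mapping[NUM_SKETCH_STRINGS] = { STR_BASE, STR_ALT_A, STR_ALT_B };

/* Map `old` to `target`, returning the previous mapping of `old` */
static int mapstring_sketch(int old, int target)
{
    int prev = mapping[old];
    mapping[old] = target;
    return prev;
}

static const char *getstring_sketch(int index)
{
    return sketch_text[mapping[index]];
}

/* Two modules map the same string, then unmap in the wrong order */
static const char *unmap_out_of_order(void)
{
    int saved_a = mapstring_sketch(STR_BASE, STR_ALT_A); /* saves STR_BASE  */
    int saved_b = mapstring_sketch(STR_BASE, STR_ALT_B); /* saves STR_ALT_A */

    mapstring_sketch(STR_BASE, saved_a); /* A unmaps first: B's map is lost */
    mapstring_sketch(STR_BASE, saved_b); /* B "restores" A's stale mapping  */
    return getstring_sketch(STR_BASE);
}
```

After unmap_out_of_order() runs, both modules believe they have cleaned up, yet requests for STR_BASE still return the text of STR_ALT_A.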

When more versatility with the replacement strings is needed, and particularly when a module wants to add new strings to the string tables, the setstring() and addstring() routines can be used. setstring() allows the contents of a string in a specific language to be set to any arbitrary text, while addstring() creates a new, initially empty, string in the string table, returning the new string's index (which can then be used with setstring() and the other multilingual support functions, just as with the built-in strings).

However, a problem crops up when using addstring() to add new strings: the string's index is not constant. It is, of course, possible to export a variable containing the index to any source file that needs it, but a cleaner approach is to call the lookup_string() function with the same name given to the addstring() function, which returns the index corresponding to the given string name. lookup_string() also works with built-in strings, using the constant name as the string (so, for example, the result of lookup_string("SYNTAX_ERROR") would be the value of the constant SYNTAX_ERROR).

When a module adds many new strings to the string table, it can be inconvenient to call addstring() and setstring() for every string. To avoid this hassle, the load_ext_lang() function is provided to load "external language files", text files containing language string data. The format of these files is essentially the same as the base language source files (see section 2-8-4 below), with the exception that the language name is also included on the line containing the string name, separated from the string name by whitespace. This function is called to load external language files specified with the LoadLanguageText configuration directive.

There is also a reset_ext_lang() routine, which clears out all changes made with setstring() and load_ext_lang(); strings added to the string table with addstring() are left in, with their contents cleared to empty strings. However, outside of the lang_init() and lang_cleanup() functions, this is only called by the reconfiguration code in init.c, and should not be called by modules.

Implementation note: It probably shouldn't be called during reconfiguration, either, since it blows away anything modules may have changed in their initialization routines. As a consequence, third-party modules should not use setstring() or load_ext_lang() at all, and should instead rely on the LoadLanguageText directive to load text for any strings that they add.

2-8-4. The language file compiler

The text files which define the base language strings are stored in the lang subdirectory of the top source code directory, one for each language: en_us.l, de.l, and so on. The format of these files is fairly simple, consisting of a series of string names followed by their contents (blank lines, and lines beginning with the "#" character, are ignored). For each string, the string name is placed alone on a line, followed by one or more lines of message text; each line of message text must begin with the tab character (ASCII code 9)—spaces are not permitted. If two or more lines are given, they are joined by linefeed (ASCII code 10, standard Unix newline) characters.
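A minimal parser for the format just described might look like the following. It is a sketch written for this explanation, not an excerpt from langcomp.c: the LangEntry type, the fixed buffer sizes, and the error conventions (returning -1) are all assumptions.

```c
#include <string.h>

#define MAX_NAME_LEN 64    /* hypothetical limits for the sketch */
#define MAX_TEXT_LEN 512

typedef struct {
    char name[MAX_NAME_LEN];
    char text[MAX_TEXT_LEN];
} LangEntry;

/* Parse language source text held in memory.  Returns the number of
 * strings found, or -1 on error (text before any name, or overflow). */
static int parse_lang_source(const char *src, LangEntry *out, int max_entries)
{
    int count = 0;
    int have_text = 0;  /* has the current string received a text line? */

    while (*src != '\0') {
        size_t len = strcspn(src, "\n");

        if (len == 0 || src[0] == '#') {
            /* Blank lines and comment lines are ignored */
        } else if (src[0] == '\t') {
            /* Message text: strip the tab, join lines with linefeeds */
            LangEntry *e;
            size_t used, add = len - 1;

            if (count == 0)
                return -1;
            e = &out[count - 1];
            used = strlen(e->text);
            if (used + 1 + add >= MAX_TEXT_LEN)
                return -1;
            if (have_text)
                e->text[used++] = '\n';
            memcpy(e->text + used, src + 1, add);
            e->text[used + add] = '\0';
            have_text = 1;
        } else {
            /* Any other line is taken as a string name */
            if (count >= max_entries || len >= MAX_NAME_LEN)
                return -1;
            memcpy(out[count].name, src, len);
            out[count].name[len] = '\0';
            out[count].text[0] = '\0';
            count++;
            have_text = 0;
        }
        src += len;
        if (*src == '\n')
            src++;
    }
    return count;
}
```

Note that a line containing only a tab contributes an empty line to the message, while a completely blank line is skipped; in this sketch, as in the format itself, the two are easy to confuse when editing a language file.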

Rather than manually parsing these files each time Services is loaded, however, they are precompiled into binary data files by the langcomp program, compiled from langcomp.c in the lang subdirectory. This program reads in a language source file specified as a command-line argument, and (if no errors are encountered) writes out the precompiled binary data for that file to the same filename with the extension stripped; for example, langcomp en_us.l creates the file en_us, and so forth.

In order to ensure that the strings are written in the correct order regardless of the order in which they appear in the source file, langcomp relies on a file named index, which contains a list of all string names in the order they should be stored. (As described in section 10-3-3, this file is generated automatically from en_us.l, which is treated as the canonical language file.) If langcomp encounters a string name which is not listed in this file, it reports an error and does not generate an output file. It is also possible to get warnings on strings listed in index but missing from the language file, by passing the -w option to langcomp.

Implementation note: One problem that has occurred frequently is forgetting to insert a tab at the beginning of a blank line intended to be part of a message. It might be a good idea to have langcomp -w warn about such cases as well.

To make certain that the core source code and modules also use the same string order and index values, the index file is also used to generate a header file, langstrs.h, which is included by language.h. This file contains #define directives to define the string index constants; if the preprocessor macro LANGSTR_ARRAY is defined when the file is included, then it also defines an array containing the string names as C strings (language.c uses this to implement the lookup_string() function).
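As a rough picture of this arrangement, the sketch below shows what a generated header of this kind could contain and how a name lookup could use the array. The numeric values, the array name sketch_names, the count macro, and the -1 "not found" result are assumptions; only the string constant names and LANGSTR_ARRAY come from this manual.

```c
#include <string.h>

/* language.c would define this before including the header */
#define LANGSTR_ARRAY 1

/* Assumed shape of the generated definitions (values are made up) */
#define SYNTAX_ERROR       0
#define MORE_HELP          1
#define EXPIRES_NONE       2
#define NUM_SKETCH_STRINGS 3

#ifdef LANGSTR_ARRAY
static const char *const sketch_names[NUM_SKETCH_STRINGS] = {
    "SYNTAX_ERROR",
    "MORE_HELP",
    "EXPIRES_NONE",
};
#endif

/* Linear search of the name array; returns the index, or -1 (an assumed
 * convention) if the name is not present */
static int lookup_string_sketch(const char *name)
{
    int i;

    for (i = 0; i < NUM_SKETCH_STRINGS; i++) {
        if (strcmp(sketch_names[i], name) == 0)
            return i;
    }
    return -1;
}
```

Because the constants and the array are generated from the same list, the index found by name always matches the corresponding constant.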

2-9. Module interfaces

While many modules in Services perform their own independent functions, there are some sets of modules which implement the same functionality in alternative ways. Protocol methods are the most obvious example of this, and the core code provides an interface to these modules, as described in section 2-5-1. This section covers the core interfaces to two more such sets of modules: encryption modules and database modules. Unlike protocol modules, encryption and databases are not used by the core code, but interfaces are supplied to simplify the design of modules which do make use of them.

2-9-1. Encryption

Encryption modules are used by the Services pseudoclients to encrypt passwords, reducing the danger of passwords leaking as a result of improper access to the databases. While the encryption modules themselves can encrypt any arbitrary data, the core interface explicitly uses the context of passwords. The interface is implemented by encrypt.c and encrypt.h.

At the center of the interface is the Password structure, defined in encrypt.h. This structure contains a buffer for the encrypted password itself, along with a string pointer identifying the cipher used to encrypt the password; this allows passwords using different ciphers to be mixed in the same set of data and still be decrypted/checked, assuming a module implementing the cipher is available. The cipher may be NULL, indicating that the password is not encrypted.

Password structures can be allocated either dynamically or statically. Dynamic Password structures can be obtained with the new_password() function and freed with the free_password() function; static variables can be initialized with init_password() and cleared with clear_password(). clear_password() can also be used at any time to clear the contents of a Password structure when they are no longer needed.

Encryption of plaintext password strings into Password structures is done with the encrypt_password() function. This function uses the cipher named by the EncryptionType configuration directive to encrypt the password, returning an error if no module has registered that cipher (see below). The inverse function, decrypt_password(), is also available, but should only be used when there is a need to view the plaintext password itself (for example, the NickServ and ChanServ GETPASS commands use this function). Checking whether a user-entered password is correct can be accomplished through the check_password() function without decrypting the encrypted password.

To set the contents of a Password structure to particular values, such as when reading them from an external source, call set_password(). (Make sure to zero out the original data afterwards if it will no longer be used, so a copy of the password data does not remain in memory.) When writing or storing the contents of a Password structure elsewhere, the structure members may be accessed directly; however, make certain you treat the password[] field as a binary buffer, not a string (e.g., use memcpy() rather than strcpy() to copy it), and remember to check whether the cipher field is NULL before accessing it. The copy_password() function is also available for copying data from one Password structure to another.
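The two cautions above (binary buffer, possibly NULL cipher) can be illustrated with a stand-in structure. PasswordSketch is not the real Password type: the buffer size is invented, and copying the cipher pointer directly is a simplification that the real copy_password() may not share.

```c
#include <string.h>

#define PASSBUF_SIZE 32  /* hypothetical; the real size is set in encrypt.h */

typedef struct {
    char password[PASSBUF_SIZE]; /* binary data; may contain zero bytes */
    const char *cipher;          /* NULL means "not encrypted" */
} PasswordSketch;

/* Copy the whole buffer with memcpy(); strcpy() would stop at the first
 * zero byte and silently drop the rest of an encrypted password. */
static void copy_password_sketch(PasswordSketch *to,
                                 const PasswordSketch *from)
{
    memcpy(to->password, from->password, sizeof(to->password));
    to->cipher = from->cipher;  /* shallow copy, for illustration only */
}

/* Never pass the cipher field to string functions without a NULL check */
static const char *cipher_label(const PasswordSketch *p)
{
    return p->cipher != NULL ? p->cipher : "(none)";
}
```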

When a module implementing an encryption cipher is loaded, it should call register_cipher() with a pointer to a CipherInfo structure that gives the cipher's name (a string identifying the cipher for use in passwords' cipher fields and the EncryptionType directive) and the functions which implement encryption, decryption, and password checking. The module must also be certain to call unregister_cipher() with the same CipherInfo structure before being unloaded, or the encryption interface may attempt to call routines which are no longer present in memory.

Implementation note: It might be cleverer and more foolproof to define a particular identifier to be used/exported for the CipherInfo structure, then add a load-module hook and automatically register the cipher if the identifier is found, unregistering it when the module is unloaded.

2-9-2. Database storage

The files databases.c and databases.h define a common interface for storing persistent data. Modules that want to use this interface do so by defining data tables describing the data to be stored and how to access it; see section 6-2 for details.

The database interface itself is quite simple. A module wishing to have a data table stored in persistent storage calls the register_dbtable() function (and must, of course, call unregister_dbtable() for the same table before being unloaded). Registering a table will cause the table's contents to be automatically loaded from persistent storage (if no database module is available, the table load will occur when a database module is registered). This is all that the calling module needs to do; the database subsystem will take care of the rest. It is, however, important to note that unregistering a database table will not cause it to be written out; if a write before close is desired, it must be done manually as described below.

Actual writing of the registered database tables to persistent storage is accomplished by calling the save_all_dbtables() function. The main loop of Services calls this function periodically, as dictated by the UpdateTimeout configuration setting; it is also called directly by the OperServ UPDATE command.

Database modules register themselves with the core database interface by calling register_dbmodule(), providing a DBModule structure giving the various routines that implement the database operations. Unlike encryption modules, only one database module can be active at a time; if a second database module tries to register itself, register_dbmodule() will return failure. As usual, the companion function unregister_dbmodule() must be called upon module unload.
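The interaction between table registration and database module registration can be modeled as below. The types, fields, and *_sketch functions are invented for this model and are much simpler than the real DBModule and table structures; only the two rules shown (deferred loading and a single active module) are taken from the text.

```c
#include <stddef.h>

#define MAX_SKETCH_TABLES 8

typedef struct { const char *name; int loaded; } TableSketch;
typedef struct { int (*load_table)(TableSketch *t); } DBModuleSketch;

static TableSketch *tables[MAX_SKETCH_TABLES];
static int num_tables = 0;
static const DBModuleSketch *active_module = NULL;

/* Register a table; load it at once if a database module is available */
static int register_dbtable_sketch(TableSketch *t)
{
    if (num_tables >= MAX_SKETCH_TABLES)
        return 0;
    tables[num_tables++] = t;
    t->loaded = 0;
    if (active_module != NULL)
        t->loaded = active_module->load_table(t);
    return 1;
}

/* Register a database module; only one may be active at a time */
static int register_dbmodule_sketch(const DBModuleSketch *m)
{
    int i;

    if (active_module != NULL)
        return 0;
    active_module = m;
    /* Load any tables that were registered before the module arrived */
    for (i = 0; i < num_tables; i++) {
        if (!tables[i]->loaded)
            tables[i]->loaded = m->load_table(tables[i]);
    }
    return 1;
}

/* Trivial loader and module instances used to exercise the model */
static int fake_load(TableSketch *t) { (void)t; return 1; }
static const DBModuleSketch first_module = { fake_load };
static const DBModuleSketch second_module = { fake_load };
```

In this model, a table registered before first_module is loaded only when that module registers, and an attempt to register second_module afterwards fails.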

2-10. Module command list maintenance

There is one other facility provided by the core code for the use of modules: a command lookup system, implemented by commands.c and commands.h, which pseudoclients can use to execute user commands.

Each command to be handled by this facility must have a Command record defined for it. This is a structure that contains the command name (command names are treated case-insensitively); a function pointer for the routine to call to run the command, which is passed the User structure for the user giving the command; an optional function pointer to a function which returns a boolean value indicating whether the user is allowed to execute the command, such as is_oper(); help message numbers and parameters; and a next field, used to link multiple records for the same command name together (this will be set when the command is registered, and need not be set by the caller).

To allow multiple modules to utilize this facility without their command lists interfering, a command list ID is passed to each function to identify which set of commands to look at. The type of this ID is Module * (the type of a module handle, as described in section 4), and typically each module will pass in its own module handle, or possibly the handle of its "parent" module, for example in the case of the NickServ submodules like nickserv/access which add to the main NickServ command set.

In order to use these functions, the command list must first be initialized; this is done by calling new_commandlist() with the desired command list ID. The ID must be unique, and the function will fail if another command list with the same ID already exists. Once the command list has been successfully created, commands can be registered with the register_commands() function. This function takes an array of commands to register, terminated by an entry with the name field set to NULL.

If a module needs to remove commands, such as when cleaning up, it can use the unregister_commands() function; the same array pointer as was passed to register_commands() must be used here as well (it is not possible to unregister only part of a command array). Likewise, the entire command list can be deleted with del_commandlist(); before the command list can be deleted, however, all commands must have been removed from it, or the function will fail.

There are three functions that make use of these command lists. The simplest is lookup_cmd(), which simply looks up a command by name and returns a pointer to the Command structure, or NULL if the command is not found. This can be used, for example, if a command record needs to be modified at runtime (many of the built-in pseudoclients use this to change help messages depending on the runtime and IRC network environment). If there are multiple commands with the same name registered in the same command list, this function will return a pointer to the one most recently registered; the next field can be used to examine other records for the same command name.

A more useful function is run_cmd(); this looks up the given command name in the same manner as lookup_cmd(), then calls the command's processing routine with the calling user's User record passed as a parameter. However, if the command has a privilege-checking routine (has_priv) defined, run_cmd() first calls that routine, and if it returns false (zero), a "permission denied" error is sent to the user and the command is not executed. If the given command name is not found, a NOTICE to that effect is sent to the user instead.

Implementation note: Additional parameters to the processing routine are not supported because all current command routines extract their parameters using strtok(NULL,...). This is clearly poor design, and could be improved by, for example, making run_cmd() into a variadic function and passing the va_list to the command routine, or even just by adding a void * parameter to run_cmd() and the command routine prototype.
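The lookup and dispatch logic can be summarized in a reduced model. The structures below are stand-ins that keep only a few fields, the function signatures and return codes are invented for the example, and the command list ID and help fields are omitted; the real definitions are in commands.h.

```c
#include <ctype.h>
#include <stddef.h>

typedef struct { const char *nick; int is_oper; } UserSketch;

typedef struct CommandSketch_ CommandSketch;
struct CommandSketch_ {
    const char *name;
    void (*routine)(UserSketch *u);
    int (*has_priv)(const UserSketch *u);  /* optional; may be NULL */
    CommandSketch *next;                   /* older record, same name */
};

#define MAX_SKETCH_NAMES 32
static CommandSketch *cmd_table[MAX_SKETCH_NAMES];
static int cmd_table_len = 0;

/* Case-insensitive comparison of command names */
static int names_equal(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Returns the most recently registered record for `name`, or NULL */
static CommandSketch *lookup_cmd_sketch(const char *name)
{
    int i;
    for (i = 0; i < cmd_table_len; i++) {
        if (names_equal(cmd_table[i]->name, name))
            return cmd_table[i];
    }
    return NULL;
}

/* Register an array of records terminated by a NULL name */
static int register_commands_sketch(CommandSketch *cmds)
{
    CommandSketch *c;
    int i;

    for (c = cmds; c->name != NULL; c++) {
        for (i = 0; i < cmd_table_len; i++) {
            if (names_equal(cmd_table[i]->name, c->name))
                break;
        }
        if (i == cmd_table_len) {
            if (cmd_table_len >= MAX_SKETCH_NAMES)
                return 0;
            cmd_table_len++;
            c->next = NULL;
        } else {
            c->next = cmd_table[i];  /* chain the older record behind */
        }
        cmd_table[i] = c;
    }
    return 1;
}

enum { RUN_OK, RUN_UNKNOWN, RUN_DENIED };

static int run_cmd_sketch(UserSketch *u, const char *name)
{
    CommandSketch *c = lookup_cmd_sketch(name);

    if (c == NULL)
        return RUN_UNKNOWN;   /* real code sends a NOTICE here */
    if (c->has_priv != NULL && !c->has_priv(u))
        return RUN_DENIED;    /* real code sends "permission denied" */
    c->routine(u);
    return RUN_OK;
}

/* Sample routines and command array used to exercise the model */
static int sample_calls = 0;
static void count_call(UserSketch *u) { (void)u; sample_calls++; }
static int user_is_oper(const UserSketch *u) { return u->is_oper; }

static CommandSketch sample_cmds[] = {
    { "HELP",     count_call, NULL,         NULL },
    { "SHUTDOWN", count_call, user_is_oper, NULL },
    { NULL,       NULL,       NULL,         NULL }
};
```

With sample_cmds registered, an ordinary user can run "help" in any letter case, while "SHUTDOWN" is refused unless the user's is_oper field is set.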

The last command-related function is help_cmd(). It takes the same parameters as run_cmd(), but rather than executing the command's processing routine, it uses the help message fields of the command record to send a help message to the user (again, informing the user if the command name is not found in the command list). Three help messages can be specified for each command: helpmsg_all, which is displayed to all users; helpmsg_reg, which is displayed after helpmsg_all to users who are not IRC operators; and helpmsg_oper, which is displayed after helpmsg_all to IRC operators. These fields are all message index numbers for use with the multilingual routines, not literal strings. If any field is -1, the corresponding help message is not displayed, so (for example) a command that does not have any special help for IRC operators can use -1 in both the helpmsg_reg and helpmsg_oper fields.

It is also possible to specify format parameters for the help messages, as long as they are strings; the four string values help_param1 through help_param4 will be passed to notice_help() to fill in "%s" tokens in the help message. If more complex processing or other parameter types are required, help_cmd() cannot be used; the module will have to send the proper help text out itself.
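The choice of which help messages to display can be written out as a short function. The HelpFieldsSketch type and select_help_sketch() are hypothetical; only the three field names and the use of -1 to suppress a message come from the description above, and the sketch treats any negative value as "no message".

```c
/* Stand-in for the help fields of a command record */
typedef struct {
    int helpmsg_all;   /* shown to everyone */
    int helpmsg_reg;   /* shown afterwards to non-operators */
    int helpmsg_oper;  /* shown afterwards to IRC operators */
} HelpFieldsSketch;

/* Store the message indexes to display, in order, in `out` (room for
 * two entries) and return how many were stored. */
static int select_help_sketch(const HelpFieldsSketch *h, int is_oper,
                              int out[2])
{
    int n = 0;
    int extra = is_oper ? h->helpmsg_oper : h->helpmsg_reg;

    if (h->helpmsg_all >= 0)
        out[n++] = h->helpmsg_all;
    if (extra >= 0)
        out[n++] = extra;
    return n;
}
```

Each selected index would then be passed, together with the help_param values, to a routine such as notice_help().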
