6-1. Databases in Services
6-2. The database subsystem interface
6-2-1. Tables, records, and fields
6-2-2. Registering and unregistering tables
6-2-3. Loading and saving data
6-3. Database modules
6-4. Specific module details
6-4-1. database/standard
6-4-1-1. Data format
6-4-1-2. Module structure
6-4-2. database/version4
6-4-2-1. Data format
6-4-2-2. Module structure
6-5. Auxiliary source files
6-5-1. fileutil.c, fileutil.h
6-5-2. extsyms.c, extsyms.h
Previous section: IRC server interface | Table of Contents | Next section: Services pseudoclients
As with any program that handles large amounts of data, Services needs a place to store nickname, channel, and other data. In Services, the primary data storage method is in-memory lists and tables; however, since these disappear when Services terminates, a more persistent method of recording the data is required. This is implemented through the database subsystem, briefly touched on in section 2-9-2.
The primary reason for such a two-layer structure lies in the history of Services: as it was originally designed only for use on a small network, the effort required to implement Services using a true database management system was seen as excessive compared to the simplicity of accessing data structures already in memory. As a result, little thought was given to the structure or accessibility of the persistent data files, which were seen as only an adjunct to the in-memory structures. While this served well enough for a time, the system's inflexibility proved cumbersome as more data was stored, and the file format's opaqueness caused trouble for other programs attempting to access the data.
The latter problem of opaqueness was mostly resolved with the addition of XML-based data import and export modules (misc/xml-import and misc/xml-export, described in section 8-4). The database system itself remained an issue through version 5.0, but has been redesigned for version 5.1 to allow significantly more flexibility in storing data, as described below. (The two-layer style has been retained, however, primarily due to the difficulty of changing it—a complete rewrite of Services would be required.)
In Services (as in any typical database system), data to be stored in databases is organized into tables, records, and fields. However, this organization is separate from the in-memory representation of the data: rather than storing the actual data itself, the "tables" handled by the database system hold information on how to access the data. The actual operations of reading data from and writing data to persistent storage are then performed using this information, along with utility routines provided by the table's owner.
The core part of the database subsystem is in the source files databases.c and databases.h.
A table, as used by the database subsystem, is defined by a DBTable structure, which contains information about the fields used in the table and utility routines used to create, delete, and access records in the table. The structure is defined (in databases.h, along with all other database-related structures and declarations) as follows:
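As a rough illustration of what such a table description might contain, here is a hypothetical sketch; the member names are assumptions drawn from the routines discussed later in this section (newrec(), insert(), postload(), and so on), not the actual declaration in databases.h:

```c
/* Hypothetical sketch only -- see databases.h for the real declaration.
 * Note that the table holds access callbacks, not the records themselves. */
typedef struct dbfield_ DBField;

typedef struct dbtable_ {
    const char *name;         /* table name, used to identify the table */
    DBField *fields;          /* array of field descriptors */
    void *(*newrec)(void);    /* allocate an empty record while loading */
    void (*freerec)(void *);  /* discard a record that will not be used */
    void (*insert)(void *);   /* insert a completed record into the table */
    void *(*first)(void);     /* iteration over records, used when saving */
    void *(*next)(void);
    int (*postload)(void);    /* called after the table has been loaded */
} DBTable;
```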
As can be seen from this structure, the actual records themselves are not stored in the DBTable structure, but are rather left to the table's owner to store as appropriate. For example, the ChanServ pseudoclient module stores the data for each record in a ChannelInfo structure.
The field data, stored in DBField structures, likewise does not hold actual data, only instructions on how to access it. The DBField structure contains:
This structure is designed with the assumption that data will be stored in some structured type in memory; if no get() or put() routine is provided, the database subsystem will simply access the memory location derived by adding the field's offset to the record pointer returned by the table's record access functions (newrec(), first(), or next()). If this is not sufficient, however, the table owner can define get() and/or put() functions for accessing the data.
One member of the DBField structure that deserves particular mention is the load_only member. If a field's load_only value is nonzero, then the field will be ignored when the database is saved to persistent storage. This can be used to handle changes in the format of a table; if the old field is left defined with load_only nonzero and a put() routine provided, that routine will be called whenever a record with the old field is loaded, allowing the old field's value to be processed as necessary to fit the new table format. Alternatively, the old field can be left in the in-memory structure, and code added to the table's insert() routine to handle the data translation.
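The migration technique just described can be sketched as follows, using hypothetical structure and member names (the real DBField lives in databases.h and differs in detail): an obsolete 16-bit field is declared load_only, and its put() routine widens the loaded value into the field that replaced it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical types for illustration; the real ones are in databases.h. */
typedef struct {
    long level;                /* current field (widened from 16 bits) */
} Record;

typedef struct {
    const char *name;
    int offset;
    int load_only;                               /* nonzero: load, never save */
    void (*put)(void *record, const void *buf);  /* called on load */
} DBField;

/* put() routine for the obsolete field: translate it to the new format. */
void put_old_level(void *record, const void *buf)
{
    short old;
    memcpy(&old, buf, sizeof(old));
    ((Record *)record)->level = old;
}

DBField record_fields[] = {
    { "level",     offsetof(Record, level), 0, NULL },          /* current */
    { "old_level", 0,                       1, put_old_level }, /* legacy  */
};
```

Because the legacy entry is load_only, it is read from old data files but never written back, so newly saved files contain only the current field.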
In order for a database table to be loaded from and saved to persistent storage, it must first be registered with the database subsystem by calling the register_dbtable() routine; the complementary unregister_dbtable() routine must be called when the table is no longer needed (for example, when the module owning the table exits). The routines' prototypes are as follows:
Both routines take a pointer to the DBTable structure describing the table to be registered or unregistered. The register_dbtable() routine returns nonzero on success, zero on failure.
Note that register_dbtable() assumes that the in-memory table is empty, and has no facility to signal the database owner to clear the table before data is loaded. The database owner must ensure that the table is empty or take whatever other precautions are appropriate before registering the table.
Data loading and saving is performed on a per-table basis when the table is registered or unregistered, respectively; thus data is immediately available for use when register_dbtable() returns successfully, and any changes made to the data after unregister_dbtable() is called will not be reflected in persistent storage. There is also an auxiliary routine, save_all_dbtables(), which causes all registered tables to be saved (synced) to persistent storage immediately:
This routine returns one of the following values:
This routine is called by the main loop (via the save_data_now() helper routine) at periodic intervals or when explicitly requested, as described in section 2-3-3.
The core portion of the database subsystem only provides the interface for persistent storage of databases; the actual work of transferring data to and from persistent storage is performed by database modules. The standard database modules are located in the modules/database directory.
A database module registers itself with the core part of the subsystem by calling register_dbmodule(); as with tables, the module must unregister itself with the complementary unregister_dbmodule() when exiting:
Only one database module may be registered; if a second module tries to register itself, register_dbmodule() will return an error (zero).
The DBModule structure passed to these functions contains two function pointers:
As the names suggest, load_table() is called to load a table from persistent storage, and save_table() is called to save a table to persistent storage. Both routines should return nonzero on success, zero on failure.
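The structure can be pictured roughly as follows; this is a sketch based on the description above, and the actual declaration in databases.h may differ in details:

```c
/* Illustrative sketch; see databases.h for the real DBModule.
 * Both routines return nonzero on success, zero on failure. */
typedef struct dbtable_ DBTable;   /* opaque for this sketch */

typedef struct dbmodule_ {
    int (*load_table)(DBTable *table);  /* read table from persistent storage */
    int (*save_table)(DBTable *table);  /* write table to persistent storage */
} DBModule;
```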
Since the DBTable structures representing registered database tables are passed directly to these two routines, the module must take care to observe the restrictions and requirements on calling the table's function pointers documented in section 6-2 above, such as not calling newrec() twice without an intervening insert() or freerec() and ensuring that postload() is called when a table has been loaded. Implementation note: A better implementation might hide the DBTable structure from database modules, providing an interface that ensures the rules are followed.
To simplify data access logic and avoid bugs caused by misuse of data fields, database modules should use the get_dbfield() and put_dbfield() routines to read and write fields in database records. These routines are declared as:
The routines will automatically call the field's get() or put() routine if one is supplied, or else copy the field's value to or from the supplied buffer.
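That dispatch-with-fallback behavior can be sketched roughly as follows; the member names here are hypothetical, and the real routines in databases.c will differ in detail:

```c
#include <string.h>

/* Hypothetical, minimal versions of the real accessors in databases.c:
 * use the field's get()/put() callback when one is supplied, otherwise
 * fall back to raw memory access at record + offset. */
typedef struct {
    int offset;                                  /* offset into the record */
    int length;                                  /* size of the raw value */
    void (*get)(const void *record, void *buf);  /* optional accessor */
    void (*put)(void *record, const void *buf);  /* optional mutator */
} DBField;

void get_dbfield(const void *record, const DBField *field, void *buffer)
{
    if (field->get)
        field->get(record, buffer);
    else
        memcpy(buffer, (const char *)record + field->offset, field->length);
}

void put_dbfield(void *record, const DBField *field, const void *buffer)
{
    if (field->put)
        field->put(record, buffer);
    else
        memcpy((char *)record + field->offset, buffer, field->length);
}
```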
Services includes two standard database modules. The first, database/standard, is (as the name implies) intended to be the standard module for use with version 5.1; it stores each table in a binary data file. The second module, database/version4, uses the same file format as was used in Services versions 4.x and 5.0, and is intended for compatibility when testing Services 5.1 or converting databases to the new format. (It is not possible to have one module handle loading and a different module handle saving, so data must first be exported to XML using the version4 module and then imported using the standard module in the latter case.)
This module stores each table in a file whose name is constructed by replacing all non-alphanumeric characters (except hyphens and underscores) by underscores and appending ".sdb". The file format consists of three main sections, described below. In all cases, numeric data is written in big-endian format (with the most-significant byte first); strings are stored as a 16-bit length in bytes followed by the specified number of bytes of string data (including a terminating null byte), with a NULL string indicated by a length value of zero. Implementation note: This obviously limits the length of a string to 65,534 bytes. This is the result of reusing the string reading and writing routines used for the old file format; while it has not proved to be a problem to date, it is nonetheless an unnecessary artificial limitation.
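As an illustration of the string encoding just described, the following sketch (a stand-in, not the actual fileutil.c routine) writes a string into a memory buffer in this format:

```c
#include <stdint.h>
#include <string.h>

/* Sketch of the string encoding described above: a big-endian 16-bit
 * length (counting the terminating null byte) followed by the bytes,
 * with a NULL string written as a length of zero. Returns the number
 * of bytes written, or -1 if the string exceeds the 16-bit limit. */
int put_string(unsigned char *out, const char *str)
{
    size_t len;
    if (!str) {                 /* NULL string: length field of zero */
        out[0] = out[1] = 0;
        return 2;
    }
    len = strlen(str) + 1;      /* include the terminating null byte */
    if (len > 0xFFFF)
        return -1;              /* hence the 65,534-byte limit */
    out[0] = (unsigned char)(len >> 8);   /* most-significant byte first */
    out[1] = (unsigned char)(len & 0xFF);
    memcpy(out + 2, str, len);
    return (int)(2 + len);
}
```

Note how the limit arises: the length field counts the null byte as well, so the longest storable string is 65,535 - 1 = 65,534 bytes of text.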
The file header contains basic information about the file, in the following four fields:
The field list contains information about the fields in the data table and how they are stored in the file. The field list can be stored anywhere in the file, but the current implementation writes it immediately after the file header. The field list consists of a header followed by a variable number of field entries. The header contains the following three values:
For each field, the following data is recorded:
The last section of the file contains the actual data for each record in the table. To avoid the potential for a corrupt record to render all following records unreadable (if, for example, the length of a string is incorrect), the actual record data is preceded by a record descriptor table, which contains a file offset pointer and total length for each record's data.
In order to simplify the writing of database files, the record descriptor table is allowed to be fragmented into multiple parts. Each partial table consists of an 8-byte header containing:
The remainder of the table is filled with 8-byte record descriptors, each containing:
Note that the header has the same format as a record descriptor, so the entire descriptor table can be treated as an array of descriptors in which the first entry points to the next table rather than a particular record.
The record data pointed to by each descriptor consists, in turn, of a fixed-length part and a variable-length part. The fixed-length part (also referred to in the field list description above) contains all data which is of a fixed length for every record; this includes all numeric data, as well as a 32-bit data offset pointer for strings (see below). Variable-length data is stored immediately after the fixed-length part of the data, in arbitrary order.
The various field types are stored as follows (where not explicitly mentioned, the value is stored entirely in the fixed-length part of the record data):
Database loading and saving are handled by the routines standard_load_table() and standard_save_table(), respectively. (The standard_ prefix comes from the module name, and is included to avoid potential name clashes with other database modules, which would complicate debugging.) Each of these routines calls three subroutines to handle each of the three parts of a database file described in section 6-4-1-1 above.
The SAFE() preprocessor macro defined at the top of the file is used in read and write operations to check for a premature end-of-file (on read) or a write error (on write) and abort the routine in these cases.
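A plausible shape for this pattern is shown below; the macro body, the read_int32() stand-in, and the read_two() helper are all illustrative, not the actual definitions in standard.c and fileutil.c:

```c
#include <stdio.h>

/* Hypothetical form of the SAFE() macro: bail out of the current
 * routine on a short read or a failed write. */
#define SAFE(x) do { if ((x) < 0) goto fail; } while (0)

/* Stand-in for fileutil.c's 32-bit reader: big-endian, -1 on EOF. */
int read_int32(long *ret, FILE *f)
{
    long v = 0;
    int i, c;
    for (i = 0; i < 4; i++) {
        if ((c = fgetc(f)) == EOF)
            return -1;
        v = (v << 8) | (c & 0xFF);
    }
    *ret = v;
    return 0;
}

/* Example use: read two values, aborting cleanly on premature EOF. */
int read_two(FILE *f, long *a, long *b)
{
    SAFE(read_int32(a, f));
    SAFE(read_int32(b, f));
    return 0;
  fail:
    return -1;
}
```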
Three helper functions used in loading and saving are defined first:
Following these routines is standard_load_table(), along with its helper routines read_file_header(), read_field_list(), and read_records(). When called, standard_load_table() takes the following actions:
read_file_header() is fairly straightforward; it simply reads in the four header fields, checks the version number and header size to ensure that they have appropriate values, and returns the field list and record data offsets in the variable references passed in.
read_field_list() is slightly more complex; since there is no guarantee that the record structure stored in the file will match that given by the DBTable structure, the routine must match fields in the file to those in the structure. read_field_list() iterates through the fields in the loaded table, searching the TableInfo structure for a matching field (the name, type, and field size must all match); if found, the record data offset is recorded in the TableInfo structure, while unknown fields are simply ignored. Implementation note: As a side effect of this handling, fields like nicknames, channel names, and passwords will cease to be recognized if the relevant buffer sizes are changed, thus the note in defs.h about backing up the data before changing the constants.
read_records() reads in and loops through the record descriptor tables, continuing until an empty descriptor, signifying the end of the table, is found. In order to avoid duplication of code, the descriptor table is loaded (when necessary) at the beginning of the loop; however, since the descriptor table must be loaded before the end-of-data check can be made, the loop termination check is performed in the middle of the loop, immediately after the descriptor table loading. The recnum loop variable indicates the current index in the descriptor table, with 1 meaning the first record descriptor (after the header) and 0 meaning that a new table has to be loaded; the modulo arithmetic in the loop variable update expression ensures that when the index reaches the end of the table, it will be reset to zero, causing the next table to be loaded.
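The control flow described above can be sketched as follows, using an in-memory array as a stand-in for the file; the descriptor layout, table size, and names are illustrative, not the real on-disk format:

```c
#include <string.h>

/* In-memory sketch of read_records()'s control flow. A descriptor is an
 * (offset, length) pair; entry 0 of each table is the header, whose
 * offset links to the next table, and an all-zero descriptor marks the
 * end of the data. Array indexing stands in for file seeks. */
typedef struct { unsigned long offset, length; } Descriptor;

#define TABLESIZE 4   /* descriptors per table, header included */

int count_records(const Descriptor *file)
{
    Descriptor table[TABLESIZE];
    unsigned long tablepos = 0;  /* "file position" of the current table */
    int recnum = 0;              /* 0: a new table must be loaded */
    int count = 0;

    for (;;) {
        if (recnum == 0) {       /* load the next descriptor table */
            memcpy(table, file + tablepos, sizeof(table));
            tablepos = table[0].offset;  /* header links to next table */
            recnum = 1;
        }
        /* termination check, mid-loop by necessity */
        if (table[recnum].offset == 0 && table[recnum].length == 0)
            return count;
        count++;                 /* a real record would be read here */
        recnum = (recnum + 1) % TABLESIZE;  /* wrap forces a new table */
    }
}
```

When recnum wraps back to zero via the modulo, the next iteration loads the table the header pointed to, exactly as described above.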
The table-saving routine standard_save_table() and its subroutines write_file_header(), write_field_list(), and write_records() operate in essentially the same way, although they are slightly simpler because there is no need to check for invalid data, as must be done while reading. The other point worth mentioning is that open_db() automatically writes the version number given as the third parameter into the file, so there is no need for write_file_header() to do so. Implementation note: Yes, this is ugly; see section 6-5-1 for an explanation.
Finally, the source file concludes with the standard module variables and routines, along with the DBModule structure required for registering the database module. However, for this module they are enclosed by #ifndef INCLUDE_IN_VERSION4 and #endif; this is so that the source file can be directly included in the database/version4 module (see section 6-4-2-2 below) without causing identifier conflicts or other problems.
This module is intended as a compatibility/transition module, and (as the name implies) supports database files in the format used by 4.x versions of Services, as well as the extended form of that format used in version 5.0. These versions did not support generic database tables, so any such tables which do not correspond to tables used in version 4/5 format files are simply written out in the same format used with the database/standard module.
The pre-5.1 data file format is a rather complex beast, an extended form of the original database files which were simply binary dumps of the structures used in memory. The format is not documented outside of the code that implements it, and even I (the developer) often have to refer to the code when analyzing a database file from these versions.
The database files used encompass all of the data handled by the standard pseudoclients, but the files are generally split by pseudoclient name rather than individual table: for example, the nickname, nickname group, and memo data are all stored in the same database. This, again, derives from the fact that such data was at one time all stored as part of the same structure (even now, memos are stored in the nickname group structure rather than having their own separate table in memory). The list of files and the tables they encompass follows:
In general, the contents of each database file can be divided into three parts: the base data, the 5.0 extension data, and the 5.1 extension data, all concatenated together. This division of data was introduced in version 5.0 to allow databases written by version 5.0 to be read by version 4.5 (minus the 5.0-specific features, of course): version 5.0 wrote the data in the format used by version 4.5, then appended the 5.0-specific data to the end of the file. The 4.5 code would simply ignore the appended data, believing that it had reached the end of the data, while the 5.0 code knew to look for the extension data to supplement the (possibly inaccurate) data in the base part. Version 5.1 takes the same approach with respect to version 5.0, resulting in database files which are very convoluted but which can be used in any of versions 4.5, 5.0, or 5.1.
Each part begins with a 32-bit version number identifying the format of the data (like the 5.1 standard format, all values are stored in big-endian byte order); this value is fixed at 11 for the base data and 27 for the 5.0 extension data, the file version numbers used in the final releases of these versions of Services. The file version is followed immediately by the data itself, whose format varies depending on the particular data being stored. Simple arrays like news and autokill data typically use a 16-bit count followed by the appropriate number of repetitions of the data structure, a format which is also used for sub-arrays such as access lists within nickname and channel data. Nicknames and channels, on the other hand, do not have a count field, and instead simply consist of a byte with value 1 followed by the nickname or channel data structure for as many structures as necessary, followed by 256 zero bytes indicating the end of the table. (The reason for 256 zero bytes instead of just one is that very old versions of Services, earlier than version 4.0, wrote out each collision list of the 256-element hash arrays separately, terminating each list with a zero; when this was changed, the fiction of 256 collision lists was kept in order to simplify the database reading logic.)
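The record-list framing just described can be illustrated with a sketch of a reader; records here are reduced to a single stand-in byte, whereas the real structures are far larger:

```c
#include <stdio.h>

/* Sketch of reading a version-4 style nickname/channel list: each entry
 * is a marker byte of 1 followed by the record data, and the table ends
 * with 256 zero bytes (one per pretend collision list). Returns the
 * number of records read, or -1 on a malformed file. */
int read_list(FILE *f)
{
    int count = 0, i, c;
    for (i = 0; i < 256; i++) {          /* one pass per "collision list" */
        while ((c = fgetc(f)) == 1) {    /* marker 1: a record follows */
            if (fgetc(f) == EOF)         /* read the (stand-in) record */
                return -1;
            count++;
        }
        if (c != 0)                      /* each list ends with a zero */
            return -1;
    }
    return count;
}
```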
For cases where there is a difference in data format or content between the base, 5.0, and 5.1 data, the data is written so that if loaded by the corresponding version of Services, it will be interpreted as closely as possible to the true value. For example, the 32-bit nickname group ID is written into 16 bits of the nickname flags and the 16-bit registered channel limit in the base data, since 4.5 does not interpret these bits; however, since 5.0 does make use of them, the correct values of those two fields are then re-recorded in the 5.0 extension data. Similarly, channel access levels are recorded in the base data using the 4.5 access level system (a range from -9999 to 9999 with standard levels clustered from -2 to 10), and again in the 5.0 extension data using the current system.
The source file, version4.c, starts with a workaround for a limitation of static module compilation. As with the database/standard module, this module makes use of the utility routines in fileutil.c; however, if fileutil.c is simply linked into the module, as is done with the database/standard module, an error would occur at link time due to the symbols being defined in both modules. While it is possible to adjust the compilation process to avoid this problem, the database/version4 module instead simply uses #define to rename all of the exported functions in fileutil.c, then includes that source file directly.
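The renaming trick can be demonstrated in miniature. This is a self-contained illustration with the textual #include replaced by a second definition in the same file; in version4.c itself, the #define lines precede an #include "fileutil.c":

```c
/* --- shared source, as linked into the first module --- */
int open_db_helper(void) { return 1; }

/* --- second module: rename the symbol, then "include" the shared
 *     source again; the definition below really defines
 *     v4_open_db_helper, so the two copies no longer clash at link
 *     time. (open_db_helper is an illustrative name, not the real
 *     fileutil.c export list.) --- */
#define open_db_helper v4_open_db_helper
int open_db_helper(void) { return 2; }
#undef open_db_helper
```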
The four version number defines indicate the file version numbers to be used with various parts of the data:
The CA_SIZE_4_5, ACCESS_INVALID_4_5, and def_levels_4_5[] constants and array are used when processing channel privilege level data as stored in the base data section. Since Services 4.5 always stored the privilege level array, even if all values were set to the defaults, this array is used to detect such a case when loading data and to supply data for channels using the default settings when saving. (The channel access levels themselves use a different scale in 4.5; this is handled by the convert_old_level() and convert_new_level() helper functions, defined later.)
The last set of compatibility constants and variables, MAX_SERVADMINS, services_admins[] and so on, is used to handle loading and saving of the Services administrator and operator lists in oper.db. (Version 4.5 kept these separate from the nickname data, as opposed to the current method which stores the OperServ status level in the nickname group data.)
Following these preliminary declarations are the main load and save routines, version4_load_table() and version4_save_table(), preceded by forward declarations of the individual table handling routines. For the most part, these consist of checking the name of the table to be loaded or saved and calling the appropriate routine; however, since most database files encompass two or more tables, the table pointers must be saved in local variables until all relevant tables are available. Also, several tables are simply ignored; this is because the load/save routines access the corresponding data directly through the parent structures (for example, channel access and autokick lists are accessed via the ChannelInfo structures in the chan table). One other workaround required when loading data is the temporary setting of the global noexpire flag; as the comments in the code indicate, this is because the databases are loaded in several steps, and records' expiration timestamps may not be correct until the final step, so leaving expiration enabled could cause records to be improperly expired during the loading process (since expiration occurs when a record is accessed via the various pseudoclients' get(), first(), and next() functions).
Next are three short utility functions. The first, my_open_db_r(), calls open_db() from fileutil.c (see section 6-5-1) to open the given database file for reading, then reads in the file version number and checks that it is within range for the base data section; the version number is then returned in *ver_ret. (File versions below 5, corresponding to Services 3.0, are not supported because they stored numeric values in a machine-dependent format.) The other two utility routines, read_maskdata() and write_maskdata(), are used to read and write lists of MaskData structures, used (for example) in autokills and S-lines.
The bulk of the module is taken up by the routines to load particular tables. Since each database file has its own particular format, the table load/save routines must be tailored for each file; the load routines, in particular, must be able to handle multiple versions of files, and as such are especially complex (for the nickname and channel tables, the load routine is broken up into several subroutines). For the sake of simplicity and speed, the routines access the relevant structures directly rather than going through the DBField entries of the table; this means that the module must be updated whenever the structures' formats or meanings change, but as the module is only intended as a transitional one, this is not seen to be a significant problem.
The load/save routines also call some routines defined in the various pseudoclient modules, such as get(), first(), and next() routines for the various data structures. Since the database may be (and generally is) loaded before the pseudoclient modules, the symbols must be imported appropriately; this is handled by the extsyms.c and extsyms.h auxiliary files, though the handling is rather machine-dependent. See section 6-5-2 for details.
The routines used for loading and saving tables which do not correspond to any of the files listed above, load_generic_table() and save_generic_table(), are actually renamed versions of the standard_load_table() and standard_save_table() routines defined in the database/standard module. To avoid the difficulties involved in trying to load two database modules at once, this module simply includes the standard.c source file directly, after setting up #define directives to rename the load and save routines; a #ifndef INCLUDED_IN_VERSION4 protects the parts of the database/standard module not related to loading and saving, avoiding multiple definitions of module-related symbols.
fileutil.c (and its corresponding header, fileutil.h) provide utility functions used by both the database/standard and database/version4 modules for reading and writing binary data files. The functions use a dbFILE structure to indicate the file to be read from or written to; this is analogous to the FILE structure used by stdio-style functions, but includes extra fields used by the open and close functions to ensure that a valid copy of the file is retained even if a write error occurs (see the function descriptions below for details). The actual file pointer is also available in the structure's fp field for direct use with the stdio functions.
There are several preprocessor conditionals on CONVERT_DB scattered throughout the code. These are used to prevent unneeded portions of code, particularly log- and module-related functions, from being seen when the source file is compiled for the convert-db tool.
The following functions are available. Note that all of the read/write functions (except get_file_version() and the raw read/write functions read_db(), write_db(), and getc_db()) share the property that they return 0 on success and -1 on error.
extsyms.c and extsyms.h are used by the database/version4 module to import external symbols from other modules which may not be loaded when the version4 module is initialized. The version4 module makes use of a number of functions and variables from the various pseudoclient modules, and adding code at every use to check whether the appropriate module is loaded and look up the symbol would only further complicate already complex code. For this reason, the actual work of looking up the symbols is done in extsyms.c, and extsyms.h provides redefinition macros to allow the version4 module to be written as if the functions and variables were already present.
The actual work of looking up and accessing (for values) or calling (for functions) the external symbols is implemented by the IMPORT_FUNC(), IMPORT_VAR(), and IMPORT_VAR_MAYBE() macros defined in extsyms.c. These macros all have the same basic format: they define a variable of the form __dblocal_symbol_ptr to hold the value of the symbol (the address of the function or variable), followed by a function which looks up the symbol's value if it is not yet known, then accesses or calls it. (Module pointers are likewise cached in file-local variables, declared separately.) If the symbol or its module cannot be found, the local routine fatal_no_symbol() is called to abort the program, except for IMPORT_VAR_MAYBE(), in which case a default value is returned from the accessing function if the symbol is not available.
The logic for accessing an external variable is simple; a reference to the variable is translated by macros in extsyms.h into a call to the function defined by IMPORT_VAR() or IMPORT_VAR_MAYBE() (whose name has the format __dblocal_get_variable()), which accesses the variable's value through the pointer obtained from looking up the symbol and returns it. The function's declaration uses the GCC typeof() built-in operator to give the function's return value, as well as the cache variable for the symbol value, the same type as the variable itself.
Calling an external function is a more complex task, due to the fact that functions can take parameters or not and can return or not return a value. Rather than explicitly writing out the symbol access functions for each external function accessed, extsyms.c makes use of a GCC feature which allows a function to call another function, passing along the same parameters passed to the parent function, and return its return value without knowing anything about either the parameters or the type of return value. This feature is the builtin apply/return code, which takes the general form:
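In GCC, this construct takes roughly the following shape (a sketch using GCC's __builtin_apply_args(), __builtin_apply(), and __builtin_return() built-ins, shown as a fragment rather than a complete function):

```c
__builtin_return(__builtin_apply((void (*)())function_pointer,
                                 __builtin_apply_args(),
                                 parameter_buffer_size));
```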
where function_pointer is a pointer to the function to be called, and parameter_buffer_size is the maximum amount of stack space expected to be used by the parameters to the function, if any. If this feature is not available, for example because a compiler other than GCC is in use, then the code tries to use another (assembly-based) algorithm to accomplish the same thing if possible, or generates a compilation error if no such substitute algorithm is available.
However, the use of the __builtin_apply() GCC feature in Services has, over the course of Services' development, revealed a few bugs in the implementation of that feature; as such, Services must sometimes resort to an assembly-based algorithm even when using GCC. The necessity of this is indicated by the preprocessor macro NEED_GCC3_HACK, which is set by the configure script if it detects that this workaround is required. The bugs which have been discovered are:
Finally, in order to avoid cached pointers going stale when a module is unloaded, extsyms.c includes a callback function for the "unload module" callback, which clears out all cached pointers for a module when the module is unloaded.