SIL Coding Style Guidelines

Language standard

Use C99 (but avoid complex number types and variable-length arrays) and C++11. Anonymous unions are allowed, but otherwise avoid compiler-specific features.

Rationale: All modern compilers support C99 and C++11, with occasional minor exceptions that generally do not impact SIL. Modern compilers also support anonymous unions; this feature is nonstandard in C99, but it has been added to the C11 standard, so such unions will be forward-compatible. On the flip side, the C11 standard makes C99's complex number types (the <complex.h> header) and variable-length arrays optional, so those features should be avoided.
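
As a minimal sketch of the one permitted extension, the following uses an anonymous union to overlay two representations of a value. The Value type and its helper functions are hypothetical names chosen for illustration and are not part of SIL.

```c
#include <stdint.h>

/* Hypothetical tagged value type, for illustration only. */
typedef struct Value {
    uint8_t is_float;  /* Nonzero if "f" is the active member. */
    union {            /* Anonymous union: nonstandard in C99, standard in C11. */
        int i;
        float f;
    };
} Value;

Value value_from_int(int i)
{
    Value value;
    value.is_float = 0;
    value.i = i;  /* Members are accessed directly, with no union name. */
    return value;
}

Value value_from_float(float f)
{
    Value value;
    value.is_float = 1;
    value.f = f;
    return value;
}

float value_as_float(const Value *value)
{
    return value->is_float ? value->f : (float)value->i;
}
```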

Type size assumptions

Assume that int is at least 32 bits wide, that pointer values are at least as wide as int, and that function pointer types can be safely converted to and from void *. Make no other assumptions beyond what is required by the relevant language standard.

Rationale (int): In all modern PC-class environments, including devices such as tablets and game consoles which can run general-purpose software, the int type is at least 32 bits wide. (There are some embedded processor environments which use 16-bit ints to better match the hardware's capabilities, but such environments are generally not suited to running SIL-based programs.) Modern programs frequently need to work with data larger than a 16-bit integer can hold; requiring all code to either check for 16-bit overflow or explicitly use a 32-bit type would significantly increase the risk of bugs.

Rationale (pointers): In all modern PC-class environments, pointers to both data and functions are simple scalar values of at least the native word size, ensuring that an int-sized value can be safely stored in and later retrieved from a pointer variable and that a function pointer can be safely converted to and from a void pointer. This technique is useful in certain cases, such as when passing an integer value or function pointer through an interface that takes an opaque pointer argument. While these conversions should be avoided when possible, for the purposes of SIL they may be considered safe.

Note that converting a pointer to int and back is not safe! On systems with 64-bit pointers but 32-bit int, the upper 32 bits of the pointer would be lost in such a conversion.
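
The safe direction (an int stored in a pointer and retrieved again) might be sketched as follows. This assumes the implementation provides intptr_t, which is optional in C99 but present on the PC-class platforms described above; the helper and callback names are hypothetical.

```c
#include <stdint.h>

/* Pack an int into an opaque pointer argument.  Going through intptr_t
 * avoids compiler warnings about casts between differently sized types. */
void *int_to_opaque(int value)
{
    return (void *)(intptr_t)value;
}

/* Recover an int previously packed with int_to_opaque(). */
int opaque_to_int(void *opaque)
{
    return (int)(intptr_t)opaque;
}

/* Hypothetical callback interface which takes an opaque pointer argument. */
typedef int (*Callback)(void *opaque);

int call_with_opaque(Callback callback, void *opaque)
{
    return callback(opaque);
}

/* Example callback which treats its opaque argument as a packed int. */
int double_it(void *opaque)
{
    return opaque_to_int(opaque) * 2;
}
```

With these definitions, call_with_opaque(double_it, int_to_opaque(21)) passes the integer through the pointer-typed parameter without allocating any storage for it.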

Integer types

Use int by default, sized types when appropriate. Use int8_t instead of char for numeric data. Avoid long and short except where required.

Rationale (int): We assume that the int type is at least 32 bits wide (see type size assumptions above), so there is no need to use long or int32_t merely to ensure a 32-bit data type. Using int as widely as possible reduces the chance of accidental truncation due to conversion between types of different sizes.

Rationale (sized types): Since we assume int is at least 32 bits wide, there should rarely be a need to specify sized types for local variables. However, sized types can be useful in certain cases, such as:

int64_t for values that may grow larger than 32 bits (prefer this to long long since the latter may be a 128-bit type in 64-bit environments and thus less efficient—though long long may be useful if the value is to be used in a formatted string);
when the size of the data is known ahead of time (such as a 16-bit pixel value);
to reduce the memory footprint of structured data (for a specific example, see boolean values below); or
to map a structure to an external byte stream (such as a file).

Rationale (char): While char and int8_t normally have the same size, char should be limited to data which is actually textual in nature. For 8-bit numeric data, use int8_t (or uint8_t, but see signed vs. unsigned integers below) to indicate to the reader that the data is numeric. Note in particular that whether char is signed or unsigned depends on the compiler, so using char for a signed 8-bit integer can result in nasty surprises. (For this reason, char variables should only be compared against other data of char type such as character literals, not against integer values.)
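
A short sketch of the distinction, using hypothetical function names: numeric 8-bit data is declared as int8_t, while textual data stays char and is compared only against character literals.

```c
#include <stdint.h>

/* Sum signed 8-bit samples.  int8_t is always signed, so negative samples
 * behave identically on every compiler; plain "char" might be unsigned. */
int sum_samples(const int8_t *samples, int count)
{
    int sum = 0;
    for (int i = 0; i < count; i++) {
        sum += samples[i];
    }
    return sum;
}

/* Textual data stays "char" and is compared only against character
 * literals, never against integer values. */
int count_spaces(const char *text)
{
    int count = 0;
    for (; *text != '\0'; text++) {
        if (*text == ' ') {
            count++;
        }
    }
    return count;
}
```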

Rationale (long and short): Sized integer types make long and short generally unnecessary, but when calling library functions with long- or short-type parameters or return values, it can be more convenient to use those types directly instead of casting back and forth. However, try not to propagate such types outside the immediate locality of the library call.

Signed vs. unsigned integers

Use signed types unless there's a good reason to use unsigned types. Don't use unsigned types just because you never expect to have a negative value.

Rationale: Conversions between signed and unsigned integer types are a perennial source of bugs, so much so that most modern compilers warn about mixing them. The easiest way to avoid these bugs is to not use unsigned types at all. In particular, "this value will never be negative" is not a reason to use an unsigned type; someday the value will go negative, and your code will break.

There are still a few cases in which unsigned types are beneficial:

Library functions: Many of the standard C library functions take unsigned parameters or return unsigned results (size_t is a major offender here). However, try not to propagate the unsigned type outside the immediate locality of the library call.
Data requiring the full unsigned range: Pixel data is a common example of this; a pointer to 8-bit-per-component RGBA pixel data must point to uint8_t, not int8_t, since 8-bit color component values range from 0 through 255. (However, you can and should use signed int rather than uint8_t for local variables holding such values.)
Bitfields: Integers used as bitfields (such as for flags) should usually be unsigned, so that masking and shifting right the highest bit does not result in a negative value. But consider breaking out the flags or fields into separate variables if memory size is not an important consideration.
Wraparound behavior: If a variable's value is expected to wrap around from its maximum to its minimum or vice versa, unsigned variables may be a better choice because the C language guarantees wraparound behavior for unsigned integer types but not for signed ones, so less effort is needed to portably ensure correct behavior. However, additional care may be needed when taking differences of two such values, since the result of converting an unsigned value too large for the target signed type is implementation-defined in C (thus, for example, (int)~0U == -1 is not guaranteed to hold on all systems).
Optimization: In some cases, it takes fewer CPU instructions or cycles to operate on unsigned values than on signed values. However, make sure the benefit is worth it before taking this step (remember that premature optimization is the root of all evil).
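
The wraparound case above can be sketched with a free-running tick counter. The function names are hypothetical, and the example assumes the implementation provides uint32_t. Note that the comparison is made on the unsigned difference itself rather than by converting it to a signed type.

```c
#include <stdint.h>

/* Ticks elapsed between two readings of a free-running 32-bit counter.
 * Unsigned arithmetic wraps modulo 2^32, so the result is correct even if
 * the counter wrapped around between the two readings. */
uint32_t ticks_elapsed(uint32_t start, uint32_t now)
{
    return now - start;
}

/* Returns nonzero if reading "a" is later than reading "b", assuming the
 * two readings are less than 2^31 ticks apart.  The wrapped difference is
 * tested against the halfway point instead of being converted to int. */
int tick_is_later(uint32_t a, uint32_t b)
{
    const uint32_t difference = a - b;
    return difference != 0 && difference < UINT32_C(0x80000000);
}
```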

Boolean values

Use the uint8_t type for data stored in memory, int otherwise. Never assign a value other than 0 or 1 to a boolean variable.

Rationale (data types): Experience has shown that boolean flags are used sufficiently often to warrant the use of a smaller data type than int when such values are stored in memory. On the other hand, values typically stored in registers, such as function parameters and return values or local variables, do not benefit from using a smaller-sized type. Note that C99 provides the _Bool type, along with the <stdbool.h> header which defines it as bool, but this type is not guaranteed to be link-compatible with the C++ bool type, so it is unsuitable for use in a library such as SIL which may be linked with C++ code.

Rationale (assignment of values): While C treats any nonzero value as true, assigning an arbitrary value directly to a boolean variable can have unexpected results: for example, on a system where long is larger than int, the value LONG_MIN itself is nonzero and therefore true, but assigning it directly to a boolean variable (of type int or uint8_t) will result in a false value due to truncation. It is permissible to copy the value of one boolean variable to another if the first variable's value is known to be safe (either 0 or 1), but do not assume that boolean arguments passed to an interface function have safe values.
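
A minimal sketch of both rules, using a hypothetical Widget structure: the flag stored in memory is a uint8_t, the parameter and return value are plain int, and the interface function normalizes its argument instead of trusting the caller to pass 0 or 1.

```c
#include <stdint.h>

/* Hypothetical structure, for illustration only. */
typedef struct Widget {
    uint8_t visible;  /* Boolean stored in memory: uint8_t, always 0 or 1. */
} Widget;

/* Interface function: "visible" is a boolean, but the caller may have
 * passed any nonzero value, so normalize it before storing. */
void widget_set_visible(Widget *widget, int visible)
{
    widget->visible = (visible != 0);
}

/* Return values are not stored in memory, so plain int is used. */
int widget_is_visible(const Widget *widget)
{
    return widget->visible;
}
```

Had the function stored its argument directly, a call such as widget_set_visible(&widget, 256) would have stored 0, since conversion to uint8_t keeps only the low 8 bits.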

Floating-point types

Use float by default, double only when necessary. Avoid long double entirely.

Rationale (float): Computations using double values are generally more expensive than computations with float values even on systems with hardware support for double-precision floating point. Some older systems implement double-precision operations in software, in which case the difference in execution time can reach an order of magnitude or more and significantly impact overall program performance as well as code size. For the vast majority of floating-point computations typically performed by SIL-based programs, float provides sufficient precision to ensure correct behavior. Additionally, the consistent use of a single type helps avoid unexpected behavior that can result from mixing precisions, such as loss of precision when a double value is passed through a function which takes a float parameter.

Rationale (long double): Everything above regarding the disadvantages of double applies even more strongly to long double. The range and precision of double are great enough that its use should not introduce any user-visible artifacts; long double is generally suited only to scientific applications, and an argument could even be made that if the precision of double is insufficient for a given application, then the floating-point model itself is inappropriate and the application should use an alternate representation for its numeric values.

An example of appropriate usage of double is in the SIL interface time_now(), which returns a double-precision timestamp in units of seconds. If a float value were returned, it would quickly lose precision to the point of preventing accurate sub-second timing. For example, using an IEEE 754-compliant single-precision type, the timestamp resolution would drop to around 1 millisecond after 8192 seconds (about 2¼ hours); at a frame rate of 60 frames per second, this is more than 5% of a frame, and attempting to use such a low-resolution timestamp for accurate timing could result in noticeable jitter. double, on the other hand, provides nanosecond or better resolution over a time span of 8,388,608 seconds (more than 3 months).
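
The loss of resolution can be demonstrated with a short sketch. It assumes IEEE 754 float and double types evaluated without excess precision, and the function names are hypothetical. At 8192 seconds, adjacent float values are 2^-10 seconds (about 0.98 ms) apart, so a step of 0.4 ms, which is less than half that spacing, is rounded away entirely.

```c
/* Returns nonzero if adding "step" seconds to the single-precision
 * timestamp "now" produces a different timestamp. */
int float_time_advances(float now, float step)
{
    const float later = now + step;
    return later != now;
}

/* The same test using double-precision timestamps. */
int double_time_advances(double now, double step)
{
    const double later = now + step;
    return later != now;
}
```

Under those assumptions, float_time_advances(8192.0f, 0.0004f) returns zero (time appears to stand still), while double_time_advances(8192.0, 0.0004) returns nonzero.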

Floating-point literals

Append an f suffix to floating-point literals to mark them as single precision. Integer literals may be used for integral values in floating-point expressions.

Rationale (f suffix): It can be easy to forget that floating-point literals are double precision by default, but including a double-precision literal in an expression causes all other operands in the expression to be promoted from single to double precision, even if the expression's value is then assigned to a single-precision variable (in which case the value must then be converted again, from double to single precision). Always include the f suffix on floating-point literals to mark them as single precision, except when the literal is intended to be a double-precision value or is used in a double-precision expression.

Rationale (integer literals): Unlike double-precision literals, integer literals do not cause promotion of floating-point operands, so they are generally safe to use in floating-point expressions, and expressions may be easier to read without extraneous ".0"s on such values. However, bear in mind that if both operands to an operator are integers, the operation will be performed as an integer and may consequently overflow; values which have the potential to cause such overflow should be written as floating-point literals, including the f suffix when appropriate.

Exception: It is not necessary (though also not prohibited) to include the f suffix on floating-point literals used in initializers, constant expressions, or function arguments, since such values will be converted to single precision at compile time and thus will have no impact on runtime performance.

Examples:

    float rounded_value = floorf(value + 0.5f);

    /* "3600.0f" is acceptable, but unnecessary if hours is known to be small. */
    float hours_to_sec(int hours) {return hours * 3600;}

    /* Floating-point literal since "1000000000" would cause integer overflow
     * for sec > 2. */
    float sec_to_nsec(int sec) {return sec * 1.0e9f;}

    /* This function returns a double, so the "f" suffix is not needed (and
     * would in fact cause an unnecessary conversion from float to double). */
    double sec_to_nsec_2(int sec) {return sec * 1.0e9;}

    /* This function returns a float, but the input variable is a double, so
     * the literal is written as a double-precision value to match. */
    float nsec_to_sec(double nsec) {return nsec / 1.0e9;}

    /* This version performs the division in single precision.  Note the explicit
     * "(float)" to indicate that single-precision arithmetic was intended. */
    float nsec_to_sec_2(double nsec) {return (float)nsec / 1.0e9f;}

    float blink_interval = 1.25;  // "f" suffix not needed on initializers.

    /* In tests, the "f" suffix may be necessary to ensure that a constant
     * expression is not computed (by the compiler) in more precision than
     * appropriate.  Leaving the literals below as double precision would cause
     * expected_value to differ in the least significant bit from the value
     * computed by the function call. */
    float f(float x, float y) {return (x/1000.0f) * (y/1000.0f);}
    float expected_value = 0.567f * 0.789f;  // Not "0.567 * 0.789".
    ASSERT(f(567, 789) == expected_value);

    /* These unnecessarily convert from single to double precision and back to
     * single precision again. */
    float rounded_value = floorf(value + 0.5);
    float hours_to_sec(float hours) {return hours * 3600.0;}

    /* Literals in compound assignment operations are treated as operands to
     * the corresponding arithmetic operations, so they must be explicitly
     * single-precision when appropriate.  This example unnecessarily performs
     * the multiplication in double precision. */
    float x = get_value();
    x *= 1.1;

Compound literals

Use for constant arguments or by-reference parameters. Parentheses are not required, but beware of interaction with macros.

Rationale (use): C99-style compound literals can be convenient to avoid the extra verbosity of declaring a variable whose only purpose is as an argument to a function; this usage parallels the use of a constructor return value as a function argument in C++. Single-element array literals can also be convenient when a function takes a numeric argument by reference instead of by value.

Rationale (parentheses): Compound literals already require a pair of parentheses and a pair of braces; adding a second enclosing set of parentheses can contribute to "bracket overload" and thus reduce readability. However, keep in mind that without parentheses, a compound literal will be broken into multiple macro arguments at the commas. You shouldn't be using macros anyway, but if you do need to pass a compound literal to a macro, remember to enclose it in parentheses.

Examples:

    graphics_set_fixed_color(&(Vector4f){1.0, 1.0, 1.0, 0.5});

    /* Generally acceptable if the structure is only used once, but take care
     * in particular that the added indentation does not lead to excessive use
     * of vertical space. */
    some_function(handle, index,
                  &(const OperationData){.operation = SOME_OPERATION,
                                         .flags = 0,
                                         /* ... */});

    /* Here, we imagine a get_object_data() function which accepts a buffer
     * size by reference and returns the actual amount of data written in the
     * same parameter.  If the buffer is known to be the proper size, it can
     * be more concise to pass the size as a literal single-element array of
     * the proper type instead of declaring a separate variable just to provide
     * a location for the (unused) return value. */
    ObjectData data;
    get_object_data(object, &data, (int[]){sizeof(data)});

    /* An array compound literal can also be useful to discard a value returned
     * by reference from a function which does not allow a NULL argument for
     * the return pointer.  In this case, be aware of an oversight in the C99
     * specification which requires at least one initializer in the literal
     * array; writing "(ErrorCode[1]){}" in this example would cause some
     * compilers to report errors. */
    ASSERT(store_object(object, (ErrorCode[]){0}));

    /* Avoid adding extra parentheses around the literal. */
    graphics_set_fixed_color(&((Vector4f){1.0, 1.0, 1.0, 0.5}));

    /* If a complex structure is used more than once, make it a variable. */
    some_function(handle, index1,
                  &(const OperationData){.operation = SOME_OPERATION,
                                         .flags = 0,
                                         /* ... */});
    some_function(handle, index2,
                  &(const OperationData){.operation = SOME_OPERATION,
                                         .flags = 0,
                                         /* ... */});

    /* If the added indentation results in excessive wrapping, use a variable
     * instead.  (In this particular case, vec4_add() and vec4_scale() would
     * be an even better choice.) */
    graphics_set_fixed_color(&(Vector4f){.x = (first_color.x
                                               + second_color.x) * 0.5f,
                                         .y = (first_color.y
                                               + second_color.y) * 0.5f,
                                         .z = (first_color.z
                                               + second_color.z) * 0.5f,
                                         .w = (first_color.w
                                               + second_color.w) * 0.5f});
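
The interaction with macros mentioned in the rationale for parentheses can be sketched as follows. The Vector2f type and VEC2_DOT() macro here are hypothetical and exist only to show how the preprocessor treats the commas inside a compound literal.

```c
/* Hypothetical type and macro, for illustration only. */
typedef struct Vector2f {float x, y;} Vector2f;

#define VEC2_DOT(a, b)  ((a).x * (b).x + (a).y * (b).y)

float example_dot(void)
{
    /* Without the extra parentheses, the preprocessor would split each
     * literal at its comma and see four macro arguments instead of two:
     *     VEC2_DOT((Vector2f){1, 2}, (Vector2f){3, 4})  // Error! */
    return VEC2_DOT(((Vector2f){1, 2}), ((Vector2f){3, 4}));
}
```

This is the one situation in which the enclosing parentheses are required rather than merely redundant, because braces (unlike parentheses) do not protect commas from the preprocessor.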

Zero values

Use NULL for pointers, '\0' for characters.

Rationale: While 0 represents a zero value in any type, use of the most appropriate literal (NULL for pointers, '\0' for characters) helps remind the reader of the data type. However, it is not necessary to write 0.0 or 0.0f for floating-point values (see also floating-point literals above).
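
A brief sketch using a hypothetical Record structure, with one field of each kind:

```c
#include <stddef.h>

/* Hypothetical structure, for illustration only. */
typedef struct Record {
    const char *name;
    char initial;
    int count;
    float score;
} Record;

void record_clear(Record *record)
{
    record->name = NULL;     /* Pointer: NULL rather than 0. */
    record->initial = '\0';  /* Character: '\0' rather than 0. */
    record->count = 0;
    record->score = 0;       /* Floating point: a plain 0 is fine. */
}
```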

Comparing against zero

In a conditional, omit the comparison if the intent of the expression is clear; otherwise, explicitly compare against a zero value.

Rationale: Conditional expressions in if, for, and while statements (as well as subexpressions of the logical operators && and ||) treat any nonzero value as true, so there is no need to explicitly write "expression != 0", and it can often be more readable to omit the comparison against zero. For example, when a zero value or null pointer indicates the absence of an object, it is generally clearer to write "if (object)" (which can be read "if object exists") than to explicitly compare against zero, which can suggest that zero has some special meaning.

Use an explicit zero if the value zero has a specific meaning, such as the first entry in a zero-indexed list. Also use an explicit zero when testing the return value of a comparison function like strcmp() or memcmp() which returns a positive, zero, or negative value to indicate the result of the comparison; an expression like if (strcmp(...)) tempts the reader to think that the test passes if the strings are equal, when in fact it fails for equal strings. (This can be considered another example of "the value zero has a specific meaning". See also failure return values below.)

If you need to store a flag for whether a value is zero, you can use the logical negation operator "!". However, do not use a double negation to test for a nonzero value; explicitly compare against zero instead.

Examples:

    for (object = list; object; object = object->next) {
        // ...
    }

    if (user_input == 0) {  // 0 is a meaningful value here.
        run_menu_0();
    }

    if (strcmp(input, correct_answer) == 0) {
        user_wins();
    }

    alloc_failed = !buffer;  // Or "alloc_failed = (buffer == NULL)".

    if (strcmp(input, correct_answer)) {
        user_loses();  // Huh?  The user loses if they give the right answer?
    }

    have_buffer = !!buffer;  // Use "have_buffer = (buffer != NULL)" instead.

Failure return values

For functions which can fail, use a zero value to indicate failure.

Note that this differs from some common APIs, including POSIX and some standard C library functions, which use zero to indicate success and -1 to indicate failure.

Rationale: When following the rules for comparing against zero, it is semantically clearer for a true value to indicate success and a false value to indicate failure.

Exception: If zero is within the range of legitimate results for the function, such as for a function which returns an array index, use -1 to indicate failure. But if -1 is also a legitimate result, use a return (pointer) parameter to store the result and return only the success/failure status as the function's return value (using zero to indicate failure).

Examples:

    if (!sound_play(/* ... */)) {
        display_error("Failed to play sound");
    }

    int do_something(void)
    {
        // ...
        return success ? 0 : -1;
    }

    /* Given the above function definition, a reader skimming the code who was
     * not familiar with the function might think the test below was inverted
     * and might even insert a "!", breaking the logic. */
    if (do_something()) {
        display_error("Failed to do something");
    }
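
The two cases described in the exception might look like the following sketch. Both functions are hypothetical: find_index() can legitimately return zero and so uses -1 for failure, while parse_int() can legitimately produce any int (including -1) and so stores its result through a pointer parameter and returns only the success status.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Zero is a valid index, so -1 indicates failure (value not found). */
int find_index(const int *array, int count, int value)
{
    for (int i = 0; i < count; i++) {
        if (array[i] == value) {
            return i;
        }
    }
    return -1;
}

/* Every int, including 0 and -1, is a valid result, so the result is
 * stored in *value_ret and the return value is only the success status
 * (nonzero on success, zero on failure). */
int parse_int(const char *s, int *value_ret)
{
    char *end;
    errno = 0;
    const long value = strtol(s, &end, 10);
    if (end == s || *end != '\0' || errno == ERANGE
     || value < INT_MIN || value > INT_MAX) {
        return 0;
    }
    *value_ret = (int)value;
    return 1;
}
```

A caller can then write "if (!parse_int(text, &value))" for the failure path, in keeping with the rule that a false value indicates failure.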

Side effects in conditional expressions

Assignment allowed when the intent is clear, but always use an explicit comparison.

Rationale: It can be convenient to use an assignment in the conditional expression of a control statement, to assign a value and test that value at the same time. However, this can also make it harder to follow the flow of the code, so assignments should only be used when that assignment is the primary purpose of the entire expression. Also, a lone assignment expression in a conditional statement can look like a mistyped equality comparison (and indeed, many compilers will emit a warning along those lines), so always enclose the assignment in parentheses and explicitly compare for inequality to zero.

Examples:

    if ((value = strtol(s, NULL, 10)) < 0) {
        // ...
    }

    if (!(buffer = mem_alloc(size, 0, 0))) {
        return ERROR_OUT_OF_MEMORY;
    }

    while ((s = strtok(NULL, " ")) != NULL) {  // Explicit comparison against zero.
        // ...
    }

    while ((ptr = get_next_object())) {  // Doubly-parenthesized expression is hard to read.
        // ...
    }

    if ((slash = strchr(s, '/')) < (dot = strchr(s, '.'))) {  // Pull the assignments out.
        // ...
    }

Bitwise operators

Don't use the bitwise operators (<< >> & | ~) to perform arithmetic.

Rationale: In the past, a common C idiom was to replace certain arithmetic operations with bitwise operations, such as replacing multiplication by a power of two with the equivalent left-shift operation, on the (generally correct) theory that bitwise operations execute more quickly than arithmetic ones. Modern compilers are perfectly capable of performing this optimization themselves, so there is almost never any need to resort to this hack. (This can also lead to subtle bugs due to precedence errors.)

In the specific case of signed division, using a right shift instead of an arithmetic division will actually change the behavior of the program, since right-shifting a negative value on a system using two's-complement integer representation will round the value away from zero, while division is defined to round the value toward zero. (Note that the C99 standard declares the result of right-shifting a negative value to be implementation-defined, so it is unwise to rely on that behavior even if it were desired.)
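
The difference can be seen in a small sketch with two hypothetical functions. Only the behavior of the division is guaranteed by C99; the result described for the shift of a negative value is what typical two's-complement compilers produce, not something the standard promises.

```c
/* Arithmetic division: C99 defines this to round toward zero, so
 * halve(-3) is -1. */
int halve(int x)
{
    return x / 2;
}

/* Right shift: for negative x the result is implementation-defined.  On
 * typical two's-complement systems, halve_by_shift(-3) is -2, not -1. */
int halve_by_shift(int x)
{
    return x >> 1;
}
```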

Exceptions: Bitwise shifts are allowed if you only have the shift count to work with, as in the "num_sectors" example below. Bitwise operators may also be used if they provide a significant performance benefit, such as in a tight loop where the compiler fails to optimize an arithmetic operation; but even in cases where the use of bitwise operators can make a difference, don't use them unless you've profiled the code and you're certain that the arithmetic operator is a significant bottleneck. Remember that premature optimization is the root of all evil.

In some cases, it may not be clear whether an operation is semantically arithmetic or logical. For example, computing the page number from a memory address could be interpreted as either arithmetically dividing by the page size or logically extracting the bits containing the page number. Use your best judgment in such cases, but don't stretch for a logical-operation interpretation just to make an excuse to use bitwise operators.

Examples:

    x /= 2;  // Not "x >>= 1"!
    y %= 8;  // Not "y &= 7"!

    /* Permissible if you don't have sector_size. */
    num_sectors = size >> log2_sector_size;

    /* Bitwise operators are fine for bitwise operations like field extraction. */
    green = (pixel >> 8) & 0xFF;

line_offset &= (display_stride - 1); // Don't do this unless you absolutely have to.

The const keyword

Use const freely when appropriate.

Rationale: const can help prevent errors resulting from accidentally assigning to the wrong variable; as a bonus, it generally helps the compiler optimize better. Use it whenever you initialize a variable that won't be changed, such as when saving the result of a function call.

const can be applied at all pointer levels of a pointer variable, but often one const is good enough. (Also note that some library functions expect const at some levels but not at others, and for multi-level pointers, the constness of each level has to match.)

Examples:

    const int texture = texture_create(...);

    /* "const char *str" would be fine too; that formulation would allow the
     * (pointer) value of str itself to be modified, but typically the greater
     * concern with string variables is that the content of the string is not
     * changed, which "const char *" ensures.  Note that most modern compilers
     * will emit a warning if you attempt to assign a string literal to a
     * (non-const) "char *" variable, and also place string literal data in a
     * read-only data section which will cause the program to abort if it
     * attempts to modify that data. */
    const char * const str = "foo";

    int is_number(const char *s)
    {
        const char *end;
        /* strtol() requires a non-const "char **" as the second parameter.
         * Rather than declaring the variable as "char *" even though we won't
         * write through it, we declare it as "const char *" and cast away the
         * const just for this call. */
        (void) strtol(s, (char **)&end, 0);
        return *s != '\0' && *end == '\0';
    }

Constants

Use #define or enum for file-scope scalar constants, const for local scalar constants, and static const for array or structured constants. Don't forget to const both levels of a string array.

Rationale: In C, file-scope constants are not folded, so defining such a constant in a header using static const would emit a copy of the constant in every object file including the header, needlessly wasting space. Scalar constants should therefore be defined using either #define (but be careful when using macros) or enum; the advantage of enum is that the symbol is included in debug information and can be referenced in a debugger, while the disadvantages are that the syntax is slightly more obscure and that floating-point values can't be used. Constants local to a function, on the other hand, can usually be compiled directly into the instruction stream as a register load, so there is no problem with just using const. (In this case, static is unnecessary and could potentially waste space in the object file.)

Examples:

    #define MAX_ENTRIES 100  // Or "enum {MAX_ENTRIES = 100};".
    enum {STATUS_GOOD = 1, STATUS_BAD, STATUS_UGLY};

    /* Note the double "const" here; the first "const" declares the string data
     * to be immutable, while the second "const" declares the array itself to
     * be immutable.  Both are required to make the data truly constant. */
    static const char * const usage_text[] = {
        "Usage: mytool [OPTION]...",
        "Options:",
        // ...
    };

    int myfunc(int x)
    {
        const int maxval = 30;
        static const int primes[] = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29};
        // ...
    }

The goto statement

Use the goto statement when it will avoid repetition of error-handling code. Never use goto for any other purpose.

Rationale: Although "considered harmful" by some—and indeed, injudicious use of goto can greatly impair code maintainability—the goto statement is useful in consolidating error-handling logic in a single location, and it should be used in preference to repeating the same cleanup code over and over.

Examples:

    Object *create_object(void)
    {
        Object *object;
        if (!(object = malloc(sizeof(*object)))) {
            goto error_return;
        }
        if (!init_object(object)) {
            goto error_free_object;
        }
        if (!register_object(object)) {
            goto error_deinit_object;
        }
        return object;

      error_deinit_object:
        deinit_object(object);
      error_free_object:
        free(object);
      error_return:
        return NULL;
    }

    Object *create_object(void)
    {
        Object *object;
        if (!(object = malloc(sizeof(*object)))) {
            return NULL;  // This is okay...
        }
        if (!init_object(object)) {
            free(object);
            return NULL;  // ... but now we're starting to repeat ourselves.
        }
        if (!register_object(object)) {
            deinit_object(object);
            /* Oops, forgot to add the free(object) call! */
            return NULL;
        }
        return object;
    }

Assertions

Assert early and often, but only for impossible conditions. Prefer static assertions to runtime assertions. Include fallback actions for runtime assertion failure where feasible, but avoid complex fallback actions.

SIL provides (in base.h) three macros which can be used for checking assertions: ASSERT(), PRECOND(), and STATIC_ASSERT(). At present, the first two macros are essentially identical, but PRECOND() is intended for checking function preconditions, and further use of the macro may be made for that purpose in the future, so prefer PRECOND() over ASSERT() when checking a function argument against a precondition; use ASSERT() for all other cases. STATIC_ASSERT() is for the specific case of assertions which can be evaluated at compile time, such as checking the size of a structure against an expected value.

Rationale (assert early and often): Programmers are only human, and errors will creep into any nontrivial code. Assertions provide a way to check for errors at compile time or runtime before those errors cause crashes, data corruption, or other serious problems.

Rationale (only impossible conditions): By writing an assertion, you are declaring (asserting) that the asserted expression must be true under every possible condition. If there is any possible condition under which the expression might be false, no matter how unlikely, do not use an assertion. For example, never assert that a memory allocation has succeeded, because there's always the possibility that the program will run out of memory (or address space) and the allocation will fail. In such cases, always implement and test proper error handling.

Exceptions (only impossible conditions): You do not need to consider hardware errors such as memory or register corruption when deciding whether a condition is possible; for example, if an interface function checks the value of an argument, its helper functions do not also need to make the same check. You may also assume that system calls and other external functions behave according to their documentation—you may still need to work around bugs in such functions, but you don't need to assume the presence of such bugs until they make themselves known. (Conversely, don't omit a check for a documented failure condition just because the current implementation of the function doesn't generate that condition; or if you do omit it because handling the failure would be complex and difficult to test, make sure to clearly document that fact.)

Rationale (prefer static assertions): Static assertions are checked during the build process and will cause compilation to fail if the assertion does not hold. Since these do not rely on executing a specific code path to test the assertion, you should always use them when possible.

Rationale (include fallback actions): The ASSERT() and PRECOND() macros accept an optional fallback action, which is a statement (or multiple statements) that will be executed if the assertion fails and the program is not running in debug mode. While there is a school of thought which argues that the program should always abort on assertion failure because the internal state has left the designed bounds and further behavior cannot be predicted—and indeed, hard failure can be more appropriate than graceful fallback in security-critical situations—it is often feasible to perform some sort of recovery short of terminating the entire program. For example, if a function which expects a valid file handle receives a null pointer, the function can simply return an error state as it would for an actual error with a valid file handle. This may result in the program terminating itself with an error message, but even that is more user-friendly than simply crashing. (STATIC_ASSERT() does not include a fallback action parameter because the assertion is checked at compile time, so no fallback is necessary. Instead, the macro's second argument is an error message which will be displayed if the compiler supports C11-style static assertions, and which serves as documentation of the assertion in any case.)

Rationale (avoid complex fallback actions): By their very nature, fallback actions for assertions cannot be tested like other code, since (assuming the program does not have any relevant bugs) the asserted condition will never fail, and if it did, the test (which runs in debug mode) would terminate anyway. For this reason, fallback actions should be extremely simple; often, a single return statement is sufficient. In cases where there is no simple way to recover from an assertion failure, prefer to omit the fallback action entirely, especially if there are no serious consequences from the failure.
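
To make the fallback mechanism concrete, here is a minimal sketch of how an assertion macro with a fallback action might be written. The names SKETCH_ASSERT, SKETCH_DEBUG, and checked_length are hypothetical and chosen only for illustration; this is not SIL's actual ASSERT() implementation, and it assumes a compiler with C99 variadic macros.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sketch, not SIL's actual ASSERT().  SKETCH_DEBUG stands in
 * for whatever symbol selects debug mode in a real build. */
#ifdef SKETCH_DEBUG
/* Debug mode: report the failed condition and abort. */
# define SKETCH_ASSERT(condition, ...)  do { \
    if (!(condition)) { \
        fprintf(stderr, "%s:%d: assertion failed: %s\n", \
                __FILE__, __LINE__, #condition); \
        abort(); \
    } \
} while (0)
#else
/* Non-debug mode: run the fallback action instead of aborting. */
# define SKETCH_ASSERT(condition, ...)  do { \
    if (!(condition)) { \
        __VA_ARGS__; \
    } \
} while (0)
#endif

/* Returns the length of the string, or -1 if passed a null pointer. */
static int checked_length(const char *string)
{
    SKETCH_ASSERT(string != NULL, return -1);
    int length = 0;
    while (string[length] != '\0') {
        length++;
    }
    return length;
}
```

Note that the fallback here is a single return statement, in keeping with the advice to keep fallback actions simple.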

Examples:

    Object *create_object(void)
    {
        Object *object = mem_alloc(sizeof(*object), 0, 0);
        if (!object) {
            return NULL;  // There would also be a test case for this failure.
        }
        init_object(object);
        ASSERT(object->state == NEW, return NULL);
        // Note that we don't even bother freeing the memory.
        return object;
    }

    int count_subobjects(Object *object)
    {
        PRECOND(object != NULL, return 0);  // This is a function precondition.
        return list_length(object->subobjects);
    }

    struct ObjectData {
        // ...
    };
    STATIC_ASSERT(sizeof(struct ObjectData) == 128,
                  "ObjectData does not match file record size");

Counterexample:

    Object *create_object(void)
    {
        Object *object = mem_alloc(sizeof(*object), 0, 0);
        ASSERT(object != NULL);  // This could legitimately fail.
        init_object(object);
        ASSERT(object->state == NEW,
               // Complex and difficult to test.
               list_append(broken_objects, object);
               return NULL);
        return object;
    }

Macros

Use judiciously and with care.

Macros are permitted when they serve a purpose which is difficult or impossible to accomplish otherwise. However, be especially careful of unintended side effects when writing a macro (see the examples below).

Note that many function-like uses of macros—specifically, those which do not include control statements like return that escape the scope of the macro and whose parameters take specific types—can be replaced with static inline functions at no cost to performance. Doing so both avoids the potential problems of macros and allows the compiler to perform its usual type-checking.

Rationale: Preprocessor macros are a powerful metaprogramming tool, but that power can easily hurt readability. Since macros are expanded before the source code is parsed, it's easy to write a macro that has unintended consequences, and it can be difficult to figure out exactly what those consequences were.

Examples:

    /* Use parentheses around negative numeric values and expressions. */
    #define INVALID_VALUE (-1)
    #define DEGREES_TO_RADIANS (PI / 180.0)

    /* Use parentheses around macro parameters, since otherwise they can be
     * expanded into something completely different from what you intended. */
    #define DOUBLE(x) ((x) * 2)

    /* When possible, use an inline function (or several) instead of defining
     * a macro. */
    static inline float sqrf(float x) {return x*x;}
    static inline double sqr(double x) {return x*x;}

    /* Bracket multiple statements or control structures with do {...} while (0)
     * so they do not break a surrounding if/for/while. */
    #define ABORT_IF_NEGATIVE(x) do { \
        if ((x) < 0) {return -1;} \
    } while (0)

    /* Macros can be useful to report the source code location of an error (the
     * DLOG() macro provided by SIL does this as well). */
    #define LOG_ERROR(str) log_error("%s:%d: %s", __FILE__, __LINE__, (str))

Counterexamples:

    /* This definition will cause BAD_DOUBLE(x<y ? x : y) to only double y,
     * not x. */
    #define BAD_DOUBLE(x) (x * 2)

    /* This definition would give an unexpected result if used as an operand
     * in a multiplication expression, for example. */
    #define BAD_INCREMENT(x) (x) + 1

    /* While this example is properly parenthesized, the two uses of "x" will
     * cause any side effects in the actual parameter to be evaluated twice.
     * Inline functions are a better choice here. */
    #define SQR(x) ((x) * (x))

    /* This definition will cause the structure of the function to change when
     * BAD_ABORT_IF_NEGATIVE(x) is expanded. (This case is also an example of
     * why omitting braces on if statements is a bad idea.) */
    #define BAD_ABORT_IF_NEGATIVE(x) if ((x) < 0) {return -1;}
    float myfunc(float x)
    {
        if (x <= 0)
            BAD_ABORT_IF_NEGATIVE(x);
        else
            x += sqrtf(x);
        return x;
    }
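
The double-evaluation problem with a macro like SQR() can be observed directly. The following sketch counts how many times a side-effecting function is called when it is passed to the macro and to an inline function. The helper names (next_value, sqr_int, calls_made_by_macro, calls_made_by_inline) are hypothetical and exist only for this illustration.

```c
/* Properly parenthesized, but evaluates its argument twice. */
#define SQR(x) ((x) * (x))

/* Inline alternative: the argument is evaluated exactly once. */
static inline int sqr_int(int x) {return x*x;}

/* Hypothetical helper with a side effect: counts how often it is called. */
static int call_count = 0;
static int next_value(void)
{
    call_count++;
    return 3;
}

/* Returns the number of next_value() calls made by the macro version. */
static int calls_made_by_macro(void)
{
    call_count = 0;
    (void)SQR(next_value());
    return call_count;
}

/* Returns the number of next_value() calls made by the inline version. */
static int calls_made_by_inline(void)
{
    call_count = 0;
    (void)sqr_int(next_value());
    return call_count;
}
```

With the macro, next_value() runs twice per use; with the inline function, it runs once.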

#include directives

Order headers alphabetically, using full pathnames for each header. Include system headers after SIL headers.

When including SIL headers in a source file, order the headers componentwise alphabetically by full pathname, excluding the .h filename extension. As a corollary, each header should declare all external types it references, except for those defined in src/base.h (which will always be included first).

If you need to include any system headers, list them after all SIL headers. It may be useful to further subdivide these into standard system headers and headers for specific system libraries.

Rationale (full pathnames): Using the full pathname of a header tells the reader immediately where the header is located; a relative pathname would force the reader to check the location of the source file and manually resolve the relative path. Additionally, if a system header happens to have the same name as a header you create, the compiler may include the system header instead of yours if you give only the filename in the #include directive.

Rationale (SIL headers first): Including all SIL headers before any system or other external headers ensures that SIL headers do not attempt to make use of types or other declarations from those external headers. For example, including <stdio.h> at the top of a source file would mask compiler errors from uses of the FILE type in any subsequently included SIL headers.

Exception: Use relative pathnames instead of full pathnames for nested includes in public headers, to avoid requiring particular compiler flags for client code.

Example:

    #include "src/base.h"
    #include "src/foo.h"
    #include "src/foo/quux.h"  // "foo/..." comes before "foo-".
    #include "src/foo-bar.h"

    #include <errno.h>
    #include <stdio.h>

    #include <FooLibrary/FooBase.h>
    #include <FooLibrary/FooExtras.h>

Nested #includes

When declaring external types, use a forward declaration instead of a nested #include when possible.

Rationale: Including a header file inside another header file just to get the declaration of a structured type forces all users of the header to pay the cost of loading the nested header. Instead, when possible, use forward declarations of struct and union types. (This generally means you'll need to use "struct type" or "union type" instead of just the type name in function declarations.) Since C++ doesn't allow forward declarations of enums, headers which reference enum types and which may be included from C++ code (for SIL, this means all public headers) will have to use nested includes for such types.
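
As a rough illustration of the forward-declaration approach, the sketch below shows a header that declares a function taking a pointer to a structure without including the header that defines the structure. The file names, the Texture structure, and texture_area() are all hypothetical; both "files" are shown in one listing for brevity.

```c
/* --- texture_info.h (hypothetical public header) --- */

/* Forward declaration: there is no need to include the header which
 * defines struct Texture just to declare a function taking a pointer
 * to it. */
struct Texture;

/* Returns the area of the texture in pixels. */
int texture_area(const struct Texture *texture);

/* --- texture.c (hypothetical source file) --- */

/* The full definition is only needed where members are accessed. */
struct Texture {
    int width;
    int height;
};

int texture_area(const struct Texture *texture)
{
    return texture->width * texture->height;
}
```

Client code which only passes the pointer around never needs to see the structure definition.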

Structure tags and typedefs

When defining a struct, union, or enum type, include a typedef which defines the tag name as a type.

Typically, the typedef should precede the definition of the structured type itself, so the type name can be used within the definition (such as when defining a "next" pointer for a list). However, C++ does not allow referencing an enum before it has been defined, so in that case, the typedef must follow the enum (or the enum must be defined within the typedef statement).

Rationale: In C++, all tags for structured types (including class, struct, union, and enum) are automatically defined as type names, but in C, an explicit typedef is required for each type. Since C++ will not complain about such typedef statements, they should be included for all structured types visible to C code.
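
A short sketch of both typedef placements follows. The type and function names (ListNode, NodeColor, list_node_count) are hypothetical and used only for illustration.

```c
#include <stddef.h>

/* Typedef first, so the type name can be used inside the definition. */
typedef struct ListNode ListNode;
struct ListNode {
    int value;
    ListNode *next;
};

/* For enums, the typedef follows the definition, since C++ does not allow
 * referencing an enum before it has been defined. */
enum NodeColor {
    NODE_COLOR_RED = 1,
    NODE_COLOR_BLACK = 2,
};
typedef enum NodeColor NodeColor;

/* Returns the number of nodes in the list starting at "head". */
static int list_node_count(const ListNode *head)
{
    int count = 0;
    for (const ListNode *node = head; node != NULL; node = node->next) {
        count++;
    }
    return count;
}
```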

Branch hints

Use UNLIKELY() to indicate tests for error conditions. Otherwise, use branch hints sparingly and only after analyzing performance.

Rationale: As noted in the GCC documentation for the __builtin_expect() intrinsic used to implement branch hints, "programmers are notoriously bad at predicting how their programs actually perform", and what seems like an "obvious" optimization may in fact hurt performance due to things like unexpected calling patterns or CPU idiosyncrasies. However, when checking for errors from system or low-level functions (such as when allocating memory) or verifying function parameters, marking a comparison with UNLIKELY() serves as documentation that the comparison is testing for an exceptional condition without requiring the reader to understand the details of the comparison, while also providing slightly better performance on the non-failing code path.
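
The following sketch shows the recommended use of a branch hint on error checks. SKETCH_UNLIKELY is a hypothetical stand-in defined here only so the example is self-contained (SIL provides its own UNLIKELY() macro, whose exact definition is not reproduced here), and copy_string() is likewise an illustrative name.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical definition for illustration.  __builtin_expect() is a
 * GCC/Clang intrinsic, so fall back to the bare expression elsewhere. */
#if defined(__GNUC__)
# define SKETCH_UNLIKELY(expr)  __builtin_expect(!!(expr), 0)
#else
# define SKETCH_UNLIKELY(expr)  (expr)
#endif

/* Returns a newly allocated copy of "string", or NULL on error. */
static char *copy_string(const char *string)
{
    if (SKETCH_UNLIKELY(!string)) {
        return NULL;
    }
    const size_t size = strlen(string) + 1;
    char *copy = malloc(size);
    if (SKETCH_UNLIKELY(!copy)) {
        return NULL;
    }
    memcpy(copy, string, size);
    return copy;
}
```

Both hints mark exceptional conditions (an invalid parameter and an allocation failure), which is the case this rule singles out; no hint is applied to ordinary program logic.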

Character encoding

All source files are encoded in UTF-8, but avoid non-ASCII characters when possible.

Rationale: UTF-8 is currently the de facto standard for text encoding, and full Unicode (including L'...' character values) is supported by at least GCC and recent versions of Clang. However, support is by no means universal, so try to avoid characters outside the ASCII range in non-comment source code, and test extensively if you do use them.

Line length

Lines should not be longer than 79 columns.

Rationale: 80 columns has proven to be a good balance between avoiding unnecessary wrapping and keeping the text narrow enough to scan easily (that is, without forcing the eyes to move back and forth on each line). 80 columns is also a fairly standard width for terminal programs and editors. However, some such programs have trouble with lines that are exactly 80 columns long (for example, Emacs will wrap the 80th character to the next line when using an 80-column display); for this reason, lines should be kept to 79 characters when possible.

Exceptions:

If the 80th character on a line is trailing punctuation, such as a semicolon or opening brace, it's okay to leave it on that line (but it's still better to wrap if that's reasonable).
For cases where there is a strong correlation between physical lines and their contents, such as a sequence of strings containing lines of text which will be written consecutively to the screen, prefer keeping the correlation even if it results in lines longer than 80 columns.

Indentation

The basic indentation unit is four columns. Outdent half a unit (two columns) for labels, including case labels in a switch block. Indent one unit or to the opening parenthesis for continued lines, except when aligning related subexpressions.

The SIL source tree includes an Emacs directory-local variable list (.dir-locals.el) which causes the Emacs editor to use the proper indentation settings.

Rationale: Four columns is enough to clearly indicate the nesting depth at a glance, without being so wide that it pushes reasonably nested code off the edge of the screen. (Corollary: If code is indented so much that the line length limit becomes a problem, the nesting level is too deep.) Four columns is also divisible by two to provide an intermediate indentation for labels.

Examples:

    int func(int x)
    {
        if (x < 0) {
            goto error;
        }
        return x * (x+1);

      error:
        return -1;
    }

    float my_function_with_a_long_name(int my_first_parameter,
                                       float my_second_parameter)
    {
        if (my_first_parameter > 0) {
            return my_function_with_a_long_name(
                my_first_parameter - 1, sqrtf(my_second_parameter));
        } else {
            return my_second_parameter;
        }
    }

Tabs vs. spaces

Always use spaces, never tabs.

Rationale: There is little consensus between editor programs on the width of a tab stop; thus, to properly read code indented with tabs, the reader of the code must make a special effort to configure their software properly. It's far preferable for the (single or few) writers of source code to make the effort to use spaces rather than force the (many) readers to change their editor settings for each program's source code they view.

One statement per line

Only one statement or label is allowed on a line.

Rationale: It can be easy to overlook extra statements on the same line, especially when they are infrequent.

Exception: If all cases in a switch will fit on one line each and contain no more than one statement (excluding break, return, or goto), the statements may be moved to the same lines as their respective case labels. In this case, do not outdent the case labels.

Examples:

    x /= 2;  // Good.
    y /= 2;

    switch (value) {
        case 10: return 2;  // Note indentation.
        case 20: return 5;
        default: return 0;
    }

    int myfunc(int input)
    {
        int result;
        switch (input) {
          case -1:
            /* This could potentially all fit on one line, but that would violate
             * the one-statement-per-line rule since there are two non-label,
             * non-"break" statements for this case. */
            errors++;
            result = -1;
            break;
          default:
            result = input+1;
            break;
        }
        return result;
    }

Counterexamples:

    x /= 2; y /= 2;  // Bad.

    int myfunc(int input)
    {
        // ...
        return 1;
      error: return 0;  // Put the return on a separate line.
    }

Related subexpressions

Line up related subexpressions when feasible.

Rationale: If an expression consists of several related subexpressions, it can be easier to read when those subexpressions are lined up on separate lines, so the reader can skim down and easily spot the differences. In some cases, it can be useful to insert null operations (such as a shift by zero) for parallelism.

Examples:

    pixel = (red   <<  0)
          | (green <<  8)
          | (blue  << 16)
          | (alpha << 24);

    if (x >= 0 && x < width
     && y >= 0 && y < height
     && z >= 0 && z < depth) {
        // ...
    }

Whitespace around operators

Put one space around binary operators except "->" and ".", and no spaces between a unary operator and its operand.

Rationale: Whitespace improves readability when used in moderation. Omitting whitespace around member reference operators and unary operators emphasizes their tighter binding.

Exceptions:

In complex expressions with multiple operators, it's okay to omit whitespace in inner terms as long as the overall expression remains readable. In some cases, it may be more readable to break the expression across multiple lines or use extra variables to hold intermediate values.
It's also permissible to omit whitespace in extremely simple expressions, such as "x+1", but prefer to include the whitespace if doing so doesn't detract from readability.

Examples:

    i = j * 2;  // Or "i = j*2".
    object->refcount++;
    result += i*60 + (j+59)/60;

    /* Sometimes a unary minus can be hard to see next to a long variable name.
     * In that case, add parentheses around the operand rather than inserting
     * a space. */
    value = -(long_named_variable);

Counterexamples:

    x=1;  // Never omit spaces around an assignment operator.
    value = - long_named_variable;  // Never insert a space after a unary operator.

Whitespace in function and macro calls

Add spaces after commas; don't add spaces around parentheses.

Exception: It's okay to omit spaces after commas in nested function calls, as long as doing so doesn't hurt readability.

Examples:

    function(param1, param2);
    MACRO(param1, strchr(string,'/'), param3);

Parentheses in expressions

Use enough parentheses to make the expression easy to read at a glance. Always use parentheses to separate operators with confusing precedence.

Rationale: Some combinations of operators are particularly susceptible to precedence errors:

The logical operator && has a higher precedence than ||, but especially since the operands are often long subexpressions, it can be easy to lose track of precedence without parentheses.
The bitwise operators (& | ^) have a lower precedence than the relational operators (== != < <= > >=). For example, expressions which extract and test a bitfield within a value need parentheses around the extraction subexpression.
Similarly, the bit shift operators have a lower precedence than the arithmetic operators.

The compiler will generally emit a warning if parentheses are missing in any of the cases listed above.

Examples:

    if ((x < 0 || x > width) && !allow_out_of_bounds) {
        // ...
    }

    if ((pixel & 0x001F) == 0x1F) {
        red_is_maximum = 1;
    }

    component_sum = (pixel & 0x001F)
                  + ((pixel & 0x03E0) >> 5)
                  + ((pixel & 0x7C00) >> 10);
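
To see why the parentheses around a bitfield extraction matter, the sketch below compares the parenthesized test with what the compiler actually evaluates when the parentheses are omitted. The function names are hypothetical. The unparenthesized form is written out with its implied grouping made explicit, so that the example itself does not trigger the compiler warning mentioned above.

```c
/* What the compiler sees for "pixel & 0x001F == 0x1F": since == binds
 * more tightly than &, the comparison is evaluated first (yielding 1),
 * leaving "pixel & 1". */
static int red_is_maximum_as_parsed(unsigned int pixel)
{
    return pixel & (0x001F == 0x1F);
}

/* With parentheses, the bitfield is extracted before the comparison. */
static int red_is_maximum(unsigned int pixel)
{
    return (pixel & 0x001F) == 0x1F;
}
```

For a pixel value of 0x0001, the correctly parenthesized test is false, while the unparenthesized form would report true because it only checks the lowest bit.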

Parentheses with defined, sizeof, and return

Treat sizeof and defined like functions; don't use unnecessary parentheses around return values.

Rationale: While C does not require parentheses around the arguments to defined or (when the argument is a variable) sizeof, those keywords act like functions in that they return values*, so uses of those keywords should be styled like function calls. return, on the other hand, does not behave like a function (it doesn't generate a value, and you couldn't put it on the right side of an assignment operator), so it shouldn't be styled like one.
* Technically, defined doesn't "return a value" since it's not recognized by the compiler at all, but the preprocessor translates it into a boolean value, so it's the same sort of beast.

Examples:

    #if defined(DEBUG) && defined(__GNUC__)
    /* GCC magic goes here. */
    #endif

    int data_size(DataStruct *data, int num_data)
    {
        return sizeof(*data) * num_data;
    }

    int is_transparent_pixel(uint16_t pixel)
    {
        /* Parentheses around a return value are acceptable when they help
         * readability. Here, the outer parentheses help ensure that a reader
         * skimming the code does not miss the "== 0" at the end of the
         * expression. Note the space before the opening parenthesis. */
        return ((pixel & 0x8000) == 0);
    }

Counterexample:

    int bad_func(struct foo *ptr, int count)
    {
        int size = count * sizeof *ptr;  // Looks like "count times sizeof times ptr".
        size += count * sizeof int;  // This doesn't even compile.
        return(size);  // return is not a function!
    }

Conditional and loop statements

Always use braces; insert a space before the opening parenthesis; add spaces after semicolons in a for statement; put the opening brace on the same line as the closing parenthesis.

Rationale (mandatory braces): Failing to use braces with control statements can easily lead to bugs, such as when attempting to add a second statement to an if without a block.

Exception (opening brace): The opening brace can be moved to the next line if it doesn't fit on the same line, or to avoid confusion between a continued line of the control statement and the first line of the nested block when the two lines have similar indentation (see the second for example below).

If a block is long, it can be useful to annotate the closing brace with the control statement that began the block (see the while example below).

Examples:

    if (flag1) {
        // ...
    } else if (flag2) {  // Treat "else if" as a single keyword.
        // ...
    } else {
        // ...
    }

    while (alive()) {
        // ...
    }  // while (alive())

    for (i = 0; i < count; i++) {
        // ...
    }

    /* Here, the opening brace is moved to its own line to avoid confusion between
     * the continuation of the "for" statement and the first line of the block. */
    for (i = 0, x = 1, y = 1;
         i < count;
         i++, x *= 2, y *= 2)
    {
        // ...
    }

switch statements

Braces around case blocks are optional. Use the FALLTHROUGH macro (from <SIL/base.h>) if a case falls through to the next case.

Rationale (optional braces): Unlike other control statements, the use of braces in switch statements has no effect on control flow. In general, use braces when you need to define variables local to that case.

Rationale (use of FALLTHROUGH): It can be hard to tell at a glance whether a missing break statement is intentional or not. Documentation helps reassure the reader of the intended behavior, suppresses "code falls through" warnings from modern compilers, and also avoids the risk that someone (maybe even you) will accidentally insert a break during a code cleanup session.

Examples:

    switch (value) {
      case 1:
        // ...
        FALLTHROUGH;  // Document that the missing "break" is intentional.
      case 2:
        // ...
        break;
      case 3: {
        int local_var;  // Local to this case only.
        // ...
        break;
      }  // case 3
      default:
        // ...
        break;
    }  // switch (value)
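
For readers curious how a fall-through marker can both document intent and suppress compiler warnings, here is a minimal sketch. SKETCH_FALLTHROUGH and steps_from() are hypothetical names; the real FALLTHROUGH macro comes from <SIL/base.h> and its definition is not reproduced here. The sketch assumes the fallthrough statement attribute available in GCC 7 and later, and expands to a no-op elsewhere.

```c
/* Hypothetical definition for illustration only. */
#if defined(__GNUC__) && __GNUC__ >= 7
# define SKETCH_FALLTHROUGH  __attribute__((fallthrough))
#else
# define SKETCH_FALLTHROUGH  ((void)0)
#endif

/* Returns the number of steps run when starting from the given step
 * (1 through 3), or 0 if the starting step is invalid. */
static int steps_from(int start)
{
    int steps = 0;
    switch (start) {
      case 1:
        steps++;
        SKETCH_FALLTHROUGH;  // Intentionally continue into case 2.
      case 2:
        steps++;
        SKETCH_FALLTHROUGH;  // Intentionally continue into case 3.
      case 3:
        steps++;
        break;
      default:
        break;
    }
    return steps;
}
```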

Function pointers

Explicitly dereference function pointers when calling through them.

Rationale: Using an explicit dereference operation makes it clear to the reader that the thing being called is a function pointer and not an actual function.

Exception: Function pointers accessed through a structure do not need to be explicitly dereferenced if they are used like C++ instance methods.

Examples:

    (*function_ptr)(args);
    object->method(object, args);  // Where "method" is a function pointer.

Counterexample:

    function_ptr(args);

Function declarations and definitions

Use an explicit void with functions that take no parameters. The opening brace for a function body goes on the following line.

Rationale (explicit void): In C (as opposed to C++), an empty parameter list means that the function's parameters are unspecified. This prevents the compiler from checking the number and types of parameters at call sites, so functions which take no parameters should have an explicit void to indicate that fact to the compiler.

Rationale (opening brace on following line): Putting the brace on its own line gives an additional visual indication that the brace starts a new function.

Exception: If the function is both very short (1-2 lines) and defined with static linkage, it is acceptable to put the definition's opening brace on the same line as the function declaration. If the function body fits on the same line as the declaration, the entire function may be written on one line.

Examples:

    void end_program(void)  // Not "void end_program()".
    {
        // ...
    }

    /* Concise format for a short function. */
    static inline float sqrf(float x) {return x*x;}

Order of function definitions

Within each source file, order function definitions either top-down (interface functions first, followed by local functions) or bottom-up (local functions first, interface functions last). In the top-down format, declare local functions in the same order they are defined.

Rationale: Using a consistent order for function definitions helps readers follow the program structure.

Exception: Short local functions which are only used in one place, such as a callback function whose pointer is passed to an external API, may be defined immediately before the function that uses them.

SIL code generally prefers the top-down style, but since C requires local (static) functions to be declared before they are used, each local function must be declared and defined at separate locations in the same file, which can get somewhat repetitive. Test sources in particular tend to put "helper" functions at the top, followed by the actual test cases (the "interface" functions by this rule), thus obviating the need for separate declarations.
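
A small sketch of the top-down layout follows. All function names are hypothetical. The interface function bounded_volume() would normally be declared, with its function header, in a separate header file, which is omitted here to keep the listing self-contained.

```c
/* Local function declarations, in the same order as the definitions. */

/* Returns "value" bounded to the range low through high (inclusive). */
static int clamp_to_range(int value, int low, int high);

/* Returns whether "value" lies within low through high (inclusive). */
static int is_in_range(int value, int low, int high);

/*------------------------ Interface functions ------------------------*/

int bounded_volume(int volume)
{
    return clamp_to_range(volume, 0, 100);
}

/*-------------------------- Local functions --------------------------*/

static int clamp_to_range(int value, int low, int high)
{
    if (is_in_range(value, low, high)) {
        return value;
    }
    return value < low ? low : high;
}

static int is_in_range(int value, int low, int high)
{
    return value >= low && value <= high;
}
```

In the bottom-up layout, the two static functions would instead be defined first (is_in_range(), then clamp_to_range()), and the separate declarations at the top could be dropped.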

Comments

Use // for inline comments, separated from the code by at least two spaces; prefer /*...*/ for block comments. Mark known problems or shortcomings with "FIXME".

When using inline comments on multiple consecutive or nearby lines, align the starting columns of the comments as long as doing so doesn't insert an inordinate amount of space between the code and the comment.

Examples:

    /* Short explanation of the upcoming code. */
    if (need_check) {
        // FIXME: See if we can get away with just 0-9.
        int start = 0;  // Description of variable.
        int end = 99;   // Description of variable (aligned with above comment).
        // ...
    }

Counterexample:

    MyLongStructureTypeName *structure;  // Description of variable.
    int result;                          // Too much space here.
    int i;                               // Wait, which line does this go with?

Function headers

Insert a block comment of the appropriate format above each function declaration. Do not copy the comment to the function definition.

Rationale (standard header format): Including a header of a fixed format allows readers to quickly determine the purpose of and calling pattern for a function without having to read the function's code.

Rationale (include at declaration): Keeping the header with the declaration allows readers to easily browse the list of declared functions, such as in a header file, without having to scan over the implementing code for each function.

Rationale (do not include at definition): Experience has shown that keeping two copies of the function header will quickly lead to documentation desyncs as function signatures change. This naturally does not apply to static functions which do not have a separate declaration (see order of function definitions above).

Exception: The full header may be omitted for short local functions which have no separate declaration and are defined close to their place of use, provided that a short comment describing the purpose of the function is included instead.

See the source code for details of the function header format. Note that the SIL function header format deliberately does not use markup tokens for tools like Doxygen; the headers are intended to be easily perusable by someone looking directly at the source code, without requiring separate tools to interpret the comments.

General naming rules

Names should be descriptive. Don't abbreviate unnecessarily, but avoid excessive verbosity.

In general, an identifier's name should immediately tell the reader the purpose of the identifier, but it should be concise enough that its length does not obscure the structure of the code. For example, LIMIT would be a poor name for a global constant; the name tells us nothing about what sort of limit it is. But the same LIMIT might make perfect sense in a short function whose sole purpose was to bound its parameter to be less than a certain value, and indeed a longer name would serve no purpose except to clutter up the code.

Similarly, a file's name should make the file's purpose clear to someone looking at a directory listing, but should not be so long as to clutter log messages (which include the name of the source file which generated the message). In the case of filenames, it's acceptable to include the directory path when determining whether a filename is "clear"; for example, resource/core.c clearly refers to core functionality for resource management, and does not need to be expanded to resource/resource-core.c.

Avoid overusing abbreviations, since they can reduce readability by forcing the reader to stop and mentally expand the abbreviation each time the identifier is used. For example, in a function that uses a variable to hold a count of objects, nobj would be a poor name for the variable since its meaning is not immediately obvious to a reader unfamiliar with the code. num_obj would be better, but unless the variable is heavily used throughout the function, num_objects is more friendly to the reader. However, number_of_objects would be unnecessarily verbose, since a num_ prefix is generally understood to mean "number of".

Single-letter and similarly short variable names should be avoided except in cases where their meaning is obvious and generally accepted. For example, i is widely accepted as an iterator variable and may be used in that context, but it should not be used for a temporary variable, even in a limited scope. Similarly, short names for types or functions are acceptable when they are clearly derived from similar names in the standard libraries.

Examples:

    #define FRAME_RATE_MIN 20  // Clear and concise.
    #define PI 3.1415926535897932  // Short but well-known; fine even at global scope.

    static inline double sqr(double x) {  // Follows the pattern of sqrt(), so okay.
        return x*x;
    }

    void collide_objects_with_player(void)  // Clear and reasonably concise.
    {
        const int player_x = player->position.x;  // Not just "x" or "y".
        const int player_y = player->position.y;
        for (int i = 0; i < num_objects; i++) {  // "i" is well-known as an iterator.
            const int object_x = objects[i].position.x;
            const int object_y = objects[i].position.y;
            if (player_x == object_x && player_y == object_y) {
                // ...
            }
        }
    }

Counterexamples:

    #define FRATE 20  // Not obvious what it's used for.
    #define MINIMUM_NUMBER_OF_FRAMES_PER_SECOND 20  // Way too verbose.

    void doobjs(void)  // Not obvious what the function does.
    {
        /* If these were the only coordinates used in the function, "x" and "y"
         * might be okay, but here they reduce readability because the function
         * uses two sets of coordinates. */
        const int x = player->position.x;
        const int y = player->position.y;
        /* "o" is not recognized as an iterator, so unlike "i" it's unacceptable. */
        for (int o = 0; o < num_objects; o++) {
            /* Again, it's unclear from the names what these coordinates are. */
            const int x2 = objects[o].position.x;
            const int y2 = objects[o].position.y;
            if (x == x2 && y == y2) {
                // ...
            }
            // ...
        }
    }

Function naming rules

Name functions with a part of speech appropriate to the function's behavior. Use prefixes or (for C++ code) namespaces to indicate the conceptual hierarchy of a function.

Functions can be broken down into three major groups:

Functions which perform an action: for example, resource allocation or graphics rendering. These should be named using verbs or verb phrases.
Functions which return data: for example, property accessors. These should be named using nouns or noun phrases. If the noun or noun phrase alone is ambiguous, it may be prefixed with "get_".
Functions which evaluate data: for example, comparison functions. These should be named using adjectives or adjectival phrases.

If a function fits in two or more groups, name it based on the one which best represents the function's overall behavior—though that may also be a sign that the function is too complex and should be refactored into multiple smaller functions (also see below regarding function complexity).

Rationale (part of speech): Using an appropriate part of speech gives the reader a useful hint as to the function's behavior without forcing the reader to check the function's documentation. For example, an accessor function named using a noun phrase reassures the reader that the function does not modify an object passed to it.

Rationale (prefixes): Prefixes or namespaces provide the reader with information about the type of data processed or operation performed by the function, again reducing the need for the reader to consult the function's documentation.

Examples:

    /* A graphics-related function which reads pixel data from the display. */
    void graphics_read_pixels(int x, int y, int w, int h, void *buffer);

    /* A function which returns the width and height of a texture. Note that
     * this is treated as an accessor function even though it is not, strictly
     * speaking, a pure function (since it writes through the return pointers
     * passed as the second and third arguments); conceptually, it is a pure
     * function which returns two values, and it writes through pointers only
     * because the C language does not allow returning multiple values from a
     * function. */
    void texture_size(int texture, int *width_ret, int *height_ret);

    /* A function which checks and returns whether a work queue is busy. */
    int workqueue_is_busy(int workqueue);

Counterexample:

    /* Even as a local function name, the lack of a prefix makes it unclear at
     * call sites what type of data the function operates on. Also, the use of
     * the verb "read" might make readers wonder whether the function performs
     * any expensive processing to return its value, as opposed to simply
     * retrieving a property value. */
    static int read_width(int texture);

Filename formatting rules

Use accepted filename extensions. Give each file in a directory a unique name exclusive of the extension. Avoid non-alphanumeric characters other than hyphen (-) and underscore (_).

Use the filename extensions listed below for each source file type:

C source: .c
C++ source: .cc
Objective-C source: .m
Objective-C++ source: .mm
Assembly: .s (lowercase s)
Assembly with C preprocessor directives: .S (capital S)

Rationale (filename extensions): While not strictly required on modern operating systems, the filename extension is an accepted way to inform the user of the type of content in the file. Some programs (including compilers and IDEs) also use the file extension to guess the file's content type, and using a nonstandard extension would confuse such programs to the detriment of the user.

Rationale (unique names): All source files are compiled to object files with the same filename extension (typically .o). If two source files in the same directory have the same name but a different extension, their object files would collide, breaking the build. If it is necessary to have two source files in different languages with the same purpose (for example, when implementing a C++ interface to C functions), use the base filename for the source file with the most nontrivial code, and rename other files to avoid object file collision. For example: utilities.c, utilities-cxx.cc, utilities-objc.m

Rationale (non-alphanumeric characters): Non-alphanumeric characters may have special meanings to some systems, preventing files whose names contain those characters from being used properly (or at all!) on such systems. For example, quote characters are used on many systems to enclose filenames containing spaces; conversely, spaces are used on most systems to separate command arguments, and including a space in a filename can cause builds to break in unexpected ways. The only symbols accepted as safe across all systems are the hyphen and underscore. Non-ASCII characters should also be avoided because some users' systems may not be able to display them properly.

Note that due to limitations of the POSIX-style library archive (.a) format, code intended to be compiled into a static library on POSIX systems must also ensure that no two files which will be included in the library have the same name exclusive of directory name; thus, dir1/file.o and dir2/file.o cannot be included in the same static library. SIL is not designed to be compiled into a static library and thus does not follow this rule.

Identifier formatting rules

Use MixedCase for type names, UPPER_CASE_WITH_UNDERSCORES for constants and preprocessor macros, and lower_case_with_underscores for functions, variables, and structure members.

Examples:

#define PI 3.1415926535897932

enum OperationStatus {
    STATUS_OK = 1,
    STATUS_FAILED = 2,
};
typedef enum OperationStatus OperationStatus;  // See note on typedefs above.

typedef struct OperationRecord OperationRecord;
struct OperationRecord {
    OperationStatus status;
    void *private;
};

OperationStatus run_operation(OperationRecord *operation)
{
    const float timeout = 0.1;  // "TIMEOUT" would also be acceptable.
    operation->status = internal_run_operation(operation->private, timeout);
    if (operation->status != STATUS_OK) {
        static int did_warn = 0;
        if (!did_warn) {
            DLOG("Operation failed!");
            did_warn = 1;
        }
    }
    return operation->status;
}

Use of C++ keywords as C identifiers

C++ keywords may be used as identifier names in C code as long as they are appropriate, but take care when such identifiers appear in header files.

C++ reserves a number of keywords which can also be reasonably used as identifier names; for example, try could be a counter for an operation which may need to be retried several times, and this can be used as an object pointer when implementing instance-method-like functions in C. As long as the names are appropriate for the uses to which they are put, they may be freely used in C code.
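As a minimal sketch of the two uses just described, the following C code uses this as an object pointer and try as a retry counter. The Connection type and both function names are hypothetical, invented only for illustration; the code is valid C99 but would not compile as C++.

```c
#include <stdbool.h>

/* Hypothetical example type: a connection which may need several
 * attempts before it opens successfully. */
typedef struct Connection Connection;
struct Connection {
    int attempts_needed;  // Number of attempts required for success.
    int attempts_made;    // Number of attempts made so far.
    bool is_open;
};

/* "this" names the object being operated on, much like the implicit
 * object pointer in a C++ method. */
static bool connection_try_open(Connection *this)
{
    this->attempts_made++;
    this->is_open = (this->attempts_made >= this->attempts_needed);
    return this->is_open;
}

/* "try" counts attempts. Returns the number of attempts used, or zero
 * if the connection did not open within max_tries attempts. */
int connection_open(Connection *this, int max_tries)
{
    for (int try = 1; try <= max_tries; try++) {
        if (connection_try_open(this)) {
            return try;
        }
    }
    return 0;
}
```

Both identifiers read naturally at their points of use, which is the sense in which the names are "appropriate" here.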

However, care is needed when such identifiers appear in header files, such as when used as structure field names. In this case, renaming the identifier is usually best, but if the identifier does not need to be referenced by C++ code (for example, if it is a parameter name in a function declaration), it is also permissible to bracket the header with a #define/#undef pair:

#ifdef __cplusplus
# define private private_  // Avoid errors when included from C++ source.
#endif

// ...

#ifdef __cplusplus
# undef private  // Restore the normal meaning of "private" for subsequent C++ code.
#endif
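The renaming alternative, which the text above describes as usually the better choice, might look like the following sketch. The field name private_data and the accessor operation_record_private() are hypothetical names chosen for illustration, and the structure is a simplified stand-in rather than an actual SIL declaration.

```c
#ifdef __cplusplus
extern "C" {
#endif

/* Hypothetical header contents. The field that would have been called
 * "private" in C-only code is given a name which is not a C++ keyword,
 * so both C and C++ callers can refer to it directly. */
typedef struct OperationRecord OperationRecord;
struct OperationRecord {
    int status;
    void *private_data;  // Renamed from "private".
};

/* Hypothetical accessor usable from both C and C++ source. */
static inline void *operation_record_private(const OperationRecord *record)
{
    return record->private_data;
}

#ifdef __cplusplus
}  // extern "C"
#endif
```

Unlike the #define/#undef approach, this needs no preprocessor bracketing, and C++ code can read and write the field under the same name that C code uses.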

Use of comments

Use comments to summarize the purpose of a block of code or to explain the rationale behind or implications of a particular piece of logic. Do not use comments to simply rephrase the code in natural language.

Rationale: Comments should provide additional information (in other words, "commentary") to the reader. Assume for this purpose that the reader understands the programming language better than you do, so any comments that simply state what the code is doing are useless. Instead, comments should explain why the code is written the way it is, to provide insight to a reader who does not understand the problem you are trying to solve. As a corollary, if you do feel the need to explain what the code is doing, it probably means the code itself is poorly written and should be fixed.

Notwithstanding the above, it can be useful to precede longer blocks of code with comments which summarize the logic contained in the block. While these would ordinarily be discouraged as "what" comments (as opposed to "why" comments), they can help the reader quickly skim through larger functions without having to read the entire function, much like subheadings in a technical document. Other code styles recommend appropriately named subroutines for this purpose, but SIL style prefers not to extract blocks of logic unique to a particular function (see below regarding function complexity).

Examples:

/* This provides additional information to the reader explaining why the
 * assertion is safe, without requiring the reader to have memorized the
 * entire program structure. */
ASSERT(blocksize > 0);  // Or it would have been handled by the caller.
blocks = size / blocksize;

/* This computation is sufficiently obscure that a comment helps the
 * reader understand its purpose. */
if ((size & (size - 1)) != 0) {
    return 0;  // Reject sizes which are not powers of two.
}

/* This comment lets the reader quickly understand what the loop is doing
 * without having to read through the entire block of code. */
/* Find a block which matches our requirements. */
Block *block = NULL;
for (Block *i = block_list; !block && i; i = i->next) {
    // (lots of code here)
}

/* The meaning of this line is perfectly clear without the comment. */
counter++;  // Increment counter.

/* This comment would be unnecessary if proper variable names were used. */
px += t*s/w;  // Add distance moved in tiles to player position.

Function complexity

Each function should have exactly one purpose. Extract repeated code into separate functions, but do not break up a function solely because of length.

Rationale: At a high level, functions can be considered the basic building blocks of programs. Accordingly, each function should do exactly one thing, and do it well; a reader should be able to look at a high-level algorithm and tell what it does just by reading the names of the functions it calls (see also the function naming rules). In particular, a function should never have side effects which are not obvious from the function's name.

Functions also enforce encapsulation of data, reducing the risk of unintended interactions between separate blocks of logic and increasing reusability of the code. Any algorithm which is repeated in more than one function (or more than once in the same function) is a good target for extraction into a separate function.
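As a small sketch of such an extraction, suppose two functions each need to limit a value to a range. All names below (bound_int, sound_bound_volume, color_bound_component) and the ranges used are hypothetical, chosen only to illustrate the idea.

```c
/* Hypothetical helper holding the bounds-limiting logic which both
 * functions below would otherwise repeat. */
static int bound_int(int value, int lower, int upper)
{
    if (value < lower) {
        return lower;
    } else if (value > upper) {
        return upper;
    } else {
        return value;
    }
}

/* Hypothetical callers, each of which keeps its own single purpose and
 * delegates the shared logic to the helper. */
int sound_bound_volume(int volume)
{
    return bound_int(volume, 0, 100);
}

int color_bound_component(int component)
{
    return bound_int(component, 0, 255);
}
```

Because the helper sees only its three parameters, it cannot accidentally interact with other state in its callers, and a fix to the bounding logic needs to be made in only one place.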

On the flip side, breaking a function up into several subfunctions forces a reader who wants to follow the code flow to jump back and forth between different places in the source code, which can lead to cognitive overload. For this reason, SIL style does not enforce a maximum size on functions, and instead prefers to keep code sequences unique to a particular function within that function, particularly when those sequences are short relative to the function as a whole. In cases where this results in a particularly large function, short summary comments above each block of code can help readers skimming the code to understand the logic more quickly (see above regarding use of comments).

Algorithm complexity

Use an algorithm of the appropriate computational complexity, but prefer simple-and-slow code to fast-and-complicated code.

While you should try not to introduce unnecessary computational complexity (for example, using a cubic-time algorithm when a quadratic-time algorithm is available), neither should you take "shortcuts" or "clever hacks" to cut down on execution time unless you have hard data demonstrating that such optimizations are of significant benefit to the program (or library) as a whole.

Rationale: This rule could also be phrased as, "Premature optimization is the root of all evil." The history of software development is littered with cases of programmers expending effort on optimizing routines which make no significant contribution to execution time in the first place—and introducing new, hard-to-find bugs as a result of their supposedly "clever" optimizations. Don't repeat their mistakes.

Examples:

char *make_lowercase(char *str)
{
    /* Save the length so we don't recompute it on every loop iteration
     * (which would take linear time per iteration, making this a
     * quadratic-time function). */
    const size_t len = strlen(str);
    for (size_t i = 0; i < len; i++) {
        if (str[i] >= 'A' && str[i] <= 'Z') {
            str[i] += 'a' - 'A';
        }
    }
    return str;
}

char *make_lowercase(char *str)
{
    char *old_str = str;

    /* This "optimization" attempts to reduce the number of memory loads
     * required, but unless the sole purpose of the program is to convert
     * strings to lowercase, this is utterly unnecessary. Setting aside
     * the questions of whether it actually is faster and whether the
     * documented assumption below is valid, this code has two subtle
     * bugs. One, noted below, is the result of copy-and-pasting code
     * (this was in fact a mistake made while writing this very example);
     * it will cause every fourth byte to be improperly changed from [\]^_
     * to {|}~<DEL>, and similarly for bytes in the ranges 0x80-0x9F and
     * 0xC0-0xDF. The other is more deadly: if the input string pointer
     * is not aligned to a multiple of 4 bytes, the program will crash on
     * CPUs which require aligned addresses for 32-bit integer loads. */
    const uint32_t char0_mask = 0xFF000000;
    const uint32_t char1_mask = 0x00FF0000;
    const uint32_t char2_mask = 0x0000FF00;
    const uint32_t char3_mask = 0x000000FF;
    uint32_t bits;
    while (bits = *((uint32_t *)str),  // Assume we can overrun 1-3 bytes.
           (bits & char0_mask) && (bits & char1_mask)
           && (bits & char2_mask) && (bits & char3_mask)) {
        if ((bits & char0_mask) >= (uint32_t)'A'<<24
         && (bits & char0_mask) <= (uint32_t)'Z'<<24) {
            bits |= 0x20000000;
        }
        if ((bits & char1_mask) >= (uint32_t)'A'<<16
         && (bits & char1_mask) <= (uint32_t)'Z'<<24) {  // Oops, wrong shift!
            bits |= 0x00200000;
        }
        if ((bits & char2_mask) >= (uint32_t)'A'<<8
         && (bits & char2_mask) <= (uint32_t)'Z'<<8) {
            bits |= 0x00002000;
        }
        if ((bits & char3_mask) >= (uint32_t)'A'
         && (bits & char3_mask) <= (uint32_t)'Z') {
            bits |= 0x00000020;
        }
        *((uint32_t *)str) = bits;
        str += 4;
    }
    while (*str) {
        if (*str >= 'A' && *str <= 'Z') {
            *str |= 0x20;
        }
        str++;
    }
    return old_str;
}
