SIL Coding Style Guidelines

1. Language features
Language standard
Type size assumptions
Integer types
Signed vs. unsigned integers
Boolean values
Floating-point types
Floating-point literals
Zero values
Comparing against zero
Side effects in conditional expressions
Bitwise operators
The const keyword
Constants
The goto statement
Assertions
Macros
#include directives
Nested #includes
Structure tags and typedefs
2. Formatting
Character encoding
Line length
Indentation
Tabs vs. spaces
One statement per line
Related subexpressions
Whitespace around operators
Whitespace in function and macro calls
Parentheses in expressions
Parentheses with defined, sizeof, and return
Conditional and loop statements
switch statements
Function declarations
Function pointers
Comments
3. Naming
General naming rules
Filename formatting rules
Identifier formatting rules
Use of C++ keywords as C identifiers
4. Other
Algorithm complexity

The following guidelines were observed when writing the SIL source code. Note, however, that they were treated as guidelines, not rules, with the cardinal rule being "readability first".

In the examples below, code that follows the style guidelines comes first. Code that violates the guidelines, if any, comes second. For instance:

/* Code that follows the style guidelines.
 * Comments like this one explain details of the example
 * and are not intended to be read as part of the example code itself. */
return 1;  // This comment is part of the example code.

/* Code that violates the style guidelines. */
return(0);

1. Language features

Language standard
Use C99 (but avoid complex number types and variable-length arrays) and C++11. Anonymous unions are allowed, but otherwise avoid compiler-specific features.

Rationale: All modern compilers support C99 and C++11, with occasional minor exceptions that generally do not impact SIL. Modern compilers also support anonymous unions; this feature is nonstandard in C99, but it has been added to the C11 standard, so such unions will be forward-compatible. On the flip side, the C11 standard makes C99's complex number types (the <complex.h> header) and variable-length arrays optional, so those features should be avoided.
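
Here is a minimal sketch of the kind of anonymous union this allows. The Value type and value_as_float() function are hypothetical illustrations, not part of SIL. Because the union has no member name, its members are accessed as if they belonged directly to the enclosing structure:

```c
#include <stdint.h>

/* Hypothetical tagged value type (not part of SIL). */
typedef struct Value Value;
struct Value {
    uint8_t is_float;  // Nonzero if f holds the value, zero if i does.
    union {            // Anonymous union: members are accessed directly.
        int i;
        float f;
    };
};

/* Returns the stored value as a float, whichever member holds it. */
static float value_as_float(const Value *value)
{
    return value->is_float ? value->f : value->i;
}
```

GCC and Clang typically accept this in C99 mode as an extension, possibly with a warning under -pedantic. In C11 it is standard.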

Type size assumptions
Assume that int is at least 32 bits wide and that pointer values are at least as wide as int. Make no other assumptions beyond what is required by the relevant language standard.

Rationale (int): In all modern PC-class environments, the int type is at least 32 bits wide. (There are some embedded processor environments which use 16-bit ints to better match the hardware's capabilities, but such environments are generally not suited to running SIL-based programs.) Modern programs frequently need to work with data larger than a 16-bit integer can hold; requiring all code to either check for 16-bit overflow or explicitly use a 32-bit type would significantly increase the risk of bugs.

Rationale (pointers): In all modern PC-class environments, pointers are simple scalar values of at least the native word size, ensuring that an int-sized value can be safely stored in and later retrieved from a pointer variable. This technique is useful in certain cases, such as when passing an integer value through an interface that takes an opaque pointer argument. While these conversions should be avoided when possible, for the purposes of SIL they may be considered safe. Note that the converse does not hold: converting a pointer to int and back can change its value!
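
A minimal sketch of passing an int through an opaque pointer argument follows. It assumes a hypothetical callback interface; Callback, call_with_userdata(), int_to_pointer(), pointer_to_int(), and add_one() are illustrative names, not SIL functions. Going through intptr_t from <stdint.h> avoids compiler warnings about casting between integer and pointer types of different sizes:

```c
#include <stdint.h>

/* Hypothetical callback interface taking an opaque pointer argument. */
typedef int (*Callback)(void *userdata);

static int call_with_userdata(Callback callback, void *userdata)
{
    return callback(userdata);
}

/* Round-trip an int through a pointer via intptr_t.  This relies on the
 * assumption above that pointers are at least as wide as int. */
static void *int_to_pointer(int value)
{
    return (void *)(intptr_t)value;
}

static int pointer_to_int(void *pointer)
{
    return (int)(intptr_t)pointer;
}

/* Example callback: recovers the int and adds one to it. */
static int add_one(void *userdata)
{
    return pointer_to_int(userdata) + 1;
}
```

The reverse trip, storing a pointer in an int, is the one to avoid. On typical 64-bit systems int is 32 bits while pointers are 64 bits, so the upper half of the address would be lost.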

Integer types
Use int by default, sized types when appropriate. Use int8_t instead of char for numeric data. Avoid long and short except where required.

Rationale (int): We assume that the int type is at least 32 bits wide (see type size assumptions above), so there is no need to use long or int32_t merely to ensure a 32-bit data type. Using int as widely as possible reduces the chance of accidental truncation due to conversion between types of different sizes.

Rationale (sized types): Since we assume int is at least 32 bits wide, there should rarely be a need to specify sized types for local variables. However, sized types can be useful in certain cases, such as:

Rationale (char): While char and int8_t normally have the same size and representation (int8_t is typically a typedef for signed char, which is a distinct type from char), char should be limited to data which is actually textual in nature. For 8-bit numeric data, use int8_t (or uint8_t, but see signed vs. unsigned integers below) to indicate to the reader that the data is numeric. Note in particular that whether char is signed or unsigned depends on the compiler, so using char for a signed 8-bit integer can result in nasty surprises.

Rationale (long and short): Sized integer types make long and short generally unnecessary, but when calling library functions with long- or short-type parameters or return values, it can be more convenient to use those types directly instead of casting back and forth. However, try not to propagate such types outside the immediate locality of the library call.
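
As an example, here is a sketch of a hypothetical parse_int() helper (not a SIL function). It calls strtol(), which returns long, and range-checks and converts the result to int immediately, so the long type never leaves the function. It also follows the const-and-cast idiom for strtol()'s end pointer shown under the const keyword below:

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parses a decimal integer from s into *result_ret.
 * Returns 1 on success, or 0 if s is not a valid integer that fits in an
 * int (in which case *result_ret is left unchanged). */
static int parse_int(const char *s, int *result_ret)
{
    const char *end;
    long value;  // The long value stays local to this function.

    errno = 0;
    value = strtol(s, (char **)&end, 10);
    if (end == s || *end != '\0' || errno == ERANGE
     || value < INT_MIN || value > INT_MAX) {
        return 0;
    }
    *result_ret = (int)value;
    return 1;
}
```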

Signed vs. unsigned integers
Use signed types unless there's a good reason to use unsigned types. Don't use unsigned types just because you never expect to have a negative value.

Rationale: Conversions between signed and unsigned integer types are a perennial source of bugs, so much so that most modern compilers warn about mixing them. The easiest way to avoid these bugs is not to use unsigned types at all. In particular, "this value will never be negative" is not a reason to use an unsigned type; someday the value will go negative, and your code will break.
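
A minimal sketch of the "someday the value will go negative" failure uses hypothetical space_left() helpers. If a bug lets used exceed capacity, the signed version produces an obviously wrong negative result. The unsigned version instead wraps around to a huge positive value that can look plausible:

```c
/* Hypothetical helpers: free space remaining in a buffer. */
static int space_left(int capacity, int used)
{
    return capacity - used;  // Negative if used > capacity: easy to detect.
}

static unsigned int space_left_unsigned(unsigned int capacity,
                                        unsigned int used)
{
    return capacity - used;  // Wraps around if used > capacity.
}
```

Here space_left_unsigned(10, 12) evaluates to UINT_MAX - 1. A check such as "if (space_left_unsigned(capacity, used) < 0)" can therefore never catch the error.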

There are still a few cases in which unsigned types are beneficial:

Boolean values
Use the uint8_t type for data stored in memory, int otherwise. Never assign a value other than 0 or 1 to a boolean variable.

Rationale (data types): Experience has shown that boolean flags are used sufficiently often to warrant the use of a smaller data type than int when such values are stored in memory. On the other hand, values typically stored in registers, such as function parameters and return values or local variables, do not benefit from using a smaller-sized type. Note that C99 provides the _Bool type, along with the <stdbool.h> header which defines it as bool, but this type is not guaranteed to be link-compatible with the C++ bool type, so it is unsuitable for use in a library such as SIL which may be linked with C++ code.

Rationale (assignment of values): While C treats any nonzero value as true, assigning an arbitrary value directly to a boolean variable can have unexpected results: for example, on a system where long is larger than int, the value LONG_MIN itself is nonzero and therefore true, but assigning it directly to a boolean variable (of type int or uint8_t) will result in a false value due to truncation. It is permissible to copy the value of one boolean variable to another if the first variable's value is known to be safe (either 0 or 1), but do not assume that boolean arguments passed to an interface function have safe values.
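
A minimal sketch of the problem and the fix follows, using a hypothetical Settings structure and setter functions (not SIL interfaces). Converting an int to uint8_t keeps only the low 8 bits, so a nonzero value such as 256 is stored as 0. This is the same truncation problem described above for LONG_MIN. Comparing against zero first always yields a safe 0 or 1:

```c
#include <stdint.h>

/* Hypothetical settings structure; the flag is stored in memory, so it
 * uses uint8_t. */
typedef struct Settings Settings;
struct Settings {
    uint8_t fullscreen;
};

/* Unsafe: any nonzero value whose low 8 bits are zero (such as 256) is
 * stored as 0, that is, false. */
static void set_fullscreen_unsafe(Settings *settings, int enable)
{
    settings->fullscreen = enable;
}

/* Safe: "enable != 0" is always exactly 0 or 1. */
static void set_fullscreen(Settings *settings, int enable)
{
    settings->fullscreen = (enable != 0);
}
```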

Floating-point types
Use float by default, double only when necessary. Avoid long double entirely.

Rationale: Computations using double values are generally more expensive than computations with float values even on systems with hardware support for double-precision floating point. Some systems implement double-precision operations in software, in which case the performance difference can reach an order of magnitude or more and significantly impact overall program performance. For the vast majority of floating-point computations typically performed by SIL-based programs, float provides sufficient precision to ensure correct behavior. Additionally, the consistent use of a single type helps avoid unexpected behavior that can result from mixing precisions, such as loss of precision when a double value is passed through a function which takes a float parameter.

An example of appropriate usage of double is the SIL interface time_now(), which returns a double-precision timestamp in units of seconds. If a float value were returned, it would quickly lose precision to the point of preventing accurate sub-second timing. For example, using an IEEE 754-compliant single-precision type, the timestamp resolution would drop to around 1 millisecond after 8192 seconds (a little over 2 hours). At a frame rate of 60 frames per second, this is more than 5% of a frame, and attempting to use such a low-resolution timestamp for accurate timing could result in noticeable jitter.
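
The arithmetic can be checked with a short sketch. It assumes IEEE 754 single precision, where FLT_EPSILON is 2^-23; the helper names are hypothetical. At 8192 = 2^13 seconds, adjacent float values are 2^13 * 2^-23 = 2^-10 seconds apart, so a 0.4 ms increment is rounded away entirely:

```c
#include <float.h>

/* Distance from t to the next larger float, valid when t is an exact power
 * of two: FLT_EPSILON is that distance at 1.0f, and it scales with t. */
static float float_spacing_at_power_of_two(float t)
{
    return t * FLT_EPSILON;
}

/* Adds a small time step to a single-precision timestamp. */
static float advance_timestamp(float t, float delta)
{
    return t + delta;
}
```

2^-10 seconds is about 0.98 ms, or roughly 5.9% of the 16.7 ms duration of a frame at 60 frames per second.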

Floating-point literals
Append an f suffix to floating-point literals to mark them as single precision. Integer literals may be used for integral values in floating-point expressions.

Rationale (f suffix): It can be easy to forget that floating-point literals are double precision by default, but including a double-precision literal in an expression causes all other operands in the expression to be promoted from single to double precision, even if the expression's value is then assigned to a single-precision variable (in which case the value must then be converted again, from double to single precision). Always include the f suffix on floating-point literals to mark them as single precision, except when the literal is intended to be a double-precision value or is used in a double-precision expression.

Rationale (integer literals): Unlike double-precision literals, integer literals do not cause promotion of floating-point operands, so they are generally safe to use in floating-point expressions, and expressions may be easier to read without extraneous ".0"s on such values. However, bear in mind that if both operands to an operator are integers, the operation will be performed in integer arithmetic and may consequently overflow. Values which have the potential to cause such overflow should be written as floating-point literals, including the f suffix when appropriate.

Exception: It is not necessary to include the f suffix on floating-point literals used to initialize single-precision variables, or passed to function parameters declared as float, since such values will be converted to single precision at compile time and thus will have no impact on runtime performance.

Examples:

float rounded_value = floorf(value + 0.5f);

/* "3600.0f" is acceptable, but unnecessary if hours is known to be small. */
float hours_to_sec(int hours) {return hours * 3600;}

/* Floating-point literal since "1000000000" would cause integer overflow. */
float sec_to_nsec(int sec) {return sec * 1.0e9f;}

/* This function returns a double, so the "f" suffix is not needed (and
 * would in fact cause an unnecessary conversion from float to double). */
double sec_to_nsec_2(int sec) {return sec * 1.0e9;}

/* This function returns a float, but the input variable is a double, so
 * the literal is written as a double-precision value to match. */
float nsec_to_sec(double nsec) {return nsec / 1.0e9;}

float blink_interval = 1.234;  // "f" suffix not needed on initializers.

// These unnecessarily convert from single to double precision and back again.
float rounded_value = floorf(value + 0.5);
float hours_to_sec(float hours) {return hours * 3600.0;}

Zero values
Use NULL for pointers, '\0' for characters.

Rationale: While 0 represents a zero value in any type, use of the most appropriate literal (NULL for pointers, '\0' for characters) helps remind the reader of the data type. However, it is not necessary to write 0.0 or 0.0f for floating-point values (see also floating-point literals above).

Comparing against zero
In a conditional, omit the comparison if the expression is unambiguous; otherwise, explicitly compare against a zero value.

Rationale: Conditional expressions in if, for, and while statements (as well as subexpressions of the logical operators && and ||) treat any nonzero value as true, so there is no need to explicitly write "expression != 0", and it can often be more readable to omit the comparison against zero. For example, when a zero value or null pointer indicates the absence of an object, it is generally more meaningful to say "if (object)" (which can be read "if object exists") than to explicitly compare against zero, which can suggest that zero has some special meaning. Do use an explicit zero if the value zero has a specific meaning, such as the first entry in a zero-indexed menu.

If you need to store a flag for whether a value is zero, you can use the logical negation operator "!". However, do not use a double negation to test for a nonzero value; explicitly compare against zero instead.

Examples:

for (object = list; object; object = object->next) {
    // ...
}
if (user_input == 0) {  // 0 is a meaningful value here.
    run_menu_0();
}
alloc_failed = !buffer;  // Or "alloc_failed = (buffer == NULL);".

have_buffer = !!buffer;  // Use "have_buffer = (buffer != NULL);" instead.

Side effects in conditional expressions
Assignment allowed when the intent is clear, but always use an explicit comparison.

Rationale: It can be convenient to use an assignment in the conditional expression of a control statement, to assign a value and test that value at the same time. However, this can also make it harder to follow the flow of the code, so assignments should only be used when that assignment is the primary purpose of the entire expression.

Also, a lone assignment expression in a conditional statement can look like a mistyped equality comparison (and indeed, many compilers will emit a warning along those lines). For this reason, always enclose the assignment in parentheses and compare the result explicitly, for example against NULL or zero.

Examples:

if ((value = strtol(s, NULL, 10)) < 0) {
    // ...
}
while ((s = strtok(NULL, " ")) != NULL) {
    // ...
}

while ((ptr = get_next_object())) {  // Doubly-parenthesized expression is hard to read.
    // ...
}
if ((slash = strchr(s, '/')) < (dot = strchr(s, '.'))) {  // Pull the assignments out.
    // ...
}

Bitwise operators
Don't use the bitwise operators (<< >> & | ~) to perform arithmetic.

Rationale: In the past, a common C idiom was to replace certain arithmetic operations with bitwise operations, such as replacing multiplication by a power of two with the equivalent left-shift operation, on the (generally correct) theory that bitwise operations execute more quickly than arithmetic ones. Modern compilers are perfectly capable of performing this optimization themselves, so there is almost never any need to resort to this hack. (This can also lead to subtle bugs due to precedence errors.)

Even in cases where the use of bitwise operators can make a difference, don't use them unless you've profiled the code and you're certain that the arithmetic operator is a significant bottleneck. Remember that premature optimization is the root of all evil.
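
The following illustrates the precedence problem mentioned above; the function names are hypothetical. A shift that replaces a multiplication inside a larger expression silently changes meaning, because + and - bind more tightly than << and >>:

```c
/* Intended result: x*2 + 1. */
static int double_plus_one(int x)
{
    return x*2 + 1;
}

/* The "same" computation with a shift: since + binds more tightly than <<,
 * this is parsed as x << (1 + 1), which is x*4. */
static int double_plus_one_shift(int x)
{
    return x << 1 + 1;
}
```

With x = 5, the first function returns 11 and the second returns 20. GCC's -Wparentheses (enabled by -Wall) warns about this particular mistake.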

Examples:

x /= 2;  // Not "x >>= 1;"!
y %= 8;  // Not "y &= 7;"!
num_sectors = size >> log2_sector_size;  // Permissible if you don't have sector_size.

offset &= (block_size - 1);  // Don't do this unless you absolutely have to.

The const keyword
Use const freely when appropriate.

Rationale: const can help prevent errors resulting from accidentally assigning to the wrong variable. In some cases it can also help the compiler optimize. Use it whenever you initialize a variable that won't be changed, such as when saving the result of a function call.

const can be applied at all pointer levels of a pointer variable, but often one const is good enough. (Also note that some library functions expect const at some levels but not at others, and the constness of each level has to match.)

Examples:

const int texture = texture_create(...);
const char * const *string_list = all_strings;  // "const char **" is fine too.

int is_number(const char *s)
{
    const char *end;
    /* strtol() requires a non-const "char **" as the second parameter.
     * Rather than declaring the variable as "char *" even though we won't
     * write through it, we declare it as "const char *" and cast away the
     * const just for this call. */
    (void) strtol(s, (char **)&end, 0);
    return *s != '\0' && *end == '\0';
}

Constants
Use #define or enum for file-scope scalar constants, const for local scalar constants, and static const for array or structured constants. Don't forget to const both levels of a string array.

Rationale: In C, a file-scope const variable is not a constant expression and is not guaranteed to be folded. Defining such a constant in a header using static const can therefore emit a copy of the constant in every object file that includes the header, needlessly wasting space. Scalar constants should instead be defined using either #define (but be careful when using macros) or enum. The advantage of enum is that the symbol is included in debug information and can be referenced in a debugger; the disadvantages are that the syntax is slightly more awkward and that floating-point values can't be used. Constants local to a function, on the other hand, can usually be compiled directly into the instruction stream as a register load, so there is no problem with just using const. (In this case, static is unnecessary and could potentially waste space in the object file.)

Examples:

#define MAX_ENTRIES  100  // Or "enum {MAX_ENTRIES = 100};".

enum {STATUS_GOOD = 1, STATUS_BAD, STATUS_UGLY};

/* Note the double "const" here; the first "const" declares the string data
 * to be immutable, while the second "const" declares the array itself to
 * be immutable. Both are required to make the data truly constant. */
static const char * const usage_text[] = {
    "Usage: mytool [OPTION]...\n",
    "Options:\n",
    // ...
};

int myfunc(int x)
{
    const int maxval = 30;
    static const int primes[] = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29};
    // ...
}

The goto statement
Use the goto statement when it will avoid repetition of error-handling code. Never use goto for any other purpose.

Rationale: Although "considered harmful" by some—and indeed, injudicious use of goto can greatly impair code maintainability—the goto statement is useful in consolidating error-handling logic in a single location, and it should be used in preference to repeating the same cleanup code over and over.

Examples:

Object *create_object(...)
{
    Object *object;

    if (!(object = malloc(sizeof(*object)))) {
        goto error_return;
    }
    if (!init_object(object)) {
        goto error_free_object;
    }
    if (!register_object(object)) {
        goto error_deinit_object;
    }
    return object;

  error_deinit_object:
    deinit_object(object);
  error_free_object:
    free(object);
  error_return:
    return NULL;
}

Object *create_object(...)
{
    Object *object;

    if (!(object = malloc(sizeof(*object)))) {
        return NULL;  // This is okay...
    }
    if (!init_object(object)) {
        free(object);
        return NULL;  // ... but now we're starting to repeat ourselves.
    }
    if (!register_object(object)) {
        /* Oops, forgot to add the deinit_object() call! */
        free(object);
        return NULL;
    }
    return object;
}

Assertions
Assert early and often, but only for impossible conditions. Include fallback actions for assertion failure where feasible, but avoid complex fallback actions.

Rationale (assert early and often): Programmers are only human, and errors will creep into any nontrivial code. Assertions provide a way to check for errors at runtime before those errors cause crashes, data corruption, or other serious problems. SIL provides (in base.h) two macros, ASSERT() and PRECOND(), which can be used for this purpose. At present, the two macros are essentially identical, but PRECOND() is intended for checking function preconditions, and further use of the macro may be made for that purpose in the future, so prefer PRECOND() over ASSERT() when checking a function argument against a precondition. Use ASSERT() for all other cases.

Rationale (only impossible conditions): By writing an assertion, you are declaring (asserting) that the asserted expression must be true under every possible condition. If there is any possible condition under which the expression might be false, no matter how unlikely, do not use an assertion. For example, never assert that a memory allocation has succeeded, because there's always the possibility that the program will run out of memory (or address space) and the allocation will fail. In such cases, always implement and test proper error handling. Exceptions: You do not need to consider hardware errors such as memory or register corruption when deciding whether a condition is possible; for example, if an interface function checks the value of an argument, its helper functions do not also need to make the same check. You may also assume that system calls and other external functions behave according to their documentation.

Rationale (include fallback actions): The ASSERT() and PRECOND() macros accept an optional failure action, which is a statement (or multiple statements) which will be executed if the assertion fails and the program is not running in debug mode. While there is a school of thought which argues that the program should always abort on assertion failure because the internal state has left the designed bounds and further behavior cannot be predicted, it is often feasible to perform some sort of recovery short of terminating the entire program. For example, if a function which expects a valid file handle receives a null pointer, the function can simply return an error state as it would for an actual error with a valid file handle. This may still result in the program terminating itself with an error message, but even that is more user-friendly than simply crashing.

Rationale (avoid complex fallback actions): By their very nature, fallback actions for assertions cannot be tested like other code, since (assuming the program does not have any relevant bugs) the asserted condition will never fail, and if it did, the test (which runs in debug mode) would terminate anyway. For this reason, fallback actions should be extremely simple; often, a single return statement is sufficient. In cases where there is no simple way to recover from an assertion failure, prefer to omit the fallback action entirely, especially if there are no serious consequences from the failure.
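
The following is a hypothetical sketch of how an assertion macro with an optional fallback action might be built. SKETCH_ASSERT(), Counter, and counter_value() are invented for illustration; this is not SIL's actual definition of ASSERT() or PRECOND() in base.h. When DEBUG is defined, the macro reports the failure and aborts. Otherwise it reports the failure and runs the statement(s) passed after the condition, here a single return:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical assertion macro; NOT SIL's actual ASSERT() or PRECOND().
 * Any arguments after the condition form the fallback action. */
#ifdef DEBUG
# define SKETCH_ASSERT(condition, ...)  do {                    \
    if (!(condition)) {                                         \
        fprintf(stderr, "%s:%d: assertion failed: %s\n",        \
                __FILE__, __LINE__, #condition);                \
        abort();  /* Debug builds stop immediately. */          \
    }                                                           \
} while (0)
#else
# define SKETCH_ASSERT(condition, ...)  do {                    \
    if (!(condition)) {                                         \
        fprintf(stderr, "%s:%d: assertion failed: %s\n",        \
                __FILE__, __LINE__, #condition);                \
        __VA_ARGS__;  /* Run the fallback action, if any. */    \
    }                                                           \
} while (0)
#endif

typedef struct Counter Counter;
struct Counter {
    int count;
};

/* Precondition: counter is not NULL.  The fallback is one simple return. */
static int counter_value(const Counter *counter)
{
    SKETCH_ASSERT(counter != NULL, return 0);
    return counter->count;
}
```

Calling SKETCH_ASSERT() with no fallback at all leaves the variadic argument list empty. C99 does not strictly allow this, although GCC and Clang accept it as an extension.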

Examples:

Object *create_object(void)
{
    Object *object = mem_alloc(sizeof(*object), 0, 0);
    if (!object) {
        return NULL;  // There would also be a test case for this failure.
    }
    init_object(object);
    ASSERT(object->state == NEW, return NULL);  // Note that we don't even bother freeing the memory.
    return object;
}

int count_subobjects(Object *object)
{
    PRECOND(object != NULL, return 0);  // This is a function precondition.
    return list_length(object->subobjects);
}

Object *create_object(void)
{
    Object *object = mem_alloc(sizeof(*object), 0, 0);
    ASSERT(object != NULL);  // This could legitimately fail.
    init_object(object);
    ASSERT(object->state == NEW,
           list_append(broken_objects, object);  // Complex and difficult to test.
           return NULL);
    return object;
}

Macros
Use judiciously and with care.

Macros are permitted when they serve a purpose which is difficult or impossible to serve otherwise. However, be especially careful of unintended side effects when writing a macro (see the examples below).

Note that many function-like uses of macros—specifically, those which do not include control statements like return that escape the scope of the macro and whose parameters take specific types—can be replaced with static inline functions. Doing so both avoids the potential problems of macros and allows the compiler to perform its usual type-checking.

Rationale: Preprocessor macros are a powerful metaprogramming tool, but that power can easily hurt readability. Since macros are expanded before the source code is parsed, it's easy to write a macro that has unintended consequences, and it can be difficult to figure out exactly what those consequences were.

Examples:

/* Use parentheses around negative numeric values and expressions. */
#define INVALID_VALUE  (-1)
#define DEGREES_TO_RADIANS  (PI / 180.0)

/* Use parentheses around macro parameters, since otherwise they can be
 * expanded into something completely different from what you intended. */
#define DOUBLE(x)  ((x) * 2)

/* When possible, use an inline function (or several) instead of defining
 * a macro. */
static inline float sqrf(float x) {return x*x;}
static inline double sqr(double x) {return x*x;}

/* Bracket multiple statements or control structures with do {...} while (0)
 * so they do not break a surrounding if/for/while. */
#define ABORT_IF_NEGATIVE(x)  do {  \
    if ((x) < 0) {return -1;}       \
} while (0)

/* Macros can be useful to report the source code location of an error (the
 * DLOG() macro provided by SIL does this as well). */
#define LOG_ERROR(str)  log_error("%s:%d: %s", __FILE__, __LINE__, (str))

/* This definition will cause BAD_DOUBLE(x<y ? x : y) to only double y,
 * not x. */
#define BAD_DOUBLE(x)  (x * 2)

/* This definition would give an unexpected result if used as an operand
 * in a multiplication expression, for example. */
#define BAD_INCREMENT(x)  (x) + 1

/* While this example is properly parenthesized, the two uses of "x" will
 * cause any side effects in the actual parameter to be evaluated twice.
 * Inline functions are a better choice here. */
#define SQR(x)  ((x) * (x))

/* This definition will cause the structure of the function to change when
 * BAD_ABORT_IF_NEGATIVE(x) is expanded. */
#define BAD_ABORT_IF_NEGATIVE(x)  if ((x) < 0) {return -1;}

float myfunc(float x)
{
    if (x <= 0)
        BAD_ABORT_IF_NEGATIVE(x);
    else
        x += sqrtf(x);
    return x;
}

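
To see the double evaluation concretely, here is a sketch that places the SQR() macro from the example above alongside an inline function. The helpers next_value() and sqri() and the counter call_count are hypothetical, added only for illustration:

```c
/* The macro from the example above: its argument appears twice, so any
 * side effects in the argument happen twice. */
#define SQR(x)  ((x) * (x))

/* Inline replacement: the argument is evaluated exactly once. */
static inline int sqri(int x) {return x*x;}

/* Hypothetical function with a side effect: returns 1, 2, 3, ... on
 * successive calls. */
static int call_count;

static int next_value(void)
{
    call_count++;
    return call_count;
}
```

Starting from call_count = 0, SQR(next_value()) calls next_value() twice and multiplies two different values (1 and 2), giving 2 rather than a square. sqri(next_value()) calls it only once and returns 1.
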
#include directives
Order headers alphabetically, using full pathnames for each header. Include system headers after SIL headers.

When including SIL headers in a source file, order the headers componentwise alphabetically by full pathname, excluding the .h filename extension. As a corollary, each header should declare all external types it references, except for those defined in src/base.h (which will always be included first).

If you need to include any system headers, list them after all SIL headers. It may be useful to further subdivide these into standard system headers and headers for specific system libraries.

Rationale: Using the full pathname of a header tells the reader immediately where the header is located, whereas a relative pathname would force the reader to check the location of the source file and manually resolve the relative path. Additionally, if a system header happens to have the same name as a header you create, the compiler may include the system header instead of yours if you give only the filename in the #include directive.

Exception: Use relative pathnames for nested includes in public headers, to avoid requiring particular compiler flags for client code.

Example:

#include "src/base.h"
#include "src/foo.h"
#include "src/foo/quux.h"  // "foo/..." comes before "foo-".
#include "src/foo-bar.h"

#include <errno.h>
#include <stdio.h>

#include <FooLibrary/FooBase.h>
#include <FooLibrary/FooExtras.h>

Nested #includes
When declaring external types, use a forward declaration instead of a nested #include when possible.

Rationale: Including a header file inside another header file just to get the declaration of a structured type forces all users of the header to pay the cost of loading the nested header. Instead, when possible, use forward declarations of struct and union types. (This generally means you'll need to use "struct type" or "union type" instead of just the type name in function declarations.) C++ doesn't allow forward declarations of ordinary enums; C++11 permits them only for enums with an explicitly specified underlying type. As a result, headers which reference enum types and which may be included from C++ code (for SIL, this generally means all headers outside of the sysdep directory) will have to use nested includes for such types.
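
Here is a minimal sketch with hypothetical names (widget.h, struct Texture, widget_texture_width()). Both "files" are shown in one block so that it compiles on its own:

```c
/* ---- widget.h (hypothetical header) ---- */
/* Pointer parameters only need the forward declaration, so this header
 * does not have to #include the header that defines struct Texture. */
struct Texture;
int widget_texture_width(const struct Texture *texture);

/* ---- widget.c (hypothetical source file; needs the full type) ---- */
struct Texture {
    int width;
    int height;
};

int widget_texture_width(const struct Texture *texture)
{
    return texture->width;
}
```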

Structure tags and typedefs
When defining a struct, union, or enum type, include a typedef which defines the tag name as a type.

Typically, the typedef should precede the definition of the structured type itself, so the type name can be used within the definition (such as when defining a "next" pointer for a list). However, C++ does not allow referencing an enum before it has been defined, so in that case, the typedef must follow the enum.

Rationale: In C++, all tags for structured types (including class, struct, union, and enum) are automatically defined as type names, but in C, an explicit typedef is required for each type. Since C++ will not complain about such typedef statements, they should be included for all structured types visible to C code.
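
Here is a minimal sketch of both orderings, using hypothetical ListNode and Direction types and a hypothetical list_sum() helper:

```c
/* Struct: typedef first, so the type name is usable inside the definition. */
typedef struct ListNode ListNode;
struct ListNode {
    ListNode *next;
    int value;
};

/* Enum: typedef after the definition, because C++ does not allow an enum
 * to be referenced before it is defined. */
enum Direction {DIRECTION_LEFT, DIRECTION_RIGHT};
typedef enum Direction Direction;

/* Returns the sum of the values in a list. */
static int list_sum(const ListNode *node)
{
    int sum = 0;
    for (; node; node = node->next) {
        sum += node->value;
    }
    return sum;
}
```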


2. Formatting

Character encoding
All source files are encoded in UTF-8, but avoid non-ASCII characters when possible.

Rationale: UTF-8 is currently the de facto standard for text encoding, and full Unicode (including U'...' character values) is supported by at least GCC and recent versions of Clang. However, support is by no means universal, so try to avoid non-ASCII characters, and test extensively if you do use them.

Line length
Lines should not be longer than 79 columns.

Rationale: 80 columns has proven to be a good balance between avoiding unnecessary wrapping and keeping the text narrow enough to scan easily (that is, without forcing the eyes to move back and forth on each line). 80 columns is also a fairly standard width for terminal programs and editors. However, some such programs have trouble with lines that are exactly 80 columns long (for example, Emacs will wrap the 80th character to the next line when using an 80-column display). For this reason, lines should be kept to 79 columns when possible.

Exceptions:

Indentation
The basic indentation unit is four columns. Outdent half a unit (two columns) for labels, including case labels in a switch block. Indent one unit or to the opening parenthesis for continued lines, except when aligning related subexpressions.

The SIL source files include a trailer comment which causes the Emacs and Vim editors to use the proper indentation settings.

Rationale: Four columns is enough to clearly indicate the nesting depth at a glance, without being so wide that it pushes reasonably nested code off the edge of the screen. (Corollary: if code is indented so much that the line length limit becomes a problem, the nesting level is too deep.) Four columns is also divisible by two to provide an intermediate indentation for labels.

Examples:

int func(int x)
{
    if (x < 0) {
        goto error;
    }
    return x * (x+1);

  error:
    return -1;
}

float my_function_with_a_long_name(int my_first_parameter,
                                   float my_second_parameter)
{
    if (my_first_parameter > 0) {
        return my_function_with_a_long_name(
            my_first_parameter - 1, sqrtf(my_second_parameter));
    } else {
        return my_second_parameter;
    }
}

Tabs vs. spaces
Always use spaces, never tabs.

Rationale: There is little consensus between editor programs on the width of a tab stop; thus, to properly read code indented with tabs, the reader of the code must make a special effort to configure their software properly. It's far preferable for the (single or few) writers of source code to make the effort to use spaces rather than force the (many) readers to change their editor settings for each program's source code they view.

One statement per line
Only one statement or label is allowed on a line.

Rationale: It can be easy to overlook extra statements on the same line, especially when they are infrequent.

Exception: If all cases in a switch will fit on one line each and contain no more than one statement (excluding break, return, or goto), the statements may be moved to the same lines as their respective case labels. In this case, do not outdent the case labels.

Examples:

x /= 2;  // Good.
y /= 2;

switch (value) {
    case 10: return 2;  // Note indentation.
    case 20: return 5;
    default: return 0;
}

int myfunc(int input)
{
    int result;
    switch (input) {
      case -1:
        /* This could potentially all fit on one line, but that would violate
         * the one-statement-per-line rule since there are two non-label,
         * non-"break" statements for this case. */
        errors++;
        result = -1;
        break;
      default:
        result = input+1;
        break;
    }
    return result;
}

x /= 2; y /= 2;  // Bad.

int myfunc(int input)
{
    // ...
    return 1;
  error: return 0;  // Put the return on a separate line.
}

Whitespace around operators
Put one space around binary operators except "->" and ".", and no spaces around a unary operator.

Rationale: Whitespace improves readability when used in moderation. Omitting whitespace around member reference operators and unary operators emphasizes their tighter binding.

Exceptions:

Examples:

i = j * 2;  // Or "i = j*2".
object->refcount++;
result += i*60 + (j+59)/60;

/* Sometimes a unary minus can be hard to see next to a long variable name
 * or complex expression. In that case, add parentheses around the operand
 * rather than inserting a space. */
value = -(long_named_variable);
x=1;  // Never omit spaces around an assignment operator.
value = - long_named_variable;  // Never insert a space after a unary operator.
Whitespace in function and macro calls
Add spaces after commas; don't add spaces around parentheses.

Exception: It's okay to omit spaces after commas in nested function calls, as long as doing so doesn't hurt readability.

Examples:

function(param1, param2);
MACRO(param1, strchr(string,'/'), param3);
Parentheses in expressions
Use enough parentheses to make the expression easy to read at a glance. Always use parentheses to separate operators with confusing precedence.

Rationale: Some combinations of operators are particularly susceptible to precedence errors:

The compiler will generally emit a warning if parentheses are missing in any of the cases listed above.

Examples:

if ((x < 0 || x > width) && !allow_out_of_bounds) {
    // ...
}

if ((pixel & 0x001F) == 0x1F) {
    red_is_maximum = 1;
}

component_sum = (pixel & 0x001F)
              + ((pixel & 0x03E0) >> 5)
              + ((pixel & 0x7C00) >> 10);
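As a sketch of the kind of precedence error this rule guards against, the following compares an unparenthesized bit test with a parenthesized one. Because "==" binds more tightly than "&" in C, the first version tests the wrong thing. FLAG_READY and both function names are hypothetical, chosen only for illustration. GCC and Clang typically warn about the first function when -Wparentheses (part of -Wall) is enabled.

```c
#define FLAG_READY 0x04u  /* Hypothetical status bit, for illustration. */

/* Intended to ask "is the READY bit set?", but "==" binds more tightly
 * than "&", so this parses as status & (FLAG_READY == FLAG_READY),
 * which is status & 1. */
int is_ready_unparenthesized(unsigned int status)
{
    return status & FLAG_READY == FLAG_READY;
}

/* Parenthesized version: tests the READY bit as intended. */
int is_ready_parenthesized(unsigned int status)
{
    return (status & FLAG_READY) == FLAG_READY;
}
```

With a status of 0x04 (only the READY bit set), the first function returns 0 and the second returns 1. With a status of 0x01, the results are reversed.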
Parentheses with defined, sizeof, and return
Treat sizeof and defined like functions; don't use unnecessary parentheses around return values.

Rationale: While C does not require parentheses around the arguments to defined or (when the argument is an expression rather than a type name) sizeof, those keywords act like functions in that they return values*, so uses of those keywords should be styled like function calls. return, on the other hand, is not a function and does not generate a value (you couldn't put it on the right side of an assignment operator), so it shouldn't be used like one.
*Technically, defined doesn't "return a value" since it's not recognized by the compiler at all, but the preprocessor translates it into a boolean value, so it's the same sort of beast.

Examples:

#if defined(DEBUG) && defined(__GNUC__)
/* GCC magic goes here. */
#endif

int data_size(DataStruct *data, int num_data)
{
    return sizeof(*data) * num_data;
}
int bad_func(struct foo *ptr, int count)
{
    int size = count * sizeof *ptr;  // Looks like "count times sizeof times ptr".
    size += count * sizeof int;  // This doesn't even compile.
    return(size);  // return is not a function!
}
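A common place where the function-like sizeof style pays off is memory allocation. The sketch below (the Point type and alloc_points() function are hypothetical) sizes the allocation from the pointer being assigned, so the expression stays correct if the element type later changes. Since sizeof does not evaluate a non-VLA operand, sizeof(*points) is safe even before points holds a valid address.

```c
#include <stdlib.h>

typedef struct Point Point;  /* Hypothetical type, for illustration. */
struct Point {
    int x, y;
};

/* Allocates an array of "count" Points, or returns NULL on failure.
 * sizeof(*points) refers to the element type through the variable itself,
 * so this line needs no change if "points" is later retyped.
 * (Overflow checking of count * sizeof(*points) is omitted for brevity.) */
Point *alloc_points(size_t count)
{
    Point *points = malloc(count * sizeof(*points));
    return points;
}
```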
Conditional and loop statements
Always use braces; insert a space before the opening parenthesis; add spaces after semicolons in a for statement; put the opening brace on the same line as the closing parenthesis.

Rationale (mandatory braces): Failing to use braces with control statements can easily lead to bugs, such as when attempting to add a second statement to an if without a block.

Exception (opening brace): The opening brace can be moved to the next line if it doesn't fit on the same line, or to avoid confusion between a continued line of the control statement and the first line of the nested block when the two lines have similar indentation (see the second "for" example below).

Note that if a block is long, it can be useful to annotate the closing brace with the control statement that began the block (see the "while" example below).

Examples:

if (flag1) {
    // ...
} else if (flag2) {  // Treat "else if" as a single keyword.
    // ...
} else {
    // ...
}

while (alive()) {
    // ...
}  // while (alive())

for (i = 0; i < count; i++) {
    // ...
}

/* Here, the opening brace is moved to its own line to avoid confusion between
 * the continuation of the "for" statement and the first line of the block. */
for (i = 0, x = 1, y = 1; i < count;
     i++, x *= 2, y *= 2)
{
    // ...
}
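The mandatory-braces rule is easiest to appreciate with a concrete failure. In the hypothetical sketch below, a counter update has been added to a brace-less "if". The indentation suggests both lines are conditional, but only the first one is. The function names are illustrative only. Recent GCC versions flag this pattern with -Wmisleading-indentation, which -Wall enables.

```c
/* Buggy: without braces, only the first statement belongs to the "if".
 * The counter is incremented on every call, despite the indentation. */
int clamp_without_braces(int value, int limit, int *clamp_count)
{
    if (value > limit)
        value = limit;
        (*clamp_count)++;
    return value;
}

/* Correct: the braces make both statements conditional. */
int clamp_with_braces(int value, int limit, int *clamp_count)
{
    if (value > limit) {
        value = limit;
        (*clamp_count)++;
    }
    return value;
}
```

Called with a value already below the limit (for example, 3 with a limit of 10), both versions return 3, but only the brace-less version increments the counter.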
switch statements
Braces around case blocks are optional; document clearly if a case falls through.

Rationale (optional braces): Unlike other control statements, the use of braces in switch statements has no effect on control flow. In general, use braces when you need to define variables local to that case.

Rationale (documenting fall-through): It can be hard to tell at a glance whether a missing break statement is intentional or not. Documentation helps reassure the reader of the intended behavior, and it also avoids the risk that someone (maybe even you) will accidentally insert a break during a code cleanup session.

Examples:

switch (value) {
  case 1:
    // ...
    break;

  case 2:
    // ...
    /* falls through */  // Document that the missing "break" is intentional.
  case 3: {
    // ...
    break;
  }  // case 3

  default:
    // ...
    break;
}  // switch (value)
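Braces are not optional in C99 when a case needs its own local variable. A label must be followed by a statement, and a declaration is not a statement, so "case 1: int x = 0;" does not compile. The hypothetical function below shows a braced case with a local constant alongside a documented fall-through. GCC's -Wimplicit-fallthrough warning recognizes comments such as /* falls through */, so documenting the fall-through this way also keeps that warning quiet.

```c
/* Hypothetical example: returns a cost for a command code. */
int command_cost(int command)
{
    int cost = 0;
    switch (command) {
      case 1: {
        /* Braces are required: this case declares a local variable. */
        const int base_cost = 10;
        cost = base_cost * 2;
        break;
      }  // case 1

      case 2:
        cost += 5;
        /* falls through */
      case 3:
        cost += 1;
        break;

      default:
        cost = -1;
        break;
    }  // switch (command)
    return cost;
}
```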
Function declarations
Use an explicit void with functions that take no parameters. The opening brace for a function body goes on the following line.

Rationale (explicit void): In C (as opposed to C++), an empty parameter list means that the function's parameters are unspecified. This prevents the compiler from checking the number and types of parameters at call sites, so functions which take no parameters should have an explicit void to indicate that fact to the compiler.

Rationale (opening brace on following line): Putting the brace on its own line gives an additional visual indication that the brace starts a new function.

Exception: If the function is both very short (1-2 lines) and defined with static linkage, it is acceptable to put the definition's opening brace on the same line as the function declaration. If the function body fits on the same line as the declaration, the entire function may be written on one line.

Examples:

int get_status(Object *object)
{
    // ...
}

static inline float sqrf(float x) {return x*x;}  // Concise format for a short function.
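The sketch below shows what the explicit void buys. The names get_api_version() and next_api_version() and the returned number are hypothetical. With the (void) prototype, a mistaken call such as get_api_version(42) is a compile-time error. If the declaration were written as "int get_api_version();" instead, C99 would treat the parameters as unspecified. The compiler would then not be required to diagnose the same call, and making it would be undefined behavior. (C23 changes an empty list to mean "no parameters", but this guide targets C99.)

```c
/* Prototype with an explicit "void": the compiler now knows this function
 * takes no arguments and rejects calls that pass any. */
int get_api_version(void);

int get_api_version(void)
{
    return 3;  /* Hypothetical version number. */
}

/* A very short static helper may use the concise one-line format. */
static inline int next_api_version(void) {return get_api_version() + 1;}
```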
Function pointers
Explicitly dereference function pointers when calling through them.

Rationale: Using an explicit dereference operation makes it clear to the reader that the thing being called is a function pointer and not an actual function.

Exception: In C, function pointers accessed through a structure do not need to be dereferenced if they are used like C++ instance methods.

(*function_ptr)(args);
object->method(object, args);  // Where "method" is a function pointer.
function_ptr(args);
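The sketch below shows both calling styles. The names Counter, counter_add(), square(), and function_pointer_demo() are hypothetical. A standalone function pointer is dereferenced explicitly. A function pointer stored in a structure and called like an instance method is not dereferenced.

```c
typedef struct Counter Counter;  /* Hypothetical "object" type. */
struct Counter {
    int value;
    /* Method-like function pointer; the first argument is the object. */
    void (*add)(Counter *counter, int amount);
};

static void counter_add(Counter *counter, int amount)
{
    counter->value += amount;
}

static int square(int x) {return x*x;}

/* Adds square(4) to a fresh counter using both kinds of call, and
 * returns the counter's final value. */
int function_pointer_demo(void)
{
    int (*operation)(int) = square;
    Counter counter = {0, counter_add};
    Counter *object = &counter;

    const int amount = (*operation)(4);  // Plain pointer: explicit dereference.
    object->add(object, amount);  // Method-style call: no dereference.
    return object->value;
}
```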
Comments
Use // for inline comments, separated from the code by at least two spaces; prefer /*...*/ for block comments.

Functions have a specific header comment format; see the code for details.


3. Naming

General naming rules
Names should be descriptive. Don't abbreviate unnecessarily, but avoid excessive verbosity.

In general, an identifier's name should immediately tell the reader the purpose of the identifier, but it should be concise enough that its length does not obscure the structure of the code. For example, LIMIT would be a poor name for a global constant; the name tells us nothing about what sort of limit it is. But the same LIMIT might make perfect sense in a short function whose sole purpose was to bound its parameter to be less than a certain value, and indeed a longer name would serve no purpose except to clutter up the code.

Similarly, a file's name should make the file's purpose clear to someone looking at a directory listing, but should not be so long as to clutter log messages (which include the name of the source file which generated the message). In the case of filenames, it's acceptable to include the directory path when determining whether a filename is "clear"; for example, resource/core.c clearly refers to core functionality for resource management, and does not need to be expanded to resource/resource-core.c.

Avoid overusing abbreviations, since they can reduce readability by forcing the reader to stop and mentally expand the abbreviation each time the identifier is used. For example, in a function that uses a variable to hold a count of objects, nobj would be a poor name for the variable since its meaning is not immediately obvious to a reader unfamiliar with the code. num_obj would be better, but unless the variable is heavily used throughout the function, num_objects is more friendly to the reader. (However, number_of_objects would be unnecessarily verbose, since a num_ prefix is generally understood to mean "number of".)

Single-letter and similarly short variable names should be avoided except in cases where their meaning is obvious and generally accepted. For example, i is widely accepted as an iterator variable and may be used in that context, but it should not be used for a temporary variable, even in a limited scope. Similarly, short names for types or functions are acceptable when they are clearly derived from similar names in the standard libraries.

Examples:

#define FRAME_RATE_MIN 20  // Clear and concise.
#define PI 3.1415926535897932  // Short but well-known; fine even at global scope.

static inline double sqr(double x) {  // Follows the pattern of sqrt(), so okay.
    return x*x;
}

void collide_objects_with_player(void)  // Clear and concise.
{
    const int player_x = player->position.x;  // Not just "x" or "y".
    const int player_y = player->position.y;
    for (int i = 0; i < num_objects; i++) {  // "i" is well-known as an iterator.
        const int object_x = objects[i].position.x;
        const int object_y = objects[i].position.y;
        if (player_x == object_x && player_y == object_y) {
            // ...
        }
    }
}
#define FRATE 20  // Not obvious what it's used for.
#define MINIMUM_NUMBER_OF_FRAMES_PER_SECOND 20  // Way too verbose.

void doobjs(void)  // Not obvious what the function does.
{
    /* If these were the only coordinates used in the function, "x" and "y"
     * might be okay, but here they reduce readability because the function
     * uses two sets of coordinates. */
    const int x = player->position.x;
    const int y = player->position.y;
    /* "o" is not recognized as an iterator, so unlike "i" it's unacceptable. */
    for (int o = 0; o < num_objects; o++) {
        /* Again, it's unclear from the names what these coordinates are. */
        const int x2 = objects[o].position.x;
        const int y2 = objects[o].position.y;
        if (x == x2 && y == y2) {
            // ...
        }
        // ...
    }
}
Filename formatting rules
Use accepted filename extensions. Give each file a unique name exclusive of the extension. Avoid non-alphanumeric characters other than the hyphen (-) and underscore (_).

Use the filename extensions listed below for each source file type:

Rationale (filename extensions): While not strictly required on modern operating systems, the filename extension is an accepted way to inform the user of the type of content in the file. Some programs (including compilers and IDEs) also use the file extension to guess the file's content type, and using a nonstandard extension would confuse such programs to the detriment of the user.

Rationale (unique names): All source files are compiled to object files with the same filename extension (typically .o). If two source files in the same directory have the same name but a different extension, their object files would collide, breaking the build. If it is necessary to have two source files in different languages with the same purpose (for example, when implementing a C++ interface to C functions), use the base filename for the source file with the most nontrivial code, and rename other files to avoid object file collision. For example: utilities.c, utilities-cxx.cc, utilities-objc.m

Rationale (non-alphanumeric characters): Non-alphanumeric characters may have special meanings to some systems, preventing files whose names contain those characters from being used properly (or at all!) on such systems. For example, quote characters are used on many systems to enclose filenames containing spaces; conversely, spaces are used on most systems to separate command arguments, and including a space in a filename can cause builds to break in unexpected ways. The only symbols accepted as safe across all systems are the hyphen and underscore. Non-ASCII characters should also be avoided because some users' systems may not be able to display them properly.

Identifier formatting rules
Use MixedCase for type names, UPPER_CASE_WITH_UNDERSCORES for constants and preprocessor macros, and lower_case_with_underscores for functions, variables, and structure members.

Examples:

#define PI 3.1415926535897932

enum OperationStatus {
    STATUS_OK = 1,
    STATUS_FAILED = 2,
};
typedef enum OperationStatus OperationStatus;  // See above about typedefs.

typedef struct OperationRecord OperationRecord;
struct OperationRecord {
    OperationStatus status;
    void *private_data;
};

OperationStatus run_operation(OperationRecord *operation)
{
    operation->status = internal_run_operation(operation->private_data);
    if (operation->status != STATUS_OK) {
        static int did_warn = 0;
        if (!did_warn) {
            DLOG("Operation failed!");
            did_warn = 1;
        }
    }
    return operation->status;
}
Use of C++ keywords as C identifiers
C++ keywords may be used as identifier names in C code as long as they are appropriate, but take care when such identifiers appear in header files.

C++ reserves a number of keywords which can also be reasonably used as identifier names; for example, "try" could be a counter for an operation which may need to be retried several times, and "this" can be used as an object pointer when implementing instance-method-like functions in C. As long as the names are appropriate for the uses to which they are put, they may be freely used in C code.

However, care is needed when such identifiers appear in header files, such as when used as structure field names. In this case, renaming the identifier is usually best, but if the identifier does not need to be referenced by C++ code (for example, if it is a parameter name in a function declaration), it is also permissible to bracket the header with a #define/#undef pair:

#ifdef __cplusplus
# define private private_  // Avoid errors when included from C++ source.
#endif

// ...

#ifdef __cplusplus
# undef private  // Restore the normal meaning of "private" for subsequent C++ code.
#endif
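The sketch below combines both recommendations in one hypothetical header. A structure field that C++ code needs to reach is renamed. The keyword survives only as a parameter name, bracketed by the #define/#undef pair. The Record type and record_set_private() function are invented for illustration. The extern "C" wrapper that a real C/C++ header would usually need is omitted for brevity.

```c
/* ---- Hypothetical header portion ---- */
typedef struct Record Record;
struct Record {
    /* Renamed from "private": C++ code may need to access this field. */
    int is_private;
};

#ifdef __cplusplus
# define private private_  // Only a parameter name, so renaming is harmless.
#endif
void record_set_private(Record *record, int private);
#ifdef __cplusplus
# undef private
#endif

/* ---- Hypothetical C source portion ---- */
/* In C, "private" is an ordinary identifier. */
void record_set_private(Record *record, int private)
{
    record->is_private = (private != 0);
}
```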

4. Other

Algorithm complexity
Use an algorithm of the appropriate computational complexity, but prefer simple-and-slow code to fast-and-complicated code.

While you should try not to introduce unnecessary computational complexity (for example, using a cubic-time algorithm when a quadratic-time algorithm is available), neither should you take "shortcuts" or "clever hacks" to cut down on execution time unless you have hard data demonstrating that such optimizations are of significant benefit to the program (or library) as a whole.

Rationale: This rule could also be phrased as, "Premature optimization is the root of all evil." The history of software development is littered with cases of programmers expending effort on optimizing routines which make no significant contribution to execution time in the first place—and introducing new, hard-to-find bugs as a result of their supposedly "clever" optimizations. Don't repeat their mistakes.

Examples:

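#include <ctype.h>
#include <string.h>

char *make_lowercase(char *str)
{
    /* Save the length so we don't recompute it on every loop iteration
     * (which would take linear time per iteration, making this a
     * quadratic-time function). */
    const size_t len = strlen(str);
    for (size_t i = 0; i < len; i++) {
        /* Cast to unsigned char: passing a negative char value (other than
         * EOF) to tolower() is undefined behavior. */
        str[i] = tolower((unsigned char)str[i]);
    }
    return str;
}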
char *make_lowercase(char *str)
{
    char *old_str = str;

    /* This "optimization" attempts to reduce the number of memory loads
     * required, but unless the entire purpose of the program is to convert
     * strings to lowercase, this is utterly unnecessary. Setting aside
     * the questions of whether it is actually faster or not and whether
     * the documented assumption below is valid, this code has two subtle
     * bugs. One, noted below, is the result of copy-and-pasting code
     * (this was in fact an error made while writing this very example); it
     * will cause every fourth byte to be improperly changed from [\]^_ to
     * {|}~<DEL>, and similarly for bytes in the ranges 0x80-0x9F and
     * 0xC0-0xDF. The other is more deadly: if the input string pointer is
     * not aligned to a multiple of 4 bytes, the program will crash on CPUs
     * which require aligned addresses for 32-bit integer loads. */
    const uint32_t char0_mask = 0xFF000000;
    const uint32_t char1_mask = 0x00FF0000;
    const uint32_t char2_mask = 0x0000FF00;
    const uint32_t char3_mask = 0x000000FF;
    uint32_t bits;
    while (bits = *((uint32_t *)str),  // Assume we can overrun 1-3 bytes.
           (bits & char0_mask) && (bits & char1_mask)
           && (bits & char2_mask) && (bits & char3_mask))
    {
        if ((bits & char0_mask) >= (uint32_t)'A'<<24
         && (bits & char0_mask) <= (uint32_t)'Z'<<24) {
            bits |= 0x20000000;
        }
        if ((bits & char1_mask) >= (uint32_t)'A'<<16
         && (bits & char1_mask) <= (uint32_t)'Z'<<24) {  // Oops, wrong shift!
            bits |= 0x00200000;
        }
        if ((bits & char2_mask) >= (uint32_t)'A'<<8
         && (bits & char2_mask) <= (uint32_t)'Z'<<8) {
            bits |= 0x00002000;
        }
        if ((bits & char3_mask) >= (uint32_t)'A'
         && (bits & char3_mask) <= (uint32_t)'Z') {
            bits |= 0x00000020;
        }
        *((uint32_t *)str) = bits;
        str += 4;
    }
    while (*str) {
        if (*str >= 'A' && *str <= 'Z') {
            *str |= 0x20;
        }
        str++;
    }
    return old_str;
}