The following guidelines were observed when writing the SIL source code. Note, however, that they were treated as guidelines, not rules, with the cardinal rule being "readability first".
Rationale: All modern compilers support C99 and C++11, with occasional minor exceptions that generally do not impact SIL. Modern compilers also support anonymous unions; this feature is nonstandard in C99, but it has been added to the C11 standard, so such unions will be forward-compatible. On the flip side, the C11 standard makes C99's complex number types (the <complex.h> header) and variable-length arrays optional, so those features should be avoided.
Rationale (int): In all modern PC-class environments, including devices such as tablets and game consoles which can run general-purpose software, the int type is at least 32 bits wide. (There are some embedded processor environments which use 16-bit ints to better match the hardware's capabilities, but such environments are generally not suited to running SIL-based programs.) Modern programs frequently need to work with data larger than a 16-bit integer can hold; requiring all code to either check for 16-bit overflow or explicitly use a 32-bit type would significantly increase the risk of bugs.
Rationale (pointers): In all modern PC-class environments, pointers to both data and functions are simple scalar values of at least the native word size, ensuring that an int-sized value can be safely stored in and later retrieved from a pointer variable and that a function pointer can be safely converted to and from a void pointer. This technique is useful in certain cases, such as when passing an integer value or function pointer through an interface that takes an opaque pointer argument. While these conversions should be avoided when possible, for the purposes of SIL they may be considered safe.
Note that converting a pointer to int and back is not safe! On systems with 64-bit pointers but 32-bit int, the upper 32 bits of the pointer would be lost in such a conversion.
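As an illustration of the safe direction (an int stored in a pointer), here is a minimal sketch. The names Callback, invoke(), add_one_callback(), and call_with_int() are hypothetical and not part of SIL, and the intermediate cast through intptr_t is an assumption made here to keep compilers from warning about casts between types of different sizes.

```c
#include <stdint.h>

/* Hypothetical callback type which receives an opaque pointer. */
typedef int (*Callback)(void *opaque);

/* Recover the int that the caller stored in the opaque pointer. */
static int add_one_callback(void *opaque)
{
    const int value = (int)(intptr_t)opaque;
    return value + 1;
}

/* Hypothetical interface which forwards an opaque pointer to a callback. */
static int invoke(Callback callback, void *opaque)
{
    return (*callback)(opaque);
}

/* Pass an int through the opaque pointer argument. */
static int call_with_int(int value)
{
    return invoke(add_one_callback, (void *)(intptr_t)value);
}
```

With these definitions, call_with_int(41) should return 42. The reverse conversion (pointer to int and back) is deliberately not shown, for the reason given above.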
Rationale (int): We assume that the int type is at least 32 bits wide (see type size assumptions above), so there is no need to use long or int32_t merely to ensure a 32-bit data type. Using int as widely as possible reduces the chance of accidental truncation due to conversion between types of different sizes.
Rationale (sized types): Since we assume int is at least 32 bits wide, there should rarely be a need to specify sized types for local variables. However, sized types can be useful in certain cases, such as:
Rationale (char): While char and int8_t are normally the same internal type, char should be limited to data which is actually textual in nature. For 8-bit numeric data, use int8_t (or uint8_t, but see signed vs. unsigned integers below) to indicate to the reader that the data is numeric. Note in particular that whether char is signed or unsigned depends on the compiler, so using char for a signed 8-bit integer can result in nasty surprises. (For this reason, char variables should only be compared against other data of char type such as character literals, not against integer values.)
Rationale (long and short): Sized integer types make long and short generally unnecessary, but when calling library functions with long- or short-type parameters or return values, it can be more convenient to use those types directly instead of casting back and forth. However, try not to propagate such types outside the immediate locality of the library call.
Rationale: Conversions between signed and unsigned integer types are a perennial source of bugs, so much so that most modern compilers warn about mixing them. The easiest way to avoid these bugs is to not use unsigned types at all. In particular, "this value will never be negative" is not a reason to use an unsigned type; someday the value will go negative, and your code will break.
There are still a few cases in which unsigned types are beneficial:
Rationale (data types): Experience has shown that boolean flags are used sufficiently often to warrant the use of a smaller data type than int when such values are stored in memory. On the other hand, values typically stored in registers, such as function parameters and return values or local variables, do not benefit from using a smaller-sized type. Note that C99 provides the _Bool type, along with the <stdbool.h> header which defines it as bool, but this type is not guaranteed to be link-compatible with the C++ bool type, so it is unsuitable for use in a library such as SIL which may be linked with C++ code.
Rationale (assignment of values): While C treats any nonzero value as true, assigning an arbitrary value directly to a boolean variable can have unexpected results: for example, on a system where long is larger than int, the value LONG_MIN itself is nonzero and therefore true, but assigning it directly to a boolean variable (of type int or uint8_t) will result in a false value due to truncation. It is permissible to copy the value of one boolean variable to another if the first variable's value is known to be safe (either 0 or 1), but do not assume that boolean arguments passed to an interface function have safe values.
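The truncation problem can be sketched with a uint8_t flag, for which the conversion is well defined on every system. The function names below are hypothetical and only serve to contrast the two approaches.

```c
#include <stdint.h>

/* Unsafe: only the low 8 bits of the value are stored, so any nonzero
 * multiple of 256 becomes 0 (false). */
static uint8_t flag_from_value_unsafe(long value)
{
    return (uint8_t)value;
}

/* Safe: the comparison always yields exactly 0 or 1. */
static uint8_t flag_from_value(long value)
{
    return value != 0;
}
```

Here flag_from_value_unsafe(256) yields 0 even though 256 is "true" as a condition, while flag_from_value(256) yields 1.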
Rationale (float): Computations using double values are generally more expensive than computations with float values even on systems with hardware support for double-precision floating point. Some older systems implement double-precision operations in software, in which case the difference in execution time can reach an order of magnitude or more and significantly impact overall program performance as well as code size. For the vast majority of floating-point computations typically performed by SIL-based programs, float provides sufficient precision to ensure correct behavior. Additionally, the consistent use of a single type helps avoid unexpected behavior that can result from mixing precisions, such as loss of precision when a double value is passed through a function which takes a float parameter.
Rationale (long double): Everything above regarding the disadvantages of double applies even more strongly to long double. The range and precision of double are great enough that its use should not introduce any user-visible artifacts; long double is generally suited only to scientific applications, and an argument could even be made that if the precision of double is insufficient for a given application, then the floating-point model itself is inappropriate and the application should use an alternate representation for its numeric values.
An example of appropriate usage of double is in the SIL interface time_now(), which returns a double-precision timestamp in units of seconds. If a float value were returned, it would quickly lose precision to the point of preventing accurate sub-second timing. For example, using an IEEE 754-compliant single-precision type, the timestamp resolution would drop to around 1 millisecond after 8192 seconds (about 2¼ hours); at a frame rate of 60 frames per second, this is more than 5% of a frame, and attempting to use such a low-resolution timestamp for accurate timing could result in noticeable jitter. A double value, on the other hand, provides nanosecond or better resolution over a time span of 8,388,608 seconds (more than 3 months).
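The loss of resolution can be checked with a small experiment, sketched here under the assumption of IEEE 754 float and double types. The helper names are hypothetical and this is not how time_now() is implemented.

```c
/* Return 1 if adding "step" seconds to "timestamp" changes the value
 * when the sum is stored as a float, else 0.  The volatile store forces
 * the sum to be rounded to float even on systems which compute in
 * higher precision. */
static int float_resolves(float timestamp, float step)
{
    volatile float sum = timestamp + step;
    return sum != timestamp;
}

/* The same test using double precision. */
static int double_resolves(double timestamp, double step)
{
    volatile double sum = timestamp + step;
    return sum != timestamp;
}
```

At a timestamp of 8192 seconds, a step of 0.4 milliseconds is lost entirely in a float (float_resolves(8192.0f, 0.0004f) is 0), while double_resolves(8192.0, 0.0004) is 1.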
Rationale (f suffix): It can be easy to forget that floating-point literals are double precision by default, but including a double-precision literal in an expression causes all other operands in the expression to be promoted from single to double precision, even if the expression's value is then assigned to a single-precision variable (in which case the value must then be converted again, from double to single precision). Always include the f suffix on floating-point literals to mark them as single precision, except when the literal is intended to be a double-precision value or is used in a double-precision expression.
Rationale (integer literals): Unlike double-precision literals, integer literals do not cause promotion of floating-point operands, so they are generally safe to use in floating-point expressions, and expressions may be easier to read without extraneous ".0"s on such values. However, bear in mind that if both operands to an operator are integers, the operation will be performed as an integer operation and may consequently overflow; values which have the potential to cause such overflow should be written as floating-point literals, including the f suffix when appropriate.
Exception: It is not necessary (though also not prohibited) to include the f suffix on floating-point literals used in initializers, constant expressions, or function arguments, since such values will be converted to single precision at compile time and thus will have no impact on runtime performance.
Examples:
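The following is an illustrative sketch, not the original example set. The function names are hypothetical; each comment states the precision in which the C conversion rules require the expression to be evaluated.

```c
/* Single-precision multiply: both operands are float. */
static float half_single(float x)
{
    return x * 0.5f;
}

/* x is promoted to double because of the unsuffixed literal, and the
 * product is then converted back to float for the return value. */
static float half_double(float x)
{
    return x * 0.5;
}

/* The integer literal is converted to float, so the division is
 * still performed in single precision. */
static float half_int(float x)
{
    return x / 2;
}
```

Since sizeof does not evaluate its operand, comparing sizeof(x * 0.5f) with sizeof(x * 0.5) is a quick way to confirm the type of each expression on systems where float and double differ in size.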
Rationale (use): C99-style compound literals can be convenient to avoid the extra verbosity of declaring a variable whose only purpose is as an argument to a function; this usage parallels the use of a constructor return value as a function argument in C++. Single-element array literals can also be convenient when a function takes a numeric argument by reference instead of by value.
Rationale (parentheses): Compound literals already require a pair of parentheses and a pair of braces; adding a second enclosing set of parentheses can contribute to "bracket overload" and thus reduce readability. However, keep in mind that without parentheses, a compound literal will be broken into multiple macro arguments at the commas. You shouldn't be using macros anyway, but if you do need to pass a compound literal to a macro, remember to enclose it in parentheses.
Examples:
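A minimal sketch of both uses described above follows. Vector2f, dot(), and read_setting() are hypothetical names chosen for illustration and are not SIL interfaces.

```c
/* Hypothetical structured type. */
typedef struct Vector2f Vector2f;
struct Vector2f {
    float x, y;
};

static float dot(const Vector2f *a, const Vector2f *b)
{
    return a->x * b->x + a->y * b->y;
}

/* Hypothetical function which takes a numeric argument by reference. */
static int read_setting(const int *value)
{
    return *value + 1;
}

/* Compound literals as arguments, with no temporary variables. */
static float example_dot(void)
{
    return dot(&(Vector2f){1, 2}, &(Vector2f){.x = 3, .y = 4});
}

/* A single-element array literal decays to a pointer to its element. */
static int example_setting(void)
{
    return read_setting((int[]){41});
}
```

If dot() were a macro, each compound literal would need its own enclosing parentheses so that the commas inside the braces are not treated as macro argument separators.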
Rationale: While 0 represents a zero value in any type, use of the most appropriate literal (NULL for pointers, '\0' for characters) helps remind the reader of the data type. However, it is not necessary to write 0.0 or 0.0f for floating-point values (see also floating-point literals above).
Rationale: Conditional expressions in if, for, and while statements (as well as subexpressions of the logical operators && and ||) treat any nonzero value as true, so there is no need to explicitly write "expression != 0", and it can often be more readable to omit the comparison against zero. For example, when a zero value or null pointer indicates the absence of an object, it is generally clearer to write "if (object)" (which can be read "if object exists") than to explicitly compare against zero, which can suggest that zero has some special meaning.
Use an explicit zero if the value zero has a specific meaning, such as the first entry in a zero-indexed list. Also use an explicit zero when testing the return value of a comparison function like strcmp() or memcmp() which returns a positive, zero, or negative value to indicate the result of the comparison; an expression like if (strcmp(...)) tempts the reader to think that the test passes if the strings are equal, when in fact it fails for equal strings. (This can be considered another example of "the value zero has a specific meaning". See also failure return values below.)
If you need to store a flag for whether a value is zero, you can use the logical negation operator "!". However, do not use a double negation to test for a nonzero value; explicitly compare against zero instead.
Examples:
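The following sketch is illustrative only and uses hypothetical function names. It shows an implicit test for an absent object, an explicit zero for strcmp(), and an explicit comparison in place of a double negation.

```c
#include <stddef.h>
#include <string.h>

static int names_equal(const char *a, const char *b)
{
    if (!a || !b) {  /* "If either name does not exist" */
        return 0;
    }
    /* strcmp() returns zero for equal strings, so compare explicitly. */
    return strcmp(a, b) == 0;
}

/* Zero has a specific meaning here: the first entry of the list. */
static int is_first_entry(int index)
{
    return index == 0;
}

static int has_items(int count)
{
    return count != 0;  /* Not "!!count" */
}
```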
Note that this differs from some common APIs, including POSIX and some standard C library functions, which use zero to indicate success and -1 to indicate failure.
Rationale: When following the rules for comparing against zero, it is semantically clearer for a true value to indicate success and a false value to indicate failure.
Exception: If zero is within the range of legitimate results for the function, such as for a function which returns an array index, use -1 to indicate failure. But if -1 is also a legitimate result, use a return (pointer) parameter to store the result and return only the success/failure status as the function's return value (using zero to indicate failure).
Examples:
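As a sketch of the two exception cases (the function names are hypothetical): find_value() can legitimately return zero, so it reports failure with -1, while array_get() can legitimately produce any int, so it stores the result through a pointer and returns only the success status.

```c
/* Return the index of the first matching element, or -1 if not found. */
static int find_value(const int *array, int count, int value)
{
    for (int i = 0; i < count; i++) {
        if (array[i] == value) {
            return i;
        }
    }
    return -1;
}

/* Store array[index] in *value_ret.  Return true (nonzero) on success,
 * false (zero) if the index is out of range. */
static int array_get(const int *array, int count, int index, int *value_ret)
{
    if (index < 0 || index >= count) {
        return 0;
    }
    *value_ret = array[index];
    return 1;
}

/* Caller's view: a false return value indicates failure. */
static int get_or_default(const int *array, int count, int index,
                          int fallback)
{
    int value;
    if (!array_get(array, count, index, &value)) {
        return fallback;
    }
    return value;
}
```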
Rationale: It can be convenient to use an assignment in the conditional expression of a control statement, to assign a value and test that value at the same time. However, this can also make it harder to follow the flow of the code, so assignments should only be used when that assignment is the primary purpose of the entire expression. Also, a lone assignment expression in a conditional statement can look like a mistyped equality comparison (and indeed, many compilers will emit a warning along those lines), so always enclose the assignment in parentheses and explicitly compare for inequality to zero.
Examples:
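A minimal sketch of the parenthesized form, using a hypothetical count_occurrences() function: the assignment is the primary purpose of the loop condition, it is enclosed in parentheses, and the result is explicitly compared (against NULL, since the value is a pointer).

```c
#include <stddef.h>
#include <string.h>

/* Count non-overlapping occurrences of "word" in "text". */
static int count_occurrences(const char *text, const char *word)
{
    const size_t length = strlen(word);
    if (length == 0) {
        return 0;
    }

    int count = 0;
    const char *match;
    while ((match = strstr(text, word)) != NULL) {
        count++;
        text = match + length;
    }
    return count;
}
```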
Rationale: In the past, a common C idiom was to replace certain arithmetic operations with bitwise operations, such as replacing multiplication by a power of two with the equivalent left-shift operation, on the (generally correct) theory that bitwise operations execute more quickly than arithmetic ones. Modern compilers are perfectly capable of performing this optimization themselves, so there is almost never any need to resort to this hack. (This can also lead to subtle bugs due to precedence errors.)
In the specific case of signed division, using a right shift instead of an arithmetic division will actually change the behavior of the program, since right-shifting a negative value on a system using two's-complement integer representation will round the value away from zero (toward negative infinity), while division is defined to round the value toward zero: -7 / 2 is -3, but -7 >> 1 is typically -4. (Note that the C99 standard declares the result of right-shifting a negative value to be implementation-defined, so it is unwise to rely on that behavior even if it were desired.)
Exceptions: Bitwise shifts are allowed if you only have the shift count to work with, as in the "num_sectors" example below. Bitwise operators may also be used if they provide a significant performance benefit, such as in a tight loop where the compiler fails to optimize an arithmetic operation; but even in cases where the use of bitwise operators can make a difference, don't use them unless you've profiled the code and you're certain that the arithmetic operator is a significant bottleneck. Remember that premature optimization is the root of all evil.
In some cases, it may not be clear whether an operation is semantically arithmetic or logical. For example, computing the page number from a memory address could be interpreted as either arithmetically dividing by the page size or logically extracting the bits containing the page number. Use your best judgment in such cases, but don't stretch for a logical-operation interpretation just to make an excuse to use bitwise operators.
Examples:
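The original examples are not reproduced here; the following is a hedged sketch with hypothetical names. In particular, total_bytes() and its sector_shift parameter are assumptions made for illustration and may not match the "num_sectors" example mentioned above.

```c
/* Arithmetic intent, so use the arithmetic operator.  C99 division
 * rounds toward zero, so half(-7) is -3. */
static int half(int value)
{
    return value / 2;
}

/* Only the shift count (log2 of the sector size) is available, so a
 * shift is the natural way to express the multiplication.  The result
 * is assumed to fit in an int. */
static int total_bytes(int num_sectors, int sector_shift)
{
    return num_sectors << sector_shift;
}

/* Logical intent: extract the low four bits of a flag word. */
static int low_bits(int flags)
{
    return flags & 0xF;
}
```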
Rationale: const can help prevent errors resulting from accidentally assigning to the wrong variable; as a bonus, it generally helps the compiler optimize better. Use it whenever you initialize a variable that won't be changed, such as when saving the result of a function call.
const can be applied at all pointer levels of a pointer variable, but often one const is good enough. (Also note that some library functions expect const at some levels but not at others, and for multi-level pointers, the constness of each level has to match.)
Examples:
Rationale: In C, file-scope constants are not folded, so defining such a constant in a header using static const would emit a copy of the constant in every object file including the header, needlessly wasting space. Scalar constants should therefore be defined using either #define (but be careful when using macros) or enum; the advantage of enum is that the symbol is included in debug information and can be referenced in a debugger, while the disadvantages are that the syntax is slightly more obscure and that floating-point values can't be used. Constants local to a function, on the other hand, can usually be compiled directly into the instruction stream as a register load, so there is no problem with just using const. (In this case, static is unnecessary and could potentially waste space in the object file.)
Examples:
Rationale: Although "considered harmful" by some—and indeed, injudicious use of goto can greatly impair code maintainability—the goto statement is useful in consolidating error-handling logic in a single location, and it should be used in preference to repeating the same cleanup code over and over.
Examples:
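A minimal sketch of consolidated cleanup follows. The functions copy_pair() and copy_pair_roundtrip() are hypothetical; the point is that every failure path jumps to one label, which releases whatever has been allocated so far (free() accepts a null pointer).

```c
#include <stdlib.h>
#include <string.h>

/* Copy two strings into newly allocated buffers.  Return true on
 * success, false if either allocation fails. */
static int copy_pair(const char *a, const char *b,
                     char **a_ret, char **b_ret)
{
    char *a_copy = NULL;
    char *b_copy = NULL;

    a_copy = malloc(strlen(a) + 1);
    if (!a_copy) {
        goto error;
    }
    b_copy = malloc(strlen(b) + 1);
    if (!b_copy) {
        goto error;
    }

    strcpy(a_copy, a);
    strcpy(b_copy, b);
    *a_ret = a_copy;
    *b_ret = b_copy;
    return 1;

  error:
    free(b_copy);
    free(a_copy);
    return 0;
}

/* Usage: copy both strings, verify the copies, and release them. */
static int copy_pair_roundtrip(const char *a, const char *b)
{
    char *a_copy, *b_copy;
    if (!copy_pair(a, b, &a_copy, &b_copy)) {
        return 0;
    }
    const int ok = (strcmp(a_copy, a) == 0 && strcmp(b_copy, b) == 0);
    free(a_copy);
    free(b_copy);
    return ok;
}
```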
SIL provides (in base.h) three macros which can be used for checking assertions: ASSERT(), PRECOND(), and STATIC_ASSERT(). At present, the first two macros are essentially identical, but PRECOND() is intended for checking function preconditions, and further use of the macro may be made for that purpose in the future, so prefer PRECOND() over ASSERT() when checking a function argument against a precondition; use ASSERT() for all other cases. STATIC_ASSERT() is for the specific case of assertions which can be evaluated at compile time, such as checking the size of a structure against an expected value.
Rationale (assert early and often): Programmers are only human, and errors will creep into any nontrivial code. Assertions provide a way to check for errors at compile time or runtime before those errors cause crashes, data corruption, or other serious problems.
Rationale (only impossible conditions): By writing an assertion, you are declaring (asserting) that the asserted expression must be true under every possible condition. If there is any possible condition under which the expression might be false, no matter how unlikely, do not use an assertion. For example, never assert that a memory allocation has succeeded, because there's always the possibility that the program will run out of memory (or address space) and the allocation will fail. In such cases, always implement and test proper error handling.
Exceptions (only impossible conditions): You do not need to consider hardware errors such as memory or register corruption when deciding whether a condition is possible; for example, if an interface function checks the value of an argument, its helper functions do not also need to make the same check. You may also assume that system calls and other external functions behave according to their documentation—you may still need to work around bugs in such functions, but you don't need to assume the presence of such bugs until they make themselves known. (Conversely, don't omit a check for a documented failure condition just because the current implementation of the function doesn't generate that condition; or if you do omit it because handling the failure would be complex and difficult to test, make sure to clearly document that fact.)
Rationale (prefer static assertions): Static assertions are checked during the build process and will cause compilation to fail if the assertion does not hold. Since these do not rely on executing a specific code path to test the assertion, you should always use them when possible.
Rationale (include fallback actions): The ASSERT() and PRECOND() macros accept an optional fallback action, which is a statement (or multiple statements) that will be executed if the assertion fails and the program is not running in debug mode. While there is a school of thought which argues that the program should always abort on assertion failure because the internal state has left the designed bounds and further behavior cannot be predicted—and indeed, hard failure can be more appropriate than graceful fallback in security-critical situations—it is often feasible to perform some sort of recovery short of terminating the entire program. For example, if a function which expects a valid file handle receives a null pointer, the function can simply return an error state as it would for an actual error with a valid file handle. This may result in the program terminating itself with an error message, but even that is more user-friendly than simply crashing. (STATIC_ASSERT() does not include a fallback action parameter because the assertion is checked at compile time, so no fallback is necessary. Instead, the macro's second argument is an error message which will be displayed if the compiler supports C11-style static assertions, and which serves as documentation of the assertion in any case.)
Rationale (avoid complex fallback actions): By their very nature, fallback actions for assertions cannot be tested like other code, since (assuming the program does not have any relevant bugs) the asserted condition will never fail, and if it did, the test (which runs in debug mode) would terminate anyway. For this reason, fallback actions should be extremely simple; often, a single return statement is sufficient. In cases where there is no simple way to recover from an assertion failure, prefer to omit the fallback action entirely, especially if there are no serious consequences from the failure.
Examples:
Macros are permitted when they serve a purpose which is difficult or impossible to accomplish otherwise. However, be especially careful of unintended side effects when writing a macro (see the examples below).
Note that many function-like uses of macros—specifically, those which do not include control statements like return that escape the scope of the macro and whose parameters take specific types—can be replaced with static inline functions at no cost to performance. Doing so both avoids the potential problems of macros and allows the compiler to perform its usual type-checking.
Rationale: Preprocessor macros are a powerful metaprogramming tool, but that power can easily hurt readability. Since macros are expanded before the source code is parsed, it's easy to write a macro that has unintended consequences, and it can be difficult to figure out exactly what those consequences were.
Examples:
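One classic unintended side effect is double evaluation of a macro argument. The sketch below uses hypothetical names (MAX_UNSAFE, max_int(), next_value()) to contrast a macro with an equivalent static inline function.

```c
/* Unsafe: whichever argument is larger gets evaluated twice. */
#define MAX_UNSAFE(a, b)  ((a) > (b) ? (a) : (b))

/* Safe replacement: each argument is evaluated exactly once. */
static inline int max_int(int a, int b)
{
    return a > b ? a : b;
}

/* Function with a side effect: counts how often it is called. */
static int call_count;
static int next_value(void)
{
    call_count++;
    return 10;
}

static int calls_made_by_macro(void)
{
    call_count = 0;
    (void)MAX_UNSAFE(next_value(), 5);
    return call_count;
}

static int calls_made_by_function(void)
{
    call_count = 0;
    (void)max_int(next_value(), 5);
    return call_count;
}
```

With these definitions, calls_made_by_macro() reports two calls to next_value(), while calls_made_by_function() reports one.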
When including SIL headers in a source file, order the headers componentwise alphabetically by full pathname, excluding the .h filename extension. As a corollary, each header should declare all external types it references, except for those defined in src/base.h (which will always be included first).
If you need to include any system headers, list them after all SIL headers. It may be useful to further subdivide these into standard system headers and headers for specific system libraries.
Rationale (full pathnames): Using the full pathname of a header tells the reader immediately where the header is located; a relative pathname would force the reader to check the location of the source file and manually resolve the relative path. Additionally, if a system header happens to have the same name as a header you create, the compiler may include the system header instead of yours if you give only the filename in the #include directive.
Rationale (SIL headers first): Including all SIL headers before any system or other external headers ensures that SIL headers do not attempt to make use of types or other declarations from those external headers. For example, including <stdio.h> at the top of a source file would mask compiler errors from uses of the FILE type in any subsequently included SIL headers.
Exception: Use relative pathnames instead of full pathnames for nested includes in public headers, to avoid requiring particular compiler flags for client code.
Example:
Rationale: Including a header file inside another header file just to get the declaration of a structured type forces all users of the header to pay the cost of loading the nested header. Instead, when possible, use forward declarations of struct and union types. (This generally means you'll need to use "struct type" or "union type" instead of just the type name in function declarations.) Since C++ doesn't allow forward declarations of enums, headers which reference enum types and which may be included from C++ code (for SIL, this means all public headers) will have to use nested includes for such types.
Typically, the typedef should precede the definition of the structured type itself, so the type name can be used within the definition (such as when defining a "next" pointer for a list). However, C++ does not allow referencing an enum before it has been defined, so in that case, the typedef must follow the enum (or the enum must be defined within the typedef statement).
Rationale: In C++, all tags for structured types (including class, struct, union, and enum) are automatically defined as type names, but in C, an explicit typedef is required for each type. Since C++ will not complain about such typedef statements, they should be included for all structured types visible to C code.
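The ordering described above can be sketched as follows. ListNode and Color are hypothetical types used only for illustration.

```c
#include <stddef.h>

/* The typedef precedes the definition, so the type name can be used
 * for the "next" pointer inside the structure. */
typedef struct ListNode ListNode;
struct ListNode {
    ListNode *next;
    int value;
};

/* For enums, the definition goes inside the typedef statement. */
typedef enum Color {COLOR_RED, COLOR_GREEN} Color;

static int list_length(const ListNode *head)
{
    int length = 0;
    for (const ListNode *node = head; node; node = node->next) {
        length++;
    }
    return length;
}

/* Usage: build a two-node list on the stack and measure it. */
static int example_length(void)
{
    ListNode second = {.next = NULL, .value = 2};
    ListNode first = {.next = &second, .value = 1};
    return list_length(&first);
}
```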
Rationale: As noted in the GCC documentation for the __builtin_expect() intrinsic used to implement branch hints, "programmers are notoriously bad at predicting how their programs actually perform", and what seems like an "obvious" optimization may in fact hurt performance due to things like unexpected calling patterns or CPU idiosyncrasies. However, when checking for errors from system or low-level functions (such as when allocating memory) or verifying function parameters, marking a comparison with UNLIKELY() serves as documentation that the comparison is testing for an exceptional condition without requiring the reader to understand the details of the comparison, while also providing slightly better performance on the non-failing code path.
Rationale: UTF-8 is currently the de facto standard for text encoding, and full Unicode (including L'...' character values) is supported by at least GCC and recent versions of Clang. However, support is by no means universal, so try to avoid characters outside the ASCII range in non-comment source code, and test extensively if you do use them.
Rationale: 80 columns has proven to be a good balance between avoiding unnecessary wrapping and keeping the text narrow enough to scan easily (that is, without forcing the eyes to move back and forth on each line). 80 columns is also a fairly standard width for terminal programs and editors. However, some such programs have trouble with lines that are exactly 80 columns long (for example, Emacs will wrap the 80th character to the next line when using an 80-column display); for this reason, lines should be kept to 79 characters when possible.
Exceptions:
The SIL source tree includes an Emacs directory-local variable list (.dir-locals.el) which causes the Emacs editor to use the proper indentation settings.
Rationale: Four columns is enough to clearly indicate the nesting depth at a glance, without being so wide that it pushes reasonably nested code off the edge of the screen. (Corollary: If code is indented so much that the line length limit becomes a problem, the nesting level is too deep.) Four columns is also divisible by two to provide an intermediate indentation for labels.
Examples:
Rationale: There is little consensus between editor programs on the width of a tab stop; thus, to properly read code indented with tabs, the reader of the code must make a special effort to configure their software properly. It's far preferable for the (single or few) writers of source code to make the effort to use spaces rather than force the (many) readers to change their editor settings for each program's source code they view.
Rationale: It can be easy to overlook extra statements on the same line, especially when they are infrequent.
Exception: If all cases in a switch will fit on one line each and contain no more than one statement (excluding break, return, or goto), the statements may be moved to the same lines as their respective case labels. In this case, do not outdent the case labels.
Examples:
Rationale: Whitespace improves readability when used in moderation. Omitting whitespace around member reference operators and unary operators emphasizes their tighter binding.
Exceptions:
Examples:
Exception: It's okay to omit spaces after commas in nested function calls, as long as doing so doesn't hurt readability.
Examples:
Rationale: Some combinations of operators are particularly susceptible to precedence errors:
The compiler will generally emit a warning if parentheses are missing in any of the cases listed above.
Examples:
Rationale: While C does not require parentheses around the arguments to defined or (when the argument is a variable) sizeof, those keywords act like functions in that they return values*, so uses of those keywords should be styled like function calls. return, on the other hand, does not behave like a function (it doesn't generate a value, and you couldn't put it on the right side of an assignment operator), so it shouldn't be styled like one.
*Technically, defined doesn't "return a value" since it's not recognized by the compiler at all, but the preprocessor translates it into a boolean value, so it's the same sort of beast.
Examples:
Rationale (mandatory braces): Failing to use braces with control statements can easily lead to bugs, such as when attempting to add a second statement to an if without a block.
Exception (opening brace): The opening brace can be moved to the next line if it doesn't fit on the same line, or to avoid confusion between a continued line of the control statement and the first line of the nested block when the two lines have similar indentation (see the second for example below).
If a block is long, it can be useful to annotate the closing brace with the control statement that began the block (see the while example below).
Examples:
Rationale (optional braces): Unlike other control statements, the use of braces in switch statements has no effect on control flow. In general, use braces when you need to define variables local to that case.
Rationale (use of FALLTHROUGH): It can be hard to tell at a glance whether a missing break statement is intentional or not. Documentation helps reassure the reader of the intended behavior, suppresses "code falls through" warnings from modern compilers, and also avoids the risk that someone (maybe even you) will accidentally insert a break during a code cleanup session.
Examples:
Rationale: Using an explicit dereference operation makes it clear to the reader that the thing being called is a function pointer and not an actual function.
Exception: Function pointers accessed through a structure do not need to be explicitly dereferenced if they are used like C++ instance methods.
Examples:
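A short sketch of both the rule and its exception follows. UnaryFunc, apply(), and Counter are hypothetical names used for illustration.

```c
/* Hypothetical callback type. */
typedef int (*UnaryFunc)(int value);

static int negate(int value) {return -value;}

static int apply(UnaryFunc func, int value)
{
    return (*func)(value);  /* Explicit dereference: func is a pointer. */
}

/* Hypothetical structure whose function pointer is used like a C++
 * instance method, so no explicit dereference is needed. */
typedef struct Counter Counter;
struct Counter {
    int count;
    int (*increment)(Counter *counter);
};

static int counter_increment(Counter *counter)
{
    counter->count++;
    return counter->count;
}

static int count_twice(void)
{
    Counter counter = {.count = 0, .increment = counter_increment};
    counter.increment(&counter);
    return counter.increment(&counter);
}
```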
Rationale (explicit void): In C (as opposed to C++), an empty parameter list means that the function's parameters are unspecified. This prevents the compiler from checking the number and types of parameters at call sites, so functions which take no parameters should have an explicit void to indicate that fact to the compiler.
Rationale (opening brace on following line): Putting the brace on its own line gives an additional visual indication that the brace starts a new function.
Exception: If the function is both very short (1-2 lines) and defined with static linkage, it is acceptable to put the definition's opening brace on the same line as the function declaration. If the function body fits on the same line as the declaration, the entire function may be written on one line.
Examples:
Rationale: Using a consistent order for function definitions helps readers follow the program structure.
Exception: Short local functions which are only used in one place, such as a callback function whose pointer is passed to an external API, may be defined immediately before the function that uses them.
SIL code generally prefers the top-down style, but since C requires local (static) functions to be declared before they are used, each local function must be declared and defined at separate locations in the same file, which can get somewhat repetitive. Test sources in particular tend to put "helper" functions at the top, followed by the actual test cases (the "interface" functions by this rule), thus obviating the need for separate declarations.
When using inline comments on multiple consecutive or nearby lines, align the starting columns of the comments as long as doing so doesn't insert an inordinate amount of space between the code and the comment.
Examples:
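The following sketch shows aligned inline comments on nearby lines. The Particle structure and its fields are hypothetical and serve only to demonstrate the alignment.

```c
/* Hypothetical structure; field names are illustrative only. */
typedef struct Particle {
    float x, y;      // Position, in pixels.
    float dx, dy;    // Velocity, in pixels per second.
    float lifetime;  // Remaining lifetime, in seconds.
} Particle;

static void particle_update(Particle *particle, float dt)
{
    particle->x += particle->dx * dt;  // Advance the position by the
    particle->y += particle->dy * dt;  // distance covered during dt.
    particle->lifetime -= dt;          // Expired once this reaches zero.
}
```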
Rationale (standard header format): Including a header of a fixed format allows readers to quickly determine the purpose of and calling pattern for a function without having to read the function's code.
Rationale (include at declaration): Keeping the header with the declaration allows readers to easily browse the list of declared functions, such as in a header file, without having to scan over the implementing code for each function.
Rationale (do not include at definition): Experience has shown that keeping two copies of the function header will quickly lead to documentation desyncs as function signatures change. This naturally does not apply to static functions which do not have a separate declaration (see order of function definitions above).
Exception: The full header may be omitted for short local functions which have no separate declaration and are defined close to their place of use, provided that a short comment describing the purpose of the function is included instead.
See the source code for details of the function header format. Note that the SIL function header format deliberately does not use markup tokens for tools like Doxygen; the headers are intended to be easily perusable by someone looking directly at the source code, without requiring separate tools to interpret the comments.
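As a rough illustration of the placement rules only, the sketch below attaches a plain-text header to a declaration and leaves the definition without a second copy. The layout of the header comment and the clamp_int() function are assumptions made for this example; the authoritative format is the one found in the SIL source code.

```c
/**
 * clamp_int:  Bound a value to the given range.
 *
 * [Parameters]
 *     value: Value to bound.
 *     lower: Lower bound (inclusive).
 *     upper: Upper bound (inclusive); must not be less than lower.
 * [Return value]
 *     value bounded to the range [lower, upper].
 */
extern int clamp_int(int value, int lower, int upper);

/* The definition carries no second copy of the header, so there is only
 * one place to update if the signature changes. */
int clamp_int(int value, int lower, int upper)
{
    if (value < lower) {
        return lower;
    }
    if (value > upper) {
        return upper;
    }
    return value;
}
```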
In general, an identifier's name should immediately tell the reader the purpose of the identifier, but it should be concise enough that its length does not obscure the structure of the code. For example, LIMIT would be a poor name for a global constant; the name tells us nothing about what sort of limit it is. But the same LIMIT might make perfect sense in a short function whose sole purpose was to bound its parameter to be less than a certain value, and indeed a longer name would serve no purpose except to clutter up the code.
Similarly, a file's name should make the file's purpose clear to someone looking at a directory listing, but should not be so long as to clutter log messages (which include the name of the source file which generated the message). In the case of filenames, it's acceptable to include the directory path when determining whether a filename is "clear"; for example, resource/core.c clearly refers to core functionality for resource management, and does not need to be expanded to resource/resource-core.c.
Avoid overusing abbreviations, since they can reduce readability by forcing the reader to stop and mentally expand the abbreviation each time the identifier is used. For example, in a function that uses a variable to hold a count of objects, nobj would be a poor name for the variable since its meaning is not immediately obvious to a reader unfamiliar with the code. num_obj would be better, but unless the variable is heavily used throughout the function, num_objects is more friendly to the reader. However, number_of_objects would be unnecessarily verbose, since a num_ prefix is generally understood to mean "number of".
Single-letter and similarly short variable names should be avoided except in cases where their meaning is obvious and generally accepted. For example, i is widely accepted as a loop index variable and may be used in that context, but it should not be used for a temporary variable, even in a limited scope. Similarly, short names for types or functions are acceptable when they are clearly derived from similar names in the standard libraries.
Examples:
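A short sketch applying the naming guidance above; both functions are hypothetical and exist only to show the names in context.

```c
/* Hypothetical example: count the non-null entries in an array. */
static int count_objects(void *const *objects, int array_size)
{
    int num_objects = 0;  // Rather than "nobj" or "number_of_objects".
    for (int i = 0; i < array_size; i++) {  // "i" is fine as a loop index.
        if (objects[i]) {
            num_objects++;
        }
    }
    return num_objects;
}

/* Hypothetical example: bound a value to be less than a fixed limit. */
static int bound_value(int value)
{
    const int LIMIT = 100;  // Clear enough in a function this short.
    return value < LIMIT ? value : LIMIT - 1;
}
```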
Functions can be broken down into three major groups:
If a function fits in two or more groups, name it based on the one which best represents the function's overall behavior—though that may also be a sign that the function is too complex and should be refactored into multiple smaller functions (also see below regarding function complexity).
Rationale (part of speech): Using an appropriate part of speech gives the reader a useful hint as to the function's behavior without forcing the reader to check the function's documentation. For example, an accessor function named using a noun phrase reassures the reader that the function does not modify an object passed to it.
Rationale (prefixes): Prefixes or namespaces provide the reader with information about the type of data processed or operation performed by the function, again reducing the need for the reader to consult the function's documentation.
Examples:
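The sketch below shows how part of speech and a common prefix can signal a function's behavior. The Texture structure and all three function names are hypothetical; only the noun-phrase convention for accessors is stated in the text above, and the verb-phrase name for the modifying function is an assumption made for contrast.

```c
/* Hypothetical type; the "texture_" prefix tells the reader which kind
 * of data each function operates on. */
typedef struct Texture {
    int width, height;
    int locked;
} Texture;

/* Accessors: noun phrases, so the reader can expect that the object is
 * not modified. */
int texture_width(const Texture *texture)
{
    return texture->width;
}

int texture_height(const Texture *texture)
{
    return texture->height;
}

/* Assumed convention: a verb phrase for a function which changes state. */
void texture_lock(Texture *texture)
{
    texture->locked = 1;
}
```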
Use the filename extensions listed below for each source file type:
Rationale (filename extensions): While not strictly required on modern operating systems, the filename extension is an accepted way to inform the user of the type of content in the file. Some programs (including compilers and IDEs) also use the file extension to guess the file's content type, and using a nonstandard extension would confuse such programs to the detriment of the user.
Rationale (unique names): All source files are compiled to object files with the same filename extension (typically .o). If two source files in the same directory have the same name but a different extension, their object files would collide, breaking the build. If it is necessary to have two source files in different languages with the same purpose (for example, when implementing a C++ interface to C functions), use the base filename for the source file with the most nontrivial code, and rename other files to avoid object file collision. For example: utilities.c, utilities-cxx.cc, utilities-objc.m
Rationale (non-alphanumeric characters): Non-alphanumeric characters may have special meanings to some systems, preventing files whose names contain those characters from being used properly (or at all!) on such systems. For example, quote characters are used on many systems to enclose filenames containing spaces; conversely, spaces are used on most systems to separate command arguments, and including a space in a filename can cause builds to break in unexpected ways. The only symbols accepted as safe across all systems are the hyphen and underscore. Non-ASCII characters should also be avoided because some users' systems may not be able to display them properly.
Note that due to limitations of the POSIX-style library archive (.a) format, code intended to be compiled into a static library on POSIX systems must also ensure that no two files which will be included in the library have the same name exclusive of directory name; thus, dir1/file.o and dir2/file.o cannot be included in the same static library. SIL is not designed to be compiled into a static library and thus does not follow this rule.
Examples:
C++ reserves a number of keywords which can also be reasonably used as identifier names; for example, try could be a counter for an operation which may need to be retried several times, and this can be used as an object pointer when implementing instance-method-like functions in C. As long as the names are appropriate for the uses to which they are put, they may be freely used in C code.
However, care is needed when such identifiers appear in header files, such as when used as structure field names. In this case, renaming the identifier is usually best, but if the identifier does not need to be referenced by C++ code (for example, if it is a parameter name in a function declaration), it is also permissible to bracket the header with a #define/#undef pair:
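Such bracketing might look like the following sketch. The Widget type, the widget_value() function, and the replacement name this_ are hypothetical, and the __cplusplus guard is an assumption about how the renaming would be limited to C++ compilation.

```c
/* ---- Hypothetical header contents ---- */

/* When the header is read by a C++ compiler, rename the parameter so
 * that the keyword is never seen as an identifier. */
#ifdef __cplusplus
# define this this_
#endif

typedef struct Widget {
    int value;
} Widget;

extern int widget_value(const Widget *this);

/* Remove the macro so that C++ code following the header is unaffected. */
#ifdef __cplusplus
# undef this
#endif

/* ---- Hypothetical C source file ---- */

/* C code remains free to use "this" as an ordinary parameter name. */
int widget_value(const Widget *this)
{
    return this->value;
}
```

This works only because the parameter name is never referenced by C++ code; a structure field named this would still have to be renamed.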
Rationale: Comments should provide additional information (in other words, "commentary") to the reader. Assume for this purpose that the reader understands the programming language better than you do, so any comments that simply state what the code is doing are useless. Instead, comments should explain why the code is written the way it is, to provide insight to a reader who does not understand the problem you are trying to solve. As a corollary, if you do feel the need to explain what the code is doing, it probably means the code itself is poorly written and should be fixed.
Notwithstanding the above, it can be useful to precede a longer block of code with a comment which summarizes the logic contained in the block. While such comments would ordinarily be discouraged as "what" comments (as opposed to "why" comments), they can help the reader quickly skim through larger functions without having to read the entire function, much like subheadings in a technical document. Other code styles recommend appropriately named subroutines for this purpose, but SIL style prefers not to extract blocks of logic unique to a particular function (see below regarding function complexity).
Examples:
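To contrast a "what" comment with a "why" comment, consider the sketch below. The scenario (a stereo sample buffer with one padding value) and the function names are invented for illustration.

```c
/* Poor: the comment merely restates the code. */
static int buffer_length_poor(int num_samples)
{
    // Multiply num_samples by 2 and add 1.
    return num_samples * 2 + 1;
}

/* Better: the comment explains why the code is written this way. */
static int buffer_length(int num_samples)
{
    // Each sample holds two values (left and right), and the hypothetical
    // interpolator reads one value past the last sample, so reserve one
    // extra value of padding.
    return num_samples * 2 + 1;
}
```

The two functions are identical in behavior; only the second tells a later maintainer what would break if the padding were removed.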
Rationale: At a high level, functions can be considered the basic building blocks of programs. Accordingly, each function should do exactly one thing, and do it well; a reader should be able to look at a high-level algorithm and tell what it does just by reading the names of the functions it calls (see also the function naming rules). In particular, a function should never have side effects which are not obvious from the function's name.
Functions also enforce encapsulation of data, reducing the risk of unintended interactions between separate blocks of logic and increasing reusability of the code. Any algorithm which is repeated in more than one function (or more than once in the same function) is a good target for extraction into a separate function.
On the flip side, breaking a function up into several subfunctions forces a reader who wants to follow the code flow to jump back and forth between different places in the source code, which can lead to cognitive overload. For this reason, SIL style does not enforce a maximum size on functions, and instead prefers to keep code sequences unique to a particular function within that function, particularly when those sequences are short relative to the function as a whole. In cases where this results in a particularly large function, short summary comments above each block of code can help readers skimming the code to understand the logic more quickly (see above regarding use of comments).
While you should try not to introduce unnecessary computational complexity (for example, using a cubic-time algorithm when a quadratic-time algorithm is available), neither should you take "shortcuts" or "clever hacks" to cut down on execution time unless you have hard data demonstrating that such optimizations are of significant benefit to the program (or library) as a whole.
Rationale: This rule could also be phrased as, "Premature optimization is the root of all evil." The history of software development is littered with cases of programmers expending effort on optimizing routines which make no significant contribution to execution time in the first place—and introducing new, hard-to-find bugs as a result of their supposedly "clever" optimizations. Don't repeat their mistakes.
Examples:
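One well-known instance of a "clever hack" is the XOR swap. The sketch below, using hypothetical function names, compares it with the straightforward version.

```c
/* Straightforward swap: obvious to the reader and correct for all
 * inputs, including the case where a and b point to the same object. */
static void swap_ints(int *a, int *b)
{
    const int temp = *a;
    *a = *b;
    *b = temp;
}

/* "Clever" XOR swap: avoids the temporary variable, but if a and b
 * point to the same object, the first statement zeroes the value and
 * the original contents are lost. */
static void swap_ints_xor(int *a, int *b)
{
    *a ^= *b;
    *b ^= *a;
    *a ^= *b;
}
```

Without measurements showing that the temporary variable is a real cost, the first version is the one to write.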