last revised March 6, 2011
effective starting March 6, 2011
Introduction
Overview
Comments
Headers
Constants
Variables
Functions
Structs and Typedefs
Blocks
Security
Error Handling
Modules
Libraries
Makefiles
Sample Program
This document describes source code standards for C/C++ programs developed at Cliffson Solutions, LLC. Unless superseded by client requirements, these standards apply to all software developed by employees and subcontractors of this company for both internal use and client projects.
The purpose of these standards is to bring consistency to the structure of the source code files, including formatting and naming conventions. C and C++ compilers allow a large latitude in how the source code is organized. Although this flexibility has the benefit of requiring only relatively few, simple grammar rules, if some rules for structure and convention are not applied, the code can become very difficult to read and maintain. In fact, an entire group of programmers has taken this to extremes for their own amusement (see the book Obfuscated C and Other Mysteries, Don Libes, Wiley, 1993).
Most clients only care if the code does what it is designed to do, not how it is formatted. However, the practical consideration is that well-formatted, well-documented, and well-organized code is considerably easier to maintain and enhance, saving both the client and developers time and money in the long run. Given that a full-blown application (not just a one-use throw-away utility) may require on average as much as four times the original development time in debugging, long-term maintenance, and feature tweaks over its lifetime, an investment in creating good source code initially is easily paid back. Moreover, applying some discipline to how the code is organized, particularly in variable naming conventions, tends to encourage better logic and fewer side-effects and bugs.
We readily acknowledge that many of the conventions described here are strictly preferences and that coding formats and naming conventions can ignite religious debates on par with the original Crusades. However, the conventions described here have been developed based on decades of writing code and noting what works and what does not. In some cases these may result in a bit more typing, but any programmer worth his/her salt will 1) already be a decent typist, and 2) realize that clear coding can save at least an order of magnitude in time trying to find functional blocks and in debugging.
We have adopted a slightly modified Allman-BSD format for block formatting, the slight modification being two-space indents instead of the traditional four spaces (two spaces are sufficient to visually indicate blocks without using an inordinate amount of horizontal real estate). This means that both open and close curly brackets that delineate blocks appear on lines by themselves. Many programmers use the compacted style popularized by the fathers of C, Kernighan and Ritchie (The C Programming Language, Kernighan and Ritchie, Prentice Hall, 1978, 1988), which places the opening curly bracket at the end of the line immediately before the start of the block. However, because modern display terminals are much faster and more flexible and printing technology is faster and cheaper than that available in 1978, such compression is no longer really necessary, and putting the opening bracket on an extra line by itself is well worth the improvement in readability. See the section on Blocks below for more details.
We also use one- and two-letter prefixes on all variable names to indicate their types. Several other naming schemes and even other programming languages also do this, and the benefit of tracking, in the name itself, whether a variable is a pointer or an integer is well worth the time taken to type an extra letter or two each time it is referenced. Because part of C's power (and frustration) is loose type checking, keeping track of variable types and avoiding undesired operations (e.g., raising a pointer value to an exponent) requires unflagging care and attention during coding. Revisiting code after a prolonged absence where these conventions are not used requires a fair amount of time to rediscover what is doing what.
Moreover, such variables can be easily tied together by their base names: pbuf is a pointer to the string sbuf, and ibuf is a counter or index into the same character array. Using multiple indices in nested loops is not nearly as difficult if each index is named after its use. See the section Variables below for more details.
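As a minimal sketch of how shared base names read in practice, the hypothetical function below (fnCount_Blanks is not part of these standards) uses sbuf, pbuf, and ibuf for a string, a pointer into it, and an index into it:

```c
#include <string.h>

int fnCount_Blanks(const char *ptext)
{
  // copy up to 79 characters of ptext into a local buffer and count the
  // blanks in the copy; return the count, or -1 if ptext is NULL
  char *pbuf;
  char sbuf[80];
  int  ibuf;
  int  icount = 0;

  // sanity checks
  if (!ptext)
    return -1;

  strncpy(sbuf, ptext, sizeof(sbuf) - 1);
  sbuf[sizeof(sbuf) - 1] = '\0';

  // ibuf is the index into sbuf; pbuf is the pointer into sbuf
  for (ibuf = 0; sbuf[ibuf] != '\0'; ibuf++)
  {
    pbuf = &sbuf[ibuf];
    if (*pbuf == ' ')
      icount++;
  }

  return icount;
}
```

Because all three names share the base buf, a reader can tell at a glance that they refer to the same character array and what role each one plays.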
All code is written assuming an 80-column display. Lines longer than 80 characters must be broken into multiple lines at logical locations. For example, a long printf() statement with multiple arguments may be broken into more than one line at any comma separating those arguments. Line continuation characters (\) may be used if appropriate and necessary. See the section Blocks for more information.
Traditional C defines /* ... */ as comment delimiters. C++ introduced the use of the // comment prefix, but most modern C compilers will accept both styles even if the source code is pure C. Because the /* ... */ is easily applied to disable large chunks of code, we reserve its use for that purpose and encourage the use of multiple asterisks to draw attention to it. The // prefix is used to initiate comments, with each comment line requiring that prefix.
Comments that start with // are indented to the current level of code indenting.
Example:
int fnTest(int itest)
{
  // this is a test function that returns the square of the supplied int
  // if error, return NULL; otherwise return 0

  // sanity checks
  /*****************************
  /* disabled for now
  if (!test || (itest < 0))
    return NULL;
  *****************************/

  return itest * itest;
}
We define three types of headers: headers at the top of all source code files, headers that separate major blocks of code, and function headers.
File Headers:
All source code files (*.c), including the main program code (containing the main( ) block), local function or library code (*.c as well), and header files (*.h), are required to contain file headers. The lines of these headers are delimited with double slashes (//), and must contain the following:
An example of a header for a two-file project:
//////////////////////////////////////////////////////////////////////////
//  test.c                                                              //
//                                                                      //
//  a program to demonstrate bubble sort algorithm; demonstration is    //
//  based on constants defined in test.h, and requires no user input;   //
//  any command line parameters entered are ignored; output is text     //
//  to stdout; running this program invokes two bubble sort functions,  //
//                                                                      //
//  Written 2/27/2011, L.Pritchett, Cliffson Solutions, LLC             //
//  Revised 2/27/2011, L.Pritchett, Cliffson Solutions, LLC             //
//////////////////////////////////////////////////////////////////////////
The corresponding header file:
//////////////////////////////////////////////////////////////////////////
//  test.h                                                              //
//                                                                      //
//  header file to define global constants and functions for test.c;    //
//  refer to test.c for additional comments and descriptions            //
//                                                                      //
//  Written 2/27/2011, L.Pritchett, Cliffson Solutions, LLC             //
//  Revised 2/27/2011, L.Pritchett, Cliffson Solutions, LLC             //
//////////////////////////////////////////////////////////////////////////
Dates should be in the format shown above, with four-digit years. Blank lines are used to improve readability.
During initial development of the code, if only a single programmer is making changes, a single revision line with a current date is sufficient. If more than one programmer is working on the same file, each programmer should have a separate revision line, with the latest revision line always appearing last. Details as to the types of revisions made are not required, as during a rapidly evolving project those types of notes should be entered into the project management system or added as the code is checked back into the software repository.
Once the code has gone into production, the last pre-release revision lines should be left intact and new revision lines should be added. At this point each revision line should include summary comments as to the changes made.
Section Headers:
Section headers are simply comment blocks that delineate major blocks of code, including:
These are used primarily as search aids, as well as to prevent the different blocks from simply flowing into one another. To make them stand out visually, they include a long line of = characters to cover the width of the code.
Example:
//=========================================================================
// global variables and typedefs
//=========================================================================
If appropriate, subheaders may also be used, built from + characters. Subheaders may be useful if a large number of functions are defined but may be grouped by type.
Example:
//=========================================================================
// functions
//=========================================================================

//+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
// input functions
//+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Function Headers:
Function headers are basically the comments that start each function definition. Unless the function is obvious and trivial, these headers are required immediately after the curly bracket starting the function definition and before the local variable declarations (or first line of code, if no local variables are used).
These headers should include the purpose of the function, expected inputs, return values, and the behavior upon encountering errors. If any of the supplied values are passed by reference and are modified by the function, this must be noted here as well.
Example:
int fnSquare(int icount)
{
  // accept positive integer and return its square; if variable is invalid,
  // call error handler

  // sanity checks
  if (!icount || (icount < 0))
    fnErr_Msg("invalid parm passed to fnSquare()");

  return icount * icount;
}
Scalar constants should be defined using #define statements. If the entire program is contained in one file, these statements should appear immediately after #include statements at the top of the file. If multiple source code files are involved, all #define statements should be collected in a master header (.h) file. If a project includes multiple source code files, constants which are applicable only within one source code file may be defined at the top of only that file.
So-called "magic numbers" (numbers that are basically constants) should not be hardcoded into functions, but should be collected as a set of global #define statements. The rationale for this is easier future updates if those values change: rather than searching for all possible occurrences of a calculation that uses a particular value, the value can be changed in one place. This should be done not only for numbers which can reasonably be expected to change over time (e.g., number of employees), but also for any other situation-dependent value even if it is not anticipated to change (e.g., number of vice presidents), because at some point it may.
All constants should include the same one- or two-letter lowercase prefixes used in variable names (see Variables below), but the rest of the name is in capital letters. All constant names must be descriptive and meaningful. Abbreviations are allowed for the sake of typing and space efficiencies, but the overall name must be clear and not cryptic.
Examples:
#define sPGM_NAME   "test_program"
#define fPI         3.14159
#define iEMPLOYEES  334
Constants should be used to define array sizes, particularly character arrays that are used as strings. However, passing maximum string lengths to functions that need those values (see Functions below) should be done with sizeof(), rather than referencing the constant itself.
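A minimal sketch of this rule follows. The constant iLABEL_LEN and the functions fnFormat_Label() and fnDemo_Label() are hypothetical names chosen for illustration. The array is sized with the constant, but the caller hands the callee sizeof() of the actual array, so the two cannot drift apart if the declaration later changes.

```c
#include <stdio.h>

#define iLABEL_LEN  16   // hypothetical constant: size of label buffers

int fnFormat_Label(char *plabel, size_t imax, int iid)
{
  // write "ID-<iid>" into plabel, which holds imax bytes; return 0 on
  // success, 1 if a parameter is invalid or the label would not fit
  int ilen;

  // sanity checks
  if (!plabel || (imax == 0))
    return 1;

  // snprintf() never writes more than imax bytes and returns the length
  // the full label would need, which lets us detect truncation
  ilen = snprintf(plabel, imax, "ID-%d", iid);
  if ((ilen < 0) || ((size_t)ilen >= imax))
    return 1;

  return 0;
}

int fnDemo_Label(void)
{
  // size the array with the constant, but pass sizeof() to the callee
  char slabel[iLABEL_LEN];

  return fnFormat_Label(slabel, sizeof(slabel), 334);
}
```

Note that sizeof() yields the array size only where the array itself is in scope; inside fnFormat_Label() the parameter is a pointer, which is why the length must be passed in.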
Constants that are defined in standard libraries may be used as they are, or they may be referenced by #define statements to create constant names with local naming conventions. Although this involves a little extra work, one of the advantages of remapping these names in a common local header file is that some compilers (e.g., a number of the Sun C/C++ compilers) use slightly different constant names than everyone else, and local remappings let us adjust these accordingly in one place without having to search-and-replace through the entire source code.
Global variables should be avoided if at all possible. Functions that need access to such variables should be passed those variables. This includes numerical values, pointers, open file and process pointers, and anything else that is subject to change. Constants must be defined in a single location (see Constants above) and may be referenced globally.
All variables should be named with descriptive and meaningful designations. Abbreviations are allowed if and only if the overall name is not cryptic. Variable names which are comprised of acronyms are allowed only if the acronyms are commonly recognized (e.g., url, ssl, html); otherwise, a short descriptive name must be used. The only exceptions to these rules are simple loop counters (e.g., i, j, k). However, nested loops should use more descriptive counters to make the loop logic more self-documenting.
All variables must include a one- or two-letter prefix that indicates the variable type or use. These are defined as:
Notice that we do not specify if a byte or an integer value is signed or unsigned. In applications where the distinction needs to be made, uc and ui can be used for unsigned chars (bytes) and unsigned ints, respectively. When these are used, type prefixes without the "u" are assumed to be signed.
Normally p is used generically to designate a pointer of any type. In those cases where pointer types are not apparent from the rest of the variable name, subtypes may also be appended. For example, ps and pt may be used to designate a pointer to a string and to a struct, respectively.
If global variables must be used, they must include at least one capitalized letter after the one- or two-letter type designator. All variables in main( ) also follow this convention. Mixed case names (CamelCase or camelBack notation), underscores, or a mixture of the two should be used as necessary for readability. All variables defined in a function, either explicitly or as formal parameters, are always lowercase. Adhering to these conventions will help avoid problems with accidentally modifying global variables in functions, as C's scope protections are pretty weak.
Examples:
int iPi;                    [global integer or integer used in main( )]
float fTotal;               [global float or float used in main( )]
char sFirst_Name[50];       [global string or string used in main( )]
for (i = 0; i < 10; i++)    [okay for simple loop]
int fnTest(int icount)      [integer formal parameter in function prototype]
float ftotal;               [float defined in function]
char surl[100];             [string defined in function]
Although many compilers pre-initialize some variable types (e.g., ints are set to 0), this behavior should not be depended upon to work every time. All variables should be explicitly initialized before use.
Traditionally, command line parms are defined in the prototype for main( ) as int main(int argc, char *argv[ ] ). However, in keeping with our prefix conventions, we use int main(int iparms, char *pparms[ ] ) or int main(int iparms, char **pparms) instead.
Function prototypes are always defined to aid compilers in validating passed parameters. The only exception to this is main( ), which is always defined as either int main(void), int main(int iparms, char *pparms[ ] ), or int main(int iparms, char **pparms), depending on whether or not the program makes use of command line parameters. If main( ) is included in a source code file, it must appear at the top of the code section, after #include, #define, and global declarations are made. This means that all function prototypes must appear before main( ) to provide forward references to functions defined below main( ). This is done intentionally to take advantage of type checking inherent in function prototypes. Alternatively, function prototypes may be defined in a separate header (.h) file as extern prototypes. This is required if the program spans more than one source code file.
All function names are prefixed with fn, followed by a capital letter, followed by mixed-case characters, numbers, and/or underscores. This naming convention is consistent with using mixed-case names for global variables, and it helps to differentiate them from variable names. All function names must be meaningful, descriptive, and avoid acronyms that are not commonly known. For example, fnURL_Parse() is okay, but fnGDIC_ADC_Handler() needs to be more descriptive.
Functions may be defined to be local only within a given source code file. These essentially stay hidden from the main program and may be used when global functions defined in that file in turn need helper functions that should not or need not be exposed to the global scope. Regardless of the scope, these functions are subject to the same naming conventions and requirement for prototypes as global functions.
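In C, file-local scope is obtained with the static keyword. The sketch below is illustrative only (fnDays_In_Year() and fnIs_Leap() are hypothetical names): the helper is hidden from other source files, yet it still has a prototype and follows the fn naming convention.

```c
// function prototypes
int        fnDays_In_Year(int iyear);   // global: callable from other files
static int fnIs_Leap(int iyear);        // local helper: this file only

int fnDays_In_Year(int iyear)
{
  // return the number of days in iyear, or -1 if iyear is not positive

  // sanity checks
  if (iyear <= 0)
    return -1;

  return fnIs_Leap(iyear) ? 366 : 365;
}

static int fnIs_Leap(int iyear)
{
  // return 1 if iyear is a Gregorian leap year, 0 otherwise; the caller
  // is expected to have validated iyear already

  return ((iyear % 4 == 0) && (iyear % 100 != 0)) || (iyear % 400 == 0);
}
```

Only fnDays_In_Year() would be listed as an extern prototype in the project header; fnIs_Leap() stays out of it.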
All functions and function prototypes must define their formal parameters and return value type, even (and especially) if none are expected. Functions which do not use passed variables must be declared as using void.
int fnWait_Until_Char_Received(void)
Functions which do not return values must indicate the return type is void:
void fnSleep(int iseconds)
NOTE: main( ) is always declared to return an int value. This is not optional, as main( ) always returns a success code to the operating system after it finishes. A return value of 0 traditionally indicates success. A non-zero value may indicate a problem or be part of the expected output from the program. The meaning of the return value must be clearly defined in the program header.
All functions should validate their incoming parameters, both that the parameter is not NULL (unless this is a valid possibility) and that values are within expected ranges. For example, if one of the parameters is the maximum length of the string to which a value will be copied, that integer cannot be NULL, nor can it be a zero or negative number. Just because the function is not a library routine and will never be called outside the current program is not sufficient reason to skip validating function parameters. Bounds checking should be applied to final results before being returned as well. See Error Handling below.
All function operations which may operate on input values of arbitrary length must include protections against buffer overflows. For example, a function which manipulates a string and returns that value in a variable passed by reference must require from the caller an integer maximum length value, usually supplied by the caller using sizeof(). Before the value is copied to the final location, the function must verify that sufficient space exists before performing the copy. Just because strings are defined to be very large does not provide any assurance that a buffer overflow will not occur if this type of check is not made, and such an assumption should never be made.
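One way such a check might look is sketched below. The function fnUpper_Copy() is a hypothetical example, not a company library routine; it verifies that the result, including its terminator, fits in the destination before anything is written.

```c
#include <ctype.h>
#include <string.h>

int fnUpper_Copy(char *pdest, size_t imax, const char *psrc)
{
  // copy psrc to pdest in upper case; pdest holds imax bytes and is
  // modified only on success; return 0 on success, 1 on bad parameters
  // or if psrc plus its terminator does not fit in pdest
  size_t i;
  size_t ilen;

  // sanity checks
  if (!pdest || !psrc || (imax == 0))
    return 1;

  // verify space before copying, not after
  ilen = strlen(psrc);
  if (ilen >= imax)
    return 1;

  for (i = 0; i < ilen; i++)
    pdest[i] = (char)toupper((unsigned char)psrc[i]);
  pdest[ilen] = '\0';

  return 0;
}
```

A caller would typically invoke this as fnUpper_Copy(sname, sizeof(sname), pinput) and check the return value before using sname.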
Returning multiple values from functions may be accomplished by returning an array, pointer to temporary structure in memory, or by directly modifying formal parameters passed by reference. Directly setting global program variables as a way to return multiple values is not allowed.
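As an illustration of the pass-by-reference option, the hypothetical function fnMin_Max() below returns two results through pointer parameters and uses its return value only as a status code. Its header comment notes which parameters are modified.

```c
int fnMin_Max(const int *pvalues, int icount, int *pmin, int *pmax)
{
  // find the smallest and largest of the icount ints at pvalues; the
  // results are returned through pmin and pmax, which are modified only
  // on success; return 0 on success, 1 on bad parameters
  int i;
  int imin;
  int imax;

  // sanity checks
  if (!pvalues || !pmin || !pmax || (icount <= 0))
    return 1;

  imin = pvalues[0];
  imax = pvalues[0];
  for (i = 1; i < icount; i++)
  {
    if (pvalues[i] < imin)
      imin = pvalues[i];
    if (pvalues[i] > imax)
      imax = pvalues[i];
  }

  *pmin = imin;
  *pmax = imax;
  return 0;
}
```

The caller passes the addresses of its own variables, for example fnMin_Max(iScores, iSCORE_COUNT, &iLow, &iHigh), where those names are again only placeholders.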
Structs and typedefs may be declared either globally, within main( ), within a single source code file (when limited scope is desired), or even within functions, although this last case should be extremely rare. All of these are subject to the same naming conventions as variables in the same scope, with t ("type") as the prefix.
Examples:
typedef struct tList_Element
{
  char *pname;
  char *pfile;
} tList_Rec;
All program blocks are indented by two spaces per level. Tabs are never used, as they do not always render consistently across different terminal software. Editors which handle indents as an optimal mix of tabs (assuming tabs are equivalent to eight characters) and spaces must have this feature disabled.
Curly brackets which start blocks always appear on lines by themselves, indented to the level equal to the statement immediately preceding the block. Ending curly brackets also appear on lines by themselves and are indented to the same level as the corresponding opening bracket.
Example:
#include <stdio.h>

int main(void)
{
  int icount;

  for (icount = 0; icount < 10; icount++)
  {
    printf("line #%d\n", icount);
    printf("total characters = %d\n", icount * 60);
  }

  return 0;
}
All statements which are longer than 80 characters after indenting must be manually wrapped to fit in an 80-column display. Breaks in such lines should be made where convenient (e.g., at commas, logical operators such as && or ||, and other natural breaks). Long printf() statements with no natural breaks will need to be broken into multiple printf() statements. Continuation lines should always be indented two spaces until the entire logical line is complete.
Example:
printf("this is an example of a really long line: %s\n",
  bLong_Line ? "absolutely long" : "not!");
Note that single-line blocks do not require curly brackets unless the statement immediately above is broken into continuation lines which are already indented two spaces. In that case, adding curly brackets is required for readability.
Example:
for (i = 0; i < 10; i++)
  printf("%d\n", i);
Example:
if (bFirst_Name_Found && bLast_Name_Found && bHighSchool_Grad &&
  bEmployed && (cGender == 'M'))
{
  printf("Found match on criteria\n");
}
Variable definitions should be combined by type as much as possible. The order that these appear should follow the sequence:
All variable names should be lined up to start at the same column for readability, with continuation lines indented two spaces from that point.
Example:
char  *pbuf;
char  sbuf[100];
int   icount, ilevel, ired, iblue, igreen, iemployees, irecords,
        inames, itypes, iblocks;
char  cin, cout;
char  bdone;
Global constants using #define statements should also be indented so the values start at the same column.
Example:
#define sCOLOR       "blue"
#define sTABLE_NAME  "people"
All program inputs, including databases, files, Web forms, and keyboard input, are checked for missing values, out-of-range values, and over-length values (buffer overflows). All input is considered to be suspect until validated in this fashion.
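The three checks named above (missing, over-length, out-of-range) might be combined for a numeric input as in the following sketch. The function fnParse_Int() and the constant iINPUT_MAX are hypothetical; strtol() is the standard library conversion routine.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define iINPUT_MAX  32   // hypothetical limit on accepted input length

int fnParse_Int(const char *ptext, int imin, int imax, int *pvalue)
{
  // convert ptext to an int within imin..imax; *pvalue is modified only
  // on success; return 0 on success, 1 if the input is missing, too
  // long, not a whole number, or out of range
  char *pend;
  long ivalue;

  // sanity checks, including missing and over-length input
  if (!ptext || !pvalue || (imin > imax))
    return 1;
  if ((*ptext == '\0') || (strlen(ptext) > iINPUT_MAX))
    return 1;

  // reject text that is not entirely a number, or that overflows a long
  errno = 0;
  ivalue = strtol(ptext, &pend, 10);
  if ((errno != 0) || (pend == ptext) || (*pend != '\0'))
    return 1;

  // out-of-range check
  if ((ivalue < imin) || (ivalue > imax))
    return 1;

  *pvalue = (int)ivalue;
  return 0;
}
```

The input is treated as suspect until every check passes, and the caller's variable is left untouched on failure.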
In addition, all data from Web forms is validated for expected value ranges, and all mandatory data elements must be present. Although Web applications use client-side validation and pre-processing functions (e.g., JavaScript and AJAX), all such data must be rechecked when submitted to processing scripts (server-side validation). JavaScript routines are far too easy to hack, and all values must be verified.
All data destined for storage in databases must be validated and sanitized, particularly to avoid SQL injection attacks. All data which may influence calls to operating system functions or shells must be validated and sanitized to prevent injection of malicious software or damaging commands.
Uploaded files, particularly from the Web, should be restricted to the smallest possible set of file types, as validated by file extension and examination of file headers. If these files are made available to other users, they must be subjected to malware scanners.
If session variables are used to track system logins and access controls, the application must provide a logout option to the user. In addition, all such session variables must include application and server timeouts. Network addresses, browser IDs, and cookies may also be used to control open sessions.
Access and handling of personally identifiable information (PII), credit card information, and other sensitive data is strictly controlled. Details of how this data is handled are in our Privacy Policy document.
All sensitive constants defined in header files, such as passwords or encryption keys, must be obfuscated and not left in the source code in plain-text format. This prevents someone from easily discovering these values by running the UNIX strings utility against the compiled executable programs. With that precaution, compiled C/C++ programs are considerably more secure than interpreted script languages such as PHP. However, access to source code for sensitive applications must be carefully controlled.
For more information regarding security, refer to our Security Policy document.
Errors may be handled directly by the section of the program that encounters the problem, or a generic error reporting function may be invoked. The latter is preferable, because the format of the error message may be easily modified in one place.
Functions may also return values to the caller to indicate an error has occurred, in which case the caller is responsible for checking and acting on these. Functions which expect callers to handle errors must clearly document this in the function headers. In such cases, the caller must always check for valid results and never assume a good value is returned.
If the application is a Web application, errors should be detected and reported to the user as part of a valid HTML page, with optional e-mail notifications to the Web site operator. Otherwise, error messages should be written to stderr, sent to syslog services, or written directly to an application-specific log file, again optionally e-mailing appropriate responsible parties. Command line scripts should always set an integer return value from main( ) for use by crond or detection by subsequent scripts in a processing chain. Optionally, e-mail messages to the appropriate maintainers or group e-mail alias may be generated. For critical functions other alert mechanisms, including text messages and SNMP traps, may also be used.
Messages should always be meaningful and never just include an error number. Error numbers may be included to help the programmer quickly locate the problem, although a better approach is to include the name of the function in which the error occurred in the message.
A generic error function should accept the name of the caller and a string specifying the type of error. If writing to log files or e-mailing technical support personnel, the specific value that caused the problem should be included, although this may not be possible in cases of sensitive information (e.g., personally identifiable information or credit card information). Such logged or e-mailed messages should also include the name of the program and the date/time when the error occurred.
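A minimal sketch of such a function is shown below, assuming the message goes to an already-open stream such as stderr or a log file. The name fnErr_Log() and the message layout are illustrative choices, not a mandated format. The caller remains responsible for keeping sensitive values out of pmsg.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

#define sPGM_NAME  "test_program"   // hypothetical program name

int fnErr_Log(FILE *plog, const char *pcaller, const char *pmsg)
{
  // write one timestamped error line to plog (e.g., stderr or an open
  // log file); return 0 on success, 1 if plog is NULL or the write fails
  struct tm *ptnow;
  char      stime[32];
  time_t    inow;
  int       iwritten;

  // sanity checks
  if (!plog)
    return 1;

  // build the date/time stamp, falling back to a placeholder
  inow = time(NULL);
  ptnow = localtime(&inow);
  if (!ptnow || !strftime(stime, sizeof(stime), "%Y-%m-%d %H:%M:%S", ptnow))
    strcpy(stime, "[unknown time]");

  iwritten = fprintf(plog, "%s %s: error in %s: %s\n", stime, sPGM_NAME,
    (pcaller && strlen(pcaller)) ? pcaller : "[unknown function]",
    (pmsg && strlen(pmsg)) ? pmsg : "[unknown problem]");

  return (iwritten < 0) ? 1 : 0;
}
```

Because every message passes through one function, changing the format or adding a second destination (syslog, e-mail) means editing a single place.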
If the error is not fatal, the program should gracefully recover and resume; otherwise, open resources should be closed and the program gracefully ended. In every case, the user should be notified that the problem has occurred, what is being done about it, and what the user's next step is. The user should always be left knowing what happened and that someone has been notified about the problem. Error reporting pages should never be dead ends and should always include links to continue browsing.
Programs that depend on processes and technology outside themselves must always provide error checking, flag files, and timeouts, as appropriate. For example, any application that uses network sockets must implement timeouts. If data is transferred over network connections, some form of data integrity check must be performed. Programs that depend on the successful completion of other programs (e.g., data feeds and processing) must use flag files or other means to determine if the previous processing has finished successfully, and not just blindly assume the data is ready at a given time.
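As one possible form of the flag-file check, the hypothetical function fnFlag_Present() below reports whether a named flag file exists and is readable. The path and the convention that the upstream job creates the file only after finishing successfully are assumptions for illustration.

```c
#include <stdio.h>

int fnFlag_Present(const char *ppath)
{
  // return 1 if the flag file ppath exists and can be opened for
  // reading, 0 otherwise (including when ppath is NULL or empty)
  FILE *pflag;

  // sanity checks
  if (!ppath || (*ppath == '\0'))
    return 0;

  pflag = fopen(ppath, "r");
  if (!pflag)
    return 0;

  fclose(pflag);
  return 1;
}
```

A downstream program would call this before reading the data feed and report an error (or wait, subject to a timeout) if the flag is absent, rather than assuming the data is ready.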
Large programming projects commonly split source code development across multiple programmers working with multiple source code files. All such files are maintained within a single directory on one of our servers, although subdirectories may be created along functional lines to make management and organization easier. All projects use a software repository for backup and version control purposes. All programmers working on such projects will be clearly notified which pieces are their responsibility and how function interfaces are expected to work.
Commonly used functions with applicability to multiple programs may be gathered into a centrally compiled and maintained library. Because the impact of these functions is potentially much larger, all changes to function prototypes, calculations, return values, and error handling must be carefully reviewed by the management of Cliffson Solutions before being implemented. All such changes must be thoroughly documented and tested against all programs known to use the affected functions.
Only the Manager of Applications Development has the authority to update working libraries.
Makefiles are used to define dependencies and control compilations of C/C++ projects. These files are subject to the same header requirements, need for intelligible comments, and version controls as source code as detailed in the rest of this document.
The following is an example of a simple, two-file program demonstrating all major expected components in source code developed at Cliffson Solutions, LLC.
Main file:
//////////////////////////////////////////////////////////////////////////
//  /develop/test1.c                                                    //
//                                                                      //
//  [purpose of program]; [dependencies]; [expected inputs];[expected   //
//  outputs];[other relevant notes regarding its operation]             //
//                                                                      //
//  [identification of client or final destination]                     //
//                                                                      //
//  Written [mm/dd/yyyy], [author]                                      //
//  Revised [mm/dd/yyyy], [author]                                      //
//////////////////////////////////////////////////////////////////////////

#include "test1.h"

//========================================================================
// main program
//========================================================================

int main(void)
{
  char sMessage[] = "This is a test";

  if (fnExample1(iCOUNT, sMessage))
  {
    printf("Program encountered an error -- aborting\n\n");
    return 1;
  }

  printf("Program completed successfully\n\n");
  return 0;
}

//========================================================================
// functions
//========================================================================

void fnErr_Msg(char *pcaller, char *pmsg)
{
  // print specified error message to stderr

  fprintf(stderr, "Error encountered in %s: %s\n\n",
    (pcaller && strlen(pcaller)) ? pcaller : "[unknown function]",
    (pmsg && strlen(pmsg)) ? pmsg : "[unknown problem]");
  return;
}

int fnExample1(int icount, char *pmsg)
{
  // repeats specified string to stdout; return 0 on success, 1 on error
  int i;

  if (bPGM_DEBUG)
    printf("** entering fnExample1()\n");

  // sanity checking
  if (!icount || (icount <= 0) || !pmsg || !strlen(pmsg))
  {
    fnErr_Msg("fnExample1()", "bad parameter passed to function");
    return 1;
  }

  for (i = 0; i < icount; i++)
    printf("%s", pmsg);

  if (bPGM_DEBUG)
    printf("** leaving fnExample1()\n");

  return 0;
}
Header file:
//////////////////////////////////////////////////////////////////////////
//  /develop/test1.h                                                    //
//                                                                      //
//  header file for test1.c; refer to test1.c for full documentation    //
//  and comments                                                        //
//                                                                      //
//  Written [mm/dd/yyyy], [author]                                      //
//  Revised [mm/dd/yyyy], [author]                                      //
//////////////////////////////////////////////////////////////////////////

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

//========================================================================
// global constants
//========================================================================

#define sPGM_NAME   "test1"
#define sPGM_VER    "1.0"
#define bPGM_DEBUG  0
#define iCOUNT      10

//========================================================================
// global variables [if any; includes global typedefs and structs]
//========================================================================

// [type] [name] [= default_value, if appropriate]

//========================================================================
// function prototypes
//========================================================================

extern void fnErr_Msg(char *pcaller, char *pmsg);
extern int  fnExample1(int icount, char *pmsg);