Using C++ has made some aspects of my implementation easier while making other areas more complex. Indeed, these implementation hurdles I have encountered provide me with many ideas to discuss in this column. This issue is no exception.
Recently, I have been wrestling with when and when not to use statics in my programs. The problem at hand was how to provide, and guarantee, the single instance of an object. One way to do this is to use a static data member of a class. The static keyword is one of those constructs of C++ that takes on multiple meanings -- or as I like to say ``multiple personalities'' -- hence, the allusion to schizophrenia in the title.
The confusion over statics comes from its multiple interpretations. When you see the keyword static, it usually takes on one of two interpretations. It may indicate that the variable is ``allocated once at a fixed location'' as opposed to being allocated on the stack. The other interpretation indicates that the variable or function is ``local to a particular scope'' and is not visible everywhere throughout the program like a global variable might be.
Aside from the fact that static takes on different meanings, depending on its context, in some situations, the meanings actually combine so that both interpretations come into play. Furthermore, statics also play a special role with C++ class members. Understanding the usage of statics and their implications will be the focus of this column.
Before embarking on our journey into the world of statics, it is useful to recall some C++ programming language constructs: translation units, storage classes, and scope. For more precise definitions, please consult the ARM or the C++ Draft Working Paper.
A C++ program usually consists of several class definitions, perhaps some global variables, class member functions, and other routines. Most programs are also split up into several files. It is typical to have a header file and a source file for each class. The main program might be in its own file as well. As each file is compiled, the code is placed in a unique translation unit. Later, when the program is linked, the various translation units are combined together into a single executable file. As we discuss the remainder of this column, you may use translation units and files interchangeably.
Each identifier (variable name, function name, typedef, class name, etc.) is contained in one or more scopes. C++ uses three primary kinds of scope: file scope, local scope, and class scope. The scope of an identifier describes its visibility and determines whether the identifier clashes with identifiers in other parts of the program.
Recall that a translation unit consists of one or more header files and a single source file. An identifier is said to be in file scope if it is not part of a function or class definition. File scope is the outermost scope of a program and encloses both local and class scope. Variables at file scope can be used at any point in the translation unit from the point of their declaration. Variables defined at file scope are also called global variables. Non-static global variables are visible to other translation units.
Variables declared in a function or more formally in a block have local scope. Each function defines a new block and has a distinct local scope. Blocks can also be nested, so that a name declared in a block is local to that block and all blocks contained in it. Arguments to a function are treated as belonging to the outermost block of the function.
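To make the nesting rule concrete, here is a minimal sketch. The function name ScopeDemo and its variables are hypothetical names I have made up for illustration; they are not part of any library.

```cpp
// Hypothetical example: each block introduces its own local scope.
int ScopeDemo(int arg)        // arg belongs to the outermost block of ScopeDemo
{
    int total = arg;          // visible until the end of the function body
    {
        int inner = 10;       // visible only inside this nested block
        total += inner;       // names from the enclosing block are still visible
    }
    // 'inner' is out of scope here; referring to it would not compile.
    return total;
}
```

Calling ScopeDemo(5) returns 15. The nested block can see total, but the enclosing block cannot see inner.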
Each C++ class also defines a unique scope for its members. We say that the data and function members of a class have class scope. Consider the following class definition:
    class A {
    public:
        A();
        void f();
        enum { MAX = 80 };
        int _i;
    };

Identifiers in class scope, such as f() and _i, must be accessed so that the compiler can determine their class. This is done in one of three ways: by applying the . operator to an object instance: a.f() or a._i; by applying the -> operator to a pointer to an object instance: aPtr->f() or aPtr->_i; or by applying the :: scope operator to the class type name: A::MAX.
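The three access forms can be sketched as follows. I have assumed bodies for the constructor and for f(), since the class above only declares them, and AccessDemo is a hypothetical helper added for illustration.

```cpp
class A {
public:
    A() : _i(0) {}            // assumed body: start the counter at zero
    void f() { ++_i; }        // assumed body: bump the counter
    enum { MAX = 80 };
    int _i;
};

// Hypothetical helper that exercises all three access forms.
int AccessDemo()
{
    A a;
    A* aPtr = &a;

    a.f();                    // . applied to an object instance
    aPtr->f();                // -> applied to a pointer to an object instance
    return a._i + A::MAX;     // :: applied to the class type name
}
```

With these assumed bodies, AccessDemo() returns 82: f() has run twice on the same object, and A::MAX contributes 80.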
Of course, the member functions, derived classes, and friends of the class do not need to use these operators and can access the members directly, subject to the constraints of the C++ class access specifiers.
Every variable in a C++ program has an associated unit of storage somewhere in memory. C++ language gurus have dubbed this concept the storage class of a variable, of which, C++ has two kinds: automatic and static.
Most variables that you declare, typically as local variables to a function, or the arguments to a function are automatic. This means that they are created and destroyed -- on the stack -- by the compiler as they go in and out of scope. Automatic variables are local to the block they are contained in and the compiler takes care of creating the variables upon entry to the block and destroying them upon exit from the block.
Variables that are tagged with the keyword static persist from the point of execution in which they are created until the program is terminated. Static variables may be global to a particular translation unit or global to a class.
Variables that are local to a function usually have automatic storage; they are created upon entry to the function and destroyed upon exit. However, if a local variable of a function is tagged static, the variable will persist across multiple invocations of the function.
Suppose we are developing a parser and we wish to count the number of tokens created. We can use a local static variable for this purpose:
    void AddToken (const char* token)
    {
        static int numTokens;
        AddSymbol (token, numTokens++);
    }

Because the local variable numTokens is static, it is guaranteed to have the initial value zero, since an initial value was not specified in the declaration. Each time AddToken is called it passes the symbol name, token, along with the current value of numTokens, to AddSymbol. The count is incremented after the symbol is added so that it will have the correct value when the next token is added to the table.
Although static local variables retain their value from the point they are created, they still behave as local variables; they cannot be accessed outside the scope of the function. In this case, numTokens cannot be accessed outside of AddToken.
When does numTokens first get created, you ask? That is a good question! The answer: the first time AddToken is called. A local static variable, therefore, will only be created when it is needed.
A local, static variable, such as numTokens in AddToken, is only available within the function body in which it is defined. It cannot be accessed outside the scope of the function body.
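The persistence of a local static is easy to observe in a stripped-down variation of the counter. NextTokenIndex is a hypothetical function of my own; it only hands out the count and leaves the symbol table out of the picture.

```cpp
// Hypothetical helper: returns 0, 1, 2, ... on successive calls.
int NextTokenIndex()
{
    static int numTokens;     // zero-initialized; survives between calls
    return numTokens++;       // hand out the current value, then advance
}
```

The first three calls return 0, 1, and 2. Nothing outside NextTokenIndex can read or reset numTokens.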
What if we want the counter to be available to several functions used by the parser? You cringe at the thought of adding a global variable that can be seen throughout the program. If all of the parser routines happen to be in a single translation unit, we can declare the counter at file scope and make it static:
    // File: parser.cpp

    // Globals local to this file:
    static int numTokens;
    // etc ...

Normally, declaring a variable at file scope results in the creation of a global variable, which other translation units can access by using an extern declaration. However, by preceding the declaration by the keyword static, the compiler restricts the use of that variable to the translation unit (file) in which it is declared. Furthermore, these variables do not conflict with variables in the global namespace or with statics in other translation units, even if they use the same name.
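As a rough sketch of how several parser routines might share such a counter, consider the following. The TokenCount function and the simplified body of AddToken are assumptions of mine, not code from a real parser.

```cpp
// File: parser.cpp (hypothetical sketch)

// Visible to every function in this file, but to no other translation unit.
static int numTokens;         // zero-initialized

void AddToken(const char* token)
{
    (void)token;              // a real parser would record the token here
    ++numTokens;
}

// Hypothetical accessor so other files can read the count
// without being able to modify it.
int TokenCount()
{
    return numTokens;
}
```

Another source file could define its own static numTokens without any clash, and an extern int numTokens; declaration elsewhere would not link against this one.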
In general, you should try to make your variables and functions as local as they possibly can be. Only make them visible to the parts of the program that need to see them. It is also a good practice to avoid the use of global variables. Your code will be more portable, easier to maintain and debug, and less likely to interfere with code from other libraries.
A C++ class, as you well know, can contain data and functions. It turns out that both data and function members of a class can be made static. Static members behave like ordinary members in many ways. They obey the same C++ class access rules provided by the keywords public, private, and protected. They are contained within the scope of the class, and do not cloud the global namespace or names used in other classes. Finally, they can be accessed using the . or -> operators as discussed above to get inside the class scope boundary. However, unlike normal class members, static members may also be accessed directly by applying the :: scope operator to the class type name. It is not even necessary for an object of the class to be constructed in order to access the static members.
Static data members are shared by all object instances of that class. Each class instance sees and has access to the same static data. The static data is not part of the class object but is made available by the compiler whenever an object of that class comes into scope. Static data members, therefore, behave as global variables for a class.
One of the trickiest ramifications of using a static data member in a class is that it must be initialized, just once, outside the class definition, in the source file. This is due to the fact that a header file is typically seen multiple times by the compiler. If the compiler encountered the initialization of a variable multiple times, it would be very difficult to ensure that variables were properly initialized. Hence, exactly one initialization of a static is allowed in the entire program.
Consider the following class, A, with a static data member, _id:
    // File: a.h
    class A {
    public:
        A();
        static int _id;
    };

The initialization of a static member is done similarly to the way global variables are initialized at file scope, except that the class scope operator must be used. Typically, the definition is placed at the top of the class source file:
    // File: a.cc
    int A::_id;

Because no explicit initial value was specified, the compiler will implicitly initialize _id to zero. An explicit initialization can also be specified:

    // File: a.cc
    int A::_id = 999;

In fact, C++ even allows arbitrary expressions to be used in initializers:
    // File: a.cc
    int A::_id = GetId();

Static data members can be accessed through the . and -> operators just like other class members. However, because there is just a single instance of the static data member, it is common to simply use the scope operator: A::_id.
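To see the sharing in action, here is a small sketch of a class that counts its live instances. The Widget class and the LiveWidgetsDemo function are hypothetical; they exist only to show one static data member being updated by several objects.

```cpp
// Hypothetical class that counts how many of its objects are alive.
class Widget {
public:
    Widget()  { ++_count; }
    ~Widget() { --_count; }
    static int _count;        // one copy, shared by every Widget
};

// The single definition, outside the class. In a real project this
// line would live in the class source file, not in the header.
int Widget::_count = 0;

int LiveWidgetsDemo()
{
    Widget a;
    Widget b;
    {
        Widget c;             // three Widgets alive inside this block
    }                         // c is destroyed here
    return Widget::_count;    // a and b are still alive
}
```

LiveWidgetsDemo() returns 2, and once it has returned Widget::_count is back to 0. Note that Widget::_count can be read even when no Widget exists.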
A member function of a class may also be made static. Like static data members, there is only one copy of the code for a static member function and that routine is available to all instances of the class.
Unlike ordinary member functions, static member functions do not have a this pointer. This is a surprising repercussion to programmers when they encounter static member functions for the first time. Without a this pointer, static member functions cannot access any non-static members of the class. In fact, if you attempt to access a non-static member (data or function), you are implicitly invoking the use of the this pointer, and the compiler will complain.
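The missing this pointer is easiest to appreciate with a sketch. The Counter class below is a hypothetical example of mine; the commented-out line marks the kind of access the compiler rejects.

```cpp
// Hypothetical class mixing static and non-static members.
class Counter {
public:
    Counter() : _value(0) {}

    // Non-static: has a this pointer, so it can touch both members.
    void Bump() { ++_value; ++_total; }

    // Static: no this pointer, so only static members are reachable.
    static int Total() { return _total; }

    // static int Value() { return _value; }   // error: needs an object

    // A static function can still use non-static members
    // if it is handed an object explicitly.
    static int ValueOf(const Counter& c) { return c._value; }

private:
    int _value;               // one per object
    static int _total;        // one for the whole class
};

int Counter::_total = 0;
```

If a is bumped twice and b once, Counter::Total() reports 3 while Counter::ValueOf(a) reports 2.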
The primary use of static member functions is to access and manipulate the static data members of a class. For example, in the A class above, instead of making _id public, we could provide a static member function to return its value:
    // File: a.h
    class A {
    public:
        A();
        static int Id () { return _id; }
    private:
        static int _id;
    };

Static function members can be accessed through the . and -> operators just like other class members. However, like static data members, it is common to simply use the :: scope operator: A::Id().
When you include the code for a C++ member function in the class definition (header) or mark a function with the keyword inline you are telling the compiler to inline the code for this function, if possible. That's right, ``if possible.'' The fact that you have indicated that the function should be inlined is only a hint to the compiler. In fact, inlining of functions is performed differently by each compiler.
When a function is inlined, the code for the function is expanded (generated) at the point of the call, rather than generating a call to a function routine. The point of inlining is that for a ``small'' function, the expanded code may in fact execute faster than an explicit function call would. You should only be inlining functions that are really trivial or have a small function body.
Not all functions can be inlined. Complicated functions, such as those that contain loops or are recursive, are often not inlined. Sometimes the particular way you call a supposedly inlined function makes it impossible to perform the expansion. For example, consider the following inline function:
    inline void foo () { }

and suppose you create a pointer that points to foo:

    void (*pfoo)() = foo;

Calling foo() directly in your code results in the code being inlined; however, when you call foo() using the pointer pfoo, the compiler cannot inline the code. Instead, the compiler must generate a function body for foo(); otherwise, there is no way to come up with an address for the function, since an inlined function does not have an address!
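Here is a slightly larger sketch of the same situation, using a function that returns a value so the two call paths can be compared. Twice, CallDirect, and CallThroughPointer are hypothetical names, and whether the direct call is actually expanded in place remains up to your compiler.

```cpp
// Hypothetical inline function: small enough to be a good candidate.
inline int Twice(int x) { return 2 * x; }

// A direct call: the compiler is free to expand Twice in place.
int CallDirect(int x)
{
    return Twice(x);
}

// Taking the address means an out-of-line body for Twice must exist
// somewhere, because the pointer has to point at real code.
int CallThroughPointer(int x)
{
    int (*pTwice)(int) = Twice;
    return pTwice(x);
}
```

Both functions compute the same result; the difference is only in the code the compiler is required to generate.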
You are probably thinking, what does this have to do with statics? Well, when the compiler cannot inline a function, it generates a static definition of the function instead. In other words, the code for the function is generated just like the code for a static function. This means that every translation unit in which an inline function cannot be inlined will get its own copy of the function, which can increase the size of your program. Most compilers will report warnings when a function cannot be inlined. You should pay attention to these warnings to determine if some functions should not be inlined in the first place.
There is one more detail with inline functions that you should be aware of. Suppose you have chosen to use a local static variable in an inline function:
    inline void foo()
    {
        static int index;
        // ...
    }

Next, suppose for some reason, the compiler cannot inline this function in several translation units. Each translation unit will then have its own static definition of foo(). But, this means that each version of foo() may also have its own copy of the static variable index! Clearly not what you want. So avoid putting static variables inside of non-member inline functions.
When you use static variables it is important to understand when they are created and destroyed. Static variables defined at file scope are initialized and constructed sometime before their first usage in the translation unit in which they are defined. Static objects local to a function are created upon the first time that function is invoked. If the function is never invoked, the static is never created.
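The rule for local static objects can be checked with a constructor that leaves a trace. In this sketch the Table class, the GetTable function, and the constructed counter are all hypothetical names of my own.

```cpp
// Hypothetical counter, local to this file, bumped by every Table constructor.
static int constructed = 0;

class Table {
public:
    Table() { ++constructed; }
};

// The Table is built the first time GetTable is called, and only then.
Table& GetTable()
{
    static Table table;
    return table;
}

int TablesConstructed()
{
    return constructed;
}
```

Before the first call to GetTable(), TablesConstructed() reports 0. After any number of calls it reports 1, because later calls simply return the object that already exists.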
Classes that are local to a function cannot have static data members, so all classes which have a static data member must be global. The static data members of a class are initialized following the same rules as static variables defined at file scope, and will be initialized sometime before their first usage in the translation unit in which they are defined. Also realize that the static data members of a class will be created even if you do not declare any objects of that class.
Once a static object is created it persists during the lifetime of the program's execution. We have not said anything about the destruction of static objects. All static objects are destroyed (their destructor is called) at program termination, either when returning from main() or when calling exit(). Destruction is done in the reverse order of construction. A destructor for a static object is called only if the object was created and initialized. This means that a static object local to a function will not need to be destroyed if the function was never called.
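If you want to watch the order of destruction for yourself, a tracing class along these lines should do. Tracer and the three Use functions are hypothetical names, and the sketch assumes the program ends by returning from main().

```cpp
#include <cstdio>

// Hypothetical class that announces its own construction and destruction.
class Tracer {
public:
    explicit Tracer(const char* name) : _name(name)
    {
        ++_constructed;
        std::printf("construct %s\n", _name);
    }
    ~Tracer() { std::printf("destroy %s\n", _name); }

    static int Constructed() { return _constructed; }

private:
    const char* _name;
    static int _constructed;  // how many Tracers have been built so far
};

int Tracer::_constructed = 0;

// Each function owns one local static Tracer, built on its first call.
void UseFirst()    { static Tracer first("first"); }
void UseSecond()   { static Tracer second("second"); }
void NeverCalled() { static Tracer never("never"); }
```

A program that calls UseFirst() and then UseSecond() before returning from main() reports the construction of first and then second, followed at termination by the destruction of second and then first. Because NeverCalled() is never invoked, its Tracer is neither constructed nor destroyed.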