Sunday, November 15, 2009

An Introduction to C++ Programming - Part 4

Why templates

The last two articles made some effort in perfecting a stack of integers. Today, I want a stack of doubles. What do I do? Rewrite it all and call it doublestack? It's one alternative. Then I want a stack of char* (another rewrite) and a stack of bicycles (yet a rewrite, and a bizarre view). And then, of course, we end up with 4 versions of stack, all with identical code, just one data member in the internal struct with different type for all of them. Sigh...
There's of course the C kind of solution, always make it a stack of void*, and cast to whatever you want (and just hope you won't cast to the wrong type). No, the latter alternative isn't an alternative in my book. Type safety is essential for writing solid programs (Smalltalk programmers disagree). The former alternative isn't an alternative either. Just think of this little nightmare, 4, at least, more or less identical pieces of code. When you find a bug (when, not if), you have to correct it in as many places. Yuck... OK, so I guess I've explained what templates are for. They're the solution to the above problem. But how?

Function templates

In the first C++ article, I wrote a set of overloaded functions called "print", which printed parameters of different types. The code in each of those functions was exactly the same (exactly the kind of redundancy that's always to be avoided). This is an ideal place for a template. Here's what a template function for printing something can look like:

template
void print(const T& t)
{
cout << "t=" << t << endl;
}
The keyword "template" says we're dealing with a template. When declaring/defining a template, there's always a template parameter list, enclosed in a '<', '>' pair. The template parameter for this template is "class T". This means that the template deals with a type, some type, called T. Despite the keyword "class", T does not have to be a class, it can be any of the built in types, enumerations, structs, and so on (if you have a modern compiler, it will accept the keyword "typename" instead of "class", although "class" will still work). The name "T" is of course arbitrarily chosen. It could be any name. After this comes the function definition, where T is used just as if it was a legal type.
For writing a template function, that's really all there is. Here are some examples using it:

int main(void)
{
print(5); // print
print(3.141592); // print
print("cool"); // print
print(2); // print again.
return 0;
}
Weird? OK, time for some demystifying. The code for the template, is not a function. It's a function template, something which the compiler uses to create functions. This is very much like a cookie cutter. Once you have a cookie cutter, you can make cookies with the shape of the cutter. More or less any kind of cookie can be made with that cutter. When the compiler reads the function template, it does pretty much nothing at all, other than to remember that there's a function template with one template parameter and the name "print". When the compiler reaches "main", and sees the call to "print(5)", it looks for a function "print" taking an "int" as a parameter. There is none, so it expands the template function "print" with the type "int", and actually makes a new function. Note that this is done by the compiler at compile time. The same happens for the other types. The compiler always first checks if there is a function available, and if there isn't, it tries to create it by expanding the function template. This order of things is necessary to avoid unnecessary duplication. After all, "print(2)" uses the same function as "print(5)" does, rather than creating yet another copy of it. Let's compile and run:

[c:\desktop]icc /Q temp_test.cpp

[c:\desktop]temp_test.exe
t=5
t=3.14159
t=cool
t=2

[c:\desktop]
Although it does not seem like it, type safety is by no means compromised here. It's just seen in a somewhat different way. For the function template "print", there's only one requirement on the type T; it must be possible to print it with the "<<" operator to "cout". If the type cannot be printed, a compilation error occurs. To test it, here's what GCC says when trying to print the "intstack" from last month:

[c:\desktop]gcc temp_test2.cpp -fhandle-exceptions -lstdcpp
temp_test2.cpp: In function `void print(const class intstack &)':
temp_test2.cpp:285: no match for `operator <<(class ostream, class
intstack)'
GCC delivered a compilation error, since the type "intstack" cannot be printed with "<<" on "cout" (the error message says "ostream", which is correct. We'll deal with that later this fall/early winter, in one or a few articles on C++ I/O). Here the compiler generated a new function, called a template function, where every occurrence of "T" (only one, in the function parameter list) is replaced with "intstack". After having generated the function, it compiled it, and noticed the error. Note that a function is not generated from a function template until a call is seen (the compiler cannot know what types to generate the function for before that).
Please note the different meanings of the terms "function template", and "template function". The "function template" is what you write. It's a template from which the compiler generates functions. The compiler generated functions are the template functions. The "function template" is the cookie cutter, while the "template function" is the cookie. One example of a template function above, is "print()" (i.e. the "int" version of print).

Templates and exceptions

As you may have noticed, I didn't write an exception specifier for the "print" function template. This was not a mistake, nor was it sloppiness. The drawback with templates is that they make writing exception specifiers a bit difficult. I could try to make the promise that the function "print" does not throw exceptions, by adding the exception specifier "throw ()", but that would not be wise. The problem is that I cannot know what kind of type T will be, and I cannot know if operator<< on that type can throw an exception or not. What if it does? If so, and the function template had an empty exception specifier list, "unexpected" would be called, and the program terminate. Not nice. This problem is something I strongly dislike about C++, but this is how it works, and there's not much to do about it. I wish there was a way to say "The exceptions that might be thrown are the ones from operator<< on T" but there is no way to say that, other than as a comment. Note that not writing an exception specifier means that any exception may be thrown.

Class templates

Just as you can write functions that are independent of type (and yet type safe!) we can write classes that are independent of type. In a sense, class templates exist as builtins in C and C++. You have arrays and pointers (and references) that all act on a type, some type. The type they act on does not change their behaviour, they're still arrays, pointers and references, but of different types. Let's explore writing a simple class template, by improving the old "Range" class from lesson 2. In case you don't remember, the original "Range" looks like this:

struct BoundsError {};
class Range
{
public:
Range(int upper_bound = 0, int lower_bound = 0)
throw (BoundsError);
// Precondition: upper_bound >= lower_bound
// Postconditions:
// lower == upper_bound
// upper == upper_bound

int lowerBound() throw ();
int upperBound() throw ();
int includes(int aValue) throw ();
private:
int lower;
int upper;
};
This class is a range of int. There's no reason, however, why it shouldn't be a range of any type. Writing a class template is in many ways similar to writing a function template:

struct BoundsError {};

template
class Range
{
public:
Range(const T& upper_bound = 0,
const T& lower_bound = 0);
// Precondition: upper_bound >= lower_bound
// Postconditions:
// lower == upper_bound
// upper == upper_bound
// Throws:
// Bounds error on precondition violation
// Whatever T's copy constructor throws.
// Whatever operator < on T throws.

const T& lowerBound() throw ();
const T& upperBound() throw ();
int includes(const T& aValue);
// Throws: Whatever operator>= and operator <= on T
// throws.
private:
T lower;
T upper;
};
As can be seen, after "template ", on line 3, T is used just as if it was a type existing in the language. I've changed the constructor so that it accepts the parameters as const reference instead of by value. The reason is performance if T is a large type (if passed by value, the parameters must be copied and the copying may be an expensive operation). I've also removed the exception specifier, and instead used a comment, since after all, there is no way to know if T throws anything. "lowerBound", "upperBound" and "includes" uses const T& instead of value, for the same reason as the constructor does. "lowerBound" and "upperBound" can safely have empty exception specifiers, since those member functions do not do anything with the T's. They just return a reference to one of them. "includes" on the other hand, does need the unfortunate comment.
Time for the implementation, which will include some news:

template
Range::Range(const T& upper_bound,
const T& lower_bound)
: lower(lower_bound), // copy constructor
upper(upper_bound) // copy constructor
{
if (upper < lower) throw BoundsError();
}

template
const T& Range::lowerBound() throw ()
{
return lower;
}

template
const T& Range::upperBound() throw ()
{
return upper;
}

template
int Range::includes(const T& aValue)
{
return aValue >= lower && aValue <= upper;
}
The syntax for member functions is very much the same as that for function templates. The only difference is that we must refer to the class (of course), and we must specify that it's the template version of the class, by adding "" after the class name. The reason is that we're not dealing with a complete class, but with a class template, and we must be explicit about that. We must also precede every member function with "template ". There isn't much more to say about this. Let's have a look at how it's used:

#include

int main(void)
{
Range ri(100,10);
Range rd(3.141592,-3.141592);
if (ri.includes(55))
{
cout << "[" << ri.lowerBound() << ", "
<< ri.upperBound() << "] includes 55"
<< endl;
}
if (!rd.includes(62))
{
cout << "[" << rd.lowerBound() << ", "
<< rd.upperBound()
<< "] does not include 62" << endl;
}
return 0;
}
Take a careful look at the syntax here. To use a class template, you must explicitly state what type it is for. There is unfortunately no way to say "Range(5,10)" and have the compiler automatically understand that you mean "Range(5,10)". As with function templates, a class template is expanded when it's referred to, so when the compiler first sees "Range", it creates the class, by expanding whatever is needed. The compiler will also treat every member function just as any template function, i.e. the code will not be expanded until it is called from somewhere. The above code calls all members of "Range", but had "includes" not been called, it would not have been expanded. One unfortunate side of this is that "includes" could actually contain errors, and this would be unnoticed by the compiler, until "includes" was called.

Advanced Templates

Now that the basics are covered, we should have a look at some power usage (this section is abstract, so it may require a number of readings).
One unusually clever place for templates, is as something called "traits classes". A traits class is never instantiated, and doesn't contain any data. It just tells things about other classes, that is its sole purpose. The name "traits class" is odd. Originally they were called "baggage classes", since they're useless on their own, and belong to something else, but for some reason some people didn't like the name, so it was changed. My intention is to write a traits class, which tells the name of the type it is specialized for (explanation of that comes below), and to write a special template print, which prints ranges looking like the constructor call for the range, and finally, a function template, which is used to create ranges, without needing to specify the type. When done, I will be able to write:

print(make_range(10,4));
and when executed, see:

Range(10,4)
Magic? No, just templates! Here we go...
A traits class, is a simple class template. The traits class needed here, is one that tells the name of a type. The class template just looks like this:

template
class type_name
{
public:
static const char* as_string();
};
That is, it holds no data, and the member function is declared as "static". This is the way traits classes usually look. No data, and only static member functions. A member function declared static, is different from normal member functions, in that it does not belong to an instance, but belongs to the class itself. Here's an example:

class A
{
private:
int data;
public:
void f(void);
static void g(void);
static void h(void);
};

void A::f(void)
{
cout << data << endl;
}

void A::g(void)
{
cout << data << endl; // error! Cannot access data.
}

void A::h(void)
{
cout << "A::h" << endl;
}

int main(void)
{
A a;
a.f(); // prints something.
a.h(); // prints "A::h"
A::h(); // also prints "A::h"
A::f(); // Error, f is bound to an object, and must be
// called on an object.

return 0;
}
"A::g()" is in error, because it's declared static, and thus not bound to any object, and as such cannot access any member data, since member data belongs to objects. The calls "a.h()" and "A::h()" are synonymous. Since "h" is not tied to an object, it can be called through the class scope operator "A::", which means it's the "h" belonging to the class named "A".
Calling "A::f()" is an error, since it is not static. This means it belongs to an object, and must be called on an object (through the "." operator).
Now back to traits classes. The whole idea for traits classes is one of "specialization". The class template is the general way of doing things, but if you want the class to take some special care for a certain type, you can do what's called a specialization. A member function specialization is usually not declared, just defined, like this:

const char* type_name::as_string()
{
return "char";
}
Of course, if you have a top modern compiler, you'll get a compilation error. The syntax has changed, so compilers very much up to date with the standardization requires you to write like this:

template <>
const char* type_name::as_string()
{
return "char";
}
A minor, but in a sense, understandable difference. The "template <>" part clarifies that it's a template we're dealing with, but the template parameter list is empty, since we're specializing for known types. This is how traits classes usually look. They have a template interface, the class, which declares a number of static member functions. Those member functions are intended to tell something about some other class. Normally, the template member functions are not defined, instead specializations are. Their purpose is only to tell something about other classes, nothing else.
Now, we can use the "type_name" traits class for "char" as follows:

cout << type_name::as_string() << endl;
If we try for a type we haven't specialized, such as "double", we'll get an error when compiling. You can of course make any specializations you like. Please add all the fundamental types. Now over to the print template, which with the above seems fairly simple. It's supposed to accept an instance of a "Range", and print it, just as the constructor call for the "Range" was done. Piece of cake:

template
void print(const Range& r)
{
cout << "Range<" << type_name::as_string()
<< ">(" << r.upperBound() << ", "
<< r.lowerBound() << ")" << endl;
}
Here we see two new things; the parameter for the function template need not be the template parameter itself. It needs to be something that makes use of the template parameter, though (for all except the absolutely newest compilers, all template parameters must be used in the parameter list for the function). The other new thing is how elegantly the "type_name" traits class blends with the function template. For being such an incredibly simple construct, the traits classes are unbelievably useful. Note also that this means we cannot print ranges of types for which the "type_name" traits class is not specialized. Now we're almost there. We can now write:

print(Range(10,5));
And it will work (if we specialize "type_name::as_string()", that is). Now for the last detail; the function template that creates "Range" instances.

template
Range create_range(const T& t1, const T& t2)
{
return Range(t1,t2);
}
Doesn't seem too tricky, now does it? There actually is no catch in this. The function template is by the compiler translated to a template function, using the types of the parameters. If the types differ in a call, the compiler will give you an error message. When the type is known, it will know what kind of "Range" to create and return. We can now write:

print(make_range(10,5));
just as we planned to. Neat, eh? If you want to learn more about traits classes, have a look at Nathan Meyers traits article from the June '95 issue of C++ Report.

Exercises

  • Biggie: Rewrite last months "intstack" as a class template, "stack" What happens if the copy constructor, operator== or destructor of T throws exceptions?
  • When can you, and when can you not use exception specifiers for templates?
  • What are the requirements on the type parameter of the templatized "Range"? Can you use a range of "intstack"?
  • What are the requirements on the type parameter of the templatized "stack"?

Recap

Quite a lot of news this month. You've learned:
  • how to write type independent functions with templates, without sacrificing type safety.
  • how the compiler generates the template functions from your function template.
  • about template classes, which can contain data of a type not known at the time of writing.
  • that templates restricts the usefulness of exception specifiers.
  • how to specialize class templates for known types.
  • how to write and use traits classes.

No comments:

Post a Comment