Sunday, November 15, 2009

An Introduction to C++ Programming - Part 8

Short recap of inheritance

Inheritance can be used to make runtime decisions about things we know conceptually, but not in detail. The employee/engineer/manager inheritance tree was an example of that; by knowing about employees in general, we can handle any kind of employee, including engineers, managers, secretaries, project leaders, and even kinds of employees we haven't yet thought of, for example marketers, salesmen and janitors.

A deficiency in the model

While this is good, it's not quite enough. The classic counterexample is a vector drawing program. Such a program usually holds a collection of shapes. A shape can be a square, a circle, a rectangle, a collection of grouped images, text, lines and so on. The problem in the model lies in the common base, shape. You know a number of things for shapes in general; you can draw them on a canvas, you can scale them, you can rotate them and translate them. The problem is, how do you do any of these for a shape in general? How is a generic shape drawn or rotated? It's impossible. It's only the concrete shapes that can be drawn, rotated, translated, scaled, etc.

This in itself is not a problem; we can create our base class ``Shape'' with virtual member functions ``draw(Canvas&)'', ``rotate(double degrees)'', ``translate(Coordinate c)'', ``scale(double)'' and so on, and make sure to override these in our concrete shape classes. Herein lies the problem. How do we force the descendants to override them? One way (a bad way) is to implement them in the base class in such a way that they bomb with an error message when called. The bad thing with that is that it violates a very simple rule of thumb: ``The sooner you catch an error, the better.'' There are 5 phases in which an error can be found: design, edit, compile, link and runtime. Note the obvious: errors that cannot be detected until runtime might go undetected! How to discover errors at design or edit time is not for this article (or even this article series), but there's a simple way of moving this particular discovery from runtime to compile time.
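The ``bad way'' can be sketched as follows. This is only an illustration: the names ``NaiveShape'', ``ForgetfulSquare'' and ``rotateBombs'' are hypothetical, and throwing an exception is an assumption about what ``bomb'' might mean in practice.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical base class illustrating the "bad way": the default
// implementation bombs when called instead of forcing an override.
class NaiveShape
{
public:
    virtual ~NaiveShape() {}
    virtual void rotate(double /*degrees*/)
    {
        throw std::logic_error("rotate() not overridden");
    }
};

// The author of this class forgot rotate(); the compiler does not complain.
class ForgetfulSquare : public NaiveShape
{
};

// Returns true if rotating the shape bombs at runtime.
bool rotateBombs(NaiveShape& s)
{
    try {
        s.rotate(90.0);
    } catch (const std::logic_error&) {
        return true;
    }
    return false;
}

// Usage: the mistake compiles cleanly and only shows up when the code runs.
void demoLateDiscovery()
{
    ForgetfulSquare sq;
    assert(rotateBombs(sq));
}
```

If no test ever rotates a ``ForgetfulSquare'', the mistake ships unnoticed, which is exactly the weakness of runtime-only detection.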

Pure virtual (abstract base classes)

C++ offers a way of saying ``This member function must be overridden by all descendants.'' Saying so also implies that objects of the class itself can never be instantiated, only objects of classes inheriting from it. This makes sense. What would you do with a generic shape object? It's better to make it impossible to create one by mistake, since it's meaningless anyway.
Here's how a pure abstract base class might be defined:

class Shape
{
public:
    virtual void draw(Canvas&) = 0;
    virtual void rotate(double angle) = 0;
    virtual void translate(Coordinate c) = 0;
    virtual void scale(double) = 0;
};
The ``= 0'' ending of a member function declaration makes it pure virtual. Pure virtual means that it must be overridden by descendants. Having one or more pure virtual member functions in a class makes the class an abstract base class. Abstract because you cannot instantiate objects of the class. If you try you'll get compiler errors. A class which has only pure virtual member functions and no data is often called a pure abstract base class, or sometimes an interface class. The latter is more descriptive; the class defines an interface that descendants must conform to, and any piece of code that can understand the interface can operate on objects implementing the interface (the concrete classes like ``Triangle'', ``Rectangle'', and ``Circle'').

The graphically experienced reader has of course noticed that rotation of a circle can be implemented extremely efficiently by doing nothing at all, so how can we take care of that scenario? It's unnecessary to write code that does nothing, is it not? Let's have a look at the alternatives.
  • Let's just ignore it. It won't work, though, since then our ``Circle'' class will be an abstract class (at least one pure virtual is not ``terminated.'')
  • We can change the interface of ``Shape'' such that ``rotate'' is not a pure virtual, and code its implementation to do nothing. This doesn't seem like a good idea because then the programmer implementing the square might forget to implement ``rotate'' without getting compiler errors.
The root of this lies in the illusion that doing nothing at all is the default behaviour, while it is an optimization for circles. As such the ``do nothing at all'' code belongs in ``Circle'' only. In other words, the best solution is with the original pure abstract ``Shape'' class, and an empty implementation for ``Circle::rotate.''
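A minimal sketch of that conclusion, using a reduced and hypothetical ``Shape'' interface (the ``orientation'' accessor and the ``rotateAndReport'' helper are assumptions added only so the effect can be observed):

```cpp
#include <cassert>

// Reduced, hypothetical version of the Shape interface: only what is
// needed to make the point about Circle::rotate.
class Shape
{
public:
    virtual ~Shape() {}
    virtual void rotate(double degrees) = 0;
    virtual double orientation() const = 0;   // assumed accessor, for the demo
};

class Rectangle : public Shape
{
public:
    Rectangle() : degrees_(0.0) {}
    virtual void rotate(double degrees) { degrees_ += degrees; }
    virtual double orientation() const { return degrees_; }
private:
    double degrees_;
};

class Circle : public Shape
{
public:
    // Terminates the pure virtual with the circle-specific optimization:
    // a rotated circle looks the same, so there is nothing to do.
    virtual void rotate(double) {}
    virtual double orientation() const { return 0.0; }
};

// Works on any concrete shape through the interface alone.
double rotateAndReport(Shape& s, double degrees)
{
    s.rotate(degrees);
    return s.orientation();
}

// Shape s;   // would not compile: Shape is abstract

// Usage: the same call does real work for one shape and nothing for the other.
void demoRotation()
{
    Rectangle r;
    Circle c;
    assert(rotateAndReport(r, 90.0) == 90.0);
    assert(rotateAndReport(c, 90.0) == 0.0);
}
```

Had ``Circle'' left out ``rotate'', creating a ``Circle'' object would be rejected by the compiler, which is the early error detection we were after.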

Addressing pure virtuals

I won't write a drawing program, since that'd make this article way too long, and the point would be drowned in all other intricacies of graphical programming. Instead I'll attack another often forgotten issue; addresses. Mailing addresses have different formatting depending on sender and receiver country. If you send something internationally you add the destination country to the address, while for domestic letters that's not necessary. The formatting itself also differs from country to country. Here are a few (simplified) examples:

Sweden

Name
Street Number
{Country-Code}Postal-Code City
{Country-Name}

USA

Name
Number Street
City, State Zip
{Country-Name}

Canada and U.K.

Name
Number Street
City
{Country}
Postal-Code
Then, of course, there are totally different types of addresses. E-mail, Ham Radio call-signs, phone number, fax number, etc. As a simplification for this example I'll treat State and Zip in U.S. addresses as a unit, and I will assume that Postal-Code and State/Zip in U.S. addresses are synonymous (i.e. I'll only have one field that's used either as postal code or as state/zip combination, depending on country). As an exercise you can improve this. Make sure ``State'' is only dealt with in address kinds where it makes sense. The Country-Code as can be seen in the Swedish address example will also be ignored (this too makes for an excellent exercise to include). The address class hierarchy will be done such that other kinds of addresses like e-mail addresses and phone numbers can be added.
Here's the base class:

class Address
{
public:
    virtual const char* type() const = 0;
    virtual void print(int international=0) const = 0;
    virtual void acquire(void) = 0;
    virtual ~Address();
};
The idea here is that ``type'' can be used to ask an address object what kind of address it is, a mailing address, e-mail address and so on. If the parameter for ``print'' is non-zero, the address will be printed in international form, (i.e. country name will be added to mailing addresses and international prefixes added to phone numbers). The member function ``acquire'' is used for asking an operator to enter address data. Note that the destructor is virtual, but not pure virtual (what would happen if it was?)
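Why the destructor is virtual at all can be seen in a small sketch. The classes ``Base'' and ``Derived'' and the counter ``derivedDestroyed'' are hypothetical, introduced only to make the destructor call observable:

```cpp
#include <cassert>

// Counts destructor calls so the effect can be observed; a hypothetical
// instrument for this sketch only.
static int derivedDestroyed = 0;

class Base
{
public:
    virtual ~Base() {}                       // virtual, but not pure
    virtual const char* type() const = 0;
};

class Derived : public Base
{
public:
    virtual ~Derived() { ++derivedDestroyed; }
    virtual const char* type() const { return "derived"; }
};

// Destroys an object known only by its base class.
void destroy(Base* p)
{
    delete p;   // ~Derived() runs first, then ~Base()
}

// Usage: one delete through the base pointer runs the derived destructor.
void demoVirtualDestructor()
{
    const int before = derivedDestroyed;
    destroy(new Derived);
    assert(derivedDestroyed == before + 1);
}
```

Without ``virtual'' on ``~Base'', deleting a ``Derived'' through a ``Base*'' is undefined behaviour, and in practice the derived destructor is typically skipped.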

Unselfish protection

All kinds of mailing addresses will share a base, inheriting from ``Address'', that contains the address fields, and ways to access them. This class, however, will not implement any of the formatting pure virtuals from ``Address.'' That must be done by the concrete address classes with knowledge about the country's formatting and naming. The member function ``type'' will be defined here, however, to always return the string ``Mailing address'', since all kinds of mailing addresses are mailing addresses, even if they're Swedish addresses or U.S. addresses.

Access to the address fields is for the concrete classes only, and this is a problem. We've seen how we can make things generally available by declaring them public, or hide them from the general public by making them private. Here we want something in between. We want descendants, the concrete address classes, to access the address fields, but only the descendants and no one else. This can be achieved through the third protection level, ``protected.'' Protected means that access is limited to the class itself (of course) and all descendants of it. It is thus looser than private, but much stricter than public.
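The three protection levels can be seen side by side in a small sketch. ``Account'' and ``SavingsAccount'' are hypothetical classes chosen for illustration, and ``std::string'' is used only to keep the example short:

```cpp
#include <cassert>
#include <string>

class Account   // hypothetical base class
{
public:
    virtual ~Account() {}
protected:
    // Descendants may read and set the owner; nobody else may.
    const std::string& owner() const { return owner_; }
    void owner(const std::string& o) { owner_ = o; }
private:
    std::string owner_;   // data stays private, even for descendants
};

class SavingsAccount : public Account
{
public:
    explicit SavingsAccount(const std::string& o) { owner(o); }
    std::string label() const { return "Savings: " + owner(); }
};

// SavingsAccount a("Ann");
// a.owner();      // would not compile: owner() is protected
// a.owner_;       // would not compile: owner_ is private

// Usage: outsiders only see what the descendant chooses to publish.
void demoProtected()
{
    SavingsAccount a("Ann");
    assert(a.label() == "Savings: Ann");
}
```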
Here comes the ``MailingAddress'' base class:

class MailingAddress : public Address
{
public:
    virtual ~MailingAddress();
    const char* type() const;
protected:
    MailingAddress();

    void name(const char*); // set
    const char* name() const; // get

    void street(const char*); // set
    const char* street() const; // get

    void number(const char*); // set
    const char* number() const; // get

    void city(const char*); // set
    const char* city() const; // get

    void postalCode(const char*); // set
    const char* postalCode() const; // get

    void country(const char*); // set
    const char* country() const; // get
private:
    char* name_data;
    char* street_data;
    char* number_data;
    char* city_data;
    char* postalCode_data;
    char* country_data;
    //
    // declared private to disallow them
    //
    MailingAddress(const MailingAddress&);
    MailingAddress& operator=(const MailingAddress&);
};
Here the copy constructor and assignment operator are declared private to disallow copying and assignment. This is not because they conceptually don't make sense, but because I'm too lazy to implement them (and yet want protection from stupid mistakes that would come, no doubt, if I left it to the compiler to generate them). It's the responsibility of this class to manage memory for the data strings; distributing this to the concrete descendants is asking for trouble. As a rule of thumb, protected data is a bad mistake. Having all data private, always managing the resources for the data in a controlled way, and giving controlled access through protected access member functions will drastically cut down your aspirin consumption. The reason for the constructor to be protected is more or less just aesthetic. No one but descendants can construct objects of this class anyway, since some of the pure virtuals from ``Address'' aren't yet terminated.
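The reason compiler-generated copying is dangerous for a class like this can be sketched with a smaller, hypothetical class ``Label'' that owns a single heap-allocated string. A generated copy would duplicate the pointer, not the string, and both objects would later ``delete[]'' the same memory:

```cpp
#include <cassert>
#include <cstring>

class Label   // hypothetical class owning a heap-allocated string
{
public:
    explicit Label(const char* text) : text_(duplicate(text)) {}
    ~Label() { delete[] text_; }
    const char* text() const { return text_; }
private:
    static char* duplicate(const char* s)
    {
        assert(s != 0);                       // precondition of this sketch
        char* copy = new char[std::strlen(s) + 1];
        std::strcpy(copy, s);
        return copy;
    }
    char* text_;
    // Declared but never defined: copying becomes a compile-time (or at
    // worst link-time) error instead of a double delete[] at run time.
    Label(const Label&);
    Label& operator=(const Label&);
};

// Label a("x");
// Label b(a);     // would not compile: copy constructor is private
```

With a C++11 or later compiler the same intent can be written as ``Label(const Label&) = delete;'', which is not available in the dialect this article uses.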
Now we get to the concrete address classes:

class SwedishAddress : public MailingAddress
{
public:
    SwedishAddress();
    virtual void print(int international=0) const;
    virtual void acquire(void);
};


class USAddress : public MailingAddress
{
public:
    USAddress();
    virtual void print(int international=0) const;
    virtual void acquire(void);
};
As you can see, the definitions of ``USAddress'' and ``SwedishAddress'' are identical. The only difference lies in the implementation of ``print'' and ``acquire''. I've left the destructors to be implemented at the compiler's discretion. Since there's no data to take care of in these classes (it's all in the parent class) we don't need to do anything special here. We know the parent takes care of it. Don't be afraid of copy construction and assignment. They were declared private in ``MailingAddress'', which means the compiler cannot create them for ``USAddress'' and ``SwedishAddress.''

Let's look at the implementation. For the ``Address'' base class only one thing needs implementing and that is the destructor. Since the class holds no data, the destructor will be empty:

Address::~Address()
{
}
A trap many beginners fall into is to think that since the destructor is empty, we can save a little typing by declaring it pure virtual and there won't be a need to implement it. That's wrong, though, since the destructor will be called when a descendant is destroyed. There's no way around that. If you declare it pure virtual and don't implement it, you'll get a link error at best, and a nasty run-time error when the first concrete descendant is destroyed at worst.

The observant reader might have noticed a nasty pattern of the author's refusal to get to the point with pure virtuals and implementation. Yes, you can declare a member function pure virtual, and yet implement it! Pure virtual does not illegalize implementation. It only means that the pure virtual version will NEVER be called through virtual dispatch (i.e. by just calling the function on an object, a reference or a pointer to an object.) Since it will never, ever, be called through virtual dispatch, it must be implemented by the descendants, hence the rule that you cannot instantiate objects where pure virtuals are not terminated. By termination, by the way, I mean declaring it in a non pure virtual way.

OK, so a pure virtual won't ever be called through virtual dispatch. Then how can one be called? Through explicit qualification. Let's assume, just for the sake of argument, that we through some magic found a way to implement some reasonable generic behaviour of ``acquire'' in ``Address,'' but we want to be certain that descendants do implement it. The only way to call the implementation of ``acquire'' in ``Address'' is to explicitly write ``Address::acquire.'' This is what explicit qualification means. There's no escape for the compiler; writing it like this can only mean one thing, even if ``Address::acquire'' is declared pure virtual.
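Here is a minimal sketch of a pure virtual that is nevertheless implemented and reached through explicit qualification. The classes ``Greeter'' and ``PoliteGreeter'' and the function ``greetThroughBase'' are hypothetical stand-ins, not part of the address hierarchy:

```cpp
#include <cassert>
#include <string>

class Greeter   // hypothetical abstract base
{
public:
    virtual ~Greeter() {}
    virtual std::string greet() const = 0;   // pure virtual...
};

// ...and yet implemented. Reachable only through explicit qualification.
std::string Greeter::greet() const
{
    return "Hello";
}

class PoliteGreeter : public Greeter
{
public:
    // Terminates the pure virtual, reusing the generic behaviour.
    virtual std::string greet() const
    {
        return Greeter::greet() + ", pleased to meet you";
    }
};

std::string greetThroughBase(const Greeter& g)
{
    return g.greet();   // virtual dispatch: never picks Greeter::greet
}

// Usage: dispatch finds the descendant, qualification finds the base.
void demoQualifiedCall()
{
    PoliteGreeter p;
    assert(greetThroughBase(p) == "Hello, pleased to meet you");
    assert(p.Greeter::greet() == "Hello");
}
```

``Greeter'' remains abstract despite the implementation, so descendants are still forced to terminate ``greet'' themselves.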
Now let's look at the middle class, the ``MailingAddress'' base class.

MailingAddress::~MailingAddress()
{
    delete[] name_data;
    delete[] street_data;
    delete[] number_data;
    delete[] city_data;
    delete[] postalCode_data;
    delete[] country_data;
}
I said when explaining the interface for this class, that it is responsible for handling the resources for the member data. Since we don't know the length of the fields, we oughtn't restrict them, but rather dynamically allocate whatever is needed. The ``delete[]'' syntax is for deleting arrays as opposed to just ``delete'' which deletes single objects. Note that it's legal to delete the 0 pointer. This is used here. If, for some reason, one of the fields is not set to anything, it will be 0. Deleting the 0 pointer does nothing at all. From this to the constructor:

MailingAddress::MailingAddress()
    : name_data(0),
      street_data(0),
      number_data(0),
      city_data(0),
      postalCode_data(0),
      country_data(0)
{
}
The only thing the constructor does is to make sure all pointers are 0, in order to guarantee destructability. The ``type'' and read-access methods are trivial:

const char* MailingAddress::type(void) const
{
    return "Mailing address";
}

const char* MailingAddress::name(void) const
{
    return name_data;
}

const char* MailingAddress::street(void) const
{
    return street_data;
}

const char* MailingAddress::number(void) const
{
    return number_data;
}

const char* MailingAddress::city(void) const
{
    return city_data;
}

const char* MailingAddress::postalCode(void) const
{
    return postalCode_data;
}

const char* MailingAddress::country(void) const
{
    return country_data;
}
The write access methods are a bit trickier, though. First we must check if the source and destination are the same, and do nothing in those situations. This is to achieve robustness. While it may seem like a very stupid thing to do, it's perfectly possible to see something like:

name(name());
The meaning of this is, of course, ``set the name to what it currently is.'' We must make sure that doing this works (or find a way to illegalize the construct, but I can't think of any way). If the source and destination are different, however, the old destination must be deleted, a new one allocated on heap and the contents copied. Like this:

void MailingAddress::name(const char* n)
{
    if (n != name_data) {
        delete[] name_data; // OK even if 0
        name_data = new char[strlen(n)+1];
        strcpy(name_data,n);
    }
}
This is done so many times over and over, exactly the same way for all kinds of data members, that we'll use a convenience function, ``replace,'' to do the job. ``strlen'' and ``strcpy'' are the C library functions from <string.h> that calculate the length of, and copy, strings.

static void replace(char*& data, const char* n)
{
    if (data != n) {
        delete[] data;
        data = new char[strlen(n)+1];
        strcpy(data,n);
    }
}
Using this convenience function, the write-access member functions will be fairly straightforward:

void MailingAddress::name(const char* n)
{
    ::replace(name_data,n);
}

void MailingAddress::street(const char* n)
{
    ::replace(street_data,n);
}

void MailingAddress::number(const char* n)
{
    ::replace(number_data,n);
}

void MailingAddress::city(const char* n)
{
    ::replace(city_data,n);
}

void MailingAddress::postalCode(const char* n)
{
    ::replace(postalCode_data,n);
}

void MailingAddress::country(const char* n)
{
    ::replace(country_data,n);
}
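The self-assignment check can be exercised in isolation. The sketch below is a variation, not the helper used in this article: the hypothetical ``replaceCopy'' allocates the new buffer before releasing the old one, an assumed refinement that also copes with a source pointing into the middle of the old string (something the pointer comparison alone does not catch).

```cpp
#include <cassert>
#include <cstring>

// Hypothetical variant of the replace helper: allocate and copy first,
// release the old buffer last.
void replaceCopy(char*& data, const char* n)
{
    assert(n != 0);                 // strlen(0) would be undefined
    if (data == n) return;          // self-assignment: nothing to do
    char* fresh = new char[std::strlen(n) + 1];
    std::strcpy(fresh, n);          // old buffer still alive while copying
    delete[] data;                  // OK even if data is 0
    data = fresh;
}
```

A short usage sequence: starting from a 0 pointer, ``replaceCopy(field, "Ann")'' stores a copy, ``replaceCopy(field, field)'' leaves it untouched, and ``replaceCopy(field, field + 1)'' safely shortens it to ``nn''. The caller remains responsible for the final ``delete[]''.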
That was all the ``MailingAddress'' base class does. Now it's time for the concrete classes. All they do is to ask questions with the right terminology and output the fields in the right places:

SwedishAddress::SwedishAddress()
    : MailingAddress()
{
    country("Sweden"); // what else?
}

void SwedishAddress::print(int international) const
{
    cout << name() << endl;
    cout << street() << ' ' << number() << endl;
    cout << postalCode() << ' ' << city() << endl;
    if (international) cout << country() << endl;
}

void SwedishAddress::acquire(void)
{
    char buffer[100]; // A mighty long field

    cout << "Name: " << flush;
    cin.getline(buffer,sizeof(buffer));
    name(buffer);

    cout << "Street: " << flush;
    cin.getline(buffer,sizeof(buffer));
    street(buffer);

    cout << "Number: " << flush;
    cin.getline(buffer,sizeof(buffer));
    number(buffer);

    cout << "Postal code: " << flush;
    cin.getline(buffer,sizeof(buffer));
    postalCode(buffer);

    cout << "City: " << flush;
    cin.getline(buffer,sizeof(buffer));
    city(buffer);
}

USAddress::USAddress()
    : MailingAddress()
{
    country("U.S.A."); // what else?
}

void USAddress::print(int international) const
{
    cout << name() << endl;
    cout << number() << ' ' << street() << endl;
    cout << city() << ' ' << postalCode() << endl;
    if (international) cout << country() << endl;
}

void USAddress::acquire(void)
{
    char buffer[100]; // Seems like a mighty long field

    cout << "Name: " << flush;
    cin.getline(buffer,sizeof(buffer));
    name(buffer);

    cout << "Number: " << flush;
    cin.getline(buffer,sizeof(buffer));
    number(buffer);

    cout << "Street: " << flush;
    cin.getline(buffer,sizeof(buffer));
    street(buffer);

    cout << "City: " << flush;
    cin.getline(buffer,sizeof(buffer));
    city(buffer);

    cout << "State and ZIP: " << flush;
    cin.getline(buffer,sizeof(buffer));
    postalCode(buffer);
}
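One caveat with the fixed-size buffers in ``acquire'': when a line does not fit, the member function ``getline'' sets the stream's fail state, and every following read fails until the state is cleared. The sketch below contrasts this with reading into a ``std::string'', which grows as needed. The helper names ``readFixed'' and ``readField'' are hypothetical, and the helpers take any ``std::istream'' so they can be tried without a keyboard.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// For contrast: the fixed-size read in the style of acquire().
// Returns false if the stream is in a failed state afterwards.
bool readFixed(std::istream& in, char* buffer, std::streamsize size)
{
    assert(buffer != 0 && size > 0);
    in.getline(buffer, size);
    return !in.fail();
}

// Reads one line of any length; returns false at end of input or on error.
bool readField(std::istream& in, std::string& field)
{
    return !std::getline(in, field).fail();
}
```

A field read this way could be handed to the setters as ``name(field.c_str())'', since the setters copy the characters they are given.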

A toy program

Having done all this work with the classes, we must of course play a bit with them. Here's a short and simple example program that (of course) also makes use of the generic programming paradigm introduced last month.

int main(void)
{
    const unsigned size=10;
    Address* addrs[size];
    Address** first = addrs; // needed for VACPP (bug?)
    Address** last = get_addrs(addrs,addrs+size);

    cout << endl << "--------" << endl;

    for_each(first,last,print(1));
    for_each(first,last,deallocate<Address>());
    return 0;
}
OK, that was mean. Obviously there's a function ``get_addrs'', which reads addresses into a range of iterators (in this case pointers in an array) until the array is full, or it terminates for some other reason. Here's how it may be implemented:

Address** get_addrs(Address** first,Address** last)
{
    Address** current = first;
    while (current != last)
    {
        cout << endl << "Kind (U)S, (S)wedish or (N)one "
             << flush;

        char answer[5]; // Should be enough.
        cin.getline(answer,sizeof(answer));
        if (!cin) break;

        switch (answer[0]) {
        case 'U': case 'u':
            *current = new USAddress;
            break;
        case 'S': case 's':
            *current = new SwedishAddress;
            break;
        default:
            return current;
        }
        (**current).acquire();
        ++current;
    }
    return current;
}
In part 6 I mentioned that virtual dispatch could replace switch statements, and yet here is one. Could this one be replaced with virtual dispatch as well? It would be unfair of me to say ``no'', but it would be equally unfair of me to propose using virtual dispatch here. The reason is that we'd need to work a lot without gaining anything. Why? We obviously cannot do virtual dispatch on the ``Address'' objects we're about to create, since they're not created yet. Instead we'd need a set of address creating objects, which we can access through some subscript or whatever, and call a virtual creation member function for. Doesn't seem to save a lot of work, does it? Probably the selection mechanism for which address creating object to call would be a switch statement anyway!

So, that was reading, now for the rest. ``for_each'' does something for every iterator in a range. It could be implemented like this:

template <class OI, class F>
void for_each(OI first,OI last, const F& functor)
{
    while (first != last) {
        functor(*first);
        ++first;
    }
}
In fact, in the (final draft) C++ standard, there is a beast called ``for_each'' that behaves almost like this one (it returns the functor). It's pretty handy. Imagine never having to explicitly loop through a complete collection again.

What is ``print'' then? Print is a ``functor,'' or ``function object'' as they're often called. It's something which behaves like a function, but which might store a state of some kind (in this case whether the country should be added to written addresses or not), and which can be passed around like any object. Defining one is easy, although it looks odd at first.

class print
{
public:
    print(int i);
    void operator()(const Address*) const;
private:
    int international;
};

print::print(int i)
    : international(i)
{
}

void print::operator()(const Address* p) const
{
    p->print(international);
    cout << endl;
}
What on earth is ``operator()''? It's the member function that's called if we boldly treat the name of an object just as if it was the name of some function, and simply call it. Like this:

print pobject(1); // define print object (international form).
pobject(addr);    // pobject.operator()(addr); addr is some Address*
This is usually called the ``function call'' operator, by the way. The only remaining thing now is ``deallocate'', but you probably already guessed it looks like this:

template <class T>
class deallocate
{
public:
    void operator()(T* p) const;
};

template <class T>
void deallocate<T>::operator()(T* p) const
{
    delete p;
}
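To see functors with state at work outside the address example, here is a small sketch that uses the standard library's ``std::for_each'' from <algorithm> rather than the hand-written one above. ``Summer'', ``Deallocate'' and ``sumOf'' are hypothetical names. Because ``std::for_each'' takes the functor by value and returns it, the accumulated state can be read back afterwards.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Functor with state: remembers how many values it has seen and their sum.
class Summer
{
public:
    Summer() : count_(0), sum_(0) {}
    void operator()(int value) { ++count_; sum_ += value; }
    std::size_t count() const { return count_; }
    long sum() const { return sum_; }
private:
    std::size_t count_;
    long sum_;
};

// Stateless functor template in the style of deallocate: deletes what it
// is handed. The pointers left in the range are dangling afterwards.
template <class T>
class Deallocate
{
public:
    void operator()(T* p) const { delete p; }
};

long sumOf(const int* first, const int* last)
{
    // std::for_each returns the functor, state included.
    Summer result = std::for_each(first, last, Summer());
    assert(result.count() == static_cast<std::size_t>(last - first));
    return result.sum();
}
```

Note that ``Summer'' has a non-const ``operator()'', so it would not work with a ``for_each'' that takes its functor by reference to const; that difference is a design choice worth keeping in mind when writing functors that accumulate state.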
This is well enough for one month, isn't it? You know what? You know by now most of the C++ language, and have some experience with the C++ standard class library. Most of the language issues that remain are more or less obscure and little known. We'll look mostly at library stuff and clever ideas for how to use the language from now on.

Recap

This month, you've learned:
  • what pure virtual means, and how you declare pure virtual functions.
  • that despite what most C++ programmers believe, pure virtual functions can be implemented.
  • that the above means that there's a distinction between terminating a pure virtual, and implementing one.
  • why it's a bad idea to make destructors pure virtual.
  • a new protection level, ``protected.''
  • why protected data is bad, and how you can work around it in a clever way.
  • that switch statements cannot always be replaced by virtual dispatch.
  • that there is a ``function call'' operator and how to define and use it.

Exercises

  • Find out what happens if you declare the ``MailingAddress'' destructor pure virtual, and yet define it.
  • Think of two ways to handle the State/Zip problem, and implement both (what are the advantages, disadvantages of the methods?)
  • Rewrite ``get_addrs'' to accept templatized iterators instead of pointers.
