Sunday, November 15, 2009

An Introduction to C++ Programming - Part 10

The data representation problem

In the file array as implemented last month, data was always stored in a raw binary format, exactly mirroring the bits as they lay in memory. This works fine for integers and such, but can be disastrous in other situations. Imagine a file array of strings (where a string is a ``char*''). With last month's implementation, the pointer value would be stored, not the data pointed to. When reading, a pointer value is read, and when it is dereferenced, whatever happens to be at the memory location pointed to (if anything) will be used, which is very likely to result in a rather quick crash. Anything containing pointers is dangerous to store in a raw binary format, yet we must somehow allow pointers in the array, preferably without causing problems for those using the array with built-in arithmetic types. How can this be done?
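To see the problem concretely, here is a small sketch of what the raw approach does to a ``char*''. The helper names ``writeRaw'' and ``storeStringRaw'' are made up for this illustration, and a ``std::stringstream'' stands in for the file. The point is only that the bytes reaching the stream are those of the pointer, however long the string is.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Last month's approach: copy the in-memory bytes of the element verbatim.
template <class T>
void writeRaw(std::ostream& os, const T& value)
{
    os.write(reinterpret_cast<const char*>(&value), sizeof(value));
}

// Stores a char* "string" the raw way and returns the bytes that reached
// the stream (hypothetical helper, for illustration only).
std::string storeStringRaw(const char* s)
{
    std::stringstream ss;
    writeRaw(ss, s);   // T is const char*: the address is written, not the text
    return ss.str();
}
```

Whatever the string, exactly ``sizeof(const char*)'' bytes are stored, and none of them are characters of the string.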
In part 4, when templates were introduced, a clever little construct called ``traits classes'' was shown. I then gave this rather terse description: ``A traits class is never instantiated, and doesn't contain any data. It just tells things about other classes, that is its sole purpose.'' Doesn't that smell like something we can use here? A traits class that tells how the data types should be represented on disk?
What do we need from such a traits class? Obviously, we need to know how much disk space each element will take, so a ``size'' member will definitely be necessary; otherwise we cannot know how much disk space will be required. We also need to know how to store the data, and how to read it. The easiest way is probably to have member functions ``writeTo'' and ``readFrom'' in the traits class. Thus we can have something like this:

template <class T> class FileArrayElementAccess
{
public:
static const size_t size;
static void writeTo(T value, ostream& os);
static T readFrom(istream& is);
};
The array is then rewritten to use this when dealing with the data. The change is extremely minor. ``storeElement'' needs to be rewritten as:

template <class T>
void FileArray<T>::storeElement(size_t index,
const T& element)
{
// what if index >= array_size?
typedef FileArrayElementAccess<T> traits;
(*pstream).seekp(traits::size*index
+sizeof(array_size), ios::beg);
// what if seek fails?
traits::writeTo(element,*pstream);
// what if write failed?
// what if too much data was written?
}
The change for ``readElement'' is of course analogous. However, as indicated by the last comment, a new error possibility has shown up. What if the ``writeTo'' and ``readFrom'' members of the traits class are buggy, and write or read more data than they're allowed to? Since it's the user of the array who must write the traits class (at least for their own data types), we cannot solve the problem, but we can give the user a chance to discover that something went wrong. Unfortunately, for writing the error is extremely severe: it means that the next entry in the array will have its data destroyed.
In the traits class, by the way, the constant ``size'', used for telling how many bytes in the stream each ``T'' will occupy, poses a problem with most C++ compilers today (modern ones make life so much easier.) The problem is that a static member variable, even a static constant, needs to reside somewhere in memory, and the class declaration alone does not provide that. This problem is two-fold. To begin with, where should it be stored? That's very much up to whoever writes the class, but somewhere in the code there must be something like:

const size_t FileArrayElementAccess<X>::size = ...;
where ``X'' is the name of the class dealt with by the particular traits specialisation. The second problem is that this is totally unnecessary. What we want is a value that can be used by the compiler at compile time, not a memory location to read a value from. As I mentioned, a modern compiler does make this much easier. In standard C++ it is allowed to write:

template <> class FileArrayElementAccess<X>
{
public:
static const size_t size = ...;
...
};
Note that, for some reason that I do not know, this construct is only legal for static constants of integral or enumeration type. ``size_t'' is such a type; it's some unsigned integral type, probably ``unsigned int'', but possibly ``unsigned long''. The expression denoted ``...'' must be possible to evaluate at compile time. Unless code is written that explicitly takes the address of ``size'' (or binds it to a reference), we need not give the constant any space to reside in. The odd construct ``template <>'' is also new C++ syntax, and means that what follows is a specialisation of a previously declared template. For old compilers, however, there's a work-around for integral values no larger than the largest ``int'' value. We cheat and use an enum instead of a ``size_t''. This makes the declaration:

class FileArrayElementAccess<X>
{
public:
enum { size= ... };
...
};
This is a bit ugly, but it is perfectly harmless. The advantage gained by adding the traits class is flexibility and safety. If someone wants to use a file array for their own class, they're free to do so. However, they must first write a ``FileArrayElementAccess'' specialisation. Failure to do so results in an error when the program is built (with the general template shown above, which declares its members but never defines them, it shows up as a link error). This early error detection is beneficial. The sloppy solution from last month would not yield any error until run-time, which means a (usually long) debugging session.
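As a sketch of what user code might look like, here are two specialisations: one for ``int'' using an in-class ``static const'', and one using the enum work-around. The ``Name'' type and its 16-byte slot are hypothetical, chosen only to show that the characters themselves, not a pointer, end up in the stream. A ``std::stringstream'' stands in for the file, and ``writeTo'' and ``readFrom'' are assumed to have the signatures from the declaration above. Here the general template is only declared, never defined, so forgetting a specialisation is caught at compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <istream>
#include <ostream>
#include <sstream>

// General template: declared only, so a type without a specialisation
// cannot be used at all.
template <class T> class FileArrayElementAccess;

// Built-in arithmetic type: the raw bytes are a fine disk representation.
template <> class FileArrayElementAccess<int>
{
public:
    static const std::size_t size = sizeof(int);  // usable at compile time
    static void writeTo(int value, std::ostream& os)
    {
        os.write(reinterpret_cast<const char*>(&value), sizeof value);
    }
    static int readFrom(std::istream& is)
    {
        int value = 0;
        is.read(reinterpret_cast<char*>(&value), sizeof value);
        return value;
    }
};
// Out-of-class definition, only needed if 'size' is odr-used
// (for example if its address is taken). No template <> prefix here.
const std::size_t FileArrayElementAccess<int>::size;

// Hypothetical fixed-capacity string type, for illustration only.
struct Name { char text[16]; };

// The enum work-around: the characters, not a pointer, go to the stream.
template <> class FileArrayElementAccess<Name>
{
public:
    enum { size = 16 };
    static void writeTo(Name value, std::ostream& os)
    {
        os.write(value.text, size);
    }
    static Name readFrom(std::istream& is)
    {
        Name value = {};
        is.read(value.text, size);
        return value;
    }
};
```

Both specialisations write exactly ``size'' bytes, which is what lets the array compute element positions by multiplication.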

Several arrays in a file

What is needed in order to host several arrays in the same file? One way or the other, there must be a mechanism for finding out where one array begins and another ends. I think the simplest solution is to let go of the file names, and instead make the constructors accept an ``fstream&''. We can then require that the put and get pointers of the stream be positioned where the array can begin, and we can in turn promise that the put and get pointers will be positioned at the byte after the array end. Of course, in addition to having a reference to the ``fstream'' in our class, we also need the ``home'' position to seek relative to when indexing the array. This is easy for us to write, and it's easy to use as well. For someone requiring only one array in a file, there'll be slightly more code: an ``fstream'' object must be explicitly initialised somewhere and passed to the constructor of the array, instead of just giving it a name. I think the trade of slightly more code for increased functionality is favorable.
In order to improve the likelihood of finding errors, we can waste a few bytes of disk space by writing a well known header and trailer pattern at the beginning and end of the array (before the first element, and after the last one.) If someone wants to allocate an array using an existing file, we can find out if the get pointer is in place for an array start.
The constructor creating a file should, however, first try to read from the file to see if an array already exists there. If it does, the array should be initialised from the file, just like the constructor accepting only a stream does. If the read fails, however, we can safely assume that the array doesn't exist and should instead be created.
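The header and trailer protocol can be tried out without a real file. The following sketch assumes a ``std::stringstream'' in place of the ``fstream'' and uses a hypothetical free function ``attachRegion'' rather than the ``FileArray'' class. It creates an array region when nothing is found at the current position, and otherwise validates the patterns and sizes already there. Since each call leaves both stream pointers just past the trailer, two arrays can be laid out back to back in the same stream.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <ios>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical helper (not the article's FileArray): attaches to an array
// region starting at the stream's current position. Layout:
//   "ABegin" | element size | element count | data | "AEnd"
// Nothing to read means "create"; otherwise the existing region is checked.
// On success both stream pointers end up just past "AEnd".
bool attachRegion(std::iostream& s, std::size_t elemSize,
                  std::size_t count, std::streampos& home)
{
    const std::streampos start = s.tellg();
    char begin[6];
    if (!s.read(begin, 6)) {                  // read failed: create a region
        s.clear();
        s.seekp(start);
        s.write("ABegin", 6);
        s.write(reinterpret_cast<const char*>(&elemSize), sizeof elemSize);
        s.write(reinterpret_cast<const char*>(&count), sizeof count);
        home = s.tellp();
        // A string stream, unlike a file, cannot seek past its end, so the
        // element data is zero-filled here instead.
        const std::string zeros(elemSize * count, '\0');
        s.write(zeros.data(), static_cast<std::streamsize>(zeros.size()));
        s.write("AEnd", 4);
        s.seekg(s.tellp());
        return static_cast<bool>(s);
    }
    std::size_t storedSize = 0, storedCount = 0;
    s.read(reinterpret_cast<char*>(&storedSize), sizeof storedSize);
    s.read(reinterpret_cast<char*>(&storedCount), sizeof storedCount);
    home = s.tellg();
    s.seekg(home + static_cast<std::streamoff>(storedSize * storedCount));
    char end[4];
    s.read(end, 4);
    if (!s || std::memcmp(begin, "ABegin", 6) != 0 ||
        std::memcmp(end, "AEnd", 4) != 0 ||
        storedSize != elemSize || storedCount != count) {
        s.setstate(std::ios::failbit);        // corrupt, or wrong kind of array
        return false;
    }
    s.seekp(s.tellg());
    return true;
}
```

Calling it twice on an empty stream creates two consecutive regions; calling it twice on a copy of that data finds the same two home positions again, while asking for the wrong element size fails.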
The changes to the class definition and the constructor implementations are relatively straightforward, if long:

template <class T>
class FileArray
{
public:
FileArray(fstream& fs, size_t elements);
// create a new file.

FileArray(fstream& fs);
// use an existing file and get size from there
...
private:
void initFromFile(const char*);

fstream& stream;
size_t array_size; // in elements
streampos home;
};

template <class T>
FileArray<T>::FileArray(fstream& fs, size_t elements)
: stream(fs),
array_size(elements)
{
// what if the file could not be opened?
// first try to read and see if there's a begin
// pattern. Either there is one, or we should
// get an eof.

char pattern[6];
stream.read(pattern,6);
if (stream.eof()) {
stream.clear(); // clear error state
// and initialise.

// begin of array pattern.
stream.write("ABegin",6);
// must store size of elements, as last month
const size_t elem_size
=FileArrayElementAccess<T>::size;
stream.write((const char*)&elem_size,
sizeof(elem_size));
// and of course the number of elements
stream.write((const char*)&array_size,
sizeof(array_size));
// Now that we've written the maintenance
// stuff, we know what the home position is.

home = stream.tellp();

// Then we must go to the end and write
// the end pattern.

stream.seekp(home+elem_size*array_size);
stream.write("AEnd",4);

// set put and get pointer to past the end pos.
stream.seekg(stream.tellp());
return;
}

initFromFile(pattern); // shared with other
// stream constructor
if (array_size != elements) {
// Uh oh. The data read from the stream,
// and the size given in the constructor
// mismatches! What now?
stream.clear(ios::failbit);
}

// set put and get pointer to past the end pos.
stream.seekp(stream.tellg());
}

template <class T>
FileArray<T>::FileArray(fstream& fs)
: stream(fs)
{
// First read the head pattern to see if
// it's right.
char pattern[6];
stream.read(pattern,6);
initFromFile(pattern);
// set put and get pointer to past the end pos.
stream.seekp(stream.tellg());
}

template <class T>
void FileArray<T>::initFromFile(const char* p)
{
// Check if the read pattern is correct
if (strncmp(p,"ABegin",6)) {
// What to do? It was all wrong!
stream.clear(ios::failbit);
// for lack of better,
// set the fail flag.
return;
}
// OK, we have a valid array, now let's see if
// it's of the right kind.
size_t elem_size;
stream.read((char*)&elem_size,sizeof(elem_size));
if (elem_size != FileArrayElementAccess<T>::size)
{
// wrong kind of array, the element sizes
// mismatch. Again, what to do? Let's set
// the fail flag for now.
stream.clear(ios::failbit);
// stupid name for the
// member function, right?
return;
}
// Get the size of the array. Can't do much with
// the size here, though.
stream.read((char*)&array_size,sizeof(array_size));
// Now we're past the header, so we know where the
// data begins and can set the home position.

home = stream.tellg();

stream.seekg(home+elem_size*array_size);

// Now positioned immediately after the last
// element.

char epattern[4];
stream.read(epattern,4);
if (strncmp(epattern,"AEnd",4)) {
// Whoops, corrupt file!
stream.clear(ios::failbit);
return;
}
// Seems like we have a valid array!
}
Other than the above, the only change needed for the array is that seeking will be done relative to ``home'' rather than the beginning of the file (plus the size of the header entries.) The new versions of ``storeElement'' and ``readElement'' become:

template <class T>
T FileArray<T>::readElement(size_t index) const
{ // what if index >= array_size?
typedef FileArrayElementAccess<T> traits;
stream.seekg(home+index*traits::size);
// what if seek fails?

return traits::readFrom(stream);
// what if read fails?
// What if too much data is read?
}

template <class T>
void FileArray<T>::storeElement(size_t index,
const T& element)
{ // what if index >= array_size?
typedef FileArrayElementAccess<T> traits;
stream.seekp(home+traits::size*index);
// what if seek fails?
traits::writeTo(element,stream);
// what if write failed?
// what if too much data was written?
}
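One way to give the user a chance to discover a misbehaving traits class is to compare the put position before and after calling ``writeTo''. The sketch below assumes a traits class with the ``size'' and ``writeTo'' members described earlier; ``writeChecked'', ``GoodIntAccess'' and ``BuggyIntAccess'' are illustrative names, not part of the file array. By the time the check fails, the neighbouring element has already been overwritten, so this only detects the error; it cannot prevent it.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <ostream>
#include <sstream>

// Writes one element through the traits class. Returns false if the
// stream failed or if the number of bytes written differs from
// Traits::size (which points to a buggy traits class).
template <class Traits, class T>
bool writeChecked(std::ostream& os, const T& value)
{
    const std::streampos before = os.tellp();
    Traits::writeTo(value, os);
    const std::streampos after = os.tellp();
    return os && after - before == static_cast<std::streamoff>(Traits::size);
}

// A correct traits class for int (illustrative).
struct GoodIntAccess
{
    static const std::size_t size = sizeof(int);
    static void writeTo(int v, std::ostream& os)
    {
        os.write(reinterpret_cast<const char*>(&v), sizeof v);
    }
};

// A deliberately buggy one: it writes a byte more than it announces.
struct BuggyIntAccess
{
    static const std::size_t size = sizeof(int);
    static void writeTo(int v, std::ostream& os)
    {
        os.write(reinterpret_cast<const char*>(&v), sizeof v);
        os.put('!');
    }
};
```

The same comparison, using ``tellg'' around ``readFrom'', would catch the case of reading too much.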

Temporary file array

Making use of a temporary file to store a file array that's not to be persistent between runs of the application isn't that tricky. The implementation so far makes use of a stream and known data about the beginning of the stream, number of elements and size of the elements. This can be used for the temporary file as well. The only thing we need to do is to create the temporary file first, open it with an fstream object, and tie the stream reference to that object, and remember to delete the file in the destructor.
What's the best way of creating something and making sure we remember to undo it later? Creating a new helper class, of course, which creates the file in its constructor and removes it in its destructor. Piece of cake. The only problem is that we shouldn't always create a temporary file, and when we do, we can handle it a bit differently from a ``global'' file that can be shared. For example, we know that we have exclusive rights to the file, and that it won't be reused, so there's no need for the extra information at the beginning and end. So, how is a temporary file created? The iostream library has no support for it, neither in the C++ standard nor in the old de-facto standard. From the C library we get ``tmpnam'', and there is also ``tempnam'', a commonly supported extension (POSIX specifies it). Both are declared in <stdio.h>. In this implementation I have chosen ``tempnam'' as it's more flexible. ``tempnam'' works like this: it accepts two string parameters named ``dir'' and ``prefix''. It first tries to use the directory named by the environment variable ``TMPDIR''. If that fails, it tries the directory indicated by the ``dir'' parameter, unless that is 0, in which case a hard-coded default is attempted. It does not create any file; it returns a ``char*'' holding a file name, in the chosen directory and starting with ``prefix'', that was not in use when it was generated. The memory area pointed to is allocated with the C function ``malloc'', and thus must be deallocated with ``free'' and not delete[].
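The name returned by ``tempnam'' is only known to be free at the moment it is generated, so another process could create a file with that name before we open it. POSIX also provides ``mkstemp'', which picks the name and creates the file in one step. The sketch below uses that instead of ``tempnam'', so it assumes a POSIX system with a writable /tmp; ``TempFile'' is an illustrative name. It also relies on the fact that a ``std::fstream'' opened with ``in|out'' only opens a file that already exists, which is one more reason to create the file before opening the stream.

```cpp
#include <cassert>
#include <cstdio>      // std::remove
#include <fstream>
#include <stdexcept>
#include <string>
#include <stdlib.h>    // mkstemp (POSIX)
#include <unistd.h>    // close (POSIX)

// Creates a uniquely named file in its constructor and removes it in its
// destructor. mkstemp replaces tempnam here: it picks the name and
// creates the file in one step.
class TempFile
{
public:
    TempFile()
    {
        char pattern[] = "/tmp/arrayXXXXXX";  // XXXXXX is replaced
        int fd = ::mkstemp(pattern);
        if (fd == -1)
            throw std::runtime_error("mkstemp failed");
        ::close(fd);                 // reopened below through a stream
        name_ = pattern;
        // The file exists now, so in|out (which never creates) succeeds.
        fs_.open(name_.c_str(),
                 std::ios::in | std::ios::out | std::ios::binary);
        if (!fs_) {
            std::remove(name_.c_str());
            throw std::runtime_error("could not open temporary file");
        }
    }
    ~TempFile()
    {
        fs_.close();
        std::remove(name_.c_str());  // failure ignored: destructors shouldn't throw
    }
    std::fstream& stream() { return fs_; }
    const std::string& name() const { return name_; }
private:
    TempFile(const TempFile&);             // not copyable
    TempFile& operator=(const TempFile&);
    std::string name_;
    std::fstream fs_;
};
```

Once the ``TempFile'' object goes out of scope, the file is gone, exactly the construct/destruct pairing described above.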
Over to the implementation details:
We add a class called temporaryfile, which does the work described above. We also add a member variable ``pfile'' of type ``ptr<temporaryfile>''. Remember the ``ptr'' template from last month? It's a smart pointer that deallocates whatever it points to in its destructor. It's important that the member variable ``pfile'' is declared before the ``stream'' member, since members are initialised in the order they are declared in the class (regardless of the order in the constructor's initialiser list), and the ``stream'' member must be initialised from the file object owned by ``pfile''. We also add a constructor with the number of elements as its sole parameter, which makes use of the temporary file.

class temporaryfile
{
public:
temporaryfile();
~temporaryfile();
fstream& stream();
private:
char* name;
fstream fs;
};

temporaryfile::temporaryfile()
: name(::tempnam(".","array")),
fs(name, ios::in|ios::out|ios::trunc|ios::binary)
{
// trunc is needed: in|out alone never creates a file.
// what if tempnam fails and name is 0?
// what if fs is bad?
}

temporaryfile::~temporaryfile()
{
fs.close();
::remove(name);
// what if remove fails?
::free(name);
}
In the above code, ``tempnam'', ``remove'' and ``free'' are prefixed with ``::'', to make sure that the names in global scope are meant, just in case someone enhances the class with a few more member functions whose names might clash. For the sake of syntactic convenience, I have added yet another operator to the ``ptr'' class template:

template <class T> class ptr
{
public:
ptr(T* tp=0) : p(tp) {};
~ptr() { delete p; };
T* operator->(void) const { return p; };
T& operator*(void) const { return *p;};
private:
ptr(const ptr&);
ptr& operator=(const ptr&);
T* p;
};
It's the ``operator->'' that's new, which allows us to write things like ``p->x'', where p is a ``ptr<X>'', and the type ``X'' contains some member named ``x''. The return type of ``operator->'' must be something that ``operator->'' can be applied to. The explanation sounds recursive, but it makes sense if you look at the above code. ``ptr<X>::operator->()'' returns an ``X*'', and ``X*'' is something you can apply the built-in ``operator->'' to (which gives you access to the members.)
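A small self-contained version of the ``ptr'' template with its new ``operator->'' may make the forwarding clearer. The struct ``X'' with a member ``x'' is hypothetical, there only to show that ``p->x'' and ``p->twice()'' reach the pointed-to object.

```cpp
#include <cassert>

// The ptr template with the new operator->.
template <class T> class ptr
{
public:
    ptr(T* tp = 0) : p(tp) {}
    ~ptr() { delete p; }
    T* operator->() const { return p; }  // p->m means (p.operator->())->m
    T& operator*() const { return *p; }
private:
    ptr(const ptr&);                     // copying would delete twice
    ptr& operator=(const ptr&);
    T* p;
};

// Hypothetical class, only used to exercise ptr.
struct X
{
    int x;
    int twice() const { return 2 * x; }
};
```

Both data members and member functions are reached through the same ``operator->''.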

template <class T>
FileArray<T>::FileArray(size_t elements)
: pfile(new temporaryfile),
stream(pfile->stream()),
array_size(elements),
home(stream.tellg())
{
const size_t elem_size=
FileArrayElementAccess<T>::size;
// put a char just after the end to make
// sure there's enough free disk space.
stream.seekp(home+array_size*elem_size);
char c = 0; // the value doesn't matter; the byte only reserves space
stream.write(&c,1);
// what to do if write fails?
// set put and get pointer to past the end pos
stream.seekg(stream.tellp());
}
That's it! The rest of the array works exactly as before. No need to rewrite anything else.

Code reuse

If you're an experienced C programmer, especially experienced with programming embedded systems where memory constraints are tough and you also have a good memory, you might get a feeling that something's wrong here.
What I'm talking about is something I mentioned the first time templates were introduced: ``Templates aren't source code. The source code is generated by the compiler when needed.'' This means that if a program uses, say, FileArray<int>, FileArray<double>, FileArray<X> and FileArray<Y> (where ``X'' and ``Y'' are some classes,) there will be code for all four types. Now, have a close look at the member functions and see in what way ``FileArray<int>::FileArray(iostream& fs, size_t elements)'' differs from ``FileArray<X>::FileArray(iostream& fs, size_t elements)''. Please do compare them.
What did you find? The only difference at all is in the handling of the constant ``elem_size'', yet the same code is generated several times with that as the only difference. This is what is often referred to as the template code bloat of C++. We don't want code bloat. We want fast, tight, and slick applications.
Since the only thing that differs is the size of the elements, we can move the rest to something that isn't templatised, and use that common base everywhere. I've already shown how code reuse can be done by creating a separate class and having a member variable of that type. In this article I want to show an alternative way of reusing code, and that is through inheritance. Note very carefully that I did not say public inheritance. Public inheritance models ``is-A'' relationships only. We don't want an ``is-A'' relationship here. All we want is to reuse code to reduce code bloat. This is done through private inheritance, which is used far less than it should be. Here's all there is to it: create a class with the desired implementation to reuse, and inherit privately from it. Nothing more, nothing less. To a user of your class, it doesn't matter at all whether you chose not to reuse code, to reuse it through a member variable, or to reuse it through private inheritance. Outside the class, it's not possible to refer to the descendant class through a pointer to the private base class; private inheritance is an implementation detail only, and not an interface issue.
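Here is a stripped-down sketch of the idea, with no streams involved. ``OffsetBase'' and ``Layout'' are hypothetical names: the non-template base does the offset arithmetic once for every element type, and each ``Layout<T>'' only supplies ``sizeof(T)''. The using-declaration re-exposes ``size()'' publicly, while outside the class a conversion from ``Layout<T>*'' to ``OffsetBase*'' is rejected, because the base is private.

```cpp
#include <cassert>
#include <cstddef>

// Non-template base holding everything that does not depend on T.
class OffsetBase
{
protected:
    OffsetBase(std::size_t home, std::size_t elemSize, std::size_t count)
        : home_(home), elemSize_(elemSize), count_(count) {}
    // Compiled once and shared by every Layout<T>.
    std::size_t offsetOf(std::size_t index) const
    {
        return home_ + index * elemSize_;
    }
    std::size_t size() const { return count_; }
private:
    std::size_t home_, elemSize_, count_;
};

// Thin template: private inheritance reuses the code without is-A.
template <class T>
class Layout : private OffsetBase
{
public:
    Layout(std::size_t home, std::size_t count)
        : OffsetBase(home, sizeof(T), count) {}
    using OffsetBase::size;                 // made public on purpose
    std::size_t position(std::size_t i) const { return offsetOf(i); }
};
```

Each instantiation of ``Layout'' adds only a constructor and a one-line forwarding function; the arithmetic exists once.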
To the point. What can, and what cannot, be isolated and put in a private base class? Let's first look at the data. The ``stream'' reference member can definitely be moved to the base, and so can the ``pfile'' member for temporary files. The ``array_size'' member can safely be there too, and also the ``home'' member marking the beginning of the array on the stream. By doing that alone we have saved just about nothing, but if we add the size (on disk) of the elements as a data member in the base class, initialised from the ``FileArrayElementAccess<T>::size'' traits member, all seeking in the file, including the initial seeking when creating the file array, can be moved to the base class. Now a lot has been gained, and very little will be left in the template. Let's look at the new, improved implementation.
Now for the declaration of the base class.

class FileArrayBase
{
public:
protected:
FileArrayBase(iostream& io,
size_t elements,
size_t elem_size);
FileArrayBase(iostream& io);
FileArrayBase(size_t elements, size_t elem_size);
iostream& seekp(size_t index) const;
iostream& seekg(size_t index) const;
size_t size() const; // number of elements
size_t element_size() const;
private:
class temporaryfile
{
public:
temporaryfile();
~temporaryfile();
iostream& stream();
private:
char* name;
fstream fs;
};
void initFromFile(const char* p);
ptr<temporaryfile> pfile;
iostream& stream;
size_t array_size;
size_t e_size;
streampos home;
};
The only surprise here should be the nesting of the class ``temporaryfile''. Yes, it's possible to define a class within a class. Since the ``temporaryfile'' class is defined in the private section of ``FileArrayBase'', it's inaccessible from anywhere other than the ``FileArrayBase'' implementation. It's actually possible to nest classes in class templates as well, but few compilers today support that. The definitions of the nested class's member functions look a bit ugly, since their names must be qualified with the enclosing class.

FileArrayBase::temporaryfile::temporaryfile()
: name(::tempnam(".","array")),
fs(name,ios::in|ios::out|ios::trunc|ios::binary)
{
// trunc is needed: in|out alone never creates a file.
// what if tempnam fails and name is 0?
// what if fs is bad?
}

FileArrayBase::temporaryfile::~temporaryfile()
{
fs.close();
::remove(name);
// What if remove fails?
::free(name);
}

iostream& FileArrayBase::temporaryfile::stream()
{
return fs;
}
The implementation of ``FileArrayBase'' is very similar to the ``FileArray'' earlier. The only difference is that we use a parameter for the element size, instead of the traits class.

FileArrayBase::FileArrayBase(iostream& io,
size_t elements,
size_t elem_size)
: stream(io),
array_size(elements),
e_size(elem_size)
{
char pattern[sizeof(ArrayBegin)];
stream.read(pattern,sizeof(pattern));
if (stream.eof()) {
stream.clear(); // clear error state
// and initialize.
// begin of array pattern.
stream.write(ArrayBegin,sizeof(ArrayBegin));

// must store size of elements
stream.write((const char*)&elem_size,
sizeof(elem_size));

// and of course the number of elements
stream.write((const char*)&array_size,
sizeof(array_size));

// Now that we've written the maintenance
// stuff, we know what the home position is.
home = stream.tellp();

// Then we must go to the end and write
// the end pattern.

stream.seekp(home+elem_size*array_size);
stream.write(ArrayEnd,sizeof(ArrayEnd));

// set put and get pointer to past the end pos.
stream.seekg(stream.tellp());
return;
}
initFromFile(pattern); // shared with other
// stream constructor

if (array_size != elements) {
// Uh oh. The data read from the stream,
// and the size given in the constructor
// mismatches! What now?

stream.clear(ios::failbit);
}
if (e_size != elem_size) {
stream.clear(ios::failbit);
}
// set put and get pointer to past the end pos.
stream.seekp(stream.tellg());
}
To make life a little bit easier, I've assumed two arrays of char named ``ArrayBegin'' and ``ArrayEnd'', which hold the patterns to be used for marking the beginning and end of an array on disk.
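The definitions of ``ArrayBegin'' and ``ArrayEnd'' are not shown, so the following is an assumption about how they might look. Declaring them as arrays, rather than as pointers, is what makes ``sizeof'' give the length of the pattern; note, however, that the length includes the terminating '\0'. With these assumed definitions the begin pattern takes 7 bytes on disk, so files written by the earlier version, which wrote only the 6 characters, would not be recognised. ``isArrayBegin'' is a hypothetical helper doing the same comparison as ``initFromFile''.

```cpp
#include <cassert>
#include <cstring>

// Assumed definitions; the article does not show them. As arrays (not
// pointers), sizeof yields the pattern length, terminating '\0' included.
const char ArrayBegin[] = "ABegin";
const char ArrayEnd[] = "AEnd";

// Compares a pattern read from disk the same way initFromFile does.
inline bool isArrayBegin(const char* p)
{
    return std::strncmp(p, ArrayBegin, sizeof(ArrayBegin)) == 0;
}
```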

FileArrayBase::FileArrayBase(iostream& io)
: stream(io)
{
char pattern[sizeof(ArrayBegin)];
stream.read(pattern,sizeof(pattern));
initFromFile(pattern);

// set put and get pointer to past the end pos.
stream.seekp(stream.tellg());
}

FileArrayBase::FileArrayBase(size_t elements,
size_t elem_size)
: pfile(new temporaryfile),
stream(pfile->stream()),
array_size(elements),
e_size(elem_size),
home(stream.tellg())
{
stream.seekp(home+array_size*e_size);
char c = 0; // the value doesn't matter; the byte only reserves space
stream.write(&c,1);
// set put and get pointer to past the end pos.
stream.seekg(stream.tellp());
}

void FileArrayBase::initFromFile(const char* p)
{
// Check if the read pattern is correct
if (strncmp(p,ArrayBegin,sizeof(ArrayBegin))) {
// What to do? It was all wrong!
stream.clear(ios::failbit); // for lack of better,
// set the fail flag.
return;
}
// OK, we have a valid array, now let's see if
// it's of the right kind.
stream.read((char*)&e_size,sizeof(e_size));

// Get the size of the array. Can't do much with
// the size here, though.
stream.read((char*)&array_size,sizeof(array_size));

// Now we're past the header, so we know where the
// data begins and can set the home position.
home = stream.tellg();
stream.seekg(home+e_size*array_size);
// Now positioned immediately after the last
// element.
char epattern[sizeof(ArrayEnd)];
stream.read(epattern,sizeof(epattern));
if (strncmp(epattern,ArrayEnd,sizeof(ArrayEnd)))
{
// Whoops, corrupt file!
stream.clear(ios::failbit);
return;
}
// Seems like we have a valid array!
}

iostream& FileArrayBase::seekg(size_t index) const
{
// what if index is out of bounds?
stream.seekg(home+index*e_size);
// what if seek failed?
return stream;
}

iostream& FileArrayBase::seekp(size_t index) const
{
// What if index is out of bounds?
stream.seekp(home+index*e_size);
// What if seek failed?
return stream;
}

size_t FileArrayBase::size() const
{
return array_size;
}

size_t FileArrayBase::element_size() const
{
return e_size;
}
Apart from the tricky questions, it's all pretty straightforward. The really good news, however, is how easy this makes the implementation of the class template ``FileArray''.

template <class T>
class FileArray : private FileArrayBase
{
public:
FileArray(iostream& io, size_t elements);// create one.
FileArray(iostream& io); // use existing array
FileArray(size_t elements); // create temporary
T operator[](size_t index) const;
FileArrayProxy<T> operator[](size_t index);
size_t size() const { return FileArrayBase::size(); }
private:
FileArray(const FileArray&); // illegal
FileArray& operator=(const FileArray&);
// illegal

T readElement(size_t index) const;
void storeElement(size_t index, const T& elem);
friend class FileArrayProxy<T>;
};
Now watch this!

template <class T>
FileArray<T>::FileArray(iostream& io, size_t elements)
: FileArrayBase(io,
elements,
FileArrayElementAccess<T>::size)
{
}

template <class T>
FileArray<T>::FileArray(iostream& io)
: FileArrayBase(io)
{
// what if element_size is wrong?
}

template <class T>
FileArray<T>::FileArray(size_t elements)
: FileArrayBase(elements,
FileArrayElementAccess<T>::size)
{
}

template <class T>
T FileArray<T>::operator[](size_t index) const
{
// what if index>= size()?
return readElement(index);
}

template <class T>
FileArrayProxy<T>
FileArray<T>::operator[](size_t index)
{
// what if index>= size()?
return FileArrayProxy<T>(*this, index);
}

template <class T>
T FileArray<T>::readElement(size_t index) const
{
// what if index>= size()?
iostream& s = seekg(index); // parent seekg
T t = FileArrayElementAccess<T>::readFrom(s);
// what if read failed?
// What if too much data was read?
return t;
}

template <class T>
void FileArray<T>::storeElement(size_t index,
const T& element)
{ // what if index>= size()?
iostream& s = seekp(index); // parent seekp
// what if seek fails?
FileArrayElementAccess<T>::writeTo(element,s);
// what if write failed?
// What if too much data was written?
}
How much easier can it get? This reduces code bloat, and also makes the source code easier to understand, extend and maintain.

What can go wrong?

In part 1, at the very beginning of this article series, I introduced exceptions, the C++ error handling mechanism. Of course exceptions should be used to handle the error situations that can occur in our array class. When I introduced exceptions, I didn't tell the whole truth about them. There was one thing I left out, because at that time it wouldn't have made much sense: exception handlers respect inheritance. In slightly more English-like wording, we can create exception class hierarchies with public inheritance, and a handler for a base class also catches objects of classes derived from it, so we can choose what level to catch. Here's a mini example showing the idea:

class A {};
class B : public A {};
class C : public A {};
class B1 : public B{};

void f() throw (A); // may throw any of the above (pre-C++17 syntax)

void x()
{
try {
f();
}
catch (B& b) {
// **1
}
catch (C& c) {
// **2
}
catch (A& a) {
// **3
}
}
At ``**1'' above, objects of class ``B'' and class ``B1'' are caught if thrown from ``f''. At ``**2'', objects of class ``C'' (and descendants of C, if any are declared elsewhere) are caught. At ``**3'', all others from the ``A'' hierarchy are caught. This may seem like a curious detail of purely academic worth, but it's extremely useful. We can use abstraction levels for errors. For example, we can have a root class ``FileArrayException'', from which all other exceptions regarding the file array inherit. We can see that there are clearly two kinds of errors that can occur in the file array: abuse, and environmental issues outside the control of the programmer. By abuse I mean things like indexing outside the valid bounds, and by environmental issues I mean faulty or full disks. (Since several programs may be running, checking whether there's enough disk space is still taking a chance: even if there was enough free space when the check was made, that space may be occupied by the time the next statement in the program is executed.)
A reasonable start for the exception hierarchy then becomes:

class FileArrayException {};
class FileArrayLogicError
: public FileArrayException {};
class FileArrayRuntimeError
: public FileArrayException {};
Here ``FileArrayLogicError'' is for clear violations of the (not too clearly stated) preconditions, and ``FileArrayRuntimeError'' for things that the programmer may not have a chance to do anything about. In a perfectly debugged program, the only exceptions ever thrown from file arrays will be of the ``FileArrayRuntimeError'' kind. We can divide those further into:

class FileArrayCreateError
: public FileArrayRuntimeError {};
For whenever the creation of the array fails, regardless of why (it's not very easy to find out if it's a faulty disk or lack of disk space, for example.)

class FileArrayStreamError
: public FileArrayRuntimeError {};
If after creation, something goes wrong with a stream; for example if seeking or reading/writing fails.

class FileArrayDataCorruptionError
: public FileArrayRuntimeError {};
If an array is created from an old existing file, and we note that the header or trailer doesn't match what's expected.

class FileArrayBoundsError
: public FileArrayLogicError {};
Addressing outside the legal bounds.

class FileArrayElementSizeError
: public FileArrayLogicError {};
If the read/write members of the element access traits class are faulty and either write too much (thus overwriting the data for the next element) or read too much (in which case the last few bytes read will be garbage picked from the next element.) It's of course possible to take this even further. I think this is quite enough, though.
Now we have a reasonably fine level of error reporting, yet an application that wishes a coarse level of error handling can choose to catch the higher levels of the hierarchy only.
As an exercise, I invite you to add the throws to the code. Beware, however: it's not a good idea to add exception specifications to the member functions making use of the T's, since you cannot know which operations on T's may throw, and what they throw. You can increase the gain in code size and legibility from privately inheriting the implementation by putting much of the error handling in the base class.
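One possible start on the exercise, sketched under assumptions: only part of the hierarchy is repeated, and ``checkIndex'', ``checkStream'' and ``tryAccess'' are hypothetical helpers of the kind that could live in ``FileArrayBase''. The caller in ``tryAccess'' catches at the coarse level, telling programming errors from environmental ones without caring about the exact class thrown.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <sstream>
#include <string>

// Part of the hierarchy from the text.
class FileArrayException {};
class FileArrayLogicError : public FileArrayException {};
class FileArrayRuntimeError : public FileArrayException {};
class FileArrayStreamError : public FileArrayRuntimeError {};
class FileArrayBoundsError : public FileArrayLogicError {};

// Hypothetical checks of the kind that could live in FileArrayBase;
// each throws the most specific class that applies.
inline void checkIndex(std::size_t index, std::size_t size)
{
    if (index >= size)
        throw FileArrayBoundsError();
}

inline void checkStream(const std::ios& s)
{
    if (!s)
        throw FileArrayStreamError();
}

// A coarse-grained caller: it only distinguishes programming errors from
// environmental ones, whatever specific class was thrown.
inline std::string tryAccess(std::size_t index, std::size_t size,
                             const std::ios& s)
{
    try {
        checkIndex(index, size);
        checkStream(s);
        return "ok";
    }
    catch (const FileArrayLogicError&)   { return "logic"; }
    catch (const FileArrayRuntimeError&) { return "runtime"; }
}
```

A ``FileArrayBoundsError'' lands in the ``FileArrayLogicError'' handler and a ``FileArrayStreamError'' in the ``FileArrayRuntimeError'' one, without the caller naming either specific class.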

Iterators

An iterator into a file array is something whose behavior is analogous to that of pointers into arrays. We want to be able to create an iterator from the array (in which case the iterator refers to the first element of the array.) We want to access that element by dereferencing the iterator (unary operator *,) and we want iterator arithmetic with integers.
An easy way of getting there is to let an iterator contain a pointer to a file array, and an index. Whenever the iterator is dereferenced, we return (*array)[index]. That way, the array's own bounds checking gives us error handling for free when iterator arithmetic leads us outside the valid range. The iterator arithmetic becomes simple too, since it's just ordinary arithmetic on the index. The implementation thus seems easy; all that's needed is to define the operations needed for the iterators, and the actions we want. Here's my idea:
  • creation from array yields iterator referring to first element
  • copy construction and assignment are of course well behaved.
  • moving forwards and backwards with operator++ and operator--.
  • addition of array and ``long int'' value ``n'' yields iterator referring to n:th element of array.
  • iterator+=n (where n is of type long int) adds n to the value of the index in the iterator. This addition is never an error; it's dereferencing the iterator that's an error if the index is out of range. Operator -= is analogous.
  • iterator+n yields a new iterator referring to the iterator.index+n:th element of the array, and analogous for operator-.
  • iterator1-iterator2 yields a long int which is the difference between the indices of the iterators. If iterator1 and iterator2 refer to different arrays, it's an error and we throw an exception.
  • iterator1==iterator2 returns non-zero if the arrays and indices of iterator1 and iterator2 are equal.
  • iterator1!=iterator2 returns !(iterator1==iterator2)
  • *iterator returns whatever (*array)[index] returns, i.e. a FileArrayProxy<T>.
  • iterator[n] returns (*array)[index+n].
  • iterator1<iterator2 returns iterator1.index<iterator2.index. If the iterators refer to different arrays, it's an error and we throw an exception. Likewise for operator>.
  • iterator1<=iterator2 returns !(iterator1>iterator2), and iterator1>=iterator2 returns !(iterator1<iterator2).
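The comparison and difference rules in the list above can be sketched as follows. The sketch is self-contained, so it makes a few assumptions. A hypothetical in-memory `MemArray<T>` stands in for `FileArray<T>`, which means `operator*` returns `T&` rather than a `FileArrayProxy<T>`. The index is a signed `long`. A hypothetical `IteratorMismatch` exception signals that two iterators refer to different arrays. The relational operators are all expressed through `operator<` and `operator==`, so each rule is written only once.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical in-memory stand-in for FileArray<T>.
template <class T>
class MemArray
{
public:
  explicit MemArray(std::size_t n) : data(n) {}
  T& operator[](std::size_t i) { return data.at(i); }  // throws on a bad index
private:
  std::vector<T> data;
};

// Hypothetical exception for mixing iterators from different arrays.
class IteratorMismatch : public std::logic_error
{
public:
  IteratorMismatch() : std::logic_error("iterators refer to different arrays") {}
};

template <class T>
class ArrayIterator
{
public:
  explicit ArrayIterator(MemArray<T>& a) : array(&a), index(0) {}
  ArrayIterator& operator+=(long n) { index += n; return *this; }  // never an error
  T& operator*() const { return (*array)[index]; }                // range checked here
  T& operator[](long n) const { return (*array)[index + n]; }

  long operator-(const ArrayIterator& rhs) const
  { check_same(rhs); return index - rhs.index; }
  bool operator==(const ArrayIterator& rhs) const
  { return array == rhs.array && index == rhs.index; }
  bool operator!=(const ArrayIterator& rhs) const { return !(*this == rhs); }
  bool operator<(const ArrayIterator& rhs) const
  { check_same(rhs); return index < rhs.index; }
  bool operator>(const ArrayIterator& rhs) const { return rhs < *this; }
  bool operator<=(const ArrayIterator& rhs) const { return !(*this > rhs); }
  bool operator>=(const ArrayIterator& rhs) const { return !(*this < rhs); }

private:
  void check_same(const ArrayIterator& rhs) const
  { if (array != rhs.array) throw IteratorMismatch(); }

  MemArray<T>* array;
  long index;
};

// Usage: the assertions document what each operator yields.
inline void demo_comparisons()
{
  MemArray<int> a(5), b(5);
  ArrayIterator<int> first(a), third(a);
  third += 2;
  *third = 7;
  assert(first[2] == 7);
  assert(third - first == 2);
  assert(first < third && first <= third && third >= first && first != third);
  bool threw = false;
  try { (void)(first < ArrayIterator<int>(b)); }
  catch (const IteratorMismatch&) { threw = true; }
  assert(threw);
}
```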
I think the above is an exhaustive list. None of it is difficult, but it's a lot of code to write, and thus a good chance of making errors. With a little thought, however, quite a lot of code can be reused over and over, reducing both the amount to write and the risk of errors. As an example, take a class where, for an object ``o'' and some other value ``v'', the operations ``o+=v'', ``o+v'' and ``v+o'' are well defined. They should behave as they do for the built-in types, unless you want to give the class users some rather unhealthy surprises. A good rule of thumb for such a class is to define ``operator+='' as a member, and two versions of ``operator+'' as non-member functions implemented with ``operator+=''. Here's how it's done in the iterator example:

template <class T>
class FileArrayIterator
{
public:
  FileArrayIterator(FileArray<T>& f);
  FileArrayIterator(const FileArrayIterator<T>& i);
  FileArrayIterator<T>& operator+=(long n);
  FileArrayProxy<T> operator*();
  FileArrayProxy<T> operator[](long n);
  ...
private:
  FileArray<T>* array;
  unsigned long index;
};

template <class T> FileArrayIterator<T>
operator+(const FileArrayIterator<T>& i, long n);

template <class T> FileArrayIterator<T>
operator+(long n, const FileArrayIterator<T>& i);

template <class T>
FileArrayIterator<T>::FileArrayIterator(
  FileArray<T>& a // non-const: the iterator may be used for writing
)
  : array(&a),
    index(0)
{
}

template <class T>
FileArrayIterator<T>::FileArrayIterator(
  const FileArrayIterator<T>& i
)
  : array(i.array),
    index(i.index)
{
}

template <class T>
FileArrayIterator<T>&
FileArrayIterator<T>::operator+=(long n)
{
  index+=n; // never an error; dereferencing checks the range
  return *this;
}

template <class T> FileArrayIterator<T>
operator+(const FileArrayIterator<T>& i, long n)
{
  FileArrayIterator<T> it(i);
  return it+=n;
}

template <class T> FileArrayIterator<T>
operator+(long n, const FileArrayIterator<T>& i)
{
  FileArrayIterator<T> it(i);
  return it+=n;
}

template <class T>
FileArrayProxy<T> FileArrayIterator<T>::operator*()
{
  return (*array)[index];
}

template <class T>
FileArrayProxy<T>
FileArrayIterator<T>::operator[](long n)
{
  return (*array)[index+n];
}
The code for both versions of ``operator+'' must still be written, but since their behaviour is defined in terms of ``operator+='', there's only one place to correct if the arithmetic is wrong. There's no need to show all the code here in the article; you can study it in the sources. The above shows how it all works, though, and as you can see, it's fairly simple.
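The same idea extends to the rest of the arithmetic. This sketch funnels `operator-=`, prefix and postfix `++`/`--`, and the three binary operators through a single `operator+=`. It assumes a hypothetical `Iter<T>` over a `std::vector<T>` instead of the article's `FileArrayIterator<T>`, and a signed index, so it compiles on its own. It is an illustration of the rule of thumb, not the code from the article's sources.

```cpp
#include <cassert>
#include <vector>

// Hypothetical minimal iterator standing in for FileArrayIterator<T>:
// all arithmetic is delegated to the one operator+=.
template <class T>
class Iter
{
public:
  explicit Iter(std::vector<T>& v) : array(&v), index(0) {}

  Iter& operator+=(long n) { index += n; return *this; }  // the one real implementation
  Iter& operator-=(long n) { return *this += -n; }
  Iter& operator++() { return *this += 1; }                          // prefix
  Iter operator++(int) { Iter old(*this); *this += 1; return old; }  // postfix
  Iter& operator--() { return *this += -1; }
  Iter operator--(int) { Iter old(*this); *this += -1; return old; }

  T& operator*() const { return array->at(index); }  // range checked on access
  long position() const { return index; }

private:
  std::vector<T>* array;
  long index;
};

// Binary operators as non-members, each written in terms of operator+=.
template <class T>
Iter<T> operator+(const Iter<T>& i, long n) { Iter<T> it(i); return it += n; }

template <class T>
Iter<T> operator+(long n, const Iter<T>& i) { Iter<T> it(i); return it += n; }

template <class T>
Iter<T> operator-(const Iter<T>& i, long n) { Iter<T> it(i); return it += -n; }

// Usage: both forms of operator+ agree, and ++/-- behave as for pointers.
inline void demo_arithmetic()
{
  std::vector<int> v{10, 20, 30, 40};
  Iter<int> it(v);
  assert(*(it + 2) == 30);
  assert(*(2 + it) == 30);
  Iter<int> last = it + 3;
  assert(*(last - 1) == 30);
  ++it;                       // now refers to 20
  Iter<int> before = it++;    // postfix yields the old position
  assert(*before == 20 && *it == 30);
  it -= 2;
  assert(it.position() == 0);
}
```

If the index handling ever needs to change, for example to add overflow checks, only `operator+=` has to be touched.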

Recap

This month the news in short was:
  • You can increase flexibility for your templates without sacrificing ease of use or safety by using traits classes.
  • Enumerations in classes can be used to have class-scope constants of integral type.
  • Modern compilers do not need the above hack. Defining a class-scope static constant of an integral type in the class declaration is cleaner and more type safe.
  • Standard C++ has no support for temporary files as file streams, and standard C only offers ``tmpnam'' and ``tmpfile''. Fortunately there are commonly supported extensions, such as ``tempnam'', that help.
  • Private inheritance can be used for code reuse.
  • Private inheritance is very different from public inheritance. Public inheritance models ``is-A'' relationships, while private inheritance models ``is-implemented-in-terms-of'' relationships.
  • A user of a class that has privately inherited from something else cannot take advantage of this fact. To a user the private inheritance doesn't make any difference.
  • In real life, private inheritance is used far less than it should be. In many situations where public inheritance is used, private inheritance should have been used.
  • Exception catching is polymorphic (i.e. dynamic binding works when catching.)
  • The polymorphism of exception catching allows us to create an arbitrarily fine-grained error reporting mechanism while still allowing users who want a coarse error reporting mechanism to use one (they'll just catch classes near the root of the exception class inheritance tree.)
  • Always implement binary operator+, operator-, operator* and operator/ as functions outside the classes, and always implement them in terms of the operator+=, operator-=, operator*= and operator/= members of the classes.
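The first two points above can be illustrated with a small, self-contained traits sketch. It is not the article's code. The `std::string` specialisation stands in for the article's ``char*'' example. Its fixed 16-byte, zero-padded slot is an assumption made for illustration: longer strings are truncated, and embedded `'\0'` characters are not preserved. The `int` specialisation uses an in-class `static const` member, and the string one uses the enum work-around for older compilers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Primary template left undefined in this sketch, so using a type without
// a specialisation is a compile-time error.
template <class T> class FileArrayElementAccess;

// Built-in arithmetic type: raw binary is fine here.
template <>
class FileArrayElementAccess<int>
{
public:
  static const std::size_t size = sizeof(int);  // in-class integral constant
  static void writeTo(int value, std::ostream& os)
  { os.write(reinterpret_cast<const char*>(&value), size); }
  static int readFrom(std::istream& is)
  {
    int value = 0;
    is.read(reinterpret_cast<char*>(&value), size);
    return value;
  }
};

// Assumed layout for strings: the characters themselves (not a pointer
// value) in a fixed 16-byte, zero-padded slot.
template <>
class FileArrayElementAccess<std::string>
{
public:
  enum { size = 16 };  // the enum work-around for older compilers
  static void writeTo(const std::string& value, std::ostream& os)
  {
    char buf[size] = {};  // zero padding
    std::size_t n = value.size() < std::size_t(size) ? value.size() : std::size_t(size);
    std::memcpy(buf, value.data(), n);
    os.write(buf, size);  // never more than one slot
  }
  static std::string readFrom(std::istream& is)
  {
    char buf[size] = {};
    is.read(buf, size);
    return std::string(buf, std::find(buf, buf + size, '\0'));  // strip padding
  }
};

// Usage: exactly one slot is written, and the text round-trips.
inline void demo_traits()
{
  typedef FileArrayElementAccess<std::string> traits;
  std::stringstream ss;
  traits::writeTo("hello", ss);
  assert(ss.str().size() == std::size_t(traits::size));
  assert(traits::readFrom(ss) == "hello");
}
```

Since `size` is a compile-time constant in both specialisations, an expression such as `traits::size*index` can be used for seeking without any out-of-class definition of `size`.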

Exercises

  • Alter the file array so that it's possible to instantiate two (or more) kinds of FileArray in the same program, where the alternatives store the data in different formats. (Hint: the alternatives will each need a different traits class specialisation.)
  • What's the difference between using private inheritance of a base class, and using a member variable of that same class, for reusing code?
  • In which situations is it crucial which alternative you choose?

An Introduction to C++ Programming - Part 10

The data representation problem

In the file array as implemented last month, data was always stored in a raw binary format, exactly mirroring the bits as they lay in memory. This works fine for integers and such, but can be disastrous in other situations. Imagine a file array of strings (where string is a ``char*''). With the implementation from last month, the pointer value would be stored, not the data pointed to. When reading, a pointer value is read, and when dereferenced, whatever happens to be at the memory location pointed to (if anything) will be used (which is more than likely to result in a rather quick crash.) Anything with pointers is dangerous when stored in a raw binary format, yet we must somehow allow pointers in the array, and preferably so without causing problems for those using the array with built-in arithmetic types. How can this be done?
In part 4, when templates were introduced, a clever little construct called ``traits classes'' was shown. I then gave this rather terse description: ``A traits class is never instantiated, and doesn't contain any data. It just tells things about other classes, that is its sole purpose.'' Doesn't that smell like something we can use here? A traits class that tells how the data types should be represented on disk?
What do we need from such a traits class? Obviously, we need to know how much disk space each element will take, so a ``size'' member will definitely be necessary, otherwise we cannot know much disk space will be required. We also need to know how to store the data, and how to read it. The easiest way is probably to have member functions ``writeTo'' and ``readFrom'' in the traits class. Thus we can have something looking like this:

template class FileArrayElementAccess
{
public:
static const size_t size;
static void writeTo(T value, ostream& os);
static T readFrom(istream& is);
};
The array is then rewritten to use this when dealing with the data. The change is extremely minor. ``storeElement'' needs to be rewritten as:

template
void FileArray::storeElement(size_t index,
const T& element)
{
// what if index >= array_size?
typedef FileArrayElementAccess traits;
(*pstream).seekp(traits::size*index
+sizeof(array_size), ios::beg);
// what if seek fails?
traits::writeTo(element,*pstream);
// what if write failed?
// what if too much data was written?
}
The change for ``readElement'' is of course analogous. However, as indicated by the last comment, a new error possibility has shown up. What if the ``writeTo'' and ``readFrom'' members of the traits class are buggy and write or read more data to disk than they're allowed to? Since it's the user of the array that must write the traits class (at least for their own data types) we cannot solve the problem, but we can give the user a chance to discover that something went wrong. Unfortunately for writing, the error is extremely severe; it means that the next entry in the array will have its data destroyed... In the traits class, by the way, the constant ``size'', used for telling how many bytes in the stream each ``T'' will occupy, poses a problem with most C++ compilers today (modern ones mostly makes life so much easier.) The problem is that a static variable, and also a static constant, in a class, needs to reside somewhere in memory, and the class declaration is not enough for that. This problem is two-fold. To begin with, where should it be stored? It's very much up to whoever writes the class, but somewhere in the code, there must be something like:

const size_t ArrayFileElementAccess::size = ...;
where ``X'' is the name of the class dealt with by the particular traits specialisation. The second problem is that this is totally unnecessary. What we want is a value that can be used by the compiler at compile time, not a memory location to read a value from. As I mentioned, a modern compiler does make this much easier. In standard C++ it is allowed to write:

template<> class ArrayFileElementAccess
{
public:
const size_t size = ...;
...
};
Note that for some reason that I do not know, this construct is only legal if the type is a constant of an integral or enumeration type. ``size_t'' is such a type, it's some unsigned integral type, probably ``unsigned int'', but possibly ``unsigned long''. The expression denoted ``...'' must be possible to evaluate at compile time. Unless code is written that explicitly takes the address of ``size'', we need not give the constant any space to reside in. The odd construct ``template <>'' is also new C++ syntax, and means that what follows is a specialisation of a previously declared template. For old compilers, however, there's a work-around for integral values, no larger than the largest ``int'' value. We cheat and use an enum instead of a ``size_t''. This makes the declaration:

class ArrayFileElementAccess
{
public:
enum { size= ... };
...
};
This is a bit ugly, but it is perfectly harmless. The advantage gained by adding the traits class is flexibility and safety. If someone wants to use a file array for their own class, they're free to do so. However, they must first write a ``FileArrayElementAccess'' specialisation. Failure to do so will result in a compilation error. This early error detection is beneficial. The sloppy solution from last month would not yield any error until run-time, which means a (usually long) debugging session.

Several arrays in a file

What is needed in order to host several arrays in the same file? One way or the other, there must be a mechanism for finding out where one array begins and another ends. I think the simplest solution, is to let go of the file names, and instead make the constructors accept an ``fstream&''. We can then require that the put and get pointer of the stream must be where the array can begin, and we can in turn promise that the put and get pointer will be positioned at the byte after the array end. Of course, in addition to having a reference to the ``fstream'' in our class, we also need the ``home'' position, to seek relative to, when indexing the array. This becomes easy to write for us, it becomes easy to use as well. For someone requiring only one array in a file, there'll be slightly more code, an ``fstream'' object must be explicitly initialised somewhere, and passed to the constructor of the array, instead of just giving it a name. I think the functionality increase/code expansion exchange is favorable.
In order to improve the likelihood of finding errors, we can waste a few bytes of disk space by writing a well known header and trailer pattern at the beginning and end of the array (before the first element, and after the last one.) If someone wants to allocate an array using an existing file, we can find out if the get pointer is in place for an array start.
The constructor creating a file should, however, first try to read from the file to see if it exists. If it does, it should be created from the file, just like the constructor accepting a stream only does. If the read fails, however, we can safely assume that the file doesn't exist and should instead be created.
The change in the class definition, and constructor implementation is relatively straight forward, if long:

template
class FileArray
{
public:
FileArray(fstream& fs, size_t elements);
// create a new file.

FileArray(fstream& fs);
// use an existing file and get size from there
...
private:
void initFromFile(const char*);

fstream& stream;
size_t array_size; // in elements
streampos home;
};

template
FileArray::FileArray(fstream& fs, size_t elements)
: stream(fs),
array_size(elements)
{
// what if the file could not be opened?
// first try to read and see if there's a begin
// pattern. Either there is one, or we should
// get an eof.

char pattern[6];
stream.read(pattern,6);
if (stream.eof()) {
stream.clear(); // clear error state
// and initialise.

// begin of array pattern.
stream.write("ABegin",6);
// must store size of elements, as last month
const size_t elem_size
=FileArrayElementAccess::size;
stream.write((const char*)&elem_size,
sizeof(elem_size));
// and of course the number of elements
stream.write((const char*)&array_size,
sizeof(array_size));
// Now that we've written the maintenance
// stuff, we know what the home position is.

home = stream.tellp();

// Then we must go the the end and write
// the end pattern.

stream.seekp(home+elem_size*array_size);
stream.write("AEnd",4);

// set put and get pointer to past the end pos.
stream.seekg(stream.tellp());
return;
}

initFromFile(pattern); // shared with other
// stream constructor
if (array_size != elements) {
// Uh oh. The data read from the stream,
// and the size given in the constructor
// mismatches! What now?
stream.clear(ios::failbit);
}

// set put and get pointer to past the end pos.
stream.seekp(stream.tellg());
}

template
FileArray::FileArray(fstream& fs)
: stream(fs)
{
// First read the head pattern to see if
// it's right.
char pattern[6];
stream.read(pattern,6);
initFromFile(pattern);
// set put and get pointer to past the end pos.
stream.seekp(stream.tellg());
}

template
void FileArray::initFromFile(const char* p)
{
// Check if the read pattern is correct
if (strncmp(p,"ABegin",6)) {
// What to do? It was all wrong!
stream.clear(ios::failbit);
// for lack of better,
// set the fail flag.
return;
}
// OK, we have a valid array, now let's see if
// it's of the right kind.
size_t elem_size;
stream.read((char*)&elem_size,sizeof(elem_size));
if (elem_size != FileArrayElementAccess::size)
{
// wrong kind of array, the element sizes
// mismatch. Again, what to do? Let's set
// the fail flag for now.
stream.clear(ios::failbit);
// stupid name for the
// member function, right?
return;
}
// Get the size of the array. Can't do much with
// the size here, though.
stream.read((char*)&array_size,sizeof(array_size));
// Now we're past the header, so we know where the
// data begins and can set the home position.

home = stream.tellg();

stream.seekg(home+elem_size*array_size);

// Now positioned immediately after the last
// element.

char epattern[4];
stream.read(epattern,4);
if (strncmp(epattern,"AEnd",4)) {
// Whoops, corrupt file!
stream.clear(ios::failbit);
return;
}
// Seems like we have a valid array!
}
Other than the above, the only change needed for the array is that seeking will be done relative to ``home'' rather than the beginning of the file (plus the size of the header entries.) The new versions of ``storeElement'' and ``readElement'' become:

template
T FileArray::readElement(size_t index) const
{ // what if index >= max_elements?
typedef FileArrayElementAccess traits;
stream.seekg(home+index*traits::size);
// what if seek fails?

return traits::readFrom(stream);
// what if read fails?
// What if too much data is read?
}

template
void FileArray::storeElement(size_t index,
const T& element)
{ // what if index >= array_size?
typedef FileArrayElementAccess traits;
stream.seekp(home+traits::size*index);
// what if seek fails?
traits::writeTo(element,stream);
// what if write failed?
// what if too much data was written?
}

Temporary file array

Making use of a temporary file to store a file array that's not to be persistent between runs of the application isn't that tricky. The implementation so far makes use of a stream and known data about the beginning of the stream, number of elements and size of the elements. This can be used for the temporary file as well. The only thing we need to do is to create the temporary file first, open it with an fstream object, and tie the stream reference to that object, and remember to delete the file in the destructor.
What's the best way of creating something and making sure we remember to undo it later? Well, of course, creating a new helper class which creates the file in its constructor and removes it in its destructor. Piece of cake. The only problem is that we shouldn't always create a temporary file, and when we do, we can handle it a bit different from what we do with a ``global'' file that can be shared. For example, we know that we have exclusive rights to the file, and that it won't be reused, so there's no need for the extra information in the beginning and end. So, how's a temporary file created? The C++ standard doesn't say, and neither is there any support for it in the old de-facto standard. I don't think C does either. There are, however, two functions ``tmpnam'' and ``tempnam'' defined as commonly supported extensions to C. They can be found in . I have in this implementation chosen to use ``tempnam'' as it's more flexible. ``tempnam'' works like this: it accepts two string parameters named ``dir'' and ``prefix''. It first attempts to create a temporary file in the directory pointed to by the environment variable ``TMPDIR''. If that fails, it attempts to create it in the directory indicated by the ``dir'' parameter, unless it's 0, in which case a hard-coded default is attempted. It returns a ``char*'' indicating a name to use. The memory area pointed to is allocated with the C function ``malloc'', and thus must be deallocated with ``free'' and not delete[].
Over to the implementation details:
We add a class called temporaryfile, which does the above mentioned work. We also add a member variable ``pfile'' which is of type ``ptr''. Remember the ``ptr'' template from last month? It's a smart pointer that deallocates whatever it points to in its destructor. It's important that the member variable ``pfile'' is listed before the ``stream'' member, since initialisation is done in the order listed, and the ``stream'' member must be initialised from the file object owned by ``pfile''. We also add a constructor with the number of elements as its sole parameter, which makes use of the temporary file.

class temporaryfile
{
public:
temporaryfile();
~temporaryfile();
iostream& stream();
private:
char* name;
fstream fs;
};

temporaryfile::temporaryfile()
: name(::tempnam(".","array")),
fs(name, ios::in|ios::out|ios::binary)
{
// what if tmpnam fails and name is 0
// what if fs is bad?
}

temporaryfile::~temporaryfile()
{
fs.close();
::remove(name);
// what if remove fails?
::free(name);
}
In the above code, ``tempnam'', ``remove'' and ``free'' are prefixed with ``::``, to make sure that it's the names in global scope that are meant, just in case someone enhances the class with a few more member functions whose name might clash. For the sake of syntactical convenience, I have added yet another operator to the ``ptr'' class template:

template class ptr
{
public:
ptr(T* tp=0) : p(tp) {};
~ptr() { delete p; };
T* operator->(void) const { return p; };
T& operator*(void) const { return *p;};
private:
ptr(const ptr&);
ptr& operator=(const ptr&);
T* p;
};
It's the ``operator->'' that's new, which allows us to write things like ``p->x,'' where p is a ``ptr'', and the type ``X'' contains some member named ``x''. The return type for ``operator->'' must be something that ``operator->'' can be applied to. The explanation sounds recursive, but it makes sense if you look at the above code. ``ptr::operator->()'' returns an ``X*''. ``X*'' is something you can apply the built in ``operator->'' to (which gives you access to the elements.)

template
FileArray::FileArray(size_t elements)
: pfile(new temporaryfile),
stream(pfile->stream()),
array_size(elements),
home(stream.tellg())
{
const size_t elem_size=
FileArrayElementAccess::size;
// put a char just after the end to make
// sure there's enough free disk space.
stream.seekp(home+array_size*elem_size);
char c;
stream.write(&c,1);
// what to do if write fails?
// set put and get pointer to past the end pos
stream.seekg(stream.tellp());
}
That's it! The rest of the array works exactly as before. No need to rewrite anything else.

Code reuse

If you're an experienced C programmer, especially experienced with programming embedded systems where memory constraints are tough and you also have a good memory, you might get a feeling that something's wrong here.
What I'm talking about is something I mentioned the first time templates were introduced: ``Templates aren't source code. The source code is generated by the compiler when needed.'' This means that if we in a program uses FileArray, FileArray, FileArray and FileArray (where ``X'' and ``Y'' are some classes,) there will be code for all four types. Now, have a close look at the member functions and see in what way ``FileArray::FileArray(iostream& fs, size_t elements)'' differs from ``FileArray::FileArray(iostream& fs, size_t elements)''. Please do compare them.
What did you find? The only difference at all is in the handling of the member ``elem_size'', yet the same code is generated several times with that as the only difference. This is what is often referred to as the template code bloat of C++. We don't want code bloat. We want fast, tight, and slick applications.
Since the only thing that differs is the size of the elements, we can move the rest to something that isn't templatised, and use that common base everywhere. I've already shown how code reuse can be done by creating a separate class and have a member variable of that type. In this article I want to show an alternative way of reusing code, and that is through inheritance. Note very carefully that I did not say public inheritance. Public inheritance models ``is-A'' relationships only. We don't want an ``is-A'' relationship here. All we want is to reuse code to reduce code bloat. This is done through private inheritance. Private inheritance is used far less than it should be. Here's all there is to it. Create a class with the desired implementation to reuse and inherit privately from it. Nothing more, nothing less. To a user of your class, it matters not at all if you chose not to reuse code at all, reuse through encapsulation of a member variable, or reuse through private inheritance. It's not possible to refer to the descendant class through a pointer to the private base class, private inheritance is an implementation detail only, and not an interface issue.
To the point. What can, and what can not be isolated and put in a private base class? Let's first look at the data. The ``stream'' reference member can definitely be moved to the base, and so can the ``pfile'' member for temporary files. The ``array_size'' member can safely be there too and also the ``home'' member for marking the beginning of the array on the stream. By doing that alone we have saved just about nothing at all, but if we add as a data member in the base class the size (on disk) for the elements, and we can initialise that member through the ``FileArrayElementAccess::size'' traits member, all seeking in the file, including the initial seeking when creating the file array, can be moved to the base class. Now a lot has been gained. Left will be very little. Let's look at the new improved implementation:
Now for the declaration of the base class.

class FileArrayBase
{
public:
protected:
FileArrayBase(iostream& io,
size_t elements,
size_t elem_size);
FileArrayBase(iostream& io);
FileArrayBase(size_t elements, size_t elem_size);
iostream& seekp(size_t index) const;
iostream& seekg(size_t index) const;
size_t size() const; // number of elements
size_t element_size() const;
private:
class temporaryfile
{
public:
temporaryfile();
~temporaryfile();
iostream& stream();
private:
char* name;
fstream fs;
};
void initFromFile(const char* p);
ptr pfile;
iostream& stream;
size_t array_size;
size_t e_size;
streampos home;
};
The only surprise here should be the nesting of the class ``temporaryfile.'' Yes, it's possible to define a class within a class. Since the ``temporaryfile'' class is defined in the private section of ``FileArrayBase'', it's inaccessible from anywhere other than the ``FileArrayBase'' implementation. It's actually possible to nest classes in class templates as well, but few compilers today support that. When implementing the member functions of the nested class, it looks a bit ugly, since the surrounding scope must be used.

FileArrayBase::temporaryfile::temporaryfile()
: name(::tempnam(".","array")),
fs(name,ios::in|ios::out|ios::binary)
{
// what if tmpnam fails and name is 0
// what if fs is bad?
}

FileArrayBase::temporaryfile::~temporaryfile()
{
fs.close();
::remove(name);
// What if remove fails?
::free(name);
}

iostream& FileArrayBase::temporaryfile::stream()
{
return fs;
}
The implementation of ``FileArrayBase'' is very similar to the ``FileArray'' earlier. The only difference is that we use a parameter for the element size, instead of the traits class.

FileArrayBase::FileArrayBase(iostream& io,
size_t elements,
size_t elem_size)
: stream(io),
array_size(elements),
e_size(elem_size)
{
char pattern[sizeof(ArrayBegin)];
stream.read(pattern,sizeof(pattern));
if (stream.eof()) {
stream.clear(); // clear error state
// and initialize.
// begin of array pattern.
stream.write(ArrayBegin,sizeof(ArrayBegin));

// must store size of elements
stream.write((const char*)&elem_size,
sizeof(elem_size));

// and of course the number of elements
stream.write((const char*)&array_size,
sizeof(array_size));

// Now that we've written the maintenance
// stuff, we know what the home position is.
home = stream.tellp();

// Then we must go the the end and write
// the end pattern.

stream.seekp(home+elem_size*array_size);
stream.write(ArrayEnd,sizeof(ArrayEnd));

// set put and get pointer to past the end pos.
stream.seekg(stream.tellp());
return;
}
initFromFile(pattern); // shared with other
// stream constructor

if (array_size != elements) {
// Uh oh. The data read from the stream,
// and the size given in the constructor
// mismatches! What now?

stream.clear(ios::failbit);
}
if (e_size != elem_size) {
stream.clear(ios::failbit);
}
// set put and get pointer to past the end pos.
stream.seekp(stream.tellg());
}
To make life a little bit easier, I've assumed two arrays of char named ``ArrayBegin'' and ``ArrayEnd'', which hold the patterns to be used for marking the beginning and end of an array on disk.

FileArrayBase::FileArrayBase(iostream& io)
: stream(io)
{
char pattern[sizeof(ArrayBegin)];
stream.read(pattern,sizeof(pattern));
initFromFile(pattern);

// set put and get pointer to past the end pos.
stream.seekp(stream.tellg());
}

FileArrayBase::FileArrayBase(size_t elements,
size_t elem_size)
: pfile(new temporaryfile),
stream(pfile->stream()),
array_size(elements),
e_size(elem_size),
home(stream.tellg())
{
stream.seekp(home+array_size*e_size);
char c;
stream.write(&c,1);
// set put and get pointer to past the end pos.
stream.seekg(stream.tellp());
}

void FileArrayBase::initFromFile(const char* p)
{
// Check if the read pattern is correct
if (strncmp(p,ArrayBegin,sizeof(ArrayBegin))) {
// What to do? It was all wrong!
stream.clear(ios::failbit); // for lack of better,
// set the fail flag.
return;
}
// OK, we have a valid array, now let's see if
// it's of the right kind.
stream.read((char*)&e_size,sizeof(e_size));

// Get the size of the array. Can't do much with
// the size here, though.
stream.read((char*)&array_size,sizeof(array_size));

// Now we're past the header, so we know where the
// data begins and can set the home position.
home = stream.tellg();
stream.seekg(home+e_size*array_size);
// Now positioned immediately after the last
// element.
char epattern[sizeof(ArrayEnd)];
stream.read(epattern,sizeof(epattern));
if (strncmp(epattern,ArrayEnd,sizeof(ArrayEnd)))
{
// Whoops, corrupt file!
stream.clear(ios::failbit);
return;
}
// Seems like we have a valid array!
}

iostream& FileArrayBase::seekg(size_t index) const
{
// what if index is out of bounds?
stream.seekg(home+index*e_size);
// what if seek failed?
return stream;
}

iostream& FileArrayBase::seekp(size_t index) const
{
// What if index is out of bounds?
stream.seekp(home+index*e_size);
// What if seek failed?
return stream;
}

size_t FileArrayBase::size() const
{
return array_size;
}

size_t FileArrayBase::element_size() const
{
return e_size;
}
Apart from the tricky questions, it's all pretty straight forward. The really good news, however, is how easy this makes the implementation of the class template ``FileArray''.

template
class FileArray : private FileArrayBase
{
public:
FileArray(iostream& io, size_t size);// create one.
FileArray(iostream& io); // use existing array
FileArray(size_t elements); // create temporary
T operator[](size_t index) const;
FileArrayProxy operator[](size_t index);
size_t size() { return FileArrayBase::size(); };
private:
FileArray(const FileArray&); // illegal
FileArray& operator=(const FileArray&);
// illegal

T readElement(size_t index) const;
void storeElement(size_t index, const T& elem);
friend class FileArrayProxy;
};
Now watch this!

template
FileArray::FileArray(iostream& io, size_t size)
: FileArrayBase(io,
elements,
FileArrayElementAccess::size)
{
}

template
FileArray::FileArray(iostream& io)
: FileArrayBase(io)
{
// what if element_size is wrong?
}

template
FileArray::FileArray(size_t elements)
: FileArrayBase(elements,
FileArrayElementAccess::size)
{
}

template
T FileArray::operator[](size_t index) const
{
// what if index>= size()?
return readElement(index);
}

template
FileArrayProxy
FileArray::operator[](size_t index)
{
// what if index>= size()?
return FileArrayProxy(*this, index);
}

template
T FileArray::readElement(size_t index) const
{
// what if index>= size()?
iostream& s = seekg(index); // parent seekg
return FileArrayElementAccess::readFrom(s);
// what if read failed?
// What if too much data was read?
return t;
}

template
void FileArray::storeElement(size_t index,
const T& element)
{ // what if index>= size()?
iostream& s = seekp(index); // parent seekp
// what if seek fails?
FileArrayElementAccess::writeTo(element,s);
// what if write failed?
// What if too much data was written?
}
How much easier can it get? This reduced code bloat, and also makes the source code easier to understand, extend and maintain.

What can go wrong?

Already in the very beginning of this article series, part 1, I introduced exceptions; the C++ error handling mechanism. Of course exceptions should be used to handle the error situations that can occur in our array class. When I introduced exceptions, I didn't tell the whole truth about them. There was one thing I didn't tell, because at that time it wouldn't have made much sense. That one thing is that when exceptions are caught, dynamic binding works, or to use wording slightly more English-like, we can create exception class hierarchies with public inheritance, and we can choose what level to catch. Here's a mini example showing the idea:

class A {};
class B : public A {};
class C : public A {};
class B1 : public B{};

void f() (throw A); // may throw any of the above

void x()
{
try {
f();
}
catch (B& b) {
// **1
}
catch (C& c) {
// **2
}
catch (A& a) {
// **3
}
}
At ``**1'' above, objects of class ``B'' and class ``B1'' are caught if thrown from ``f''. In ``**2'' objects of class ``C'' (and descendants of C, if any are declared elsewhere) are caught. At ``**3'' all others from the ``A'' hierarchy are caught. This may seem like a curious detail of purely academic worth, but it's extremely useful. We can use abstraction levels for errors. For example, we can have a root class ``FileArrayException'', from which all other exceptions regarding the file array inherits. We can see that there are clearly two kinds of errors that can occur in the file array; abuse and environmental issues outside the control of the programmer. For abuse I mean things like indexing outside the valid bounds, and with environmental issues I mean faulty or full disks (Since there are several programs running, a check if there's enough disk space is still taking a chance. Even if there was enough free space when the check was made, that space may be occupied when the next statement in the program is executed.)
A reasonable start for the exception hierarchy then becomes:

class FileArrayException {};
class FileArrayLogicError
: public FileArrayException {};
class FileArrayRuntimeError
: public FileArrayException {};
Here ``FileArrayLogicError'' is for clear violations of the (not too clearly stated) preconditions, and ``FileArrayRuntimeError'' for things that the programmer may not be able to do anything about. In a perfectly debugged program, the only exceptions ever thrown from file arrays will be of the ``FileArrayRuntimeError'' kind. We can divide those further into:

class FileArrayCreateError
: public FileArrayRuntimeError {};
For when the creation of the array fails, regardless of why (it's not easy to find out whether the cause is, for example, a faulty disk or lack of disk space.)

class FileArrayStreamError
: public FileArrayRuntimeError {};
If, after creation, something goes wrong with the stream, for example if seeking or reading/writing fails.

class FileArrayDataCorruptionError
: public FileArrayRuntimeError {};
If an array is created from an old existing file, and we note that the header or trailer doesn't match what is expected.

class FileArrayBoundsError
: public FileArrayLogicError {};
Addressing outside the legal bounds.

class FileArrayElementSizeError
: public FileArrayLogicError {};
If the read/write members of the element access traits class are faulty and either write too much (thus overwriting the data for the next element) or read too much (in which case the last few bytes read will be garbage picked from the next element.) It's of course possible to take this even further. I think this is quite enough, though.
Now we have a reasonably fine level of error reporting, yet an application that wishes a coarse level of error handling can choose to catch the higher levels of the hierarchy only.
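A minimal sketch of how the two styles of catching look from the application side, using a subset of the hierarchy above. ``checkIndex'' is a hypothetical stand-in for the bounds check inside the array, and ``fineGrained'' and ``coarse'' are made-up names for two kinds of callers.

```cpp
#include <cstddef>
#include <string>

// A subset of the hierarchy described above.
class FileArrayException {};
class FileArrayLogicError : public FileArrayException {};
class FileArrayRuntimeError : public FileArrayException {};
class FileArrayStreamError : public FileArrayRuntimeError {};
class FileArrayBoundsError : public FileArrayLogicError {};

// Hypothetical bounds check, standing in for the array's own.
void checkIndex(std::size_t index, std::size_t size)
{
    if (index >= size)
        throw FileArrayBoundsError();
}

// A caller that wants fine-grained error reporting.
std::string fineGrained(std::size_t index, std::size_t size)
{
    std::string result = "ok";
    try {
        checkIndex(index, size);
    }
    catch (FileArrayBoundsError&) {
        result = "bounds error";    // precise reaction to misuse
    }
    catch (FileArrayRuntimeError&) {
        result = "runtime error";   // disk full, stream failure, ...
    }
    return result;
}

// A caller that only cares whether anything at all went wrong.
bool coarse(std::size_t index, std::size_t size)
{
    try {
        checkIndex(index, size);
    }
    catch (FileArrayException&) {
        return false;               // any file array problem
    }
    return true;
}
```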
As an exercise, I invite you to add the throws to the code. Beware, however: it's not a good idea to add exception specifications to the member functions that make use of T (since you cannot know which operations on T may throw, or what they throw.) You can increase the savings in code size, and the gain in legibility, from privately inheriting the implementation from the base class by putting quite a lot of the error handling there.

Iterators

An iterator into a file array is something whose behavior is analogous to that of pointers into arrays. We want to be able to create an iterator from the array (in which case the iterator refers to the first element of the array.) We want to access that element by dereferencing the iterator (unary operator *,) and we want iterator arithmetic with integers.
An easy way of getting there is to let an iterator contain a pointer to a file array, and an index. Whenever the iterator is dereferenced, we return (*array)[index]. That way we even get error handling, for free from the array itself, for iterator arithmetic that leads us outside the valid range of the array. The iterator arithmetic becomes simple too, since it's just ordinary arithmetic on the index type. The implementation thus seems easy; all that's needed is to define the operations needed for the iterators, and the actions we want. Here's my idea:
  • creation from array yields iterator referring to first element
  • copy construction and assignment are of course well behaved.
  • moving forwards and backwards with operator++ and operator--.
  • addition of array and ``long int'' value ``n'' yields iterator referring to n:th element of array.
  • iterator+=n (where n is of type long int) adds n to the value of the index in the iterator. This addition is never an error; it's dereferencing the iterator that's an error if the index is out of range. Operator -= is analogous.
  • iterator+n yields a new iterator referring to the iterator.index+n:th element of the array, and analogous for operator-.
  • iterator1-iterator2 yields a long int which is the difference between the indices of the iterators. If iterator1 and iterator2 refer to different arrays, it's an error and we throw an exception.
  • iterator1==iterator2 returns non-zero if the arrays and indices of iterator1 and iterator2 are equal.
  • iterator1!=iterator2 returns !(iterator1==iterator2)
  • *iterator returns whatever (*array)[index] returns, i.e. a FileArrayProxy.
  • iterator[n] returns (*array)[index+n].
  • iterator1<iterator2 returns non-zero if iterator1.index<iterator2.index. If the iterators refer to different arrays, it's an error and we throw an exception. Likewise for operator>.
  • iterator1<=iterator2 returns !(iterator1>iterator2), and iterator1>=iterator2 returns !(iterator1<iterator2).
I think the above is an exhaustive list. None of the above is difficult. It's just a lot of code to write, and thus a good chance of making errors. With a little thought, however, quite a lot of code can be reused over and over, thus reducing the amount to write and also the risk of errors. As an example, take a class for which, given an object ``o'' and some other value ``v'', the operations ``o+=v'', ``o+v'' and ``v+o'' are well defined and behave like they do for the built-in types (which they really ought to, unless you want to give the class users some rather unhealthy surprises). A good rule of thumb is then to define ``operator+='' as a member of the class, and two versions of ``operator+'' as non-member functions implemented with ``operator+=''. Here's how it's done in the iterator example:

template <class T>
class FileArrayIterator
{
public:
  FileArrayIterator(FileArray<T>& f);
  FileArrayIterator(const FileArrayIterator<T>& i);
  FileArrayIterator<T>& operator+=(long n);
  FileArrayProxy<T> operator*();
  FileArrayProxy<T> operator[](long n);
  ...
private:
  FileArray<T>* array;
  unsigned long index;
};

template <class T> FileArrayIterator<T>
operator+(const FileArrayIterator<T>& i, long n);

template <class T> FileArrayIterator<T>
operator+(long n, const FileArrayIterator<T>& i);

template <class T>
FileArrayIterator<T>::FileArrayIterator(
  FileArray<T>& a  // non-const, since "array" is a non-const pointer
)
  : array(&a),
    index(0)
{
}

template <class T>
FileArrayIterator<T>::FileArrayIterator(
  const FileArrayIterator<T>& i
)
  : array(i.array),
    index(i.index)
{
}

template <class T>
FileArrayIterator<T>&
FileArrayIterator<T>::operator+=(long n)
{
  index+=n;
  return *this;
}

template <class T> FileArrayIterator<T>
operator+(const FileArrayIterator<T>& i, long n)
{
  FileArrayIterator<T> it(i);
  return it+=n;
}

template <class T> FileArrayIterator<T>
operator+(long n, const FileArrayIterator<T>& i)
{
  FileArrayIterator<T> it(i);
  return it+=n;
}

template <class T>
FileArrayProxy<T> FileArrayIterator<T>::operator*()
{
  return (*array)[index];
}

template <class T>
FileArrayProxy<T>
FileArrayIterator<T>::operator[](long n)
{
  return (*array)[index+n];
}
Admittedly, both versions of ``operator+'' must still be written, but since their behaviour is defined in terms of ``operator+='', an error has only one place where it must be corrected. There's no need to display all the code here in the article; you can study it in the sources. The above shows how it all works, though, and as you can see, it's fairly simple.
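The difference and comparison operators from the list are left to the sources. Here is a self-contained sketch of them, assuming a hypothetical in-memory ``Array'' (backed by ``std::vector'') in place of ``FileArray'', plain ``int'' elements instead of a proxy, a signed ``long'' index, and a hypothetical ``IteratorMismatch'' exception class. Only ``+='', ``-='', ``=='', ``<'' and the difference touch the representation; the remaining operators are written in terms of them, in the same spirit as ``operator+'' above.

```cpp
#include <vector>

// Hypothetical in-memory stand-in for FileArray, so the sketch needs no files.
class Array
{
public:
    explicit Array(long n) : data(n) {}
    int& operator[](long i) { return data.at(i); }   // at() checks bounds
private:
    std::vector<int> data;
};

class IteratorMismatch {};   // hypothetical: iterators into different arrays

class Iterator
{
public:
    explicit Iterator(Array& a) : array(&a), index(0) {}
    Iterator& operator+=(long n) { index += n; return *this; }
    Iterator& operator-=(long n) { index -= n; return *this; }
    int& operator*() const { return (*array)[index]; }

    // Difference of indices; an error if the arrays differ.
    friend long operator-(const Iterator& a, const Iterator& b)
    {
        sameArray(a, b);
        return a.index - b.index;
    }
    friend bool operator==(const Iterator& a, const Iterator& b)
    {
        return a.array == b.array && a.index == b.index;
    }
    friend bool operator<(const Iterator& a, const Iterator& b)
    {
        sameArray(a, b);
        return a.index < b.index;
    }
private:
    static void sameArray(const Iterator& a, const Iterator& b)
    {
        if (a.array != b.array)
            throw IteratorMismatch();
    }
    Array* array;
    long index;
};

// Everything else is expressed through the operators above.
Iterator operator+(Iterator i, long n) { return i += n; }
Iterator operator+(long n, Iterator i) { return i += n; }
Iterator operator-(Iterator i, long n) { return i -= n; }
bool operator!=(const Iterator& a, const Iterator& b) { return !(a == b); }
bool operator>(const Iterator& a, const Iterator& b)  { return b < a; }
bool operator<=(const Iterator& a, const Iterator& b) { return !(b < a); }
bool operator>=(const Iterator& a, const Iterator& b) { return !(a < b); }
```

Note that ``operator=='' does not throw for different arrays; it simply reports that the iterators are not equal, as the list above specifies.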

Recap

In short, this month's news was:
  • You can increase flexibility for your templates without sacrificing ease of use or safety by using traits classes.
  • Enumerations in classes can be used to have class-scope constants of integral type.
  • Modern compilers do not need the above hack. Defining a class-scope static constant of an integral type in the class declaration is cleaner and more type safe.
  • Standard C++, and even C, do not have any support for the notion of temporary files. Fortunately, there are commonly supported extensions to the languages that do.
  • Private inheritance can be used for code reuse.
  • Private inheritance is very different from public inheritance. Public inheritance models ``is-A'' relationships, while private inheritance models ``is-implemented-in-terms-of'' relationships.
  • A user of a class that has privately inherited from something else cannot take advantage of this fact. To a user the private inheritance doesn't make any difference.
  • In real life, private inheritance is used far less than it should be. In many situations where public inheritance is used, private inheritance should have been used.
  • Exception catching is polymorphic (i.e. dynamic binding works when catching.)
  • The polymorphism of exception catching allows us to create an arbitrarily fine-grained error reporting mechanism while still allowing users who want a coarse error reporting mechanism to use one (they'll just catch classes near the root of the exception class inheritance tree.)
  • Always implement binary operator+, operator-, operator* and operator/ as functions outside the classes, and always implement them in terms of the operator+=, operator-=, operator*= and operator/= members of the classes.

Exercises

  • Alter the file array such that it's possible to instantiate two (or more) kinds of FileArray in the same program, where the alternatives store the data in different formats. (Hint: the alternatives will all need different traits class specialisations.)
  • What's the difference between using private inheritance of a base class, and using a member variable of that same class, for reusing code?
  • In which situations is it crucial which alternative you choose?

An Introduction to C++ Programming - Part 9

In parts 5 and 6, the basics of I/O were introduced, with formatted reading and writing from standard input and output. We'll now have a look at I/O for files. In a sense, it's better to stop using the term I/O here, and instead use streams and streaming, since the ideas expressed here and in parts 5 and 6 can be used for other things than I/O, for example in-memory formatting of data (we'll see that at the very end of this article.)

Files

In what way is writing ``Hello world'' on standard output different from writing it to a file? The question is worth some thought, since in many programming languages there is a distinct difference. Is the message different? Is the format (as seen from the program) different? I cannot see any difference in those aspects. The only thing that truly differs is the media where the formatted message ends up. In the former case, it's on your screen, but for file I/O it's in a file somewhere on your hard disk. In other words, there is very little difference, or at least, there's very much in common.
As we've seen so far, commonality is expressed either through inheritance or templates, depending on what's common and what's not. To refresh your memory, templates are used when we want the same kind of behaviour independent of the data type, for example a stack of some data type. Inheritance is used when you want similar, but in some important aspects different, behaviour at runtime for the same kind of data. We saw this for the staff hierarchy and mailing addresses in parts 7 and 8. In this case inheritance is the correct solution, since the data will be the same, but where it ends up (and most notably, how it gets there) differs. (Incidentally, there's a good case for using templates too, regarding the type of characters used. The C++ standard does indeed have templated streams, precisely to support different character types. Few compilers today support this, however. See the ``Standards Update'' towards the end of the article for more information.)
The inheritance tree for the stream types looks like this:

The way to read this is that there's a base class named ``ios'', from which the classes ``istream'' and ``ostream'' inherit. The classes ``ifstream'' and ``ofstream'' in turn inherit from ``istream'' and ``ostream'' respectively. The ``f'' in the names indicates that they're file streams. Then there are the odd ones: ``iostream'', which inherits from both ``istream'' and ``ostream'', and ``fstream'', which inherits from ``iostream''. Inheriting from two bases is called multiple inheritance, and is seen by many as evil. Many programming languages have banned it, Objective-C, Java and Smalltalk to mention a few, while other programming languages, like Eiffel, go to the other extreme and allow you to inherit the same base several times. Personally, I think multiple inheritance is very useful if used right, but it can cause severe problems. Here is a situation where it's used in the right way. Anyway, this means that ``fstream'' is a file stream for both reading and writing, while ``iostream'' is an abstract stream for both reading and writing. Most of the time, though, you won't need the ``iostream'' or ``fstream'' classes.
This inheritance means that all the stream insertion and extraction functions (the ``operator<<'' and ``operator>>'') you've written will work with file streams just as they do with ``cout'' and ``cin''. Now, wasn't that neat? In other words, the only things you need to learn for file based I/O are the details that are specific to files.
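As a small illustration, assuming a hypothetical ``Point'' type, an insertion operator written once for ``ostream'' works unchanged with an ``ofstream''. The helper names ``writePointToFile'' and ``firstLine'' are made up for this sketch.

```cpp
#include <fstream>
#include <ostream>
#include <string>

// A hypothetical user-defined type.
struct Point
{
    int x, y;
};

// Written once, against ostream ...
std::ostream& operator<<(std::ostream& os, const Point& p)
{
    return os << '(' << p.x << ", " << p.y << ')';
}

// ... and usable unchanged with a file stream, since ofstream is an ostream.
bool writePointToFile(const char* name, const Point& p)
{
    std::ofstream out(name);
    if (!out)
        return false;
    out << p << '\n';
    return static_cast<bool>(out);   // the destructor closes the file
}

// Reads the first line back, so the result can be checked.
std::string firstLine(const char* name)
{
    std::ifstream in(name);
    std::string line;
    std::getline(in, line);
    return line;
}
```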

File Streams

The first thing you need to know before you can use file streams is how to create them. The parts of interest look like this:

class ifstream : public istream
{
public:
  ifstream();
  ifstream(const char* name,
           int mode=ios::in);
  void open(const char* name,
            int mode=ios::in);
  ...
};

class ofstream : public ostream
{
public:
  ofstream();
  ofstream(const char* name,
           int mode=ios::out);
  void open(const char* name,
            int mode=ios::out);
  ...
};

class fstream : public iostream
{
public:
  fstream();
  fstream(const char* name,
          int mode);
  void open(const char* name,
            int mode);
  ...
};
You get access to the classes by #including the file stream header (<fstream> in standard C++, <fstream.h> with many older compilers). The default constructors create a file stream object that is not tied to any file. To tie such an object to a file, call ``open''. ``open'' and the constructors with parameters behave identically. ``name'' is, of course, the name of the file. Since you normally use either ``ifstream'' or ``ofstream'' and rarely ``fstream'', this is usually the only parameter you need to supply. Sometimes, however, you need the ``mode'' parameter. It's a bit field, in which you combine any of the values ``ios::in'', ``ios::out'', ``ios::ate'', ``ios::app'', ``ios::trunc'', and finally ``ios::binary'' with bitwise or (``operator|''). Some implementations also provide ``ios::nocreate'' and ``ios::noreplace'', but those are extensions. Some implementations do not have ``ios::binary'', while others call it ``ios::bin''. These variations of course make it difficult to write portable C++ today. Fortunately, the first six listed are required by the standard (although they belong to class ``ios_base'' rather than ``ios'', and the parameter type is ``ios_base::openmode'' rather than ``int''). Their meanings are:

ios::in         open for reading.

ios::out        open for writing.

ios::ate        open with the get and put pointers at the end
                of the file (see Seeking).

ios::app        open for append, that is, every write you make
                to the file is added at its end.

ios::trunc      scrap all data in the file if it already exists.

ios::binary     open in binary mode, that is, do not do the brain
                damaged LF<->CR/LF conversions that OS/2,
                DOS, CP/M (RIP), Windows, and probably other
                operating systems so often insist on. The reason
                some implementations do not have ios::binary
                is that many operating systems do not have this
                conversion, so there's no need for it.

ios::noreplace  cause the open to fail if the file already exists.

ios::nocreate   cause the open to fail if the file doesn't exist.
Of course, combinations like ``ios::noreplace | ios::nocreate'' don't make sense; the failure is guaranteed. On many implementations today there's also a third parameter for the constructors and ``open'': a protection parameter. How this parameter behaves is very operating system dependent.
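A minimal sketch of combining mode flags, using the standard spelling (``std::ios::openmode'' rather than ``int''). Only standard flags are used, since ``nocreate'' and ``noreplace'' are not portable; ``writeText'' and ``readAll'' are hypothetical helper names.

```cpp
#include <fstream>
#include <string>

// Writes 'text' to 'name', either replacing the old contents (trunc)
// or adding to the end of them (app).
void writeText(const char* name, const std::string& text, bool append)
{
    std::ios::openmode mode = std::ios::out;
    mode |= (append ? std::ios::app : std::ios::trunc);
    std::ofstream out(name, mode);
    out << text;
}

// Reads the whole file, character by character, into one string.
// Binary mode, so no line end conversion takes place.
std::string readAll(const char* name)
{
    std::ifstream in(name, std::ios::in | std::ios::binary);
    std::string result;
    char c;
    while (in.get(c))
        result += c;
    return result;
}
```

Writing ``one'' with truncation and then ``two'' with append leaves ``onetwo'' in the file; a further truncating write replaces everything.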
Now for some simple usage:

#include <fstream>
#include <iostream>
using namespace std;

int main(int argc, char* argv[])
{
  if (argc != 2) {
    cout << "Usage: " << argv[0] << " filename" << endl;
    return 1; // error code
  }

  ofstream of(argv[1]); // create the ofstream object
                        // and open the file.

  if (!of) { // something went wrong
    cout << "Error, cannot open " << argv[1] << endl;
    return 2;
  }

  // Now the file stream object is created. Write to it!
  of << "Hello file!" << endl;
  return 0;
}
As you can see, once the stream object is created, using it is just like using ``cout'', which you're already familiar with. Reading with ``ifstream'' works the same way: just use the object as you've used ``cin'' earlier. The file stream classes also have a member function ``close'', which explicitly closes the file and unties the stream object from it. You rarely need to call it, since the destructors close the file.
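Reading is the mirror image. Here's a sketch, assuming a hypothetical ``sumFile'' helper that adds up whitespace-separated integers, reading them with ``>>'' exactly as one would from ``cin''.

```cpp
#include <fstream>

// Sums all whitespace-separated integers in the named file.
// Returns false if the file cannot be opened.
bool sumFile(const char* name, long& sum)
{
    std::ifstream in(name);
    if (!in)
        return false;
    sum = 0;
    int value;
    while (in >> value)   // stops at end of file or at non-numeric data
        sum += value;
    return true;          // no close() needed; the destructor does it
}
```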
Actually this is all there is that's specific to files.

Binary streaming

So far we've dealt with formatted streaming only, that is, the process of translating raw data into a human readable form, or translating human readable data into the computer's internal representation. Sometimes you want to stream raw data as raw data, for example to save space in a file. If you look at a file produced by, for example, a word processor, it's most likely not in a human readable form. Note that binary streaming does not necessarily mean using the ``ios::binary'' mode when opening a file (although that is indeed often the case.) They're two different concepts. Binary streaming is about what you use your stream for, namely raw data, while opening a file with the ``ios::binary'' mode means turning the brain damaged LF<->CR/LF translation off.
Binary streaming is done through the stream member functions:

class ostream ...
{
public:
ostream& write(const char* s, streamsize n);
ostream& put(char c);
ostream& flush();
...
};

class istream ...
{
public:
istream& read(char* s, streamsize n);
int get();
istream& get(char& c);
istream& get(char* s, streamsize n, char delim='\n');
istream& getline(char* s, streamsize n,
char delim='\n');
istream& ignore(streamsize n=1, int delim=EOF);
};
The writing interface is extremely simple and straightforward, while the reading interface has a number of small but important variations. Note that these member functions are implemented in classes ``istream'' and ``ostream,'' so they're not specific to files, although files are where you're most likely to use them. Let's have a look at them, one by one:

ostream& ostream::write(const char* s, streamsize n);
Write ``n'' characters to the stream, from the array pointed to by ``s.'' ``streamsize'' is a signed integral data type. Despite ``streamsize'' being signed, you're of course not allowed to pass a negative size here (what would that mean?) Exactly ``n'' characters from ``s'' are written to the stream, no more, no less; a '\0' in the data is written like any other character.

ostream& ostream::put(char c);
Inserts the character into the stream.

ostream& ostream::flush();
Force the data in the stream to be written (file streams are usually buffered.)

istream& istream::read(char* s, streamsize n);
Read ``n'' characters into the array pointed to by ``s.'' Here you had better make sure that the array is large enough, or unpleasant things will happen. Note that only the characters read from the stream are inserted into the array. It will not be zero terminated, unless the last character read from the stream indeed is '\0'.

int istream::get();
Read one character from the stream, and return it. The value is an ``int'' instead of ``char'' since the return value might be ``EOF'' (which is not uniquely representable as a ``char.'')

istream& istream::get(char& c);
Same as above, but read the character into ``c'' instead. Here a ``char'' is used instead of an ``int,'' since you can check the value directly by calling ``.eof()'' on the reference returned.

istream& istream::get(char* s, streamsize n,
char delim='\n');
This one's similar to ``read'' above, with two differences: it stops early if the delimiter character is found, and it stores at most ``n-1'' characters followed by a terminating '\0' (so the array must have room for ``n'' characters). Note that when the delimiter is found, it is not read from the stream.

istream& istream::getline(char* s, streamsize n,
char delim='\n');
The only difference between this one and ``get'' above, is that this one does read the delimiter from the stream. Note, however, that the delimiter is not stored in the array.

istream& istream::ignore(streamsize n=1,
int delim=EOF);
Reads at most ``n'' characters from the stream, but doesn't store them anywhere. If the delimiter character is read, it stops there. Of course, if the delimiter is ``EOF'' (as is the default) it does not read past ``EOF,'' that's physically impossible.
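A quick way to see these differences is to feed the same input to each member function. This sketch uses ``std::istringstream'' rather than a file (the member functions belong to ``istream'', so they behave the same); the helper function names are made up for the illustration.

```cpp
#include <istream>
#include <sstream>
#include <string>

// get() stops at the delimiter and leaves it in the stream.
std::string getThenNext(std::istream& is)
{
    char buf[16];
    is.get(buf, sizeof buf, ',');       // stores "abc" plus '\0'
    int next = is.get();                // the ',' is still there
    return std::string(buf) + "|" + static_cast<char>(next);
}

// getline() consumes the delimiter, but does not store it.
std::string getlineThenNext(std::istream& is)
{
    char buf[16];
    is.getline(buf, sizeof buf, ',');   // stores "abc" plus '\0'
    int next = is.get();                // the ',' is gone; this is 'd'
    return std::string(buf) + "|" + static_cast<char>(next);
}

// ignore() discards up to and including the delimiter.
std::string afterIgnore(std::istream& is)
{
    is.ignore(100, ',');
    std::string rest;
    is >> rest;
    return rest;
}

// read() adds no '\0'; gcount() reports how many characters were read.
std::string readRaw(std::istream& is, std::streamsize n)
{
    std::string buf(n, '\0');
    is.read(&buf[0], n);
    buf.resize(is.gcount());
    return buf;
}
```

With the input ``abc,def'', the first two helpers return ``abc|,'' and ``abc|d'' respectively, and ``afterIgnore'' returns ``def''.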

Array on file

An example: Say we want to store an array of integers in a file, and we want to do this in raw binary format. Naturally we want to be able to read the array as well. A reasonable way is to first store a size (in elements) followed by the data. Both the size and the data will be in raw format.

#include <ostream>
#include <cstddef> // size_t
using namespace std;

void storeArray(ostream& os, const int* p, size_t elems)
{
  os.write((const char*)&elems,sizeof(elems));
  os.write((const char*)p, elems*sizeof(*p));
}
The above code does a lot of ugly type casting, but that's normal for binary streaming. What's done here is to use brute force to see the address of ``elems'' as a ``const char*'' (since that's what ``write'' expects) and then say that only the ``sizeof(elems)'' bytes from that pointer are to be read. What this actually does is to write out the raw memory that ``elems'' resides in to the stream. After this, it does the same kind of thing for the array. Note that ``sizeof(*p)'' reports the size of the type that ``p'' points to. I could as well have written ``sizeof(int),'' but that is a dangerous duplication of facts. It's enough that I've said that ``p'' is a pointer to ``int.'' Repeating ``int'' again just means I'll forget to update one of them when I change the type to something else. To read such an array into memory requires a little more work:

#include <istream>
#include <cstddef> // size_t
using namespace std;

size_t readArray(istream& is, int*& p)
{
  size_t elems;
  is.read((char*)&elems, sizeof(elems));
  p = new int[elems];
  is.read((char*)p, elems*sizeof(*p)); // read into the new array
  return elems;
}
It's not particularly hard to follow; first read the number of elements, then allocate an array of that size, and read the data into it.

Seeking

Up until now we have seen streams as just that: continuous streams of data. Sometimes, however, there's a need to move around, both backward and forward. Streams like standard input and standard output are truly continuous streams, within which you cannot move around. Files, in contrast, are true random access data stores. Random access streams have something called position pointers. They're not to be confused with pointers in the normal C++ sense; they refer to where in the file you currently are. There's the put pointer, which refers to the position the next write will go to, and the get pointer, which refers to the position the next read will come from. An ostream of course only has the put pointer, and an istream only the get pointer. There are six member functions that deal with random access in a stream:

streampos istream::tellg();

istream& istream::seekg(streampos);

istream& istream::seekg(streamoff, ios::seek_dir);

streampos ostream::tellp();

ostream& ostream::seekp(streampos);

ostream& ostream::seekp(streamoff, ios::seek_dir);
``streampos'', which you get from ``tellg'' and ``tellp'', is an absolute position in a stream. You cannot use the values for anything other than ``seekg'' and ``seekp''. You especially cannot examine a value and hope to find something useful there (well, you can, but what you find might hold only for the current release of your specific compiler; other compilers, or other releases of the same compiler, might show different characteristics for ``streampos.'') There are, however, two other things you can do with ``streampos'' values. You can subtract two values and get a ``streamoff'' value, and you can add a ``streamoff'' value to a ``streampos'' value. ``streamoff,'' by the way, is some signed integral type, probably a ``long.'' By using the value returned from ``tellg'' or ``tellp,'' you have a way of finding your way back, or of seeking relative to a known position by adding or subtracting ``streamoff'' values.
The two-argument versions of ``seekg'' and ``seekp'' accept a ``streamoff'' value and a direction, and work in a slightly different way. You seek to a position relative to the beginning of the stream, the end of the stream, or the current position; the choice is made through the ``ios::seek_dir'' enum, which has the three values ``ios::beg'', ``ios::end'' and ``ios::cur.'' (In standard C++ the type is called ``ios_base::seekdir''.) To make the next write occur on the very first byte of the stream, call ``os.seekp(0,ios::beg)'', where ``os'' is some random access ``ostream.''
In any reasonable implementation, any of the seek member functions use lazy evaluation. That is, when you call any of the seek member functions, the only thing that happens is that some member variable in the stream object changes value. It's not until you actually read or write, something truly happens on disk (or wherever the stream data resides.)
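A small sketch of the seek functions, using a ``std::stringstream'' (which is also a random access stream) instead of a file so that no disk access is needed. ``seekDemo'' is a hypothetical name, and the directions are spelled ``std::ios::beg'' as in standard C++.

```cpp
#include <sstream>
#include <string>

std::string seekDemo()
{
    std::stringstream s;
    s << "hello world";
    s.seekp(0, std::ios::beg);        // next write goes to the first byte
    s << 'J';                         // the buffer is now "Jello world"

    s.seekg(6, std::ios::beg);        // get pointer at 'w'
    std::streampos mark = s.tellg();  // remember this position
    std::string word;
    s >> word;                        // reads "world" and hits the end
    s.clear();                        // clear the end-of-file state
    s.seekg(mark);                    // go back to the remembered position
    char c = 0;
    s.get(c);                         // 'w' again
    return s.str() + "|" + word + "|" + c;
}
```

The function returns ``Jello world|world|w'': the put pointer and the get pointer move independently of each other.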

A stream array, for really huge amounts of data

Suppose we need to access enormous amounts of simple data, say 10 million floating point numbers. It's not a very good idea to just allocate that much memory, at least not on my machine with a measly 64 MB of RAM. It would not only make this application crawl, but probably the whole system, due to excessive paging. Instead, let's use a file to access the data. This makes for slow access, for sure, but nothing else will suffer.
Here's the idea. The array must be usable with any data type, including user defined classes. Its usage must resemble that of real arrays as much as possible, but extra functionality that arrays do not have, such as asking for the number of elements in it, is OK. There must be a type, resembling pointers into arrays, that can be used for traversing it. We do not want the size of the array to be part of its type (if you've programmed in Pascal, you know why.) Beyond what arrays offer, we want some measure of safety from stupid mistakes, such as addressing beyond the range of the array, and also from errors that arrays cannot have (disk full, cannot create file, disk corruption, etc.) We also want to be able to say that an array is just a part of a file, not necessarily an entire file. This would allow the user to create several arrays within the same file. To prevent this article from growing way too long, quite a few of the above listed features will be left for next month. The things to cover this month are: an array of built-in fundamental types only, which lacks pointers and is limited to one file per array. We'll also skip error handling for now (you can add it as an exercise; I'll raise some interesting questions along the way), and add that too next month.
First of all, the array must be a template, so it can be used to store arbitrary types. Since we do not want the size to be part of the type signature, the size is not a template parameter, but a parameter for the constructor. Of course, we cannot have the entire array duplicated in memory (then all the benefits will be lost,) instead we will search for the data on file every time it's needed.
Here's the outline for the class.

template <class T>
class FileArray
{
public:
FileArray(const char* name, size_t elements);
// Create a new array and set the size.

FileArray(const char* name);
// Create an array from an existing file, get the
// size from the file.

// use compiler defined destructor.

T operator[](size_t index) const;
??? operator[](size_t index);

size_t size() const;
private:
// don't want these to be used.
FileArray(const FileArray&);
FileArray& operator=(const FileArray&);
...
};
As can be expected, ``operator[]'' can be overloaded, which is handy for providing a familiar syntax. However, already here we see a problem. What's the non-const ``operator[]'' to return? To see why this is a problem, ask yourself what you want ``operator[]'' to do. I want ``operator[]'' to do two things, depending on where it's used; like this:

FileArray<int> x;
...
x[5] = 4;
int y = x[3];
When ``operator[]'' is on the left hand side of an assignment, I want to write data to the file, and if it's on the right hand side of an assignment, I want to read data from the file. Ouch. Warning: I've often seen it suggested that the solution is to have the const version read and return a value, and the non-const version write a value. As slick as that would be, it's wrong and it won't work. Which version is called depends only on the constness of the array object, not on which side of the assignment the expression is: the const version is called for const array objects, the non-const version for non-const array objects.
Instead what we have to do is to pull a little trick. The trick is, as so often in computer science, to add another level of indirection. This is done by not taking care of the problem in ``operator[],'' but rather let it return a type, which does the job. We create a class template, looking like this:

template <class T>
class FileArrayProxy
{
public:
  FileArrayProxy& operator=(const T&); // write value
  operator T() const; // read a value

  // compiler generated destructor

  FileArrayProxy&
  operator=(const FileArrayProxy& p);

  FileArrayProxy(const FileArrayProxy&);
private:
  // ... all other constructors.
  FileArray<T>& array;
  const size_t index;
};
We have to make sure, of course, that there are member functions in ``FileArray'' that can read and write (and of course, those functions are not ``operator[]'', since then we'd have an infinite recursion.) All constructors, except for the copy constructor, are made private to prevent users from creating objects of the class whenever they want to. After all, this class is a helper for the array only, and is not intended to ever even be seen. This, however, poses a problem: with the constructors being private, how can ``FileArray::operator[]()'' create and return one? Enter another C++ feature: friends. Friends are a way of breaking encapsulation. What?!?! Yes, you read that right. Friends break encapsulation, and (this is the real shock) that's a good thing! Friends break encapsulation in a controlled way. We can, in ``FileArrayProxy'', declare ``FileArray'' to be a friend. This means that ``FileArray'' can access everything in ``FileArrayProxy,'' including things that are declared private. Paradoxically, violating encapsulation with friendship strengthens encapsulation when done right. The only alternative to friendship here is to make the constructors public, but then anyone can create objects of this class, and that's what we wanted to prevent. Friends are useful for strong encapsulation, but it's important to use them only in situations where two (or more) classes are so tightly bound to one another that they're meaningless on their own. This is the case with ``FileArrayProxy.'' It's meaningless without ``FileArray,'' thus ``FileArray'' is declared a friend of ``FileArrayProxy.'' The declaration then becomes:

template <class T>
class FileArrayProxy
{
public:
  FileArrayProxy& operator=(const T&); // write a value
  operator T() const; // read a value
  // compiler generated destructor

  FileArrayProxy& // read from p and then write
  operator=(const FileArrayProxy& p);

  // compiler generated copy constructor
private:
  FileArrayProxy(FileArray<T>& fa, size_t n);
  // for use by FileArray only.

  FileArray<T>& array;
  const size_t index;

  friend class FileArray<T>;
};
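To see the proxy idea and the friendship working together without any files, here is a sketch with a hypothetical in-memory ``ProxyArray'' standing in for ``FileArray'' (its private ``read'' and ``store'' play the roles of ``readElement'' and ``storeElement'') and ``ElementProxy'' in the role of ``FileArrayProxy''. The ``std::vector'' storage is an assumption of this sketch.

```cpp
#include <cstddef>
#include <vector>

template <class T> class ProxyArray;   // forward declaration

// Returned by the non-const operator[]: assignment writes, conversion
// to T reads. Only ProxyArray can create one (private constructor).
template <class T>
class ElementProxy
{
public:
    ElementProxy& operator=(const T& t) { array.store(index, t); return *this; }
    ElementProxy& operator=(const ElementProxy& p) { return *this = T(p); }
    operator T() const { return array.read(index); }
private:
    ElementProxy(ProxyArray<T>& a, std::size_t n) : array(a), index(n) {}
    ProxyArray<T>& array;
    const std::size_t index;
    friend class ProxyArray<T>;
};

template <class T>
class ProxyArray
{
public:
    explicit ProxyArray(std::size_t n) : data(n) {}
    T operator[](std::size_t i) const { return read(i); }   // const: plain read
    ElementProxy<T> operator[](std::size_t i)               // non-const: proxy
    {
        return ElementProxy<T>(*this, i);
    }
private:
    // Stand-ins for the file access done by the real array.
    T read(std::size_t i) const { return data.at(i); }
    void store(std::size_t i, const T& t) { data.at(i) = t; }

    std::vector<T> data;
    friend class ElementProxy<T>;
};
```

With ``ProxyArray<int> x(10);'', the statement ``x[5] = 4;'' goes through ``ElementProxy::operator=(const T&)'' and ``int y = x[5];'' through ``operator T()'', even though both call the same non-const ``operator[]''.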
We can now start implementing the array. Some problems still lie ahead, but I'll mention them as we go.

// farray.hpp
#ifndef FARRAY_HPP
#define FARRAY_HPP

#include <fstream>
#include <stddef.h> // size_t

template <class T> class FileArrayProxy;
// Forward declaration necessary, since FileArray
// returns the type.

template <class T> class FileArray
{
public:
  FileArray(const char* name, size_t size); // create
  FileArray(const char* name); // use existing array
  T operator[](size_t index) const;
  FileArrayProxy<T> operator[](size_t index);
  size_t size() const;
private:
  FileArray(const FileArray&); // illegal
  FileArray& operator=(const FileArray&);

  // for use by FileArrayProxy
  T readElement(size_t index) const;
  void storeElement(size_t index, const T&);

  std::fstream stream;
  size_t max_size;

  friend class FileArrayProxy<T>;
};
The functions for reading and writing are made private members of the array, since they're not for anyone to use. Again, we need to make use of friendship to grant ``FileArrayProxy'' the right to access them. Let's define them right away:

template <class T>
T FileArray<T>::readElement(size_t index) const
{
  T t;
  stream.seekg(sizeof(max_size)+index*sizeof(T));
  // what if seek fails?

  stream.read((char*)&t, sizeof(t));
  // what if read fails?

  return t;
}
All of a sudden, we face an unexpected problem: the above code won't compile. The member function is declared ``const'', so inside it all member variables are ``const'', and neither ``seekg'' nor ``read'' is allowed on a constant stream. The problem is one of distinguishing between logical constness and bitwise constness. This member function is logically ``const'', as it does not alter the array in any way. However, it is not bitwise const: the stream member changes (its read position moves, for one.) The compiler cannot understand logical constness, only bitwise constness. If you have a modern compiler, the solution is very simple: you declare ``stream'' as ``mutable fstream stream;'' in the class definition. I, however, have a very old compiler, so I have to find a different solution. This solution is, yet again, one of adding another level of indirection. I can have a pointer to an ``fstream.'' In a ``const'' member function the pointer is also ``const'', but what it points to is not (there's a difference between a constant pointer and a pointer to a constant.) The only reasonable way to achieve this is to store the stream object on the heap, and in doing so I introduce a possible danger: what if I forget to delete the pointer? Sure, I'll delete it in the destructor, but what if an exception is thrown in the constructor itself? Then the destructor will never execute, since no object has been created that must be destroyed. Do you remember the ``thing to think of until this month?'' The clues were destructor, pointer and delete. Thought of anything? What about this extremely simple class template?

template <class T>
class ptr
{
public:
  ptr(T* pt);
  ~ptr();

  T& operator*() const;
private:
  ptr(const ptr<T>&); // we don't want copying
  ptr<T>& operator=(const ptr<T>&); // nor assignment

  T* p;
};

template <class T>
ptr<T>::ptr(T* pt)
  : p(pt)
{
}

template <class T>
ptr<T>::~ptr()
{
  delete p;
}

template <class T>
T& ptr<T>::operator*() const
{
  return *p;
}
This is probably the simplest possible member of the family known as ``smart pointers.'' I'll probably devote a whole article to these some time. Whenever an object of this type is destroyed, whatever it points to is deleted. The only thing we have to keep in mind when using it is to make sure that whatever we feed it is allocated on the heap (and is not an array), so it can be deleted with operator delete. This solves our problem nicely. When a ``ptr'' is itself constant, the thing pointed to still isn't (look at the return type of ``operator*'': it's a ``T&,'' not a ``const T&.'') So, instead of an ``fstream'' member variable called ``stream,'' let's use a ``ptr<fstream>'' member named ``pstream.'' With this change, ``readElement'' must be slightly rewritten:

template <class T>
T FileArray<T>::readElement(size_t index) const
{
  (*pstream).seekg(sizeof(max_size)+index*sizeof(T));
  // what if seek fails?

  T t;
  (*pstream).read((char*)&t, sizeof(t));
  // what if read fails?

  return t;
}
I bet the change wasn't too horrifying.
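The following small sketch compares the two approaches side by side. It makes these assumptions:

* It uses an in-memory ``std::istringstream'' instead of a file, so it can run anywhere.
* The class and function names are hypothetical.
* ``std::unique_ptr'' (C++11) stands in for ``ptr''. Its ``operator*'' is likewise a ``const'' member that returns a non-const reference.

In both classes ``charAt'' is logically ``const'', yet it has to move the stream's read position.

```cpp
#include <cassert>
#include <ios>
#include <memory>
#include <sstream>
#include <string>

// With 'mutable': the member may change even inside const member functions.
class MutableReader
{
public:
  explicit MutableReader(const std::string& text) : stream(text) {}

  // Logically const (the text never changes), but moves the read position.
  char charAt(std::streamoff pos) const
  {
    stream.seekg(pos, std::ios::beg); // OK: stream is mutable
    return static_cast<char>(stream.get());
  }

private:
  mutable std::istringstream stream;
};

// Without 'mutable': in a const member function the smart pointer itself
// is const, but the stream it points to is not.
class PointerReader
{
public:
  explicit PointerReader(const std::string& text)
    : pstream(new std::istringstream(text)) {}

  char charAt(std::streamoff pos) const
  {
    (*pstream).seekg(pos, std::ios::beg); // OK: *pstream is not const
    return static_cast<char>((*pstream).get());
  }

private:
  std::unique_ptr<std::istringstream> pstream; // plays the role of ptr<T>
};
```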

template <class T>
void FileArray<T>::storeElement(size_t index,
                                const T& elem)
{
  (*pstream).seekp(sizeof(max_size)+index*sizeof(T),
                   ios::beg);
  // what if seek fails?

  (*pstream).write((const char*)&elem, sizeof(elem));
  // what if write fails?
}
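``readElement'' and ``storeElement'' both assume the same file layout:

* first the ``size_t'' holding ``max_size'';
* then ``max_size'' elements of ``sizeof(T)'' bytes each, packed back to back.

The following sketch spells out that offset arithmetic with hypothetical helper functions. The class itself computes the expression inline.

```cpp
#include <cassert>
#include <cstddef>

// Byte offset of element 'index' in a file array of T:
// a size_t header (the element count) followed by packed elements.
template <class T>
std::size_t elementOffset(std::size_t index)
{
  return sizeof(std::size_t) + index * sizeof(T);
}

// Total file size for 'count' elements, i.e. the offset just past
// the last element (where the constructor's write of count-1 ends).
template <class T>
std::size_t fileSize(std::size_t count)
{
  return elementOffset<T>(count);
}
```

For example, suppose ``size_t'' is 8 bytes and ``int'' is 4 bytes. Then element 3 of a ``FileArray<int>'' starts at byte 8+3*4=20. A 10-element array occupies 8+10*4=48 bytes, which is exactly where the constructor's write of element ``max_size-1'' ends.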
Now for the constructors:

template <class T>
FileArray<T>::FileArray(const char* name, size_t size)
  : pstream(new fstream(name, ios::in|ios::out|ios::trunc|ios::binary)),
    max_size(size)
{
  // ios::trunc creates the file if needed (and empties an existing one);
  // a standard fstream opened with only in|out won't create it.
  // what if the file could not be opened?

  // store the size on file.
  (*pstream).write((const char*)&max_size,
                   sizeof(max_size));
  // what if write failed?

  // We want to write a value (any value) at the end
  // to make sure there is enough space on disk.

  T t;
  storeElement(max_size-1,t);
  // What if this fails?
}

template <class T>
FileArray<T>::FileArray(const char* name)
  : pstream(new fstream(name, ios::in|ios::out|ios::binary)),
    max_size(0)
{
  // get the size from file.
  (*pstream).read((char*)&max_size,
                  sizeof(max_size));
  // what if read fails or max_size == 0?
  // How do we know the file is even an array?
}
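Why does ``ptr'' make the ``new'' in the member initializer list safe? When a constructor throws, the destructor of the half-built object never runs. However, the destructors of the members that were already fully constructed do run.

The sketch below demonstrates this with a hypothetical counting type:

* ``Holder'' keeps its object in a ``ptr'', repeated here so the example compiles on its own.
* ``RawHolder'' uses a raw pointer and relies on its own destructor.

```cpp
#include <cassert>
#include <stdexcept>

template <class T>
class ptr
{
public:
  ptr(T* pt) : p(pt) {}
  ~ptr() { delete p; }
  T& operator*() const { return *p; }
private:
  ptr(const ptr&);            // no copying
  ptr& operator=(const ptr&); // nor assignment
  T* p;
};

int live = 0; // number of Tracked objects currently alive

struct Tracked
{
  Tracked() { ++live; }
  ~Tracked() { --live; }
};

// The ptr member is fully constructed before the body runs, so it is
// destroyed (deleting its Tracked) even if the constructor body throws.
class Holder
{
public:
  explicit Holder(bool fail) : member(new Tracked)
  {
    if (fail)
      throw std::runtime_error("constructor failed");
  }
private:
  ptr<Tracked> member;
};

// Raw pointer: if the constructor body throws, ~RawHolder() is skipped.
class RawHolder
{
public:
  explicit RawHolder(bool fail) : member(new Tracked)
  {
    if (fail)
      throw std::runtime_error("constructor failed"); // member is leaked
  }
  ~RawHolder() { delete member; }
private:
  RawHolder(const RawHolder&);            // no copying
  RawHolder& operator=(const RawHolder&); // nor assignment
  Tracked* member;
};
```

After a failed ``RawHolder'' construction the counter stays at one. That object is leaked, which is exactly the danger described earlier for a raw ``fstream*''.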
The access members:

template <class T>
T FileArray<T>::operator[](size_t size) const
{
  // what if size >= max_size?
  return readElement(size);
  // What if read failed because of a disk error?
}

template <class T>
FileArrayProxy<T> FileArray<T>::operator[](size_t size)
{
  // what if size >= max_size?
  return FileArrayProxy<T>(*this, size);
}
Well, this wasn't too much work, but then, as can be seen from the comments, there's absolutely no error handling here. I've left out the ``size'' member function, since its implementation is trivial. Next in line is ``FileArrayProxy.''

template <class T>
class FileArrayProxy
{
public:
  // copy constructor generated by compiler
  operator T() const;
  FileArrayProxy<T>& operator=(const T& t);
  FileArrayProxy<T>&
  operator=(const FileArrayProxy<T>& p);
  // read from one array and write to the other.
private:
  FileArrayProxy(FileArray<T>& f, size_t i);

  size_t index;
  FileArray<T>& fa;

  friend class FileArray<T>;
};
The copy constructor is needed, since the return value from ``FileArray::operator[]'' must be copied, and it must be public for this to succeed. The one that the compiler generates for us, which just copies all member variables, will do just fine. The compiler doesn't generate a default constructor (one which accepts no parameters,) since we have explicitly defined a constructor. The assignment operator is necessary, however. Sure, the compiler will try to generate one for us if we don't, but it will fail, since a reference (``fa'') can't be rebound. Note, however, that had we used a pointer instead of a reference, it would succeed, but the result would *NOT* be what we want. It would copy the member variables, making the proxy refer to another element, whereas what we want is to read data from one array and write it to another. Now for the implementation:

template <class T>
FileArrayProxy<T>::FileArrayProxy(FileArray<T>& f,
                                  size_t i)
  : index(i),
    fa(f)
{
}

template <class T>
FileArrayProxy<T>::operator T() const
{
  return fa.readElement(index);
}

template <class T>
FileArrayProxy<T>&
FileArrayProxy<T>::operator=(const T& t)
{
  fa.storeElement(index,t);
  return *this;
}

template <class T>
FileArrayProxy<T>& FileArrayProxy<T>::operator=(
  const FileArrayProxy<T>& p
)
{
  fa.storeElement(index,p);
  return *this;
}

#endif // FARRAY_HPP
That was it. Can you see what happens with the proxy? Let's analyze a small code snippet:

1 FileArray<int> arr("file",10);
2 arr[2]=0;
3 int x=arr[2];
4 arr[0]=arr[2];
On line 2, ``arr.operator[](2)'' is called, which creates a ``FileArrayProxy<int>'' from ``arr'' with the index 2. The object, which is a temporary and does not have a name, has as its member ``fa'' a reference to ``arr'', and as its member ``index'' the value 2. On this temporary object, ``operator=(int)'' is executed. This operator in turn calls ``fa.storeElement(index, t),'' where ``index'' is still 2 and the value of ``t'' is 0. Thus, ``arr[2]=0'' ends up as ``arr.storeElement(2,0)''.
On line 3, a similar proxy is created through the call to ``operator[](2)''. This time, however, ``operator int() const'' is called. This member function in turn calls ``fa.readElement(2)'' and returns its value, thus ``int x=arr[2]'' translates to ``int x=arr.readElement(2).''
On line 4, finally, ``arr[0]=arr[2]'' creates two temporary proxies, one referring to index 0 and one to index 2. The assignment operator of the first is called, which in turn calls ``fa.storeElement(0,p)'', where p is the temporary proxy referring to element 2. Since ``storeElement'' wants an ``int,'' ``p.operator int() const'' is called, which calls ``arr.readElement(2).'' In other words, ``arr[0]=arr[2]'' generates the code ``arr.storeElement(0, arr.readElement(2)).'' As you can see, the proxies don't add any new functionality; they're just syntactic sugar, albeit very useful. With them we can treat our file arrays very much like any kind of array. There's one thing we cannot do, though:

int* p = &arr[2];
int& x = arr[3];
*p=2;
x=5;
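With ordinary arrays, the above would be legal and have well defined semantics, assigning arr[2] the value 2, and arr[3] the value 5. With our file array this cannot work, since there is no ``int'' in memory for the pointer or reference to refer to. Unfortunately, an old compiler may not prevent it (a decent one will at least warn that we're binding a reference or pointer to a temporary.) A standard-conforming compiler does reject both declarations: ``&arr[2]'' would take the address of a temporary proxy (and a ``FileArrayProxy<int>*'' isn't an ``int*'' anyway), and a non-const ``int&'' cannot bind to the temporary ``int'' returned by the conversion operator. We'll mend that hole next month (think about how) and also add iterators, which will allow us to use the file arrays almost exactly like real ones.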
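To watch the translation from the line-by-line walkthrough above actually happen, here is a sketch of the same proxy technique. It works over an in-memory ``std::vector'' and logs every call to ``readElement'' and ``storeElement''. ``TracingArray'', ``Proxy'' and the log format are hypothetical, and the code assumes a C++11 compiler. Only the proxy structure mirrors ``FileArray''.

The two ``static_assert''s at the end check the type relationships that make a standard compiler reject the pointer and reference declarations above.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <type_traits>
#include <vector>

template <class T> class TracingArray; // needed by Proxy

// Same shape as FileArrayProxy: remembers an array and an index.
template <class T>
class Proxy
{
public:
  operator T() const { return arr.readElement(index); } // read

  Proxy& operator=(const T& t) // write
  {
    arr.storeElement(index, t);
    return *this;
  }

  Proxy& operator=(const Proxy& p) // read from p, then write
  {
    arr.storeElement(index, p); // p converts to T via operator T()
    return *this;
  }

private:
  Proxy(TracingArray<T>& a, std::size_t i) : arr(a), index(i) {}

  TracingArray<T>& arr;
  std::size_t index;

  friend class TracingArray<T>;
};

// Stands in for FileArray: keeps the data in memory and logs each access.
template <class T>
class TracingArray
{
public:
  explicit TracingArray(std::size_t n) : data(n) {}

  Proxy<T> operator[](std::size_t i) { return Proxy<T>(*this, i); }
  std::string log() const { return trace.str(); }

private:
  T readElement(std::size_t i) const
  {
    trace << "read(" << i << ") ";
    return data[i];
  }

  void storeElement(std::size_t i, const T& t)
  {
    trace << "store(" << i << ',' << t << ") ";
    data[i] = t;
  }

  std::vector<T> data;
  mutable std::ostringstream trace; // logging is not part of the logical state

  friend class Proxy<T>;
};

// &arr[i] is not an int*, and an int& cannot bind to arr[i]:
static_assert(!std::is_convertible<Proxy<int>*, int*>::value,
              "a proxy pointer is not an int pointer");
static_assert(!std::is_constructible<int&, Proxy<int> >::value,
              "int& cannot bind to the int produced by a proxy");
```

Running lines 2 to 4 from the walkthrough against a ``TracingArray<int>'' should produce the log ``store(2,0) read(2) read(2) store(0,0)''. In other words, ``arr[0]=arr[2]'' really becomes a read of element 2 followed by a store into element 0.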

In memory data formatting

A frequently faced problem is that of converting strings representing some data to that data, or vice versa. With the aid of ``istrstream'', ``ostrstream'' and ``strstream'', this is easy. For example, say we have a string containing digits and want those digits as an integer. The thing to do is to create an ``istrstream'' object from the string. An example will explain:

const char* s = "23542";
istrstream is(s);
int x;
is >> x;
After executing this snippet, ``x'' will have the value 23542. ``istrstream'' isn't much more exciting than that. ``ostrstream'', on the other hand, is more exciting. There are two alternative uses for ``ostrstream'': one where you have an array you want to store data in, and one where you want the ``ostrstream'' to create it for you, as needed (usually because you have no idea what size the buffer must have.) The former usage is like this:

char buffer[24];
ostrstream os(buffer, sizeof(buffer));
double x=23.34;
os << "x=" << x << ends;
The variable ``buffer'' will contain the string ``x=23.34'' after this snippet. The stream manipulator ``ends'' zero-terminates the buffer. Zero termination is not done by default, since the stream cannot know when you're done writing, and besides, you might not always want it. The other variant, where you don't know how large a buffer you will need, is generally more useful (I think.)

ostrstream os;
double x=23.34, y=34.45;
os << x << '*' << y << '=' << x*y << ends;
const char* p = os.str();
const size_t length=os.pcount();

// work with p and length.
os.freeze(0); // release the memory.
I think the example pretty much shows what this kind of usage does. The member function ``str'' returns a pointer to the internal buffer, which is then frozen. That is, the stream guarantees that it will neither deallocate nor overwrite the buffer, and attempts to write to the stream while it's frozen will fail. ``pcount'' returns the number of characters stored in the buffer. Finally, ``freeze'' can either freeze the buffer or ``unfreeze'' it; the latter is done by passing it the value 0. I find this interface unfortunate. It's so easy to forget to release the buffer (by simply forgetting to call ``os.freeze(0)''), and that leads to a memory leak. ``strstream'', lastly, is, like ``fstream'', the combined read/write stream.
The string streams can be found in the header <strstream.h> (or, for some compilers, <strstrea.h>.)

Standards update

With the C++ standard, a lot of things have changed regarding streams. As I mentioned last month, the headers are actually <iostream> and <fstream>, and the names are std::istream, std::ostream, etc. The streams are templates too, which makes life both easier and harder. The underlying type for std::ostream is:

std::basic_ostream<class charT,
                   class traits=std::char_traits<charT> >
``charT'' is the basic character type for the stream. For ``ostream'' this is ``char'' (``ostream'' is actually a typedef.) There's another typedef, ``std::wostream'', where the underlying type is ``wchar_t'', a wide character type that is typically 16 bits (e.g. on Windows) or 32 bits (e.g. on most Unix systems). The class template ``char_traits'' is a traits class which holds the type used for EOF, the value of EOF, and some other housekeeping things. Why the standard has removed the file stream open modes ios::nocreate and ios::noreplace is beyond me, as they're extremely useful.
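Since the streams are templates, a single function template can write to both ``std::ostream'' and ``std::wostream''. Here is a minimal sketch, with ``writeProduct'' as a hypothetical name. It uses ``widen'' to convert the ``char'' literals into whatever character type the stream uses.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Works for any basic_ostream: narrow (char), wide (wchar_t), etc.
template <class charT, class traits>
std::basic_ostream<charT, traits>&
writeProduct(std::basic_ostream<charT, traits>& os, int x, int y)
{
  return os << x << os.widen('*') << y << os.widen('=') << x * y;
}
```

Template argument deduction also accepts the derived string streams, so the same call works for ``std::ostringstream'' and ``std::wostringstream''.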
Casting is ugly, and it's hard to see in large code blocks. The standard introduces four new, highly visible cast operators. They are (in approximate order of increasing danger) dynamic_cast, static_cast, const_cast and reinterpret_cast. In the binary streaming seen in this article, reinterpret_cast would be used, as a way of saying, ``Yeah, I know I'm violating type safety, but hey, I know what I'm doing, OK?'' The good thing about it is that it's so visible that anyone doubting it can easily spot the dangerous lines and have a careful look. The syntax is: os.write(reinterpret_cast<const char*>(&variable), sizeof(variable));
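Here is a minimal sketch of such binary streaming with the standard streams. ``writeRaw'' and ``readRaw'' are hypothetical helpers, and C++11 is assumed.

The ``static_assert'' rejects types that aren't trivially copyable. It cannot detect a plain pointer or a pointer member, though, so the warning about storing pointers in raw binary form still applies.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <type_traits>

template <class T>
bool writeRaw(std::ostream& os, const T& value)
{
  static_assert(std::is_trivially_copyable<T>::value,
                "raw binary I/O needs a trivially copyable type");
  // The visible cast says: yes, type safety is being bypassed on purpose.
  os.write(reinterpret_cast<const char*>(&value), sizeof(value));
  return static_cast<bool>(os);
}

template <class T>
bool readRaw(std::istream& is, T& value)
{
  static_assert(std::is_trivially_copyable<T>::value,
                "raw binary I/O needs a trivially copyable type");
  is.read(reinterpret_cast<char*>(&value), sizeof(value));
  return static_cast<bool>(is);
}
```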
Finally, the generally useful strstreams have been superseded by ``std::istringstream'', ``std::ostringstream'' and ``std::stringstream'' (plus wide variants, std::wistringstream, etc.), defined in the header <sstream>. The strstream classes were kept in the standard only as a deprecated feature. The new streams do not operate on ``char*'', but on strings (there is a string class, or again, rather a string class template, where the most important template parameter is the underlying character type.) ``std::ostringstream'' does not suffer from the freeze problem that ``ostrstream'' does.
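For comparison, the ``istrstream'' and growing ``ostrstream'' examples from earlier could look like this with the standard string streams. This is a sketch, and the function names are hypothetical. There's no buffer to size, no ``ends'' and no ``freeze(0)'', because ``str()'' returns a ``std::string'' that owns its own memory.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Parse an integer from text (replaces the istrstream example).
int parseInt(const std::string& s)
{
  std::istringstream is(s);
  int x = 0;
  is >> x;
  return x;
}

// Format into a buffer that grows as needed (replaces the dynamic
// ostrstream example).
std::string formatProduct(double x, double y)
{
  std::ostringstream os;
  os << x << '*' << y << '=' << x * y;
  return os.str(); // a copy: nothing to freeze or release
}
```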

Recap

This month's news:
  • streams dealing with files, or in-memory formatting, are used just the same way as the familiar ``cout'' and ``cin,'' which saves both learning and coding (the already written ``operator<<'' and ``operator>>'' can be used for all kinds of streams.)
  • streams can be used for binary, unformatted I/O too. This normally doesn't make sense for ``cout'' and ``cin'' or in-memory formatting (as the name implies,) but it's often useful when dealing with files.
  • It is possible to move around in streams, at least file streams and in-memory formatting streams. It's generally not possible to move around in ``cin'' and ``cout.''
  • proxy classes can be used to differentiate read and write operations for ``operator[]'' (the construction can of course be used elsewhere too, but it's most useful in this case.)
  • friends break encapsulation in a way that, when done right, strengthens encapsulation.
  • there's a difference between logical const and bitwise const, but the C++ compiler doesn't know and always assumes bitwise const.
  • truly simple smart pointers can save some memory management housekeeping, and can also be used as a workaround for compilers lacking ``mutable'' (i.e. the way of declaring a member variable as modifiable in const member functions, in other words, how to differentiate between logical and bitwise const.)
  • streams can be used also for in-memory formatting of data.

Exercises

  • Improve the file array such that it accepts a ``stream&'' instead of a file name, and allows for several arrays in the same file.
  • Improve the proxy such that ``int& x=arr[2]'' and ``int* p=&arr[1]'' become illegal.
  • Add a constructor to the array that accepts only a ``size_t'' describing the size of the array, which creates a temporary file and removes it in its destructor.
  • What happens if we instantiate ``FileArray'' with a user defined type? Is it always desirable? If not, what is desirable? If you cannot define what's desirable, how can instantiation with user defined types be banned?
  • How can you, using the stream interface, calculate the size of a file?