Parse Tree

Virtual member functions, virtual destructors, pure virtual functions, protected data members.

I will demonstrate the use of polymorphism with an example of a data structure: the arithmetic tree. An arithmetic expression can be converted into a tree structure whose internal nodes are arithmetic operators and whose leaf nodes are numbers. Figure 2-3 shows the tree that corresponds to the expression 2 * (3 + 4) + 5. Analyzing it from the root towards the leaves, we first encounter the plus node, whose children are the two terms that are to be added. The left child is a product of two factors. The left factor is the number 2 and the right factor is the sum of 3 and 4. The right child of the top-level plus node is the number 5. Notice that the tree representation doesn’t require any parentheses or knowledge of operator precedence. It uniquely describes the calculation to be performed.

Figure 2-3 The arithmetic tree corresponding to the expression 2 * (3 + 4) + 5.

We will represent the nodes of the arithmetic tree as objects inheriting from a single class, Node. The direct descendants of Node are NumNode, representing a number, and BinNode, representing a binary operator. For simplicity, we will restrict ourselves to only two classes derived from BinNode: AddNode and MultNode. Figure 2-4 shows the class hierarchy I have just described. Abstract classes are classes that cannot be instantiated; they serve only as parents for other classes. I’ll explain this term in a moment.

Figure 2-4 The class hierarchy of nodes.

What are the operations we would like to perform on a node? We would like to be able to calculate its value and, at some point, destroy it. The Calc method returns a double as the result of the calculation of the node’s value. Of course, for some nodes the calculation may involve the recursive calculation of their children. The method is const since it doesn’t change the node itself. Since each type of node has to provide its own implementation of the Calc method, we make this function virtual. However, there is no "default" implementation of Calc for an arbitrary Node. A virtual function that has no implementation (inherited or otherwise) is called pure virtual. That’s the meaning of = 0 in the declaration of Calc.

A class that has one or more pure virtual functions is called an abstract class and it cannot be instantiated (no object of this class can be created). Only classes that are derived from it, and which provide their own implementations of all the pure virtual functions, can be instantiated. Notice that our sample arithmetic tree has instances of AddNodes, MultNodes and NumNodes, but no instances of Nodes or BinNodes.
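The rule can be checked at compile time. The following is a minimal sketch, not part of the original program: it repeats the Node interface used in this section, adds a hypothetical concrete class ConstNode and a hypothetical helper CalcThroughBase for illustration, and relies on std::is_abstract and static_assert, which assumes a C++11 or later compiler.

```cpp
#include <type_traits>

// Same interface as the Node class used in this section.
class Node
{
public:
    virtual ~Node () {}
    virtual double Calc () const = 0; // pure virtual: no default implementation
};

// Hypothetical concrete class, introduced only for this illustration.
class ConstNode: public Node
{
public:
    explicit ConstNode (double value) : _value (value) {}
    double Calc () const { return _value; } // implements the pure virtual function
private:
    const double _value;
};

// Node declares a pure virtual function, so it is abstract.
static_assert (std::is_abstract<Node>::value, "Node should be abstract");
// ConstNode implements Calc, so it can be instantiated.
static_assert (!std::is_abstract<ConstNode>::value, "ConstNode should be concrete");

// Node n; // would not compile: an abstract class cannot be instantiated

// Creates a concrete node and calculates it through a reference to the abstract base.
double CalcThroughBase (double value)
{
    ConstNode node (value);
    const Node & base = node;
    return base.Calc (); // dispatches to ConstNode::Calc
}
```

References and pointers to an abstract class are still allowed; only objects of that exact class cannot be created.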

A rule of thumb is that if a class has a virtual function, it probably needs a virtual destructor as well. Once we decide to pay the overhead of a vtable pointer, additional virtual functions don’t increase the size of the object, so adding a virtual destructor doesn't add any significant overhead.
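The size argument can be observed with sizeof. This is a rough sketch under the assumption of a typical compiler that stores one vtable pointer per object; object layout is implementation-defined, so the exact numbers may differ. The classes NoVirtual, OneVirtual and TwoVirtuals and both helper functions are hypothetical names used only here.

```cpp
#include <cstddef>

// No virtual functions: the object holds only the double.
class NoVirtual
{
public:
    explicit NoVirtual (double num) : _num (num) {}
    double Calc () const { return _num; }
private:
    double _num;
};

// One virtual function: the object typically gains a vtable pointer.
class OneVirtual
{
public:
    explicit OneVirtual (double num) : _num (num) {}
    virtual double Calc () const { return _num; }
private:
    double _num;
};

// A second virtual function (the destructor) reuses the same vtable pointer.
class TwoVirtuals
{
public:
    explicit TwoVirtuals (double num) : _num (num) {}
    virtual ~TwoVirtuals () {}
    virtual double Calc () const { return _num; }
private:
    double _num;
};

// Bytes added to each object by introducing the first virtual function.
std::size_t FirstVirtualCost ()
{
    return sizeof (OneVirtual) - sizeof (NoVirtual);
}

// True when the second virtual function left the object size unchanged.
bool SecondVirtualIsFree ()
{
    return sizeof (TwoVirtuals) == sizeof (OneVirtual);
}
```

On mainstream compilers FirstVirtualCost is expected to be nonzero (room for the vtable pointer, possibly with padding), while SecondVirtualIsFree is expected to return true.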

In our case we can anticipate that some of the descendant nodes will have to destroy their children in their destructors, so we really need a virtual destructor. Unlike Calc, the destructor cannot be left without an implementation, because it is actually called by the destructors of the derived classes (even a destructor declared pure virtual would still have to be defined). That's why I gave it an empty body. (Even though I made it inline, the compiler will create a function body for it, because it needs to store a pointer to it in the virtual table.)


class Node
{
public:
    virtual ~Node () {}
    virtual double Calc () const = 0;
};
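To see the effect of the virtual destructor in isolation, here is a small sketch. It repeats the Node class above; the LeafNode class, the counter g_leafDestroyed and the helper DestroyThroughBase are hypothetical names introduced only for this illustration.

```cpp
// Counts how many times ~LeafNode has run (illustration only).
static int g_leafDestroyed = 0;

class Node
{
public:
    virtual ~Node () {}
    virtual double Calc () const = 0;
};

// Hypothetical leaf class whose destructor records that it was called.
class LeafNode: public Node
{
public:
    ~LeafNode () { ++g_leafDestroyed; }
    double Calc () const { return 1.0; }
};

// Deletes a LeafNode through a pointer to Node and reports how many
// LeafNode destructors ran as a result.
int DestroyThroughBase ()
{
    int before = g_leafDestroyed;
    Node * pNode = new LeafNode;
    delete pNode; // virtual destructor: ~LeafNode runs first, then ~Node
    return g_leafDestroyed - before;
}
```

Because ~Node is virtual, the call returns 1. If the destructor of Node were not virtual, deleting a LeafNode through a Node pointer would be undefined behavior, and in practice the derived destructor would usually be skipped.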

NumNode stores a double value that is initialized in its constructor. It also overrides the Calc virtual function. In this case, Calc simply returns the value stored in the node.

class NumNode: public Node
{
public:
    NumNode (double num) : _num (num) {}
    double Calc () const;
private:
    const double _num;
};

double NumNode::Calc () const
{
    cout << "Numeric node " << _num << endl;
    return _num;
}

BinNode has two children that are pointers to nodes. They are initialized in the constructor and deleted in the destructor, which is why I could make them const pointers (but not pointers to const, since I have to call a non-const method on them: the destructor). The Calc method is still pure virtual, inherited from Node; only the descendants of BinNode will know how to implement it.

class BinNode: public Node
{
public:
    BinNode (Node * pLeft, Node * pRight)
        : _pLeft (pLeft), _pRight (pRight) {}
    ~BinNode ();
protected:
    Node * const _pLeft;
    Node * const _pRight;
};

BinNode::~BinNode ()
{
    delete _pLeft;
    delete _pRight;
}

This is where you first see the advantage of polymorphism. A binary node can have children which are arbitrary nodes. Each of them can be a number node, an addition node, or a multiplication node. There are nine possible combinations of children, and it would be silly to make separate classes for each of them (consider, for instance, AddNodeWithLeftMultNodeAndRightNumberNode). We had no choice but to accept and store pointers to children as more general pointers to Nodes. Yet, when we call destructors through them, we need to call different functions to destroy different nodes. For instance, AddNode has a different destructor than a NumNode (which has an empty one), and so on. This is why we had to make the destructors of Nodes virtual.

Notice that the two data members of BinNode are not private; they are protected. This qualification is slightly weaker than private.
A private data member or method cannot be accessed from any code outside of the implementation of the given class (or its friends). Not even from the code of the derived class. Had we made _pLeft and _pRight private, we’d have to provide public methods to set and get them. That would be tantamount to exposing them to everybody. By making them protected we are letting classes derived from BinNode manipulate them, but, at the same time, bar anybody else from doing so.

Table 1

Access specifier   Who can access such member?
public             anybody
protected          the class itself, its friends and derived classes
private            only the class itself and its friends

The class AddNode is derived from BinNode.

class AddNode: public BinNode
{
public:
    AddNode (Node * pLeft, Node * pRight)
        : BinNode (pLeft, pRight) {}
    double Calc () const;
};

It provides its own implementation of Calc. This is where you see the advantages of polymorphism again. We let the child nodes calculate themselves. Since the Calc method is virtual, they will do the right thing based on their actual class, and not on the class of the pointer (Node *). The two results of calling Calc are added and the sum returned.

double AddNode::Calc () const
{
    cout << "Adding\n";
    return _pLeft->Calc () + _pRight->Calc ();
}

Notice how the method of AddNode directly accesses its parent’s data members _pLeft and _pRight. Were they declared private, such access would be flagged as an error by the compiler.

For completeness, here’s the implementation of the MultNode and a simple test program.


class MultNode: public BinNode
{
public:
    MultNode (Node * pLeft, Node * pRight)
        : BinNode (pLeft, pRight) {}
    double Calc () const;
};

double MultNode::Calc () const
{
    cout << "Multiplying\n";
    return _pLeft->Calc () * _pRight->Calc ();
}

int main ()
{
    // ( 20.0 + (-10.0) ) * 0.1
    Node * pNode1 = new NumNode (20.0);
    Node * pNode2 = new NumNode (-10.0);
    Node * pNode3 = new AddNode (pNode1, pNode2);
    Node * pNode4 = new NumNode (0.1);
    Node * pNode5 = new MultNode (pNode3, pNode4);
    cout << "Calculating the tree\n";
    // tell the root to calculate itself
    double x = pNode5->Calc ();
    cout << x << endl;
    delete pNode5; // and all children
}

Do you think you can write more efficient code by not using polymorphism? Think twice! If you're still not convinced, go on a little sidetrip into the alternative universe of C.
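
As a cross-check, the classes from this section can also represent the tree from Figure 2-3. The sketch below repeats them in compact form, without the trace output, so that it stands alone; CalcFigureTree is a hypothetical helper name, not part of the original program.

```cpp
class Node
{
public:
    virtual ~Node () {}
    virtual double Calc () const = 0;
};

class NumNode: public Node
{
public:
    NumNode (double num) : _num (num) {}
    double Calc () const { return _num; }
private:
    const double _num;
};

class BinNode: public Node
{
public:
    BinNode (Node * pLeft, Node * pRight)
        : _pLeft (pLeft), _pRight (pRight) {}
    ~BinNode () { delete _pLeft; delete _pRight; } // a binary node owns its children
protected:
    Node * const _pLeft;
    Node * const _pRight;
};

class AddNode: public BinNode
{
public:
    AddNode (Node * pLeft, Node * pRight) : BinNode (pLeft, pRight) {}
    double Calc () const { return _pLeft->Calc () + _pRight->Calc (); }
};

class MultNode: public BinNode
{
public:
    MultNode (Node * pLeft, Node * pRight) : BinNode (pLeft, pRight) {}
    double Calc () const { return _pLeft->Calc () * _pRight->Calc (); }
};

// Builds the tree for 2 * (3 + 4) + 5 bottom-up, calculates it,
// and destroys it through the root.
double CalcFigureTree ()
{
    Node * pSum = new AddNode (new NumNode (3.0), new NumNode (4.0));
    Node * pProduct = new MultNode (new NumNode (2.0), pSum);
    Node * pRoot = new AddNode (pProduct, new NumNode (5.0));
    double result = pRoot->Calc ();
    delete pRoot; // deletes the whole tree, children included
    return result;
}
```

The shape of the tree encodes the parentheses and the operator precedence, so CalcFigureTree returns 19 without any parsing rules.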