C AVL Tree “Generic Package”

 

Version 1.5

Author:  Walt Karas

 

1  Introduction

 

This document explains how to use a C AVL Tree “generic package”.  The C macro processor is used to make the package code independent of the type of data to be stored in the tree, and the memory management scheme for tree nodes.  Adelson-Velskii and Landis Balanced Binary Search Trees (or AVL Trees) are described in many good textbooks on fundamental data structures.  The best web page I’ve been able to find on the topic is “A Visual Basic AVL Tree Container Class”.  I have also written a C++ version of this data structure.

 

This document, as well as the source code it describes, is in the public domain.

 

To avoid possible confusion about the terms that I use in this document (and in the source comments), here is a summary description of AVL Trees.  An AVL Tree is a set of nodes (or elements).  Each node is associated with a unique key value.  The key values can be ordered from least to greatest.  Each node in the tree may (or may not) have a less child node, and it may (or may not) have a greater child node.  If node A is a child of node B, then B is the parent of A.  If A is the less child of B, A’s key must be less than B’s key.  Similarly, if A is the greater child of B, A’s key must be greater than B’s key.  (Because of the way that binary search trees are typically diagrammed, the less child is commonly called the left child, and the greater child is commonly called the right child.)  All nodes in a tree have exactly one parent, except for the root node, which has no parent.  Node A is a descendant of node C if C is A’s parent, or if A’s parent is a descendant of C.  If a node is not the root of the entire tree, it is the root of a subtree consisting of the node and all its descendants.  The less subtree of a node is the subtree whose root is the less child of the node.  The greater subtree of a node is the subtree whose root is the greater child of the node.  The depth of a node is one more than the depth of its parent.  The depth of the root node is 1.  The depth of a tree is the maximum node depth.  The balance factor of a node is the depth of its greater subtree minus the depth of its less subtree, with non-existent subtrees being considered to have a depth of 0.  In an AVL tree, the balance factor of any node can only be -1, 0 or 1.

 

There are several open-source C and C++ implementations of AVL Trees available (see “Hot Links”, then “Data Structures” at C/C++ User’s Group Page ).  But as far as I know, this is the only one that manipulates the nodes of the tree using abstract “handles” instead of concrete pointers.  If all the nodes are in a single array, you can use node indexes as handles instead of node pointers.  This approach makes it possible to compress the size of the nodes if memory is tight.  Using indexes as handles (instead of pointers) can make tree persistence as simple as writing the node array out with a single disk write, and reading it back in with a single disk read.  The package also allows for a tree to be in secondary storage, with nodes being “paged” in and out of memory.

 

The package code makes no use of recursion.  The implementation is stack-friendly in general, except perhaps for the iter structure.  Variables of type iter contain an array of handles whose dimension is the maximum tree depth minus one.

 

Since key comparisons can potentially be complex, the code avoids repeated comparisons of the same pair of node key values.

 

2  Source Files

 

The interface to an instantiation of the package is generated by the header file cavl_if.h .

 

The implementation of an instantiation of the package is generated by the header file cavl_impl.h .

 

The package header files reserve the right to change the definition of any macro name beginning with the prefix L__ (upper-case L followed by two underscores).

 

I have written two examples to show how to instantiate and use this package.  Example 1 shows how to create an instantiation used in multiple compilation units.  The interface header to the instantiation is in cavl_ex1.h .  The implementation of the instantiation is in cavl_ex1i.c .  Usage of the instantiation is shown in cavl_ex1u.c .  Example 2 shows how to create an instantiation used in a single compilation unit.  cavl_ex2.c contains the code for example 2.

 

A test suite for the package is in the (C++) file test_cavl.cpp .

 

All of this code compiles with a contemporary version of GNU GCC, and with Visual C++ .NET.

 

All input macros are required, unless explicitly specified as being optional.

 

3  Inputs to Both cavl_if.h and cavl_impl.h

 

These macros must be defined prior to the inclusion of cavl_if.h or cavl_impl.h in a compilation unit.

 

3.1  AVL_UNIQUE

 

A macro with one parameter.  The parameter must be a valid C identifier.  The macro should use token concatenation to alter the identifier passed as a parameter, and make it unique for the package instantiation.  The alteration defined by this macro is applied to the identifiers for all output definitions/declarations generated by cavl_if.h, except those that begin with the prefix AVL_ or avl_ .  If, for example, the macro is defined as:

 

#define AVL_UNIQUE(ID) xyz_ ## ID

 

then the identifier for the insert function will be altered to xyz_insert .

 

This macro is optional.  If it is undefined, there is no alteration of any identifiers.

 

3.2  AVL_PRIVATE

 

If this optional macro is defined, all of the functions declared by cavl_if.h will be static functions.  (If it’s defined, it doesn’t matter what it’s defined to be.)

 

3.3  AVL_HANDLE

 

A parameterless macro that expands to the type specifier for the handle type.  By “type specifier” I mean that a statement of the form:

 

AVL_HANDLE identifier;

 

must expand to a valid definition of a handle named “identifier”.  Each node has to be associated with a node handle, which is a unique value of the handle type.  Pointers and integral types can be handles.  Handles must be passable as value parameters, assignable, and comparable for equality.

 

3.4  AVL_KEY

 

A parameterless macro that expands to the type specifier for the key type.  Each node has to be associated with a key, which is a unique value of the key type.  The difference between a key and a handle is that a node can be conveniently “looked up” using its handle, but it can’t be conveniently looked up using its key.  In fact, the whole point of this package is to make it convenient to look up a node given its key.  Key values must be passable by value as parameters.

 

3.5  AVL_MAX_DEPTH

 

A parameterless macro that expands to an integral expression, which gives the maximum tree depth.  You almost certainly want to choose the maximum depth based on the maximum number of nodes that could possibly be in a tree at any given time.  To do this, let the maximum depth be M such that:

 

MN(M)

<=

maximum number of nodes

<

MN(M + 1)

 

where MN(d) means the minimum number of nodes in an AVL Tree of depth d.  Here is a table of MN(d) values for d from 2 to 45.

 

d

MN(d)

2

2

3

4

4

7

5

12

6

20

7

33

8

54

9

88

10

143

11

232

12

376

13

609

14

986

15

1,596

16

2,583

17

4,180

18

6,764

19

10,945

20

17,710

21

28,656

22

46,367

23

75,024

24

121,392

25

196,417

26

317,810

27

514,228

28

832,039

29

1,346,268

30

2,178,308

31

3,524,577

32

5,702,886

33

9,227,464

34

14,930,351

35

24,157,816

36

39,088,168

37

63,245,985

38

102,334,154

39

165,580,140

40

267,914,295

41

433,494,436

42

701,408,732

43

1,134,903,169

44

1,836,311,902

45

2,971,215,072

 

(In general, MN(d) = MN(d – 1) + MN(d – 2) + 1 .)

 

If, in a particular instantiation, the maximum number of nodes in a tree is 1,000,000, the maximum depth should be 28. You pick 28 because MN(28) is 832,039, which is less than or equal to 1,000,000, and MN(29) is 1,346,268, which is strictly greater than 1,000,000.

 

If you insert a node that would cause the tree to grow to a depth greater than the maximum you gave, the results are undefined.

 

Each increase of 1 in the value of  MAX_DEPTH increases the size of a variable of type iter (defined below) by sizeof(handle).  The only other use of MAX_DEPTH is as the size of bit arrays used at various places in the code.  The number of bytes in a bit array is the size rounded up to a multiple of the number of bits in a long, and divided by the number of bits in a byte.  All this is a roundabout way of saying that, if you don’t use iter structures, you can guiltlessly add a big safety margin to the value of MAX_DEPTH.

 

3.6  AVL_INSIDE_STRUCT

 

This parameterless macro is optional.  If defined, it should expand to one or more field definitions to be included in the avl structure generated by cavl_if.h .

 

The reason for this macro is to allow you place fields in the structure that are needed by your SET / GET macros.  If you are using array indexes as handles, this macro could be used to place the array of nodes or a pointer to the array of nodes into the avl structure.  Defining this macro causes an extra parameter to be passed to an internal function, so using it when you don’t need to results in slightly-less-than-optimal code.

 

3.7  AVL_BUILD_ITER_TYPE

 

This optional parameterless macro expands to the type of the iterators passed to the build function.  You only have to define this macro if you use the build function.

 

3.8  AVL_SIZE

 

An optional parameterless macro that expands to the type specifier for the size type, which must be char, short, int or long, signed or unsigned.  A variable of this type must be large enough the hold the maximum possible number of nodes in the tree.  This macro is optional.  If it is not defined, unsigned long is used by default.  This type is only used by the build function, so don’t bother to define it if you don’t use the build function.

 

4  Inputs to cavl_impl.h Only

 

These macros must be defined prior to the inclusion of cavl_impl.h in a compilation unit.  The definitions of these macros may use the symbol L__tree.  L__tree is a pointer to the avl structure for the tree, or a (parameterless) macro that expands to an expression yielding a pointer to the avl structure for the tree.

 

4.1  AVL_GET_LESS, AVL_GET_GREATER

 

These macros take two parameters.  They must expand to expressions that give the handle of the less/greater child of the node whose handle is given as the first parameter.  If the second parameter is an expression whose result is non-zero, and the child node is in secondary storage, it has to be read into memory.  Otherwise, the child node does not have to be read into memory.  Ignore the second parameter in your implementation of these macros if your instantiation makes no use of secondary storage.

 

4.2  AVL_SET_LESS, AVL_SET_GREATER

 

These macros take two parameters, both of which are node handles.  The second node is set as the less/greater child of the first node.  They must expand to complete statements.  (That is to say, an invocation of the macro shouldn’t require a semicolon after it.)

 

4.3  AVL_GET_BALANCE_FACTOR

 

This macro takes one parameter, a node handle.  It must expand to an expression returning the balance factor of the node.

 

4.4  AVL_SET_BALANCE_FACTOR

 

This macro takes two parameters.  The first parameter is a node handle, and the second parameter is a balance factor (-1, 0 or 1).  The balance factor given is set as the balance factor of the node.  This macro must expand to a complete statement.

 

4.5  AVL_COMPARE_KEY_NODE

 

This macro takes two parameters.  The first parameter is a key value, and the second parameter is a node handle.  The macro compares the key with the key of a node.  It must expand to an expression that returns a negative (integral) value if the key is less than the node’s key.  The macro must return zero if the key is the same as the node’s key.  A positive value must be returned if the key is greater than the node’s key.

 

4.6  AVL_COMPARE_NODE_NODE

 

This macro takes two parameters.  Both parameters must be node handles.  The macro compares the key of the first node to the key of the second node.  It must expand to an expression that returns a negative (integral) value if the first node’s key is less than the second node’s key, zero if the two nodes have the same key value, or a positive value if the first node’s key is greater than the second node’s key.

 

4.7  AVL_NULL

 

This parameterless macro must expand to an expression yielding the null handle value.  The null value is a particular invalid handle value.

 

4.8  AVL_READ_ERRORS_HAPPEN

 

If this optional macro is defined, it indicates that nodes are stored in secondary storage, and that errors can occur when reading nodes from secondary into primary storage.

 

4.9  AVL_READ_ERROR

 

This parameterless macro expands to an expression returning an integral value.  If the integral value is non-zero, this indicates an error has occurred while reading a node from secondary into primary storage.  It is not necessary to define this macro if AVL_READ_ERRORS_HAPPEN is undefined.

 

4.10  AVL_BUILD_ITER_VAL

 

This macro takes one parameter, the name (lvalue) of a variable of type AVL_BUILD_ITER_TYPE.  The macro must expand to an expression that returns the handle of the node that the iterator (passed as the parameter) refers to.  You should not define this macro if AVL_BUILD_ITER_TYPE is not defined.

 

4.11  AVL_BUILD_ITER_INCR

 

This macro takes one parameter, the name (lvalue) of a variable of type AVL_BUILD_ITER_TYPE.  The macro must expand to a complete statement that increments the iterator (passed as the parameter) so that it refers to the next node in the sequence.  You should not define this macro if AVL_BUILD_ITER_TYPE is not defined.

 

4.12  AVL_IMPL_MASK

 

This optional parameterless macro should expand to a mask that controls which function bodies are generated by the header file cavl_impl.h .  It should be created using the function bit masks, which have the AVL_IMPL_ prefix, defined by cavl_if.h.  If this macro is undefined, it defaults to AVL_IMPL_ALL (all function bodies generated).

 

5  Outputs from cavl_if.h

 

These data types and function prototypes are generated by the header file cavl_if.h .  cavl_impl.h generates the bodies of the functions whose prototypes are generated by cavl_if.h .  cavl_impl.h should be included (at least) once for each distinct package instantiation, in (at least) one compilation unit to be included in the executable.

 

If the macro AVL_UNIQUE is defined, the name of each function and data type (whose name does not begin with avl_ ) will be altered by AVL_UNIQUE .

 

5.1  Structure avl

 

This structure is defined as:

 

typedef struct

  {

    AVL_INSIDE_STRUCT

 

    AVL_HANDLE root;

  }

avl;

 

(ignoring any alteration of the identifier avl by AVL_UNIQUE).  An individual variable of this type represents an individual AVL tree.  root is the handle of the root node of the tree.  If the tree is empty, root contains the null value.  (The identifier root is not altered by AVL_UNIQUE.)

 

5.2  Function init

 

void init(avl *tree);

 

Puts a tree into an initial, empty state.  It can be used to initialize a tree, or to remove all nodes from a previously initialize tree.  This function simply assigns the null value to the root field in the avl structure.  You can initialize the root field to null using a static initializer if you prefer.

 

5.2  Function insert

 

AVL_HANDLE insert(avl *tree, AVL_HANDLE h);

 

Insert the node with the given handle into the tree.  The node must be associated with a key value.  The initial values of the node’s less/greater child handles and its balance factor are don’t-cares.  If successful, this function returns the handle of the inserted node.  If the node to insert has the same key value as a node that’s already in the tree, the insertion is not performed, and the handle of the node already in the tree is returned.  Returns the null value if there is an error reading secondary storage.  Calling this function invalidates all currently-existing iter structures (that are iterating over this tree).

 

5.3  Enum avl_search_type

 

Defined as:

 

typedef enum

  {

    AVL_EQUAL = 1,

    AVL_LESS = 2,

    AVL_GREATER = 4,

    AVL_LESS_EQUAL = AVL_EQUAL | AVL_LESS,

    AVL_GREATER_EQUAL = AVL_EQUAL | AVL_GREATER

  }

avl_search_type;

 

5.4  Function search

 

AVL_HANDLE search(avl *tree, AVL_KEY k, avl_search_type st);

 

Searches for a particular node in the tree, returning its handle if the node is found, and the null value if the node is not found.  The node to search for depends on the value of the st parameter.

 

Value of st

Node to search for

EQUAL

Node whose key is equal to the key k.

LESS

Node whose key is the maximum of the keys of all the nodes with keys less than the key k.

GREATER

Node whose key is the minimum of the keys of all the nodes with keys greater than the key k.

LESS_EQUAL

Node whose key is the maximum of the keys of all the nodes with keys less than or equal to the key k.

GREATER_EQUAL

Node whose key is the minimum of the keys of all the nodes with keys greater than or equal to the key k.

 

5.5  Function search_least

 

AVL_HANDLE search_least(avl *tree);

 

Returns the handle of the node whose key is the minimum of the keys of all the nodes in the tree.  Returns the null value if the tree is empty or an error occurs reading from secondary storage.

 

5.6  Function search_greatest

 

AVL_HANDLE search_greatest(avl *tree);

 

Returns the handle of the node whose key is the maximum of the keys of all the nodes in the tree.  Returns the null value if the tree is empty or an error occurs reading from secondary storage.

 

5.7  Function remove

 

AVL_HANDLE remove(avl *tree, AVL_KEY k);

 

Removes the node with the given k from the tree.  Returns the handle of the node removed.  Returns the null value if there is no node in the tree with the given key, or an error occurs reading from secondary storage.  Calling this function invalidates all currently-existing iter structures (that are iterating over this tree).

 

5.8  Function is_empty

 

int is_empty(avl *tree);

 

Returns non-zero if the tree is empty.

 

5.9  Function build

 

int build(avl *tree, AVL_BUILD_ITER_TYPE p, size num_nodes);

 

Builds a tree from a sequence of nodes that are sorted ascendingly by their key values.  The number of nodes in the sequence is given by num_nodes.  p is a forward iterator that initially refers to the first node in the sequence.  Any nodes in the tree (prior to calling this function) are purged.  The iterator will be incremented one last time when it refers to the last node in the sequence.  build returns zero if a read error occurs while trying to build the tree, non-zero otherwise.  The time complexity of this function is O(n  x  log n), but it is more efficient than inserting the nodes in the sequence one at a time, and the resulting tree will generally have better balance.

 

5.10  Structure iter

 

This structure (typedef) contains the state of a bi-directional iteration through an AVL tree.  Once the iteration is started the tree_ field in the structure contains a pointer to the avl structure of the tree being iterated over.  If you use the value of any other field in the iter structure, or you change tree_ or any other field in the structure, you’re on your own.  There can be an unlimited number of iterations over a tree in progress simultaneously.

 

A useful characteristic of the AVL tree data structure is the ability to do a final, destroying iteration over the nodes.  Iteration must be in a single direction only.  Once the iterator passes a node, the contents of the node can be discarded or overwritten.  See the clear_env function Example 1 to see how a destructive iteration is done.

 

5.11  Function start_iter

 

void start_iter(avl *tree, iter *it, AVL_KEY k, avl_search_type st);

 

Causes the iterator to refer to a particular node in the tree that is specified as the first parameter.  If the particular node cannot be found in the tree, or if a read error occurs, the iterator is put into the null state.  The particular node to refer to is determined by the st parameter.

 

Value of st

Node to refer to

EQUAL

Node whose key is equal to the key k.

LESS

Node whose key is the maximum of the keys of all the nodes with keys less than the key k.

GREATER

Node whose key is the minimum of the keys of all the nodes with keys greater than the key k.

LESS_EQUAL

Node whose key is the maximum of the keys of all the nodes with keys less than or equal to the key k.

GREATER_EQUAL

Node whose key is the minimum of the keys of all the nodes with keys greater than or equal to the key k.

 

5.12  Function start_iter_least

 

void start_iter_least(avl *tree, iter *it);

 

Cause the iterator to refer to the node with the minimum key in the given tree.  Puts the iterator into the null state if the tree is empty or a read error occurs.

 

5.13  Function start_iter_greatest

 

void start_iter_greatest(avl *tree, iter *it);

 

Cause the iterator to refer to the node with the maximum key in the given tree.  Puts the iterator into the null state if the tree is empty or a read error occurs.

 

5.14  Function get_iter

 

AVL_HANDLE get_iter(iter *it);

 

Returns the handle of the node that the iterator refers to.  Returns the null value if the iterator is in the null state.  The return value is undefined if the iter structure was not passed to a start function.

 

5.15  Function incr_iter

 

void incr_iter(iter *it);

 

Causes the iterator to refer to the node whose key is the next highest after the key of the node the iterator currently refers to.  Puts the iterator into the null state if the key of the node currently referred to is the maximum of the keys of all the nodes in the tree, or if a read error occurs.  Has no effect if the iterator is already in the null state.  Effect undefined if iter structure was not passed to a start function.

 

5.16  Function decr_iter

 

void decr_iter(iter *it);

 

Causes the iterator to refer to the node whose key is the next lowest after the key of the node the iterator currently refers to.  Puts the iterator into the null state if the key of the node currently referred to is the minimum of the keys of all the nodes in the tree, or if a read error occurs.  Has no effect if the iterator is already in the null state.  Effect undefined if iter structure was not passed to a start function.

 

5.17 Function init_iter

 

void init_iter(iter *it);

 

Initializes the given iterator to the null state.  An iterator does not have to be initialized prior to passing it to the “start_iter” functions.

 

5.18 Function subst

 

AVL_HANDLE subst(avl *tree, AVL_HANDLE new_node);

 

If the node whose handle is passed as new_node has the same key as a node that is already in the tree, then the node already in the tree is removed from the tree and replaced by the new node.  If there is no node in the tree with the same key as the new node, no substitution is made, and the null value is returned.  If a substitution is made, the handle of the node that was removed from the tree is returned.  The null value is returned if a read error occurs. Calling this function invalidates all currently-existing iter structures (that are iterating over this tree).

 

5.19  Implementation Bit Masks

 

These parameterless macros each expand to a distinct positive integer value.  Each value masks a single bit.  There is one macro for each of the function prototypes generated by the interface header file.

 

AVL_IMPL_INIT

AVL_IMPL_IS_EMPTY

AVL_IMPL_INSERT

AVL_IMPL_SEARCH

AVL_IMPL_SEARCH_LEAST

AVL_IMPL_SEARCH_GREATEST

AVL_IMPL_REMOVE

AVL_IMPL_BUILD

AVL_IMPL_START_ITER

AVL_IMPL_START_ITER_LEAST

AVL_IMPL_START_ITER_GREATEST

AVL_IMPL_GET_ITER

AVL_IMPL_INCR_ITER

AVL_IMPL_DECR_ITER

AVL_IMPL_INIT_ITER

AVL_IMPL_SUBST

 

This parameterless macro masks all the bits.

 

AVL_IMPL_ALL

 

6  Version History

 

Version 1.0

 

 

Version 1.1

 

 

Version 1.2

 

 

Version 1.3

 

 

Version 1.4

 

 

Version 1.5