Big Online Book of Linux Ada Programming

18 Data Structures

Good programmers write good programs. Great programmers write good programs and good data structures. Organizing your data is as important as the program that crunches the data and produces a result.

Unfortunately, my experiences in the corporate world have taught me that that the only data structure used is the single dimensional array. When results are the only goal and more processing power is the cure for bad software design, arrays are easy to implement (they are built into Ada). Even the worst programmer knows how to use an array. And arrays are easy to understand. Try to use a linked list, and a programmer can get into trouble with his boss for using risky, "advanced" technology.

Alternatively, programmers will sometimes rely on the complexity and overhead of databases when a simplier solution using the correct data structure would be faster and easier to implement.

If you are lucky enough to work for a company that uses more than arrays, this chapter will discuss how to use other kinds of data structures in Ada.

18.1 Using the Booch Components

Like Ada, C++ has no advanced data structures built into the language. To provide a standard set of data structures, what is now called the Standard Template Library was developed to provide the tools necessary to organize most types of data.

Perhaps because of an oversight, Ada 95 with all its annexes has no equivalent to the C++ Standard Template Library. (Ada 2005 has a data strucutre library.) There are no standard packages providing common data structures. The Gnat compiler fills part of this void with packages for creating simple tables and hash tables.

The Booch components are a set of C++ objects created by Grady Booch. These were later ported to Ada 95. The components contain sets of general purpose data structures. The Booch components are available from AdaPower.Net or in RPM format from the Ada Linux Team. This is one popular choice for Ada's unofficial "Standard Template Library".

The components are organized into three main categories: tools, support and structs. The tools cover many items already implemented in the standard Ada or Gnat packages, such as searching, sorting and pattern recognition. Support refers to components that implement the tools and structs.

The structs (data structures) are the primary interest of Ada programmers. These are further subcategorized by the user's requirements: bounded (where the size is known at compile-time or there's no heap allocation), unbounded (using dynamic allocation and item caching), or the dynamic (a compromize between bounded and unbounded). The default if no others are available is unbounded.

Dynamic and unbounded types can specify a storage manager to use. The storage manager is a program that allocates memory. Use Global_Heap package if you have no preference.

Unbounded structures allocate memory whenever a new item is added to the structure.

Dynamic structures allocate memory in fixed-sized chunks. Each chunk is large enough for several items. The chunk size is set when the dynamic data structure is first created, but it can be altered at any time. When a chunk is full, the structure is grows by the size of another chunk. This reduces the number of memory allocations to improve performance.

Each dynamic structure includes these subprograms:

Create - Change the chunk size for the collection
Set_Chunk_Size - Change the chunk size for the collection
Preallocate - Increase the size of the collection immediately
Chunk_Size - Returns the current chunk size

The Booch components are organzied in a hierarchy of packages. The BC package is the top-most package. BC defines the basic execptions that can be raised by the various components:

  Container_Error : exception;
  Duplicate : exception;
  Illegal_Pattern : exception;
  Is_Null : exception;
  Lexical_Error : exception;
  Math_Error : exception;
  Not_Found : exception;
  Not_Null : exception;
  Not_Root : exception;
  Overflow : exception;
  Range_Error : exception;
  Storage_Error : exception;
  Synchronization_Error : exception;
  Underflow : exception;
  Should_Have_Been_Overridden : exception;
  Not_Yet_Implemented : exception;

The data structure components are:

Data Structure	Booch Packages	Description
Bags	`bc-containers-bags-bounded bc-containers-bags-dynamic bc-containers-bags-unbounded`	Unordered collection of items. Duplicates are counted but not actually stored.
Collections	`bc-containers-collections-bounded bc-containers-collections-dynamic bc-containers-collections-unbounded`	Ordered collection of items. Duplicates are allowed and stored.
Deques	`bc-containers-deques-bounded bc-containers-deques-dynamic bc-containers-deques-unbounded`	Double-ended queues
Single linked Lists	`bc-containers-lists-single`	A sequence of 0 or more items with a head and a pointer to each successive item.
Double linked Lists	`bc-containers-lists-double`	A sequence of 0 or more items with a head and a pointer to both successive and previous items.
Maps	`bc-containers-maps-bounded bc-containers-maps-dynamic bc-containers-maps-unbounded`	A set with relationships between pairs of items.
Queues	`bc-containers-queues-bounded bc-containers-queues-dynamic bc-containers-queues-unbounded`	First in, first out list.
Ordered (Priority) Queues	`bc-containers-queues-ordered-bounded bc-containers-queues-ordered-dynamic bc-containers-queues-ordered-unbounded`	A sorted list, items removed from the front.
Rings	`bc-containers-rings-bounded bc-containers-rings-dynamic bc-containers-rings-unbounded`	A deque with only one endpoint.
Sets	`bc-containers-sets-bounded bc-containers-sets-dynamic bc-containers-sets-unbounded`	Unordered collection of items. Duplicates are not allowed.
Stacks	`bc-containers-stacks-bounded bc-containers-stacks-dynamic bc-containers-stacks-unbounded`	Last in, first out list.
AVL Trees	`bc-containers-trees-avl`	Balanced binary trees
Binary Trees	`bc-containers-trees-binary-in_order bc-containers-trees-binary-post_order bc-containers-trees-binary-pre_order`	A list with two successors per item.
Multiway Trees	`bc-containers-trees-multiway-post_order bc-containers-trees-multiway-pre_order`	Tree with an arbitrary number of children.
Directed Graphs	`bc-graphs-directed`	Groups of items with one-way relationships
Undirected Graphs	`bc-graphs-undirected`	Groups of items with two-way relationships
Smart Pointers	`bc-smart`	Access types that automatically deallocate themselves

A definition of common data structures can be found at the National Institute of Standards and Technology.

The components are generic packages and must be instantiated for a particular type. They are arranged in hierarchies of generic packages. Each parent package must be instantiated before its child. For example, to use single linked lists (bc.containers.lists.single), bc.containers, bc.containers.lists, and bc.containers.lists.single must all be be created for the item type.

As with many component libraries, the Booch components represent all structures in memory, not in long-term storage. They cannot be used to create disk files, although the data could be saved to disk and reloaded later.

18.1.1 Containers

Containers form the cornerstone of the Booch components.

Containers are a controlled tagged record that encloses an item. The Booch components are composed of items stored in containers that are arranged in different ways.

To use any of the Booch components, a container must be instantiated to hold your item. For example, to create a new package to manage character in containers, use

  package charContainers is new BC.Containers (Item => Character);

18.1.2 Iterators

The Container package also manages the iterators used by the Booch components. An iterator is a variable that keeps track of the position in a data structure during a traversal.

Iterators are created by New_Iterator in a data structure's package, but the subprograms that work with the iterator are defined in the Container package.

Reset - start a new traversal at the first item
Next - continue to another item in the component
Is_Done - True if there are no more items
Current_Item - return the current item

The Is_Done function indicates when all items have been traversed. When Is_Done is true, Current_Item is undefined. In other words, the program must loop through all items in the list, plus 1, before Is_Done is true.

Because an Iterator is a class-wide type, it must be assigned a new value when it is declared to avoid a compiler error.

  i : charContainers.Iterator'class := charList.New_Iterator( customers );

18.1.3 Single linked Lists

Creating a single linked list requires the instantiation of 3 separate generic packages: BC.Containers, BC.Containers.Lists, and BC.Containers.Lists.Single. To avoid problems with access types, these should be declared globally (that is, in a package spec).

First, a container must be defined to hold the item you want to store in your linked list.

  package Containers is new BC.Containers (Item => Character);

Second, basic operations on lists must be instantiated.

  package Lists is new Containers.Lists;

Finally, the single linked list package must be instantiated. For an unbounded package, you chose a storage pool to use. Single linked lists are always unbounded. Use Global_Heap if you have no preference.

  package LS is new Lists.Single (Storage_Manager => Global_Heap.Pool,
                                  Storage => Global_Heap.Storage);

The single linked list package provides the following subprograms:

Clear - destroy the list
Insert - add an item to the list
Append - add an item to the end of the list
Remove - remove an item from the list
Purge - remove consecutive items
Preserve - inverse of Purge, keep consecutive items, removing the rest
Share/_Head/_Foot - make the list an alias for a sublist in another list
Tail - discard everything but the last item in the list
Length - return the number of items in the list
Is_Null - true if the list has no items
Is_Shared - true if there an alias to a sublist in the list
Head - return the top-most item
Foot - return the last item in the list
Item_at - return an item at a given position
New_Iterator - return an iterator for traversing the list
Process_Head/Tail - generic procedure to return an item with processing
Swap_Tail - NQS-KB

Notice that the term "Foot" refers to the last item in the list. The Ada string packages uses the term "Tail".

Here's an example:

with BC.Containers.Lists.Single;
with Global_Heap;

package customers is

  type aCustomer is record
       customerID     : integer;
       accountBalance : float;
  end record;
  -- this is the item to put in the list

  package customerContainers is new BC.Containers (Item => aCustomer);
  -- create a new controlled tagged record container for customers

  package customerLists is new customerContainers.Lists;
  -- create a new list support package for our using container type

  package customerList is new customerLists.Single (Storage_Manager => Global_Heap.Pool, Storage => Global_Heap.Storage);
  -- create a single linked list package using the lists support
  -- customized for our container type

end customers;


with ada.text_io, BC, customers;
use ada.text_io, BC, customers;

procedure list_demo is
  customers : customerList.Single_List;
  c         : aCustomer;
  i         : customerContainers.Iterator'class := customerList.New_Iterator( customers );
begin
  Put_Line( "This is a demo of the Booch components: single-linked lists" );
  New_Line;

  -- The Newly Declared List

  Put_Line( "The list is newly declared." );
  Put_Line( "The list is empty? " & customerList.Is_Null( customers )'img );
  Put_Line( "The list is shared? " & customerList.Is_Shared( customers )'img );
  Put_Line( "The list length is" & customerList.Length( customers )'img );
  New_Line;

  -- Inserting a customer

  c.customerID := 7456;
  c.accountBalance := 56.74;
  customerList.Insert( customers, c );

  Put_Line( "Added customer" & c.customerID'img );
  Put_Line( "The list is empty? " & customerList.Is_Null( customers )'img );
  Put_Line( "The list is shared? " & customerList.Is_Shared( customers )'img );
  Put_Line( "The list length is" & customerList.Length( customers )'img );

  c := customerList.Head( customers );
  Put_Line( "The head item is customer id" & c.customerID'img );
  c := customerList.Foot( customers );
  Put_Line( "The foot item is customer id" & c.customerID'img );
  New_Line;

  -- Apending a customer

  c.customerID := 9362;
  c.accountBalance := 88.92;
  customerList.Append( customers, c );

  Put_Line( "Appended customer" & c.customerID'img );
  Put_Line( "The list length is" & customerList.Length( customers )'img );
  c := customerList.Head( customers );
  Put_Line( "The head item is customer id" & c.customerID'img );
  c := customerList.Foot( customers );
  Put_Line( "The foot item is customer id" & c.customerID'img );
  New_Line;

  -- Iterator example

  Put_Line( "Resetting the iterator.." );
  customerContainers.Reset( i );
  c := customerContainers.Current_item ( i );
  Put_Line( "The current item is customer id" & c.customerID'img );
  Put_Line( "Are we done? " & customerContainers.Is_Done( i )'img );

  Put_Line( "Advancing to the next item..." );
  customerContainers.Next( i );
  c := customerContainers.Current_item ( i );
  Put_Line( "The current item is customer id" & c.customerID'img );
  Put_Line( "Are we done? " & customerContainers.Is_Done( i )'img );

  Put_Line( "Advancing to the next item..." );
  customerContainers.Next( i );
  Put_Line( "Are we done? " & customerContainers.Is_Done( i )'img );
  begin
    c := customerContainers.Current_item ( i );
  exception when BC.NOT_FOUND =>
    put_line( "BC.NOT_FOUND exception: no item at this position in the list" );
  end;
  
end list_demo;

This is a demo of the Booch components: single-linked lists

The list is newly declared.
The list is empty? TRUE
The list is shared? FALSE
The list length is 0

Added customer 7456
The list is empty? FALSE
The list is shared? FALSE
The list length is 1
The head item is customer id 7456
The foot item is customer id 7456

Appended customer 9362
The list length is 2
The head item is customer id 7456
The foot item is customer id 9362

Resetting the iterator..
The current item is customer id 7456
Are we done? FALSE
Advancing to the next item...
The current item is customer id 9362
Are we done? FALSE
Advancing to the next item...
Are we done? TRUE
BC.NOT_FOUND exception: no item at this position in the list

Single linked lists should not be Guarded.

18.1.4 Double linked Lists

Double linked lists are implemented exactly the same as single-linked lists except that the word "Double" is substituted for the word "Single".

Double linked lists are useful for lists that must be browsed backwards and forwards continuously.

Double linked lists should not be Guarded.

18.1.5 Bags

Bags, like linked lists, are collections of items. However, there is no attempt to order the items. Duplicate items can be stored, but the bag keeps a count of duplications to save memory instead of storing copies of the duplicates.

The bags package provides the following subprograms:

Are_Equal - True if two bags have the same contents
Clear - Removes all items from the bag
Add - Adds an item to the bag
Remove - Removes an item from the bag
Union - Add one bag to another
Intersection - Remove all items not common between two bags. Where there are duplicates, keep the lower duplicate count
Difference - Remove all items not common between two bags. Where there are duplicates, subtract from the original and discard the item if the total is >= 0
Extent - Return the number of distinct items
Total_Size - Return the total number of items (including duplicates)
Count - Return the number of occurences of an item in the bag
Is_Empty - True if the bag is empty
Is_Member - True if one or more copies of an item is in the bag
Is_Subset - True if the contents of one bag is completely contained in another
Is_Proper_Subset - Same as Is_Subset, but the bags must not be equal

Bags can be bounded, dynamic or unbounded.

Bags are implemented using a hash table. To declare a bag, a program must provide a hash function for storing items in the bag, and must indicate the size of the hash table.

Here's an example. Notice that some of the subprograms are in the Bags instantiation, and some in the Bags.Unbounded instantiation. Also notice the iterator moves over the items, but not the duplications:

with BC.Containers.Bags.Unbounded;
with Global_Heap;

package customers is

  type aCustomerID is new integer range 1_000..9_999;

  function IDHash( id : aCustomerID ) return Positive;
  -- our hash function

  package customerContainers is new BC.Containers (Item => aCustomerID);
  -- create a new controlled tagged record container for customers

  package customerBags is new customerContainers.Bags;
  -- create a new bag support for our using container type

  package customerBag is new customerBags.Unbounded(
          Hash => IDHash,
          Buckets => 99,
          Storage_Manager => Global_Heap.Pool,
          Storage => Global_Heap.Storage);
  -- create an unbounded bag package holding customer numbers

end customers;

package body customers is

  function IDHash( id : aCustomerID ) return Positive is
  -- our hash function
  begin
    return Positive( id ); -- in this case, using the id is good enough
  end IDHash;


end customers;

with ada.text_io, BC, customers;
use ada.text_io, BC, customers;

procedure bag_demo is
  customers : customerBag.Unbounded_Bag;
  c         : aCustomerID;
  i         : customerContainers.Iterator'class := customerBag.New_Iterator( customers );
begin
  Put_Line( "This is a demo of the Booch components: bags" );
  New_Line;

  -- The Newly Declared Bag

  Put_Line( "The bag is newly declared." );
  Put_Line( "The bag is empty? " & customerBag.Is_Empty( customers )'img );
  Put_Line( "The bag extent is" & customerBag.Extent( customers )'img );
  Put_Line( "The bag total size is" & customerBags.Total_Size( customers )'img );
  New_Line;

  -- Inserting a customer

  c := 7456;
  customerBags.Add( customers, c );

  Put_Line( "Added customer" & c'img );
  Put_Line( "The bag is empty? " & customerBag.Is_Empty( customers )'img );
  Put_Line( "The bag extent is" & customerBag.Extent( customers )'img );
  New_Line;

  -- Inserting another customer

  c := 9362;
  customerBags.Add( customers, c );

  Put_Line( "Added customer" & c'img );
  Put_Line( "The bag is empty? " & customerBag.Is_Empty( customers )'img );
  Put_Line( "The bag extent is" & customerBag.Extent( customers )'img );
  Put_Line( "The bag total size is" & customerBags.Total_Size( customers )'img );
  New_Line;

  -- Inserting duplicate customer

  c := 9362;
  customerBags.Add( customers, c );

  Put_Line( "Added customer" & c'img );
  Put_Line( "The bag is empty? " & customerBag.Is_Empty( customers )'img );
  Put_Line( "The bag extent is" & customerBag.Extent( customers )'img );
  Put_Line( "The bag total size is" & customerBags.Total_Size( customers )'img );
  New_Line;

  -- Iterator example

  Put_Line( "Resetting the iterator.." );
  customerContainers.Reset( i );
  c := customerContainers.Current_item ( i );
  Put_Line( "The current item is customer id" & c'img );
  Put_Line( "Are we done? " & customerContainers.Is_Done( i )'img );

  Put_Line( "Advancing to the next item..." );
  customerContainers.Next( i );
  c := customerContainers.Current_item ( i );
  Put_Line( "The current item is customer id" & c'img );
  Put_Line( "Are we done? " & customerContainers.Is_Done( i )'img );

  Put_Line( "Advancing to the next item..." );
  customerContainers.Next( i );
  Put_Line( "Are we done? " & customerContainers.Is_Done( i )'img );
  begin
    c := customerContainers.Current_item ( i );
  exception when BC.NOT_FOUND =>
    put_line( "BC.NOT_FOUND exception: no item at this position in the bag" );
  end;
  
end bag_demo;

This is a demo of the Booch components: bags

The bag is newly declared.
The bag is empty? TRUE
The bag extent is 0
The bag total size is 0

Added customer 7456
The bag is empty? FALSE
The bag extent is 1

Added customer 9362
The bag is empty? FALSE
The bag extent is 2
The bag total size is 2

Added customer 9362
The bag is empty? FALSE
The bag extent is 2
The bag total size is 3

Resetting the iterator..
The current item is customer id 7456
Are we done? FALSE
Advancing to the next item...
The current item is customer id 9362
Are we done? FALSE
Advancing to the next item...
Are we done? TRUE
BC.NOT_FOUND exception: no item at this position in the bag

Bags are useful for counting the occurrences of an item in a large amount of data.

18.1.6 Sets

Sets are essentially the same as bags but may not contain duplicates. The are useful for detecting the presence/absence of an item, or representing flags or conditions.

with BC.Containers.Sets.Bounded;
with Global_Heap;

package fruit_sets is

  -- my grandfather owned one of the largest fruit companies in the world

  type aFruit    is ( Apples, Grapes, Peaches, Cherries, Pears, Plums, Other );

  function FruitHash( f : aFruit ) return Positive;
  -- our hash function for the set

  package fruitContainers is new BC.Containers( item=> aFruit );
  -- basic fruit container

  package fruitSets is new fruitContainers.Sets;
  -- basic set support

  package fruitBoundedSets is new fruitSets.Bounded( fruitHash,
     Buckets => 10,
     Size => 20 );
  -- our actual set is an unbounded set

end fruit_sets;

package body fruit_sets is

  function FruitHash( f : aFruit ) return Positive is
  begin
    return aFruit'pos( f )+1; -- good enough for this example
  end FruitHash;

end fruit_sets;

with ada.text_io, kb_sets;
use ada.text_io, kb_sets;

procedure set_demo is
  use fruitSets;
  use fruitBoundedSets;
  s1 : Bounded_Set;
  s2 : Bounded_Set;
  s3 : Bounded_Set;
begin
  Put_Line( "This is a demo of the Booch components: sets" );
  New_Line;

  Add( s1, apples );
  Add( s1, peaches );
  Add( s2, apples );
  Add( s2, peaches );
  Add( s2, pears );

  Put_Line( "Set 1 has apples and peaches." );
  Put_Line( "Set 2 has apples, peaches and pears." );
  New_Line;
  Put_Line( "Extent of set 1? " & Extent( s1 )'img );
  Put_Line( "Extent of set 2? " & Extent( s2 )'img );
  Put_Line( "Peaches in set 1? " & Is_Member( s1, peaches )'img );
  Put_Line( "Pears   in set 1? " & Is_Member( s1, pears )'img );
  Put_Line( "Set 1 a subset of set 2? " & Is_Subset( s1, s2 )'img );
  Put_Line( "Set 2 a subset of set 1? " & Is_Subset( s2, s1 )'img );
  Put_Line( "Set 1 a subset of set 1? " & Is_Subset( s1, s1 )'img );
  Put_Line( "Set 1 a proper subset of set 1? " & Is_Proper_Subset( s1, s1 )'img );
  New_Line;

  s3 := s1;
  Union( s3, s2 );
  Put_Line( "Set 3 is the union of set 1 and set 2" );
  Put_Line( "Extent of set 3? " & Extent( s3 )'img );
end set_demo;

This is a demo of the Booch components: sets

Set 1 has apples and peaches.
Set 2 has apples, peaches and pears.

Extent of set 1?  2
Extent of set 2?  3
Peaches in set 1? TRUE
Pears   in set 1? FALSE
Set 1 a subset of set 2? TRUE
Set 2 a subset of set 1? FALSE
Set 1 a subset of set 1? TRUE
Set 1 a proper subset of set 1? FALSE

Set 3 is the union of set 1 and set 2
Extent of set 3?  3

18.1.7 Collections

Collections are a (conceptually) combination of lists and bags. Duplicates actually exist as copies in the collection, not simply counted. Collections are also indexed, like a list, so that items can be referenced in the collection.

The Collections package provides the following subprograms:

Create - Create a new collection and its initial chunk
Clear - Remove all items from a collection
Insert - Add an item in front of another
Append - Add an item to the end of the collection
Remove - Remove an item at an index
Replace - Replace an item at an index
Length - Return the number of items in the collection
Is_Empty - True if there are no items in the collection
First - Return the item at the front of the collection
Last - Return the item at the end of the collection
Item_At - Return the item at a particular index
Location - Return the first index where an item is found (0 if

Collections are implemented as dynamically allocated arrays.

with BC.Containers.Collections.Dynamic;
with Global_Heap;

package products is

  type aProduct is record
       id : integer;
       weight : float;
  end record;

  package productContainers is new BC.Containers (Item => aProduct);
  -- this is the basic container

  package productCollections is new productContainers.Collections;
  -- create a new collection support for our using container type

  package productCollection is new productCollections.dynamic(
          Storage_Manager => Global_Heap.Pool,
          Storage => Global_Heap.Storage);
  -- create a dynamic collection holding products

end products;

with ada.text_io, BC, products;
use ada.text_io, BC, products;

procedure collection_demo is
  products : productCollection.Dynamic_Collection;
  p        : aProduct;
  i        : productContainers.Iterator'class := productCollection.New_Iterator( products );
begin
  Put_Line( "This is a demo of the Booch components: collections" );
  New_Line;

  products := productCollection.Create( 100 );

  -- The Newly Declared Collection

  Put_Line( "The collection is newly declared with a chunk size of 100..." );
  Put_Line( "The collection is empty? " & productCollection.Is_Empty( products )'img );
  Put_Line( "The collection length is" & productCollection.Length( products )'img );
  Put_Line( "The collection chunk size is" & productCollection.Chunk_Size( products )'img );
  New_Line;

  -- Adding an Item

  p.id := 8301;
  p.weight := 17.0;
  productCollection.Append( products, p );

  Put_Line( "Product id" & p.id'img & " was added..." );
  Put_Line( "The collection is empty? " & productCollection.Is_Empty( products )'img );
  Put_Line( "The collection length is" & productCollection.Length( products )'img );
  Put_Line( "The collection chunk size is" & productCollection.Chunk_Size( products )'img );
  p := productCollection.First( products );
  Put_Line( "The first item is" & p.id'img );
  p := productCollection.Last( products );
  Put_Line( "The last item is" & p.id'img );
  New_Line;

  -- Adding another Item
  p.id := 1732;
  p.weight := 27.0;
  productCollection.Append( products, p );

  Put_Line( "Product id" & p.id'img & " was added..." );
  Put_Line( "The collection is empty? " & productCollection.Is_Empty( products )'img );
  Put_Line( "The collection length is" & productCollection.Length( products )'img );
  Put_Line( "The collection chunk size is" & productCollection.Chunk_Size( products )'img );
  p := productCollection.First( products );
  Put_Line( "The first item is" & p.id'img );
  p := productCollection.Last( products );
  Put_Line( "The last item is" & p.id'img );
  New_Line;

  -- Changing the Chunk Size

  productCollection.Set_Chunk_Size( products, Size => 1 );

  Put_Line( "The chunk size was reduced to only 1..." );
  Put_Line( "The collection is empty? " & productCollection.Is_Empty( products )'img );
  Put_Line( "The collection length is" & productCollection.Length( products )'img );
  Put_Line( "The collection chunk size is" & productCollection.Chunk_Size( products )'img );
  p := productCollection.First( products );
  Put_Line( "The first item is" & p.id'img );
  p := productCollection.Last( products );
  Put_Line( "The last item is" & p.id'img );
  New_Line;

  -- Iterator example

  Put_Line( "Resetting the iterator.." );
  productContainers.Reset( i );
  p := productContainers.Current_item ( i );
  Put_Line( "The current item is customer id" & p.id'img );
  Put_Line( "Are we done? " & productContainers.Is_Done( i )'img );

  Put_Line( "Advancing to the next item..." );
  productContainers.Next( i );
  p := productContainers.Current_item ( i );
  Put_Line( "The current item is customer id" & p.id'img );
  Put_Line( "Are we done? " & productContainers.Is_Done( i )'img );

  Put_Line( "Advancing to the next item..." );
  productContainers.Next( i );
  Put_Line( "Are we done? " & productContainers.Is_Done( i )'img );
  begin
    p := productContainers.Current_item ( i );
  exception when BC.NOT_FOUND =>
    put_line( "BC.NOT_FOUND exception: no item at this position in the collection" );
  end;

Collections are suitable for small lists or lists where the upper bound is known or rarely exceeded.

This is a demo of the Booch components: collections

The collection is newly declared with a chunk size of 100...
The collection is empty? TRUE
The collection length is 0
The collection chunk size is 100

Product id 8301 was added...
The collection is empty? FALSE
The collection length is 1
The collection chunk size is 100
The first item is 8301
The last item is 8301

Product id 1732 was added...
The collection is empty? FALSE
The collection length is 2
The collection chunk size is 100
The first item is 8301
The last item is 1732

The chunk size was reduced to only 1...
The collection is empty? FALSE
The collection length is 2
The collection chunk size is 1
The first item is 8301
The last item is 1732

Resetting the iterator..
The current item is customer id 8301
Are we done? FALSE
Advancing to the next item...
The current item is customer id 1732
Are we done? FALSE
Advancing to the next item...
Are we done? TRUE
BC.NOT_FOUND exception: no item at this position in the collection

18.1.8 Queues

Queues are a list in which items are removed in the same order they are added. Items are added at the end of the queue and removed at the front.

An ordered (or "priority") queue is a queue in which added items are sorted.

The queues package provides the following subprograms:

Clear - Remove all items from the queue
Append - Add an item to the back of the queue
Pop - Remove an item from the front of the queue and return it
Remove - Remove an item at a particular index
Length - Return the number of items in the queue
Is_Empty - True if there are no items in the queue
Front - Return the item at the front of the queue without removing it
Process - generic procedure to return an item with processing
Location - Return the first index where an item appears else 0
Are_Equal - True if two queues have the same items and length
Copy - Copy one queue to another

An ordered queue is identical except that append adds an item in sorted order.

Queues can be bounded, dynamic or unbounded.

Queues provide "fair" processing and reduce starvation.

18.1.9 Stacks

Stacks are lists in which the last item placed in the list is the first item removed.

The Stacks package provides the following subprograms:

Clear - Remove all items from the stack
Push - Add an item to the top of the queue
Pop - Remove an item from the top of the stack and return it
Depth - Return the number of items in the stack
Is_Empty - True if there are no items in the stack
Top - Return the item at the top of the stack without removing it
Process_Top - generic procedure to return an item with processing
Are_Equal - True if two stacks have the same items and length
Copy - Copy one stack to another

Stacks can be bounded, dynamic or unbounded.

Stacks are used for temporary storage, compact representation and fast data access.

18.1.10 Deques

Deques (double-ended queues, pronounced "deck") are a combination of a stack and queue where items can be placed at the front or the back and removed from either the front or the back.

The Deques package provides the following subprograms:

Clear - Remove all items from the deque
Append - Add an item to the deque
Pop - Remove an item from the deque and return it
Remove - Remove an item at a particular index
Length - Return the number of items in the deque
Is_Empty - True if there are no items in the deque
Front - Return the item at the front of the deque without removing
Back - Return the item at the back of the deque without removing it
Process_Front/_Back - generic procedure to return an item with processing
Location - Return the first index where an item appears else 0
Are_Equal - True if two deques have the same items and length
Copy - Copy one deque to another

Deques can be bounded, dynamic or unbounded.

18.1.11 Rings

Rings are similar to deques, but rings have no "front" or "back", only a moving point of reference called "top".

In addition to the deque subprograms, rings include "Mark" to mark a point in the ring, "Rotate_To_Mark" to move the ring to the marked position, and "At_Mark" to test to see if the top of the ring is at the mark.

Rings can be bounded or dynamic.

18.1.12 Maps

Maps are ordered pairs of related items. Each item is related to a "value" which may or may not be the same type. Maps relate items to values by "binding" them.

The Maps package provides the following subprograms:

Clear - destroy a map
Bind - relate an item to a value
Rebind - relate an item to a different value
Unbind - remove the relationship between an item and its value
Extent - return the number of relationships
Is_Empty - true if there are no relationships
Is_Bound - true if the item is related to a value
Value_Of - the value an item is related to
Visit - a "read-only" procedure to traverse the map
Modify - a procedure that traverses the map making changes

Maps are implemented with a hash table and caching.

Maps can be bounded, dynamic, unbounded or synchronized.

Maps are useful as translation tables.

18.1.13 Binary Trees

Binary trees are lists with two successors instead of 1, named "left" and "right". The items in the tree are not sorted by the Booch component. The program has full control on how items are added to the tree.

Programs "walk" the tree by moving the root of the tree up and down the links to the items. Left_Child follows the left child link. Right_Child follows the right child link. Parent follows the parent link. Each of these subprograms can be used as a procedure (to move the root of the tree) or as a function (to examine the item the link connects to).

  item := Item_At( tree );
  Put( "Left child of " & item ) ;
  item := Item_At( Left_Child( tree ) );
  Put_Line( " is " & item ) ;

When the root of the tree is moved, any items above the new root that aren't referenced anymore are destroyed. To move around the tree without destroying nodes (which is typically what you want to do), create an "alias" to the root of the tree with Create prior to moving.

  root := Create( tree ); -- create a reference to the root
  Left_Child( tree );     -- safe: old root is not destroyed

Moving into an empty (null) position in the tree is allowed, but any attempt to look at the item there will raise an exception. The leaves and the parent of the root are empty.

The Trees.Binary package provides the following subprograms:

Clear - Destroy the tree
Insert - Insert an item at the tree's root
Append - Add an item in the place of a particular item, moving the old item to a new position
Remove - Remove an item from the tree
Share - Create an alias to a subtree of the tree
Child/Left_Child/Right_Child - Move to a child item
Parent - Move towards the root
Set_Item - Make an item the root of the tree
Has_Children - True if the tree has any children items
Is_Null - True if the tree has no items
Is_Shared - True if any subtree has an alias to it
Is_Root - True if the tree is at the root of tree
Item_At - Return the item at the root of the tree

In addition, the tree may have an in_order, pre_order or post_order generic procedure. This procedure traverses the tree and executes processes each item. Pre_order processes an item before its children. Post_order processes an item after its children. In_order processes a node in the sort order of the tree--after all the left children but before all the right.

with BC.Containers.Trees.Binary.In_Order;
with BC.Containers.Trees.Binary.Pre_Order;
with BC.Containers.Trees.Binary.Post_Order;
with Global_Heap;

package shipment_binary is

  -- grandfather would be proud

  type aQuantity is ( Unknown, Basket_6Quart, Basket_11Quart, Bushel, Skid, Boxcar );
  type aFruit    is ( Apples, Grapes, Peaches, Cherries, Pears, Plums, Other );

  type aShipment is record
       number   : Positive;   -- number of containers
       quantity : aQuantity;  -- the containers
       contents : aFruit;     -- type of fruit
  end record;


  procedure visitShipment( s : aShipment; OK : out boolean );
  -- our tree traversal function

  package shipmentContainers is new BC.Containers( item=> aShipment );
  -- basic fruit container

  package shipmentTrees is new shipmentContainers.Trees;
  -- basic tree support

  package shipmentBinaryTrees is new shipmentTrees.Binary(
        Storage_Manager => Global_Heap.Pool,
        Storage => Global_Heap.Storage );
  -- our binary tree support

  procedure inOrdershipmentTraversal is new shipmentBinaryTrees.In_Order( visitShipment );
  -- an in-order traversal

  procedure preOrdershipmentTraversal is new shipmentBinaryTrees.Pre_Order( visitShipment );
  -- a pre-order traversal

  procedure postOrdershipmentTraversal is new shipmentBinaryTrees.Post_Order( visitShipment );
  -- a post-order traversal

end shipment_binary;

with ada.text_io;
use ada.text_io;

package body shipment_binary is

  procedure visitShipment( s : aShipment; OK : out boolean ) is
  -- our tree traversal function
  begin
     Put( "Shipment of" );
     Put( s.number'img );
     Put( " " );
     Put( s.quantity'img );
     Put( "(S) of " );
     Put_Line( s.contents'img );
     OK := true;
  end visitShipment;

end shipment_binary;

with ada.text_io, shipment_binary;
use ada.text_io, shipment_binary;

procedure bintree_demo is
  use shipmentBinaryTrees;
  root : Binary_Tree;
  t    : Binary_Tree;
  s    : aShipment;
  OK   : boolean;
begin
  Put_Line( "This is a demo of the Booch components: binary trees" );
  New_Line;

  -- this is the root item

  s.number := 5;
  s.quantity := basket_6quart;
  s.contents := cherries;
  Insert( t, s, Child => Left );
  -- child doesn't really matter because there's no prior item at the root

  root := Create( t ); -- remember where the root is

  -- add to left of root

  s.number := 7;
  s.quantity := basket_11quart;
  s.contents := pears;
  Append( t, s, Child => Left, After => Left );
  -- child doesn't really matter here

  -- add to right of root

  s.number := 12;
  s.quantity := bushel;
  s.contents := apples;
  Append( t, s, Child => Left, After => Right );
  -- child doesn't really matter here
 
  Left_Child( t );  -- move "t" down left branch

  s.number := 3;
  s.quantity := skid;
  s.contents := peaches;
  Append( t, s, Child => Left, After => Right );
  -- child doesn't really matter here

  Put_Line( "Our tree is: ");
  Put_Line( "          5 6 qt baskets of cherries" );
  Put_Line( "                         |" );
  Put_Line( "       +----------------------------------------------------+" );
  Put_Line( "       |                                                    |" );
  Put_Line( "7 11 qt baskets of pears                          12 bushels of apples" );
  Put_Line( "       |" );
  Put_Line( "       +-------------------------------|" );
  Put_Line( "                               3 skids of peaches" );
  New_Line;

  Put_Line( "In-order traversal:" );
  inOrderShipmentTraversal( root, OK );
  if not OK then
     Put_Line( "The traversal was interrupted" );
  end if;
  New_Line;

  Put_Line( "Pre-order traversal:" );
  preOrderShipmentTraversal( root, OK );
  if not OK then
     Put_Line( "The traversal was interrupted" );
  end if;
  New_Line;

  Put_Line( "Post-order traversal:" );
  postOrderShipmentTraversal( root, OK );
  if not OK then
     Put_Line( "The traversal was interrupted" );
  end if;

end bintree_demo;

This is a demo of the Booch components: binary trees

Our tree is: 
          5 6 qt baskets of cherries
                         |
       +----------------------------------------------------+
       |                                                    |
7 11 qt baskets of pears                          12 bushels of apples
       |
       +-------------------------------|
                               3 skids of peaches

In-order traversal:
Shipment of 7 BASKET_11QUART(S) of PEARS
Shipment of 3 SKID(S) of PEACHES
Shipment of 5 BASKET_6QUART(S) of CHERRIES
Shipment of 12 BUSHEL(S) of APPLES

Pre-order traversal:
Shipment of 5 BASKET_6QUART(S) of CHERRIES
Shipment of 7 BASKET_11QUART(S) of PEARS
Shipment of 3 SKID(S) of PEACHES
Shipment of 12 BUSHEL(S) of APPLES

Post-order traversal:
Shipment of 3 SKID(S) of PEACHES
Shipment of 7 BASKET_11QUART(S) of PEARS
Shipment of 12 BUSHEL(S) of APPLES
Shipment of 5 BASKET_6QUART(S) of CHERRIES

Binary trees should not be Guarded.

18.1.14 AVL Trees

AVL trees are binary trees that are balanced. On every insert or delete, the tree is restructured to keep its symmetry. As a result, the trees must be sorted by the Booch component and the program using the AVL tree must provide a "<" function to sort the tree by.

The AVL package provides fewer subprograms than the binary tree package:

Clear - destory the AVL tree
Insert - add an item into the AVL tree
Delete - remove an item form the AVL tree
Extent - return the number of items in the AVL tree
Is_Null - true if there are no items in the AVL tree
Is_Member - true if an item is in the AVL tree
Visit - traverse the tree in-order executing a "read only" procedure
Modify - traverse the tree in-order executing a procedure that can alter the items.

There are no subprograms for walking the tree.

Here is a sample declaration:

with BC.Containers.Trees.AVL;
with Global_Heap;

package fruit_avl is

  -- more fun with fruit

  type aQuantity is ( Unknown, Basket_6Quart, Basket_11Quart, Bushel, Skid, Boxcar );
  type aFruit    is ( Apples, Grapes, Peaches, Cherries, Pears, Plums, Other );

  type aShipment is record
       number   : Positive;   -- number of containers
       quantity : aQuantity;  -- the containers
       contents : aFruit;     -- type of fruit
  end record;


  function sortCriteria( left, right : aShipment ) return boolean;
  -- for sorting the AVL tree

  package shipmentContainers is new BC.Containers( item=> aShipment );
  -- basic fruit container

  package shipmentTrees is new shipmentContainers.Trees;
  -- basic tree support

  package shipmentAVLTrees is new shipmentTrees.AVL(
        sortCriteria,
        Storage_Manager => Global_Heap.Pool,
        Storage => Global_Heap.Storage );
  -- our AVL tree support

end fruit_avl;

package body fruit_avl is

  function sortCriteria( left, right : aShipment ) return boolean is
  begin
    return left.number < right.number;
  end sortCriteria;

end fruit_avl;

AVL trees have slower inserts and deletes than binary trees but are faster than a normal binary tree for searching.

18.1.15 Multiway Trees

A multiway tree is a tree with any number of unsorted children (as opposed to a binary tree which always has no more than two chidren).

The subprograms are similar to a binary tree. The append procedures add child items to an item. A new function called "Arity" returns the number children an item has.

Multiway trees should not be Guarded.

18.1.16 Graphs

Essentially, graphs are a generalization of maps where any number of items can be related to each other (as opposed to only two).

A directed graph is a set of items (vertices) that are connected by relationships (edges or "arcs"). Like a single linked list, a program can only move forward along an arc.

Items can also be linked to themselves.

The graphs-directed package provides the following subprograms:

Create_Arc - add a relationship between two items
Number_Of_Incoming_Arcs - return the number of incoming arcs to an item
Number_Of_Outgoing_Arcs - return the number of outgoing arcs to an item
Set_From_Vertex - move an arch's source to a new item
Set_To_Vertex - move an arc's destination to a new item
From_Vertex - return the source item for an arc
To_Vertex - return the destination item for an arc

There are four iterators: a graph iterator, and three iterators for visiting items (incoming, outgoing and both).

An undirected graph is a directed graph with pointers to both the previous and next item along an arc. Like a double linked list, a program can move forwards or backwards along an arc.

The graphs-undirected package provides the following subprograms:

Create_Arc - add a relationship between two items
Arity - return the number of arcs. Self-arcs are counted only once
Set_First_Vertex - move an arch's first item to a new item
Set_Second_Vertex - move an arc's second item to a new item
First_Vertex - return the first item for an arc
Second_Vertex - return the second item for an arc

There are two iterators: a graph iterator and an item iterator.

Graphs should not be Guarded.

18.1.17 Smart Pointers

Smart pointers are an access type that counts the number of references to the item being pointed to. Your program allocates the item. The item is deallocated when no more pointers point to it. Smart pointers are a simplified form of garbage collection.

The smart package provides the following subprograms:

Create - create a new smart pointer from an access variable
Value - return the item pointed to by the smart pointer

with BC.smart;

package depts is

  type departments is ( accounting, information_technology, shipping, human_resources );
  type deptAccess  is access all departments;
  package deptPtrs is new BC.smart( departments, deptAccess );

end depts;

with ada.text_io, depts;
use ada.text_io, depts;

procedure sp_demo is
  accountingPtr  : deptPtrs.Pointer;
  accounting2Ptr : deptPtrs.Pointer;
  department     : deptAccess;
begin
  Put_Line( "This is a demo of the Booch components: smart pointers" );
  New_Line;

  department := new departments'( accounting );

  Put_Line( "Assigning dynamically allocate value to a smart pointer" );
  accountingPtr := deptPtrs.Create( department );
  Put_Line( "The accounting pointer points at " & deptPtrs.Value( accountingPtr ).all'img );
  New_Line;

  Put_Line( "Assigning a smart pointer to a smart pointer" );
  accounting2Ptr := accountingPtr;
  Put_Line( "The accounting pointer 2 points at " & deptPtrs.Value( accounting2Ptr ).all'img );
  New_Line;

  Put_Line( "The memory is released when the program ends or no more pointers" );
  Put_Line( "access the memory." );
end sp_demo;

This is a demo of the Booch components: smart pointers

Assigning dynamically allocate value to a smart pointer
The accounting pointer points at ACCOUNTING

Assigning a smart pointer to a smart pointer
The accounting pointer 2 points at ACCOUNTING

The memory is released when the program ends or no more pointers
access the memory.

18.1.18 Booch Multithreading

Booch components can be guarded (manually "locking" the structure for exclusive access) or synchronized (implicit blocking) for multithreading purposes.

Guarding is implemented by creating extending a container type to a Guarded_Container using the GC.Containers.Guarded package. Guarded containers contain two new subprograms, "Seize" and "Release", to lock and unlock a container. (This is implemented using a semaphore.) Any Booch data structure can be made guarded using guarded containers, but in some cases guarding will not work as expected and should not be used (for example, with lists).

The basic semaphore locks individual objects (although it many not work as expected on certain structures such as lists, according to AdaPower.Net). The basic semaphore can be extended and customized by a programmer.

Rewriting the Bags example with guards:

with BC.Containers.Bags.Unbounded;
with BC.Containers.Guarded;
with BC.Support.Synchronization;
with Global_Heap;

package guarded_customers is

  type aCustomerID is new integer range 1_000..9_999;

  function IDHash( id : aCustomerID ) return Positive;
  -- our hash function

  package customerContainers is new BC.Containers (Item => aCustomerID);
  -- this is the basic container

  package customerBags is new customerContainers.Bags;
  -- create a new bag support for our using container type

  package customerBag is new customerBags.Unbounded(
          Hash => IDHash,
          Buckets => 99,
          Storage_Manager => Global_Heap.Pool,
          Storage => Global_Heap.Storage);
  -- create an unbounded bag holding customer numbers

  package customerGuardedBag is new customerContainers.Guarded (
     Base_Container => customerBag.Unbounded_Bag,
     Semaphore => BC.Support.Synchronization.Semaphore );
  -- create a new controlled tagged record container for customers

end guarded_customers;

A new guarded bag can now be declared:

  customers : customerGuardedBag.Guarded_Container;

and the bag can be locked using

  customerGuardedBag.Seize( customers );

Synchronized access by threads is implemented in special versions of the data structure packages (for example, maps.synchronized). With synchronized packages, the implementation details are hidden from the user.

18.2 XMLAda - Unicode, XML, SAX and DOM

Ada Core Technologies provides a set of XML and Unicode packages called XMLAda. It is open source and can be downloaded from the ACT Europe's web site at https://libre2.adacore.com/xmlada/. Limited documentation is available online at this web site as well.

Unicode is bundled together with XML because XML uses Unicode characters. Unicode characters are not the same as Ada's standard 16-bit wide character type. Wide characters are based on a different standard.

This is an overview of using these packages, including short examples to see how they are used. It does not discuss the details of Unicode or XML.

When you configure XMLAda, you're given a choice of where to install the library (/usr/local might be a good choice). To compile, bind and link your programs, you'll need to include the locations of the library and Ada package spec files. For example, for /usr/local you will need to use these options:

gnatmake -c -I. -I/usr/local/include/xmlada/ -L/usr/local/lib/ my_program
gnatbind -aO./ -I. -I/usr/local/include/xmlada/ -I- -x my_program.ali
gnatlink my_program.ali -L/usr/local/lib/ -lxmlada_unicode -lxmlada_input_sources -lxmlada_sax -lxmlada_dom

If you are not using a particular subsystem, you can omit the -l library option. However, DOM is built using SAX--if you use the DOM packages, you'll need to include the SAX library.

18.2.1 Unicode Basics

Unicode comes in three versions: fixed length 32-bit UTF-32, variable length 16-bit UTF-16, and variable length UTF-8. All three versions can represent the same characters, from ASCII to ancient languages. UTF-32 is the basic Unicode type and each character is always 32 bits. UTF-16 is a compressed version of UTF-32: each character is 16-bits but some character as represented by two 16-bit characters in a row. UTF-8 has even more compression: the first 128 characters are equivalent to ASCII (or Latin-1), the standard Ada character sets, but the upper 128 codes can be used for characters using up to 6 consecutive bytes for one character. The Ada packages support all three types of Unicode.

An example of UTF-8 in many languages is available on the UTF-8 Sampler Page. This web page is composed of UTF-8 characters and uses "<META http-equiv="Content-Type" content="text/html; charset=utf-8"> in the HTML header. The xterm program is also supposed to support UTF-8 characters (for example, if you dump a UTF-8 document to standard output in an xterm window).

Do you need to know the character codes? They are available in sets of PDF charts at unicode.org.

The Unicode packages sometimes work with string types, sometimes with numeric types. A basic UTF-32 character (type unicode_char defined in unicode.ads) and is a 32-bit number.

type Unicode_Char is mod 2**32;

Basic UTF-32 functions such as is_letter, is_digit or to_unicode (ASCII/Latin-1 to UTF-32 character) are found in unicode.ads.

18.2.2 UTF and ASCII Characters

The first 127 characters in Unicode are identical to the 127 character ASCII set. If you are strictly working with these characters, it's easy to convert them to Unicode characters.

with unicode;
use unicode;
procedure ASCII_to_Unicode is
  utf_32   : unicode_char;    -- UTF-32 character
  ascii_ch : character;       -- Latin-1 (or ASCII) character
begin
  utf_32 := character'pos( 'A' );
  ascii_ch := character'val( utf_32 );
end ASCII_to_Unicode;

Characters above ASCII 127 will result in garbage or a run-time CONSTRAINT_ERROR exception.

The Unicode packages do not define a UTF-8 character type but you can create your own type.

procedure ASCII_to_UTF_8 is
  subtype unicode_utf_8_char is unicode_char range 0..255;
  utf_8 : unicode_utf_8_char;
  ascii_ch : character;
begin
  utf_8 := character'pos( 'A' );
  ascii_ch := character'val( utf_8 );
end ASCII_to_UTF_8;

Memory can be saved by limiting the UTF-8 character to 7 bits.

type unicode_utf_8_char is new unicode_char range 0..127;
for unicode_utf_8_char'size use 7;

18.2.3 Unicode and Latin-1 Conversions

Latin-1 (the extended ASCII standard used by Ada) and the first 8 bits of UTF-32 are not identical. To convert between them, you'll need to use a Unicode conversion function. The Unicode packages contain many conversion functions under the unicode.ccs package hierarchy. In particular, unicode.css.iso_8859_1.ads performs Latin-1 conversions.

The following short program demonstrates how to convert between UTF-32 strings and Ada unbounded strings (Latin-1 characters). (The next section will show how to convert any string to any character set.)

with ada.text_io,
     ada.strings.unbounded,
     unicode.ccs.iso_8859_1;
use ada.text_io,
     ada.strings.unbounded,
     unicode,                   -- basic Unicode
     unicode.ccs,               -- Unicode conversions
     unicode.ccs.iso_8859_1;    -- Latin-1 conversions

procedure unitest is

   type a_unicode_string is array(1..80) of unicode_char;

   function to_unicode_string( msg : unbounded_string ) return a_unicode_string is
     -- to 32 byte unicode characters (UTF-32)
     result : a_unicode_string := (others => 0);
   begin
     for c in 1..length( msg ) loop
         result(c) := to_unicode( element( msg, c ) );
     end loop;
     return result;
   end to_unicode_string;

   function to_unbounded_string( msg : a_unicode_string ) return unbounded_string is
     -- to Latin-1
     result : unbounded_string;
     c : natural;
   begin
     c := 1;
     while c <= a_unicode_string'last and then msg( c ) /= 0  loop
         result := result & character'val( to_iso_8859_1( msg( c ) ) );
         c := c + 1;
     end loop;
     return result;
   end to_unbounded_string;

   msg : unbounded_string := to_unbounded_string( "this is a test" );
   umsg : a_unicode_string;
   msg2 : unbounded_string;

begin

   put_line( "Original: " & to_string( msg ) );
   umsg := to_unicode_string( msg );
   msg2 := to_unbounded_string( umsg );
   put_line( "After conversion: " & to_string( msg2 ) );

end unitest;

Using characters above Latin-1 255 will result in garbage or a run-time CONSTRAINT_ERROR exception.

18.2.4 Ada, UTF-8 and UTF-16 Strings

UTF-32 characters are the most flexible form of Unicode but they use a large amount of space. UTF-8 uses less space than UTF-32 and is compatible with ASCII but the characters can require up to 6 consecutive bytes. Unfortunately, although the Unicode packages support UTF-8, it is not easy to use. It is not supported directly and all conversions must be done through UTF-32. This creates some confusing terminology as there are many "to_utf32" and "from_utf32" functions that work on different kinds of 32-bit characters (not necessarily "UTF" characters at all!).

Converting an Ada string into UTF-8 format is a 3 step process:

Convert the Ada string from 8-bit to 32-bit characters (using the unicode.ces.basic_8bit) package.
Convert the 32-bit characters to UTF-32 32-bit characters (using the unicode.ces.utf32 package and the unicode.css.iso_8859_1 package).
Convert the UTF-32 string into a UTF-8 string (using the unicode.ces.utf8) package.

The unicode.ces packages include string types that work with the functions. String types with the name "UTF32" have 32-bit characters. "UTF8" strings have 8-bit characters. "LE" strings are "little-endian". These strings are all renamings of a standard Ada string and are provided to make programs easier to read. That is, a unicode_char is a number but a utf32_le_string is a string type, not an array of numbers as one might expect. Another effect is that you cannot use the unicode.ces packages and refer to the functions: they must be named in full so the compiler will know what functions you are referring to.

Putting it together, here is the method to convert an Ada string to a UTF-8 string:

s : string := "This is a test";
--  8-bit Latin-1 string (normal Ada string)

s_32 : utf32_le_string := unicode.ces.basic_8bit.to_utf32( s );
-- 32-bit Latin-1 string (normal Ada string with 32-bit characters)

u_32 : utf32_le_string := unicode.ces.utf32.to_unicode_le( s_32,
       cs => unicode.ccs.iso_8859_1.iso_8859_1_character_set );
-- UTF-32 string (convert Latin-1 to Unicode characters)

u_8  : utf8_string := unicode.ces.utf8.from_utf32( u_32 );
-- change UTF-32 to UTF-8

The UTF-8 string may have more bytes than the original string. Use the unicode.ces.utf8.length function (instead of the 'length attribute) to determine the number of characters in the string.

To convert from UTF-8 to Ada strings:

Convert the UTF-8 string into a UTF-32 string (using the unicode.ces.utf8) package.
Convert the 32-bit characters to Latin-1 32-bit characters (using the unicode.ces.utf32 package and the unicode.css.iso_8859_1 package).
Convert the 32-bit Ada string to 8-bit characters (using the unicode.ces.basic_8bit) package.

Here is a complete example that translates a string to UTF-8 and back to an Ada string:

with ada.text_io,
     unicode.ces.utf8,
     unicode.ces.utf32,
     unicode.ces.basic_8bit,
     unicode.ccs.iso_8859_1;
use  ada.text_io,
     unicode,
     unicode.ccs,
     unicode.ces,
     unicode.ces.utf8,
     unicode.ces.utf32;

procedure there_and_back_again is

   s    : string := "This is a test";
   --  8-bit Latin-1 string (normal Ada string)

   s_32 : utf32_le_string := unicode.ces.basic_8bit.to_utf32( s );
   -- 32-bit Latin-1 character (normal Ada string with 32-bit characters)

   u_32 : utf32_le_string := unicode.ces.utf32.to_unicode_le( s_32,
          cs => unicode.ccs.iso_8859_1.iso_8859_1_character_set );
   -- UTF-32 string (convert Latin-1 to Unicode characters)

   u_8  : utf8_string := unicode.ces.utf8.from_utf32( u_32 );
   -- change UTF-32 to UTF-8

   undo_u_32 : utf32_le_string := unicode.ces.utf8.to_utf32( u_8 );
   -- change UTF-8 to UTF-32

   undo_s_32 : utf32_le_string := unicode.ces.utf32.to_unicode_le( undo_u_32,
          cs => unicode.ccs.iso_8859_1.iso_8859_1_character_set );
   -- change UTF-32 to Latin-1 with 32-bit characters
   -- KB: is this right?  Would make more sense as to_cs but that function
   -- throws an exception...

   undo_s : string := unicode.ces.basic_8bit.from_utf32( undo_s_32 );
   -- back to original string "This is a test"

begin
   put_line( "Original string = '" & s & "'" );
   put_line( "       To UTF-8 = '" & u_8 & "'" );
   put_line( "     Back again = '" & undo_s & "'" );
   if s = undo_s then
      put_line( "Translation successful" );
   end if;
end there_and_back_again;

The results:

Original string = 'This is a test'
       To UTF-8 = 'This is a test'
     Back again = 'This is a test'
Translation successful

Using the same process, you can use UTF-16 by using the appropriate packages.

KB: Do the strings need to be free'd?

18.2.5 Inputting XML

To provide a standard way of reading XML information, XMLAda includes a set of input packages. This packages are designed to read information from a file or string. You can also create custom input functions to read XML information from other sources.

The Input packages are designed to look like standard Ada file I/O operations. To use an XML file, use the Input_Sources.File package. You open the file, get a Unicode character, check for the end of file and close the file. The file could be a string or some other form of input. The Set_Encoding procedure informs the Input packages about the type of characters (Latin_1, UTF-8, etc.), although the encoding of a file can usually be determined automatically.

data : Input_Sources.File.File_Input;
...
Input_Sources.File.Open( "data.xml", data );
-- Parse the information
Input_Sources.File.Close( data );

If you are using SAX or DOM, you normally only need to open or close the XML source. The parsers will handle reading the XML data.

This program will dump the contents of an XML file called "data.xml" to standard output:

with ada.text_io,
     unicode,
     input_sources.file;
use  ada.text_io,
     unicode,
     input_sources.file;

procedure xml_dump is
   data : File_Input;
   uc   : unicode_char;
begin
   Open( "data.xml", data );
   while not Input_Sources.File.Eof( data ) loop
      Next_Char( data, uc );
      if uc >= 32 and uc < 127 then   -- Printable ASCII?
         put( character'val( uc ) );
      elsif uc = 10 then              -- Linefeed
         new_line;
      else
         put( "[#" & uc'img & ']' ); -- other character, print numeric code
      end if;
   end loop;
   new_line;
   Close( data );
end xml_dump;

Strings are more difficult. Although there is a Input_Sources.Strings package, there is no predefined handling for Ada or Unicode strings. You have to write a set of callbacks to read the string yourself.

with ada.text_io,
     unicode.ces,
     unicode.ccs.iso_8859_1,
     input_sources.strings;
use ada.text_io,
     unicode,
     unicode.ces,
     input_sources.strings;

procedure xml_dump_string is

   data : String_Input;
   uc   : unicode_char;

   s    : aliased byte_sequence := "<thing>This is a test</thing>";
   -- s is the source string

   -- These callbacks are described in unicode.ces.  For Latin_1, they need
   -- to work with one byte characters only.

   procedure s_read( Str   : Byte_Sequence; Index : in out Positive; Char  : out Unicode_Char) is
   begin
      char := character'pos( str( index ) );
      index := index + 1;
   end s_read;

   function s_width( char : Unicode_Char ) return natural is
   begin
      return 1;
   end s_width;

   procedure s_encode( Char   : Unicode_Char; Output : in out Byte_Sequence; Index  : in out Natural) is
   begin
      output( index+1 ) := character'val( char );
   end s_encode;

   function s_length( str : byte_sequence ) return natural is
   begin
     return str'length;
   end s_length;

   scheme : encoding_scheme := ( s_read'unrestricted_access, s_width'unrestricted_access, s_encode'unrestricted_access, s_length'unrestricted_access);
   -- a collection of callbacks to read string s
   -- I should not use unrestricted_access but don't want to fool around with
   -- Ada's pointer scoping restrictions for this short example...
begin

   Open( s'unrestricted_access, scheme, data );
   while not Eof( data ) loop
      Next_Char( data, uc );
      if uc >= 32 and uc < 127 then   -- Printable ASCII?
         put( character'val( uc ) );
      elsif uc = 10 then              -- Linefeed
         new_line;
      else
         put( "[#" & uc'img & ']' );
      end if;
   end loop;
   new_line;
   Close( data );

end xml_dump_string;

The results:

<thing>This is a test</thing>

18.2.6 Parsing XML Using SAX (Simple API for XML)

Interpreting XML is difficult because it has a recursive structure: using nested tags allows great flexibility but it means that any individual tag only has meaning when considered in the context of the surrounding tags. Or to put it another way, you can't grep XML.

XMLAda provides two methods for interpreting (or parsing) an XML document. The first is SAX (Simple API for XML) and is particularily useful for short documents or documents without a lot of nested context. SAX is sometimes referred to as as callback or push-based parsing. SAX reads through an XML document and calls procedures you define to handle the different parts of the document SAX encounters. Because SAX is a standard, using SAX in Ada is very similar to using SAX in other computer languages.

Create a set of handlers for different XML components (elements (or tags), attributes, free text, etc.). These handlers should be put in a tagged record extended from sax.readers.reader. With objects, you can create a hierarchy of readers for different kinds of XML files.

Since SAX uses callbacks, it doesn't have to load the entire XML document at one time. XML documents with simple structures can be quickly analysed.

The simplest SAX program reads through an XML file and does nothing by default:

with ada.text_io,
     unicode.ces,
     input_sources.file,
     sax.readers;
use ada.text_io,
     unicode,
     input_sources.file,
     sax.readers;

procedure sax_nothing is
   data : File_Input;
   r    : sax.readers.reader;
begin
   Open( "data.xml", data );
   Parse( r, data );
   Close( data );
end sax_nothing;

To do something useful, you have to extend the class and create new handler functions. Create a new tagged record based on sax.readers.reader and replace that default content handlers for the content you want to examine. The most common ones are:

Start_Document: called before parsing the document
End_Document: called after the document is finished parsing
Start_Element: called when SAX reads an XML open tag
End_Element: called for a XML closing tag
Characters: called for the free characters between the tags, not including ignorable whitespace (another callback). The SAX standard does not specify whether or not the handler receives all or part of a set of free characters--XMLAda calls Characters with a complete set of free characters.

The parameters and their names are based on the SAX standard. The most common parameters are:

Local_Name: the name of the item minus an namespace
Qname: the name as seen in the document
Attributes: the attributes attached to a tag (as a tagged record)

There are many possible handlers. Check sax.readers.ads for a complete list.

If the XML file is incorrect, an XML_FATAL_ERROR exception will be raised.

18.2.7 SAX Example: RSS Headlines

RSS is an XML standard for reporting news articles. An RSS file contains news articles look like this:

  <item>
     <title>The Headline</title>
     <link>Link to the article</title>
     ...
  </item>

Suppose you want to write a program to display news headlines. To do this, you'd need to capture the free text (the Characters callback) and display the text when the </title> tag is reached (the End_Element callback). But you only want to display the <title> tags inside of <item> tags because there are other titles (such as the title of the web site where the items came from).

Create a new package called my_readers to contain our RSS tagged record called my_rss_reader. The record will have handlers for starting and ending tags as well as character content. The titles are displayed with the dump procedure. This procedure assumes that the XML file is ASCII or Latin-1, but the procedure can be rewritten to handle Unicode as discussed above.

-- my_readers.ads
with Sax.Exceptions;
with Sax.Locators;
with Sax.Readers;
with Sax.Attributes;
with Sax.Models;
with Unicode.CES;

with ada.strings.unbounded;
use  ada.strings.unbounded;

package my_readers is
  type my_rss_reader is new sax.readers.reader with private;

   procedure Start_Element
     (Handler       : in out my_rss_reader;
      Namespace_URI : Unicode.CES.Byte_Sequence := "";
      Local_Name    : Unicode.CES.Byte_Sequence := "";
      Qname         : Unicode.CES.Byte_Sequence := "";
      Atts          : Sax.Attributes.Attributes'Class);
   -- The start of a tag.  e.g. <title>

   procedure End_Element
     (Handler : in out my_rss_reader;
      Namespace_URI : Unicode.CES.Byte_Sequence := "";
      Local_Name    : Unicode.CES.Byte_Sequence := "";
      Qname         : Unicode.CES.Byte_Sequence := "");
   -- The end of a tag.  e.g. </title>

   procedure Characters
     (Handler : in out my_rss_reader; Ch : Unicode.CES.Byte_Sequence);
   -- The free text contained between tags.  Ch is a string not a single character.

   -- The rest we don't need

private

  type my_rss_reader is new sax.readers.reader with record
       title   : unbounded_string;    -- last free text
       in_item : boolean := false;    -- true if in an <item>
  end record;

end my_readers;

-- my_readers.adb
with ada.text_io;
use  ada.text_io;

package body my_readers is

   procedure dump( ch : unicode.ces.byte_sequence ) is
     -- Dump some ASCII-compatible characters.  This assumes ASCII or
     -- Latin-1 characters in the XML file.
     c : character;
     ch_code : natural;
   begin
     for i in 1..ch'length loop
        c := ch(i);
        ch_code := character'pos( c );
        if ch_code >= 32 and ch_code < 127 then        -- Printable ASCII?
           put( c );
        elsif ch_code = 10 then                        -- Linefeed
           new_line;
        else                                           -- Others
           put( "[#" & ch_code'img & ']' );
        end if;
     end loop;
  end dump;

  procedure Start_Element
    (Handler       : in out my_rss_reader;
     Namespace_URI : Unicode.CES.Byte_Sequence := "";
     Local_Name    : Unicode.CES.Byte_Sequence := "";
     Qname         : Unicode.CES.Byte_Sequence := "";
     Atts          : Sax.Attributes.Attributes'Class) is
  begin
     if local_name = "item" then                       -- <item>
        Handler.in_item := true;                       -- starting an item
     end if;
  end Start_Element;

  procedure End_Element
    (Handler : in out my_rss_reader;
     Namespace_URI : Unicode.CES.Byte_Sequence := "";
     Local_Name    : Unicode.CES.Byte_Sequence := "";
     Qname         : Unicode.CES.Byte_Sequence := "") is
  begin
     if local_name = "title" then                      -- </title>
        if Handler.in_item then                        -- item title?
           dump( to_string( handler.title ) );         -- show it
           new_line;
        end if;
     elsif local_name = "item" then                    -- </item>
           Handler.in_item := false;                   -- leaving an item
     end if;
  end End_Element;

  procedure Characters
    (Handler : in out my_rss_reader; Ch : Unicode.CES.Byte_Sequence) is
  begin
    handler.title := to_unbounded_string( ch );
  end Characters;

end my_readers;

-- rss.adb
with ada.text_io,
     unicode.ces,
     input_sources.file,
     my_readers;
use  ada.text_io,
     unicode,
     input_sources.file,
     my_readers;

procedure rss is
   data : File_Input;
   r    : my_rss_reader;
begin
   Open( "coder.rss", data );
   Parse( r, data );
   Close( data );
end rss;

The results for my coder.rss file for my Lone Coder blog is as follow:

Lone Coder: Google: Lawful Good or Chaotic Neutral?
Lone Coder: The Tyrrany of the Label
Lone Coder: Losing Control of your Linux Startup
Lone Coder: OpenSuSE 10: Developer Dream or Crippleware?
Lone Coder: In Mourning of Statftime
Lone Coder: The Need for Speed: Speedbumps on the Web

18.2.8 SAX Attribute Handling

An attribute is additional information added to an XML tag. For example, in HTML with <p align="right">, align="right" is an attribute. align is the name of the attribute. right is the value of the attribute.

The Start_Element handler has an Atts tagged record that describes any attributes included with the tag. The subprograms available for attributes are described in sax-attributes.ads. The most commonly used subprograms are:

Get_Length - return the number of attributes
Get_Local_Name - return the name of an attribute
Get_Value - return the value of an attribute

Attributes are numbered from zero. To display the attributes on an XML tag, use the following in your Start_Element handler:

use SAX.Attributes;
...
for i in 0..get_length( Atts )-1 loop
    put_line( "Name = " & get_local_name( Atts, i ) );
    put_line( "Value = " & get_value( Atts, i ) );
end loop;

18.2.9 Parsing XML Using DOM (Document Object Model)

SAX calls your callbacks as it parsers an XML file. DOM, Document Object Model, works by loading the entire XML file and building a tree based on the tags. Using the DOM packages, you can "walk" the tree, node by node. This uses more memory but doesn't constrain how you examine the document the way SAX does (top to bottom). DOM, like SAX, is a standard and using DOM in Ada is similar to using DOM in other languages.

The default DOM reader, Tree_Reader, is used to create the document tree. Unlike SAX, the default reader will provide most of the functionality you need and you won't need to extend it.

with input_sources.file,
     dom.readers,
     dom.core;
use  dom.readers,
     dom.core,
     input_sources.file;

procedure dom_nothing is
   data : File_Input;
   r    : Tree_Reader;
   d    : Document;
begin
   Open( "data.xml", data );
   d := get_tree( r );
   Parse( r, data );
   free( r );
   Close( data );
end dom_nothing;

A longer example: display the first XML tag with attributes.

with input_sources.file,
     dom.readers,
     dom.core.documents,
     dom.core.nodes;
use dom.readers,
     dom.core,
     dom.core.documents,
     dom.core.nodes,
     input_sources.file;

with ada.text_io;
use  ada.text_io;

procedure dom_first_node is

   procedure dump( n : node ) is
   begin
     if n /= null then
        case n.node_type is
        when element_node =>                            -- An XML tag
             put( "<" );
             put( node_name( n ) );                     -- show name
             declare                                    -- attributes
               AS : Named_Node_Map := Attributes (N);
             begin
               for I in 0..Length( AS )-1 loop        -- numbered from zero
                  put( " " );
                  put( node_name( Item(AS, I) ) );
                  put( "=" );
                  put( node_value( Item(AS, I) ) );
               end loop;
             end;
             put_line( ">" );
        when attribute_node =>                          -- An attribute
             put_line( node_name( n ) & "=" & node_value( n ) );
        when others =>                                  -- Other
             put_line( "Unknown tree node type" );
        end case;
     end if;
   end dump;

   data : File_Input;
   r    : Tree_Reader;
   d    : Document;
   document_root : element;

begin
   Open( "data.xml", data );
   Parse( r, data );
   d := get_tree( r );
   document_root := get_element( d );
   dump( document_root );
   free( r );
   Close( data );
end dom_first_node;

For my coder.rss file, this program displays:

<rss version=2.0>

The dom-core-nodes.ads package includes navigation subprograms to move around the document and examine nodes (XML items). first_child returns the first child nested in an XML element. next_sibling moves to the next adjacent item. For a complete example, David Botton has an example program that walks an XML document using DOM at his AdaPower web site.

18.3 General Purpose Libraries

18.3.1 AdaCL

AdaCL is a library for writing small day-to-day programs normally handled by scripting languages. It includes CGI, garbage collection and improved string functions. The home page for the project is AdaCL.

An alternative to my BUSH project

18.3.2 SAL (Stephe's Ada Library)

SAL is another "standard template library"-type project providing general data structures, vectors and math functions with a heavy emphasis on generics. The library is located at http://www.toadmail.com/~ada_wizard/ada/sal.html.

Summary of packages:

SAL.Aux.Definite_Private_Items;
SAL.Aux.Definite_Private_Items;
SAL.Aux.Enum_Iterators;
SAL.Aux.Indefinite_Limited_Items;
SAL.Aux.Indefinite_Private_Items.Comparable;
SAL.Aux.Indefinite_Private_Items;
SAL.Aux.Sort_Indefinite_Items_Definite_Keys;
SAL.Config_Files.Boolean;
SAL.Config_Files.Duration;
SAL.Config_Files.Integer;
SAL.Config_Files.Time;
SAL.Config_Files;
SAL.Endianness;
SAL.Gen.Alg.Count;
SAL.Gen.Alg.Find_Binary;
SAL.Gen.Alg.Find_Linear;
SAL.Gen.Alg.Find_Linear.Sorted;
SAL.Gen.Alg.Process_All_Constant;
SAL.Gen.Alg;
SAL.Gen.Gray_Code;
SAL.Gen.Lists.Double.Iterators;
SAL.Gen.Lists.Double;
SAL.Gen.Lists;
SAL.Gen.Stacks.Bounded_Limited;
SAL.Gen.Stacks.Bounded_Nonlimited;
SAL.Gen.Stacks;
SAL.Gen.Word_Order_Convert.Scalar_32;
SAL.Gen.Word_Order_Convert.Scalar_64;
SAL.Gen.Word_Order_Convert;
SAL.Gen_Array_Image;
SAL.Gen_Array_Text_IO;
SAL.Gen_FIFO;
SAL.Gen_Math.Gen_Den_Hart;
SAL.Gen_Math.Gen_DOF_2;
SAL.Gen_Math.Gen_DOF_3.Gen_Image;
SAL.Gen_Math.Gen_DOF_3;
SAL.Gen_Math.Gen_DOF_6.Gen_Image;
SAL.Gen_Math.Gen_DOF_6.Gen_Integrator_Utils;
SAL.Gen_Math.Gen_DOF_6;
SAL.Gen_Math.Gen_Inverse_Array;
SAL.Gen_Math.Gen_Manipulator;
SAL.Gen_Math.Gen_Runge_Kutta_4th;
SAL.Gen_Math.Gen_Scalar;
SAL.Gen_Math.Gen_Square_Array.Gen_Inverse;
SAL.Gen_Math.Gen_Square_Array;
SAL.Gen_Math.Gen_Vector.Gen_Image;
SAL.Gen_Math.Gen_Vector;
SAL.Generic_Binary_Image;
SAL.Generic_Decimal_Image;
SAL.Generic_Float_Image;
SAL.Generic_Hex_Image;
SAL.Math_Double.Den_Hart;
SAL.Math_Double.DOF_2;
SAL.Math_Double.DOF_3.Cacv_Inverse;
SAL.Math_Double.DOF_3.Image;
SAL.Math_Double.DOF_3;
SAL.Math_Double.DOF_6.Image;
SAL.Math_Double.DOF_6.DC_Array_DCV_Inverse;
SAL.Math_Double.DOF_6.Integrator_Utils;
SAL.Math_Double.DOF_6;
SAL.Math_Double.Elementary;
SAL.Math_Double.Scalar;
SAL.Math_Double.Text_IO;
SAL.Math_Float.Den_Hart;
SAL.Math_Float.DOF_2;
SAL.Math_Float.DOF_3.Cacv_Inverse;
SAL.Math_Float.DOF_3.Image;
SAL.Math_Float.DOF_3;
SAL.Math_Float.DOF_6.DC_Array_DCV_Inverse;
SAL.Math_Float.DOF_6.Integrator_Utils;
SAL.Math_Float.DOF_6;
SAL.Math_Float.Elementary;
SAL.Math_Float.Polynomials;
SAL.Math_Float.Scalar;
SAL.Math_Float.Text_IO;
SAL.Math_Float_Kraft_HC_Nominal;
SAL.Math_Float_Manipulator_6;
SAL.Math_Float_Manipulator_7;
SAL.Math_Float_RRC_K1607_Nominal;
SAL.Memory_Streams.Address;
SAL.Memory_Streams.Bounded;
SAL.Memory_Streams;
SAL.Poly.Alg.Count;
SAL.Poly.Alg.Find_Linear;
SAL.Poly.Alg.Process_All_Constant;
SAL.Poly.Binary_Trees.Sorted.Iterators;
SAL.Poly.Binary_Trees.Sorted;
SAL.Poly.Binary_Trees;
SAL.Poly.Function_Tables.Monotonic.First_Order;
SAL.Poly.Function_Tables.Monotonic;
SAL.Poly.Function_Tables;
SAL.Poly.Lists.Double;
SAL.Poly.Lists.Single.Iterators;
SAL.Poly.Lists.Single;
SAL.Poly.Lists;
SAL.Poly.Stacks.Unbounded_Array;
SAL.Poly.Stacks;
SAL.Poly.Unbounded_Arrays;
SAL.Poly;
SAL.Simple.Function_Tables.Monotonic.First_Order;
SAL.Simple.Function_Tables.Monotonic;
SAL.Simple.Function_Tables;
SAL.Simple.Searches.Binary;
SAL.Simple.Searches;
SAL.Simple;
SAL.Time_Conversions;

<--Last Chapter

Table of Contents

Next Chapter-->