From scamel@elte.hu Sat Aug 12 20:53:15 2006
Date: Thu, 10 Aug 2006 12:54:41 +0200
From: Zolyomi Istvan <scamel@elte.hu>
To: gsd@elte.hu
Subject: Ernst level (Re: On family polymorphism)



----- Továbbított levél Scam <scamel@nuk.teteny.elte.hu> címről -----
   Dátum: Wed, 16 Jul 2003 11:19:27 +0200 (CEST)
  Feladó: Scam <scamel@nuk.teteny.elte.hu>
Viszontválasz cím: Scam <scamel@nuk.teteny.elte.hu>
Tartalom: Re: On family polymorphism (fwd)
 Címzett: scamel@elte.hu



---------- Forwarded message ----------
Date: Mon, 07 Jul 2003 18:14:10 +0200
From: Erik Ernst <eernst@daimi.au.dk>
To: Scam <scamel@nuk.teteny.elte.hu>
Subject: Re: On family polymorphism

Scam wrote:
> Dear Sir,
> 
> We are working at Eotvos Lorand University, Budapest (remember
> ECOOP2001). We enjoyed your presentation on family polymorphism where you
> introduced the problem and gave solution in gbeta. We considered the
> problem and we think we have found a solution in C++ using
> only standard template features. In essence our approach is similar to 
> your gbeta solution.

Hello Istvan and Zoltan,

Thank you for your interest!  I think your approach in fact is quite 
different from the gbeta solution I presented, but I'll have to go into 
some detail in order to clarify the differences and then I'll try to 
convince you that these differences are really important.  :-)

> [..]
 >
> #include <iostream>
> 
> struct Graph
> {
> 	struct Node;
> 	struct Edge { Node *n1, *n2; virtual ~Edge(); };
> 	struct Node { 
> 		virtual bool touches(Edge *e) {
> 			return (this == e->n1) || (this == e->n2);
> 		}
> 	};
> };

You are creating a class (or struct, but I don't want to distinguish 
between the two here) 'Graph' with two other classes 'Node' and 'Edge' 
nested inside.  Nesting in C++ and in gbeta are two very different 
things, because nested C++ classes are just global classes whose names 
are enclosed in the name space of the enclosing class.  This means that 
you can use a nested class just like a global class, except for the 
naming.  E.g.,

   struct Graph {
     struct Node {};
   };

   int main(int argc, char *argv[])
   {
     Graph::Node *n = new Graph::Node();
     return 0;
   }

In this main function we are creating an object which is an instance of 
the Graph::Node class, and there never existed an instance of the Graph 
class.

At first sight this might seem to be good, because it means that we can 
create instances of nested classes without worrying about the connection 
to instances of enclosing classes.  But it does in fact reduce the whole 
notion of nested classes to a naming issue:  Move the class Graph::Node 
to the global level, adjust usages inside Node of static properties of 
Graph (they would now need to be 'Graph::something' where they could 
formerly be plain 'something'), and the program is semantically the 
same, now without usage of nested classes.

The reason why it is not good is that it is such a superficial concept: 
It can be defined out of the language by means of a relatively simple 
transformation.  One tell-tale sign of the superficiality is that nested 
classes cannot access anything non-static in their enclosing classes.

One step up from this is the notion of inner classes is Java.  Here, an 
instance of an inner class is created in context of one particular 
enclosing object, which is an instance of the syntactically enclosing 
class.  For that inner instance it is possible to access not just static 
properties but also ordinary instance variables and methods of enclosing 
objects (now we really have enclosing *objects*, not just enclosing 
classes).  This cannot be removed from the language by a simple 
syntactical transformation; we would need to introduce a reference to 
the enclosing object in every inner class, to choose the right type for 
that reference, to make sure that it gets initialized during object 
construction and never changed thereafter, and to change expressions 
using features of enclosing objects such that they explicitly find the 
enclosing objects by following these references.

In return for the added complexity, we get encapsulation:  For each 
instance of an enclosing class, we create a universe of shared state for 
all the instances of the nested classes.  For example:

   class University {
     String name;
     class Student { .. }
   }

For every instance of University we may have any number of instances of 
Student, and all these students will share data from the enclosing 
University, e.g., its name.  It is similar to having static features 
that you can create as needed (ordinary features are non-shared, e.g., 
ordinary instance variables in a class; static features are shared, 
i.e., they are effectively just global variables and we will have 
exactly one of them per declaration; features in an enclosing object are 
shared among instances of the enclosed class, and we can have as many of 
them as we like).

This ability to have sharing in between 'one global thing' and 'one 
thing per object' is an important program organization principle, and it 
takes a non-trivial amount of work to simulate this facility by means of 
explicit pointers and distinguished-objects-holding-shared-state.

In Java, the nested classes are nested in classes, not in objects (so we 
only have one University.Student class, even if we have several 
instances of University).  This means that we cannot distinguish (in the 
type analysis, nor by means of dynamic checks) between 
MyUniversity.Student and SomeOtherUniversity.Student -- they are simply 
the same class.

In gbeta, nested classes are attributes of the enclosing _object_ not 
the enclosing class, and this means that we can create an unlimited 
number of Student classes by creating that many University classes.  In 
most languages you will have two write a larger program to obtain more 
classes, but in gbeta you can have as many classes as you wish with one 
fixed, small program.  Since type safety is largely about being able to 
see the difference between entities which are technically the same 
(e.g., a couple of thousand bits in memory) but conceptually different 
(e.g., a bitmap picture or a University object), it is my basic claim 
that you get a more expressive type system, and better type safety, by 
using a more fine grained notion of types.

Moreover, the fact that classes are features of enclosing objects, not 
of enclosing classes, makes it possible to have virtual classes.

A virtual class attribute is an attribute of an object whose value is a 
class, this class being determined by a process that is similar to 
method overriding.  This means that we do not (always) know at 
compile-time what class is the actual value of a given virtual class 
attribute (so we may know that MyUniversity.Student is a class, and that 
it is, e.g., a subclass of a global Student class, but we don't know 
exactly what class it is).

Where I used virtual classes in the gbeta example, you are using nested 
C++ classes.  This has a lot of consequences, especially with respect to 
the dynamic behavior of 'new' statements, and withe respect to the type 
analysis.

First, if you would use a 'new Node()' statement in your 'Graph' class,
the outcome would be an instance of Graph::Node, even if executed in 
context of an instance of OnOffGraph (since the enclosing object is by 
definition not considered in connection with C++ nested classes, this 
would be very hard to fix)..

This means that an inherited factory method in OnOffGraph would be 
useless: It would create Nodes (or Edges) from the simple Graph, and 
they would (1) be type-incompatible with OnOffGraph::Node/Edge and (2) 
even a dynamic cast would not help you because these objects really 
_are_ simple Graph::Node/Edge, not OnOffGraph.

So you cannot create objects and inherit the code.  This is a very 
serious restriction.

> Graph::Edge::~Edge() { delete n1; delete n2; }
> 
> struct OnOffGraph
> {
> 	struct Node;
> 	struct Edge : public Graph::Edge { bool enabled; };
> 	struct Node : public Graph::Node {  
> 		bool touches(Edge *e) {
> 			return e->enabled && Graph::Node::touches(e);
> 		}
> 	};
> };

Note that Graph::Node::touches and OnOffGraph::Node::touches are two 
distinct methods (you are _not_ doing virtual redefinition of the same 
method, you are just creating two methods which happen to have the same 
name, because they do not have type compatible argument lists).

Next, you don't even bother to inherit from Graph to OnOffGraph. :-)

I don't know whether that is an oversight or if it was by design, but it 
certainly makes it hard to access nodes and edges of different kinds of 
graphs polymorphically.  Basically, this means that you cannot do 
anything with Nodes and Edges without telling the compiler exactly what 
kinds of nodes and edges you are talking about.

Consider the same situation for simple classes (not families), e.g., 
Point and ColorPoint:

   class Point { int x,y; void foo() {} }

   class ColorPoint extends Point { color c; }

   Point p = new Point();
   ColorPoint cp = new ColorPoint();

Now we can do

   p.foo();
   cp.foo();

because a 'foo' method with the right argument lists is available in the 
statically known type of both 'p' and 'cp'.  However, you can also do 
the following:

   p = cp;
   p.foo();

This means that 'p' becomes a polymorphic pointer: It is of type Point, 
but it refers to an instance of a subclass (and subtype), ColorPoint.

This is a thing that you cannot do with class families in C++.  See below:

> template <class Family>
> void build (typename Family::Node *n, typename Family::Edge *e, bool b) {
> 	e->n1 = e->n2 = n;
> 	std::cout << (b == n->touches(e)) << std::endl;
> }
> 
> int main()
> {
> 	build<Graph>(new Graph::Node, new Graph::Edge, true);
> 	build<OnOffGraph>(new OnOffGraph::Node, new OnOffGraph::Edge, false);
> 	
> 	//build<OnOffGraph>(new OnOffGraph::Node, new Graph::Edge, true); //COMPILE
ERROR
> 	//build<OnOffGraph>(new Graph::Node, new OnOffGraph::Edge, false); //COMPILE
ERROR
> 
>         build<Graph>(new OnOffGraph::Node, new Graph::Edge, true);
> 	build<Graph>(new Graph::Node, new OnOffGraph::Edge, true);
>         build<Graph>(new OnOffGraph::Node, new OnOffGraph::Edge, true);

(These last three should actually be rejected if we had family 
polymorphism in C++; the first two are clearly wrong, because

> 
> 	return 0;
> }
> 

In these examples you are clearly indicating for every invocation of 
build what type arguments it receives; in fact, if a node is statically 
known as a Graph::Node but is actually an OnOffGraph::Node, then 
'touches' invoked on that node will be the one in Graph::Node (not the 
one in OnOffGraph::Node), and this means that 'enabled' will be ignored 
for that node.  This would most likely be an application level error for 
an OnOffGraph::Node, and it should have been a type error (which it 
won't be unless you have family polymorphism, you'll just get wrong 
semantics because the wrong method gets called).

In order to be able to work polymorphically on the nodes and edges, you 
would need the ability to write types which are looked up in objects. 
Only then (or at least that's the only solution I know) can you ensure 
that the types fit together (this kind of node and that kind of edge are 
a consistent family of classes), without becoming dependent on the exact 
classes at compile-time.  So you'd have something like this:

   final Graph myGraph = new OnOffGraph();
   myGraph.Node n = new myGraph.Node();
   myGraph Edge e = new myGraph.Edge(n,n);

   if (n.touches(e)) { ... }

and this would then be the right 'touches' method (the one that takes 
'enabled' into account) even though the statically known type of myGraph 
is only Graph, not OnOffGraph.

So, in summary, there are lots of differences, and they are pretty deep 
and sometimes subtle when it comes to understand why they do make a 
difference.

The differences are so complex that lots of people would never notice 
them (I know!), or they might assume that important differences must be 
simple differences, hence these differences are really not important.

I hope that you will consider my arguments for some time and end up in a 
mental state where the differences seem both important and 
understandable. :-)


   kind regards,

-- 
Erik Ernst                                 eernst@daimi.au.dk
Department of Computer Science, University of Aarhus, Denmark




----- Vége a továbbított üzenetnek -----