From scamel@elte.hu Sat Aug 12 20:53:15 2006 Date: Thu, 10 Aug 2006 12:54:41 +0200 From: Zolyomi Istvan To: gsd@elte.hu Subject: Ernst level (Re: On family polymorphism) ----- Továbbított levél Scam címről ----- Dátum: Wed, 16 Jul 2003 11:19:27 +0200 (CEST) Feladó: Scam Viszontválasz cím: Scam Tartalom: Re: On family polymorphism (fwd) Címzett: scamel@elte.hu ---------- Forwarded message ---------- Date: Mon, 07 Jul 2003 18:14:10 +0200 From: Erik Ernst To: Scam Subject: Re: On family polymorphism Scam wrote: > Dear Sir, > > We are working at Eotvos Lorand University, Budapest (remember > ECOOP2001). We enjoyed your presentation on family polymorphism where you > introduced the problem and gave solution in gbeta. We considered the > problem and we think we have found a solution in C++ using > only standard template features. In essence our approach is similar to > your gbeta solution. Hello Istvan and Zoltan, Thank you for your interest! I think your approach in fact is quite different from the gbeta solution I presented, but I'll have to go into some detail in order to clarify the differences and then I'll try to convince you that these differences are really important. :-) > [..] > > #include > > struct Graph > { > struct Node; > struct Edge { Node *n1, *n2; virtual ~Edge(); }; > struct Node { > virtual bool touches(Edge *e) { > return (this == e->n1) || (this == e->n2); > } > }; > }; You are creating a class (or struct, but I don't want to distinguish between the two here) 'Graph' with two other classes 'Node' and 'Edge' nested inside. Nesting in C++ and in gbeta are two very different things, because nested C++ classes are just global classes whose names are enclosed in the name space of the enclosing class. This means that you can use a nested class just like a global class, except for the naming. E.g., struct Graph { struct Node {}; }; int main(int argc, char *argv[]) { Graph::Node *n = new Graph::Node(); return 0; } In this main function we are creating an object which is an instance of the Graph::Node class, and there never existed an instance of the Graph class. At first sight this might seem to be good, because it means that we can create instances of nested classes without worrying about the connection to instances of enclosing classes. But it does in fact reduce the whole notion of nested classes to a naming issue: Move the class Graph::Node to the global level, adjust usages inside Node of static properties of Graph (they would now need to be 'Graph::something' where they could formerly be plain 'something'), and the program is semantically the same, now without usage of nested classes. The reason why it is not good is that it is such a superficial concept: It can be defined out of the language by means of a relatively simple transformation. One tell-tale sign of the superficiality is that nested classes cannot access anything non-static in their enclosing classes. One step up from this is the notion of inner classes is Java. Here, an instance of an inner class is created in context of one particular enclosing object, which is an instance of the syntactically enclosing class. For that inner instance it is possible to access not just static properties but also ordinary instance variables and methods of enclosing objects (now we really have enclosing *objects*, not just enclosing classes). This cannot be removed from the language by a simple syntactical transformation; we would need to introduce a reference to the enclosing object in every inner class, to choose the right type for that reference, to make sure that it gets initialized during object construction and never changed thereafter, and to change expressions using features of enclosing objects such that they explicitly find the enclosing objects by following these references. In return for the added complexity, we get encapsulation: For each instance of an enclosing class, we create a universe of shared state for all the instances of the nested classes. For example: class University { String name; class Student { .. } } For every instance of University we may have any number of instances of Student, and all these students will share data from the enclosing University, e.g., its name. It is similar to having static features that you can create as needed (ordinary features are non-shared, e.g., ordinary instance variables in a class; static features are shared, i.e., they are effectively just global variables and we will have exactly one of them per declaration; features in an enclosing object are shared among instances of the enclosed class, and we can have as many of them as we like). This ability to have sharing in between 'one global thing' and 'one thing per object' is an important program organization principle, and it takes a non-trivial amount of work to simulate this facility by means of explicit pointers and distinguished-objects-holding-shared-state. In Java, the nested classes are nested in classes, not in objects (so we only have one University.Student class, even if we have several instances of University). This means that we cannot distinguish (in the type analysis, nor by means of dynamic checks) between MyUniversity.Student and SomeOtherUniversity.Student -- they are simply the same class. In gbeta, nested classes are attributes of the enclosing _object_ not the enclosing class, and this means that we can create an unlimited number of Student classes by creating that many University classes. In most languages you will have two write a larger program to obtain more classes, but in gbeta you can have as many classes as you wish with one fixed, small program. Since type safety is largely about being able to see the difference between entities which are technically the same (e.g., a couple of thousand bits in memory) but conceptually different (e.g., a bitmap picture or a University object), it is my basic claim that you get a more expressive type system, and better type safety, by using a more fine grained notion of types. Moreover, the fact that classes are features of enclosing objects, not of enclosing classes, makes it possible to have virtual classes. A virtual class attribute is an attribute of an object whose value is a class, this class being determined by a process that is similar to method overriding. This means that we do not (always) know at compile-time what class is the actual value of a given virtual class attribute (so we may know that MyUniversity.Student is a class, and that it is, e.g., a subclass of a global Student class, but we don't know exactly what class it is). Where I used virtual classes in the gbeta example, you are using nested C++ classes. This has a lot of consequences, especially with respect to the dynamic behavior of 'new' statements, and withe respect to the type analysis. First, if you would use a 'new Node()' statement in your 'Graph' class, the outcome would be an instance of Graph::Node, even if executed in context of an instance of OnOffGraph (since the enclosing object is by definition not considered in connection with C++ nested classes, this would be very hard to fix).. This means that an inherited factory method in OnOffGraph would be useless: It would create Nodes (or Edges) from the simple Graph, and they would (1) be type-incompatible with OnOffGraph::Node/Edge and (2) even a dynamic cast would not help you because these objects really _are_ simple Graph::Node/Edge, not OnOffGraph. So you cannot create objects and inherit the code. This is a very serious restriction. > Graph::Edge::~Edge() { delete n1; delete n2; } > > struct OnOffGraph > { > struct Node; > struct Edge : public Graph::Edge { bool enabled; }; > struct Node : public Graph::Node { > bool touches(Edge *e) { > return e->enabled && Graph::Node::touches(e); > } > }; > }; Note that Graph::Node::touches and OnOffGraph::Node::touches are two distinct methods (you are _not_ doing virtual redefinition of the same method, you are just creating two methods which happen to have the same name, because they do not have type compatible argument lists). Next, you don't even bother to inherit from Graph to OnOffGraph. :-) I don't know whether that is an oversight or if it was by design, but it certainly makes it hard to access nodes and edges of different kinds of graphs polymorphically. Basically, this means that you cannot do anything with Nodes and Edges without telling the compiler exactly what kinds of nodes and edges you are talking about. Consider the same situation for simple classes (not families), e.g., Point and ColorPoint: class Point { int x,y; void foo() {} } class ColorPoint extends Point { color c; } Point p = new Point(); ColorPoint cp = new ColorPoint(); Now we can do p.foo(); cp.foo(); because a 'foo' method with the right argument lists is available in the statically known type of both 'p' and 'cp'. However, you can also do the following: p = cp; p.foo(); This means that 'p' becomes a polymorphic pointer: It is of type Point, but it refers to an instance of a subclass (and subtype), ColorPoint. This is a thing that you cannot do with class families in C++. See below: > template > void build (typename Family::Node *n, typename Family::Edge *e, bool b) { > e->n1 = e->n2 = n; > std::cout << (b == n->touches(e)) << std::endl; > } > > int main() > { > build(new Graph::Node, new Graph::Edge, true); > build(new OnOffGraph::Node, new OnOffGraph::Edge, false); > > //build(new OnOffGraph::Node, new Graph::Edge, true); //COMPILE ERROR > //build(new Graph::Node, new OnOffGraph::Edge, false); //COMPILE ERROR > > build(new OnOffGraph::Node, new Graph::Edge, true); > build(new Graph::Node, new OnOffGraph::Edge, true); > build(new OnOffGraph::Node, new OnOffGraph::Edge, true); (These last three should actually be rejected if we had family polymorphism in C++; the first two are clearly wrong, because > > return 0; > } > In these examples you are clearly indicating for every invocation of build what type arguments it receives; in fact, if a node is statically known as a Graph::Node but is actually an OnOffGraph::Node, then 'touches' invoked on that node will be the one in Graph::Node (not the one in OnOffGraph::Node), and this means that 'enabled' will be ignored for that node. This would most likely be an application level error for an OnOffGraph::Node, and it should have been a type error (which it won't be unless you have family polymorphism, you'll just get wrong semantics because the wrong method gets called). In order to be able to work polymorphically on the nodes and edges, you would need the ability to write types which are looked up in objects. Only then (or at least that's the only solution I know) can you ensure that the types fit together (this kind of node and that kind of edge are a consistent family of classes), without becoming dependent on the exact classes at compile-time. So you'd have something like this: final Graph myGraph = new OnOffGraph(); myGraph.Node n = new myGraph.Node(); myGraph Edge e = new myGraph.Edge(n,n); if (n.touches(e)) { ... } and this would then be the right 'touches' method (the one that takes 'enabled' into account) even though the statically known type of myGraph is only Graph, not OnOffGraph. So, in summary, there are lots of differences, and they are pretty deep and sometimes subtle when it comes to understand why they do make a difference. The differences are so complex that lots of people would never notice them (I know!), or they might assume that important differences must be simple differences, hence these differences are really not important. I hope that you will consider my arguments for some time and end up in a mental state where the differences seem both important and understandable. :-) kind regards, -- Erik Ernst eernst@daimi.au.dk Department of Computer Science, University of Aarhus, Denmark ----- Vége a továbbított üzenetnek -----