References


The Concept Behind Reference


In modern programming languages there are two important concepts that define
the behaviour of variables:

Scope: Defines the area of the program source where a certain identifier
is bound to a memory location. It also defines visibility: where the
identifier is valid to use and denotes that memory location.

Life: Defines the time span at run time during which the memory location is
safe to store our values. After that time span the memory location is not
safe to access.

In most cases we define scope and life in the same declaration:

// 1. allocates memory for an int type variable (in the stack)
// 2. binds the name "i" to this memory area.

int i;


Sometimes we can allocate a memory location without binding a name to it:

// 1. allocates memory for an int type variable (on the heap)
//    no name has been bound to this memory

new int;


And sometimes we can bind a new name to an existing memory location, whether
a name has already been bound to it or not.

// 2. binds the name "j" to memory area already called "i"

int &j = i;
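
The three cases above can be put side by side in one small sketch. The
function name binding_demo and the variables p and k are assumptions made
for illustration only; they do not come from the original notes.

```cpp
#include <cassert>

// Hypothetical helper: returns true when a reference behaves as a
// second name for the same memory location.
bool binding_demo()
{
    int i = 1;      // allocates memory and binds the name "i" to it
    int &j = i;     // binds the name "j" to the memory already called "i"

    j = 42;                             // a write through "j" ...
    bool same_value   = (i == 42);      // ... is visible through "i"
    bool same_address = (&i == &j);     // both names denote one object

    int *p = new int(7);    // heap memory without a name of its own
    int &k = *p;            // a name is bound to it afterwards
    k = 8;
    bool heap_ok = (*p == 8);
    delete p;               // "k" must not be used after this point

    return same_value && same_address && heap_ok;
}
```

Taking the address of a reference yields the address of the object it is
bound to, which is why &i and &j compare equal.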



Reference in Earlier Programming Languages


References have a history in Simula-67 and Algol-68.

Algol-68 distinguishes between binding a name to a certain memory
location and binding it to a variable. A reference is converted to the
value of the same type; this is called dereferencing the variable. This is
the way to distinguish between the left and right side of an assignment.

int i = 5;      // const int
ref int j := i; // int variable



A key difference between pointers and references: the null pointer


    Base *bp = ...;

    // null, if dynamic_cast is invalid
    if ( Derived *dp = dynamic_cast<Derived*>(bp) )
    {
        // ...
    }


    Base &br = ...;

    // throws exception, if dynamic_cast is invalid
    try
    {
        Derived &dr = dynamic_cast<Derived&>(br);
        // ....
    }
    catch ( const std::bad_cast& ) { ... }
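
A self-contained sketch of the same contrast follows. The class hierarchy
(Base, Derived, Other) and the two helper functions are assumptions made
for illustration; only the two forms of dynamic_cast come from the text.

```cpp
#include <typeinfo>     // std::bad_cast

struct Base    { virtual ~Base() {} };  // polymorphic base (assumed)
struct Derived : Base { };
struct Other   : Base { };

// Pointer form: a failed cast is reported as a null pointer.
bool is_derived_ptr(Base *bp)
{
    return dynamic_cast<Derived*>(bp) != nullptr;
}

// Reference form: there is no null reference, so a failed cast throws.
bool is_derived_ref(Base &br)
{
    try
    {
        Derived &dr = dynamic_cast<Derived&>(br);
        (void)dr;       // the reference is valid here
        return true;
    }
    catch (const std::bad_cast&)
    {
        return false;
    }
}
```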





Scope and Life Rules

A normal variable has both scope and life; a reference has only scope, and
relies on the life of the object it is bound to.


void f()
{
    int i;  // start of scope and life i

    int &ir = i;    // start of scope ir, ir bound to i

    ir = 5; // ok

}   // end of life i, end of scope i and ir


And this can lead to problems:


void f()
{
    int *ip = new int;  // start of life *ip

    int &ir = *ip;  // start of scope ir, ir bound to *ip

    delete ip;  // end of life *ip here

    ir = 5; // bad: *ip is no longer alive, undefined behaviour

}   // end of scope ir
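
The gap between scope and life can be made visible without touching the
dead object. The sketch below assumes a hypothetical type Tracked that
counts its live instances; it is not part of the original notes.

```cpp
// Hypothetical type that counts how many of its objects are alive.
struct Tracked
{
    static int alive;
    int value;
    Tracked() : value(0) { ++alive; }
    ~Tracked()           { --alive; }
};
int Tracked::alive = 0;

// Returns the number of live Tracked objects after the delete.
int alive_after_delete()
{
    Tracked *tp = new Tracked;  // start of life *tp
    Tracked &tr = *tp;          // start of scope tr, tr bound to *tp

    tr.value = 5;               // ok: *tp is alive

    delete tp;                  // end of life *tp here

    // tr is still in scope, but the object behind it is gone;
    // reading or writing tr here would be undefined behaviour.
    return Tracked::alive;
}   // end of scope tr
```

The name tr stays usable for the compiler until the closing brace, which is
exactly why the compiler cannot be relied on to catch this mistake.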




Parameter passing


Historically there are languages with parameter passing by address
(FORTRAN, PASCAL with the var keyword) and by value (PASCAL without var, C).
The former simply uses the same memory for the formal and the actual
parameter, while the latter copies the value of the actual parameter into a
local variable in the area of the subprogram.

In C++, parameter passing is based on initialization semantics.
There is an important difference between initialization and assignment:


int i = 3;  // initialization with a value:
            // constructor semantics

int j = i;  // initialization with a variable of its own type:
            // copy constructor semantics

    j = i;  // assignment
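
For int the difference is invisible, so the sketch below uses a
hypothetical class Counted that counts which special member function runs.
The class and the function init_vs_assign are assumptions made for
illustration.

```cpp
// Hypothetical class that records copy constructions and assignments.
struct Counted
{
    static int copies;          // number of copy constructor calls
    static int assignments;     // number of copy assignment calls
    int value;

    explicit Counted(int v) : value(v) {}
    Counted(const Counted &o) : value(o.value) { ++copies; }
    Counted& operator=(const Counted &o)
    {
        value = o.value;
        ++assignments;
        return *this;
    }
};
int Counted::copies = 0;
int Counted::assignments = 0;

void init_vs_assign()
{
    Counted a(3);   // initialization with a value: constructor
    Counted b = a;  // initialization with own type: copy constructor
    b = a;          // assignment: operator=
}
```

Although "Counted b = a;" is written with an equals sign, it is an
initialization, so it runs the copy constructor and not operator=.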


Parameter passing follows initialization semantics:


int i = 5;
int j = 6;

void f1( int x, int y)
{
    // ...
}

void f2( int &x, int &y)
{
    // ...
}


f1(i,j);   ==> int x = i;  // creates local x and copies i to x
               int y = j;  // creates local y and copies j to y

f2(i,j);   ==> int &x = i; // binds x as a new name to existing i
               int &y = j; // binds y as a new name to existing j
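
The effect on the caller can be checked with a minimal sketch. The two
function names below are hypothetical and chosen only for this example.

```cpp
// x is a local copy: the caller's variable is not changed.
void inc_by_value(int x)      { ++x; }

// x is a new name for the caller's variable: the change is visible.
void inc_by_reference(int &x) { ++x; }

// Applies both functions to a local variable and returns it.
int pass_demo()
{
    int i = 5;
    inc_by_value(i);        // int x = i;   i stays 5
    inc_by_reference(i);    // int &x = i;  i becomes 6
    return i;
}
```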




Reference as Return Type


Normally, when a C/C++ function returns by value, the returned object
is copied from the function to the target.


int f1()    // returns by value
{
    int i;  // local variable with automatic storage
    //...
    return i;   // return by value
}


It is also possible to define a function with a reference return type.
In that case the function call expression is bound to the returned object.


int& f2()    // returns by reference
{
    int i;  // local variable with automatic storage
    //...
    return i;   // ERROR: returns a reference to a local variable
}


The usage is different. If the returned object does not survive the
function call expression, then we must copy it. Otherwise, returning a
reference is allowed.


int  j = f1();  // ok: the value of i has been copied into j
int& k = f2();  // bad: no copy, k refers to a destroyed local variable
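
A safe case is a function that returns a reference to an object owned by
the caller. The function larger below is a hypothetical example, not
taken from the original notes.

```cpp
// Returns a reference to whichever argument holds the larger value.
// Both objects belong to the caller, so they survive the call.
int& larger(int &a, int &b)
{
    return a < b ? b : a;
}

// Uses the returned reference as the target of an assignment.
int larger_demo()
{
    int a = 1;
    int b = 2;
    larger(a, b) = 10;  // assigns to b through the returned reference
    return a + b;       // 1 + 10
}
```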



Typical usage of swap:


void swap( int &x, int &y)
{
    int tmp = x;
    x = y;
    y = tmp;
}

int i = 5;
int j = 6;

swap( i, j);
assert(i==6 && j==5);


But a (non-const) reference can be bound only to an lvalue:


swap(i,7);          // compile error: cannot bind a reference to 7
int &y = 7;

swap(i,3.14);       // compile error: conversion creates a temporary int,
int &y = int(3.14); // and a reference cannot bind to a temporary

const int &z1 = 7;      // ok: a const reference can bind to 7
const int &z2 = 3.14;   // ok: a const reference can bind to a temporary
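
The same rule is what allows literals and converted values to be passed to
const reference parameters. The functions twice and truncated below are
hypothetical names used only in this sketch.

```cpp
// A const reference parameter accepts lvalues, literals and temporaries.
int twice(const int &x)
{
    return 2 * x;
}

// The double 3.14 is converted to a temporary int, and the const
// reference keeps that temporary alive until the end of the scope.
int truncated()
{
    const int &z2 = 3.14;
    return z2;
}
```

Calling twice(3.14) first converts the argument to a temporary int, so the
function sees the truncated value and not the original double.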



Usage examples:

class date
{
public:
  // return a reference to the original object
  date& setYear(int y)  { _year = y;  return *this; }
  date& setMonth(int m) { _month = m; return *this; }
  date& setDay(int d)   { _day = d;   return *this; }

  // returns a reference to the original (incremented) object
  date& operator++() { ++_day; return *this; }
  // returns by value a copy of the object taken before the increment
  date  operator++(int) { date curr(*this); ++_day; return curr; }

private:
  int _year;
  int _month;
  int _day;
};


date d;
++d.setYear(2011).setMonth(11).setDay(11);  // still a reference
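
To try the chaining, the class needs a way to observe its state. The sketch
below repeats the class with a default constructor and three getters; those
additions (the constructor defaults, year(), month(), day()) are
assumptions, and like the original it ignores month overflow.

```cpp
class date
{
public:
    date() : _year(1970), _month(1), _day(1) {}     // assumed defaults

    // setters return a reference to the object, so calls can be chained
    date& setYear(int y)  { _year = y;  return *this; }
    date& setMonth(int m) { _month = m; return *this; }
    date& setDay(int d)   { _day = d;   return *this; }

    // prefix: increments and returns the object itself
    date& operator++()    { ++_day; return *this; }
    // postfix: returns a copy taken before the increment
    date  operator++(int) { date curr(*this); ++_day; return curr; }

    // hypothetical getters, added only to observe the result
    int year()  const { return _year; }
    int month() const { return _month; }
    int day()   const { return _day; }

private:
    int _year;
    int _month;
    int _day;
};
```

Because every setter returns date&, the final ++ in the chained expression
is applied to d itself and not to a copy.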



Optimization


A typical use of references is returning from a function by reference
rather than by value. Returning a reference is often more efficient
than copying. In some cases, however, copying is a must.


template <typename T>
class matrix
{
public:
    // ...
          T& operator()(int i, int j)       { return v[i*cols+j]; }
    const T& operator()(int i, int j) const { return v[i*cols+j]; }

    matrix& operator+=( const matrix& other)
    {
        for (int i = 0; i < cols*rows; ++i)
            v[i] += other.v[i];

        return *this;   // allows chaining
    }
private:
    // ...
    T* v;
};

template <typename T>
matrix<T> operator+( const matrix<T>& left, const matrix<T>& right)
{
    matrix<T> result(left);
    result += right;

    return result;
}


Let us examine the return values one by one:


    T& operator()(int i, int j)  { return v[i*cols+j]; }

This function returns a reference to the selected value, allowing clients
to modify the appropriate element of the matrix:


    matrix<double> dm(10,20);
    // ...
    dm(2,3) = 3.14;  // modifies matrix element
    cout << dm(2,3); // copies matrix element
    dm(2,3) += 1.1;  // modifies matrix element

    double& dr = dm(2,3);   // doesn't copy
    dr += 1.1;              // modifies matrix element


A second version of operator() is presented to read constant matrices.


  const T& operator()(int i, int j) const { return v[i*cols+j]; }

This const member function must return a const reference, otherwise
const-correctness would leak:


    const matrix<double> cm = dm;
    // ...
    cm(2,3) = 3.14;  // compile error: returns a const reference
    cout << cm(2,3); // ok: copies (reads) matrix element
    cm(2,3) += 1.1;  // compile error: returns a const reference

    double& dr = cm(2,3);        // compile error: const reference
                                 // does not convert to reference
    const double& cdr = cm(2,3); // ok, doesn't copy
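
A compilable sketch of the two accessors follows. It assumes a constructor
taking rows and columns, and it stores the elements in a std::vector
instead of the raw T* used in the text, so that no destructor or copy
constructor has to be written for the example.

```cpp
#include <cstddef>
#include <vector>

template <typename T>
class matrix
{
public:
    // assumed constructor: r rows, c columns, elements value-initialized
    matrix(int r, int c)
        : rows(r), cols(c), v(static_cast<std::size_t>(r) * c) {}

    // non-const access: callers may modify the element
          T& operator()(int i, int j)       { return v[i*cols+j]; }
    // const access: callers may only read the element
    const T& operator()(int i, int j) const { return v[i*cols+j]; }

private:
    int rows;
    int cols;
    std::vector<T> v;   // assumption: replaces the raw T* of the text
};
```

Overload resolution picks the const version whenever the matrix object
itself is const, which is how the read-only guarantee reaches the elements.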


Most assignment operators are defined with a (non-const) reference type as
return value:


    matrix& operator=(const matrix& other)
    {
        if ( &other == this ) return *this;     // self-assignment

        delete [] v;

        cols = other.cols;
        rows = other.rows;
        v = new T[cols*rows];

        for (int i = 0; i < cols*rows; ++i)
            v[i] = other.v[i];

        return *this;
    }

    matrix& operator+=( const matrix& other)
    {
        for (int i = 0; i < cols*rows; ++i)
            v[i] += other.v[i];

        return *this;
    }

This is more efficient than returning by value, and it also avoids
unnecessary restrictions:


    dm1 = dm2 = dm3;

    ++( dm1 += dm2 );



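But there is another solution:

    matrix& operator=(matrix other)     // other is already a copy
    {
        // no self-assignment check: other is always a distinct object

        cols = other.cols;
        rows = other.rows;
        T* t = v;
        v    = other.v;
        other.v = t;    // old storage is released when other is
                        // destroyed (assuming the destructor deletes v)
        return *this;
    }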
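
This technique is commonly called copy-and-swap. A minimal, self-contained
sketch of it follows; the class buffer and all of its members are
hypothetical and only stand in for the matrix of the text.

```cpp
#include <algorithm>    // std::copy
#include <utility>      // std::swap

// Hypothetical owner of a heap array of int.
class buffer
{
public:
    explicit buffer(int n) : size(n), v(new int[n]()) {}
    buffer(const buffer &o) : size(o.size), v(new int[o.size])
    {
        std::copy(o.v, o.v + o.size, v);
    }
    ~buffer() { delete [] v; }

    // The parameter is passed by value, so the copy is made by the
    // copy constructor before the body runs.
    buffer& operator=(buffer other)
    {
        std::swap(size, other.size);
        std::swap(v, other.v);
        return *this;
    }   // other is destroyed here and releases the old storage

    int& operator[](int i) { return v[i]; }
    int  length() const    { return size; }

private:
    int  size;
    int *v;
};
```

If the copy constructor throws, the assignment never starts, so the target
object is left unchanged; self-assignment also works without a special
check, at the price of one extra copy.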




In the previous examples returning a reference (or constant reference) was
safe, because the memory area that was referred to survived the operation.
In the following addition operator, returning a reference (or constant
reference) would be a major fault:


template <typename T>
matrix<T> operator+( const matrix<T>& left, const matrix<T>& right)
{
    matrix<T> result(left);     // local variable: automatic storage
    result += right;

    return result;      // result will disappear: must copy
}
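
The division of labour between the two operators can be tried on a smaller
type. The struct vec2 below is a hypothetical stand-in for the matrix:
operator+= modifies an existing object and returns a reference to it,
while operator+ builds a new object and must return it by value.

```cpp
// Hypothetical two-component vector used in place of the matrix.
struct vec2
{
    int x;
    int y;
    vec2(int x_, int y_) : x(x_), y(y_) {}

    // the left operand survives the call: returning a reference is safe
    vec2& operator+=(const vec2 &other)
    {
        x += other.x;
        y += other.y;
        return *this;
    }
};

// result is a local variable: it must be returned by value
vec2 operator+(const vec2 &left, const vec2 &right)
{
    vec2 result(left);  // local variable: automatic storage
    result += right;

    return result;      // copied (or moved) out of the function
}
```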