Sunday, 8 December 2013

Using TypeTokens to retrieve generic parameters

Note: I've recovered this article from http://archive.org.
The original article is presented here mostly untouched.



Super Type Tokens, a technique associated with the Type-safe Heterogeneous Container (or simply THC) pattern, are very well described in an article by Neal Gafter, who explains how Super Type Tokens can be used to retrieve Run Time Type Information (RTTI) which would otherwise be lost in a process known as type erasure.


Overview

There are circumstances where you'd like to have a class which behaves in different ways depending on generic parameters.

Contrary to what is widely believed, the effects of type erasure can be worked around in some cases, which means that the callee is able to find out which generic parameters were employed during the call.

For example, imagine a List which relies not on the Java Collections Framework (JCF) classes but on an array of primitives, because you expect it to perform much better than the JCF classes. So, you'd like to tell the List that it should internally allocate an array of ints or an array of doubles, depending on the generic parameter you specify. Something like this:
List<Integer> myList = new PrimitiveList<Integer>();
... would be backed by an int[] whilst
List<Double> myList = new PrimitiveList<Double>();
... would be backed by a double[].

The problem: Type Erasure

When generics were introduced in Java 5, it was decided that this feature would be offered by javac (the Java compiler) and that only minimal changes would be made to other components of the architecture. The big benefit of this decision is that the implementation of this feature was relatively simple and posed only a small risk to existing Java applications, preserving the compatibility and stability of the Java ecosystem as a whole.

The problem with this implementation is that only javac knows about the generic types you specified in your source code. This knowledge exists only at compilation time. At run time, your callee class has no clue which generic parameters were employed during the call. This happens because information about generic types is lost in a process known as type erasure: javac does not carry the type arguments it knows at compilation time over to the objects created at run time, so your running application knows nothing about the type arguments you wrote in your source code.
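The effect of erasure is easy to observe. The following is a minimal sketch (the class name ErasureDemo and the helper sameRuntimeClass are made up for illustration) showing that two lists declared with different type arguments end up with exactly the same runtime class:

```java
import java.util.ArrayList;
import java.util.List;

class ErasureDemo {

    // Returns true when both lists share the same runtime class.
    static boolean sameRuntimeClass(final List<?> a, final List<?> b) {
        return a.getClass() == b.getClass();
    }

    public static void main(final String[] args) {
        final List<Integer> ints = new ArrayList<Integer>();
        final List<String> strings = new ArrayList<String>();

        // Both objects are plain ArrayList instances at run time;
        // neither remembers the type argument it was declared with.
        System.out.println(sameRuntimeClass(ints, strings)); // prints "true"
    }
}
```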

Confused? Well ... it basically means that the code below is not possible:
class MyClass<T> {
    private final T o;

    public MyClass() {
        this.o = new T();
    }
}
... because at run time MyClass does not actually know anything about the type of generic parameter T. Although javac is able to validate the syntax and semantics of your source code at compile time, at run time all information regarding generic type T is lost.

Actually, the previous statement may not be 100% correct under certain circumstances. This is what we will see in the next topic.

How type erasure can be avoided

When generics were introduced in Java 5, the type system was revised and, long story short, information about generic types can be made available at run time under specific circumstances. This is a very important concept for our discussion here:
Generic types are available to anonymous classes.

Anonymous classes

Let's briefly discuss what an anonymous class is and also what it is not.
Let's suppose we are instantiating MyClass like this:
MyClass<Double> myInstance = new MyClass<Double>() {
        //
        // Some code here
        //
        // In this block we are adding functionality to MyClass
        //
};

We are actually creating an instance of MyClass, but also we are adding some additional code to it, which is enclosed by curly braces.

What it means is that we are creating an object myInstance of an anonymous subclass of MyClass. It does not mean that MyClass is itself an anonymous class! MyClass is definitely not anonymous because you have declared it somewhere else, correct?

In the snippet of code above we are using an extended thing made from our original definition of MyClass plus some more logic. This extended thing is the anonymous class we are talking about. In other words, the class of myInstance (what myInstance.getClass() returns) was never declared under any name, which means it is anonymous.
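The distinction can be checked with plain reflection. Below is a minimal sketch, where Box is a hypothetical stand-in for MyClass, showing that the object created with curly braces belongs to a different, anonymous class whose superclass is the declared one:

```java
class AnonymousClassDemo {

    // Hypothetical stand-in for the article's MyClass.
    static class Box<T> { }

    public static void main(final String[] args) {
        final Box<Double> plain = new Box<Double>();
        final Box<Double> anonymous = new Box<Double>() { };

        // The plain instance belongs to the declared class itself.
        System.out.println(plain.getClass() == Box.class);                     // prints "true"

        // The second instance belongs to a compiler-generated subclass.
        System.out.println(anonymous.getClass() == Box.class);                 // prints "false"
        System.out.println(anonymous.getClass().isAnonymousClass());           // prints "true"
        System.out.println(anonymous.getClass().getSuperclass() == Box.class); // prints "true"
    }
}
```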

How javac handles anonymous classes

When javac finds an anonymous class, it creates data structures in the bytecode (which are available at run time) which hold the actual generic type parameters employed during the call. So, we have another very important concept here:
The Java compiler employs type erasure when objects are instantiated
except when objects are instantiated from anonymous classes.

In other words, our aforementioned MyClass does not know any generic type information when it is called like this
MyClass<Double> myClass = new MyClass<Double>();
but it does know generic type information when it is called like this:
MyClass<Double> myClass = new MyClass<Double>() { /* something here */ };

To obtain generic type information at run time, you have to change the call so that it instantiates an anonymous class derived from your original class rather than the original class itself. In the next topic we will cover what needs to be done in your implementation of MyClass in order to retrieve generic type information, but the key point is that it will not work unless you instantiate an anonymous class of your defined class. So:
MyClass<Double> myClass1 = new MyClass<Double>();     // type erasure DOES happen
MyClass<Double> myClass2 = new MyClass<Double>() { }; // type erasure DOES NOT happen!

Notice that you only need to have an anonymous class; you don't have to add any logic to it if you don't need anything additional. As you can see where object myClass2 is created, the anonymous block is absolutely empty in this example.
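The difference between the two calls can be made visible with the JDK method Class.getGenericSuperclass. The following is a minimal sketch, assuming a hypothetical class Box in place of MyClass and a made-up helper describe:

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

class GenericSuperclassDemo {

    // Hypothetical stand-in for the article's MyClass.
    static class Box<T> { }

    // Describes what reflection can see about the superclass of an object's class.
    static String describe(final Object o) {
        final Type superclass = o.getClass().getGenericSuperclass();
        if (superclass instanceof ParameterizedType) {
            final Type arg = ((ParameterizedType) superclass).getActualTypeArguments()[0];
            return "parameterized by " + arg.getTypeName();
        }
        return "no type arguments recorded";
    }

    public static void main(final String[] args) {
        // Direct instantiation: the class is Box and its superclass is plain Object.
        System.out.println(describe(new Box<Double>()));     // prints "no type arguments recorded"

        // Anonymous instantiation: the superclass is recorded as Box<Double>.
        System.out.println(describe(new Box<Double>() { })); // prints "parameterized by java.lang.Double"
    }
}
```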

Classical solution

Let's review what we are interested in here: generic types, which are types. Observe that types are ultimately class definitions. So, we would like to give our class MyClass<T> the ability to know the class behind its generic parameter T, conceptually a T.class.

In the classical solution described here, this is done very easily by simply passing what we need during the call. Something like this:
MyClass<Double> myClass = new MyClass<Double>(Double.class);

Observe that this is not a very good solution because you have to mention Double three times: (1) when you define the type, (2) when you pass the generic parameter and (3) when you pass the formal parameter Double.class. It looks too verbose and too repetitive, doesn't it?

Anyway, this is what the great majority of developers do. They simply state that Double is the generic parameter and then pass Double.class right after as a formal parameter during the call. Although it works, the code does not look optimal and it may even lead to bugs later, when your application becomes bigger and you start to refactor things.
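To make the classical solution concrete, here is a minimal sketch. The class Factory and its method create are hypothetical names chosen for illustration; the point is only that the explicit Class object gives the callee what "new T()" cannot, provided the type has an accessible no-argument constructor:

```java
class ClassTokenDemo {

    // Hypothetical factory which receives the element type explicitly.
    static class Factory<T> {

        private final Class<T> type;

        Factory(final Class<T> type) {
            this.type = type;
        }

        // Creates an instance through reflection.
        // Assumes the type has an accessible no-argument constructor.
        T create() throws ReflectiveOperationException {
            return type.getDeclaredConstructor().newInstance();
        }
    }

    public static void main(final String[] args) throws ReflectiveOperationException {
        // The type is mentioned three times, as discussed above.
        final Factory<StringBuilder> factory = new Factory<StringBuilder>(StringBuilder.class);
        final StringBuilder sb = factory.create();
        System.out.println(sb.append("created")); // prints "created"
    }
}
```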

More flexible solution

We have already visited a classical solution for the problem of type erasure and we have seen how an anonymous call can be made. Now we need to understand how generic types can be retrieved at run time without having to mention Double as many times as we did in the classical solution.

Going straight to the point, let's define a skeleton of our MyClass which does the job we need, joining some ideas from the classical solution with some incantation offered by a class called TypeTokenTree. Below we explain the general concept:
import org.jquantlib.lang.reflect.TypeTokenTree;

public class MyClass<T> {

    private final Class<?> typeT;

    public MyClass(final Class<?> typeT) {
        this.typeT = typeT;
        init();
    }

    public MyClass() {
        this.typeT = new TypeTokenTree(this.getClass()).getElement(0);
        init();
    }

    private void init() {
        // perform initializations here
    }
}

The code above allows you to call MyClass employing 2 different strategies:
MyClass<Double> myClass1 = new MyClass<Double>(Double.class); // classical solution
MyClass<Double> myClass2 = new MyClass<Double>() { };         // only sorcerers do this

Notice that object myClass1 employs the classical solution we described, which is what the great majority of developers do. The object myClass2 was created using the incantation explained in this article and we will explain it better below.

Digging the solution

Class TypeTokenTree is a helper class which returns the Class of the n-th generic parameter. In the line
this.typeT = new TypeTokenTree(this.getClass()).getElement(0);

We are building an instance of TypeTokenTree, passing the actual class of the current instance and asking for the 0-th generic type parameter.

Please observe the wording: the actual class of the current instance may or may not be MyClass. Got it? The actual class of the current instance will not be MyClass if you employed an anonymous call. In this case, i.e. when you have an anonymous call, javac generates code which keeps generic type information available in the bytecode. Notice that:
TypeTokenTree fails when a non-anonymous call is done!

This is OK. Actually, there's no way for it to behave differently! It's the application's responsibility to recover from such a situation.

In the references section below you can find links to class TypeTokenTree and another class it depends on: TypeToken. These files are implemented as part of JQuantLib and contain code which is specific to JQuantLib and may not be convenient for everyone. For this reason, below we present modified versions of these classes which aim to be independent of JQuantLib and to explain in detail how the aforementioned incantation works.

First of all, you need to have a look at method getGenericSuperclass from the JDK. This method is the root of the incantation: it traverses data structures created in the bytecode by javac, which provide type information regarding the generic types you employed. In general, getGenericSuperclass returns a plain Class (for example Object.class) which carries no type arguments; this is what happens when the current instance belongs to the original, non-anonymous class. In the rare circumstances where you employ anonymous classes, getGenericSuperclass returns a ParameterizedType instead. And this is how we do this magic.

When getGenericSuperclass returns a ParameterizedType, you have the opportunity to traverse the data structure javac created in the bytecode and discover what was available at compile time (finally!), effectively working around type erasure.
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

public static Type getType(final Class<?> klass, final int pos) {
    // obtain the generic superclass of the class of 'this' instance
    final Type superclass = klass.getGenericSuperclass();

    // test if an anonymous class was employed during the call:
    // only then is the superclass recorded together with its type arguments
    if ( !(superclass instanceof ParameterizedType) ) {
        throw new RuntimeException("This instance should belong to an anonymous class");
    }

    // obtain RTTI of all generic parameters
    final Type[] types = ((ParameterizedType) superclass).getActualTypeArguments();

    // test if enough generic parameters were passed
    if ( pos < 0 || pos >= types.length ) {
        throw new RuntimeException(String.format(
            "Could not find generic parameter %d because only %d parameters were passed",
            pos, types.length));
    }

    // return the type descriptor of the requested generic parameter
    return types[pos];
}
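To see the whole mechanism working end to end without any JQuantLib dependency, here is a minimal, self-contained sketch. The class TypeAware and its accessor getTypeT are hypothetical names standing in for MyClass; the lookup is done inline with plain JDK reflection instead of TypeTokenTree:

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

class SuperTypeTokenDemo {

    // Hypothetical stand-in for MyClass which resolves T by itself.
    static class TypeAware<T> {

        private final Type typeT;

        TypeAware() {
            // For an anonymous subclass this is TypeAware<X>; otherwise it is Object.
            final Type superclass = getClass().getGenericSuperclass();
            if (!(superclass instanceof ParameterizedType)) {
                throw new IllegalStateException(
                    "Instantiate with an anonymous class: new TypeAware<X>() { }");
            }
            this.typeT = ((ParameterizedType) superclass).getActualTypeArguments()[0];
        }

        Type getTypeT() {
            return typeT;
        }
    }

    public static void main(final String[] args) {
        final TypeAware<Double> ok = new TypeAware<Double>() { };
        System.out.println(ok.getTypeT() == Double.class); // prints "true"

        try {
            new TypeAware<Double>(); // no anonymous block, so nothing was recorded
        } catch (final IllegalStateException e) {
            System.out.println("rejected"); // prints "rejected"
        }
    }
}
```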

Pros and cons

The big benefit of employing Type Tokens is that the code becomes less redundant, I mean:
MyClass<Double> myClass = new MyClass<Double>() { };
... is absolutely enough. You don't need anything like this:
MyClass<Double> myClass = new MyClass<Double>(Double.class);
On the other hand, the code also becomes obscure, because forgetting to add the anonymous block ends in an exception thrown by class TypeToken.
MyClass<Double> myClass = new MyClass<Double>() { }; // succeeds
MyClass<Double> myClass = new MyClass<Double>();     // TypeTokenTree throws an Exception

The point is: this technique is not widely advertised and most developers have never heard that this can be done. If you are sharing your code with your peers, contributors or clients, chances are that you will have to spend some time explaining the magic the code does. Developers often forget to make the call properly, which leads to failures at run time, as explained above.

There's also a small performance penalty imposed when TypeToken is called, since it inspects the class at run time, whereas in the classical solution this information is available at compile time and javac can simply write it down straight away when you call
MyClass<Double> myClass = new MyClass<Double>(Double.class);

Test Cases

OK. Now that you have visited the theory, you'd like to see how this thing really works. Below you can find some test cases which exercise classes TypeToken and TypeTokenTree. These test cases cover some varied scenarios and they should be enough to illustrate how the techniques explained here can be used in the real world.
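As an illustration of the kind of scenarios worth exercising, here is a minimal sketch which relies only on JDK reflection and not on the JQuantLib classes. The names Token, DoubleToken and forwarded are hypothetical. It also suggests that a named subclass records its type arguments just like an anonymous one, and that a type variable passed through a generic method is not resolved to a concrete class:

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.lang.reflect.TypeVariable;
import java.util.List;

class TokenScenarios {

    // Hypothetical token class which captures its own type argument.
    static class Token<T> {
        final Type type;

        Token() {
            final Type superclass = getClass().getGenericSuperclass();
            if (!(superclass instanceof ParameterizedType)) {
                throw new IllegalStateException("A subclass is required: new Token<X>() { }");
            }
            type = ((ParameterizedType) superclass).getActualTypeArguments()[0];
        }
    }

    // A named subclass fixes the type argument in its declaration.
    static class DoubleToken extends Token<Double> { }

    // Inside a generic method only the type variable E is recorded.
    static <E> Token<E> forwarded() {
        return new Token<E>() { };
    }

    public static void main(final String[] args) {
        // Scenario 1: anonymous class with a concrete type argument.
        System.out.println(new Token<Double>() { }.type == Double.class);                        // prints "true"
        // Scenario 2: named subclass.
        System.out.println(new DoubleToken().type == Double.class);                              // prints "true"
        // Scenario 3: a nested generic type argument is itself a ParameterizedType.
        System.out.println(new Token<List<String>>() { }.type instanceof ParameterizedType);     // prints "true"
        // Scenario 4: a forwarded type variable stays a TypeVariable.
        System.out.println(TokenScenarios.<String>forwarded().type instanceof TypeVariable);     // prints "true"
    }
}
```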


References





If you found this article useful, it will be much appreciated if you create a link to this article somewhere in your website.

Thanks

Richard Gomes 20:16, 3 January 2011 (GMT) [Date of the original article]

5 comments:

  1. Should the examples instantiating MyClass be passing in the string parameter for both the classical and the anonymous class approach?

  2. Hi :) I've removed the String argument you mentioned, since it was not needed in the context of the explanation. It was there only occupying space and potentially causing confusion.
    Thanks for your contribution! :)

  3. Thanks! This was a very interesting post.

  4. That's very interesting! However, I am curious; should we worry about performance for cases when we need a lot of instantiations of these types of classes?
    I mean; is "new TypeTokenTree(this.getClass()).getElement(0)" costly?
    The way your article is written makes it really easy to understand, congrats!

  5. Hello Bob. I've never run performance tests and I'm not even able to guess how much this feature would cost. If you are going to create a large number of instances employing this technique, it's advisable to do some performance tests.

    There's also the point I've already raised about users not being able to understand the technique explained here, because it's obscure and definitely not well known.

    Thanks for your comments. Cheers :)
