Friday, 4 April 2014

IT Consulting Business


A chronicle about IT consulting business, based on real life, sort of.


Let me present you a story. I hope you enjoy.

First of all, lets create a mental picture of a fridge which rockets itself to the Moon.

Our sales people proposed that apparatus to a client... and the client bought!
We don't have such apparatus yet ... and nobody knows how to do that ...  but we need to deliver that in 6 months, anyway.

OK. This is what we sold to a client. So, lets make that happen!

OK. Lets hire a bunch of very well paid crazy techie guys who must accomplish the difficult task of creating such apparatus in a matter of months. Well, despite very capable and working 12-15 hours a day, these crazy techie guys failed to create such apparatus. Not really a surprise, to be honest. Anyway, despite of the failure...  they were able to create something which looks like a rocket, something which is more or less rectangular and ... better than anyone expected: it is painted in white!

Eventually such apparatus is barely capable of reaching the Moon... which is good!

Our sales guys are satisfied enough and more surprising than that: the client is satisfied too! In particular, the client loved the colour it is painted.

OK. Supplier (us) and client agrees that such apparatus must regularly make travels to the Moon.


Initially things did not work very well, so we've
decided to kept 3 of those very well paid crazy techie guys. They fixed lots of things, fortunately!  After 3 or 4 more months the apparatus works much, much better. Only 5 passengers died and we covered all funeral costs, obviously. I think we are just lucky! I guess we can now get rid of those 3 very well paid techie guys.

And that's it. Once the product is delivered the supplier (us) is only interested on getting the annual support fees paid, which must obviously cover all maintenance costs and obviously must produce maximum profits. Profit is all that matters, really.

The supplier (us) is not interested anymore to make such apparatus to be a real fridge which rockets itself to the Moon because... well... the client accepted the apparatus the way it is, whatever it is and whatever you call it. All that matters is that they need to keep one low paid techie guy to renew the white paint every time it faints.

In order to increase profits a bit more, someone had the bright idea of replacing the last remaining techie guy by a remote worker, who works from home.


This remote worker apparently feels plenty happy of being able of working from home, very close to his own fridge, which never rocketed itself to the Moon.

And this is the end of the story.

Friday, 21 March 2014

Configuring Postfix for relaying on Debian

This article explains how Postfix can be installed and configured to route emails to an external SMTP server.
 

Configuration


Run the script below as root.

#!/bin/bash

apt-get install postfix sasl2-bin bsd-mailx -y

##################################
# Choose:
#   * 'Satelite system'
#   * leave relayhost blank
##################################

cd /etc/postfix

cp -p main.cf main.cf.ORIGINAL

cat << EOD >> main.cf

smtp_sasl_auth_enable = yes
smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd
smtp_sasl_mechanism_filter = plain, login
smtp_sasl_security_options = noanonymous
EOD

# Substitute server, username and password below by your own settings
SERVER=smtpout.secureserver.net
USERNAME=admin@jquantlib.org
PASSWORD=secret

cat << EOD > sasl_passwd
${SERVER} ${USERNAME}:${PASSWORD}
EOD

chmod 400 sasl_passwd

postmap /etc/postfix/sasl_passwd

/etc/init.d/postfix restart

Testing your configuration


As a regular user, try something like this:

$ echo "It works!" | mailx -s test me@mydomain.com


Credits: Setup postfix to relay outbound mail using sasl

If you found this article useful, it will be much appreciated if you create a link to this article somewhere in your website.

Friday, 28 February 2014

Recovering from HTTP errors using URL Handlers

This article shows how URL handers, defined by urllib2, can be employed in practice in order to circumvent troubles we usually find when we write robots for collecting information from the Internet.

First things first (and usually a source of confusion): There are two sister libraries in Python which address retrieval of information from URLs; they are: urllib and urllib2. Conceptually, urllib2 works as a derived class of urllib. Just conceptually, because the actual implementation does not employ classes as a conventional object oriented paradigm would dictate.

If you are seeking detailed documentation about these libraries, I'm afraid to inform that your only choice is spending a couple of hours studying the source code of urllib.py and urllib2.py.

Setting an User Agent


OK. Now that you have the full documentation at hand, we can start. The first thing our robot needs to do is hiding its presence from the server side. One simple measure is employing an innocent user agent. We need to define a class derived from urllib2.BaseHandler which is responsible for setting the user agent before a request is sent to the server side. This is shown below:

import urllib2

class UserAgentProcessor(urllib2.BaseHandler):
    """A handler to add a custom UA string to urllib2 requests
    """
    def __init__(self, uastring):
        self.handler_order = 100
        self.ua = uastring

    def http_request(self, request):
        request.add_header("User-Agent", self.ua)
        return request

    https_request = http_request


(credits: This code was shamelessly copied from this article by Andrew Rowls)

Handling HTTP ERROR 404 (Not Found)


There are other things we need to do, such as throttling our requests, otherwise the server side will easily guess that there's a robot on our side sending dozens of requests per second. But throttling is a subject that I'm not going to cover here. You can later create your throttling handler, after you get better acquainted with some techniques covered in this article.

Some webservers are really busy, which may cause failures to our requests. Other webservers deliberatly reject requests given certain circumstances, for example: the server side may detect that we are sending dozens of requests per second and may decide to punish us for 10 minutes. Again we are back to the subject of throttling, which we are not going to cover here. But let's address this sort of issue partially, which may be of practical use in a majority of situations.

Let's say the webserver responds HTTP ERROR 404 (Not Found) eventually (or even regularly), even when the resource is existing in reality. We just need to be a little skeptic and send another request after waiting a couple of seconds. Eventually we need to be far more skeptic (or a little stubborn, if you will) and send several additional requests, before we become sure enough that the resource is actually and truly non-existent.

What we need to do is basically stamp requests so that we will have means to determine whether a request needs to be sent again to the server side, eventually waiting some time before that. Also, requests to different webservers may require different parameters for number of retries and for the delay to be employed. See below how we implemented this things:


import urllib2

class HTTPNotFoundHandler(urllib2.BaseHandler):
    """A handler which retries access to resources when 404 (NotFound) is received
    """

    handler_order = 600 # before HTTPDigestAuthHandler and ProxyDigestAuthHandler

    def __init__(self, retries=5, delay=2):
        self.retries = int(retries)
        self.delay   = float(delay)
        assert(self.retries >= 1)
        assert(self.delay >= 0.0)

    def http_request(self, req):
        if hasattr(req, 'headers') and 'Error_404' in req.headers:
            Error_404 = req.headers['Error_404']
            assert(int(Error_404['retries']) >= 1)
            assert(float(Error_404['delay']) >= 0.0)
        return req

    def http_error_404(self, req, fp, code, msg, headers):
        if hasattr(req, 'headers') and 'Error_404' in req.headers:
            Error_404 = req.headers['Error_404']
        else:
            Error_404 = dict()
            Error_404['delay']   = self.delay
            Error_404['retries'] = self.retries

        count   = Error_404['count'] if 'count' in Error_404 else 1
        retries = Error_404['retries']
        delay   = Error_404['delay']
        if count == retries:
            raise urllib2.HTTPError(req.get_full_url(),
                                    code,
                                    msg,
                                    headers,
                                    fp)
        else:
            # Don't close the fp until we are sure that

            # we won't use it with HTTPError.
            fp.read()
            fp.close()
            # sleep a little while
            from time import sleep
            sleep(delay)
            # send another request
            Error_404['count'] = count + 1
            req.add_header('Error_404', Error_404)
            return self.parent.open(req)

    https_error_404 = http_error_404


Now, let's add two utility functions:

def install_opener(opener=None):
    import urllib2
    if opener is None:
        urllib2.install_opener(build_opener())
    else:
        urllib2.install_opener(opener)
    return urllib2

def build_opener(

                  user_agent='Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:24.0) Gecko/20100101 Firefox/24.0',
                  http_404_retries=3,
                  http_404_delay=2.0):
    return urllib2.build_opener(
        UserAgentProcessor(user_agent),
        HTTPNotFoundHandler(http_404_retries, http_404_delay) )


Just put all the code you see in this up to this point into a file, say: api.py.

Test cases


Now, let's  create some test cases for it, using pytest. First thing consists on creating the conftest.py file, like shown below:

from __future__ import print_function

from pytest import fixture

@fixture
def opener():
    from mypackage.api import api
    return api.build_opener()

@fixture
def urllib2(opener):
    from mypackage.api import api
    return api.install_opener(opener)


If you are not acquainted to pytest, a very brief explanation of the code above is that we are defining functions opener and urllib2 which we will later employ as parameters to other functions. In a nutshell, pytest replaces the parameter by a call to the special functions (marked by @fixture) we have defined.

Now, let's create a file for test cases called test_urllib.py, like shown below:

import pytest

class TestOpeners(object):

    def xtest_build_opener(self, opener):
        pass

    def xtest_existing(self, urllib2):
        url = 'http://google.com'
        f = urllib2.urlopen(url)
        assert(f.code == 200)

    def xtest_existing_but_faulty(self, urllib2):
        url = 'http://biz.yahoo.com/p/'
        f = urllib2.urlopen(url)
        assert(f.code == 200)

    def xtest_non_existing(self, urllib2):
        from urllib2 import HTTPError
        url = 'http://google.com/this_url_does_not_exist'
        with pytest.raises(HTTPError):
            f = urllib2.urlopen(url)

    def test_non_existing_with_header(self, urllib2):
        from urllib2 import HTTPError
        url = 'http://google.com/this_url_does_not_exist'
        req = urllib2.Request(url, headers = {
            'Error_404'  : { 'retries': 5,
                             'delay'  : 2.0 }})
        with pytest.raises(HTTPError):
            f = urllib2.urlopen(req)

    def test_wrong_header_retries_1(self, urllib2):
        from urllib2 import HTTPError
        url = 'http://google.com'
        req = urllib2.Request(url, headers = {
            'Error_404' : { 'retries': 'rubbish',
                            'delay'  : 2.0 }})
        with pytest.raises(ValueError):
            f = urllib2.urlopen(req)

    def test_wrong_header_retries_2(self, urllib2):
        from urllib2 import HTTPError
        url = 'http://google.com/this_url_does_not_exist'
        req = urllib2.Request(url, headers = {
            'Error_404' : { 'retries': 0,
                            'delay'  : 2.0 }})
        with pytest.raises(AssertionError):
            f = urllib2.urlopen(req)


Conclusion

You can have better and more robust control of requests without even touching your application code by installing a custom opener to urllib2.

Thursday, 13 February 2014

Strong type checking in Python? What the hell is this !!???

This article describes a Python annotation which combines documentation with type checking in order to help Python developers to gain better understanding and control of the code, whilst allowing them to catch mistakes on the spot, as soon as they occur.

Being a Java developer previously but extradited to Python by my own choice, I sometimes feel some nostalgy from the old times, when the Java compiler used to tell me all sorts of stupidities I used to do.

In the Python world no one is stupid obviously, except probably me who many times find myself passing wrong types of arguments by accident or by pure stupidity, in case you accept the hypothesis that there's any difference between the two situations.

When you are coding your own stuff, chances are that you know very well what is going on. In general, you have the entire bloody API alive and kicking inside your head. But when you are learning some third party software, in particular large frameworks, chances are that your code is called by something you don't understand very well, which decides to pass arguments to your code which you do not have a clue what they are about.


Documentation

Documentation is a good way of sorting out this difficulty. Up-to-date documentation, in particular, is the sort of thing I feel extremely happy when I have chance to find one. My mood is being constantly crunched these days, if you understand what I mean.

Outdated documentation is not only useless but also undesirable. Possibly for this reason some (or many?) people prefer no documentation at all, since absence of information is better than misinformation, they defend.

It's very difficult to keep documentation up-to-date, unless you are forced somehow to do so. Maybe at gun point?


Strong type checking

I'm not in the quest of convincing anyone that strong type checking is good or useful or desirable.  Like everything in life, there are pros and cons.

On the other hand, I'd like to present a couple of benefits which keep strong type checking in my wishlist:

* I'd like to have the ability to stop the application as soon as a wrong type is received by a function or returned by a function to its caller. Stop early, catch mistakes easily, immediately, on spot.

* I'd like to identify and document argument types being passed by frameworks to my code, easily, quickly, effectively, without having to turn the Internet upside down every time I'm interested to learn what the hell argument x is about.


Introducing sphinx_typesafe

Doing a bit of research, I found an interesting library called IcanHasTypeCheck (or ICHTC for short), which I ended up rewriting almost from scratch during the last revision and I've renamed it to sphinx_typesafe.

Let me explain the idea:

In the docstring of a function or method, you employ Sphinx-style documentation patterns in order to tell types associated to variables.

If your documentation is pristine, the number of arguments in the documentation match the number of arguments in the function or method definition.

If your logic is pristine, the types of arguments you documented match the types of arguments actually passed to the function or method at runtime, or returned by the function or method to the caller, at runtime.

You just need to add an annotation @typesafe before the function or method, and sphinx_typesafe checks if the documentation matches the definition.

If you don't have a clue about the type of an argument, simply guess some unlikely type, say: None. Then run the application and sphinx_typesafe will interrupt the execution of it and report that the actual type does not match None. The next step is obviously substitute None by the actual type.


Benefits

A small example tells more than several paragraphs.
Imagine that you see some code like this:

    import math
   
    def d(p1, p2):
        x = p1.x - p2.x
        y = p1.y - p2.y
        return math.sqrt(x*x + y*y)



Imagine that you had type information about it, like this:


    import math
    from sphinx_typesafe import typesafe

    @typesafe
    def d(p1, p2):
        """
        :type p1: shapes.Point
        :type p2: shapes.Point
        :rtype  : float
        """
        x = p1.x - p2.x
        y = p1.y - p2.y
        return math.sqrt(x*x + y*y)


Now you are able to understand what this code is about, quickly!.
In particular, you are able to tell what it is the domain of types this code is intended to operate on.

When you run this code, if this function receives a shapes.Square instead of a shape.Point, it would stop immediately. Notice that, eventually, a shape.Square may have components x and y which would make the function return wrong results silently. Imagine your test cases catching this situation!

So, I hope I demonstrated the two benefits I was interested on.



Missing Features


Polymorphism

Sometimes I would like to tell that an argument can be a file but also a str. At the moment I can say that the argument can be types.NotImplementedType meaning "any type". But I would like something more precise, like this:

    :type f: [file, str]
   
This is not difficult to implement, actually, but we are not there yet.


Non intrusive

I would like to have a non intrusive way to turn on type checking and a very cheap way of turning off type checking, if possible without any code change.

Thinking more about use cases, I guess that type checking is very useful when you are developing and, in particular, when you are running your test suite. You are probably not interested on having the overhead of type checking on production code which was theoretically exhaustively tested.

Long story short, I would like to integrate sphinx_typesafe with pytest, so that an automatic decoration of functions and methods would happen automagically and without any code change.

If pytest finds a docstring which happens to contain a Sphinx-style type specification on it, @typesafe is applied to the function or method. That would be really nice! You could also run your code in production without type checking since type checking was never turned on in the first place.

The idea looks to be great, but my ignorance on pytest internals and my limited time prevents me of going ahead. Maybe in future!


Python3 support

The sources of sphinx_typesafe itself are ready for Python3, but sphinx_typesafe does not handle properly your sources written in Python3 yet. It's not difficult to implement, actually: it's just a matter of adjusting one function, but we are not there yet. Maybe you feel compelled to contribute?


More Information

https://pypi.python.org/pypi/sphinx_typesafe


Credits

Thank Klaas for inspiration and his IcanHasTypeCheck (or ICHTC for short).

Tuesday, 10 December 2013

Money in Brazil is Real

Nice! You are going to Brazil to see the World Cup or the Olympics or maybe just for enjoying 6,000km of sunny wonderful coast?

In this case you have heard that money in Brazil is Real. Yeah... this is the name of the currency: Real. Its symbol is BRL, which means Brazilian Real.

In Brazilian Portuguese (yes: in Brazil we speak Portuguese, not Spanish!), the word real has two meanings:

* like real in English
* like royal in English

So, now you know that Brazilians, despite far from monarchy for almost 2 centuries now, they have a currency which is royal. Yes, royal !

I know, I know... English speakers make jokes with Brazilian money, which is real, not imaginary, isn't it? .... LOL ... not a problem... Brazilians are easy going and we love jokes, any kind of joke, lots of them, politically correct or not, racially correct of not, sexually correct or not, religiously correct or not... because this is how the world is: with all sorts of things which make this world. So, regardless of your political orientation, sexual orientation, your race or religion or anything it might be... it's time for a joke.

By the way, prepare your mood for Brazilians, in particular in Rio. Cariocas (plural of Carioca: people who have born in Rio) are known for their quick thinking when there's a situation which might lead to a joke. You may miss a joke, I may miss a joke, but Cariocas never miss one. So, do not be offended if you suddenly become victim of a joke... better learn quickly how to make others victims of your jokes too. :)

Living in England I know that many jokes are not acceptable here. To be more realistic, Brits make lots of jokes too, about everything ... but only in the privacy of their homes, where they are sure that they will not be prosecuted for being naughty with someone else's race, religion or sexual orientation.

Things are different in Brazil. Bullying is national culture, all over the place. Or still national culture, to be more realistic. Unfortunately, British mood and life style is slowly contaminating Brazilians. Unfortunately, bullying is becoming something not acceptable anymore. Maybe globalization is responsible for this. This is pretty sad and absolutely unacceptable!

You may find the previous paragraph nasty. But be sure, it's your fault and only your fault. This is your mindset, which is shamefully narrow and only tuned to your own culture for decades, not being able of seeing anything outside your small island. When you go to another country, you have to adapt yourself to the other culture's mindset. It's that simple! :)

Well, anyway... despite that humour in Brazil is not as good as it was one decade ago, it's still very far for Brazilians a day when only jokes about the weather would be socially acceptable.

Sorry ... I couldn't resist to this joke... LOL

Sunday, 8 December 2013

Using TypeTokens to retrieve generic parameters

Note: I've recovered this article from http://archive.org.
The original article is  presented here mostly untouched.



Super Type Tokens, also known by Type-safe Heterogenerous Container (or simply THC) is very well described in article by Neal Gafter, who explains how Super Type Tokens can be used in order to retrieve Run Time Type Information (RTTI) which would be erased otherwise, in a process known as type erasure.


Overview

There are circumstances where you'd like to have a class which behaves in different ways depending on generic parameters.

Contrary to what is widely accepted, type erasure can be avoided, which means that the callee has ability to know which generic parameters were employed during the call.

For example, imagine a List which would not rely on Java Collections Framework but on array of primitive types, because performance would be much better than JCF classes. So, you'd like to tell List that it should internally allocate an array of ints or an array of doubles, depending on a generic parameter you specify. Something like this:
List<Integer> myList = new PrimitiveList<Integer>()
... would be backed by int[] whilst
List<Double> myList = new PrimitiveList<Double>()
... would be backed by a double[].

The problem: Type Erasure

When Generics was implemented in Java5, it was decided that this feature would be offered by javac (the Java compiler) and only very minimum changes would be implemented in other components of the architecture. The big benefit of this decision is that the implementation of this feature was relatively simple and imposed only relatively minimum risk to existing Java applications, guaranteeing compatibility and stability of the Java ecosystem as a whole.

The problem with this implementation is that only javac knows about generic types you specified in your source code. This knowledge exists only at compilation time. At run time, your callee class has no clue what generic parameters were employed during the call. It happens because information relative to generic types is lost in a process known as type erasure, which basically means that javac does not put type information it has at compilation time in the bytecode, which ultimately means that your running application does not know anything about type information you've defined in your source code.

Confused? Well ... it basically means that the code below is not possible:
class MyClass<T> {
    private final T o;

    public MyClass() {
        this.o = new T();
    }
}
... because at run time MyClass does not actually know anything about the type of generic parameter T. In spite javac at compile time is able to perform syntax and semantics validation of your source code, at run time all information regarding generic type T is thoroughly lost.

Actually, the previous statement may not be 100% correct under certain circumstances. This is what we will see in the next topic.

How type erasure can be avoided

When Generics was implemented in Java5, the type system was reviewed and long story short, information about generic types can be made available at run time under specific circumstances. This is a very important concept to our discussion here:
Generic types are available to anonymous classes.

Anonymous classes

Let's debate a little bit what an anonymous class is and also what it is not.
Let's suppose we are instantiating MyClass like this:
MyClass<Double> myInstance = new MyClass<Double>() {
        //
        // Some code here
        //
        // In this block we are adding functionality to MyClass
        //
};

We are actually creating an instance of MyClass, but also we are adding some additional code to it, which is enclosed by curly braces.

What it means is that we are creating an object myInstance of an anonymous class of MyClass. It does not mean that MyClass is itself an anonymous class! MyClass is definitely not anonymous because you have declared it somewhere else, correct?

In the snippet of code above we are using something which is an extended thing made from our original definition of MyClass plus some more logic. This extended thing is actually the anonymous class we are talking about. In other words, the myInstance.class was never declared, which means it is anonymous.

How javac handles anonymous classes

When javac finds an anonymous class, it creates data structures in the bytecode (which are available at run time) which holds the actual generic type parameters employed during the call. So, we have another very important concept here:
The Java compiler employs type erasure when objects are instantiated
except when objects are instantiated from anonymous classes.

In other words, our aforementioned MyClass do not know any type information when it is called like this
MyClass<Double> myClass = new MyClass<Double>();
but it does know generic type information when it is called like this:
MyClass<Double> myClass = new MyClass<Double>() { /* something here */ };

In order to obtain generic type information at run time, you do have to change the call, in order to employ an anonymous class made of your original class and not your original class directly. In the next topic we will cover what needs to be done in your implementation of MyClass in order to retrieve generic type information, but a very specific concept is that it will not work unless you call an anonymous class of your defined class. So:
MyClass<Double> myClass1 = new MyClass<Double>();     // type erasure DOES happen
MyClass<Double> myClass2 = new MyClass<Double>() { }; // type erasure DOES NOT happen!

Notice that you only need to have an anonymous class. If you don't have any additional logic to be added if you don't need anything additional. Like you see when object myClass2 was created, there's an anonymous block which is absolutely empty, in this example.

Classical solution

Let's review what we are interested here: we are interested on generic types, which are types. Observe that types are ultimately class definitions. So, we would like to give our class MyClass<T> the ability to know that its T generic parameter is actually a T.class.

In our classical solution described here it can be done very easily simply passing what we need during the call. This is something like this:
MyClass<Double> myClass = new MyClass<Double>(Double.class);

Observe that this is not a very good solution because you have to tell Double three times: (1) when you define the type, (2) when you pass the generic parameter and (3) when you pass the formal parameter Double.class. It looks too verbose and too repetitive, isn't it?

Anyway, this is what the great majority of developers do. They simply tell that Double is generic parameter and then they tell Double.class just after as a formal parameter during the call. In spite it works, the code does not look optimal and it even may lead to bugs later when your application becomes bigger and you start to refactor things, etc.

More flexible solution

We already visited a classical solution for the problem of type erasure and we had already seen how an anonymous call can be done. Now we need to understand how generic types can be retrieved at run time without having to pass Double so many times as we did in our classical solution.

Going straight to the point, let's define an skeleton of our MyClass which does the job we need. Joining some ideas from the classical solution and using some incantation offered by a class called TypeTokenTree. Below we explain the general concept:
import org.jquantlib.lang.reflect.TypeTokenTree;

public class MyClass<T> {

    private final Class<?> typeT;

    public MyClass(final Class<?> typeT) {
        this.typeT = typeT;
        init();
    }

    public MyClass() {
        this.typeT = new TypeTokenTree(this.getClass()).getElement(0);
        init();
    }

    private init() {
        // perform initializations here
    }
}

The code above allows you to call MyClass employing 2 different strategies:
MyClass<Double> myClass1 = new MyClass<Double>(Double.class); // classical solution
MyClass<Double> myClass2 = new MyClass<Double>() { };         // only sorcerers do this

Notice that object myClass1 employs the classical solution we described, which is what the great majority of developers do. The object myClass2 was created using the incantation explained in this article and we will explain it better below.

Digging the solution

Class TypeTokenTree is a helper class which returns the Class of the n-th generic parameter. In the line
this.typeT = new TypeTokenTree(this.getClass()).getElement(0);

We are building an instance of TypeTokenTree, passing the actual class of the current instance and asking for the 0-th generic type parameter.

Please observe what we've written in bold: the actual class of the current instance may be or may not be MyClass. Got it? Observe that the actual class of the current instance will not be MyClass if you employed an anonymous call. In this case, i.e: when you have an anonymous call, javac generates code which keeps generic type information available in the bytecode. Notice that:
TypeTokenTree fails when a non-anonymous call is done!

This is OK. Actually, there's no way to be anything different from that!. It's application's responsibility to recover from such situation.

In the references section below you can find links to class TypeTokenTree and another class it depends on: TypeToken. These files are implemented as part of JQuantLib and contain code which is specific to JQuantLib and may not be convenient for everyone. For this reason, we can see below modified versions of these classes which aims to be independent of JQuantLib and aims to explain in detail how the aforementioned incantation works.

First of all, you need to have a look at method getGenericSuperclass from the JDK. This method is basically the root of the incantation and it basically traverses data structures created in the bytecode by javac. These data structures provide type information regarding the generic types you employed. In general, getGenericSuperclass returns null, which means that the current instance belongs to a non-anonymous class. In the rare circumstances you employ anonymous classes, getGenericSuperclass will return something different of null. And this is how we do this magic.

When getGenericSuperclass does not return null, you have opportunity to traverse the data structure javac created in the bytecode and you can discover what was available at compile time (finally!), effectively getting rid of type erasure.
static public Type getType(final Class<?> klass, final int pos) {
    // obtain anonymous, if any, class for 'this' instance
    final Type superclass = klass.getGenericSuperclass();

    // test if an anonymous class was employed during the call
    if ( !(superclass instanceof Class) ) {
        throw new RuntimeException("This instance should belong to an anonymous class");
    }

    // obtain RTTI of all generic parameters
    final Type[] types = ((ParameterizedType) superclass).getActualTypeArguments();

    // test if enough generic parameters were passed
    if ( pos < types.length ) {
        throw RuntimeException(String.format(
           "Could not find generic parameter %d because only %d parameters were passed",
              pos, types.length));
    }

    // return the type descriptor of the requested generic parameter
    return types[pos];
}

Pros and cons

The big benefit of employing Type Tokens is that the code becomes less redundant, I mean:
MyClass<Double> myClass = new MyClass<Double>() { };
... is absolutely enough. You don't need anything like this:
MyClass<Double> myClass = new MyClass<Double>(Double.class);
On the other hand, the code also becomes obcure, because failing to remember to add the anonymous block will end up on an exception thrown by class TypeToken.
MyClass<Double> myClass = new MyClass<Double>() { }; // succeeds
MyClass<Double> myClass = new MyClass<Double>();     // TypeTokenTree throws an Exception

The point is: this technique is not widely advertised and most developers never heard that this could be done. If you are sharing your code with your peers, contributors or clients, chances are that you will have to spend some time explaining the magic the code does. In general, developers forget to make the call properly, which leads to failures at runtime, as just explained above.

There's also a small performance penalty imposed when TypeToken is called, once this information may be available at compile time and javac can simply write it down straight away when you call
MyClass<Double> myClass = new MyClass<Double>(Double.class);

Test Cases

OK. Now you visited the theory, you'd like to see how this thing really works. Below you can find some test cases which exercise classes TypeToken and TypeTokenTree'. These test cases cover some varied scenarios and they should be enough to illustrate how the techniques explained here can be used in the real world.


References





If you found this article useful, it will be much appreciated if you create a link to this article somewhere in your website.

Thanks

Richard Gomes 20:16, 3 January 2011 (GMT) [Date of the original article ]

Tuesday, 3 December 2013

Configuring 2 static IPs with fibre PPPoE with Eclipse Internet : MTU size issue


THIS IS A DRAFT *  THIS IS A DRAFT * THIS IS A DRAFT * THIS IS A DRAFT


This blog entry describes difficulties and solutions related to PPPoE, in particular issues related to MTU size and iptables configurations.



I've recently moved from BeThere to Eclipse Internet. It was a long marriage with Be, for 5 years, but I had to go away for technical reasons.

In a matter of a few days I've got a new Zyxel NBG4604 from Eclipse. The router is IPv6 capable, which was a surprise to me, since I've researched about the modem and I haven't found anything mentioning IPv6. Eclipse still do not offer IPv6, but you already have IPv6 even without knowing it, if you have static IPs.

IPv6 apart, I stumbled with a much simpler thing: an annoying issue which happens on certain websites, leading to sluggish performance. So, what's the point of migrating to fibre if navigation is seriously impacted?

But, what the issue is? and what causes it?


The sluggish fiber connection

The issue is that some websites fail to load properly into the browser. This is an example: github.com employs avatars from gravatar.com . It happens that github seems to "work fine", whilst gravatar "fails" to load. The browser keeps trying to load something from gravatar and stays there, trying and trying and the request is never completed. This jeopardizes navigation of github, which is a primary source of concern to me.

Long story short, the issue is related to PPPoE (PPP over Ethernet), which is basically the authentication layer which is employed by many ISPs, including Eclipse. Explaining in slightly deeper detail, it is necessary to adjust a parameter called MTU size when you have PPPoE due to technicalities you can find more details here.

I've opened a ticket with Eclipse Internet asking for the recommended MTU size, which they promptly responded. But there are some more details involved, as I explain below.

If I were using my Zyxel router like any regular end user, behind the fibre modem... I suppose I would not have any trouble. But I've decided to employ a Debian box as my main router / firewall. It basically means that I need to configure it properly, understand some technicalities I wouldn't care otherwise.


Path to solution

Long story short, I've configured these things:
  •  PPPoE
  •  /etc/network/interfaces with 2 static IPs (or multiple static IPs)
  •  MTU size
  •  iptables rules related to MTU size
I've connected the fibre modem to my NIC eth0 whilst the other NIC eth1 is connected to the LAN.

The NIC which faces the Internet, via the fibre modem has to acquire 2 IPs from the ISP. The first IP is acquired when the PPPoE layer authenticates and negotiates stuff with the ISP side. The additional IP address need to be configured after the first one, not requiring any special negotiation by the PPPoE layer.

OK. See below my /etc/networks/interfaces (with some fake addresses):

auto lo
iface lo inet loopback

iface eth0 inet manual

iface eth1 inet manual
    address   192.168.2.2
    netmask   255.255.255.0

auto dsl-provider
iface dsl-provider inet ppp
    pre-up    /sbin/ifconfig eth0 up
    post-down /sbin/ifconfig eth0 down
    provider  dsl-provider
    address   82.111.111.111

    netmask   255.255.255.252
    post-up   sleep 7 ; \
              gw=$( /sbin/ifconfig ppp | \

                    head -2 | tail -1 | \
                    sed -E 's/(.*P-t-P:)([0-9.]+)( .*)/\2/' ) ; \
              echo "Define default gateway $gw" ; \
              /sbin/route add default gw $gw ppp0 ; \
              echo Bringing up ppp0:1 ; \
              /sbin/ifconfig ppp0:1 82.111.111.112 netmask 255.255.255.252; \
              echo Bringing up eth1 ; \
              /sbin/ifconfig eth1 up; \
              echo Disable IP forward for security reasons; \
              echo 0 > /proc/sys/net/ipv4/ip_forward
    pre-down  echo Bringing down eth1 ; \
              /sbin/ifconfig eth1 down ; \
              echo Bringing down ppp0:1 ; \
              /sbin/ifconfig ppp0:1   down


The important bits are:

* eth0 must be left without any IP configuration because it will be employed by PPPoE in order to talk with the fibre modem.
* eth1 can be configured with a LAN IP, but you should not bring it up until you define the default gateway, which will be some IP address on the ISP side.
* you really don't know what the default gateway is, until the moment the connection is stablished with your ISP because this IP can change and will probably change every time you disconnect and connect again.
* interface ppp0, despite not configured by you, will be configured for you when dsl-provider brings up.


More details

Make a backup copy of /etc/network/interfaces as it is presented in the section above. Observe that, when you install PPPoE, it will change your configuration. But I already told you how it should be. So, make a backup copy!

$ cp /etc/network/interfaces /etc/network/interfaces.BACKUP
$ sudo apt-get install pppoe pppoeconf -y

During the installation, pppoeconf runs and tries to find your fibre modem. Make sure you connected eth0 with your modem.

It will ask your username and password, required to authenticate against the ISP. Eclipse sent me a letter with this stuff, but it is also available in the Connection Manager page.

When PPPoE runs, it changes your /etc/networks/interfaces. Have a look at it and see what happened. If you are following this recipe the way I describe, you will see that we had already configured everything which is needed in our version of /etc/networks/interfaces. Simply restore the backup copy.

$ cp /etc/networks/interfaces.BACKUP /etc/networks/interfaces

Just a reminder: make sure you put your 2 (or more) IPs as mentioned in the Connection Manager page into your /etc/networks/interfaces, in interfaces ppp0:1, ppp0:2, .... as much as you have static IPs, remembering that your first IP must go to ppp0 itself.

In other words, I've put the line below somewhere into my /etc/networks/interfaces:

  /sbin/ifconfig ppp0:1 82.111.111.112 netmask 255.255.255.252

If you have more that 2 static IPs, you will be interested on configuring additional virtual interfaces.


Now try to connect to your ISP:

$ sudo ifup dsl-provider

You should see something like this:

$ sudo ifconfig ppp0
ppp0      Link encap:Point-to-Point Protocol 
          inet addr:82.111.111.111 P-t-P:82.153.1.65  Mask:255.255.255.255
          UP POINTOPOINT RUNNING NOARP MULTICAST  MTU:1478  Metric:1
          RX packets:1200866 errors:0 dropped:0 overruns:0 frame:0
          TX packets:748261 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:3
          RX bytes:1633001082 (1.5 GiB)  TX bytes:59749711 (56.9 MiB)


Now verify if your second IP is connected:

$ ifconfig ppp0:1
ppp0:1    Link encap:Point-to-Point Protocol 
          inet addr:82.111.111.112  P-t-P:82.111.111.112  Mask:255.255.255.252
          UP POINTOPOINT RUNNING NOARP MULTICAST  MTU:1478  Metric:1

There are 2 important aspects to be noted at this point:

1. You may see that a netmask is 255.255.255.255, which means our IP is connected one-on-one to an IP on the ISP side. Well, this is what point-to-point means, and it makes sense! But all netmasks should honour what we have configured in /etc/networks/interfaces, which does not seem to be the case. We will address this issue later.

2. The MTU size is 1478, which is a recommended value I've got from Eclipse. Chances are that you are seeing some other value. No worries, we will address this issue later.

Let's dive a bit into these aspects in the next sections.


Interface configuration

The interface ppp0 happens to be wired to "P-t-P:82.153.1.65" in this case in particular. Actually, every time you connect you may potentially connect to a different IP on the ISP side. It means that you cannot assume that a certain IP in particular is your default gateway permanently. In our /etc/network/interfaces we find dynamically what is the default gateway we need to configure:

      ...
      gw=$( /sbin/ifconfig ppp | \
            head -2 | tail -1 | \
            sed -E 's/(.*P-t-P:)([0-9.]+)( .*)/\2/' ) ; \
      /sbin/route add default gw $gw ppp0 ; \

      ...

The netmask of ppp0 should be actually 255.255.255.252. This is the value I said it should be in /etc/networks/interfaces, but it is stubborn and insists on 255.255.255.255. It's possibly an issue I still need to fix on PPPoE configuration.
PENDING: I said before we would be addressing this issue. Well, not yet :( ... I still need to figure out how this can be done.

My ppp0:1 is a virtual interface which is configured with my second static IP address. Observe that it is connected with itself "P-t-P:82.111.111.112", which does not look to be correct. It should be connected to some IP in the vicinity of "P-t-P:82.153.1.65", which is the IP ppp0 is currently connected to.
PENDING: I still need to fix this!


The netmask of ppp0:1 is already 255.255.255.252, which honours the configuration I've put on /etc/networks/interfaces. This is good.

Note: despite of pending items, lots of things are working just fine here.


Static routes

Now have a look at the static routing table:

$ sudo route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         82.153.1.65     0.0.0.0         UG    0      0        0 ppp0
82.111.111.111  0.0.0.0         255.255.255.252 U     0      0        0 ppp0
82.153.1.65     0.0.0.0         255.255.255.255 UH    0      0        0 ppp0
192.168.2.0     0.0.0.0         255.255.255.0   U     0      0        0 eth1

PENDING: I still need to fix the netmask marked in red.
PENDING: I still need to accomodate ppp0:1 in the routing table!

The important bits are:

1. Flags UG means that this route is the default gateway. This route must be associated to interface ppp0 and must be associated with the IP address given by the ISP at PPPoE negotiation time. This was already explained in the section above.

2. Flags UH means that a given route is a host route, i.e: a route to talk to a single host in particular. In this case, interface ppp0 is responsible for talking to the IP address given by the ISP at PPPoE negotiation time.These flags are set by ppp, since it creates a point-to-point connection to a specific host on the ISP side.

3. I still need to configure ppp0:1 and make it appear in the routing table. The way it is at the moment "works" and I can even ping this address from outside, but it actually routes via ppp0, which is an additional hop, which adds some latency.
PENDING: I still need to accomodate ppp0:1 in the routing table!


MTU size configuration

Long story short, it's necessary to configure the MTU size in order to accomodate some information with is hanging on each packet of data when you are using PPPoE. The actual value of MTU size may vary under different circumstances and may even depend on what you have on your side of the connection. But let's keep it simple at this point and simply stick to the value Eclipse informed me to employ, which is 1478.

Keeping it simple, all you have to do is edit your /etc/ppp/peers/dsl-provider and make sure you have a block like this:

    connect /bin/true
    noauth
    persist
    mtu 1478


Then reconnect, making sure you release everything before connecting again:

$ sudo ifdown dsl-provider; sudo poff ; \
   sudo ifconfig eth1 down; sudo ifconfig eth0 down; \
   sudo ifup dsl-provider


Try to navigate to websites like http://github.com and see if the browser successfully retrieves everything, completing the request in a few seconds. If the icon keeps rolling and rolling in the browser's location bar... this is not a very good sign.

Note: Actually, chances are that this test will not work very well if you have a configuration similar to mine, I mean: you are using your Linux box as a router and/or firewall. This leads us to the next section.


iptables configuration

I have a firewall based on iptables running on my Debian box. I'm definitely not a network engineer and I'm not willing to become one, but I managed to configure my firewall relatively easily using a software called fwbuilder. It took me some time to get used to how things work... but, as I said, it can be done relatively easily if you know some basics of TCP/IP. No need to hire a network engineer ;-)

Long story short, if you put this below in the epilog script of your firewall configuration, you will be telling iptables to clamp MSS to MTU.


echo "Running epilog script"
# This is needed for NAT on ppp0
$IPTABLES --table nat --append POSTROUTING --out-interface ppp0 -j MASQUERADE

# This is needed for hosts in DMZ-10 to accept requests
$IPTABLES --append FORWARD --in-interface virbr1 -j ACCEPT

# http://adsl.cutw.net/mtu.html
# http://www.tldp.org/HOWTO/IP-Masquerade-HOWTO/mtu-issues.html
# http://www.cisco.com/en/US/tech/tk175/tk15/technologies_tech_note09186a0080093bc7.shtml

$IPTABLES -I FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
$IPTABLES -A OUTPUT  -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
$IPTABLES -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu


# Now it's safe to enable IP forward
echo 1 > /proc/sys/net/ipv4/ip_forward
echo "Epilog script done"



At this point, you must be curious about what virbr1 is in the block above, which is something which appeared from nothing in this discussion. It's explained in the next section.

You may also be curious about what the hell clamping MSS to MTU means. Look for more information in section References below.


Bootstrapping virtual servers

My Debian box is at the same time the router, the firewall and it hosts several virtual machines, mimicking a typical DMZ scenario, with virtual machines facing the Internet and virtual machines facing the LAN.

When my server comes up, it first brings up the interface dsl-provider, because it's marked as auto in /etc/networks/interfaces:


      ...
      auto dsl-provider
      iface dsl-provider inet ppp

      ...

... then it starts the virtual machines, then it starts the firewall.

For virtual machines, I use virt-manager , which provides among other things, the service /etc/init.d/libvirt-guests, which is responsible for bringing up all virtual machines. This process also creates subnets for those virtual machines facing the Internet and for those machines facing the LAN.

When I start the firewall, all static routing is already defined, whatever if interfaces involved are physical, related to ppp or related to virt-manager. All that the firewall needs to do is enforce security on these routings and make sure that certain requests arriving from the Internet are properly routed to certain virtual servers sitting on the subnet which should face the Internet, which in my case is virbr1. It's also necessary to remember to enable ip_forward in the kernel.

But we still need to make sure that the firewall starts after services provided by virt-manager. The way to do this is to put something like this below in your firewall configuration, in the editor tab of your firewall definition.

#### BEGIN INIT INFO
#Provides:       firewall
#Required-Start: $network $remote_fs $syslog libvirt-bin libvirtd libvirt-guests
#Required-Stop:  $network $remote_fs $syslog
#Default-Start:  2 3 4 5
#Default-Stop:   0 1 6
#Description:    firewall rules
#### END INIT INFO


When fwbuilder deploys the firewall rules onto your server, it will create /etc/init.d/firewall with a header like shown below, which is is what you is needed in order to provide dependency information between services. Some spaces are stubbornly added by fwbuilder, but it works without need to edit anything by hand, which is great.

# ### BEGIN INIT INFO
# Provides:       firewall
# Required-Start: $network $remote_fs $syslog libvirt-bin libvirtd libvirt-guests
# Required-Stop:  $network $remote_fs $syslog
# Default-Start:  2 3 4 5
# Default-Stop:   0 1 6
# Description:    firewall rules
# ### END INIT INFO



Conclusion

If you followed this article, you will be probably able to connect to Eclipse Internet without any router between your Debian box and your modem. You will also be able to tackle problems related to slowness which happens due to misconfigured parameters related to ppp and iptables.

I hope this article is useful and please let me know if you find it incomplete, misleading or wrong. If you have suggestions or something to complement, please let me know :)


References

http://adsl.cutw.net/mtu.html
http://www.tldp.org/HOWTO/IP-Masquerade-HOWTO/mtu-issues.html
http://www.cisco.com/en/US/tech/tk175/tk15/technologies_tech_note09186a0080093bc7.shtml




-------

If you found this article useful, please consider sharing it or linking to it.



THIS IS A DRAFT *  THIS IS A DRAFT * THIS IS A DRAFT * THIS IS A DRAFT