Thrift provides lightweight, cross-language RPC by generating code for each target language from a simple definition file. There are lots of RPC mechanisms out there, so what’s so interesting about Thrift? Good question.

Thrift is easy to use, and in any given language it “feels right” for that language. Using Thrift in Python feels Pythonic. Using it in C++ feels, errr, uhm, C++ish?

Thrift ships with a binary protocol for encoding the data, so it’s lightweight and efficient. It doesn’t try to solve all the problems of the world; it just tries to do one thing well. The architecture is very clean and allows changing the behaviour at any layer in the RPC stack. This includes the Protocol layer, which encodes the data structures, and the Transport layer, which is responsible for moving the data from point A to Z.
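To make the Protocol/Transport split concrete, here is a toy sketch in plain Python. It does not use the Thrift library: MemoryTransport and ToyBinaryProtocol are names I made up for illustration, and the encoding (a length-prefixed string, an 8-byte double) is only an assumption about what a simple binary protocol might do.

```python
import io
import struct


class MemoryTransport:
    """Toy transport: moves raw bytes through an in-memory buffer."""

    def __init__(self, data=b""):
        self._buf = io.BytesIO(data)

    def write(self, data):
        self._buf.write(data)

    def read(self, size):
        return self._buf.read(size)

    def getvalue(self):
        return self._buf.getvalue()


class ToyBinaryProtocol:
    """Toy protocol: turns values into bytes and hands them to a transport."""

    def __init__(self, transport):
        self._trans = transport

    def write_string(self, value):
        raw = value.encode("utf-8")
        # 4-byte big-endian length prefix, then the bytes themselves.
        self._trans.write(struct.pack("!i", len(raw)) + raw)

    def write_double(self, value):
        # 8-byte big-endian IEEE 754 double.
        self._trans.write(struct.pack("!d", value))

    def read_string(self):
        (length,) = struct.unpack("!i", self._trans.read(4))
        return self._trans.read(length).decode("utf-8")

    def read_double(self):
        (value,) = struct.unpack("!d", self._trans.read(8))
        return value


# Encode on one side...
out = MemoryTransport()
writer = ToyBinaryProtocol(out)
writer.write_string("us")
writer.write_double(41.90)

# ...and decode on the other side from the same bytes.
reader = ToyBinaryProtocol(MemoryTransport(out.getvalue()))
country_code = reader.read_string()
latitude = reader.read_double()
```

The protocol class never cares where the bytes go. Swapping the in-memory buffer for a socket-backed transport would leave the encoding code untouched, which is the kind of layering described above.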

You can choose the language that is most appropriate for each component, and the component can live on the most appropriate system. For example, the search component of your site may need to trawl through a huge amount of data, so writing it in C++ and running it on a search cluster may be the best choice. This search component can be easily called from your TurboCherryDjangoLons application running on your web server farm. Perhaps you need Geolocation but would prefer not to burn a ton of RAM on each of your web nodes. In that case you can use Thrift to talk to a remote Geolocation service written in Python. This leads to my example.

Here we have a very simple Thrift definition file:

struct SpiffyGeoIPLocation {
   1: string country_code,
   2: double latitude,
   3: double longitude,
   4: list<string> possible_cities
}

service SpiffyGeoIPService {
   SpiffyGeoIPLocation by_ip_addr(string ip_addr);
}

It defines a structure called SpiffyGeoIPLocation. This struct represents the answer we will receive from our Geolocation service. The definition file also defines a service called SpiffyGeoIPService, which exposes a single procedure call: by_ip_addr, which takes a string holding the dotted quad of the IP address and returns a SpiffyGeoIPLocation.

Once we have the definition file we run the Thrift compiler which will generate the code for the structure and the service in each of the languages we request. In the case of Python this will be a pair of class definitions. There will be a class which represents the structure and one that represents the client interface. It also generates a simple client script that will allow you to make RPC calls to the service from your shell. This is very handy for testing.
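To give a feel for that pair of classes, here is a hand-written stand-in. It is not the compiler's actual output: the dataclass, the Iface name, and the keyword-argument constructor are my assumptions for illustration, and real generated code also carries the methods that read and write the struct on the wire.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpiffyGeoIPLocation:
    """Stand-in for the struct class; fields left unset default to None."""
    country_code: Optional[str] = None
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    possible_cities: Optional[List[str]] = None


class Iface:
    """Stand-in for the service interface: one method per procedure."""

    def by_ip_addr(self, ip_addr):
        raise NotImplementedError


# A partially filled-in answer, e.g. when only the country is known.
loc = SpiffyGeoIPLocation(country_code="us", possible_cities=["Chicago, IL"])
```

The point is only the shape: one plain class per struct with one attribute per field, and one method per procedure declared in the service.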

Here is the implementation of this service in Python:

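import sys

# Make the generated code and the Thrift library importable before the
# imports below run. These paths are specific to my checkout.
sys.path.insert(0, 'gen-py')
sys.path.insert(0, 'thrift-trunk/lib/py/build/lib.macosx-10.4-i386-2.5')

from thrift.transport.TSocket import TServerSocket
from thrift.transport.TTransport import TBufferedTransportFactory
from thrift.protocol.TBinaryProtocol import TBinaryProtocolFactory
from thrift.server.TServer import TThreadedServer
from simple_example import SpiffyGeoIPService
from simple_example.ttypes import SpiffyGeoIPLocation

class SpiffyGeoIPHandler:
    def by_ip_addr(self, ip_addr):
        # Placeholder lookup: a real service would query a GeoIP database.
        # The struct is built from a dict, as the generated code of this
        # Thrift version expects; newer generators may take keyword
        # arguments instead.
        result = SpiffyGeoIPLocation(dict(country_code='us',
            latitude=41.90, longitude=-87.65,
            possible_cities=['Chicago, IL', 'Rosemont, IL']))

        # Fall back to an "unknown" answer if the lookup found nothing.
        if result is None:
            return SpiffyGeoIPLocation(dict(country_code='??',
                possible_cities=['??']))
        return result

if __name__ == '__main__':
    # Wire the handler into the RPC stack: processor, socket, transport
    # and protocol factories, then a threaded server on port 3773.
    processor = SpiffyGeoIPService.Processor(SpiffyGeoIPHandler())
    transport = TServerSocket(3773)
    tfactory = TBufferedTransportFactory()
    pfactory = TBinaryProtocolFactory()
    server = TThreadedServer(processor, transport, tfactory, pfactory)

    server.serve()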

As you can see, we set up some Thrift infrastructure and define a handler class which implements the by_ip_addr procedure as a method. This handler class is where you put the code to perform the desired task.
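If you are wondering how a call arriving over the wire ends up in the handler method, the idea can be sketched without Thrift at all. ToyProcessor and EchoGeoIPHandler below are made-up names, and the dispatch-by-name logic is my simplification, not the generated Processor's real code.

```python
class ToyProcessor:
    """Toy processor: maps a procedure name from the wire to a handler method."""

    def __init__(self, handler):
        self._handler = handler

    def process(self, name, args):
        # Refuse private names so only declared procedures are reachable.
        if name.startswith("_"):
            raise ValueError("unknown procedure: %s" % name)
        method = getattr(self._handler, name, None)
        if method is None:
            raise ValueError("unknown procedure: %s" % name)
        return method(*args)


class EchoGeoIPHandler:
    """Hypothetical handler; a real one would look the address up."""

    def by_ip_addr(self, ip_addr):
        return {"country_code": "??", "queried": ip_addr}


processor = ToyProcessor(EchoGeoIPHandler())
reply = processor.process("by_ip_addr", ["192.0.2.37"])
```

The handler knows nothing about sockets or encodings. It is an ordinary class, which is why it is so easy to test on its own.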

The code to make the call is equally simple:

import sys

# Make the generated code and the Thrift library importable before the
# imports below run. These paths are specific to my checkout.
sys.path.insert(0, 'gen-py')
sys.path.insert(0, 'thrift-trunk/lib/py/build/lib.macosx-10.4-i386-2.5')

from thrift.transport.TSocket import TSocket
from thrift.transport.TTransport import TBufferedTransport
from thrift.protocol.TBinaryProtocol import TBinaryProtocol
from simple_example import SpiffyGeoIPService

if __name__ == '__main__':
    # Build the client side of the stack: socket, buffered transport,
    # binary protocol, then the generated client on top.
    socket = TSocket('localhost', 3773)
    transport = TBufferedTransport(socket)
    protocol = TBinaryProtocol(transport)
    geoip_service = SpiffyGeoIPService.Client(protocol)
    transport.open()

    # The remote call looks like an ordinary method call.
    where = geoip_service.by_ip_addr("192.0.2.37")

    print("%s %s %s %s" % (where.country_code, where.latitude,
                           where.longitude, where.possible_cities))
    for city in where.possible_cities:
        print(city)

    transport.close()

As you can see, we get back an instance of the SpiffyGeoIPLocation class, which has all the fields defined in the structure definition above. It acts just like a Python object because it is a Python object.

Why do I like Thrift so much? It’s made distributing the components of my current project at my day job very easy. This is good because I have to run two instances of my system for redundancy, one on the east coast and one on the west. Each one periodically backfills holes in its database from the other coast. Most of the code in other parts of our system is in Perl. Using Thrift, I have been able to use the data generated by my Python system painlessly in the Perl parts of the system. Also, I may need to optimize the speed of my system, and with Thrift it would be pretty easy to rewrite the slow components in C++. I’m repeating myself here, but think about this: you can easily integrate new code with old systems in other languages. This frees you from having to write the whole system in one language.

If you’d like to play with these examples, feel free to grab the code.