There are currently quite a few different ways of developing a web application in Python, and when you add in how you deploy the application, there are even more choices.  In terms of application frameworks, you have at least:

  • Django
  • twisted.web
  • flask
  • bottle
  • cyclone
  • tornado
  • pyramid

Then these can be run using many different servers, including:

  • tornado
  • twisted
  • cyclone
  • wsgiref
  • rocket
  • cherrypy
  • gunicorn
  • fapws
  • google app engine
  • gevent

And many more.  These typically take one of several approaches: asynchronous, either explicitly (cyclone, tornado) or via monkey patching and an event loop (gevent); threaded, such as rocket; or written in C around an event loop.  In addition to this, you now have several different Pythons to deploy on:

  • cpython
  • jython
  • pypy

At some point, these servers are generally either driving an asynchronous event loop or using threading.  There are two approaches to handling this: program in a normal, sequential style and let the library do the switching (gevent), or write explicitly event-based code (e.g. cyclone).  The rise of JavaScript and node.js has made event-based programming more mainstream.  I wanted to find out which of these many combinations would perform best, and in particular what effect using PyPy as the interpreter would have on performance.
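The difference between the two styles can be sketched with a toy example (the "redis call" here is simulated by a queue; no real server or client is involved):

```python
# Explicit event style: user code is split into callbacks that a loop
# invokes when results are ready.  (Toy sketch, not a real async client.)
import queue

ready = queue.Queue()

def get_counter_async(callback):
    # A real async client would issue a non-blocking socket read here;
    # we just queue the result for the loop to deliver later.
    ready.put((callback, 42))

results = []
get_counter_async(lambda value: results.append(value))

while not ready.empty():           # the event loop drives the callbacks
    callback, value = ready.get()
    callback(value)

# Under gevent the same logic is written sequentially --
#   value = redis_client.get("counter")
# -- and the library switches greenlets behind the scenes whenever a
# call would block, so no callbacks appear in user code.
```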

The benchmark

I created a fairly simple benchmark and implemented it across the different application styles.  The benchmark app has three routes: one that renders 'Hello world', a click counter stored in redis that we increment, and a static 2,000-character string that we retrieve from redis.  I then run ab against the application for 10,000 requests, three replicates at each level of concurrency (4, 16, 64 and 256 connections).  We take the stats for requests per second and for total request time (average, range and standard deviation).
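For illustration, here is roughly what the benchmark app looks like written as a bare WSGI application (wsgiref is one of the servers tested).  This is a sketch, not the exact code from the repository: a dict stands in for redis so it is self-contained, and the route paths are made up.

```python
# A minimal WSGI sketch of the three benchmark routes.  fake_redis is a
# stand-in for the real redis client used in the actual implementations.
from wsgiref.simple_server import make_server

fake_redis = {"counter": 0, "static": "x" * 2000}

def app(environ, start_response):
    path = environ["PATH_INFO"]
    if path == "/hello":
        body = "Hello world"
    elif path == "/counter":
        fake_redis["counter"] += 1          # the click counter we increment
        body = str(fake_redis["counter"])
    elif path == "/static":
        body = fake_redis["static"]         # the 2,000-character string
    else:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body.encode("utf-8")]

# To serve it on one core, as in the benchmark:
# make_server("", 8080, app).serve_forever()
```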

These were run on a Linux box with the kernel 'Linux #1 SMP Tue Nov 29 11:53:48 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux', a 24-core 'Intel(R) Xeon(R) CPU X5675 @ 3.07GHz' (although only one core will be used for Python) and 48 GB of RAM.  The Pythons used were 'Python 2.7.1' and 'Python 2.7.2 [PyPy 1.8.0 with GCC 4.4.6]'.

You can run the benchmark by checking out the repository on github.  If you know anything about the application styles, I would encourage you to take a look at the implementations in the servers directory.

The versions used are:

  • Flask==0.8
  • Paste==
  • Rocket==1.2.4
  • Twisted==12.0.0
  • WebOb==1.2b3
  • Werkzeug==0.8.3
  • bottle==0.10.9
  • bottle-redis==0.1
  • cyclone==1.0-rc3
  • distribute==0.6.24
  • pyramid==1.3a8
  • redis==2.4.11
  • repoze.lru==0.4
  • tornado==2.2
  • gevent==1.0dev
  • greenlet==0.3.4
  • eventlet==0.9.16
  • CherryPy==3.2.2

I patched bottle-redis to use a connection queue, and bottle to silence its logging when running under gevent.  Twisted needs its C extensions disabled to install on PyPy.
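The connection-queue idea behind that patch can be sketched as follows.  This is a hypothetical `ConnectionPool`, not the actual bottle-redis code, and `Connection` stands in for a real redis connection: requests borrow an existing connection from a bounded pool instead of opening a new one each time.

```python
# Sketch of a connection queue: a fixed number of connections are created
# up front and handed out to requests as they arrive.
import queue

class Connection:
    """Stand-in for a real redis connection."""
    def __init__(self, n):
        self.n = n

class ConnectionPool:
    def __init__(self, size):
        self._pool = queue.Queue(maxsize=size)
        for i in range(size):
            self._pool.put(Connection(i))

    def acquire(self):
        return self._pool.get()    # blocks if all connections are in use

    def release(self, conn):
        self._pool.put(conn)

pool = ConnectionPool(4)
conn = pool.acquire()
# ... run redis commands on conn ...
pool.release(conn)
```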


JIT effect

The first thing to notice is that PyPy takes a little time to optimize the code.  This plot shows the requests per second for each test in iterations 1, 2 and 3.  PyPy is on the left (True) and CPython on the right (False), and the plot is faceted vertically by concurrency.


You can clearly see the effect of the JIT.  CPython, on the right, has stable performance across all iterations.  PyPy shows a marked improvement from the second iteration onwards.

Requests per second

We now look at requests per second against concurrency across all combinations of application and server.  Each box contains PyPy and CPython lines, where available.  These are from the third iteration, when the JIT compiler has done its work.


Looking at this, you can see a few things.  Twisted, CherryPy and Paste are not really stable under increasing concurrency, shown by lines that slope off to the right or stop short (the server failed).  Gevent, tornado and cyclone are stable under load and show fairly equivalent performance under CPython.  PyPy brings a 1.5-2x performance increase for almost all servers.

Comparing the microframeworks, flask and bottle, bottle is always faster and really flies under tornado and gevent.  Pyramid does best of all, serving over 5,000 requests per second with tornado and PyPy.  Cyclone comes second under PyPy with over 3,500 requests per second.  However, I found its lack of a performance increase over tornado and bottle slightly disappointing, since cyclone uses an async redis connection.

Response time

Here we look at the average response time, its standard deviation, and the maximum response time.  A very similar picture emerges for all technologies: response times degrade under increasing concurrency.


There is no clear winner between PyPy and CPython here.

Conclusions

  • Gevent and tornado have excellent WSGI servers that can serve 1,000s of requests per second.
  • PyPy can provide a 1.5-2x performance increase, and this is available with tornado and cyclone but not gevent.
  • Explicit async code in cyclone did not provide a noticeable performance increase over tornado, pyramid and bottle.
  • bottle outperforms flask and really flies with gevent and tornado.
  • pyramid is really fast with gevent and tornado.
  • threading approaches don't seem to match the other approaches here.
  • cherrypy seems to have problems at higher concurrencies.

Overall, I think this shows that if you are really interested in performance you should take a good look at PyPy.  I have a lot of respect for the tornado code now, and would seriously consider it for future projects.  Bottle is a very good microframework that outperforms flask.


23/2/12: Added cherrypy and eventlet, and created common WSGI server code.  Couldn't get eventlet to run stably under PyPy (socket and RPy errors).