Python vs. Haskell vs. PHP - real world performance

Recently I had to optimize a legacy PHP cgi application, which worked fine but was too slow for its purpose. The main bottleneck was in selecting matching lines from a file containing about 15000 lines. Not finding a way to optimize PHP code I decided to change the language.
Two candidates were Python and Haskell. Knowing that Haskell is the only language out of three which compiles to machine code I expected it to be a clear winner, but I was up for a surprise...

Benchmark results:

PHP: 18ms (PHP 5.3.3)
Python: 4ms (Python 2.6)
Haskell: 38ms (compiled with ghc --make -dynamic -O2, ghc 6.12)

Code of the fast cgi application I benchmarked in 3 languages:

PHP code:

define('MAX_NAMES', 30);
$handle = fopen("names.dat", "r");
 
$name = strtoupper($_REQUEST['q']);
$res = array();
$size = 0;
 
while (($customer_name = fgets($handle, 4096)) !== false) {
    if (strpos($customer_name, $bank_name) !== false) {
        $res[] = "'" . trim($customer_name) . "'";
        $size += 1;
        if ($size > MAX_NAMES + 1) {
            break;
        }
    }
}
 
fclose($handle);
print '['.implode(',',$res).']';

Python code:

import cgi
import sys, os
from flup.server.fcgi import WSGIServer
 
 
MAX_NAMES = 30
 
def app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    form = cgi.FieldStorage(fp=environ['wsgi.input'], environ=environ)
    name = form["q"].value.upper()
    output_names = []
    count = 0
 
 
     with open("names.dat") as f:
        for line in f:
            if name in line:
                output_names.append(line.strip())
                count += 1
                if count > MAX_NAMES:
                    break
 
 
      yield "['" + "','".join(output_names) + "']"
 
WSGIServer(app).run()

Haskell code:

import Control.Concurrent
import System.Posix.Process (getProcessID)
import Char
import Data.List
import Data.Maybe
 
import Network.FastCGI
 
compute :: CGI CGIResult
bankSuggest = do setHeader "Content-type" "text/plain"
                 q <- getInput "q"
                 let key = maybe "" (map Char.toUpper) q
                 let n = 30
                 fileContent <- liftIO $ readFile "names.dat"
                 let result = if key /= ""
                                    then take n $ filter (Data.List.isInfixOf key) $ lines fileContent
                                    else []
                 output $ "['" ++ intercalate "','" result ++ "']"
 
main = runFastCGIConcurrent' forkIO 1 compute

Parameter q for this benchmark was selected to produce less than 30 entries, which is the most common case for this application. As a result lazy evaluation and early loop termination did not help the performance.

To sum up Python version is 4.5 times faster than PHP and almost 10 times faster than Haskell, which is pretty amazing.

 

Follow up:

As Don Stewart pointed out String type is quite slow in Haskell and a faster alternative would be to use ByteString.
So I've rewritten Haskell cgi using ByteString functions:

import Control.Concurrent
import System.Posix.Process (getProcessID)
import Char
import Data.Maybe
import Data.ByteString.Char8 as BS
 
import Network.FastCGI
 
bankSuggest :: CGI CGIResult
bankSuggest = do setHeader "Content-type" "text/plain"
                 q <- getInput "q"
                 let key = BS.pack (maybe "" (Prelude.map Char.toUpper) q)
                 let n = 30
                 fileContent <- liftIO $ BS.readFile "names.dat"
                 let result = if key /= ""
                                    then Prelude.take n $ Prelude.filter (BS.isInfixOf key) $ BS.lines fileContent
                                    else []
                 output $ "['" ++ BS.unpack (BS.intercalate "','" result) ++ "']"
 
main = runFastCGIConcurrent' forkIO 1 bankSuggest

New benchmark results:

PHP: 18ms (PHP 5.3.3)
Python: 4ms (Python 2.6)
Haskell using String: 38ms
(compiled with ghc --make -dynamic -O2, ghc 6.12)
Haskell using ByteString: 8ms
(compiled with ghc -XOverloadedStrings --make -dynamic -O2, ghc 6.12)

Indeed Haskell code using ByteString is 5 times faster than Haskell using String.
However it is still twice slower than Python!


Python 3000

Python 3000 is coming. 

Guido van Rossum explains what Python developer should expect from Python 3000 in his blog article and keynote.

Python 3000 is not compatible with current Python 2. Most changes represent language clean-up and removal of deprecated features (classic classes, string exceptions). Overall language will become smaller with fewer surprises and exceptions. Guido urges developers not to change API and promises to support Python 2.6 for at least 5 years. He mentions that 2to3 tool (source-to-source) translator will help to migrate from Python 2 to Python 3000 easier.

Release of Python 3.0 final is expected in August 2008.

 


Congratulations!

If you can read this post, it means that the registration process was successful and that you can start blogging