Python vs. Haskell vs. PHP - real world performance
Recently I had to optimize a legacy PHP cgi application, which worked fine but was too slow for its purpose. The main bottleneck was in selecting matching lines from a file containing about 15000 lines. Not finding a way to optimize PHP code I decided to change the language.
Two candidates were Python and Haskell. Knowing that Haskell is the only language out of three which compiles to machine code I expected it to be a clear winner, but I was up for a surprise...
Benchmark results:
PHP: 18ms (PHP 5.3.3)
Python: 4ms (Python 2.6)
Haskell: 38ms (compiled with ghc --make -dynamic -O2, ghc 6.12)
Code of the fast cgi application I benchmarked in 3 languages:
PHP code:
define('MAX_NAMES', 30); $handle = fopen("names.dat", "r"); $name = strtoupper($_REQUEST['q']); $res = array(); $size = 0; while (($customer_name = fgets($handle, 4096)) !== false) { if (strpos($customer_name, $bank_name) !== false) { $res[] = "'" . trim($customer_name) . "'"; $size += 1; if ($size > MAX_NAMES + 1) { break; } } } fclose($handle); print '['.implode(',',$res).']';
Python code:
import cgi import sys, os from flup.server.fcgi import WSGIServer MAX_NAMES = 30 def app(environ, start_response): start_response('200 OK', [('Content-Type', 'text/html')]) form = cgi.FieldStorage(fp=environ['wsgi.input'], environ=environ) name = form["q"].value.upper() output_names = [] count = 0 with open("names.dat") as f: for line in f: if name in line: output_names.append(line.strip()) count += 1 if count > MAX_NAMES: break yield "['" + "','".join(output_names) + "']" WSGIServer(app).run()
Haskell code:
import Control.Concurrent import System.Posix.Process (getProcessID) import Char import Data.List import Data.Maybe import Network.FastCGI compute :: CGI CGIResult bankSuggest = do setHeader "Content-type" "text/plain" q <- getInput "q" let key = maybe "" (map Char.toUpper) q let n = 30 fileContent <- liftIO $ readFile "names.dat" let result = if key /= "" then take n $ filter (Data.List.isInfixOf key) $ lines fileContent else [] output $ "['" ++ intercalate "','" result ++ "']" main = runFastCGIConcurrent' forkIO 1 compute
Parameter q for this benchmark was selected to produce less than 30 entries, which is the most common case for this application. As a result lazy evaluation and early loop termination did not help the performance.
To sum up Python version is 4.5 times faster than PHP and almost 10 times faster than Haskell, which is pretty amazing.
Follow up:
As Don Stewart pointed out String type is quite slow in Haskell and a faster alternative would be to use ByteString.
So I've rewritten Haskell cgi using ByteString functions:
import Control.Concurrent import System.Posix.Process (getProcessID) import Char import Data.Maybe import Data.ByteString.Char8 as BS import Network.FastCGI bankSuggest :: CGI CGIResult bankSuggest = do setHeader "Content-type" "text/plain" q <- getInput "q" let key = BS.pack (maybe "" (Prelude.map Char.toUpper) q) let n = 30 fileContent <- liftIO $ BS.readFile "names.dat" let result = if key /= "" then Prelude.take n $ Prelude.filter (BS.isInfixOf key) $ BS.lines fileContent else [] output $ "['" ++ BS.unpack (BS.intercalate "','" result) ++ "']" main = runFastCGIConcurrent' forkIO 1 bankSuggest
New benchmark results:
PHP: 18ms (PHP 5.3.3)
Python: 4ms (Python 2.6)
Haskell using String: 38ms
(compiled with ghc --make -dynamic -O2, ghc 6.12)
Haskell using ByteString: 8ms
(compiled with ghc -XOverloadedStrings --make -dynamic -O2, ghc 6.12)
Indeed Haskell code using ByteString is 5 times faster than Haskell using String.
However it is still twice slower than Python!