Python vs. Haskell vs. PHP - real world performance

Recently I had to optimize a legacy PHP cgi application, which worked fine but was too slow for its purpose. The main bottleneck was in selecting matching lines from a file containing about 15000 lines. Not finding a way to optimize PHP code I decided to change the language.
Two candidates were Python and Haskell. Knowing that Haskell is the only language out of three which compiles to machine code I expected it to be a clear winner, but I was up for a surprise...

Benchmark results:

PHP: 18ms (PHP 5.3.3)
Python: 4ms (Python 2.6)
Haskell: 38ms (compiled with ghc --make -dynamic -O2, ghc 6.12)

Code of the fast cgi application I benchmarked in 3 languages:

PHP code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
define('MAX_NAMES', 30);
$handle = fopen("names.dat", "r");
 
$name = strtoupper($_REQUEST['q']);
$res = array();
$size = 0;
 
while (($customer_name = fgets($handle, 4096)) !== false) {
    if (strpos($customer_name, $bank_name) !== false) {
        $res[] = "'" . trim($customer_name) . "'";
        $size += 1;
        if ($size > MAX_NAMES + 1) {
            break;
        }
    }
}
 
fclose($handle);
print '['.implode(',',$res).']';

Python code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
import cgi
import sys, os
from flup.server.fcgi import WSGIServer
 
 
MAX_NAMES = 30
 
def app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    form = cgi.FieldStorage(fp=environ['wsgi.input'], environ=environ)
    name = form["q"].value.upper()
    output_names = []
    count = 0
 
 
     with open("names.dat") as f:
        for line in f:
            if name in line:
                output_names.append(line.strip())
                count += 1
                if count > MAX_NAMES:
                    break
 
 
      yield "['" + "','".join(output_names) + "']"
 
WSGIServer(app).run()

Haskell code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
import Control.Concurrent
import System.Posix.Process (getProcessID)
import Char
import Data.List
import Data.Maybe
 
import Network.FastCGI
 
compute :: CGI CGIResult
bankSuggest = do setHeader "Content-type" "text/plain"
                 q <- getInput "q"
                 let key = maybe "" (map Char.toUpper) q
                 let n = 30
                 fileContent <- liftIO $ readFile "names.dat"
                 let result = if key /= ""
                                    then take n $ filter (Data.List.isInfixOf key) $ lines fileContent
                                    else []
                 output $ "['" ++ intercalate "','" result ++ "']"
 
main = runFastCGIConcurrent' forkIO 1 compute

Parameter q for this benchmark was selected to produce less than 30 entries, which is the most common case for this application. As a result lazy evaluation and early loop termination did not help the performance.

To sum up Python version is 4.5 times faster than PHP and almost 10 times faster than Haskell, which is pretty amazing.

 

Follow up:

As Don Stewart pointed out String type is quite slow in Haskell and a faster alternative would be to use ByteString.
So I've rewritten Haskell cgi using ByteString functions:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
import Control.Concurrent
import System.Posix.Process (getProcessID)
import Char
import Data.Maybe
import Data.ByteString.Char8 as BS
 
import Network.FastCGI
 
bankSuggest :: CGI CGIResult
bankSuggest = do setHeader "Content-type" "text/plain"
                 q <- getInput "q"
                 let key = BS.pack (maybe "" (Prelude.map Char.toUpper) q)
                 let n = 30
                 fileContent <- liftIO $ BS.readFile "names.dat"
                 let result = if key /= ""
                                    then Prelude.take n $ Prelude.filter (BS.isInfixOf key) $ BS.lines fileContent
                                    else []
                 output $ "['" ++ BS.unpack (BS.intercalate "','" result) ++ "']"
 
main = runFastCGIConcurrent' forkIO 1 bankSuggest

New benchmark results:

PHP: 18ms (PHP 5.3.3)
Python: 4ms (Python 2.6)
Haskell using String: 38ms
(compiled with ghc --make -dynamic -O2, ghc 6.12)
Haskell using ByteString: 8ms
(compiled with ghc -XOverloadedStrings --make -dynamic -O2, ghc 6.12)

Indeed Haskell code using ByteString is 5 times faster than Haskell using String.
However it is still twice slower than Python!




Leave comments

authimage
  • The problem with your Haskell code is that you're doing String IO, instead of using ByteStrings.

    For performance in the real world, use Data.ByteString.readFile

  • Thanks for the suggestion Don!
    I've added follow up with new benchmark.

    • mypy
  • What if you use outputFPS instead of output?

    • Anonymous
  • I tried outputFPS and did not find any observable performance improvement.

    • mypy
  • Wow, purely academic language Haskell failed to deliver performance. No way!

    • Samuel J.
  • Please compare performance with a fast language like C++ or Java.

    • lzpk
  • Neither C++ nor Java is fast.
    Compare with C instead.

    • V

Copyright(c) 2017 - PythonBlogs.com
By using this website, you signify your acceptance of Terms and Conditions and Privacy Policy
All rights reserved