mypy

Python vs. Haskell vs. PHP - real world performance

mypy | 02 November, 2010 00:10

Recently I had to optimize a legacy PHP cgi application, which worked fine but was too slow for its purpose. The main bottleneck was in selecting matching lines from a file containing about 15000 lines. Not finding a way to optimize PHP code I decided to change the language.
Two candidates were Python and Haskell. Knowing that Haskell is the only language out of three which compiles to machine code I expected it to be a clear winner, but I was up for a surprise...

Benchmark results:

PHP: 18ms (PHP 5.3.3)
Python: 4ms (Python 2.6)
Haskell: 38ms (compiled with ghc --make -dynamic -O2, ghc 6.12)

Code of the fast cgi application I benchmarked in 3 languages:

PHP code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
define('MAX_NAMES', 30);
$handle = fopen("names.dat", "r");
 
$name = strtoupper($_REQUEST['q']);
$res = array();
$size = 0;
 
while (($customer_name = fgets($handle, 4096)) !== false) {
    if (strpos($customer_name, $bank_name) !== false) {
        $res[] = "'" . trim($customer_name) . "'";
        $size += 1;
        if ($size > MAX_NAMES + 1) {
            break;
        }
    }
}
 
fclose($handle);
print '['.implode(',',$res).']';

Python code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
import cgi
import sys, os
from flup.server.fcgi import WSGIServer
 
 
MAX_NAMES = 30
 
def app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    form = cgi.FieldStorage(fp=environ['wsgi.input'], environ=environ)
    name = form["q"].value.upper()
    output_names = []
    count = 0
 
 
     with open("names.dat") as f:
        for line in f:
            if name in line:
                output_names.append(line.strip())
                count += 1
                if count > MAX_NAMES:
                    break
 
 
      yield "['" + "','".join(output_names) + "']"
 
WSGIServer(app).run()

Haskell code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
import Control.Concurrent
import System.Posix.Process (getProcessID)
import Char
import Data.List
import Data.Maybe
 
import Network.FastCGI
 
compute :: CGI CGIResult
bankSuggest = do setHeader "Content-type" "text/plain"
                 q <- getInput "q"
                 let key = maybe "" (map Char.toUpper) q
                 let n = 30
                 fileContent <- liftIO $ readFile "names.dat"
                 let result = if key /= ""
                                    then take n $ filter (Data.List.isInfixOf key) $ lines fileContent
                                    else []
                 output $ "['" ++ intercalate "','" result ++ "']"
 
main = runFastCGIConcurrent' forkIO 1 compute

Parameter q for this benchmark was selected to produce less than 30 entries, which is the most common case for this application. As a result lazy evaluation and early loop termination did not help the performance.

To sum up Python version is 4.5 times faster than PHP and almost 10 times faster than Haskell, which is pretty amazing.

 

Follow up:

As Don Stewart pointed out String type is quite slow in Haskell and a faster alternative would be to use ByteString.
So I've rewritten Haskell cgi using ByteString functions:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
import Control.Concurrent
import System.Posix.Process (getProcessID)
import Char
import Data.Maybe
import Data.ByteString.Char8 as BS
 
import Network.FastCGI
 
bankSuggest :: CGI CGIResult
bankSuggest = do setHeader "Content-type" "text/plain"
                 q <- getInput "q"
                 let key = BS.pack (maybe "" (Prelude.map Char.toUpper) q)
                 let n = 30
                 fileContent <- liftIO $ BS.readFile "names.dat"
                 let result = if key /= ""
                                    then Prelude.take n $ Prelude.filter (BS.isInfixOf key) $ BS.lines fileContent
                                    else []
                 output $ "['" ++ BS.unpack (BS.intercalate "','" result) ++ "']"
 
main = runFastCGIConcurrent' forkIO 1 bankSuggest

New benchmark results:

PHP: 18ms (PHP 5.3.3)
Python: 4ms (Python 2.6)
Haskell using String: 38ms
(compiled with ghc --make -dynamic -O2, ghc 6.12)
Haskell using ByteString: 8ms
(compiled with ghc -XOverloadedStrings --make -dynamic -O2, ghc 6.12)

Indeed Haskell code using ByteString is 5 times faster than Haskell using String.
However it is still twice slower than Python!

Comments

Use Data.ByteString in Haskell

Don Stewart | 06/11/2010, 06:03

The problem with your Haskell code is that you're doing String IO, instead of using ByteStrings.

For performance in the real world, use Data.ByteString.readFile

ByteString

mypy | 07/11/2010, 01:01

mypy

Thanks for the suggestion Don!
I've added follow up with new benchmark.

unnecessary string conversion

Anonymous | 28/08/2012, 03:08

What if you use outputFPS instead of output?

speed

mypy | 26/11/2012, 01:17

I tried outputFPS and did not find any observable performance improvement.

Haskell

Samuel J. | 08/01/2013, 20:04

Wow, purely academic language Haskell failed to deliver performance. No way!

Re: Python vs. Haskell vs. PHP - real world performance

lzpk | 25/10/2013, 01:24

Please compare performance with a fast language like C++ or Java.

Re: Python vs. Haskell vs. PHP - real world performance

V | 17/04/2017, 22:56

Neither C++ nor Java is fast.
Compare with C instead.

Add comment

authimage

 
Accessible and Valid XHTML 1.0 Strict and CSS
Powered by LT - Design by BalearWeb