My post about Ruby 1.9’s impressive improvement over Ruby 1.8.6 created quite an echo within the developer community. Sure, the headline was an attention grabber, just like this one is :-P, but in a matter of a few hours, there were all sorts of blog entries with variants in many languages, more than 200 comments on Reddit, and fifty comments on my own blog. There were however, also a few misconceptions. It was great though because such a simple post generated a lot of discussion amongst developers, with some insightful arguments taking place – and besides it almost created a new meme with the whole “holy shmoly” thing. Fun as that certainly was, let’s try to summarize and clarify a few points.

First and foremost, for those who stopped at the title of the article and didn’t read on, I made it very clear in several places that I didn’t even begin to predict that Ruby 1.9 will be faster than Python 2.5.1 when it comes to real world applications. I ran a simple micro-benchmark where this just happened to be the case. Chances are that Python will have the edge in many instances, especially if we consider that it has several optimized libraries which may still be missing or suboptimal in Ruby. Within the scope of the recursive Fibonacci’s test, which essentially stress-tests method/function calling, Ruby 1.9 seems to be more than 13 times faster than Ruby 1.8.6, but within the Ruby community it’s well known that this improvement factor is not very often replicated in other micro-benchmarks or actual applications.

Ruby is improving, its progress is impressive, so let’s drive this point home and try not to speculate too much. Someone also pointed out Ruby’s lack of Unicode support and in other cases, its disadvantages over Python. That’s missing the point, the article isn’t aimed at comparing or claiming that one language is better than the other. The essence of the message was and remains, that Ruby’s main interpreter β which was typically fairly slow β will soon be replaced by Yarv, which will drastically improve Ruby’s speed, to the point where perhaps it will be comparable with the not-too-fast but acceptable CPython. And yes, even for Python there are psyco and pypy both of which would change the outcome of my test. With this out of the way, let’s move on.

Dima Dogadaylo and my friend Valentino Volonghi both blogged about a much faster version (they employed the use of generators) of my Python snippet. Many other people in this blog and on Reddit proposed faster algorithms too. That’s all well and fine, but it really compares apples and oranges. If we switch from a naive recursive algorithm to an iterative one, even Ruby 1.8.6 will be faster and able to compute the task in a few moments. The computationally expensive and inefficient recursive algorithm should be used by those who want to compare other languages with my results, otherwise the comparison will be meaningless. Using a fast language to implement a slow algorithm is always going to be slower than using a slow language with an efficient algorithm, for N sufficiently large.

The Fibonacci function is mathematically defined as follows:

In the intentionally naive and recursive algorithm I adopted in my original post, I essentially wrote the mathematical formula in Ruby and Python syntax, and then executed it for *n* in a range that goes from 0 up to 35. This is very inefficient, because the tree-recursive process generated in computing F(n) grows exponentially. We have:

F(0) = 0

F(1) = 1

F(2) = F(1) + F(1)

F(3) = F(2) + F(1) = F(1) + F(1) + F(1)

F(4) = F(3) + F(2) = F(1) + F(1) + F(1) + F(1) + F(1)

F(5) = F(4) + F(3) = F(1) + F(1) + F(1) + F(1) + F(1) + F(1) + F(1) + F(1)

F(6) = F(5) + F(4) = F(1) + F(1) + F(1) + F(1) + F(1) + F(1) + F(1) + F(1) + F(1) + F(1) + F(1) + F(1) + F(1)

…

You can see where this is going. The recurrence relation generated by the algorithm, implies that we execute 3*F(n) – 2 lines of code (for n nontrivial). When n=35, this is a huge number, since F(35) = 9227465. Now you can see why the algorithm really stress-tests function calling and why Ruby 1.8.6 chokes. If we want to evaluate the computational complexity of the algorithm in terms of big-O notation, we have O(phi^n), where phi is the Golden Ratio and equal to (sqrt(5) + 1)/2. That’s exponential. Valentino, Dima and many others came up with variants of iterative or tail-recursive algorithms, whose complexity is linear (O(n)). Using memoization (the technique of keeping a cache of the previous calculations) or a simple iterative loop changes the algorithm’s efficiency. The difference between my original algorithm and these others is humongous and there is no point in comparing them as a means of evaluating the runtime speed of a given language. At that point we could very well use Binet’s formula, or Edsger Dijkstra’s algorithm or even 2×2 matrix exponentiation, but we’d not be proving any point there. If you want to learn more about algorithms (a necessity for any serious programmer), I strongly suggest the introductory textbook “Introduction to Algorithms”.

Don Stewart, a real celebrity in the Haskell community, has replied to my post with two articles that essentially illustrate a couple of points that are not new to many people: 1) Haskell is fast, much faster than Python and Ruby, 2) Haskell’s ability to take advantage of multiple cores by following parallelism hints placed in the code for the compiler, is just plain awesome and easy on programmers. Don did use old versions of Ruby and Python, but I appreciate his response a lot, because he kept the same algorithm in place. He didn’t bait and switch, using one of the many fast implementations available on the Haskell wiki. His fair comparison showed, despite the very limited scope of the test, what kind of performance we can expect from this functional language’s main compiler (GHC).

As I said in the past, Haskell really is a language worth getting excited about. But it’s not all about performance and the trend of increasingly multicored CPUs. So I’m glad that we have both Ruby and Haskell with their strengths and weaknesses. While Ruby 1.9 will hopefully give us a runtime that’s seriously fast enoughβ’ in most circumstances, it’d be nice if in Ruby’s future there were features that allowed us to take advantage of multiple cores, just like Haskell does without cumbersome code modifications.

Amongst the other replies there was a mix of everything, including Assembly and LOLcode, but I’d like to point out the post by a lisper, who took the Haskell vs Lisp approach in “Dude, your quad-cores have been smoking way too much Haskell!”. He runs the following code, first for n=45 and then for n=4500:

```
(defun printfib-trec (start end)
(format t "n=~D => ~D~%" start (fib-trec start))
(if (< start end)
(printfib-trec (+ start 1) end)))
(defun fib-trec (n)
"Tail-recursive Fibonacci number function"
(labels ((calc-fib (n a b)
(if (= n 0)
a
(calc-fib (- n 1) b (+ a b)))))
(calc-fib n 0 1)))
(time (printfib-trec 0 4500))
```

On my machine this runs in 3.291 seconds. Algorithm 101, guys. Quick question for my readers: how can an algorithm that is supposed to be O(phi^n) execute in 3 seconds, per n=4500? Simple, it’s that blogger who is being naive and not the algorithm that he adopted. If you pay attention you can see that he’s trying to compare the linear O(n) tail-recursive implementation of Fibonacci in Lisp, with the naive recursive one in Haskell, and from this he concludes “Oops. Sorry Haskellβ¦”. Slow down, cowboy! You want to compare Lisp with Haskell? Let’s do a fair comparison then. Let’s keep the same algorithm for both and use n=45, shall we?

Here is my naive/recursive Lisp version:

(defun printfib (start end)
(format t "n=~D => ~D~%" start (fib start))
(if (< start end)
(printfib (+ start 1) end)))
(defun fib (n)
"Naive-recursive Fibonacci number function"
(if (or (= n 0) (= n 1))
n
(+ (fib (- n 1)) (fib (- n 2)))))
(time (printfib 0 45))

And here is the Haskell one:

```
import Control.Monad
import Text.Printf
fib :: Int -> Int
fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)
main = forM_ [0..45] $ \i ->
printf "n=%d => %d\n" i (fib i)
```

On my MacBook Pro, Intel Core 2 Duo 2.2 GHz and 2 GB of RAM, Lisp (well sbcl, which supposedly uses both cores, though there is no documented proof of this) took 259.743 seconds. See the difference between O(n) and O(phi^n)? Try n=4500 with this algorithm and the sun will have burned out before the computation is finished. Haskell, used only 1 core, and took 77.779 seconds. Hmmm, Haskell was 3.3 times faster than Lisp without even parallelizing it.

Just out of curiosity, let’s try again with Don’s code which still implements the same algorithm (whose complexity is O(phi^n)), but which introduces parallelism hints for the compiler. Remember, this is being done out of curiosity, we already established that for this particular micro-benchmark Haskell smokes Lisp away.

```
import Control.Parallel
import Control.Monad
import Text.Printf
cutoff = 35
fib' :: Int -> Integer
fib' 0 = 0
fib' 1 = 1
fib' n = fib' (n-1) + fib' (n-2)
fib :: Int -> Integer
fib n | n < cutoff = fib' n
| otherwise = r `par` (l `pseq` l + r)
where
l = fib (n-1)
r = fib (n-2)
main = forM_ [0..45] $ \i ->
printf "n=%d => %d\n" i (fib i)
```

Now that Haskell’s program uses two cores (assuming that Lisp does too, which is unlikely), Haskell runs it in 52.248 seconds versus Lisp’s 259.743 seconds. That’s about 5 times faster than Lisp. Does this prove that Haskell is faster than Lisp in general (or to be more exact, that GHC is faster than sbcl on Mac OS X 10.5)? Nope, guys it’s just a silly micro-benchmark after all. But damn the temptation to say “Oops. Sorry Lisp…” was too strong. π

*Related*

Remember to optimize lisp code for speed π

(defun fib (n)

(declare (optimize speed (safety 0) (debug 0))

(fixnum n))

(if (<= (1- n) 0)

n

(the fixnum (+ (fib (- n 1)) (fib (- n 2))))))

With this simple change the function should become 3 to 4 times faster, making lisp not that slow compared to haskell.

Strange. With n=35, I got 3.9 seconds for Lisp and 6.9 seconds for Haskell.

Linux airstream 2.6.20-16-386 #2 Thu Jun 7 20:16:13 UTC 2007 i686 GNU/Linux

I’m running:

Dual Core AMD Opteron(tm) Processor 270

I just fired up the sbcl repl, pasted your Lisp code and changed 45 to 35 (I didn’t feel like waiting).

n=34 => 5702887

n=35 => 9227465

Evaluation took:

3.881 seconds of real time

3.792237 seconds of user run time

0.008 seconds of system run time

0 calls to %EVAL

0 page faults and

134,584 bytes consed.

NIL

With Haskell, I pasted your code into a file and ran:

ghc -o fib2 fib2.hs

time ./fib2

n=34 => 5702887

n=35 => 9227465

real 0m6.925s

user 0m6.788s

sys 0m0.024s

Wow, I just read Valentino’s post and Lisp is about 4 times faster now π

n=34 => 5702887

n=35 => 9227465

Evaluation took:

1.645 seconds of real time

1.580099 seconds of user run time

0.0 seconds of system run time

0 calls to %EVAL

0 page faults and

142,728 bytes consed.

NIL

By the way, I really like how your comments show the country flag, operating system icon and browser icon. Is that a public theme?

@Valentino: that’s cheating. π

@Brian: It’s a WordPress plugin called Firestats.

Brian: try, ghc -O2 -o fib2 fib2.hs

No optimization is done by default. -O2 will enable all optimizations.

j.

You must declare the numbers in the Common Lisp version to be fixnums if you want it to be equivalent to the Haskell solution; Haskell is using unsigned integers, but without the fixnum declaration, Common Lisp is using bignums, the equivalent of Haskell’s Integer.

If you don’t ask to optimize for speed, it’s very likely your Common Lisp compiler isn’t doing tail recursion.

Your Common Lisp solution also definitely isn’t using both cores.

No it’s not :). Haskell has mandatory static typing, lisp makes that optional and once you use it, common lisp generates asm code and executes that, that’s why it’s waaay faster :). Common Lisp is a wonderful wonderful language IMHO.

@Brian: Try Jeremy’s suggestion. For the parallel program you can use:

$ ghc Fib.hs -O2 -o fib –make -threaded

$ time ./fib +RTS -N2

“well sbcl, which supposedly uses both cores, though there is no documented proof of this”

SBCL will use both cores if you write explicitly multithreaded code, which that isn’t (no idea if threads are supported on mac os x, either – I have mac hardware, but run linux on it exclusively). Automatic parallelisation is not an SBCL feature as far as I know.

(Just a pity haskell’s syntax is so godawful compared to lisp’s.)

Also check out eigenclass’ analysis of the ASM generated by three compilers on this problem:

http://eigenclass.org/hiki/legitimate-microbenchmarks

and psykotic and others take that analysis further in the reddit comments:

http://programming.reddit.com/info/61tc0/comments/c02kcpp

Valentino wrote:

> Common Lisp is a wonderful wonderful language IMHO.

On this we entirely agree, and it’s fast too. Mine was a tongue in cheek reply to the Lisper who wrongly compared apples and oranges. π

– Include type declarations

– Set optimizations on

– Compile the functions using a fast Lisp compiler (SBCL or CMUCL)

– If you use CMUCL, then use block compilation

I tried both on my machine, and Lisp was several times faster.

For both non-tail-recursive versions:

Haskell (ghc with -O2):

real 3m7.850s (187.85s)

user 2m23.953s (143.95s)

sys 0m1.188s

Lisp (GCL with optimization turned on):

real time : 137.860 secs

run-gbc time : 117.470 secs

child run time : 0.000 secs

gbc time : 0.000 secs

“Sorry Haskell…”

Ahah, that’s cool w-g. With all the optimizations on, and using Valentino’s optimized version, I can see that Lisp is

veryfast. That’s very good. We knew already that the main compilers for Lisp and Haskell were very fast, just like OCaml and F# are too. But establishing which one is faster and for what applications, is beyond the reach of our silly test. πCompiling the Haskell code with optimizations made a *huge* difference. The fastest of several runs w/o optimization was 6.552s and with optimization was 0.860s. With that, the Lisp code took 1.8 times as long to execute.

Either one of those blows away Ruby which is what I typically code in. It took 95s π

Guys,

for precise benchmarking you should turn always off I/O during iterations. Running under clisp on my MacBook Pro (same config as yours), the recursive Fibonacci test takes 0.005539 sec for n=45, while turning off the call to the actual computation reduces it to 0.003004 sec, i.e. only a 45 % reduction in cpu time.

Granted, some of this is startup overhead, but commenting out the format statement gives only a run time of 0.000125 sec, so unless it’s smart enough to optimize out those function calls this can be neglected.

This nice thing about Python is that it has Cython ( http://www.cython.org ). The original Python code took 131s on my machine. The following is the conversion of that code to Cython:

cdef fib(int n):

if n == 0 or n == 1:

return n

else:

return fib(n-1) + fib(n-2)

def test(n):

for i in range(n):

print “n=%d => %d” % (i, fib(i))

This code runs in 2.08 CPU seconds on my machine.

You might also find this comparative test between SBCL Lisp and GHC Haskell useful. The “recursive” test does a series of recursive computations including Fibonacci, and here SBCL is only 20 % slower:

http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=sbcl&lang2=ghc

Also, note the wise words on the front page:

“How can we benchmark a programming language?

We can’t – we benchmark programming language implementations.

How can we benchmark language implementations?

We can’t – we measure particular programs.”

Geir Aalberg wrote

> this comparative test between SBCL Lisp and GHC Haskell

And note all those SBCL “due to type uncertainty” warnings.

Fastest:

Simplest:

f 0 a b = a

f n a b = f (n-1) b (a+b)

fibs n = f n 0 1

calculate fibs 80000 in 2 Sec.