Meditations on programming, startups, and technology
New Relic

Memoization in Ruby and Python

Wikipedia defines memoization as “an optimization technique used primarily to speed up computer programs by having function calls avoid repeating the calculation of results for previously-processed inputs.”. This typically means caching the returning value of a function in a dictionary of sorts using the parameters passed to the function as a key. This is done in order to reuse that returning value immediately without calculating it again, when the function is invoked with the same arguments. Even though we are trading space for time, it is often invaluable for speeding up certain recursive functions and when dealing with dynamic programming where intermediate calls are often repeated many times.

Using memoization in Ruby is very easy thanks to the memoize gem. The first step to getting started is therefore to install it:

$ sudo gem install memoize
Successfully installed memoize-1.2.3
1 gem installed
Installing ri documentation for memoize-1.2.3...
Installing RDoc documentation for memoize-1.2.3...

Now we can use the memoize method as illustrated in the example below:

require 'rubygems'
require 'memoize'
require 'benchmark'
include Memoize

def fib(n)
  return n if n < 2
  fib(n-1) + fib(n-2)
end

Benchmark.bm(15) do |b|
  b.report("Regular fib:") { fib(35) }
  b.report("Memoized fib:") { memoize(:fib); fib(35)}
end

In the first block we simply invoke fib(35), while in the second one we first invoke the method memoize(:fib) to memoize the method fib. Running this code on my machine prints the following:

                     user     system      total        real
Regular fib:    55.230000   0.160000  55.390000 ( 55.819205)
Memoized fib:    0.000000   0.000000   0.000000 (  0.001305)

We went from almost a minute of run time to an instantaneous execution. Optionally we could even pass a file location to the function memoize and this would use marshaling to dump and load the cached values on/from disk.

For Python we can write a simple decorator that behaves in a similar manner. In its simplest form it can be implemented as follows:

# memoize.py

def memoize(function):
    cache = {}
    def decorated_function(*args):
        try:
            return cache[args]
        except KeyError:
            val = function(*args)
            cache[args] = val
            return val
    return decorated_function

Or more efficiently:

# memoize.py

def memoize(function):
    cache = {}
    def decorated_function(*args):
        if args in cache:
            return cache[args]
        else:
            val = function(*args)
            cache[args] = val
            return val
    return decorated_function

When the memoized function has been invoked, we look in the cache to see if an entry for the given arguments already exist. If it does, we immediately return that value. If not, we call the function, cache the results and return its returning value.

Truth be told, the limit of this approach lies in the fact that since we are using a dictionary, only immutable objects can be used as keys. For example, we can use a tuple but are not allowed to have a list as a parameter. For the example within this article, this approach will suffice, but to take advantage of memoization when using arguments that are mutable, you may want to consider the approach described in this recipe.

We can now rewrite the Ruby example above in Python as follows:

import timeit
from memoize import memoize

def fib1(n):
    if n < 2:
        return n
    else:
        return fib1(n-1) + fib1(n-2)
   
@memoize
def fib2(n):
    if n < 2:
        return n
    else:
        return fib2(n-1) + fib2(n-2)	

t1 = timeit.Timer("fib1(35)", "from __main__ import fib1")
print t1.timeit(1)
t2 = timeit.Timer("fib2(35)", "from __main__ import fib2")
print t2.timeit(1)

Running this code on my machine prints the following:

9.32223105431
0.000314950942993

In Python 2.5’s case by employing memoization we went from more than nine seconds of run time to an instantaneous result.

Granted we don’t write Fibonacci applications for a living, but the benefits and principles behind these examples still stand and can be applied to everyday programming whenever the opportunity, and above all the need, arises.


If you enjoyed this post, then make sure you subscribe to my Newsletter and/or Feed.

receive my posts by email

15 Responses to “Memoization in Ruby and Python”

  1. Hi Antonio,

    nice write-up. In the Python example, you appear to invoke the memoized fib2() variant on the argument 10 (instead of 35). Unfair advantage! ;-)

    Cheers,
    –Torsten

  2. Hehe, good spotting Torsten. That was a typo due to my experimenting with several input sizes. It’s fixed now. :)

  3. david says:

    cool!

    I’ve benchmarked ruby 1.9 and python 2.6, the results:

    $ ruby fib.rb
    user system total real
    Regular fib: 42.210000 5.270000 47.480000 ( 48.118022)
    Memoized fib: 0.000000 0.000000 0.000000 ( 0.001342)

    $ /opt/ruby-enterprise/bin/ruby fib.rb
    user system total real
    Regular fib: 27.930000 0.050000 27.980000 ( 28.085893)
    Memoized fib: 0.000000 0.000000 0.000000 ( 0.000714)

    $ ruby1.9 fib.rb
    user system total real
    Regular fib: 7.450000 0.010000 7.460000 ( 7.564142)
    Memoized fib: 0.000000 0.000000 0.000000 ( 0.000578)

    $ python2.5 fib.py
    16.7434659004
    0.000143051147461

    $ python2.6 fib.py
    16.5563249588
    0.000123023986816

  4. adit says:

    great tips

    what if the methods doesn’t have params/inputs …does method like that can be memoized too ?

  5. david says:

    and python3 (for the record):

    $ python3 fib.py
    20.1024479866
    0.000137090682983

  6. To put some shameless self promotion here: I’ve written a pretty extensive article on the topic of memoization. You can find it here: http://www.railway.at/articles/2008/09/20/a-guide-to-memoization.

  7. Bram says:

    JRuby 1.2.0 (ruby 1.8.6 patchlevel 287) results:

    user system total real
    Regular fib: 7.955000 0.000000 7.955000 ( 7.955000)
    Memoized fib: 0.004000 0.000000 0.004000 ( 0.004000)

  8. Marco says:

    mmm but how to use it for a method that is coming from a library that we use ?
    is there a way to decorate a method in an import statement or something like that ?
    cheers
    Marco
    p.s. I’m interested in the python implementation.

  9. ajuc says:

    adit: memoization isn’t magic – it works only on functions that always return the same value for the same parameters.

    So memoization will work for function without parameters, but only if this function always return the same value :) so it is not very useful.

    BTW it works in such way – each call to function first checks if the function was called with the same arguments before, if it was, it returns the value it returned last time, if not, it calculates the value normally, and then keeps the value associated with parameters it was called.

    So it is simple caching, and can be only used on “pure” (as in functional programming) functions.

  10. Marco, you could declare a function that wraps an imported function, and memoize that one.

  11. Neil Cook says:

    I have to disagree with ajuc that memoizing parameter-less function return values is not very useful. Whenever the cost of calculating the return value is high, and there is a possibility that the function may be called more than once, it is useful.

    An example would be a result from a complex database query.

  12. Ivan Baldin says:

    You can speed python example even more if you replace “try/except” construct in your decorator with with “if args in cache”. I had ~40% speed increase on python 3 and ~95% on python 2.6.

  13. That’s interesting, Ivan. That was my very first approach too but I didn’t like the idea of accessing the hash twice for each hit in the cache. Exception handling must be so expensive that it’s still faster to do the double lookup. I modified the post to include this alternative as well.

  14. You can avoid the exception and the double lookup with code like:

    def memoize(function):
        cache = {}
        def decorated_function(*args):
            missing = object()
            val = cache.get(args, missing)
            if not val is missing:
                return val
            else:
                val = function(*args)
                cache[args] = val
                return val
        return decorated_function
    

    As a micro-optimisation you could make both cache and missing keyword arguments to the inner function. Lookup of local variables is faster. Not benchmarked any of this…

  15. Rob Reid says:

    I did benchmark your suggestion, Michael, and it’s a little slower:

    0.000112s (Antonio’s 2nd version)
    0.000155s (your version)

    But if you remove missing, the time goes down to 0.000099s:

    def memoize2(function):
    cache = {}
    def decorated_function(*args):
    val = cache.get(args)
    if not val:
    val = function(*args)
    cache[args] = val
    return val
    return decorated_function

    “if args in cache” seems safer, though. If function(*args) == None, it will always be reevaluated for args. Functions can also return object(), although that is admittedly less likely.

Leave a Reply

I sincerely welcome and appreciate your comments, whether in agreement or dissenting with my article. However, trolling will not be tolerated. Comments are automatically closed 15 days after the publication of each article.

Copyright © 2005-2014 Antonio Cangiano. All rights reserved.