Wikipedia defines memoization as “an optimization technique used primarily to speed up computer programs by having function calls avoid repeating the calculation of results for previously-processed inputs.”. This typically means caching the returning value of a function in a dictionary of sorts using the parameters passed to the function as a key. This is done in order to reuse that returning value immediately without calculating it again, when the function is invoked with the same arguments. Even though we are trading space for time, it is often invaluable for speeding up certain recursive functions and when dealing with dynamic programming where intermediate calls are often repeated many times.
Using memoization in Ruby is very easy thanks to the memoize gem. The first step to getting started is therefore to install it:
$ sudo gem install memoize
Successfully installed memoize-1.2.3
1 gem installed
Installing ri documentation for memoize-1.2.3...
Installing RDoc documentation for memoize-1.2.3...
Now we can use the memoize method as illustrated in the example below:
require 'rubygems'
require 'memoize'
require 'benchmark'
include Memoize
def fib(n)
return n if n < 2
fib(n-1) + fib(n-2)
end
Benchmark.bm(15) do |b|
b.report("Regular fib:") { fib(35) }
b.report("Memoized fib:") { memoize(:fib); fib(35)}
end
In the first block we simply invoke fib(35), while in the second one we first invoke the method memoize(:fib) to memoize the method fib. Running this code on my machine prints the following:
user system total real
Regular fib: 55.230000 0.160000 55.390000 ( 55.819205)
Memoized fib: 0.000000 0.000000 0.000000 ( 0.001305)
We went from almost a minute of run time to an instantaneous execution. Optionally we could even pass a file location to the function memoize and this would use marshaling to dump and load the cached values on/from disk.
For Python we can write a simple decorator that behaves in a similar manner. In its simplest form it can be implemented as follows:
# memoize.py
def memoize(function):
cache = {}
def decorated_function(*args):
try:
return cache[args]
except KeyError:
val = function(*args)
cache[args] = val
return val
return decorated_function
Or more efficiently:
# memoize.py
def memoize(function):
cache = {}
def decorated_function(*args):
if args in cache:
return cache[args]
else:
val = function(*args)
cache[args] = val
return val
return decorated_function
When the memoized function has been invoked, we look in the cache to see if an entry for the given arguments already exist. If it does, we immediately return that value. If not, we call the function, cache the results and return its returning value.
Truth be told, the limit of this approach lies in the fact that since we are using a dictionary, only immutable objects can be used as keys. For example, we can use a tuple but are not allowed to have a list as a parameter. For the example within this article, this approach will suffice, but to take advantage of memoization when using arguments that are mutable, you may want to consider the approach described in this recipe.
We can now rewrite the Ruby example above in Python as follows:
import timeit
from memoize import memoize
def fib1(n):
if n < 2:
return n
else:
return fib1(n-1) + fib1(n-2)
@memoize
def fib2(n):
if n < 2:
return n
else:
return fib2(n-1) + fib2(n-2)
t1 = timeit.Timer("fib1(35)", "from __main__ import fib1")
print t1.timeit(1)
t2 = timeit.Timer("fib2(35)", "from __main__ import fib2")
print t2.timeit(1)
Running this code on my machine prints the following:
9.32223105431
0.000314950942993
In Python 2.5’s case by employing memoization we went from more than nine seconds of run time to an instantaneous result.
Granted we don’t write Fibonacci applications for a living, but the benefits and principles behind these examples still stand and can be applied to everyday programming whenever the opportunity, and above all the need, arises.
Get more stuff like this
Subscribe to my mailing list to receive similar updates about programming.
Thank you for subscribing. Please check your email to confirm your subscription.
Something went wrong.
Hi Antonio,
nice write-up. In the Python example, you appear to invoke the memoized fib2() variant on the argument 10 (instead of 35). Unfair advantage! 😉
Cheers,
–Torsten
Hehe, good spotting Torsten. That was a typo due to my experimenting with several input sizes. It’s fixed now. 🙂
cool!
I’ve benchmarked ruby 1.9 and python 2.6, the results:
$ ruby fib.rb
user system total real
Regular fib: 42.210000 5.270000 47.480000 ( 48.118022)
Memoized fib: 0.000000 0.000000 0.000000 ( 0.001342)
$ /opt/ruby-enterprise/bin/ruby fib.rb
user system total real
Regular fib: 27.930000 0.050000 27.980000 ( 28.085893)
Memoized fib: 0.000000 0.000000 0.000000 ( 0.000714)
$ ruby1.9 fib.rb
user system total real
Regular fib: 7.450000 0.010000 7.460000 ( 7.564142)
Memoized fib: 0.000000 0.000000 0.000000 ( 0.000578)
$ python2.5 fib.py
16.7434659004
0.000143051147461
$ python2.6 fib.py
16.5563249588
0.000123023986816
great tips
what if the methods doesn’t have params/inputs …does method like that can be memoized too ?
and python3 (for the record):
$ python3 fib.py
20.1024479866
0.000137090682983
To put some shameless self promotion here: I’ve written a pretty extensive article on the topic of memoization. You can find it here: http://www.railway.at/articles/2008/09/20/a-guide-to-memoization.
JRuby 1.2.0 (ruby 1.8.6 patchlevel 287) results:
user system total real
Regular fib: 7.955000 0.000000 7.955000 ( 7.955000)
Memoized fib: 0.004000 0.000000 0.004000 ( 0.004000)
mmm but how to use it for a method that is coming from a library that we use ?
is there a way to decorate a method in an import statement or something like that ?
cheers
Marco
p.s. I’m interested in the python implementation.
adit: memoization isn’t magic – it works only on functions that always return the same value for the same parameters.
So memoization will work for function without parameters, but only if this function always return the same value 🙂 so it is not very useful.
BTW it works in such way – each call to function first checks if the function was called with the same arguments before, if it was, it returns the value it returned last time, if not, it calculates the value normally, and then keeps the value associated with parameters it was called.
So it is simple caching, and can be only used on “pure” (as in functional programming) functions.
Marco, you could declare a function that wraps an imported function, and memoize that one.
I have to disagree with ajuc that memoizing parameter-less function return values is not very useful. Whenever the cost of calculating the return value is high, and there is a possibility that the function may be called more than once, it is useful.
An example would be a result from a complex database query.
You can speed python example even more if you replace “try/except” construct in your decorator with with “if args in cache”. I had ~40% speed increase on python 3 and ~95% on python 2.6.
That’s interesting, Ivan. That was my very first approach too but I didn’t like the idea of accessing the hash twice for each hit in the cache. Exception handling must be so expensive that it’s still faster to do the double lookup. I modified the post to include this alternative as well.
You can avoid the exception and the double lookup with code like:
As a micro-optimisation you could make both cache and missing keyword arguments to the inner function. Lookup of local variables is faster. Not benchmarked any of this…
I did benchmark your suggestion, Michael, and it’s a little slower:
0.000112s (Antonio’s 2nd version)
0.000155s (your version)
But if you remove missing, the time goes down to 0.000099s:
def memoize2(function):
cache = {}
def decorated_function(*args):
val = cache.get(args)
if not val:
val = function(*args)
cache[args] = val
return val
return decorated_function
“if args in cache” seems safer, though. If function(*args) == None, it will always be reevaluated for args. Functions can also return object(), although that is admittedly less likely.