By Antonio Cangiano, Software Engineer & Technical Evangelist at IBM

Thoughts on Clojure

Lisp has had a tremendous impact on the world of programming. Even though Common Lisp and Scheme — the two main Lisp dialects — may not be considered mainstream today, several popular languages have been influenced by one or both of them.

It isn’t stretching things too much to say that both Ruby and Python can be seen as slower, easier (for beginners), object-oriented, infix Lisp dialects.

Some may say Ruby is a bad rip-off of Lisp or Smalltalk, and I admit that. But it is nicer to ordinary people. — Yukihiro “Matz” Matsumoto

Ruby and Python aren’t intimidating and remain very approachable for absolute beginners. Furthermore, their approachability is not confined to the language design itself, but extends to the community and ecosystem that surround them.

I’m not here to discuss how languages like Ruby and Python managed to become more popular than major Lisp dialects nowadays. I’d rather focus on how these gentler introductions to functional programming are acting as gateway drugs to Lisp for many developers.

A community like Ruby’s, which values metaprogramming and is obsessed with building DSLs (Domain Specific Languages), will no doubt find a valuable ally in Lisp. Plus, if you know Ruby inside and out, you should find Lisp easy enough to learn.

To attract Ruby developers though, Lisp has to offer something more than just a set of powerful features. You could say that Rails is enough of a reason to learn and use Ruby. But what can Lisp solve better than Ruby? I’ll answer that question by focusing on a specific dialect of Lisp that I, along with a growing number of Ruby developers, am getting into: Clojure.

It wouldn’t be fair to characterize the Lisp community as stagnant, but Clojure is definitely a welcome dose of new blood. Clojure is a modern, JVM-based Lisp designed for concurrency, which elegantly includes a set of carefully chosen features that are not easily found in mainstream languages.

In my opinion, Clojure has three main advantages over Ruby:

  • It’s much faster than Ruby, which makes it a better choice for intensive processing. (FlightCaster, for example, uses both Rails and Clojure: Rails for the “front-end” and Clojure for the heavy lifting/forecasting.)
  • It greatly simplifies concurrent programming, making the language more future-proof as hardware manufacturers continue to produce processors with more CPU cores.
  • Clojure emphasizes functional programming and tries to minimize side effects.

Clojure’s interoperability with Java resolves the issue of only having a few available libraries, which often affects new languages. It also helps in getting people to use the language within the enterprise world where Java still dominates.

Of all the “new” languages out there, I find Clojure to be the most fun, interesting and pragmatic: it’s something worth getting excited about. I don’t really care whether it turns out to be the next Ruby or not; it’s a language that’s worth knowing and using. (If you haven’t tried it yet, a decent, short introductory book is the recently published Practical Clojure.)

Clojure’s popularity may even bring more attention to Lisp in general (for example, most must-read literature uses Scheme or Common Lisp). Perhaps then, it may indirectly help introduce more traditional Lisp dialects to a new generation of programmers.


DB2 support for Django 1.2 is here

The latest release of the IBM Adapter for Django now supports Django 1.2. Aside from enabling you to use the most recent version of Django, this release adds a few new goodies that I’m sure many will appreciate.

For example, IBM’s adapter (through the underlying DBI wrapper) now uses persistent connections, which are especially helpful when dealing with Django – as it lacks connection pooling. (Of course DB2 also has the Connection Concentrator to aid in reducing the usage of server resources and improving scalability.)

Furthermore, the adapter adds support for the DECIMAL datatype, a necessary feature when dealing with money and currencies. Various enhancements and bug fixes were included too; check them out on Google Groups.

As a reminder, DB2 Express-C is a completely free version of DB2, and it’s production ready (not a toy version). You can download it from here. Take it for a spin, experiment – chances are you’ll like it. If you need a guide to getting started, be sure to check out this free e-book by my colleagues Raul, Ian, and Rav.


Free Python screencast about solving mazes

ThinkCode.TV’s English site is going to be launched on April 19th. To celebrate the upcoming launch and whet your appetite, a 19 minute long screencast about solving ASCII mazes with a few lines of Python code was just released for free. This video serves to illustrate Python’s elegance and power, as well as ThinkCode.TV’s approach to screencasts and education.


In order to download the screencast, you don’t need to provide a credit card, your address, or even your last name. Just head on over to this page and join the newsletter. Upon confirming your subscription, you will immediately receive an email with links to DRM-free, 720p HD files in QuickTime Movie (.mov), AVI and Ogg Theora (.ogv) formats. These videos are in English, prepared by a published serial author on the subject of Python/Django, and narrated by a native English speaker. They also include optional subtitles in the .srt format, as well as the source code, which is released under the MIT license, as is customary for ThinkCode.TV.

I hope you enjoy the free screencast and stay tuned for the launch in three weeks.


Benchmarking Tornado vs. Twisted Web vs. Tornado on Twisted

FriendFeed, which was recently acquired by Facebook, just released an interesting piece of open source software.

Tornado is an open source version of the scalable, non-blocking web server and tools that power FriendFeed. The FriendFeed application is written using a web framework that looks a bit like web.py or Google’s webapp, but with additional tools and optimizations to take advantage of the underlying non-blocking infrastructure.

The story so far

This release generated widespread interest among the Python and open source development communities. Rightfully so. There are many reasons to like Tornado. To begin with, it’s fast, and that’s fundamental for a web server. By using nginx as a load balancer and a static file server, and running a few Tornado instances (usually one per core available on the machine), it’s possible to handle thousands upon thousands of concurrent connections on relatively modest hardware; and this isn’t just theory. Tornado has already proven its worth in the field, by allowing FriendFeed to scale gracefully.

Tornado is not only a fast web server; it acts as a very lightweight application framework as well. As such, it’s an appealing alternative to well established frameworks for the growing group of developers who’d like to develop “closer to the metal” and avoid the baggage associated with full-fledged web frameworks. The two things combined make Tornado ideal for developing “real time” web services and applications.

The feedback so far hasn’t been all positive though. Criticism of the project has mainly focused on the lack of test coverage and the fact that FriendFeed has opted not to contribute to, and improve on, the existing Twisted Web project (which has similar goals). To make things worse, there were a few nonchalant comments about it as well. Performance issues and lack of ease of use were the reported motivations for starting a new project from scratch.

Dustin Sallings started working on a hybrid solution (henceforth Tornado on Twisted) that would reportedly keep the good parts that Tornado introduced, while using Twisted as its core for networking and HTTP parsing.

At this point I became naturally curious about the speed of these three web servers. Is Tornado really faster than Twisted Web? And what about Tornado on Twisted, would it be faster or slower? Let’s find out.

Benchmark results

I ran a simple Hello World app on all three web servers. All the web servers were run in standalone mode without a load balancer. I stress tested them with httperf using a progressively larger number of concurrent requests, generating 100,000 requests for each test. The web servers were run on a desktop machine with an Intel® Core™2 Quad Q6600 processor (8M Cache, 2.40 GHz, 1066 MHz FSB) and 8GB of RAM. The operating system of choice was Ubuntu 9.04 (x86_64).

Without further ado, here are the results:

Figure: Throughput for Tornado, Twisted Web, and Tornado on Twisted

As you can see Tornado turned out to be faster than the rest of the Python web servers. Handling a peak of almost 3900 req/s with a single front-end and on commodity hardware is nothing to sneer at.

Twisted Web didn’t do too badly either (max. 2703.7 req/s), but the difference in performance is noticeable. The performance of Tornado on Twisted, in turn, was virtually identical to that of Twisted Web.

There you have it. I was curious about the possible outcome and now I know. Remember, this is a report on the numbers I got on my machine, not a research paper. But I hope that you find them interesting nevertheless.

Show me the code

Tornado:

import tornado.httpserver
import tornado.ioloop
import tornado.options
import tornado.web
import logging

from tornado.options import define, options

define("port", default=8888, help="run on the given port", type=int)

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("Hello, world!")

def main():
    tornado.options.parse_command_line()
    application = tornado.web.Application([
        (r"/", MainHandler),
    ])
    http_server = tornado.httpserver.HTTPServer(application)
    http_server.listen(options.port)
    tornado.ioloop.IOLoop.instance().start()

if __name__ == "__main__":
    main()

Twisted Web:

from twisted.internet import epollreactor
epollreactor.install()
from twisted.internet import reactor
from twisted.web import server, resource

class Simple(resource.Resource):
    isLeaf = True
    def render_GET(self, request):
        return "Hello, world!"

site = server.Site(Simple())
reactor.listenTCP(8888, site)
reactor.run()

Tornado on Twisted:

from twisted.internet import epollreactor
epollreactor.install()
from twisted.internet import reactor

import tornado.options
import tornado.twister
import tornado.web
import logging

from tornado.options import define, options

define("port", default=8888, help="run on the given port", type=int)

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("Hello, world!")

def main():
    tornado.options.parse_command_line()
    application = tornado.web.Application([
        (r"/", MainHandler),
    ])

    site = tornado.twister.TornadoSite(application)
    reactor.listenTCP(options.port, site)

    reactor.run()

if __name__ == "__main__":
    main()

UPDATE (September 14, 2009):

  • The original version of this post included Unicorn as well. This wasn’t fair however, since it’s not an asynchronous web server.
  • EventMachine HTTP Server was added, but I have since decided to remove it as I prefer to let the article be a fair comparison between asynchronous Python web servers.
  • I initially used Apache Benchmark (ab). The results were misleading at best. I re-ran the tests with httperf and updated the results above.
  • Stock Tornado couldn’t be tested with httperf because their HTTP Server doesn’t implement getClientIP(). I had to manually modify a method to return the remote ip address. This may introduce a very minimal advantage for Tornado, but it should be negligible in this context.
  • I modified the examples for Twisted and Tornado on Twisted, to ensure that both took advantage of the epoll-based reactor.

Improve the speed and security of your SQL queries

An easy way to improve the performance and security of SQL queries is to replace literals with parameters. By replacing literal values with parameters, advanced relational databases will be able to compile your queries and have their execution plans cached. This saves time and precious resources when the same query (minus the actual values) is executed over and over.

Consider the following series of queries:

SELECT * FROM users WHERE karma BETWEEN 100 AND 499;
SELECT * FROM users WHERE karma BETWEEN 500 AND 999;
SELECT * FROM users WHERE karma BETWEEN 1000 AND 1999;
SELECT * FROM users WHERE karma BETWEEN 2000 AND 4999;
SELECT * FROM users WHERE karma BETWEEN 5000 AND 9999;
SELECT * FROM users WHERE karma BETWEEN 10000 AND 50000;

Each of these is the same query with different values, so they can all be transformed into a single parameterized query:

SELECT * FROM users WHERE karma BETWEEN ? AND ?;
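
To make the idea concrete, here is a minimal sketch that runs all six ranges through one SQL string. It uses Python's built-in sqlite3 module with an in-memory database and made-up rows, purely as a stand-in so the example is self-contained; the table contents and names are assumptions, not part of the original queries.

```python
import sqlite3

# In-memory SQLite stands in for the real database (an assumption for
# illustration only; the sample users below are made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, karma INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("ann", 120), ("bob", 750), ("cat", 1500), ("dan", 42000)],
)

# One SQL string, many value pairs: the statement text never changes.
QUERY = "SELECT name FROM users WHERE karma BETWEEN ? AND ?"
RANGES = [(100, 499), (500, 999), (1000, 1999),
          (2000, 4999), (5000, 9999), (10000, 50000)]

results = {}
for low, high in RANGES:
    rows = conn.execute(QUERY, (low, high)).fetchall()
    results[(low, high)] = [name for (name,) in rows]
    print(low, high, results[(low, high)])
```

Because the text of the statement is identical on every iteration, a database that caches compiled statements has only one statement to deal with instead of six.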

Parameters help with security as well. Trying to use clever tricks with quotes in order to inject arbitrary SQL code becomes futile, because parameters are treated as values and have no effect on the structure of the query itself.

Parameterized queries are therefore efficient and go a long way towards preventing SQL injection attacks in your applications. They have virtually no downside.
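
The difference is easy to see side by side. The following is only a sketch, again using an in-memory sqlite3 database with a hypothetical customers table: the same hostile input changes the meaning of a query built by string formatting, but is compared as a plain value when passed as a parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (last_name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?)",
                 [("Smith",), ("Jones",)])

# A classic injection attempt supplied as "user input".
malicious = "nobody' OR '1'='1"

# Unsafe: the input is spliced into the SQL text and changes its structure.
unsafe_sql = "SELECT last_name FROM customers WHERE last_name = '%s'" % malicious
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the input is bound as a value and compared literally.
safe_sql = "SELECT last_name FROM customers WHERE last_name = ?"
safe_rows = conn.execute(safe_sql, (malicious,)).fetchall()

print(len(unsafe_rows))  # every row matches the rewritten condition
print(len(safe_rows))    # no customer has that literal last name
```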

Newbie developers are often unaware that this feature exists, and end up irritating seasoned DBAs who have to deal with the consequences of their incompetence. Leon Katsnelson argues that this is such an important matter that every DBA should forward this Computerworld article to their developers. I tend to agree on how important this issue is.

That article provides the following example in Java:

String lastName = req.getParameter("lastName");
String query = "select * from customers where last_name = ?";
PreparedStatement pstmt = connection.prepareStatement(query);
pstmt.setString(1, lastName);
// executeQuery() returns the ResultSet; execute() only returns a boolean
ResultSet results = pstmt.executeQuery();

Here I’ll show you an example of how to work with parameterized queries from Ruby and Python. I’ll use the Ruby and Python drivers for DB2.

Ruby first:

require 'ibm_db'

conn = IBM_DB.connect("mydb", "db2inst1", "mypassword")

query = "SELECT * FROM users WHERE karma BETWEEN ? AND ?"
pstmt = IBM_DB.prepare(conn, query)

values = [500, 999]
IBM_DB.execute(pstmt, values)

while row = IBM_DB.fetch_array(pstmt)
  puts "#{row[0]}:#{row[1]}"
end

We load the driver (use mswin32/ibm_db on Windows, and ibm_db.bundle on Mac), create a prepared statement, and then bind the two parameter values to it through the execute method. We then fetch the resultset one row at a time and print the value of the first two fields for each record. For fine-tuned control we could have used the IBM_DB::bind_param method.

The Python version is very similar:

import ibm_db

conn = ibm_db.connect("mydb", "db2inst1", "mypassword")

query = "SELECT * FROM users WHERE karma BETWEEN ? AND ?"
pstmt = ibm_db.prepare(conn, query)

values = (500, 999)
ibm_db.execute(pstmt, values)

row = ibm_db.fetch_tuple(pstmt)
while row:
    # Format explicitly so non-string columns (e.g. integers) print cleanly
    print("%s:%s" % (row[0], row[1]))
    row = ibm_db.fetch_tuple(pstmt)

As you can see, working with parameterized queries is not any harder than dynamically generating SQL queries. Yet the benefits of doing so are huge.

Unfortunately, despite being a very sound choice to base an Object-Relational Mapper (ORM) on, ActiveRecord does not use parameterized queries. Even when it looks like you are passing parameters to a given method, these are actually used to dynamically form an SQL query. Of course you are still free to use parameterized queries in your Rails applications by employing the driver directly. But I really think this is something ActiveRecord should be built upon.

Luckily for Django developers, Django’s ORM uses parameterized queries, thus improving both performance and security with a single design choice. In the Python world you couldn’t get away with ignoring parameterized queries.

For those of you using Rails, all is not lost. DB2 Express-C 9.7 has a killer feature known as the Statement Concentrator, which caches similar queries allowing them to use a shared access plan. It’s not as efficient as using prepared statements in your code, but it’s the best you can do when, as in the case of ActiveRecord, you can’t use parameterized queries directly. Leon’s article explains in greater detail how this feature actually works.


Enabling support for DB2 and Python/Django/SQLAlchemy on Mac OS X Snow Leopard

This is the Python version of a post I made about Ruby a few days ago.

Now that Mac OS X 10.6 is out, it’s time to leave the world of 32 bit computing behind. The pre-installed Python interpreter will run in 64 bit mode by default, so you may need to pay attention when installing some C-based eggs.
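
If you are not sure which mode your interpreter is running in, a quick check like the following sketch can tell you before you build anything. It relies only on the standard library and is not specific to DB2.

```python
import platform
import struct
import sys

# Size of a C pointer in bits: 64 on a 64-bit interpreter, 32 otherwise.
pointer_bits = struct.calcsize("P") * 8

# sys.maxsize exceeds 2**32 only on 64-bit builds.
is_64bit = sys.maxsize > 2**32

print(pointer_bits, is_64bit, platform.machine())
```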

Assuming you have DB2 Express-C installed already, the ibm_db Python egg for DB2 can easily be installed by following these simple steps:

$ sudo -s
$ export IBM_DB_LIB=/Users/<username>/sqllib/lib64
$ export IBM_DB_DIR=/Users/<username>/sqllib
$ export ARCHFLAGS="-arch x86_64"
$ easy_install ibm_db

This will install the ibm_db C driver, and the ibm_db_dbi Python module that complies with the DB-API 2.0 specification.

You can verify that the installation was successful by running the following:

$ python
>>> import ibm_db
>>>

Now, for the Django adapter, install Django first (if you haven’t done so already):

$ sudo easy_install django

The Django adapter can then be installed as follows:

$ sudo easy_install ibm_db_django

Finally, if you have installed SQLAlchemy and wish to install the DB2 adapter for it, run:

$ sudo easy_install ibm_db_sa

Please let me know if you encounter any issues; I’d be glad to help you.


Startup for sale on eBay (and it’s a great deal)

One of the best programmers I know is selling on eBay a web application that he’s been developing and running for the past three years. Given the starting price and considering what one lucky person or company will walk away with, I must say, it’s an amazing deal. I’m writing about his auction here so that I can help it get the exposure it deserves and because I think it’s an incredible bargain for anyone who is interested!


BlogBabel, the aforementioned site/web app, is a blog indexing and aggregation service that began in 2006. Amongst its features are the ability to detect and show the most popular blog discussions, weekly posts, books, videos, and even popular blog entries based on their location (through geotagging). It also features leaderboards of the most popular blogs.

Its codebase uses Python and Django, and consists of 27,359 physical lines of code (roughly equivalent to 6.46 person-years, according to sloccount). The R&D alone makes this application worthwhile to an interested party.
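
For the curious, the person-years figure can be reproduced by hand. By default sloccount bases its estimate on the Basic COCOMO model in "organic" mode, where effort in person-months is 2.4 times KSLOC raised to the power 1.05. The sketch below assumes those default coefficients; the function name is mine.

```python
def cocomo_person_years(physical_sloc, factor=2.4, exponent=1.05):
    """Basic COCOMO (organic mode) effort estimate, in person-years."""
    ksloc = physical_sloc / 1000.0          # thousands of source lines
    person_months = factor * ksloc ** exponent
    return person_months / 12.0

estimate = cocomo_person_years(27359)
print(round(estimate, 2))
```

With 27,359 lines this works out to roughly 6.46 person-years, in line with the number quoted above.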

At this stage, BlogBabel has an Italian interface (located at it.blogbabel.com) and aggregates almost 15,000 Italian blogs and 5 million posts. Changing the interface to make it an international project that’s available in several languages, or switching to English (solely), would not be challenging in the least (they used to run a Spanish version as well, for example, but decided to discontinue it so as to focus on the Italian one).

BlogBabel has been featured in the mainstream Italian media and has had a noticeable influence on the Italian blogosphere. One could argue that it has been the yellow pages of the Italian blogosphere. Because of this, Ludovico Magnocavallo (the site’s creator) received substantial offers to buy BlogBabel in the past, but he turned them down because he wanted to continue building this site. Now however, due to personal circumstances and lack of time/resources, he’s willing to sell this application for what may amount to far less than its true value. And here’s the real bargain: the starting price, without a reserve, is 4,999 Euros. This is, of course, a ridiculously low price for the value being offered. But Ludovico believes in letting the market decide.

If I had the funds lying around, I would buy it myself and gear it towards the English speaking world (in conjunction with the pre-existing Italian version). It’s a prepackaged, virtually ready-made startup with a great deal of potential both in its current state and in terms of what it could grow to become.

To recap, the auction includes:

  • The domain name blogbabel.com (it.blogbabel.com has a pagerank of 6);
  • The full codebase (almost 30,000 lines of code);
  • A database containing 3 years’ worth of data relating to the Italian blogosphere (more than 30 GB, lots of data-mining opportunities);
  • 4 hours of work to help you with setting up the site on your own servers.

BlogBabel has been running smoothly for three years, and is currently under-marketed. Optimizing ads, affiliates, and similar sources of revenue wouldn’t be hard at all, especially if one were to aim this site at the English speaking world.

Also, Ludovico has already implemented most of the code that’s necessary to allow users to have accounts (through OpenID), but since these “social features” are not fully implemented yet, they have not been deployed in production. A buyer could decide to disregard them, or finish implementing them and roll out a Technorati-like service. The winner of this auction could also implement support for Twitter, comments on social networks, sentiment analysis, etc., on their own. The possibilities are really limitless when you start with a solid engine and crawler, and already have a great deal of data at your fingertips.

I know Ludovico and he’s a stand-up guy. If you are interested in this great deal, you can bid here. If you have technical questions about this auction, please feel free to contact him directly through eBay.

UPDATE (September 8, 2009): Ludovico received an undisclosed offer for the site and a few years of maintenance work, so the auction for the site alone was suspended.


The DB2 adapter now supports Django 1.1

I’m glad to announce that the API team has just released version 1.0.2 of the adapter for Django. And on my birthday to boot, what a nice present. This version extends its support to the recently released Django 1.1, as well as incorporating the feedback that was received earlier on. :) (For installation instructions, take a look at the README file.)

IBM confirms its commitment to support Python and Django, and gives Django well deserved credentials in environments where having IBM’s support counts. Django is becoming an increasingly mature web framework with the potential to do well within the Enterprise world. Having support for DB2 will surely help.

The next step will be working with the Django team to bake DB2 support directly into Django’s releases. The code for the adapter is released under a liberal OSI-compliant license that is compatible with Django’s own BSD, and the API team is more than willing to work on the development and support of the adapter should it become part of Django. We love Django and ponies. Let’s make this happen, guys.


Serving Django Static Files through Apache

Django’s development server is capable of serving static (media) files thanks to the view django.views.static.serve. Popular web servers like Apache, Lighttpd or NGINX are much faster though, and as such should be used in production mode. Our goal is to bypass Django and let Apache (or other valid alternatives) directly serve static files like images, videos, CSS, JavaScript files, and so on, for us.

Generally speaking, for performance reasons, it’s advised that you have two different webservers serving your dynamic requests and static files. In practice, for smaller sites, people often opt to simply use one webserver. In this article, I’ll discuss how to serve the static files within your Django project, through Apache.

The first thing we need to do is distinguish between development and production mode. We can do so by simply specifying DEBUG = True (development), or DEBUG = False (production) within our settings.py file.

settings.py may include (among others) the following declarations:

import os

# Absolute path to the project directory
BASE_PATH = os.path.dirname(os.path.abspath(__file__))

# Main URL for the project
BASE_URL = 'http://example.org'

DEBUG = False

# Absolute path to the directory that holds media
MEDIA_ROOT = '%s/media/' % BASE_PATH

# URL that handles the media served from MEDIA_ROOT
MEDIA_URL = '%s/site_media/' % BASE_URL

# URL prefix for admin media -- CSS, JavaScript and images.
ADMIN_MEDIA_PREFIX = "%sadmin/" % MEDIA_URL

*PATH constants indicate paths on your filesystem (e.g., /home/myuser/projects/myproject), while *URL constants indicate the actual URL needed to reach a given page or file.
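
To see what these declarations evaluate to, here is a small standalone sketch. The project path is the hypothetical one from the example in the previous paragraph, hard-coded here instead of being derived from __file__.

```python
# Hypothetical project location, hard-coded only for illustration.
BASE_PATH = "/home/myuser/projects/myproject"
BASE_URL = "http://example.org"

MEDIA_ROOT = "%s/media/" % BASE_PATH            # a filesystem path
MEDIA_URL = "%s/site_media/" % BASE_URL         # a URL
ADMIN_MEDIA_PREFIX = "%sadmin/" % MEDIA_URL     # a URL under MEDIA_URL

for name in ("MEDIA_ROOT", "MEDIA_URL", "ADMIN_MEDIA_PREFIX"):
    print("%-20s %s" % (name, globals()[name]))
```

Note that ADMIN_MEDIA_PREFIX only comes out right because MEDIA_URL already ends with a slash.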

Notice that it’s not unusual to have a /site_media URL that corresponds to a /media folder. In the example above, I opted to separate regular media files for the project from the standard ones that ship with Django for the admin section. To do this, all we have to do is create a symbolic link as follows:

ln -s /usr/lib/python2.5/site-packages/django/contrib/admin/media /path/to/myproject/media/admin

When you’re in development mode, and DEBUG = True, you want to let Django serve your static files. This can be done by adding the following snippet (or similar) to your urls.py:

if settings.DEBUG:
    urlpatterns += patterns('',
        (r'^site_media/(?P<path>.*)$', 'django.views.static.serve', {'document_root': settings.MEDIA_ROOT}),
    )

In production mode, the code contained within the if clause will not be executed as we’ve set DEBUG to False within settings.py.
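
If the regular expression in that pattern looks opaque, the following sketch shows what it captures, using only the standard re module. The helper function is hypothetical and merely mimics the fact that Django matches URL patterns against the request path without its leading slash.

```python
import re

# Same pattern as in the urls.py snippet above.
pattern = re.compile(r'^site_media/(?P<path>.*)$')

def media_path(request_path):
    """Return the value the 'path' group would capture, or None."""
    # Drop the single leading slash before matching, as Django does.
    relative = request_path[1:] if request_path.startswith('/') else request_path
    match = pattern.match(relative)
    return match.group('path') if match else None

print(media_path('/site_media/css/style.css'))
print(media_path('/blog/2009/'))
```

The captured path is what the static view looks up under document_root; requests that don't start with site_media/ never reach it.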

From the Django side of things, we are good. We now need to instruct Apache. Within your virtual host file, you can specify something along the lines of:

<VirtualHost *:80>

  #...

  SetHandler python-program
  PythonHandler django.core.handlers.modpython
  SetEnv DJANGO_SETTINGS_MODULE myproject.settings
  PythonDebug On
  PythonAutoReload Off
  PythonPath "['/usr/lib/python2.5/site-packages/django', '/path/to/myproject'] + sys.path"

  #...

  Alias /site_media "/path/to/myproject/media"
  <Location "/site_media">
    SetHandler None
  </Location>
</VirtualHost>

The first group of declarations essentially tells Apache to use mod_python to handle any incoming requests. However, we don’t want Django to deal with static files, so the second group of declarations maps the /site_media URL to the actual media directory on the server, and tells Apache to treat it as static content (with SetHandler None), in effect bypassing Django.


Memoization in Ruby and Python

Wikipedia defines memoization as “an optimization technique used primarily to speed up computer programs by having function calls avoid repeating the calculation of results for previously-processed inputs.” This typically means caching the return value of a function in a dictionary of sorts, using the parameters passed to the function as the key. That way the return value can be reused immediately, without calculating it again, when the function is invoked with the same arguments. Even though we are trading space for time, it is often invaluable for speeding up certain recursive functions, and in dynamic programming, where intermediate calls are often repeated many times.

Using memoization in Ruby is very easy thanks to the memoize gem. The first step to getting started is therefore to install it:

$ sudo gem install memoize
Successfully installed memoize-1.2.3
1 gem installed
Installing ri documentation for memoize-1.2.3...
Installing RDoc documentation for memoize-1.2.3...

Now we can use the memoize method as illustrated in the example below:

require 'rubygems'
require 'memoize'
require 'benchmark'
include Memoize

def fib(n)
  return n if n < 2
  fib(n-1) + fib(n-2)
end

Benchmark.bm(15) do |b|
  b.report("Regular fib:") { fib(35) }
  b.report("Memoized fib:") { memoize(:fib); fib(35)}
end

In the first block we simply invoke fib(35), while in the second one we first invoke the method memoize(:fib) to memoize the method fib. Running this code on my machine prints the following:

                     user     system      total        real
Regular fib:    55.230000   0.160000  55.390000 ( 55.819205)
Memoized fib:    0.000000   0.000000   0.000000 (  0.001305)

We went from almost a minute of run time to an instantaneous execution. Optionally we could even pass a file location to the function memoize and this would use marshaling to dump and load the cached values on/from disk.

For Python we can write a simple decorator that behaves in a similar manner. In its simplest form it can be implemented as follows:

# memoize.py

def memoize(function):
    cache = {}
    def decorated_function(*args):
        try:
            return cache[args]
        except KeyError:
            val = function(*args)
            cache[args] = val
            return val
    return decorated_function

Or, checking for the key up front instead of catching an exception (cheaper when cache misses are common):

# memoize.py

def memoize(function):
    cache = {}
    def decorated_function(*args):
        if args in cache:
            return cache[args]
        else:
            val = function(*args)
            cache[args] = val
            return val
    return decorated_function

When the memoized function is invoked, we look in the cache to see if an entry for the given arguments already exists. If it does, we immediately return that value. If not, we call the function, cache the result and return it.

Truth be told, the limit of this approach lies in the fact that, since we are using a dictionary, only hashable objects (in practice, immutable ones) can be used as keys. For example, a tuple argument works, but a list argument raises a TypeError as soon as we try to look it up in the cache. For the example within this article, this approach will suffice, but to take advantage of memoization when using arguments that are mutable, you may want to consider the approach described in this recipe.
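
As a middle ground, here is a sketch of a more forgiving decorator. This is not the recipe mentioned above, just one possible variant: it also accepts keyword arguments, and when an argument turns out to be unhashable it simply calls the function without caching. The cache attribute and the total function are there for illustration only.

```python
import functools

def memoize(function):
    """Cache results by arguments; skip the cache for unhashable arguments."""
    cache = {}

    @functools.wraps(function)  # keep the wrapped function's name and docstring
    def decorated_function(*args, **kwargs):
        # Keyword arguments are folded into the key in a fixed order.
        key = (args, tuple(sorted(kwargs.items())))
        try:
            return cache[key]
        except KeyError:
            # Hashable key seen for the first time: compute and remember.
            cache[key] = function(*args, **kwargs)
            return cache[key]
        except TypeError:
            # Unhashable argument (a list, say): just call through.
            return function(*args, **kwargs)

    decorated_function.cache = cache  # exposed for inspection only
    return decorated_function

@memoize
def total(values, offset=0):
    return sum(values) + offset

print(total((1, 2, 3), offset=1))  # tuple argument: result is cached
print(total([1, 2, 3], offset=1))  # list argument: computed, not cached
```

The trade-off is that calls with mutable arguments get no speed-up at all, but at least they no longer fail.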

We can now rewrite the Ruby example above in Python as follows:

import timeit
from memoize import memoize

def fib1(n):
    if n < 2:
        return n
    else:
        return fib1(n-1) + fib1(n-2)

@memoize
def fib2(n):
    if n < 2:
        return n
    else:
        return fib2(n-1) + fib2(n-2)

t1 = timeit.Timer("fib1(35)", "from __main__ import fib1")
print t1.timeit(1)
t2 = timeit.Timer("fib2(35)", "from __main__ import fib2")
print t2.timeit(1)

Running this code on my machine prints the following:

9.32223105431
0.000314950942993

In the case of Python 2.5, by employing memoization we went from more than nine seconds of run time to an instantaneous result.
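
As an aside, on Python 3.2 and later (so not on the 2.5 interpreter used for the timings above) the standard library ships a ready-made memoizing decorator, functools.lru_cache. A minimal sketch of the same Fibonacci example, assuming such an interpreter:

```python
import functools
import timeit

@functools.lru_cache(maxsize=None)  # unbounded cache, like the dictionary above
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

elapsed = timeit.timeit(lambda: fib(35), number=1)
print(fib(35), elapsed)
print(fib.cache_info())  # hits, misses and current cache size
```

Like the hand-written decorator, it requires the arguments to be hashable.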

Granted we don’t write Fibonacci applications for a living, but the benefits and principles behind these examples still stand and can be applied to everyday programming whenever the opportunity, and above all the need, arises.


Better Software 2009 and Pycon Italia Tre

In May I will be presenting at two conferences in Italy. The first is called Better Software 2009; it’s dedicated to the world of software development, Agile methodologies, Web 2.0 and a bunch of other buzzword compliant technologies. This conference will be held on May 6 and 7 in sunny Florence. If you speak Italian and happen to be in Europe, you can register here. Italian conferences tend to be fairly cheap, so you’ll be able to attend one day for 160 Euros or both days for 280 Euros. The price is even lower if you are a student or your company is purchasing multiple tickets. Also the first ten readers who register with the following coupon 3DNMFKNM will receive a 10% discount (I don’t receive commission for this). At “Better Software” I’ll be giving a talk about the world of startups.

If you don’t speak Italian, you may still be interested in the second conference, which is being held at the same hotel in Florence from May 8 to 10. The main track of Pycon Italia Tre is, in fact, being simultaneously translated between English and Italian. At this conference I will be presenting a Python-oriented spin-off of the talk I’m giving at Better Software, featuring the very provocative title “Getting rich with Python”. Most of the audience will be composed of Italian speakers, so I’ll be presenting in my mother tongue and a real-time English translation will be provided. I’ll be among some notable company at Pycon Italia, such as Guido van Rossum, Alex Martelli (Google), Raymond Hettinger, David Boddie (Qt Software), Ariya Hidayat (Qt Software) and Fredrik Lundh (aka effbot), who will be speaking publicly about Unladen Swallow (Google’s LLVM-based upcoming project that’s geared towards drastically improving the speed of CPython) for the first time ever. With the exception of Alex (who, like myself, will be speaking Italian and have his presentation translated into English), all of the names above will be presenting in English (with a translation into Italian provided for those who require it).

Whatever your language, if you are in Europe, Pycon Italia Tre is definitely worth attending, especially when you consider the ridiculously low admission price (€60 for non-students or €40 for students, including tax) – which includes two buffet lunches, four coffee breaks, free Wi-Fi access, a free T-Shirt, free gadgets and randomly selected prizes. Plus, there will be all sorts of social activities and opportunities to hang out. If you plan to attend, register now before all the tickets have sold out.

I hope to see, and have the chance to meet, you in Florence!


Ruby’s Biggest Challenge for 2009

According to the TIOBE index, Ruby is holding its own in the 11th position, sandwiched between Delphi and D. Meanwhile, its “cousin” Python has jumped up in rank and is currently the 6th most popular programming language in the world, beating out C#, JavaScript and Perl. Ruby’s exponential growth appears to have truly slowed down. Even if we disregard the TIOBE Index or view it as being entirely inaccurate, there are other factors that indicate a lull in Ruby’s popularity. For example, at the end of 2005, thanks to Rails, Ruby book sales surpassed Python and were up by a hefty 1552%. Yet, according to this post on the O’Reilly radar, Ruby was the language with the biggest decline in unit sales during 2008, dropping out of the top 10 languages and moving from a 5.39% market share in 2007 to just 3.51% in 2008.

So is this decline in interest in the language Ruby’s biggest challenge to overcome in 2009? I don’t believe so. I’d venture to guess that most developers have heard of Ruby by now, and I think it’s fair to say that as a community, we’ve attracted a lot of attention towards Ruby over the past few years. The Ruby word is clearly out. As Ruby moves forward, organic growth is expected and the numbers above shouldn’t scare you in the least.

Ruby’s challenge for 2009 is not about adoption, marketing or – to adapt a term from the Christian vernacular – trying to convince other developers to accept Ruby into their hearts. The real challenge will be technical, namely moving away from the main Ruby 1.8 interpreter.

Historically, Ruby has been an exceptionally well designed programming language with a very lousy implementation. Some of the main issues surrounding MRI are common knowledge: memory hungry when compared to other scripting languages, extremely slow, lack of native threads, and lack of support for Unicode.

Ruby 1.9.1 resolves these issues though. As such, we as a community should really make an effort to get rid of our MRI baggage and move forward as quickly as possible to embrace Ruby 1.9.x. The payoff is an improved language with a faster and “less memory intensive” VM, as well as native threads (albeit with GIL) and support for multi-byte strings. There’s no reason to look at the past. A stable version is available and we should all be using it.

In practice, very few people have switched to Ruby 1.9. Some developers wrongly believe that Ruby 1.9 is just one intermediary step to Ruby 2.0, and as such it’s not meant to be used in production. Better communication could have avoided this common misconception. More importantly though, developers are not using Ruby 1.9 because there are very few libraries that work with it.

The Rails team is a notable exception, having placed a lot of effort into a release (2.3) that works completely with Ruby 1.9.1. But most libraries, gems and plugins won’t work with it, so inevitably Rails on Ruby 1.9.1 loses a lot of its initial appeal.

Unlike the Python community, where Python 3 is seen as an improvement to the language (Python 2.5/2.6 are perfectly fine for the time being), the Ruby community doesn’t have this sort of “luxury”. We finally have the chance to eliminate the root causes behind the harsh criticism that Ruby is sometimes subjected to, and to have a good implementation at our disposal. All we have to do is make a swift switch to Ruby 1.9.

To achieve this worthy goal I urge project owners to report Ruby 1.9.1 compatibility information in their README files. I realize that this is open source and that doing so is a voluntary effort, but I truly think that Ruby 1.9.1 should be seen as a priority by the community as a collective whole. If you are not a project owner, you can still help by testing active libraries with Ruby 1.9.1 and informing the author of each library you test of your findings. Those who are able to could also submit a patch that would enable those projects to work with the latest version of Ruby.

In truth, it wouldn’t be a bad idea to keep a list, perhaps within a wiki, of projects that have already been ported to Ruby 1.9 and that have been tested/confirmed as working. This switch to Ruby 1.9.1 can also act as a reset button when it comes to getting rid of many of the old, unmaintained, half-assed attempts from N years ago. Porting to Ruby 1.9.1 could act as a rough, implicit line of distinction between active and inactive projects.

I don’t know if this is an open letter to the Ruby community per se, but you could view it as such, as I feel that the topic of switching to Ruby 1.9.1 is one of vital importance for us Rubyists. If you agree with this point and assessment of the situation, please consider spreading the word, sharing your thoughts, and linking to this post.

When new developers come to the Ruby world, let’s greet them with Ruby 1.9.x. In the long term, doing so will improve our growth as a community more than any marketing effort ever could (and the two efforts are not mutually exclusive either). Ultimately, Ruby’s biggest challenge may just be our greatest opportunity to improve.


IBM’s Python driver is out of beta

There’s a new release of the Python driver/wrapper for DB2 and Informix in town. Version 0.7.0 is officially the first stable, production ready release. It includes fixes for a few known bugs and fully supports Unicode.

This driver, and the DB-API 2.0 wrapper it ships with, have been released under the Apache License 2.0. What’s more, the IBM API team is now legally allowed to accept your contributions to the project.

If you use this release in your projects, let us know. We’d love to hear how our releases are getting used in production.


Introducing Redis: a fast key-value database

One of the many advantages of having remarkable friends is learning quite early on about their most ambitious and interesting projects. Today, I’m going to talk about Redis, one such project that my friend Salvatore “antirez” Sanfilippo started.

Redis (REmote DIctionary Server) is a key-value database written in C. It can be used like memcached, in front of a traditional database, or on its own thanks to the fact that the in-memory datasets are not volatile but instead persisted on disk. As such it’s also very similar to memcachedb, though unlike the latter, Redis provides you with the ability to define keys that are more than mere strings (as well as being able to handle multiple databases). At this early stage (beta 6), lists, sets and even basic master-slave replication are supported, but more features are in the works (including compression).

Despite being a very young project, it already has client libraries for several languages: Python and PHP (by my friend Ludovico Magnocavallo), Erlang (by my friend Valentino Volonghi of Adroll.com), and Ruby by Ezra Zygmuntowicz. Except for Ezra, who should no doubt be a familiar name to most, Redis is pretty much an Italian product; and like other Italian products such as Lamborghini and Ferrari, this schema-less database is amazingly fast.

On an entry level Linux box, Redis has been benchmarked performing 110,000 SET operations, and 81,000 GETs, per second. As you can imagine, fast performance is one of the major goals of this project, and the choice of linked lists at the core of Redis’ implementation allows it to perform PUSH operations in O(1).
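To make the O(1) PUSH claim concrete, here is a minimal Python sketch of a key-to-list store. It is only a toy stand-in, not the Redis protocol or any Redis client API: the class name `ToyListStore` and its methods are made up for illustration, and `collections.deque` plays the role of the linked list because, like one, it supports constant-time pushes at both ends.

```python
from collections import deque


class ToyListStore(object):
    """Hypothetical in-memory key -> list store (not the Redis API)."""

    def __init__(self):
        self._lists = {}

    def push_head(self, key, value):
        # Constant time: no existing element has to be moved.
        self._lists.setdefault(key, deque()).appendleft(value)

    def push_tail(self, key, value):
        # Constant time as well, at the other end of the list.
        self._lists.setdefault(key, deque()).append(value)

    def range(self, key, start, stop):
        """Return the items from index start to stop, both inclusive."""
        items = list(self._lists.get(key, ()))
        return items[start:stop + 1]


store = ToyListStore()
store.push_head("timeline", "first post")
store.push_head("timeline", "second post")
print(store.range("timeline", 0, 9))  # ['second post', 'first post']
```

Pushing new entries onto the head of a per-user list is the kind of operation a timeline needs, and the cost of each push does not grow with the length of the list.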

Salvatore has implemented a Twitter clone known as Retwis to showcase how you can use Redis and PHP to build applications without the need for a database like MySQL or any SQL query. He used PHP in order to reach a wide audience, but of course you can do the same with Python, Ruby or Erlang. The remarkable thing is how fast this clone is. According to Apache’s benchmark data, Salvatore’s commodity server (a Pentium D which is also running several large sites) could handle 150 pageviews per second (6 milliseconds each) for each of the 50 concurrent users. This was possible while using the grand total of 1 MB of RAM for the database. Of course, this is just a quick benchmark and there wasn’t a huge deal of data in the database either, but the responsiveness was very impressive nevertheless.

Salvatore will soon be publishing a beginner’s article based on the PHP Twitter clone he wrote. It should appear within the next couple of days on this wiki page, where the code is already available. You can follow Salvatore and the evolution of this project through his Twitter account. So check Redis out and (especially if you have experience with key-value databases) don’t forget to provide your feedback and/or contribute to the project.


DB2 support for Django is coming

A few weeks after DB2 Express-C for Mac OS X was announced, I’m here to let you in on another great scoop. DB2 support for the Django web framework is going to be available soon to the community, under the permissive Apache 2.0 License. We are presently waiting for clearance from our lawyers, but the code has been written and tested, and Django is finally working with DB2. This comes on the heels of a new release of the Python driver for DB2, version 0.6.0, which adds full support for Unicode.

The Django community will soon be able to use the rock solid database management system which is DB2, and enjoy all the advantages that it provides. Would you like to introduce Django into your enterprise environment, where DB2 is already in use? If so, you’ll now have an easier time with this. Want to use DB2 as a competitive advantage for your startup? Now you can, whether you opt to use Django/Python, Rails/Ruby, Zend Framework/PHP or Perl.

I have been pushing for Django’s ORM support since 2006, and I distinctly remember the initial reactions of some people at IBM, which were along the lines of, “Djan… what?”. Unlike Rails, Django was much less known back then, especially among IT managers, and in all fairness, while powerful and very productive, the inherited Python philosophy that “explicit is better than implicit” made it look more complex – or at least less impressive – than Rails during 10 minute demos. But I insisted that it was important for our DB2 strategy and for the Django community, and now it’s finally a reality, thanks to the hard work of the IBM API team. Just like for Rails and Ruby, IBM will be the first and only vendor to officially support a Python driver, SQLAlchemy and Django’s ORM adapters.

I can’t help but think, what’s next? What language and/or framework truly needs some DB2 love? I’m definitely interested in a few languages and frameworks, and have already advocated for some of these as well, but I’d like to hear your opinions on this topic. I have created a poll that asks you which, among the technologies that we don’t currently support, do you think it would be most beneficial to have DB2 support for. Feel free to express your opinions in the comment section, as well as in the poll.



Disclaimer: The opinions expressed in this post are mine and mine alone, and do not necessarily represent the opinions of my employer, IBM. The poll is not an official IBM survey.


Monte Carlo simulation of the Monty Hall Problem in Ruby and Python

Reading Jeff Atwood’s post The Problem of the Unfinished Game reminded me of a similar problem. The Monty Hall Problem is a well known probability puzzle that has tricked many people. In fact, if you are not familiar with it already, chances are that you’ll get it wrong. And you would be in good company along with many mathematicians and physicists, including the great mathematician Paul Erdős. This puzzle is loosely based on the television show Let’s Make a Deal, and is equivalent to some much older puzzles you may be familiar with (e.g. the three prisoners problem). In its simplest form, it asks the following question:

Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what’s behind the doors, opens another door, say No. 3, which has a goat. He then says to you, “Do you want to pick door No. 2?” Is it to your advantage to switch your choice?

This definition of the problem is admittedly ambiguous. Thankfully Wikipedia points us towards a more exact definition:

Suppose you’re on a game show and you’re given the choice of three doors. Behind one door is a car; behind the others, goats [that is, booby prizes]. The car and the goats were placed randomly behind the doors before the show. The rules of the game show are as follows: After you have chosen a door, the door remains closed for the time being. The game show host, Monty Hall, who knows what is behind the doors, now has to open one of the two remaining doors, and the door he opens must have a goat behind it. If both remaining doors have goats behind them, he chooses one randomly. After Monty Hall opens a door with a goat, he will ask you to decide whether you want to stay with your first choice or to switch to the last remaining door. Imagine that you chose Door 1 and the host opens Door 3, which has a goat. He then asks you “Do you want to switch to Door Number 2?” Is it to your advantage to change your choice?

The Monty Hall Problem

Think about it for a moment, then read on. To answer this question, most people will try to determine which of the two possible outcomes has a higher probability. Problems arise when trying to correctly calculate the probability of these two events though. There are two closed doors and the car could be behind either of them. Hence, most people’s “common sense” and psychology leads them to believe that there is a 50% chance that the car is behind the initially selected door, and 50% that it’s behind the other closed door that was offered up by Monty. Initially it would seem that switching or staying with the first choice doesn’t really make a difference.

Unfortunately that’s not the right answer. The correct answer is that there is a two out of three chance of winning by switching to the other door; so switching is always to your advantage. This result is considered to be a paradox because it’s very counterintuitive to the way that many people think. It is in fact so counterintuitive that most people will argue with you in an attempt to convince you otherwise. I invite you to check out the Wikipedia entry on the problem/paradox, to read a step-by-step explanation with figures about why switching gives you about 66.7% chance of winning the car and why staying with the initial choice gives you only a 33.3% success rate.

When you make your first choice your probability of winning the car is only 1/3. If you decide to switch, you will win only if the first choice you made was wrong. And since your first choice came with a 2 out of 3 chance of picking a goat, switching will then (logically) give you 2/3 chance of winning. Another easy way to come to intuitively accept this surprising result, is to wildly exaggerate the terms of the problem. If there were a billion doors, you picked one, and then Monty proceeded to open up all the remaining doors but one, we’d have a situation where it would be extremely unlikely that you picked the right door at the beginning, while it would be extremely likely that the remaining door was the one that was concealing the car.
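The argument above can also be checked exactly, without any randomness, by listing every equally likely combination of car position and first pick. The following is a small sketch of that idea; the function name `exact_win_probabilities` is mine, and for more than three doors it assumes the exaggerated variant described above, where the host opens every remaining door except one and never reveals the car.

```python
from fractions import Fraction
from itertools import product


def exact_win_probabilities(doors=3):
    """Return (P(win by staying), P(win by switching)) as exact fractions."""
    stay = Fraction(0)
    switch = Fraction(0)
    # Every (car position, first pick) pair is equally likely.
    weight = Fraction(1, doors * doors)
    for car, pick in product(range(doors), repeat=2):
        if pick == car:
            # The first pick was right: staying wins, switching loses.
            stay += weight
        else:
            # The first pick was wrong: the host cannot open the car's door,
            # so the one door left closed must hide the car.
            switch += weight
    return stay, switch


stay, switch = exact_win_probabilities()
print("Staying: %s" % stay)      # Staying: 1/3
print("Switching: %s" % switch)  # Switching: 2/3
```

Calling `exact_win_probabilities(10)` gives 1/10 and 9/10, which is the same effect as the billion-door thought experiment on a smaller scale.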

Even after reading several explanations and aids to understand these results, there are still people who are skeptical or refuse to believe them. Let’s verify the outcome with a simulation.

What you find below is a quick Ruby script that I wrote to run a Monte Carlo Simulation of the Monty Hall problem/paradox. It runs the game a million times and then measures how many times the player won by sticking with their first choice, and how many times switching would have led to winning the car.

#!/usr/bin/env ruby -w

# Monte Carlo simulation for the Monty Hall Problem:
# http://en.wikipedia.org/wiki/Monty_Hall_problem

=begin
When using a Ruby version older than 1.8.7
define the following two methods:

  class Array
    def shuffle
      self.sort_by { rand }
    end
    
    def choice
      self.shuffle.first
    end
  end
=end

# Utility class for the simulation of a single Monty Hall game.
class MontyHall
  def initialize
    @doors = ['car', 'goat', 'goat'].shuffle
  end

  # Return a number representing the player's first choice.
  def pick_door
    return rand(3)
  end

  # Return the index of the door opened by the host.
  # This cannot represent a door hiding a car or the player's chosen door.
  def reveal_door(pick)
    available_doors = [0, 1, 2]
    available_doors.delete(pick)
    available_doors.delete(@doors.index('car'))
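    # Array#choice is specific to Ruby 1.8.7 (later versions provide
    # Array#sample instead), so index with rand to stay portable.
    return available_doors[rand(available_doors.size)]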
  end

  # Return true if the player won by staying
  # with their first choice, false otherwise.
  def staying_wins?(pick)
    won?(pick)
  end

  # Return true if the player won by switching, false otherwise.
  def switching_wins?(pick, open_door)
    switched_pick = ([0, 1, 2] - [open_door, pick]).first
    won?(switched_pick)
  end

  private

  # Return true if the player's final pick hides a car, false otherwise.
  def won?(pick)
    @doors[pick] == 'car'
  end
end

if __FILE__ == $0
  ITERATIONS = (ARGV.shift || 1_000_000).to_i
  staying = 0
  switching = 0

  ITERATIONS.times do
    mh = MontyHall.new
    picked = mh.pick_door
    revealed = mh.reveal_door(picked)
    staying += 1 if mh.staying_wins?(picked)
    switching += 1 if mh.switching_wins?(picked, revealed)
  end

  staying_rate = (staying.to_f / ITERATIONS) * 100
  switching_rate = (switching.to_f / ITERATIONS) * 100

  puts "Staying: #{staying_rate}%."
  puts "Switching: #{switching_rate}%."
end

And here is an “equivalent” version I wrote in Python:

#!/usr/bin/env python
"""
Monte Carlo simulation for the Monty Hall Problem:
http://en.wikipedia.org/wiki/Monty_Hall_problem.
"""
import sys
from random import randrange, shuffle, choice

DOORS = ['car', 'goat', 'goat']

def pick_door():
    """Return a number representing the player's first choice."""
    return randrange(3)

def reveal_door(pick):
    """Return the index of the door opened by the host.
    This cannot be a door hiding a car or the player's chosen door.
    """
    all_doors = set([0, 1, 2])
    unavailable_doors = set([DOORS.index('car'), pick])
    available_doors = list(all_doors - unavailable_doors)
    return choice(available_doors)

def staying_wins(pick):
    """Return True if the player won by staying
    with their first choice, False otherwise.
    """
    return won(pick)

def switching_wins(pick, open_door):
    """Return True if the player won by switching,
    False otherwise.
    """
    other_doors = set([pick, open_door])
    switched_pick = (set([0, 1, 2]) - other_doors).pop()
    return won(switched_pick)

def won(pick):
    """Return True if the player's final pick hides a car,
    False otherwise.
    """
    return (DOORS[pick] == 'car')

def main(iterations=1000000):
    """Run the main simulation as many
    times as specified by the function argument.
    """
    shuffle(DOORS)

    switching = 0
    staying = 0

    for dummy in range(iterations):
        picked = pick_door()
        revealed = reveal_door(picked)
        if staying_wins(picked):
            staying += 1
        if switching_wins(picked, revealed):
            switching += 1

    staying_rate = (float(staying) / iterations) * 100
    switching_rate = (float(switching) / iterations) * 100

    # Parenthesized so the same lines work on Python 2 and Python 3.
    print("Staying: %f%%" % staying_rate)
    print("Switching: %f%%" % switching_rate)

if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(int(sys.argv[1]))
    else:
        main()

Even if you are not familiar with Ruby or Python, you may be able to understand what’s going on here. The main body of the program emulates the game and keeps track of the number of victories when the player sticks with their initial choice, and when they switch. Notice that this code intentionally tries not to be clever, in order not to annoy “skeptical” people.

There are many points in the code where correct assumptions about the problem would lead us to code that is faster and much more compact. For example, if the player wins a given game by sticking with his first answer, it’s obvious that switching would have made him lose. We could just calculate the difference between 100 and the success rate of staying with the first choice, and we’d obtain the success rate for switching. But here we are trying to simulate the problem as faithfully as possible and abstract as little as necessary.
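For comparison, this is roughly what the "clever" version hinted at above could look like in Python. It is a sketch that leans on exactly the assumption the faithful simulation avoids: staying wins only when the first pick is the car, and switching wins in every other game. The function name `staying_rate` and the `seed` parameter are my own additions for illustration.

```python
import random


def staying_rate(iterations=1000000, seed=None):
    """Return the percentage of games won by keeping the first pick."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(iterations):
        car = rng.randrange(3)   # door hiding the car
        pick = rng.randrange(3)  # player's first choice
        if pick == car:
            wins += 1
    return 100.0 * wins / iterations


stay = staying_rate(100000)
print("Staying: %f%%" % stay)
# Assumption: switching wins exactly when staying loses.
print("Switching: %f%%" % (100 - stay))
```

The host never appears in this version, which is precisely why a skeptic might not trust it: the conclusion is baked into the shortcut.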

As always with Monte Carlo Simulations, the outcome is slightly variable during each run since it depends on random input; but by the law of large numbers, it will very slowly converge to the expected values (despite the pseudo-randomness used here). For example, when I executed the code above for the first time on my machine, I obtained the following:

Staying: 33.382%.
Switching: 66.618%.
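How far from 33.333% should a run like this be expected to land? For n independent games with win probability p, the standard error of the estimated rate is sqrt(p(1 - p) / n). With p = 1/3 and a million games that comes to about 0.047 percentage points, so the 33.382% above sits roughly one standard error from the theoretical value. The sketch below (helper names are mine, and it uses the shortcut that staying wins when the first pick is the car) prints the estimate next to that expected error for increasing run lengths:

```python
import math
import random


def standard_error(p, n):
    """Standard deviation of a win-rate estimate from n independent games."""
    return math.sqrt(p * (1 - p) / n)


def estimate_staying(n, rng):
    """Fraction of n games in which the first pick was the car."""
    wins = sum(1 for _ in range(n) if rng.randrange(3) == rng.randrange(3))
    return float(wins) / n


rng = random.Random()
for n in (100, 10000, 1000000):
    est = estimate_staying(n, rng)
    err = standard_error(1.0 / 3, n)
    print("n=%7d  staying=%.3f%%  expected error ~ %.3f points"
          % (n, 100 * est, 100 * err))
```

Each hundredfold increase in the number of games only shrinks the expected error by a factor of ten, which is why the convergence feels so slow.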

The results of this simulation should be enough to convince you that the theoretical results are actually true; we are easily fooled, and the mathematicians who got it right were not making stuff up. ;-)

Happy New Year to my readers, I wish you all the best for a happy, successful 2009!


Merb, Rails Myths, Language Popularity and other Zenbits

Zenbits are posts which include a variety of interesting subjects that I’d like to talk about briefly, without writing a post for each of them.

Merb: A few days ago Merb 1.0 was released. Congratulations on this important milestone to Ezra Zygmuntowicz, the Merb community and Engine Yard (who finances the project). Merb 1.0 wasn’t even out yet when some people had already started commenting on the fracturing of the Ruby community that this new framework might bring with it, and the impact that this high visibility “competitor” might have on Rails. I believe that having more than one widely adopted web framework will only benefit the Ruby community. Furthermore, it’s important to remember that this is not a zero-sum game. Ruby programmers are perfectly capable of learning two frameworks and using one or the other, depending on the project at hand. This is particularly true if we consider that Merb, for all of its advantages – and disadvantages – when compared to Rails, is not totally different from its forerunner. If you are an expert Rails programmer, you should be able to become proficient in Merb in very little time. To help with this process, the Merb community needs to concentrate on the documentation now, given that the API is finally stable.

Rails Myths: David Heinemeier Hansson began a series of posts about Rails Myths. I like the idea of seeing common myths addressed straight from the horse’s mouth. Over the past two years, Rails has received quite a bit of backlash and old-fashioned FUD, so it’s important to set the record straight, whether the myths are entirely fabricated or if there is some element of truth to them. Whether you agree with David or not, it’s also nice to hear two sides of the same story. In fact, at the beginning of my book I debunk a few myths, just to set the record straight regarding what some readers may have heard surrounding the framework. It was a fun part to write.

My Book: Speaking of my book, Ruby on Rails for Microsoft Developers, I’m getting closer to the finish line. I’m about to complete Chapter 9 (out of eleven chapters). The initial schedule I was provided with has been extended slightly so that there will be sufficient time to properly review the content and ensure that it’s up to date with the final release of Rails 2.2. Some people wondered what the “Microsoft Developers” part means. Is it for people that work at Microsoft? Is it for .NET programmers? Is it for people who develop on Windows?

The truth is that “Microsoft Developers” is probably just a marketing term that Wrox selected as a catch-all for all of the aforementioned categories of programmers. As an author I’m trying to serve all of them well, by providing a guide that sneaks in much of the Rails culture and softens the migration path by using an Operating System, and to a certain extent, tools that they’re already familiar with. In my opinion one of the major obstacles when switching to, or trying, Rails when coming from the Microsoft world, is the culture shock. The documentation and most books assume that you are familiar with *nix systems and tools, and this can be frustrating for those who are forced not only to learn a new language and framework, but also an entirely new set of tools. As it’s targeted at Microsoft developers, the book obviously makes quite a few references and comparisons to the .NET world, where they fit. This is done so that the many .NET programmers amongst the group of so called “Microsoft Developers” will find the book particularly useful. Yet the book remains generic enough so that it can be used by any programmer (particularly Windows users), even those without any knowledge of the Microsoft .NET Framework or ASP.NET.

Python books: While on the subject of books, I wanted to mention that the final version of the Pylons book is available online. Despite the much less fancy UI, the book pretty much does what the Django Book did in the past. And both are available in print as well (The Definitive Guide to Django: Web Development Done Right and The Definitive Guide to Pylons). Pylons is a Python web framework that can be viewed as a Ruby on Rails clone, in a far greater way than Django could ever be considered.

Another thing I want to mention is that I received a copy of Expert Python Programming. I haven’t gotten too far into it yet, but from what I’ve seen so far, things look good. I hope to be able to read it through over a weekend in the near future and then provide a proper review. Stay tuned.

Language Popularity: If you take a look at the TIOBE Index, you’ll notice a few interesting things: Ruby has dropped two positions since last year, and it’s now the 11th most popular language in the world. This shouldn’t be cause for concern though, as shown by this Ruby graph. Python on the other hand is increasing in popularity and moved from the 7th to the 6th most popular language. Interestingly, according to the index (the results of which are educated guesses only), Python would seem to be more popular than C#. I find this to be true, in terms of online activity within an increasingly vibrant community, but in my opinion, the job market hasn’t caught up yet. In fact, at least in Toronto, when there’s a Python opening it’s pretty much an event that’s worthy of being discussed on the local Python mailing list. C# openings are much more common. This may be different in Silicon Valley, of course. It would also seem that Delphi has experienced a huge comeback, moving from the 11th position last year to the 8th one this time around. It’s hard to imagine that Delphi has had a similar level of adoption as C# and thus has become more popular than Perl, JavaScript and Ruby. Delphi is a great solution for Win32 programming, but I don’t quite believe this overly optimistic outlook. And if this is the case, where are all the Delphi jobs and buzz?

DB2: This interview shows a few good reasons why even smaller and medium sized companies are increasingly adopting DB2. And while the video doesn’t mention it, IBM is coming out with an updated version of DB2 Express-C 9.5. This new version, 9.5.2 or 9.5 FixPack 2, is going to introduce exciting new features, including an engine for full text search.

The Great Ruby Shootout: These days you hear a lot of talk about parallel programming. Intel promotes it and despite their bias, it’s plausible that parallel programming will become important as the CPU market heads towards an increasingly larger number of cores, as opposed to focusing on the frequency of said CPUs. In the world of Ruby, this translates into multiprocessing, as opposed to multithreading, due to the infamous GIL (Global Interpreter Lock). This means that Ruby will most likely approach the problem similarly to how Python 2.6 did with the multiprocessing module, which is a process-based interface. The obvious exceptions are JRuby and IronRuby, which establish a 1 to 1 relationship between Ruby threads and OS threads.

For the shootout, it would be interesting to see some multithreaded code, so as to get a better sense of how well JRuby and IronRuby compare to MRI and 1.9, when more cores are available. In fact, the long-promised shootout will be performed on a quad-core machine with 8GB of RAM. If Charles Nutter, John Lam, or any of their team members would like to contribute some programs that are able to take advantage of “native” multithreading, I’d be very happy to include them in the Ruby Benchmark Suite, to be used for my shootout.

The repository requires some love and refactoring, since it needs to be split in two types of benchmarks. The simpler one will evaluate the execution time minus the startup time, while the more advanced benchmark will also exclude the time required for parsing and loading modules, classes and methods in the AST. It would also be nice to test each program with variable input sizes and report these results accordingly. Right now I’m very busy with the book, but as I become more available, I’ll start working on this.

Finally, I want to point out a very interesting article about performance and UIs. Slow is indeed a very relative concept, and it’s important to understand how to analyze and respond to the user requirements when it comes to the responsiveness of an application as a user interacts with it.

Hardware: I finally bought a Trackball made by Logitech and the Microsoft Ergonomic Keyboard (Microsoft makes great hardware). I don’t have wrist problems, but I’d like to see how these two affect my extensive computer usage. I plan to report my experience as soon as I’ve had a chance to use these input devices for a while, since I know this is a topic that interests lots of programmers (many of whom end up being victims of RSI, and some of the IRS :-P ). I also bought a bad-ass color laser printer which is quite handy when you’re a programmer and you are writing a book. I’ll let you know how it goes. What I didn’t buy, but still think is awesome, is the Flip minoHD. It’s the equivalent of an iPod for the world of camcorders. $235 for a camcorder that’s so perfectly compact, and yet that can record in HD, is a pretty sweet deal. I’m considering it for Christmas, assuming it reaches Canada by then.


The Rise of the Functional Paradigm

In yesterday’s address to the Ruby community, Dave Thomas invited Rubyists to fork Ruby, to freely research and experiment with new and interesting features. If this process is successful, many of these features will inevitably see their way back into Ruby’s core, thus improving the language in leaps and bounds. And I feel he couldn’t have been any more right. In fact, the whole industry is experiencing the trend of incorporating features developed in less common languages, research languages, “toy languages” if you prefer, within mainstream ones.

Experimenting with these alternative languages is important because occasionally they themselves become widely used, and even when they fail to do so, they lend their insight to the world of software development, finding their way into other languages. This approach greatly accelerates the development of common languages for the good of their large user bases and the improvement of the software industry. It’s a win-win situation for everyone involved and for the development community as a whole.

Pay attention to the development community online, and you’ll quickly notice a few non-mainstream programming languages appear over and over again. I’m referring to languages like F#, Erlang, Haskell, Scala and Clojure. I’ll admit to a certain selection bias, given that I tend to hang out in communities where hackers and developers actively pursue the betterment of their programming skills, beyond the stereotypical 9 to 5 requirements. But nevertheless, three or four years ago the average developer probably wouldn’t have heard about any of them (at least the ones that existed at the time). Today all of these languages have active communities and books being published about them, and most programmers have at least encountered some of these names.

They are all different languages, but their common denominator is the functional paradigm. Notice that I titled this post “The Rise of the Functional Paradigm” and not “The Rise of Functional Languages”. In a sense the latter is true as well, since there’s been much more attention towards functional programming languages lately. But there is a subtle difference. I don’t expect purely functional languages to become the most used programming languages anytime soon. For the foreseeable future, I don’t expect US companies to outsource Haskell jobs to India or China, like they do today for Java or .NET projects.

Yet these functional languages serve a higher purpose. Not only do they satisfy the needs of intellectually curious developers and companies looking for a competitive advantage, but they also have a great deal of influence on the rest of the development world.

We are seeing a convergence between these two groups of languages. Functional languages will strive to become as useful as possible, with libraries and tools that are more adequate for mainstream developers, while conserving their functional purity (I’m looking at you, almighty Haskell). Meanwhile, mainstream languages will slowly adopt powerful features found in these functional and other research languages, adding further expressiveness and capabilities to their widely adopted foundations. F#, the evolution of C# and the addition of LINQ should be enough evidence that this is the case at least for the .NET platform. And even C++0x and D are leaning towards the incorporation of some functional features (e.g. lambda expressions and closures). The two types of languages come from different directions but will reach a similar destination.

The increasingly popular Ruby, Python and JavaScript owe their success to several factors. And while they are considered multi-paradigm and were mostly aided in their popularity by their immediacy, simplicity, usefulness and a set of historical circumstances, they’re all hybrid languages that adopt functional features. The functional paradigm is becoming so common that it’s hard to imagine seeing any new programming language rise to fame without including at least a subset of the features available in functional programming languages. As developers, we’ve grown to expect the elegance of functional features in a language. No lambda, no party.
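
To make “hybrid” a little more concrete, here is a minimal Python sketch of the kind of functional features I have in mind: lambdas, higher-order functions and closures living happily next to ordinary imperative code. The names (make_counter, squares_of_evens, total) are made up for illustration and don’t come from any library.

```python
from functools import reduce


def make_counter(start=0):
    """Return a closure that remembers its own running count."""
    count = start

    def increment():
        # 'nonlocal' lets the inner function rebind the captured variable.
        nonlocal count
        count += 1
        return count

    return increment


# Higher-order functions fed with lambdas: keep the even numbers, square them.
squares_of_evens = list(
    map(lambda n: n * n, filter(lambda n: n % 2 == 0, range(10)))
)

# Fold the resulting list into a single value.
total = reduce(lambda acc, n: acc + n, squares_of_evens, 0)

if __name__ == "__main__":
    next_id = make_counter()
    print(next_id(), next_id())  # each call advances the captured count
    print(squares_of_evens, total)
```

None of this is purely functional (the counter mutates captured state, after all), and that is pretty much the point: the mainstream languages borrow the handy parts and mix them with everything else.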

If the 90s were characterized by the rise of the Object Oriented paradigm, and this decade can be considered as a transition phase, then the future belongs to the functional paradigm. Whether developers prefer to mix this with other paradigms (e.g. in languages like Ruby, Python, C#, etc…), like a powerful cocktail, or shoot it straight down (e.g. in purely functional languages like Haskell), the functional paradigm is here to stay.


Take this survey and win a free ticket for the Professional Ruby Conference

Addison Wesley will hold their first Professional Ruby Conference in Boston, Massachusetts between November 17 and 20, 2008. This conference, for which Obie Fernandez is the Technical Chair, is highly educational and boasts some of the best speakers from the Ruby and Rails communities.

The organizers were kind enough to invite me, offering me a complimentary pass for the Professional Ruby Conference. I won’t be able to attend, so I decided to donate my free admission to one lucky reader. They also provided me with a priority code (like a coupon) for my readers, which entitles you to receive a $200 discount off the regular admission price.

I really value your opinions and I’d appreciate it if you could take this survey, so that I can improve the quality of this blog. At the end of the survey you’ll receive your $200 discount code, and will be entered into my draw for your chance to win one free ticket. I will announce and get in touch with the winner early next week (Monday or Tuesday depending on participation levels).


SURVEY


Pygments TextMate Bundle

Following my last post, a few people asked me to create a Pygments TextMate bundle. Ask and ye shall receive (on GitHub).


Prerequisites


Install Pygments following these instructions.


Installation


First method:

sudo mkdir -p /Library/Application\ Support/TextMate/Bundles
cd /Library/Application\ Support/TextMate/Bundles
git clone git://github.com/acangiano/pygments-textmate-bundle.git "Pygments.tmbundle"

If TextMate is running while you perform the update, execute the following:

osascript -e 'tell app "TextMate" to reload bundles'

This is equivalent to selecting Bundles -> Bundle Editor -> Reload Bundles from within TextMate.

Second method: Download this file, unzip it, and double click on Pygments.tmbundle.

By the way, add the following to your stylesheet if you’d like to see a scrollbar when displaying very long lines of code. This adds a nice border as well:

.highlight { border: 1px solid silver; padding-left: 5px; margin-bottom: 0.5em; overflow-x:auto; }


Copyright © 2005-2010 Antonio Cangiano. All rights reserved.