By Antonio Cangiano, Software Engineer & Technical Evangelist at IBM

Why MacRuby Matters (Present & Future)

HotCocoa::Graphics example

Over the years the inadequacy of Ruby’s main implementation has led to the creation of several alternatives. The common denominator among these is an attempt to improve the performance of Ruby, both in terms of time and space. But every Ruby implementation has another, deeper reason for being. For example, Ruby 1.9.1 is a refactoring of the language that provides the chance to incorporate several much needed features into a relatively fast virtual machine, whereas JRuby’s truest value lies in its ability to interact with the Java ecosystem. Likewise IronRuby (which is admittedly at an earlier stage in its development) is attempting to plug a dynamic language like Ruby into the .NET world (like its predecessor, IronPython).

While MacRuby is a younger, lesser known implementation, it has the potential to become a game changer – at least for Mac developers. Based on Ruby 1.9, MacRuby’s main aim is to provide programmers with the ability to write Mac OS X applications in Ruby, making Ruby a first class Cocoa programming language. In what may sound like a utopian effort, MacRuby strives to provide the high level abstractions, power and syntactic sugar of Ruby, without the characteristic performance hit of its main implementation.

Rather than relying on a mix of Objective-C and Ruby (through a bridge like RubyCocoa), developers can use MacRuby which integrates with Mac OS X core technologies and acts as an alternative language to Objective-C. To be exact, Objective-C’s runtime and generational garbage collector are at the heart of MacRuby, but from an API standpoint, programmers can write code in Ruby, instead of in the more verbose and low level Objective-C. MacRuby maintains the ability to integrate with Objective-C code, but doing so is often unnecessary thanks to a framework known as HotCocoa, which is a Ruby wrapper around the Cocoa API.

MacRuby applications end up being succinct, easy to write and straightforward to maintain. Their look and feel is exactly the same as that of applications written in Objective-C, because they are actually native Cocoa applications. In fact, MacRuby objects are Objective-C objects which use Core Foundation data types and services. The current version, MacRuby 0.4, even allows you to package applications as self-contained .app bundles, without having to redistribute MacRuby itself.

Sponsored and developed by Apple, MacRuby 0.4 is a stable release that can be – and already is – used to write desktop applications. But MacRuby’s real promise lies in its experimental branch. This rewrite will become MacRuby 0.5, as announced by the MacRuby team earlier today (along with a new, nice looking site). This future version of MacRuby is freakishly fast and uses LLVM to generate code for the Objective-C runtime. The layer composed of LLVM, the Objective-C runtime, the generational garbage collector and Core Foundation makes this specific Mac OS X release possible. To this layer, add in Ruby 1.9’s AST parser and Standard Library, and you get MacRuby 0.5 as it stands today. In the future it’s likely that applications built with MacRuby will be compiled into binary code, like Objective-C ones, thus removing the issue of protecting one’s source code in commercial applications.

To really understand the value of MacRuby, consider the following Hello World program that uses RubyCocoa:

require 'osx/cocoa'; include OSX

app = NSApplication.sharedApplication

win = NSWindow.alloc.initWithContentRect_styleMask_backing_defer(
    [0, 0, 200, 60],
    NSTitledWindowMask|NSClosableWindowMask|NSMiniaturizableWindowMask|NSResizableWindowMask,
    NSBackingStoreBuffered, false)
win.title = 'Hello World'

button = NSButton.alloc.initWithFrame(NSZeroRect)
win.contentView.addSubview(button)
button.bezelStyle = NSRoundedBezelStyle
button.title = 'Hello!'
button.sizeToFit
button.frameOrigin = NSMakePoint((win.contentView.frameSize.width / 2.0) - (button.frameSize.width / 2.0),
                                 (win.contentView.frameSize.height / 2.0) - (button.frameSize.height / 2.0))

button_controller = Object.new

def button_controller.sayHello(sender)
 puts "Hello World!"
end

button.target = button_controller
button.action = 'sayHello:'

win.display
win.orderFrontRegardless

app.run

Now compare it to the equivalent MacRuby and HotCocoa program:

require 'hotcocoa'
include HotCocoa

application do |app|
 win = window :size => [100,50]
 b = button :title => 'Hello'
 b.on_action { puts 'World!' }
 win << b
end

The first approach reminds me of Objective-C; the second is a pure Ruby DSL.

Benchmarking the experimental branch

The question on many readers’ minds is probably: how fast is it? To begin with, the start-up time is negligible, which is a good quality for desktop applications to have. Using the YARV tests that ship with the experimental branch, let me show you the numbers I obtained on a Mac Pro with two quad-core 2.8 GHz Intel Xeon processors and 18 GB of RAM. The tests were also run on much less “beefy” hardware with analogous results.

All the usual disclaimers apply here. These are just a few very basic micro-benchmarks that should give you a “feel” for how two VMs stack up against each other; but don’t read too much into this and don’t expect it to be a scientific report on the performance of the implementations that were tested. Also keep in mind that MacRuby 0.5 is currently an experimental release, and while it’s able to pass RubySpec’s language specifications, it is not a complete implementation so far. The team is taking incompatibility issues seriously though and will make sure MacRuby will be able to run any Ruby code.

The following table summarizes the results of these benchmarks for Ruby 1.8.6, Ruby 1.9.1 and MacRuby 0.5:

MacRuby table1

This table shows the ratios between Ruby 1.9.1 and MacRuby 0.5, respectively, against the Ruby 1.8.6 baseline. When you see a number 4, for example, that means that the given implementation was four times faster than Ruby 1.8.6.

MacRuby table2
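As a rough sketch of how ratios like these are derived, the snippet below divides the baseline's running time by each implementation's running time. The timings are made-up placeholders chosen for easy arithmetic, not the measurements behind the tables in this post, and the method name speedups is my own.

```ruby
# Hypothetical timings in seconds. These are placeholders for
# illustration only, NOT the measurements reported in this post.
timings = {
  'ruby-1.8.6'  => 12.0,
  'ruby-1.9.1'  => 4.0,
  'macruby-0.5' => 1.5
}

# Speedup of each implementation relative to a baseline:
# the baseline's time divided by the implementation's time.
# A ratio of 4 means "four times faster than the baseline".
def speedups(timings, baseline)
  base = timings.fetch(baseline)
  timings.each_with_object({}) do |(name, seconds), ratios|
    ratios[name] = base / seconds
  end
end

speedups(timings, 'ruby-1.8.6').each do |name, ratio|
  puts format('%-12s %.1fx', name, ratio)
end
```

The baseline itself always comes out as 1.0, and a value below 1.0 means the implementation was slower than the baseline on that test.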

And here is a direct comparison between Ruby 1.9.1 and MacRuby 0.5. In this case, a number 4 (for example) would mean that MacRuby was four times faster than Ruby 1.9.1 for the given test:

MacRuby table3

The following chart should help you better visualize the results shown in the first table (click on it to enlarge the picture):

MacRuby chart

Even when you consider the disclaimer above and the trivial nature of the benchmarks themselves, it’s clear that at this stage of the game, MacRuby 0.5 is built for speed. In these tests MacRuby dominates Ruby 1.9.1. On “average”, according to these limited tests, the experimental branch of MacRuby appears to be roughly 3 times faster than Ruby 1.9.1 (YARV), and in some cases even faster than that. You should definitely find this impressive.

For full disclosure, I’d like to explain the most likely reasons behind the four tests in which MacRuby is slower than Ruby 1.9.1:

  • bm_app_raise: MacRuby opts for cost-free IA64 exceptions. What this means is that begin/rescue clauses don’t require a setjmp() as they do in YARV, but when an exception is actually raised at runtime, raising it is more expensive. Of course, exceptions are… well, exceptional, so this has a trivial impact on real world programs.
  • bm_app_mergesort: Array operations are currently suboptimal. Slight improvements are expected.
  • bm_so_object: MacRuby’s object allocation tends to be relatively expensive. If you are allocating a zillion objects in a test, MacRuby will pay a hefty price for it.
  • bm_so_random: The performance of Fixnum has been optimized in this early release, but both Bignum and floating point operations are still suboptimal. Work is in progress and major improvements are expected to occur in future versions. In the case of Bignum, vectorization will do the trick.
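If you want to get a feel for the exception trade-off described in the bm_app_raise item on your own interpreter, a minimal sketch along these lines may help. It is not the benchmark from the YARV suite; the helper names (elapsed, rescue_without_raise, rescue_with_raise) and the iteration count are my own assumptions, and the numbers will vary by implementation and machine.

```ruby
# Run a block and return the elapsed wall-clock time in seconds,
# measured with a monotonic clock.
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Enter a begin/rescue block n times without ever raising.
def rescue_without_raise(n)
  n.times do
    begin
      :ok
    rescue StandardError
      :never_reached
    end
  end
end

# Raise and rescue an exception n times.
def rescue_with_raise(n)
  n.times do
    begin
      raise 'boom'
    rescue StandardError
      :rescued
    end
  end
end

n = 100_000
puts "begin/rescue, no exception: #{elapsed { rescue_without_raise(n) }}s"
puts "begin/rescue, with raise:   #{elapsed { rescue_with_raise(n) }}s"
```

The first figure shows what it costs merely to enter protected blocks; the second shows what actually raising costs. Comparing the two across implementations is more telling than either number on its own.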

The take home lesson

MacRuby is a serious project that fits in with, and serves well, the flourishing market of Cocoa applications. Its experimental branch is a major rewrite that, much like Unladen Swallow for CPython, could become a very fast alternative to Ruby 1.9.x and JRuby. It’s mainly aimed at the desktop world, and as such the question of when it will work with Rails is less pressing than it was in other early Ruby implementations. What’s certain is that Ruby is going to become a first class “scripting” language and a common choice for desktop applications on Mac. And should the difference in performance still remain in future, stable versions, it’s not hard to imagine that Apple’s server segment could also benefit from this, when Rails support becomes available.

Update (2009-04-04): I’ve modified a controversial statement since, technically speaking, while it is true that a very fast, incomplete implementation is promising and worth getting excited about, it cannot be considered the fastest Ruby implementation until a great degree of compatibility has been reached. This doesn’t diminish the value of MacRuby in any way, but rather draws a more accurate conclusion about the data that’s available today. It’s also worth noting that the few benchmarks that have been mentioned here are only part of the story. Speed aside, MacRuby’s aim and potential for Mac development still stand.

The image at the top of the post was generated by James Reynolds with HotCocoa::Graphics, a Processing-like library that uses Mac OS X’s graphics capabilities and makes them available to MacRuby.


Ruby’s Biggest Challenge for 2009

According to the TIOBE index, Ruby is holding its own in the 11th position, sandwiched between Delphi and D. Meanwhile, its “cousin” Python has jumped up in rank and is currently the 6th most popular programming language in the world, beating out C#, JavaScript and Perl. Ruby’s exponential growth appears to have truly slowed down. Even if we disregard the TIOBE Index or view it as being entirely inaccurate, there are other factors that indicate a lull in Ruby’s popularity. For example, at the end of 2005, thanks to Rails, Ruby book sales surpassed Python and were up by a hefty 1552%. Yet, according to this post on the O’Reilly radar, Ruby was the language with the biggest decline in unit sales during 2008, dropping out of the top 10 languages and moving from a 5.39% market share in 2007 to just 3.51% in 2008.

So is this decline in interest in the language Ruby’s biggest challenge to overcome in 2009? I don’t believe so. I’d venture to guess that most developers have heard of Ruby by now, and I think it’s fair to say that as a community, we’ve attracted a lot of attention towards Ruby over the past few years. The Ruby word is clearly out. As Ruby moves forward, organic growth is expected and the numbers above shouldn’t scare you in the least.

Ruby’s challenge for 2009 is not about adoption, marketing or – to adapt a term from the Christian vernacular – trying to convince other developers to accept Ruby into their hearts. The real challenge will be technical, namely moving away from the main Ruby 1.8 interpreter.

Historically, Ruby has been an exceptionally well designed programming language with a very lousy implementation. Some of the main issues surrounding MRI are common knowledge: memory hungry when compared to other scripting languages, extremely slow, lack of native threads, and lack of support for Unicode.

Ruby 1.9.1 resolves these issues though. As such, we as a community should really make an effort to get rid of our MRI baggage and move forward as quickly as possible to embrace Ruby 1.9.x. The payoff is an improved language with a faster and “less memory intensive” VM, as well as native threads (albeit with a GIL) and support for multi-byte strings. There’s no reason to look to the past. A stable version is available and we should all be using it.
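To make the multi-byte string support a little more concrete, here is a minimal sketch, assuming Ruby 1.9 or later, of how strings there carry an encoding and count characters rather than bytes. The helper name char_and_byte_counts is hypothetical.

```ruby
# Return the number of characters and the number of bytes in a string.
# From Ruby 1.9 on, String#length counts characters in the string's
# encoding, while String#bytesize counts raw bytes.
def char_and_byte_counts(str)
  [str.length, str.bytesize]
end

# "café", written with a \u escape so that this file stays pure ASCII.
# A literal containing a \u escape is encoded as UTF-8.
word = "caf\u00e9"

chars, bytes = char_and_byte_counts(word)
puts "encoding:   #{word.encoding}"
puts "characters: #{chars}"  # 4
puts "bytes:      #{bytes}"  # 5, since the accented e takes two bytes in UTF-8
```

Under Ruby 1.8 the same string would report a length of 5, because strings there are plain byte sequences.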

In practice, very few people have switched to Ruby 1.9. Some developers wrongly believe that Ruby 1.9 is just one intermediary step to Ruby 2.0, and as such it’s not meant to be used in production. Better communication could have avoided this common misconception. More importantly though, developers are not using Ruby 1.9 because there are very few libraries that work with it.

The Rails team is a notable exception, having placed a lot of effort into a release (2.3) that works completely with Ruby 1.9.1. But most libraries, gems and plugins won’t work with it, so inevitably Rails on Ruby 1.9.1 loses a lot of its initial appeal.

Unlike the Python community, where Python 3 is seen as an improvement to the language (Python 2.5/2.6 are perfectly fine for the time being), the Ruby community doesn’t have this sort of “luxury”. We finally have the chance to eliminate the root causes behind the harsh criticism that Ruby is sometimes subjected to, and to have a good implementation at our disposal. All we have to do is make a swift switch to Ruby 1.9.

To achieve this worthy goal I urge project owners to report Ruby 1.9.1 compatibility information in their README files. I realize that this is open source and that doing so is a voluntary effort, but I truly think that Ruby 1.9.1 should be seen as a priority by the community as a collective whole. If you are not a project owner, you can still help by testing active libraries with Ruby 1.9.1 and informing the author of the library you test of your findings. Those who are able to could also submit a patch that would enable those projects to work with the latest version of Ruby.

In truth, it wouldn’t be a bad idea to keep a list, perhaps within a wiki, of projects that have already been ported to Ruby 1.9 and that have been tested/confirmed as working. This switch to Ruby 1.9.1 can also act as a reset button when it comes to getting rid of many of the old, unmaintained, half-assed attempts from N years ago. Porting to Ruby 1.9.1 could act as a rough, implicit line of distinction between active and inactive projects.

I don’t know if this is an open letter to the Ruby community per se, but you could view it as such, as I feel that the topic of switching to Ruby 1.9.1 is one of vital importance for us Rubyists. If you agree with this point and assessment of the situation, please consider spreading the word, sharing your thoughts, and linking to this post.

When new developers come to the Ruby world, let’s greet them with Ruby 1.9.x. In the long term, doing so will improve our growth as a community more than any marketing effort ever could (and the two efforts are not mutually exclusive either). Ultimately, Ruby’s biggest challenge may just be our greatest opportunity to improve.


Do you read the Rubyist and Rails Magazine?

Books and magazines have always fascinated me. Perhaps this is due to the fact that until I was nine, my father owned a bookstore and I would spend a lot of my time hanging out in a world of dust jackets and big words. More recently, the internet has brought information sharing to a whole new level and opened up a realm of amazing possibilities. I love this element of being online to death, but it also means witnessing the decline of (printed) book and magazine sales. After all, the information is out there on the web, and in most cases it’s available free of charge, which makes a lot of people hesitant to pay for a paper version.

Personally though, I feel that the way information is collected and organized in books and magazines still has an important, complementary role. Say that you wanted to learn Scala. You could read about it on the web – there are countless blog entries about it – but it’s hard to beat the cohesive, comprehensive approach of an excellent book about the subject. Likewise, assuming that you were past the first or second book on the topic, you might find the information that’s available online more than adequate, but what a treat it would be to have a magazine that periodically covered the subject with a collection of essays, interviews and other goodies – all authored by the best experts in your particular field of interest.

Presently, Scala does not have such a luxury, given that, while vocal, its community is still relatively small. The great news, though, is that Ruby does! In fact, there are now two magazines dedicated entirely to the subject of Ruby and Ruby on Rails, both of which are free of charge if you’re happy with the electronic version (a PDF). Alternatively they can be purchased (at the rate of production cost), if you wish to receive a printed version, just like in the good old days.

The two magazines I’m talking about are “the Rubyist” and “Rails Magazine”. The Rubyist has already put out two issues, while the first issue of “Rails Magazine” just hit the stands. Our community is maturing and the existence of such initiatives is a clear vital sign of growth that encompasses more than just numerical expansion. These publications are far from amateur efforts; both magazines are beautifully laid out, have or are in the process of getting an ISSN, and can boast the commercial support of several sponsors. I was blown away by the quality of the content as well. These are two serious and exciting projects, and we as a community should really get behind them by reading these magazines, blogging about them, answering their call for papers, and if you feel like it, purchasing printed copies.


Introducing Redis: a fast key-value database

One of the many advantages of having remarkable friends is learning quite early on about their most ambitious and interesting projects. Today, I’m going to talk about Redis, one such project that my friend Salvatore “antirez” Sanfilippo started.

Redis (REmote DIctionary Server) is a key-value database written in C. It can be used like memcached, in front of a traditional database, or on its own thanks to the fact that the in-memory datasets are not volatile but instead persisted on disk. As such it’s also very similar to memcachedb, though unlike the latter, Redis provides you with the ability to define keys that are more than mere strings (as well as being able to handle multiple databases). At this early stage (beta 6), lists, sets and even basic master-slave replication are supported, but more features are in the works (including compression).

Despite being a very young project, it already has client libraries for several languages: Python and PHP (by my friend Ludovico Magnocavallo), Erlang (by my friend Valentino Volonghi of Adroll.com), and Ruby by Ezra Zygmuntowicz. Except for Ezra, who should no doubt be a familiar name to most, Redis is pretty much an Italian product; and like other Italian products such as Lamborghini and Ferrari, this schema-less database is amazingly fast.

On an entry level Linux box, Redis has been benchmarked performing 110,000 SET operations, and 81,000 GETs, per second. As you can imagine, fast performance is one of the major goals of this project, and the choice of linked lists at the core of Redis’ implementation allows it to perform PUSH operations in O(1).
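To see why a linked list makes pushing constant time, consider the following minimal sketch of a doubly linked list in Ruby. It only illustrates the data structure and is not Redis’ actual C implementation; the class and method names (LinkedList, push_head, push_tail) are hypothetical.

```ruby
# A minimal doubly linked list. Because the list keeps references to
# both ends, pushing at either end only rewires a couple of pointers
# and takes O(1) time, however long the list already is.
class LinkedList
  Node = Struct.new(:value, :prev_node, :next_node)

  attr_reader :length

  def initialize
    @head = nil
    @tail = nil
    @length = 0
  end

  # Add a value at the tail of the list in O(1).
  def push_tail(value)
    node = Node.new(value, @tail, nil)
    if @tail
      @tail.next_node = node
    else
      @head = node # the list was empty
    end
    @tail = node
    @length += 1
    self
  end

  # Add a value at the head of the list in O(1).
  def push_head(value)
    node = Node.new(value, nil, @head)
    if @head
      @head.prev_node = node
    else
      @tail = node # the list was empty
    end
    @head = node
    @length += 1
    self
  end

  # Walk the list from head to tail; this traversal is O(n).
  def to_a
    items = []
    node = @head
    while node
      items << node.value
      node = node.next_node
    end
    items
  end
end

list = LinkedList.new
list.push_tail('b').push_tail('c').push_head('a')
puts list.to_a.inspect
```

The trade-off is that reaching an element in the middle requires walking the list, so linked lists favour queue-like workloads over random access.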

Salvatore has implemented a Twitter clone known as Retwis to showcase how you can use Redis and PHP to build applications without the need for a database like MySQL or any SQL query. He used PHP in order to reach a wide audience, but of course you can do the same with Python, Ruby or Erlang. The remarkable thing is how fast this clone is. According to Apache’s benchmark data, Salvatore’s commodity server (a Pentium D which is also running several large sites) could handle 150 pageviews per second (6 milliseconds each) for each of the 50 concurrent users. This was possible while using the grand total of 1 MB of RAM for the database. Of course, this is just a quick benchmark and there wasn’t a huge deal of data in the database either, but the responsiveness was very impressive nevertheless.

Salvatore will soon be publishing a beginner’s article based on the PHP Twitter clone he wrote. It should appear within the next couple of days on this wiki page, where the code is already available. You can follow Salvatore and the evolution of this project through his Twitter account. So check Redis out and (especially if you have experience with key-value databases) don’t forget to provide your feedback and/or contribute to the project.


Scaling Rails Screencasts

Within the Rails community, New Relic is a company that doesn’t need any introductions. They are synonymous with performance and reliability, thanks to their RPM product for monitoring, detecting, and fixing Rails application performance problems in real time.

What not everybody may have noticed, though, is that New Relic started something called RailsLab, a site in which they publish videos and other useful information about scaling and improving the performance of Rails applications.

The first series, known as Scaling Rails, produced in collaboration with my friend Gregg Pollack, is absolutely impressive. The following is a list of the videos they’ve posted so far (they’re also available through iTunes):

  1. Introduction
  2. Page Responsiveness
  3. Page Caching
  4. Cache Expiration
  5. New Relic RPM
  6. Advanced Page Caching
  7. Action Caching
  8. Fragment Caching
  9. Memcached
  10. Taylor Weibley & Databases
  11. Client-side Caching
  12. Additional HTTP Caching

I was surprised to see so little mention of these fantastic short tutorials in the blogosphere. These videos are a gold mine of information, which is made all the better by the fact that they’re entirely free. Do yourself a favor and check out these awesome clips; they’re well worth your time.


Monte Carlo simulation of the Monty Hall Problem in Ruby and Python

Reading Jeff Atwood’s post The Problem of the Unfinished Game reminded me of a similar problem. The Monty Hall Problem is a well known probability puzzle that has tricked many people. In fact, if you are not familiar with it already, chances are that you’ll get it wrong. And you would be in good company along with many mathematicians and physicists, including the great mathematician Paul Erdős. This puzzle is loosely based on the television show Let’s Make a Deal, and is equivalent to some much older puzzles you may be familiar with (e.g. the three prisoners problem). In its simplest form, it asks the following question:

Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what’s behind the doors, opens another door, say No. 3, which has a goat. He then says to you, “Do you want to pick door No. 2?” Is it to your advantage to switch your choice?

This definition of the problem is admittedly ambiguous. Thankfully Wikipedia points us towards a more exact definition:

Suppose you’re on a game show and you’re given the choice of three doors. Behind one door is a car; behind the others, goats [that is, booby prizes]. The car and the goats were placed randomly behind the doors before the show. The rules of the game show are as follows: After you have chosen a door, the door remains closed for the time being. The game show host, Monty Hall, who knows what is behind the doors, now has to open one of the two remaining doors, and the door he opens must have a goat behind it. If both remaining doors have goats behind them, he chooses one randomly. After Monty Hall opens a door with a goat, he will ask you to decide whether you want to stay with your first choice or to switch to the last remaining door. Imagine that you chose Door 1 and the host opens Door 3, which has a goat. He then asks you “Do you want to switch to Door Number 2?” Is it to your advantage to change your choice?

The Monty Hall Problem

Think about it for a moment, then read on. To answer this question, most people will try to determine which of the two possible outcomes has a higher probability. Problems arise when trying to correctly calculate the probability of these two events though. There are two closed doors and the car could be behind either of them. Hence, most people’s “common sense” and psychology lead them to believe that there is a 50% chance that the car is behind the initially selected door, and 50% that it’s behind the other closed door that was offered up by Monty. Initially it would seem that switching or staying with the first choice doesn’t really make a difference.

Unfortunately that’s not the right answer. The correct answer is that there is a two out of three chance of winning by switching to the other door; so switching is always to your advantage. This result is considered to be a paradox because it’s very counterintuitive to the way that many people think. It is in fact so counterintuitive that most people will argue with you in an attempt to convince you otherwise. I invite you to check out the Wikipedia entry on the problem/paradox, to read a step-by-step explanation with figures about why switching gives you about 66.7% chance of winning the car and why staying with the initial choice gives you only a 33.3% success rate.

When you make your first choice your probability of winning the car is only 1/3. If you decide to switch, you will win only if the first choice you made was wrong. And since your first choice came with a 2 out of 3 chance of picking a goat, switching will then (logically) give you 2/3 chance of winning. Another easy way to come to intuitively accept this surprising result, is to wildly exaggerate the terms of the problem. If there were a billion doors, you picked one, and then Monty proceeded to open up all the remaining doors but one, we’d have a situation where it would be extremely unlikely that you picked the right door at the beginning, while it would be extremely likely that the remaining door was the one that was concealing the car.
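The “exaggerated” version of the game is easy to try out. The sketch below is a simplified generalization to any number of doors, assuming the host opens every door except your pick and one other, and never reveals the car. The method names (switch_wins_game?, switch_win_rate) and the seed are my own choices. With n doors, switching should win about (n - 1)/n of the time.

```ruby
# Play one game with the given number of doors and report whether
# switching wins. rng is a Random instance, so runs can be reproduced.
def switch_wins_game?(doors, rng)
  car = rng.rand(doors)
  pick = rng.rand(doors)
  # The host opens every door except the player's pick and one other,
  # and never reveals the car. If the pick is wrong, the door left
  # closed must hide the car; if the pick is right, the host leaves
  # one randomly chosen goat door closed.
  left_closed = if pick == car
                  (pick + 1 + rng.rand(doors - 1)) % doors
                else
                  car
                end
  left_closed == car
end

# Fraction of games (between 0 and 1) won by always switching.
def switch_win_rate(doors, games, rng = Random.new)
  wins = games.times.count { switch_wins_game?(doors, rng) }
  wins.to_f / games
end

rng = Random.new(2009)
[3, 10, 100].each do |doors|
  rate = switch_win_rate(doors, 100_000, rng) * 100
  expected = (doors - 1) * 100.0 / doors
  puts format('%3d doors: switching wins %.1f%% (expected %.1f%%)',
              doors, rate, expected)
end
```

As the number of doors grows, the advantage of switching approaches certainty, which is exactly the intuition behind the billion-door argument.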

Even after reading several explanations and aids to understand these results, there are still people who are skeptical or refuse to believe them. Let’s verify the outcome with a simulation.

What you find below is a quick Ruby script that I wrote to run a Monte Carlo Simulation of the Monty Hall problem/paradox. It runs the game a million times and then measures how many times the player won by sticking with their first choice, and how many times switching would have led to winning the car.

#!/usr/bin/env ruby -w

# Monte Carlo simulation for the Monty Hall Problem:
# http://en.wikipedia.org/wiki/Monty_Hall_problem

=begin
When using a Ruby version older than 1.8.7
define the following two methods:

  class Array
    def shuffle
      self.sort_by { rand }
    end
    
    def choice
      self.shuffle.first
    end
  end
=end

# Utility class for the simulation of a single Monty Hall game.
class MontyHall
  def initialize
    @doors = ['car', 'goat', 'goat'].shuffle
  end

  # Return a number representing the player's first choice.
  def pick_door
    return rand(3)
  end

  # Return the index of the door opened by the host.
  # This cannot represent a door hiding a car or the player's chosen door.
  def reveal_door(pick)
    available_doors = [0, 1, 2]
    available_doors.delete(pick)
    available_doors.delete(@doors.index('car'))
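    # Array#choice (Ruby 1.8.7) is called Array#sample in Ruby 1.9.1 and later.
    return available_doors.respond_to?(:sample) ? available_doors.sample : available_doors.choice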
  end

  # Return true if the player won by staying
  # with their first choice, false otherwise.
  def staying_wins?(pick)
    won?(pick)
  end

  # Return true if the player won by switching, false otherwise.
  def switching_wins?(pick, open_door)
    switched_pick = ([0, 1, 2] - [open_door, pick]).first
    won?(switched_pick)
  end

  private

  # Return true if the player's final pick hides a car, false otherwise.
  def won?(pick)
    @doors[pick] == 'car'
  end
end

if __FILE__ == $0
  ITERATIONS = (ARGV.shift || 1_000_000).to_i
  staying = 0
  switching = 0

  ITERATIONS.times do
    mh = MontyHall.new
    picked = mh.pick_door
    revealed = mh.reveal_door(picked)
    staying += 1 if mh.staying_wins?(picked)
    switching += 1 if mh.switching_wins?(picked, revealed)
  end

  staying_rate = (staying.to_f / ITERATIONS) * 100
  switching_rate = (switching.to_f / ITERATIONS) * 100

  puts "Staying: #{staying_rate}%."
  puts "Switching: #{switching_rate}%."
end

And here is an “equivalent” version I wrote in Python:

#!/usr/bin/env python
"""
Monte Carlo simulation for the Monty Hall Problem:
http://en.wikipedia.org/wiki/Monty_Hall_problem.
"""
import sys
from random import randrange, shuffle, choice

DOORS = ['car', 'goat', 'goat']

def pick_door():
    """Return a number representing the player's first choice."""
    return randrange(3)

def reveal_door(pick):
    """Return the index of the door opened by the host.
    This cannot be a door hiding a car or the player's chosen door.
    """
    all_doors = set([0, 1, 2])
    unavailable_doors = set([DOORS.index('car'), pick])
    available_doors = list(all_doors - unavailable_doors)
    return choice(available_doors)

def staying_wins(pick):
    """Return True if the player won by staying
    with their first choice, False otherwise.
    """
    return won(pick)

def switching_wins(pick, open_door):
    """Return True if the player won by switching,
    False otherwise.
    """
    other_doors = set([pick, open_door])
    switched_pick = (set([0, 1, 2]) - other_doors).pop()
    return won(switched_pick)

def won(pick):
    """Return True if the player's final pick hides a car,
    False otherwise.
    """
    return (DOORS[pick] == 'car')

def main(iterations=1000000):
    """Run the main simulation as many
    times as specified by the function argument.
    """
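    switching = 0
    staying = 0

    for dummy in range(iterations):
        # Shuffle before every game so the car's position varies
        # from game to game, as it does in the Ruby version.
        shuffle(DOORS)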
        picked = pick_door()
        revealed = reveal_door(picked)
        if staying_wins(picked):
            staying += 1
        if switching_wins(picked, revealed):
            switching += 1

    staying_rate = (float(staying) / iterations) * 100
    switching_rate = (float(switching) / iterations) * 100

    print("Staying: %f%%" % staying_rate)
    print("Switching: %f%%" % switching_rate)

if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(int(sys.argv[1]))
    else:
        main()

Even if you are not familiar with Ruby or Python, you may be able to understand what’s going on here. The main body of the program emulates the game and keeps track of the number of victories when the player sticks with their initial choice, and when they switch. Notice that this code intentionally tries not to be clever, in order not to annoy “skeptical” people.

There are many points in the code where correct assumptions about the problem would lead us to code that is faster and much more compact. For example, if the player wins a given game by sticking with his first answer, it’s obvious that switching would have made him lose. We could just calculate the difference between 100 and the success rate of staying with the first choice, and we’d obtain the success rate for switching. But here we are trying to simulate the problem as faithfully as possible and abstract as little as necessary.
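For comparison, here is a sketch of what the compact shortcut could look like. It deliberately bakes in the assumption that switching wins exactly when staying loses, which is precisely the kind of assumption the longer scripts avoid. The method name staying_rate is hypothetical.

```ruby
# Percentage of games won by staying with the first pick.
# The first rand(3) is the door hiding the car, the second is the
# player's pick; staying wins when the two coincide. The host's
# move is not simulated at all.
def staying_rate(games, rng = Random.new)
  wins = games.times.count { rng.rand(3) == rng.rand(3) }
  wins.to_f / games * 100
end

staying = staying_rate(1_000_000)
# Shortcut: assume switching wins whenever staying loses.
switching = 100 - staying

puts "Staying: #{staying}%."
puts "Switching: #{switching}%."
```

This version is shorter and faster, but a skeptic could reasonably object that it assumes the very thing the simulation is supposed to demonstrate.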

As always with Monte Carlo Simulations, the outcome is slightly variable during each run since it depends on random input; but by the law of large numbers, it will very slowly converge to the expected values (despite the pseudo-randomness used here). For example, when I executed the code above for the first time on my machine, I obtained the following:

Staying: 33.382%.
Switching: 66.618%.
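How close to the theoretical values should a run be expected to land? For a win probability p estimated over n independent games, the standard error is sqrt(p * (1 - p) / n). The sketch below applies this standard formula to the switching strategy; the names P_SWITCH and standard_error are my own.

```ruby
# Theoretical probability of winning by switching.
P_SWITCH = 2.0 / 3.0

# Standard error of a win rate estimated from `games` independent
# games, each won with probability `prob`.
def standard_error(games, prob = P_SWITCH)
  Math.sqrt(prob * (1 - prob) / games)
end

[100, 10_000, 1_000_000].each do |games|
  puts format('%9d games: +/- %.3f percentage points',
              games, standard_error(games) * 100)
end
```

For a million games this works out to roughly 0.05 percentage points, so a result of 66.618% sits about one standard error away from the expected 66.667%. Since the error shrinks with the square root of the number of games, each additional decimal digit of accuracy costs a hundred times more iterations, which is why the convergence is so slow.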

The results of this simulation should be enough to convince you that the theoretical results are actually true; we are easily fooled, and the mathematicians who got it right were not making stuff up. ;-)

Happy New Year to my readers, I wish you all the best for a happy, successful 2009!


DB2 on Mac officially released

As pre-announced in my two previous posts, DB2 for Mac OS X Leopard is finally available for download. It’s now official: DB2 on Mac is here.

Reflections on DB2 on Mac

Several people, including myself, would happily ditch their virtual machines and start introducing DB2 into their native Mac development stacks. But this milestone represents much more than the immediate implications would have us believe. A few years ago, the idea of giving away DB2 for free would have been met with rejection. Yet, DB2 Express-C came along, and unlike the other “express” databases, it’s a true production-ready DB2 version that can be used free of charge.

Likewise, the idea of having a DB2 version for Mac was unthinkable up to a few years ago. Yet today we finally have a copy of DB2 Express-C for Mac OS X that’s available for download. Aside from this being an acknowledgment of the growing importance of Mac as a development and business platform, I feel it underlines IBM’s ability to change. The desire that a few of us Mac addicts had, coupled with reasonable pressure from the community, was enough to make DB2 on Mac a reality. This matters and appeals to both the developer and the technical evangelist in me.

In the list of downloads, you’ll notice that the Mac download is only 138 MB, versus the 412 MB of Linux’s 64-bit. The reason for this difference is that DB2 Express-C for Mac currently ships in English only, and at this stage it doesn’t include either DB2 Text Search or the Java based tools like the DB2 Control Center. This lighter package is, in my opinion, a welcome side effect of this brand new beta release.

Getting started with DB2 and Rails on Mac OS X

Since the first download went live on Friday, a newer release has been published. It includes a guide for installing DB2 on Mac OS X and incorporates a few changes that will make developers’ lives easier when building and using drivers (e.g. the ibm_db Ruby gem). If you downloaded this beta version over the weekend, do not worry: just grab and execute this shell script (e.g. sudo fixlib.sh). If you are downloading DB2 on Mac now, you won’t need this script, of course.

Once you’ve downloaded DB2 for Mac OS X Leopard, please proceed to read this PDF guide, which will tell you everything you need to know (and more) about installing DB2 on your Mac, as well as providing extra details. It’s best not to skip over reading this document, as the installation on Mac OS X requires a few more steps than simply running the setup wizard.

With DB2 installed and started (sudo db2start), and the SAMPLE database created (db2sampl), you’re ready to start playing with this powerhouse. For details about SAMPLE’s structure you can read this article in the InfoCenter.

To run the DB2 console (known as the Command Line Processor or CLP for short), run:

$ db2

To connect to the SAMPLE database, from within the CLP run:

db2 => connect to sample

Unless you get an error, you should now be ready to query the database. For example, run the following query:

db2 => select count(*) from staff

Then to exit from the CLP, simply run:

db2 => quit

If this sanity test worked well you can proceed with installing the ibm_db gem (which includes the Ruby driver and the Rails adapter for DB2). To do so, run the following, adjusting the path to your own username of course:

$ sudo -s
$ export IBM_DB_INCLUDE=/Users/acangiano/sqllib/include
$ export IBM_DB_LIB=/Users/acangiano/sqllib/lib32
$ export ARCHFLAGS="-arch i386"
$ gem update --system
$ gem install ibm_db
$ exit

The ibm_db gem will be installed on your system and is ready to be used. To verify that this is the case, run a small Ruby program with the following code:

require 'rubygems'
require 'ibm_db.bundle'

# Replace the credentials with those of your own DB2 user.
conn = IBM_DB.connect("sample", "my_username", "my_password")
if conn
  # Run the same sanity-check query used in the CLP above.
  stmt = IBM_DB.exec(conn, "select count(*) from staff")
  count = IBM_DB.fetch_array(stmt)[0]
  puts "The staff table contains #{count} records."
else
  puts "Connection error: #{IBM_DB.conn_errormsg}"
end

If everything is fine and dandy, you should see the message “The staff table contains 35 records.”

Now that Ruby can talk with DB2, we can move on to Rails. Assuming you have Rails 2.2.x installed, run the following to create a sample bookshelf application:

$ rails books -d ibm_db

This generates a Rails application (as usual) with a config/database.yml file customized for DB2. You’ll notice that unlike with MySQL, the database names are not books_development, books_production and books_test. The names are truncated by default because DB2 currently only allows database names of up to 8 characters. Feel free to change the development database in database.yml simply to ‘books’.

As a Rails developer you may also be accustomed to running rake db:create to automatically create the development database. This feature is not available for DB2 at this point, so instead you can create the database using the db2 command, as follows:

db2 create database books

DB2 allows you to specify all kinds of options for the creation of databases, but in its simplest form, the line above will work just fine.

Once the development database has been created, you should be able to use Rails with DB2 as you normally would with other database management systems. For example, you could scaffold a resource as follows:

$ ruby script/generate scaffold Book title:string \
    author:string isbn:string description:text loaned:boolean

Start the webserver with:

$ ruby script/server

And then visit http://localhost:3000/books to perform CRUD operations on book records.

At this stage, the only caveats are that you’ll have to use the db2 command, rather than ruby script/dbconsole, and that you won’t be able to use the rename_column method in your migrations. On the plus side, you’ll have the XML datatype (t.xml in your sexy migrations) at your disposal, to natively store XML documents and retrieve them through XQuery and SQL/XML.

I really hope that you’ll enjoy DB2 on Mac! Don’t be afraid to ask for help, if you need it, in the DB2 Express-C forum. Oh and we are trying to get the word out there. Your help is highly appreciated. You can promote this story on Twitter, Hacker News, Reddit, DZone, StumbleUpon and Digg.

Disclaimer: The opinions expressed in this post are mine and mine alone, and do not necessarily represent the opinions of my employer, IBM.


Learn Merb

The most effective martial artists specialize in their discipline, but are not afraid to cross-train in others. Bruce Lee, arguably the most famous and influential martial artist of the past century, trained first in Tai Chi Chuan, then Gung Fu, and boxing, as well as learning western fencing. The insight taken from so many disciplines led him to create the Jeet Kune Do form of combat.

Programmers are not all that different. Cross-training in other languages and frameworks can only improve one’s overall mastery of the craft. When it comes to Ruby frameworks, the two most popular choices are Ruby on Rails and Merb. They’re often seen as being contenders, but this truly isn’t a zero-sum game; learning both is a very sensible move. They both enable you to write web applications in Ruby, and are somewhat similar, so learning one after you know the other shouldn’t be very challenging. In many cases people learn Merb after they’ve had some experience with Rails, but either way, acquiring a solid grasp of both frameworks provides developers with extra flexibility. Often people who learn both will end up mostly using one or the other, depending on their individual preferences. But it’s worth knowing them both so as to be able to write CRUD-style applications that fall within Rails’ solution space as well as more complex edge cases where Rails’ opinions will end up contending with yours.

Among the reasons to give Merb a chance are its focus on performance, a smaller memory footprint and an extreme level of modularity, which enables you to pick and choose which components you’d like to use.

Merb is not as mature as Rails, of course, but it has reached version 1.0.x and with it developers can have greater confidence in a more stabilized API. Now is perhaps the best moment to get involved and learn more about this rising framework. Not surprisingly though, Merb finds itself in a similar spot to the one that Rails was in a couple of years ago (in terms of weakness of documentation when it comes to getting started). Thankfully, this point is being taken seriously and there’s been some major progress in terms of improving the documentation for Merb. Below are some useful links to get you started with Merb.

Merb has official API documentation, a wiki, a Google group, and a community site called Merbunity for news, projects and tutorials. The irc.freenode.net #merb channel is also a useful and welcoming spot. Furthermore, there is a Peepcode PDF draft called Meet Merb. If you want something even more substantial, on the book front there are several titles coming out in the near future. These include Merb in Action, The Merb Way, Beginning Merb and Merb: What You Need To Know. There is also an open source Merb book, whose development is led by Matt Aimonetti. It’s a work in progress, but probably a very good starting point, which just happens to have the added bonus of being free. And if you’re interested in Merb, don’t miss InfoQ’s interview with Yehuda Katz, who’s Merb’s lead developer and one of the sharpest guys we have in the Ruby community.

Finally, if you are a professional developer who wants to quickly progress with Merb and bring their skills to the next level, do not miss your chance to attend a three day intensive course on Merb, which is being offered by Yehuda and Matt in Phoenix, AZ between January 19 and 21 (2009). Registration has been open for two days already and 20 out of the 30 available spots have already been snapped up. The remaining seats won’t last more than a day or two, so if you are interested, don’t delay (sign up now and you’ll also benefit from an early registration price).

2009 is almost here, so why not take the opportunity to learn Merb this year?


DB2 on Mac to ship before Christmas

PC Vs. MAC, DB2 Edition

This is not an official announcement, but I must share the news with you. DB2 Express-C for Mac OS X Leopard will finally be shipping before Christmas, in all likelihood as soon as early next week. You may recall how more than a year ago I blogged about how the work on porting DB2 to the Mac had started. It admittedly took longer than expected, but DB2 on Mac is coming, and is absolutely free of charge, of course. The team is still playing with the bubble wrap, but DB2 on Mac is a reality.

What took IBM so long? DB2 is a database management system that’s highly optimized for each platform that it’s available for, so that it can take full advantage of the operating system at hand. In other words, porting DB2 from one platform to another is far from trivial. The task is made more challenging by the extremely high standards set by IBM. You may be familiar with the whole scandal surrounding MySQL 5.1, which was released despite known fatal bugs. Something like that is simply not acceptable to IBM. Each release of DB2 has to go through a huge amount of regression and performance tests, for months. If the product does not pass all these tests and others, then DB2 is not shipped.

On top of this, a few months ago the decision to ship DB2 Express-C 9.5.2 (rather than 9.5) was made, and as you probably know, DB2 Express-C 9.5.2 was only released a little while ago for other supported platforms. So the first piece of good news is that you’ll get the latest version of DB2 on the Mac. It’s going to be a 64 bit version and will require Leopard to work:

$ db2level
DB21085I  Instance "acangiano" uses "64" bits and DB2 code release "SQL09052" with level identifier "03030107".
Informational tokens are "DB2 v9.5.0.2", "s081205", "DARWIN64", and Fix Pack "2".
Product is installed at "/Users/acangiano/sqllib".

The second good thing is that, unlike with 64-bit MySQL, you won’t have to jump through hoops to build the Ruby driver just because the database is 64-bit while the Ruby that ships with Leopard is 32-bit. We ensured that gem install ibm_db would work out of the box, so you don’t have to.

According to Apple, my personal Mac is broken for good (the video chip is dead), which is very bad timing. But I installed DB2 and played around with it on a work Mac Pro machine. I had some fun with Ruby and Rails as well. This is great news for many categories of developers, including those who have been trying to convince their managers to get them a MacBook Pro but didn’t have much of a case due to the lack of availability of a DB2 version. Now, you’ll have a good excuse to get yourself a Mac. ;-)

Stay tuned for the official announcement and keep in mind that this is going to be a beta (perfect for development purposes) and extra features and performance improvements will be added in future releases.

Disclaimer: The opinions expressed in this post are mine and mine alone, and do not necessarily represent the opinions of my employer, IBM.


Reflections on the Ruby shootout

Yesterday I published The Great Ruby Shootout and it quickly gathered a fair deal of attention. It was on the front page of Slashdot, Hacker News, Reddit, and so on. More than 15,000 people came by to read about the results of my comparison between Ruby implementations.

Those numbers looked good but something didn’t add up. Ever since I clicked the “Publish” button, I had a very uneasy feeling about the main shootout figures. They just didn’t seem right. I had a chance, particularly during the writing of my book, to extensively use Ruby on Vista and I can guarantee you that it’s visibly slower than on GNU/Linux. The Phusion team had benchmarked their Ruby Enterprise Edition against Ruby 1.8.6 many times, and found it to be about 25% faster. Yet my results were showing it as twice as fast as Ruby 1.8.7, which in turn is already faster than 1.8.6. To make matters worse, I’ve used Ruby 1.9 and found it to be faster than Ruby 1.8.7, but not 5 times as fast. For most programs that I tried, Rubinius didn’t seem faster than Ruby 1.8. And the more I pondered it, the more it began to feel like too many things didn’t add up.

In the comments, Isaac Gouy reported a couple of issues with the Excel formulas, where a few unsuccessful tests were mistakenly added to the totals. This skewed the results slightly, particularly in terms of penalizing JRuby. However, this wasn’t really it. Sure, the totals were inaccurate, but not enough to fundamentally change the main outcome of those results.

As I was discussing this somewhat unexpected result with Hongli Lai (co-author of Ruby Enterprise Edition), he mentioned that he knew what might be causing this anomaly. I had run the initial test against Ruby installed through apt-get, because I’d made a couple of assumptions. The first was that most people would probably be using the Ruby version that was deployed by their OS’ packaging system in both development and production mode. The second was that the performance of this version would be roughly similar to the one built from scratch. This second assumption would turn out to be highly mistaken.

I decided to run a test using Ruby 1.8.7 built from source as the baseline and added a column for Ruby 1.8.7, installed through apt-get, to the tables. In addition I also corrected the issue pointed out by Isaac. I updated the original shootout with the correct data, and what you see below is a bar chart for the geometric mean of the ratios for the successful benchmarks.

Geometric mean bar chart


Notice how everything makes much more sense now. Ruby 1.9 and JRuby are very close, respectively 2.5 and 1.9 times faster than Ruby 1.8.7 (from source) on these benchmarks. A less impressive result, sure, but I suspect a much more realistic one. The results for Ruby Enterprise Edition are in line with the 25% speed increase, if we consider that 1.8.7 is a bit faster than 1.8.6. Rubinius is still slower than MRI for most tests, but it’s improving. Ruby on Windows is slow. So slow in fact, that Ruby on GNU/Linux is twice as fast.

The really big, flashing warning though is what happens when you install Ruby through apt-get. Compiling from source gives you double the speed, according to these tests. I expected a 10-20% increase, not 100%. The gist of it is that prepackaged Ruby is compiled using the --enable-pthread option, and there is the whole issue of shared vs static libraries. But whatever the reason, this is a significant difference. For production use, in light of these results, I feel that it would be foolish to use the slower version of Ruby provided by apt-get/aptitude.

I rectified the results as soon as possible because the last thing I wanted was to mislead the Ruby community or worse still, betray its trust. Major kudos to Isaac for spotting the calculation issue, and Hongli for selflessly pointing out that the excellent Ruby Enterprise Edition results were probably due to the low performance of Ubuntu’s version of Ruby.


The Great Ruby Shootout (December 2008)

The long awaited Ruby virtual machine shootout is here. In this report I’ve compared the performance of several Ruby implementations against a set of synthetic benchmarks. The implementations that I tested were Ruby 1.8 (aka MRI), Ruby 1.9 (aka YARV), Ruby Enterprise Edition (aka REE), JRuby 1.1.6RC1, Rubinius, MagLev, MacRuby 0.3 and IronRuby.


Disclaimer


Just as with the previous shootout, before proceeding to the results, I urge you to consider the following important points:

  • Engine Yard sponsors this website, and also happens to sponsor, to a much greater extent, the Rubinius project. Needless to say, there is no bias in the reporting of the data below concerning Rubinius;
  • Don’t read too much into this and don’t draw any final conclusions. Each of these exciting projects has its own reason for being, as well as different pros and cons, which are not considered in this post. They each have a different level of maturity and completeness. Furthermore, not all of them have received the same level of optimization yet. Take this post for what it is: an interesting and fun comparison of Ruby implementations;
  • The results here may change entirely in a matter of months. There will be other future shootouts on this blog. If you wish, grab the feed and follow along;
  • The scope of the benchmarks is limited because they can’t test every single feature of each implementation nor include every possible program. They’re just a sensible set of micro-benchmarks which give us a general idea of where we are in terms of speed. They aren’t meant to be absolutely accurate when it comes to predicting real world performance;
  • Many people are interested in the kind of improvements that the tested VMs can bring to a Ruby on Rails deployment stack. Please do not assume that if VM A is three times faster than VM B, Rails will serve three times the number of requests per minute. It won’t. That said, a faster VM is good news and can definitely affect Rails applications positively in production;
  • These tests were run on the machines at my disposal; your mileage may vary. Please do test the VMs that interest you on your hardware and against programs you actually need/use;
  • In this article, I sometimes blur the distinction between “virtual machine” and “interpreter” by simply calling them “virtual machines” for the sake of simplicity;
  • Some of the benchmarks are more interesting for VM implementers than for end users. That said, if you think the benchmarks being tested are silly/inadequate/lame, feel free to contribute code to the Ruby Benchmark Suite and if accepted, they’ll make it into the next shootout;
  • Finally, keep in mind that there are three kinds of lies: lies, damned lies, and statistics.
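The warning about Rails throughput can be illustrated with Amdahl's law. The following is only a back-of-the-envelope sketch: it assumes that a fixed fraction of each request is spent executing Ruby code in the VM, while the rest goes to the database, I/O and so on. The function name `overall_speedup` and the 40% figure are hypothetical and were not measured for this shootout.

```python
def overall_speedup(vm_fraction, vm_speedup):
    """Amdahl's law: speedup of the whole request when only the
    fraction of time spent in the VM gets faster."""
    return 1.0 / ((1.0 - vm_fraction) + vm_fraction / vm_speedup)


# Hypothetical request: 40% of the time in Ruby code, 60% waiting on the database and I/O.
print("%.2fx" % overall_speedup(0.40, 3.0))
```

Under these made-up assumptions, a VM that is three times faster yields only about 1.36 times the throughput, which is still welcome but far from a threefold gain.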


Ruby implementations being tested


All of the Ruby implementations that were able to run the current Ruby Benchmark Suite have been grouped together in one main shootout. This group consists of Ruby 1.8.7 (p72, both built from source and installed through apt-get), Ruby 1.9.1 (from trunk, p5000 revision 20560), Ruby Enterprise Edition (1.8.6-20081205), JRuby 1.1.6RC1 and Rubinius (from trunk), all of which were tested on Ubuntu 8.10 x64, plus Ruby 1.8.6 (p287, from the One-Click Installer) on Windows Vista Ultimate x64. The hardware used for this benchmark was my desktop workstation with an Intel Core 2 Quad Q6600 (2.4 GHz) CPU and 8 GB of RAM. JRuby was run with the -J-server option enabled and by specifying 4 MB of stack (required to pass certain recursive benchmarks). The best times out of five iterations were reported, and these do not include startup times or the time required to parse and compile classes and methods for the first time. Several of these new tests also have variable input sizes.
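To give an idea of what "best time out of five iterations, excluding startup" means in practice, here is a minimal sketch of such a measurement. It is not the harness used by the Ruby Benchmark Suite; the helper `best_of` and the naive `fib` workload are hypothetical stand-ins.

```python
import time


def best_of(func, iterations=5):
    """Run func several times in-process and return the fastest wall-clock time."""
    timings = []
    for _ in range(iterations):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


def fib(n):
    # Deliberately naive recursion, in the spirit of a micro-benchmark.
    return n if n < 2 else fib(n - 1) + fib(n - 2)


print("best of 5: %.4f s" % best_of(lambda: fib(20)))
```

Because the clock starts and stops inside an already running process, interpreter startup is excluded, and taking the minimum tends to discard the slower first run in which code is loaded and compiled.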

The MagLev team provided me with an early alpha version of MagLev for the purpose of testing it in this shootout. Since this VM is not mature enough yet to run the Ruby Benchmark Suite, I used custom scripts against an old version of the Ruby Benchmark Suite on Ubuntu 8.10 x64. MagLev was tested, along with Ruby 1.8.6 (p287), on the same machine as that of the main shootout, though the benchmarks were different (even when they had the same names as the ones in the main shootout).

MacRuby 0.3 and Ruby 1.8.6 (p114) were tested on Mac OS X Leopard using the previous version of the Ruby Benchmark Suite. Since my MacBook Pro died (sigh), for this benchmark I used a Mac Pro, with two Quad-Core Intel Xeon 2.8 GHz processors and 18 GB of RAM.

IronRuby (from trunk) and Ruby 1.8.6 (p287) were tested on a previous version of the Ruby Benchmark Suite on Windows Vista x64 on the same quad-core used for the main shootout. The MagLev, MacRuby and IronRuby numbers reported here were the best times out of five iterations, and include startup time. IronRuby on Mono was not tested because I couldn’t get it to work on my machine, despite having tried several IronRuby versions and two different Mono versions. Please also notice that Ruby 1.8.6 (p287) was tested twice on Windows, once for the main shootout against the current Ruby Benchmark Suite, and a second time to compare it with IronRuby, against the old benchmarks.

Note: As tempting as it is, do not compare implementations that belong to different shootouts directly to one another. It would be very disingenuous to directly compare VMs tested with different benchmarks and/or different machines. The only comparisons that make sense are the ones within each of the four groups.


Main shootout


The following table shows the run times for the main implementations. The table is fairly wide, so you’ll have to click on the image to view the data in a new tab.

Main Shootout's times


Green, bold values indicate that the given virtual machine was faster than Ruby 1.8.7 on GNU/Linux (our baseline), whereas a yellow background indicates the absolute fastest implementation for a given benchmark. Values in red are slower than the baseline. Timeout indicates that the script didn’t terminate in a reasonable amount of time and was (automatically) interrupted. The values reported at the bottom are the total amounts of time (in seconds) that it would take to run the common subset of benchmarks which were successfully executed by every virtual machine. When our baseline VM generated an error, others were used, starting with Ruby 1.8.7 on Vista (for color coding purposes only).

The following image shows a bar chart of the total time required for the common subset of successfully executed benchmarks (those whose names are in blue within the tables):

Total Time


More interestingly, the following table shows the ratios of each Ruby implementation based on the baseline (MRI):

Main Shootout's ratios


The baseline time is divided by the time at hand to obtain a number that tells us “how many times faster” an implementation is for a given benchmark. 2.0 means twice as fast, while 0.5 means half the speed (so twice as slow). The geometric mean at the bottom of the table tells us how much faster or slower a virtual machine was when compared to the main Ruby interpreter, on “average”. Just as with the totals above, only the 101 tests that were successfully run by every VM were included in the calculation.
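The calculation just described can be sketched in a few lines. The timings below are invented for illustration and do not come from the tables; the function names `ratios` and `geometric_mean` are likewise hypothetical.

```python
import math


def ratios(baseline_times, vm_times):
    """Baseline time divided by VM time: 2.0 means twice as fast as the baseline."""
    return [b / t for b, t in zip(baseline_times, vm_times)]


def geometric_mean(values):
    """n-th root of the product, computed via logarithms to avoid overflow."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical timings in seconds for three benchmarks (not taken from the tables).
baseline = [4.0, 10.0, 3.0]
other_vm = [2.0, 2.5, 6.0]

r = ratios(baseline, other_vm)
print(r)
print("geometric mean: %.2f" % geometric_mean(r))
```

With these made-up numbers the ratios are 2.0, 4.0 and 0.5, and their geometric mean is about 1.59, whereas the arithmetic mean would be roughly 2.17. The geometric mean is the usual choice for ratios because a benchmark that runs twice as fast and one that runs twice as slow cancel each other out.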

More concisely, here is a bar chart showing the geometric mean of the ratios for the various implementations tested:

Geometric Mean


I prefer to let the data speak for itself, but I’d like to briefly comment on these results. Just a few quick considerations.

Working off of the geometric mean of the ratios for the successful tests, Ruby MRI compiled from source is twice as fast as the Ruby shipped by Ubuntu, and by the One-Click Installer on Vista. The huge performance gap between ./configure && make && sudo make install and sudo apt-get install ruby-full should not be taken lightly when deploying in production. These numbers also reveal what most of us already knew: Ruby is particularly slow on Windows (800-pound gorillas in the room, or not).

Performance-wise Rubinius has more work left to be done to catch up with Ruby 1.8.7 and other faster VMs, particularly if we take into account the number of timeouts. But it has improved in the past year and I think it’s on the right track.

Ruby Enterprise Edition is about as fast as Ruby 1.8.7 compiled from source, which is reasonable considering that it’s a patched version of Ruby 1.8.6 aimed at the reduction of memory consumption (a parameter which wasn’t tested within the current shootout).

Speaking of excellent results, Ruby 1.9.1 and JRuby 1.1.6 both did very well. It looks like we finally have a couple of relatively fast alternatives to what is a slow main interpreter. According to the results above, and with the exception of a few tests, on average they are respectively 2.5 and 2 times faster than Ruby 1.8.7 (from source), and 5 and 4 times faster than Ruby 1.8.7 installed through apt-get on Ubuntu or Ruby 1.8.6 installed through the One-Click installer on Vista. Again, this does not mean that every program (particularly Rails) will gain that kind of speed, but these results are very encouraging nevertheless.
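The two sets of figures quoted above are consistent with each other because speedups measured against different baselines compose by multiplication. A tiny sketch using the rounded ratios from the text (the variable names are just for illustration):

```python
# Rounded ratios quoted in the text above.
source_vs_packaged = 2.0  # Ruby 1.8.7 from source vs. the apt-get / One-Click Installer builds
vs_source = {"Ruby 1.9.1": 2.5, "JRuby 1.1.6": 2.0}

# A speedup over the source build times the source build's speedup over the packaged one.
vs_packaged = {name: r * source_vs_packaged for name, r in vs_source.items()}
for name, r in vs_packaged.items():
    print("%s: %.1fx the packaged Ruby" % (name, r))
```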


MagLev


There has been a lot of buzz about MagLev since Avi Bryant’s first benchmarks were shown a few months ago. Here we finally see it being put to the test. The table below shows the times obtained by running MagLev and Ruby 1.8.6 (p287) against MagLev’s set of benchmarks based on the old Ruby Benchmark Suite:

MagLev's times


And here are the ratios:

MagLev's ratios


You’ll notice how MagLev swings from being much faster than MRI to being much slower. I believe there is much room for improvement, but at almost twice the speed of MRI, these early results are definitely promising.


MacRuby


These are the times for MacRuby 0.3 on Mac OS X 10.5.5:

MacRuby's times


And of course, the ratios against the MRI baseline:

MacRuby's ratios


MacRuby is relatively new, so these are not bad results. More work is required, but it’s a good start.


IronRuby


Finally (I promise these are the last ones), here are the two tables for IronRuby and Ruby 1.8.6:

IronRuby's times
IronRuby's ratios


IronRuby is slower than Ruby 1.8.6 on Windows, which in turn is much slower than Ruby 1.8.7 on GNU/Linux. This is not very surprising. This project has been focusing on integrating with .NET and catching up with the implementation of the language by improving the RSpec pass rate, as opposed to performing any optimizations and/or fine tuning (as per John Lam’s presentation at RubyConf 2008). We’ll measure its improvements in the next shootouts.


Conclusion


Overall I think these are great results. Ruby 1.8 (MRI), with its slowness and memory leaks, belongs to the past. It’s time for the community to move forward and on to something better and faster – and we don’t lack interesting alternatives to do so at this stage.

I hope that for the next shootout, MagLev, MacRuby and IronRuby will be able to run the benchmark suite, so that they can all be tested and directly compared with each other. I also hope to include Tim Bray’s XML benchmark, some sort of “Pet Shop” sample Rails and Merb application and, above all, include memory usage statistics.

You can find the Excel file for the main shootout here. That’s all for now. Feel free to comment, subscribe to my feed, share this link and promote it on Hacker News, Reddit, DZone, StumbleUpon, Twitter, and Co. Putting together this shootout was a lot of work, so I definitely appreciate you spreading the word about it. Until next time…

Update (December 10, 2008): This article has been updated to correct a couple of major issues with yesterday’s results. I adjusted my commentary as well, in light of the corrected figures.

Update (February 7, 2009): Thanks to Makoto Kuwata, a Japanese version of this article was published in the Rubyist Magazine.


Merb, Rails Myths, Language Popularity and other Zenbits

Zenbits are posts which include a variety of interesting subjects that I’d like to talk about briefly, without writing a post for each of them.

Merb: A few days ago Merb 1.0 was released. Congratulations to Ezra Zygmuntowicz, the Merb community and Engine Yard (who finances the project) on this important milestone. Merb 1.0 wasn’t even out yet when some people had already started commenting on the fracturing of the Ruby community that this new framework might bring with it, and the impact that this high visibility “competitor” might have on Rails. I believe that having more than one widely adopted web framework will only benefit the Ruby community. Furthermore, it’s important to remember that this is not a zero-sum game. Ruby programmers are perfectly capable of learning two frameworks and using one or the other, depending on the project at hand. This is particularly true if we consider that Merb, for all of its advantages (and disadvantages) when compared to Rails, is not totally different from its forerunner. If you are an expert Rails programmer, you should be able to become proficient in Merb in very little time. To help with this process, the Merb community needs to concentrate on the documentation now, given that the API is finally stable.

Rails Myths: David Heinemeier Hansson began a series of posts about Rails Myths. I like the idea of seeing common myths addressed straight from the horse’s mouth. Over the past two years, Rails has received quite a bit of backlash and old-fashioned FUD, so it’s important to set the record straight, whether the myths are entirely fabricated or if there is some element of truth to them. Whether you agree with David or not, it’s also nice to hear two sides of the same story. In fact, at the beginning of my book I debunk a few myths, just to set the record straight regarding what some readers may have heard surrounding the framework. It was a fun part to write.

My Book: Speaking of my book, Ruby on Rails for Microsoft Developers, I’m getting closer to the finish line. I’m about to complete Chapter 9 (out of eleven chapters). The initial schedule I was provided with has been extended slightly so that there will be sufficient time to properly review the content and ensure that it’s up to date with the final release of Rails 2.2. Some people wondered what the “Microsoft Developers” part means. Is it for people that work at Microsoft? Is it for .NET programmers? Is it for people who develop on Windows?

The truth is that “Microsoft Developers” is probably just a marketing term that Wrox selected as a catch-all for all of the aforementioned categories of programmers. As an author I’m trying to serve all of them well, by providing a guide that sneaks in much of the Rails culture and softens the migration path by using an Operating System, and to a certain extent, tools that they’re already familiar with. In my opinion one of the major obstacles when switching to, or trying, Rails when coming from the Microsoft world, is the culture shock. The documentation and most books assume that you are familiar with *nix systems and tools, and this can be frustrating for those who are forced not only to learn a new language and framework, but also an entirely new set of tools. As it’s targeted at Microsoft developers, the book obviously makes quite a few references and comparisons to the .NET world, where they fit. This is done so that the many .NET programmers amongst the group of so called “Microsoft Developers” will find the book particularly useful. Yet the book remains generic enough so that it can be used by any programmer (particularly Windows users), even those without any knowledge of the Microsoft .NET Framework or ASP.NET.

Python books: While on the subject of books, I wanted to mention that the final version of the Pylons book is available online. Despite the much less fancy UI, the book pretty much does what the Django Book did in the past. And both are available in print as well (The Definitive Guide to Django: Web Development Done Right and The Definitive Guide to Pylons). Pylons is a Python web framework that can be viewed as a Ruby on Rails clone, in a far greater way than Django could ever be considered.

Another thing I want to mention is that I received a copy of Expert Python Programming. I haven’t gotten too far into it yet, but from what I’ve seen so far, things look good. I hope to be able to read it through over a weekend in the near future and then provide a proper review. Stay tuned.

Language Popularity: If you take a look at the TIOBE Index, you’ll notice a few interesting things: Ruby has dropped two positions since last year, and it’s now the 11th most popular language in the world. This shouldn’t be cause for concern though, as shown by this Ruby graph. Python on the other hand is increasing in popularity and moved from the 7th to the 6th most popular language. Interestingly, according to the index (the results of which are educated guesses only), Python would seem to be more popular than C#. I find this to be true, in terms of online activity within an increasingly vibrant community, but in my opinion, the job market hasn’t caught up yet. In fact, at least in Toronto, when there’s a Python opening it’s pretty much an event that’s worthy of being discussed on the local Python mailing list. C# openings are much more common. This may be different in Silicon Valley, of course. It would also seem that Delphi has experienced a huge comeback, moving from the 11th position last year to the 8th one this time around. It’s hard to imagine that Delphi has had a similar level of adoption as C# and thus has become more popular than Perl, JavaScript and Ruby. Delphi is a great solution for Win32 programming, but I don’t quite believe this overly optimistic outlook. And if this is the case, where are all the Delphi jobs and buzz?

DB2: This interview shows a few good reasons why even smaller and medium sized companies are increasingly adopting DB2. And while the video doesn’t mention it, IBM is coming out with an updated version of DB2 Express-C 9.5. This new version, 9.5.2 or 9.5 FixPack 2, is going to introduce exciting new features, including an engine for full text search.

The Great Ruby Shootout: These days you hear a lot of talk about parallel programming. Intel promotes it and despite their bias, it’s plausible that parallel programming will become important as the CPU market heads towards an increasingly larger number of cores, as opposed to focusing on the frequency of said CPUs. In the world of Ruby, this translates into multiprocessing, as opposed to multithreading, due to the infamous GIL (Global Interpreter Lock). This means that Ruby will most likely approach the problem similarly to how Python 2.6 did with the multiprocessing module, which is a process-based interface. The obvious exceptions are JRuby and IronRuby, which establish a 1 to 1 relationship between Ruby threads and OS threads.

For the shootout, it would be interesting to see some multithreaded code, so as to get a better sense of how well JRuby and IronRuby compare to MRI and 1.9, when more cores are available. In fact, the long-promised shootout will be performed on a quad-core machine with 8GB of RAM. If Charles Nutter, John Lam, or any of their team members would like to contribute some programs that are able to take advantage of “native” multithreading, I’d be very happy to include them in the Ruby Benchmark Suite, to be used for my shootout.

The repository requires some love and refactoring, since it needs to be split in two types of benchmarks. The simpler one will evaluate the execution time minus the startup time, while the more advanced benchmark will also exclude the time required for parsing and loading modules, classes and methods in the AST. It would also be nice to test each program with variable input sizes and report these results accordingly. Right now I’m very busy with the book, but as I become more available, I’ll start working on this.
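As a rough illustration of the simpler of the two measurements described above, here is a minimal sketch written for Python 3 rather than Ruby. It times only the call to the workload, so interpreter startup is excluded, and it repeats the measurement for several input sizes. The `fib` workload, the `time_call` and `run_benchmark` helpers, and the chosen sizes are all hypothetical; none of them come from the Ruby Benchmark Suite.

```python
import time


def fib(n):
    """Hypothetical CPU-bound workload: iterative Fibonacci."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def time_call(func, arg, repeats=3):
    """Return the best wall-clock time (seconds) of `repeats` calls to func(arg).

    The clock starts when the interpreter is already running, so
    startup time is not part of the measurement.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(arg)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best


def run_benchmark(func, sizes):
    """Time func once per input size and return {size: seconds}."""
    return {size: time_call(func, size) for size in sizes}


if __name__ == "__main__":
    for size, seconds in run_benchmark(fib, [1000, 10000, 100000]).items():
        print("n=%d: %.6f s" % (size, seconds))
```

Reporting one figure per input size makes it easier to see whether an implementation's advantage holds as the workload grows or only shows up on tiny inputs.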

Finally, I want to point out a very interesting article about performance and UIs. Slow is indeed a very relative concept, and it’s important to understand how to analyze and respond to the user requirements when it comes to the responsiveness of an application as a user interacts with it.

Hardware: I finally bought a Trackball made by Logitech and the Microsoft Ergonomic Keyboard (Microsoft makes great hardware). I don’t have wrist problems, but I’d like to see how these two affect my extensive computer usage. I plan to report my experience as soon as I’ve had a chance to use these input devices for a while, since I know this is a topic that interests lots of programmers (many of whom end up being victims of RSI, and some of the IRS :-P ). I also bought a bad-ass color laser printer which is quite handy when you’re a programmer and you are writing a book. I’ll let you know how it goes. What I didn’t buy, but still think is awesome, is the Flip minoHD. It’s the equivalent of an iPod for the world of camcorders. $235 for a camcorder that’s so perfectly compact, and yet that can record in HD, is a pretty sweet deal. I’m considering it for Christmas, assuming it reaches Canada by then.


And the winner is…

A few days ago I announced that I was going to give away a free ticket for the first Professional Ruby Conference, organized by Obie Fernandez and Addison-Wesley, to one of my readers.

Each person who took the survey received a discount code for the conference valued at $200. More excitingly, every eligible participant in the survey was added to a draw for a free ticket. Many replies came in, but only 30 of them were eligible to participate, as they answered “yes” to the question “will you be able to attend the conference if you win?”.

You might think that I used some sort of script to come up with the random winner, perhaps using the Roo gem (for Google Spreadsheet). But I didn’t. I used a Rhombic Triacontahedron, or in layman’s terms, a 30-sided die. Without further ado, I’m here to announce the winner.

The lucky winner is: Nick Quaranto, from the US.

Congratulations Nick! I’ll get in contact with the conference organizers this morning and provide them with your info, so that you can claim your conference pass. Should you not be able to attend the conference and therefore claim this prize, please get in touch with me immediately so that a second draw can be made.

I must really thank everyone who provided feedback. It’s incredible what you can learn from a survey like this and I’ll be sure to incorporate many of your suggestions into this blog. In fact, you can still take the survey and provide me with further feedback (as well as take advantage of the $200 discount code). The only difference, of course, is that there won’t be a further draw for another free ticket.

Disappointed that you weren’t the winner? Let me bring a different type of contest with plenty of prizes to your attention (this one is organized by IBM). It’s the XML Challenge, which includes 5 programming related contests/challenges, with prizes like 17″ laptops, Nintendo Wiis, MP3 players and so on. Aside from the chance to score some neat goodies, you also get to show IBM and the world your coding and XML-fu.


The Rise of the Functional Paradigm

In yesterday’s address to the Ruby community, Dave Thomas invited Rubyists to fork Ruby, to freely research and experiment with new and interesting features. If this process is successful, many of these features will inevitably see their way back into Ruby’s core, thus improving the language in leaps and bounds. And I feel he couldn’t have been any more right. In fact, the whole industry is experiencing the trend of incorporating features developed in less common languages, research languages, “toy languages” if you prefer, within mainstream ones.

Experimenting with these alternative languages is important because occasionally they themselves become widely used, and even when they fail to do so, they lend their insight to the world of software development, finding their way into other languages. This approach greatly accelerates the development of common languages for the good of their large user bases and the improvement of the software industry. It’s a win-win situation for everyone involved and for the development community as a whole.

Pay attention to the development community online, and you’ll quickly notice a few non-mainstream programming languages appear over and over again. I’m referring to languages like F#, Erlang, Haskell, Scala and Clojure. I’ll admit to a certain selection bias, given that I tend to hang out in communities where hackers and developers actively pursue the betterment of their programming skills, beyond the stereotypical 9 to 5 requirements. But nevertheless, three or four years ago the average developer probably wouldn’t have heard about any of them (at least the ones that existed at the time). And today all of these languages have active communities, books being published about them, and most programmers have at least encountered some of these names.

They are all different languages, but their common denominator is the functional paradigm. Notice that I titled this post “The Rise of the Functional Paradigm” and not “The Rise of Functional Languages”. In a sense the latter is true as well, since there’s been much more attention towards functional programming languages lately. But there is a subtle difference. I don’t expect purely functional languages to become the most used programming languages anytime soon. For the foreseeable future, I don’t foresee US companies outsourcing Haskell jobs to India or China, like they do today for Java or .NET projects.

Yet these functional languages serve a higher purpose. Not only do they satisfy the needs of intellectually curious developers and companies looking for a competitive advantage, but they also have a great deal of influence on the rest of the development world.

We are seeing a convergence between these two groups of languages. Functional languages will strive to become as useful as possible, with libraries and tools that are more adequate for mainstream developers, while conserving their functional purity (I’m looking at you almighty Haskell). Meanwhile, mainstream languages will slowly adopt powerful features found in these functional and other research languages, adding further expressiveness and capabilities to their largely adopted foundations. F#, the evolution of C# and the addition of LINQ should be enough evidence that this is the case at least for the .NET platform. And even C++0x and D are leaning towards the incorporation of some functional features (e.g. lambda expressions and closures). The two types of languages come from different directions but will reach a similar destination.

The ever increasingly popular Ruby, Python and JavaScript owe their success to several factors. And while they are considered multi-paradigm and were mostly aided in their popularity by their immediacy, simplicity, usefulness and a set of historical circumstances, they’re all hybrid languages that adopt functional features. The functional paradigm is becoming so common, that it’s hard to imagine seeing any new programming language rise to fame without including at least a subset of the features available in other functional programming languages. As developers, we’ve grown to expect the elegance of functional features in a language. No lambda, no party.
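To make “functional features” a little more concrete, here is a small Python 3 sketch of three of them in a multi-paradigm language: anonymous functions (lambdas), closures, and higher-order functions. The names `make_counter` and `compose` are made up for this illustration.

```python
from functools import reduce


def make_counter(start=0):
    """Closure: the inner function remembers and updates `count`."""
    count = start

    def increment():
        nonlocal count
        count += 1
        return count

    return increment


def compose(*funcs):
    """Higher-order function: compose(f, g)(x) computes f(g(x))."""
    return reduce(lambda f, g: lambda x: f(g(x)), funcs)


if __name__ == "__main__":
    # Lambdas passed to the built-in higher-order functions map and filter.
    squares_of_evens = list(map(lambda n: n * n,
                                filter(lambda n: n % 2 == 0, range(10))))
    print(squares_of_evens)

    counter = make_counter()
    counter()
    print(counter())

    add_one_then_double = compose(lambda x: x * 2, lambda x: x + 1)
    print(add_one_then_double(3))
```

The Ruby equivalents rely on blocks, `lambda` and `Proc` objects, and JavaScript on function expressions, but the ideas carry over directly.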

If the 90s were characterized by the rise of the Object Oriented paradigm, and this decade can be considered as a transition phase, then the future belongs to the functional paradigm. Whether developers prefer to mix this with other paradigms (e.g. in languages like Ruby, Python, C#, etc…), like a powerful cocktail, or shoot it straight down (e.g. in purely functional languages like Haskell), the functional paradigm is here to stay.


Take this survey and win a free ticket for the Professional Ruby Conference

Addison-Wesley will hold their first Professional Ruby Conference in Boston, Massachusetts between November 17 and 20, 2008. This conference, for which Obie Fernandez is the Technical Chair, is highly educational and boasts some of the best speakers from the Ruby and Rails communities.

The organizers were kind enough to invite me, offering me a complimentary pass for the Professional Ruby Conference. I won’t be able to attend, so I decided to donate my free admission to one lucky reader. They also provided me with a priority code (like a coupon) for my readers, which entitles you to receive a $200 discount off the regular admission price.

I really value your opinions and I’d appreciate it if you could take this survey, so that I can improve the quality of this blog. At the end of the survey you’ll receive your $200 discount code, and will be entered into my draw for your chance to win one free ticket. I will announce and get in touch with the winner early next week (Monday or Tuesday depending on participation levels).


SURVEY


Benchmarking DB2 pureXML against 1 TB of XML data

Once upon a time there was a Ruby library called Hpricot. Well it’s still here in fact. This library is the de facto standard for parsing HTML in Ruby, and is often used to parse XML as well.

Hpricot is normally considered to be quite fast, as far as Ruby libraries go. Yet Nokogiri recently garnered some buzz thanks to a microbenchmark that emphasized its speed over Hpricot’s when it comes to parsing XML. And I can’t stress the “micro” part enough, since this was the file that was tested:

<location>
<refUrl>http://wikitravel.org/en/Singapore</refUrl>
  <info>
    &lt;b&gt;Singapore&lt;/b&gt; is an island-state in Southeast
    Asia, connected by bridges to Malaysia. Founded as a British trading colony
    in 1819, since independence it has become one of the &lt;b&gt;world's
    most prosperous countries&lt;/b&gt; and sports the world's busiest
    port.   Combining the skyscrapers and subways of a &lt;b&gt;modern,
    affluent city&lt;/b&gt; with a medley of Chinese, Indian and Malay
    influences and a &lt;b&gt;tropical climate&lt;/b&gt;, with
    tasty food, good shopping and a vibrant nightlife scene, this Garden City
    makes a great stopover or springboard into the region.
  </info>
</location>

Over the weekend, why made a few tweaks to his library et voilà, it was suddenly faster than Nokogiri in terms of parsing an XML document smaller than 18K. It’s nice to see them striving to improve the speed of these libraries. After all, for parsing HTML or even the occasional small XML document, those two Ruby libraries are fine and have their place.
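To give a sense of how small this kind of test is, here is a rough Python 3 sketch that parses a document shaped like the one above and times it. It uses Python’s standard `xml.etree.ElementTree` purely as a stand-in, not Hpricot or Nokogiri, and both the shortened `SAMPLE` string and the `parse_many` helper are assumptions made for this example.

```python
import time
import xml.etree.ElementTree as ET

# Shortened stand-in for the test file shown above.
SAMPLE = """<location>
<refUrl>http://wikitravel.org/en/Singapore</refUrl>
  <info>
    &lt;b&gt;Singapore&lt;/b&gt; is an island-state in Southeast Asia.
  </info>
</location>"""


def parse_many(xml_text, iterations=1000):
    """Parse the same document repeatedly; return (last_root, seconds)."""
    root = None
    start = time.perf_counter()
    for _ in range(iterations):
        root = ET.fromstring(xml_text)
    return root, time.perf_counter() - start


if __name__ == "__main__":
    root, seconds = parse_many(SAMPLE)
    print(root.findtext("refUrl"))
    print("1000 parses took %.4f s" % seconds)
```

Every run pays the full parsing cost again, which is exactly the work that a store keeping XML in a pre-parsed form avoids.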

DB2 users do not generally need them for XML though. In fact, DB2 offers a technology called pureXML to help in this area. In short, XML documents can be stored, indexed, cached, queried, updated, validated and compressed within the database in XML columns. This means that the data is securely stored, properly backed up, and easily restored. What’s more, there is no need to parse large strings to obtain an object representation of the XML document(s). Queries and updates require no parsing at all, since the XML is stored in a parsed hierarchical format. You simply ask for the data that you need (or need to update) directly, and DB2 will eagerly oblige. You can use XPath, XQuery, and also integrate SQL and XML queries to retrieve relational and XML data. It’s as easy as it gets, and of course, all this supports Unicode.

And DB2 pureXML is blazingly fast. How fast? Well, at IBM we like benchmarks too, only we don’t use 18K of XML data. With very little tweaking (pretty much letting DB2 automate and self-tune almost everything), DB2 was tested with the help of the good folks over at Intel, on the latest Xeon processors. The TPoX (Transaction Processing over XML Data) Open Source benchmark was used. This is a well balanced and realistic benchmark that proposes a financial transaction processing scenario. The raw XML data was 1 Terabyte and was stored in three very straightforward tables (with XML indexes). DB2 managed to store all the information, including the indexes, in just 440 GB. And the raw data was 1 TB without indexes.

Compression aside, we were pretty much blown away by how immensely fast DB2 is. The throughput with 200 concurrent users was stable throughout the 2 hour run, and under a mixed workload, performed about 34 million queries, almost 7 million updates, almost 4 million insertions and deletions (for a total of almost 48.5 million transactions). To be exact, the average was 6,763.42 transactions per second. The two tables upon which inserts were performed did 4,913 inserts per second (4 to 20 KB of data) and 11,904 inserts per second (1 to 2 KB), respectively. Not only is DB2 often benchmarked as the fastest database in the world when it comes to relational data, but it’s also undisputedly state of the art when it comes to XML handling.

As a side note, by switching from Intel Xeon 7300 processors (4 cores) to Intel Xeon 7400 CPUs (6 cores) the number of cores was increased by 50%. DB2 managed to increase its throughput by 48%. You know, we don’t waste anything in this neighborhood. ;-) For full details about the results of the benchmark, you can download the slides of the IOD presentation (PDF warning).
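The scaling claim is simple arithmetic. Here is a quick check in Python, using only the core counts and the throughput gain quoted above (the `scaling_efficiency` helper is just a name chosen for this illustration):

```python
def scaling_efficiency(old_cores, new_cores, throughput_gain):
    """Fraction of the added core capacity that showed up as throughput.

    throughput_gain is relative, e.g. 0.48 for a 48% increase.
    """
    core_gain = float(new_cores - old_cores) / old_cores
    return throughput_gain / core_gain


if __name__ == "__main__":
    # 4 -> 6 cores per processor, 48% more throughput.
    print("%.0f%%" % (scaling_efficiency(4, 6, 0.48) * 100))
```

In other words, roughly 96% of the extra core capacity turned into extra throughput.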

Antonio, I hear you say, we don’t have that kind of hardware. That’s true, but you probably don’t need to process 18 million documents an hour either. What you do have is that kind of software – for free. In fact, DB2 Express-C is a free version of DB2 that doesn’t impose any restrictions on the size of the database, how many users can be connected or how many databases you can have. It has the same code that ran the test above. So you can have a lightning fast XML engine that opens up a world of possibilities, free of charge.

If your startup or established company is serious about XML, DB2 Express-C 9.5 is a godsend. Now you can even try DB2 Express-C 9.5.2 Beta (currently available only on Windows). This version ships with both pureXML and a very fast Text Search technology, so that you don’t have to use fuzzy creatures like ferrets and sphinxes.



Disclaimer: The opinions expressed in this post, and any last minute remarks about how Oracle’s license won’t allow me to publish comparative benchmarks, are mine and mine alone, and do not necessarily represent the opinion of my employer, IBM, or the aforementioned Intel.


Pygments TextMate Bundle

Following my last post, a few people asked me to create a Pygments TextMate bundle. Ask and ye shall receive (on GitHub).

The Pygments menu

Prerequisites


Install Pygments following these instructions.


Installation


First method:

sudo mkdir -p /Library/Application\ Support/TextMate/Bundles
cd /Library/Application\ Support/TextMate/Bundles
git clone git://github.com/acangiano/pygments-textmate-bundle.git "Pygments.tmbundle"

If TextMate is running while you perform the update, execute the following:

osascript -e 'tell app "TextMate" to reload bundles'

This is equivalent to selecting Bundles -> Bundle Editor -> Reload Bundles from within TextMate.

Second method: Download this file, unzip it, and double click on Pygments.tmbundle.

By the way, add the following to your stylesheet if you’d like to see a scrollbar when displaying very long lines of code. This adds a nice border as well:

.highlight { border: 1px solid silver; padding-left: 5px; margin-bottom: 0.5em; overflow-x:auto; }

Integrating TextMate and Pygments

Like many, I don’t use TextMate just for coding. All of my posts are first drafted in my trusty editor before being published. One of the problems that I had, and that others probably face too, is the less than smooth process of publishing properly highlighted code in posts and HTML pages. A few solutions exist, including embedding gist snippets, using “Create HTML from Document” in TextMate, or adopting JavaScript libraries or WP plugins. But when it comes to highlighting code, for me Pygments is simply unbeatable.

Pygments is a Python library but ships as a command line tool as well. However, switching between TextMate and the command line is not as convenient as I’d like. So on the weekend I pulled out my big sharp razor and started yak shaving. The result of that brief session is a hack that delivers the integration of TextMate and Pygments, so that code can be easily converted to HTML in order to beautifully present it.

First, let’s see how I use it. When I select a snippet of Ruby code in TextMate and press ⌃⌥1, the selection is transformed into the proper HTML. ⌃⌥2 is for Python snippets, ⌃⌥3 for any other language, and ⌃⌥4 for any language as well but with the option of adding line numbers. In practice, this means that I use 1 and 2 most of the time and these shortcuts are easy enough to remember. Note that this is not necessarily the best arrangement, but it works well for me. I could, if so inclined, associate all 4 commands to the same shortcut and be prompted by a menu every time this combination is pressed, obtaining something along the lines of the image shown below:

A possible prompt menu for Pygments

Should I ever forget these 4 shortcuts, I can take a quick look at the Text bundle menu shown below. I placed these commands under the Text menu, since they are globally available for textual formats, whether I’m composing HTML, Textile, Markdown or ReST; but this is entirely arbitrary and I suspect that many would consider the HTML menu instead or place a “Convert to HTML” entry in the menu of the specific language.

The Text menu

Ruby and Python deserve their own command because they are the languages whose code I publish the most, but pressing ⌃⌥3 (or 4) prompts a long list of languages to choose from as shown below (the image is cut to reduce its length):

The select a language dialog

The following are a series of steps that you can take to reproduce the same results as mine. The HTML required to present the code nicely in this section was generated from within TextMate. In other words, I’m eating my own dog food.

Step 1: If you haven’t done so already, install Pygments. You can get it from the official site.

Step 2: Within TextMate click on the menu entry: Bundles -> Bundle Editor -> Show Bundle Editor and click on the triangle to open up Text in the left pane.

Step 3: Click on the +- button in the lower left corner of the window and select New Command, then name the command Pygmentize Ruby (assuming that you want a command for Ruby).

Step 4: Ensure that the options for Save, Input, Output and Activation are the same as shown below:

Step 5: Fill the Command(s) text area with the following code:

#!/usr/bin/env python

import os
import sys
from pygments import highlight
from pygments.lexers import RubyLexer
from pygments.formatters import HtmlFormatter

try:
    code = os.environ['TM_SELECTED_TEXT']
except KeyError:
    sys.exit()

formatter = HtmlFormatter()
print highlight(code, RubyLexer(), formatter)

Step 6: Repeat the process for Pygmentize Python, Pygmentize… and Pygmentize with line numbers… but select a different Activation key equivalent (replace 1 with 2, 3 and 4, respectively).

The command code for Pygmentize Python is as follows:

#!/usr/bin/env python

import os
import sys
from pygments import highlight
from pygments.lexers import PythonLexer
from pygments.formatters import HtmlFormatter

try:
    code = os.environ['TM_SELECTED_TEXT']
except KeyError:
    sys.exit()

formatter = HtmlFormatter()
print highlight(code, PythonLexer(), formatter)

For Pygmentize… use the following:

#!/usr/bin/env python

import os
import sys
from commands import getoutput
from pygments import highlight
from pygments.lexers import get_all_lexers, get_lexer_by_name
from pygments.formatters import HtmlFormatter

try:
    code = os.environ['TM_SELECTED_TEXT']
except KeyError:
    sys.exit()

available_languages = ", ".join(sorted('"'+lex[1][0]+'"' for lex in get_all_lexers()))
chosen_language = getoutput("""echo $(osascript <<'AS'
    tell app "TextMate"
        activate
        choose from list { %(languages)s } \
            with title "Pick a language" \
            with prompt "Select a language"
    end tell
AS)""" % {'languages':available_languages})
# Bring TextMate back to the front once the dialog has been dismissed.
os.system("osascript -e 'tell app \"TextMate\" to activate' &>/dev/null &")

lexer = get_lexer_by_name(chosen_language.lower())
formatter = HtmlFormatter() # linenos=False
print highlight(code, lexer, formatter)

And finally for Pygmentize with line numbers… use the almost identical script below:

#!/usr/bin/env python

import os
import sys
from commands import getoutput
from pygments import highlight
from pygments.lexers import get_all_lexers, get_lexer_by_name
from pygments.formatters import HtmlFormatter

try:
    code = os.environ['TM_SELECTED_TEXT']
except KeyError:
    sys.exit()

available_languages = ", ".join(sorted('"'+lex[1][0]+'"' for lex in get_all_lexers()))
chosen_language = getoutput("""echo $(osascript <<'AS'
    tell app "TextMate"
        activate
        choose from list { %(languages)s } \
            with title "Pick a language" \
            with prompt "Select a language"
    end tell
AS)""" % {'languages':available_languages})
# Bring TextMate back to the front once the dialog has been dismissed.
os.system("osascript -e 'tell app \"TextMate\" to activate' &>/dev/null &")

lexer = get_lexer_by_name(chosen_language.lower())
formatter = HtmlFormatter(linenos=True)
print highlight(code, lexer, formatter)
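One rough edge in the two dialog-based commands above: if the dialog is cancelled, AppleScript’s `choose from list` returns `false`, and that string ends up being passed to `get_lexer_by_name`. Below is a sketch of two small helpers that could be factored out to guard against this. The names `applescript_list` and `normalize_choice` are hypothetical; they are not part of Pygments or TextMate.

```python
def applescript_list(names):
    """Format names as the body of an AppleScript list: "a", "b", "c"."""
    return ", ".join('"%s"' % name for name in sorted(names))


def normalize_choice(raw_output):
    """Turn the dialog's output into a lexer alias, or None if cancelled.

    osascript prints "false" when the user cancels the dialog.
    """
    choice = raw_output.strip().lower()
    if choice in ("", "false"):
        return None
    return choice


if __name__ == "__main__":
    print(applescript_list(["ruby", "python"]))
    print(normalize_choice("Ruby\n"))
    print(normalize_choice("false"))
```

With these in place, a command could call `sys.exit()` when `normalize_choice` returns `None`, instead of trying to look up a lexer.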

Step 7: Click on Text and, by dragging and dropping, arrange the menu to include the Pygmentize commands as shown below:

Editing the Text menu

Step 8: At this point everything should work, whether you invoke the commands through a keyboard shortcut or through the Text menu. However, you will need to upload and include a Pygments stylesheet from within your site. To generate a stylesheet run the following from the command line:

pygmentize -S default -f html > pygmentize.css

In the above command, default is the name of the style. For example, the Python code you see in this article is styled with the style pastie (because I globally adopted that stylesheet for this site). For a comparison of the available styles check out this demo page.

Step 9: ????

Step 10: Profit!

I hope these hacked together commands can be useful to others. Feel free to customize them and improve upon them as it suits your needs.

UPDATE: I made a Pygments TextMate Bundle out of this.


What Arc should learn from Ruby

There was a lot of buzz surrounding Arc before it was released. Then Paul Graham made an early version available to the public and most people weren’t too impressed. Paul is a charismatic figure and has his own following, so despite the uncertain welcome that the language received, Arc managed to attract a small community of curious developers. Then silence. For a few months, most people hardly heard anything about Arc. Until today. A post on news.arc suddenly found its way into the spotlight. In this post, its author tried to summarize the status of the language, its failure to attract new developers or even retain the existing ones, and what is perceived as a lack of leadership for the project and a very uncertain future.

I joked about this and said that “Arc is the infogami of programming languages”. Infogami was a project started by Aaron Swartz, which claimed it was going to revolutionize the web “like the Macintosh did for computers” (if I recall the quote correctly). Despite the great expectations and the grand announcements that were made, the project (launched by what was a YCombinator startup) never really flew. But I digress; the point is that Arc, like infogami, was wrapped in layers of expectations, like a scallop with bacon, but the delivery and the outcome were less than stellar.

Don’t hold this against Paul (or Aaron); projects fail or have rocky starts all the time. It’s the nature of software and of this business. Don’t jump to conclusions too quickly and assume that Arc has failed and is dead, either. Paul seems to have long term plans for the language. Many popular languages today were absolutely obscure to most developers for their first several years. Should Arc become the great language many had hoped for, people will have no qualms in adopting it in 1, 3 or even 5 years’ time.

I think that Paul is trying to distinguish between what the core features of the language are and its libraries, and his aim and focus is currently directed towards the former; a core that he wishes to change and evolve with a certain degree of freedom. From many of his writings, it is apparent that Paul has set out one clear cut criterion for the features of his language: whatever changes are made need to make Arc concise. Concise doesn’t mean terse or unreadable, it means powerful and expressive enough to enable the language’s users to write a certain program with much less code than what’s required by other commonly adopted languages, and if possible, even by the most popular dialects of Lisp.

This approach to the design of the core language is far from off course. For example, Ruby’s conciseness is one of its most appreciated characteristics and one of the prominent features that sets it apart from compiled languages like C or Java. But a powerful and well designed programming language is not enough. That’s not really what developers are after.

Programmers are looking for solutions to existing problems and pains. They want tools, and this includes languages, that are good at helping them while they try to implement certain types of software. The languages may be general purpose, but if they don’t represent an advantage over other alternatives in a particular domain, they are hardly used.

In short, libraries and the ecosystem surrounding programming languages are often as important as the languages themselves.

In a somewhat related “ask News.YCombinator” question, Clojure was mentioned as a possible “next big thing” for the Lisp world. As you can imagine, the subject of Arc was brought up and Paul stepped in with a few comments. Two significant ones specifically focused on the subject of libraries, trying to address what is currently perceived as one of Arc’s big weaknesses, whereas it’s a strong point for Clojure, which can rely on all the existing code available for the Java world (since Clojure targets the JVM).

The first of these comments included the following quote:

Powerful libraries are a cheap way to make a new language the language du jour. (Think Rails.) They’re not the critical ingredient if you’re trying to make something to last; they may even hurt. — Paul Graham

And the second one, further explained what his position was:

The dangers of libraries are that they distract one’s attention from the core language, and that they could conceal or perpetuate flaws in the core language. I’m not saying that languages *shouldn’t* have powerful libraries, btw, just that they may not be 100% upside. — Paul Graham

Read that first quote again. I agree with the initial part of it, but find the conclusion is not supported by the facts. Paul himself brought up the example of Ruby and Rails, so let’s explore this further and see what Arc could learn from Ruby’s experience.

Ruby was a very good programming language that for a long time was hardly used outside of Japan. Besides the scarcity of documentation in English, one of the major untold challenges of the language was that it didn’t really solve a particular problem. It was a beautiful general purpose language and that was it. Sure, quite a few smart early adopters realized the potential of the language, and it started to gain some momentum amongst Perl developers, as a possible Perl successor. But let’s face it, for most people Perl was good enough.

Ruby’s core didn’t really change in 2006, when a large majority of the development community acknowledged the merits of the language. Ruby in 2000 and Ruby in 2006 were not really that different. What was drastically different was the ecosystem surrounding Ruby, and the fact that Ruby now offered an incredible solution for developers who wanted to address the problem of web development. Matz was right: Rails is Ruby’s killer application.

But let’s not stop here. Paul agrees that Ruby’s growth has been greatly accelerated thanks to Rails, so much so that he mentioned it as an emblematic example of a “cheap way” to bring a language to the front of the class. He concludes, however, that this is not really the right approach or the critical ingredient needed to create something that lasts. He even goes so far as to say that it may end up hurting the language. And in this, I absolutely disagree.

I disagree because the history of Ruby, and that of a few other programming languages, shows that the arrival of a community of interested developers has always had many more benefits than drawbacks, provided that at least minimal leadership for the project existed. Taking an accelerated course in Ruby History 101, we’d discover that Ruby already had its own group of fans and a growing English speaking community. Ruby was able to stand on its own merits. What Rails did was direct an incredible amount of attention towards Ruby from developers, companies and the tech oriented media.

What was the effect of all this on Ruby? Countless libraries were written for the language; thousands of companies, particularly startups, adopted Ruby as their language of choice. There was a spate of alternative new Ruby web frameworks (like Merb and Ramaze) and also a dozen alternative implementations that attempt to improve upon the shortcomings of the language’s main implementation and, on a few occasions, to integrate Ruby with other existing VMs (e.g. with JRuby and IronRuby). There used to be a grand total of two generally available books on the topic, yet now bookstores are filled with Ruby-specific books, not just ones about Rails (something I predicted to my wife back in 2004). Perhaps more importantly, Ruby is now also widely used outside of the web development world as the truly general purpose language it was intended to be. Every time a DSL is needed, Ruby delivers. There are very popular conferences for the language held throughout the year, and companies (e.g. Engine Yard) and VCs are investing money in the future of Ruby. mod_ruby wasn’t working for Rails? Along came Mongrel (and several other alternatives) and now even mod_rails to address the deployment issues.

Paul, I’ll take this flourishing community and ecosystem if its downside is just a sea of newbies asking silly questions in forums. Rails has shown the world what Ruby is capable of, and by doing so, it also presented Ruby’s faults in a clear light. Only this time, there weren’t 100 people ready to jump in and fix them; there were 10,000, along with several companies and millions of dollars to back them up. To me this is a critical ingredient to make something last.

I understand that Paul isn’t rushing when it comes to Arc, and that he wants to perfect his language before letting a large crowd gather around it, but the risk here is alienating early adopters and letting down many people who were genuinely interested in the project. Arc, like Ruby before it (and unlike Clojure), cannot rely on a wealth of existing code. To ensure its bright future Arc will therefore need to make up for this with, yes, a well designed powerful core, but above all with a growing set of libraries, a growing community and, if possible, at least one easily identifiable area where Arc solves a problem better than other existing languages can. Based on the trend of the development world, this killer application will probably come in the form of a Rails or Seaside equivalent for Arc. There’s no mad dash; Arc is still very young after all. But it would be a mistake to underestimate the importance of such a critical component for the future and success of this language.


Review of the first two Envycasts

The following quiz contains five fairly simple questions about ActiveRecord and Rails 2.2. Try to see if you can answer all of them.

1) What’s wrong with the following (technically valid) line of code?

Guide.find(:all, :include => [:user, {:questions => [:user, {:answers => :user}]}],
                 :conditions => "answers.user_id = 42")

2) Having specified :counter_cache => true in an association, what’s the difference between invoking size, length or count on the associated collection?

3) How do the following models work?

class Rate < ActiveRecord::Base
  belongs_to :rateable, :polymorphic => true
end

class Post < ActiveRecord::Base
  has_many :rates, :as => :rateable
end

4) In Rails 2.2 how would you disable validation of a model’s associated objects?

5) How would you use caches_action to cache the content of an action but not the layout?

Confused, lost, not sure? Read on.

Gregg Pollack and Jason Seifer, of Rails Envy Podcast fame, are up to something interesting yet again. They recently started publishing commercial screencasts.

The first envycast is called Advanced ActiveRecord and, as you’d expect, it covers several intermediate level topics that often confuse beginners. These range from relatively simple concepts like dirty objects to polymorphic associations and Single Table Inheritance (STI), and include several performance considerations. The video is 37 minutes long and ships with a useful PDF cheatsheet.
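
For readers who haven’t met polymorphic associations yet, here is a minimal plain-Ruby sketch of the underlying idea. It is not taken from the screencast and it doesn’t use ActiveRecord at all: the Registry class, the Photo struct and the rateable lookup method are hypothetical names made up for illustration. The only point is that a record can belong to objects of different classes by storing the owner’s class name next to its id, much like the rateable_type and rateable_id columns Rails relies on for such an association.

```ruby
# Plain-Ruby sketch of a polymorphic association (no Rails required).
# Every name here is an illustrative assumption, not ActiveRecord's API.

# A tiny in-memory stand-in for database tables, keyed by class name and id.
class Registry
  def initialize
    @rows = Hash.new { |hash, type| hash[type] = {} }
  end

  # Store a record under its class name and id.
  def save(record)
    @rows[record.class.name][record.id] = record
  end

  # Look a record up by class name and id; returns nil when absent.
  def find(type, id)
    @rows[type][id]
  end
end

Post  = Struct.new(:id, :title)
Photo = Struct.new(:id, :caption)

# A rate keeps the owner's class name and id rather than an object reference,
# so the same "table" of rates can point at posts, photos, or anything else.
Rate = Struct.new(:score, :rateable_type, :rateable_id) do
  def rateable(registry)
    registry.find(rateable_type, rateable_id)
  end
end

registry = Registry.new
registry.save(Post.new(1, "Hello"))
registry.save(Photo.new(1, "Sunset"))

post_rate  = Rate.new(5, "Post", 1)
photo_rate = Rate.new(3, "Photo", 1)

puts post_rate.rateable(registry).title
puts photo_rate.rateable(registry).caption
```

Note that the post and the photo share the same id; it’s the stored class name that tells the two owners apart.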

The second envycast is titled Ruby on Rails 2.2 Screencast. It comes in at almost 45 minutes long and will prove useful to any Rails developer who’d like to upgrade their skills to Rails 2.2. It covers most of the changes, but for a more thorough analysis of every detail in the change log, they are also making a PDF by Carlos Brando available. Both the video and the PDF sell for $9 each, but you can get them together in a bundle for $16.

[Embedded video: a preview of the Rails 2.2 screencast]


Overall I think these videos are very solid. They are informative, accessible, have good examples, and above all are fun and entertaining to watch. If you enjoy the style of humor distilled weekly through Gregg and Jason’s podcast, you’ll love their videos as well. Nine dollars to learn more about ActiveRecord, or to get up to date with Rails 2.2 in less than an hour, is truly a small price to pay. I also like the fact that their videos are made available in both QuickTime and Ogg Theora formats.

Final verdict: I can confidently recommend them.


Disclosure: I obtained the two videos for free, as review copies, but I have no commercial affiliation whatsoever with the Envycasts producers.



Copyright © 2005-2010 Antonio Cangiano. All rights reserved.