Meditations on programming, startups, and technology

Ruby Shootout Status Update

I want to give those who are waiting for the Ruby shootout a heads-up. The benchmark suite needs some substantial changes in order to ensure accuracy and fairness for all the VMs involved.

This will delay the execution (and reporting) of the shootout further, but it will be worth it. I definitely prefer a shootout published later in July (or heck, even August) that is realistic, fair, and provides interesting metrics (e.g., CPU time and memory) over an inaccurate one put together in a rush just for the sake of publishing it tomorrow.

For those interested in the technical details, we are trying to separate the parsing and “compilation” of definitions from the actual execution of the code (which needs to be timed). My first attempt was to create a Proc for each benchmark and then time the execution of its call method. The problem with this approach is that it penalizes VMs that don’t JIT procs, such as JRuby.
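The Proc-based approach can be sketched as follows. This is a minimal illustration, not the actual shootout harness; the benchmark body here is a hypothetical placeholder:

```ruby
require 'benchmark'

# Hypothetical benchmark body: wrapping it in a Proc means parsing and
# "compilation" of the source happen once, up front, outside the timed region.
bench = Proc.new do
  (1..1_000).reduce(0) { |sum, n| sum + n }
end

# Only the repeated calls are timed, not the parsing of the definitions.
elapsed = Benchmark.realtime do
  100.times { bench.call }
end

puts format('100 calls took %.4f s', elapsed)
```

The catch, as noted above, is that a VM which compiles methods but interprets Proc bodies would look slower here than it really is.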

We also thought about defining a method instead of a Proc, but eval won’t accept class definitions or constant assignments within method bodies. The workaround would be to use MyClass.class_eval instead of class MyClass in the benchmarks, and Module#const_set for the constants (or to change them into instance variables, for example). But we’re shooting for a cleaner solution in which definitions and their actual execution live in separate files, and only the latter is timed.
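To illustrate the workaround mentioned above, here is a hedged sketch; MyClass, greet, and ANSWER are illustrative names, not taken from the real suite:

```ruby
def define_benchmark_class
  klass = Object.const_set(:MyClass, Class.new)

  # Reopen the class with class_eval instead of `class MyClass`,
  # which would be a SyntaxError inside a method body.
  klass.class_eval do
    def greet
      'hello'
    end
  end

  # Constants can't be assigned directly inside a method
  # (dynamic constant assignment), so use Module#const_set.
  klass.const_set(:ANSWER, 42)
end

define_benchmark_class
puts MyClass.new.greet   # => "hello"
puts MyClass::ANSWER     # => 42
```

It works, but it forces every benchmark to be rewritten in this style, which is why separating definitions into their own files is the cleaner route.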

And of course, we also need to add cross-platform memory measurement into the picture. It may take a while, but stay tuned. 😉
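One possible shape for the memory measurement (an assumption on my part, not the suite's actual code) is to read the resident set size of the process, with per-platform branches:

```ruby
# Return the current process's resident set size in KB.
# Linux exposes it in /proc; other POSIX systems can fall back to ps.
# A truly cross-platform version would also need a Windows branch.
def resident_memory_kb
  if File.readable?('/proc/self/status')
    # Linux: parse the VmRSS line from /proc/self/status.
    File.read('/proc/self/status')[/^VmRSS:\s+(\d+)/, 1].to_i
  else
    # Other POSIX systems: shell out to ps for the RSS column.
    `ps -o rss= -p #{Process.pid}`.to_i
  end
end

puts "RSS: #{resident_memory_kb} KB"
```

Sampling this before and after a benchmark run would give a rough per-benchmark memory figure.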

If you enjoyed this post, then make sure you subscribe to my Newsletter and/or Feed.


11 Responses to “Ruby Shootout Status Update”

  1. Glad to hear it. :) That will give me time to write a decent “pmap -d” parser for Linux. :)

  2. malcontent says:

    Too bad. I was really looking forward to it.

  3. vic says:

    Will it include php and python?

  4. Tim says:

    Are you maybe overthinking this? As someone who wonders “How fast will this Ruby run this code?” I really don’t care whether the time goes into parsing or running. They are all part of the time-to-the-answer.

    What am I missing?

  5. Hi Tim,

    My first approach was very direct: load a file and see how long it takes to get a response. However, given the emphasis on “fairness” for all the VMs involved, and in order not to misrepresent their speed, a few objections were raised regarding my simple proposal. Somewhat ironically, Charles Nutter was the one who mostly raised the issue of fairness, given that accounting for compiling and parsing at each iteration would penalize JRuby in particular. You can read about it (and join the discussion) in this thread.

  6. Greg Donald says:

    I’m with Tim.

    Sounds to me like JRuby has some implementation issues and you are being asked to “work around” them.

    If you’re not passing the exact same code, byte for byte to each VM, then it’s not a valid benchmark.

  7. @Greg & @Tim: I don’t think I agree with you. We are talking about very different VMs and runtimes. Reality is not always black & white, and I think we need different ways to compare things in order to avoid the usual issue of comparing apples with oranges.

    On the other hand, Antonio, I think you should provide several different measures: don’t exclude an absolute number like the one Greg & Tim are talking about.

  8. David Brady says:

    What happened to agility? Ship it! Release what you have now, annotate the deficiencies, and iterate, iterate, iterate!

    What better way to see what really needs to get fixed next?

  9. Ezra says:

    Why not run tests both including and excluding startup/parse/compile time and report both times? This way we can see all the data and it also doesn’t cater to any VM over another.

  10. @Tim and @Greg: You’re wrong. The original proposal for the tests would have been doing repeated loads of a file in a loop, which on all implementations would require extra processing in the form of parsing and possibly compilation. And on optimizing implementations, this is a further penalty because the code is essentially being loaded anew, so all previous optimizations get thrown out. But the larger point is that it’s no longer a benchmark of some algorithm…it’s a benchmark of that algorithm plus load/compile/optimize time. If that’s the goal, so be it…but in this case all involved agreed it would be extra noise unrelated to the actual code under test.

    @Greg: JRuby does not have implementation problems, and we were not trying to get Antonio to work around anything. The original benchmarking logic was flawed, and I suggested an approach that would not penalize any of the implementations for parse/compile overhead. And the other implementers agreed.

Leave a Reply

I sincerely welcome and appreciate your comments, whether in agreement or dissenting with my article. However, trolling will not be tolerated. Comments are automatically closed 15 days after the publication of each article.

Copyright © 2005-2014 Antonio Cangiano. All rights reserved.