I want to provide those who are waiting for the Ruby shootout with a heads up. The benchmark suite needs some substantial changes in order to ensure accuracy and fairness for all the VMs involved.
This will delay the execution (and reporting) of the shootout further, but it will be worth it. I definitely prefer a shootout published later in July (or, heck, even in August) that is realistic and fair, and that provides interesting metrics (e.g. CPU time and memory usage), over an inaccurate one put together in a rush just for the sake of publishing it tomorrow.
For those interested in the technical details: we are trying to separate the parsing and “compilation” of definitions from the actual execution of the code (which is what needs to be timed). I accomplished this by creating a Proc for each benchmark and then timing the execution of its call method. The problem with this approach is that it penalizes VMs that don’t JIT procs, such as JRuby.
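To give an idea of what that looks like, here is a minimal sketch of the Proc-based approach (the file name and the use of Benchmark.realtime are illustrative, not the actual suite code): the benchmark source is parsed, and possibly compiled, when the Proc is built, and only the call is timed.

```ruby
require 'benchmark'

# Hypothetical benchmark file; in practice each benchmark would have its own source.
source = File.read('some_benchmark.rb')

# Parsing (and, on some VMs, compilation) happens here, outside the timed region.
bench = eval("Proc.new {\n#{source}\n}")

# Only the execution of the Proc's call method is measured.
puts Benchmark.realtime { bench.call }
```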
We also thought about defining a method instead of a Proc, but eval won’t accept class definitions or constant assignments within method bodies. The workaround would be to use MyClass.class_eval instead of class MyClass in the benchmarks, and Module#const_set for the constants (or to change them into instance variables, for example). But we’re shooting for a cleaner solution in which we split the definitions and their actual execution into separate files, and only time the latter.
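For illustration only, here is a rough sketch of that workaround (the class, constant, and method names are made up, not taken from the real benchmarks). Inside a method body, class MyClass ... end and SIZE = 100 are syntax errors, but the equivalent method calls are accepted:

```ruby
def setup_benchmark
  Object.const_set(:SIZE, 100)            # instead of SIZE = 100

  Object.const_set(:MyClass, Class.new)   # create the class object first
  MyClass.class_eval do                   # instead of class MyClass ... end
    attr_reader :x, :y

    def initialize(x, y)
      @x, @y = x, y
    end
  end
end

setup_benchmark
p MyClass.new(1, 2).x   # => 1
p SIZE                  # => 100
```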
And of course, we also need to add cross-platform memory measurement into the picture. It may take a while, but stay tuned. 😉
Glad to hear it. 🙂 That will give me time to write a decent “pmap -d” parser for Linux. 🙂
Too bad. I was really looking forward to it.
Will it include PHP and Python?
@vic: nope. 🙂
Are you maybe overthinking this? As someone who wonders “How fast will this Ruby run this code?” I really don’t care whether the time goes into parsing or running. They are all part of the time-to-the-answer.
What am I missing?
Hi Tim,
my first approach was very direct: load a file and see how long it takes to get a response. However, given that the emphasis has been placed on “fairness” for all the VMs involved, so as not to misrepresent their speed, a few objections were raised to my simple proposal. Somewhat ironically, Charles Nutter was the one who raised the issue of fairness most strongly, since accounting for compiling and parsing at each iteration would penalize JRuby. You can read about it (and join the discussion) in this thread.
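To make the difference concrete, here is a rough sketch of the two measurements (the file name and iteration count are made up for illustration): the first loop pays the parse/compile cost on every pass, the second pays it once and times only the execution.

```ruby
require 'benchmark'

# Original, direct approach: re-load the file on each iteration,
# so every pass includes parsing and possibly compilation.
with_load = Benchmark.realtime do
  100.times { load './some_benchmark.rb' }
end

# Revised approach: parse once, then time only the repeated execution.
bench = eval("Proc.new {\n#{File.read('some_benchmark.rb')}\n}")
code_only = Benchmark.realtime do
  100.times { bench.call }
end

puts "including load/compile: #{with_load}s"
puts "code only:              #{code_only}s"
```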
I’m with Tim…
Sounds to me like JRuby has some implementation issues and you are being asked to “work around” them.
If you’re not passing the exact same code, byte for byte, to each VM, then it’s not a valid benchmark.
@Greg & @Tim: I don’t think I agree with you. We are talking about very different VMs and runtimes. Reality is not always black and white, and I think we need different ways of comparing things in order to avoid the usual trap of comparing apples with oranges.
On the other hand, Antonio, I think you should provide several different measures: don’t exclude an absolute number like the one Greg & Tim are talking about.
What happened to agility? Ship it! Release what you have now, annotate the deficiencies, and iterate, iterate, iterate!
What better way to see what really needs to get fixed next?
Why not run the tests both including and excluding startup/parse/compile time, and report both numbers? That way we can see all the data, and it doesn’t favor any VM over another.
@Tim and @Greg: You’re wrong. The original proposal for the tests would have meant doing repeated loads of a file in a loop, which on all implementations requires extra processing in the form of parsing and possibly compilation. On optimizing implementations it is a further penalty, because the code is essentially loaded anew each time, so all previous optimizations get thrown out. But the larger point is that it’s no longer a benchmark of some algorithm… it’s a benchmark of that algorithm plus load/compile/optimize time. If that’s the goal, so be it… but in this case everyone involved agreed it would be extra noise unrelated to the actual code under test.
@Greg: JRuby does not have implementation problems, and we were not trying to get Antonio to work around anything. The original benchmarking logic was flawed, and I suggested an approach that would not penalize any of the implementations for parse/compile overhead. The other implementers agreed.