My previous post about MagLev and the planning of the next Ruby shootout received a lot of attention. MagLev’s speed claims have been subject to a lot of skepticism, and many believe that these impressive figures are due to a combination of clever optimization for trivial tests and incompleteness. The skepticism is understandable. There have been very bright people working on alternative VMs for years, and this new product shows up after only 3 months, and claims to be way faster than anything seen before.
Except that it’s not entirely new. What makes it credible that they may be onto something, and could develop a faster implementation, is that they are leveraging decades’ worth of Smalltalk experience, during which Smalltalk VMs faced similar development challenges. Ruby and Smalltalk are family, and there is no inherent reason why Ruby has to be dramatically slower than certain Smalltalk implementations. Parsing and compiling Ruby code into “smalltalk-ish” bytecode is not the hardest thing to do. So, from a certain perspective, MagLev is 30 years old, not 3 months old. I’m enthusiastic about the VM because I think MagLev is promising, but don’t let people tell you that I’m naive. Despite the fact that MagLev is incomplete, I want to challenge it so that we can verify and clarify what kind of speed improvements are really offered at this stage. Let’s investigate how.
One very valid point, raised by several people both in my comment section and on Slashdot, is that many of the benchmarks that have been employed so far are not very useful. Some of them are meaningless, not because of the usual, sound “micro-benchmarks must be taken with a grain of salt” logic, but rather because they offer an opportunity for VMs to optimize them out. When a lazy/smart VM realizes that a given loop doesn’t produce any results which will be used in some way, it’ll just skip it, giving us the impression of being many, many times faster than the standard Ruby 1.8 implementation by Matz et al.
In the real world, when that loop has to do something meaningful and the results of the computation have to be printed on the screen, that impressive performance is nowhere to be seen. So far this set of tests, which was employed by Yarv for testing its own progress, has also been used to compare different implementations (and these benchmarks can be found in Rubinius’ repository as well). This was the easy thing to do, but if we’re going to get serious about it, we need to produce a better set of benchmarks, especially when the current ones call into question both Yarv’s and MagLev’s impressive results.
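To make the problem concrete, here is a minimal sketch of the two kinds of loop. It is not taken from the Yarv tests; the method names (`discarded_loop`, `observed_loop`, `time_it`) and the sum-of-squares workload are hypothetical, chosen only for illustration. The first loop throws its work away, so a VM is free to skip it. The second returns a value that gets printed, so the computation has to actually happen.

```ruby
# Hypothetical helper: runs the block and returns [result, elapsed_seconds].
def time_it
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  [result, elapsed]
end

# The product is computed and then discarded. Nothing outside the loop
# depends on it, so a smart VM may legitimately eliminate the work.
def discarded_loop(n)
  i = 0
  while i < n
    i * i
    i += 1
  end
  nil
end

# The same arithmetic, but accumulated and returned, so the caller can
# print it. The VM cannot drop the computation without changing the output.
def observed_loop(n)
  sum = 0
  i = 0
  while i < n
    sum += i * i
    i += 1
  end
  sum
end

n = 1_000_000

_, discarded_time = time_it { discarded_loop(n) }
sum, observed_time = time_it { observed_loop(n) }

puts "discarded: #{discarded_time} s"
puts "observed:  #{observed_time} s (sum = #{sum})"
```

On an implementation that doesn't eliminate dead code, the two timings should be in the same ballpark. If the first one collapses to nearly nothing while the second doesn't, the benchmark is probably measuring the optimizer's ability to skip work rather than the speed of the loop itself.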
In the long run, it would be good to come up with some serious benchmarks based on AST nodes in order to test each of Ruby’s features. We can work on that, but let’s get started with some “beefed up” micro-benchmarks for the imminent shootout. In one of my comments I wrongly called the Yarv tests “standard”. That was unfortunate wording, because there are no “standard” benchmarks that we can rely on even minimally in the Ruby community. Let’s fix that.
I created an empty project on GitHub, called Ruby Benchmark Suite. This project will hold a set of benchmarks that VM implementers can use to monitor their own progress and that I can use to run periodic shootouts between all of the major implementations.
I also created a Lighthouse project, so that we can have some support for communication and project management. For ongoing discussion about the project, I created a public Google Group which I invite you to join, if you’re interested in helping out. I’d like to see VM implementers get involved with this, in order to make it a set of reasonable, standard benchmarks that we can all agree upon.
For the next shootout, I’d like to start my multiple testing within the next week or two. So it’d be great if we could come up with a bunch of new tests and revisit the existing ones. What I’d like to see is the following:
I hope I can count on your help for this project.