This post contains the results of a Ruby shootout on Windows that I recently conducted. You can find the Mac edition, published last month, here. I was planning to have this one ready much sooner, but a couple of serious events in my personal life prevented that from happening. Be sure to grab my feed or join the newsletter to avoid missing the upcoming Linux shootout.
The setup
For this shootout I included a subset of the Ruby Benchmark Suite. I opted to exclude most tests that ran in fractions of a second on most VMs, focusing instead on more substantial benchmarks (several of which come from the Computer Language Benchmarks Game). The best time out of five runs is reported here for each benchmark.
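As a rough illustration of the "best out of five runs" approach, here is a minimal sketch using Ruby's standard Benchmark module. The helper name best_of and the stand-in workload are assumptions made for this example; they are not taken from the Ruby Benchmark Suite itself.

```ruby
require 'benchmark'

# Hypothetical helper (not part of the Ruby Benchmark Suite):
# runs the given block `runs` times and returns the fastest
# wall-clock time in seconds.
def best_of(runs = 5)
  times = Array.new(runs) { Benchmark.realtime { yield } }
  times.min
end

# Stand-in workload; a real run would execute a benchmark file instead.
best = best_of(5) { (1..200_000).reduce(:+) }
puts format('best of 5: %.4f s', best)
```

Reporting the minimum rather than the mean discards runs that were slowed down by unrelated system activity, which is why it is a common choice for this kind of comparison.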
All tests were run on Windows 7 x64, on an Intel Core 2 Quad Q6600 2.40 GHz, 8 GB DDR2 RAM, with two 500 GB 7200 rpm disks.
The implementations tested were:
- Ruby 1.8.7 (2010-01-10 patchlevel 249) [i386-mingw32] (RubyInstaller)
- Ruby 1.9.1 p378 (2010-01-10 revision 26273) [i386-mingw32] (RubyInstaller)
- Ruby 1.9.2 dev (2010-05-31) [i386-mingw32] (experimental)
- JRuby 1.5.1 (ruby 1.8.7 patchlevel 249) (2010-06-06 f3a3480) (Java HotSpot(TM) 64-Bit Server VM 1.6.0_20) [amd64-java]
- IronRuby 1.0 x64 for .NET 4.0
JRuby was run with the --fast and --server optimization flags.
Disclaimer
Synthetic benchmarks cannot predict how fast your programs will be when dealing with a particular implementation. They provide an (entertaining) educated guess, but you shouldn’t draw overly definitive conclusions from them. The values reported here should be assumed to be characteristic of server-side – and long running – processes and should be taken with a grain of salt.
The results
Please find below the execution times for the selected tests. Timeouts indicate that the execution of a single iteration for a given test took more than 60 seconds and had to be interrupted. Bold values indicate the best performance for each test.
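The 60-second cutoff per iteration can be sketched with Ruby's standard Timeout module. This is only an assumption about how such a limit might be enforced; the helper name timed_iteration is hypothetical, and the mechanism the suite actually uses is not shown in this post.

```ruby
require 'benchmark'
require 'timeout'

# Hypothetical helper: times a single iteration of the block and
# returns the elapsed seconds, or :timeout if it exceeds `limit` seconds.
def timed_iteration(limit = 60)
  Timeout.timeout(limit) { Benchmark.realtime { yield } }
rescue Timeout::Error
  :timeout
end

# A quick iteration finishes well within the limit and yields a Float.
puts timed_iteration(60) { 1_000.times { Math.sqrt(2) } }

# A short limit is used here only so the example finishes quickly.
puts timed_iteration(0.1) { sleep 1 }
```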
Conclusions
Despite a couple of errors and a few timeouts, JRuby was the fastest of the lot, which can be seen as impressive if we consider that this is Windows we are talking about after all.
Ruby 1.9.1 and 1.9.2 were almost as fast as JRuby on these tests. With a few exceptions, the performance of the two 1.9 implementations was, as expected, very similar.
JRuby, 1.9.1 and 1.9.2 were all faster than the current MRI implementation, which can be seen as a prerequisite as we move, as a community, away from Ruby 1.8. Finally, it’s worth noting that IronRuby’s performance was in line with that of Ruby 1.8.7.
Update (July 3, 2010): The following box plot compares the various implementations for the tests for which all the implementations were successful. Only times for the largest successful input number were used in those tests where multiple input numbers were tested.
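The selection rule for the box plot can be sketched as follows. This reads "largest successful input number" as the largest input at which every implementation produced a time, which is one possible interpretation. The record layout, the helper name comparable_times, and all names and numbers in RESULTS are invented placeholders, not measurements from this shootout.

```ruby
# Placeholder records: [benchmark, input size, implementation, seconds].
# A nil time stands for an error or timeout. None of this is real data.
RESULTS = [
  ['bm_x', 10, 'impl_a', 1.0], ['bm_x', 10, 'impl_b', 2.0],
  ['bm_x', 20, 'impl_a', 3.0], ['bm_x', 20, 'impl_b', nil],
  ['bm_y', 5,  'impl_a', 0.5], ['bm_y', 5,  'impl_b', 0.7]
]

# For each benchmark, keep the times from the largest input size at which
# every implementation succeeded; drop benchmarks with no such input.
def comparable_times(results)
  impls = results.map { |_, _, impl, _| impl }.uniq
  out = {}
  results.group_by { |bench, _, _, _| bench }.each do |bench, rows|
    by_input = rows.group_by { |_, input, _, _| input }
    good = by_input.keys.select do |input|
      ok = by_input[input].select { |_, _, _, secs| secs }.map { |_, _, impl, _| impl }
      (impls - ok).empty?
    end
    next if good.empty?
    out[bench] = by_input[good.max].map { |_, _, impl, secs| [impl, secs] }.to_h
  end
  out
end

puts comparable_times(RESULTS).inspect
```

In the placeholder data, bm_x at input 20 is skipped because one implementation failed there, so the times at input 10 are used instead.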
Thank you Antonio for this shootout.
One point to mention is that these shootouts do not exercise IO operations.
Disk-based IO between 1.9.1 and 1.9.2 has improved a lot, which could be considered one *big* difference.
Of course, JRuby beats MRI on that, too.
@Luis Lavena >> do not exercise IO operations <<
Please say more.
Doesn't bm_fasta.rb write ~1MB? Doesn't bm_regex_dna.rb read 150kB 20 times? Doesn't bm_sum_file.rb read 390kB 100 times?
iirc the JRuby bm_meteor_contest.rb error can be avoided by not using --fast (perhaps also the bm_app_pentomino.rb error?)
@Issac:
bm_fasta: no real IO:
http://github.com/acangiano/ruby-benchmark-suite/blob/master/benchmarks/micro-benchmarks/bm_fasta.rb
bm_regex_dna.rb
File is read 20 times in text mode:
http://github.com/acangiano/ruby-benchmark-suite/blob/master/benchmarks/micro-benchmarks/bm_regex_dna.rb#L10
The actual IO operations are reduced to the brute force required by the iteration itself.
The bm tests are oriented toward the operations, not the IO part.
IO benchmarking is something else.
>> IO benchmarking is something else. <<
Something more to do with measuring hardware?
Actually no.
Ruby’s IO implementation on Windows does a lot of work in C code where it should be using Windows’ own functionality.
That defines how Ruby works, and it can’t be changed.
JRuby had a hard time implementing those “specs” of Ruby, but it leveraged Java NIO functionality, which is pretty damn fast; after all, it is Java.
Anyhow, your mileage may vary in every case; the shootout is just there to give a comparison, it is not reality 🙂
Antonio,
Thanks for doing this shootout! I’ve got a couple of questions about your setup:
How come you picked the 64-bit versions of JRuby and IronRuby to benchmark against the 32-bit versions of MRI 1.8.7, 1.9.1, and 1.9.2? The 64-bit versions will definitely be slower than the 32-bit versions, just by definition. I’m not sure if there are any benefits to using the 64-bit JVM, but for .NET it is preferred to use the 32-bit .NET runtime. I’d suggest re-running this with all 32-bit versions of the Ruby engines; for IronRuby this just means running ir.exe rather than ir64.exe.
Also, IronRuby does have an optimization flag that should be used for these types of raw-performance benchmarks, but whether or not it should be used depends on how you’re running these benchmarks. Ideally, you’d allow for some warm-up time, e.g. run the benchmark for 60 seconds, and then start timing your desired number of iterations; this more accurately simulates the server scenario your disclaimer describes, and IronRuby will perform optimally in this case. However, if you are not doing that, then IronRuby should be run with the “-X:NoAdaptiveCompilation” flag, which will force IronRuby to generate optimal code from the start (by default, IronRuby will use an interpreter until a certain method-call threshold is reached, and then start generating .NET bytecode; this lets IronRuby avoid the overhead of emitting bytecode through .NET, but obviously trades off raw performance while the interpreter is being used). The downside to using “-X:NoAdaptiveCompilation” is that it will force evals to also be compiled, making any eval benchmarks much slower, which is why it’s ideal to just warm the process up first.
~Jimmy
This was done under the (reasonable) assumption that people would be running 64 bit VMs on a 64 bit machine. Unfortunately 64 bit versions of MRI/KRI are not available on Windows yet. I was not expecting a major difference in speed between the two either.
Only the best time for each test is reported, so you could think of the first few iterations as the warmup (for most tests here). I will add additional warmup time in the future though.
The 64-bit JIT-compiler has very different performance characteristics than the 32-bit JIT. Specifically, the 64-bit JIT is (as you assumed correctly) optimized for the server, so it does a ton more optimizations. For IronRuby, we see a 2x slowdown when JIT-compiling in 64-bit. This is OK if the process is warmed up enough, but it essentially needs double the warm-up time.
WRT warm-up: the tests are run repeatedly in the same process, taking the best run from that? If so, that’s fine. Otherwise, shelling out to ir.exe multiple times doesn’t do any good for the compiler warm-up, so adaptive compilation should be turned off.
Oh the joys of performance testing =)
That’s correct, Jimmy.
If this is expected “server-side” performance, shouldn’t this be a Ruby on Linux shootout?
Also, you left out Rubinius :/
Some people use Windows. The Linux one is upcoming.
As far as I know, Rubinius doesn’t have a ready to run Windows version.
Ah, thanks for the reply. I probably should have put a smiley or something to indicate: the linux comment was meant to be tongue-in-cheek 😉 (although I do use linux, server AND desktop).
I’ll look forward to the linux shootout and seeing how Rubinius compares (I’m assuming you intend to include Rubinius on the linux shootout). I hope you use the same hardware, because it would also be interesting to see how linux compares to the windows shootout.
Yes, Rubinius will be included and the same hardware will be used. Stay tuned. 🙂
Thanks for posting this. I love coming across your detailed comparisons.
WRT Linux shootout: Will you include REE?
Yes.
> JRuby was the fastest of the lot, which can be seen as impressive if we consider that this is Windows we are talking about after all.
Why the Windows caveat? Why should this NOT be seen as impressive if you’re talking about Linux, etc.?
This would be impressive on Linux as well, Michael. However, given that it’s a language that’s based on the JVM, it is especially impressive on Windows where you wouldn’t necessarily expect JVM based implementations to shine.
Why wouldn’t you necessarily expect JVM based implementations to shine on Windows?
Here are Java measurements on x86 Ubuntu –
http://shootout.alioth.debian.org/u32q/measurements.php?lang=java
Here are the corresponding measurements –
http://shootout.alioth.debian.org/demo/measurements.php?lang=java
The only big differences might be linked to threading: chameneos-redux and thread-ring.
The JVM has been more available on Windows than it has on Linux for a long time; I remember trying to get that stuff working on Linux many years ago and suffering all kinds of headaches. Solaris is a different story, but seriously, that caveat did seem a bit more like a random stab than something based on industry reality.
Hell, you work for IBM, we used to resort to jikes on Linux back in the day…
Anyway, I had originally ignored that statement, which I’m now going to go back to doing.
It wasn’t meant to be a stab. I simply thought that the performance of the JVM on Linux would be better than that on Windows. I’m happy to learn that the JVM performs just as well on Windows. 🙂