26 responses

  1. david
    June 5, 2009

    impressive!

    I would like to see postgres, could you include it in your benchmarks?

  2. Tom K
    June 5, 2009

    I’d like to second david’s request for a postgres take on the benchmarks.

  3. john
    June 5, 2009

    In addition to Postgres, I would be very interested to see similar benchmarks for other MySQL table types than InnoDB. Could you add such tests and update your article?

  4. Antonio Cangiano
    June 5, 2009

    Hi guys,

    I’ll see what I can do next week. Have a great weekend! :)

  5. Chris
    June 5, 2009

    Wow, very nice. I’m also very curious to see postgres.

  6. Marcus Brito
    June 5, 2009

    Given that the gender column has only two possible values despite being a varchar, shouldn’t this have an index? It would be interesting comparing performance numbers with and without indices.

  7. Oskar Pearson
    June 5, 2009

    Hi

    Unfortunately this is a known limitation of the innodb mysql backend.

    Mysql’s innodb backend doesn’t cache “select count(*)” requests. So while db2 is able to simply return the last result from it’s cache, mysql has to actually count the number of matching records by traversing the index.

    If you were to do searches with a randomly changing “where” clause, you’d find the numbers closer. That’s contrary to your article’s intent, of course, but it’ll be more accurate. I generally do “select count(*) from some_table where foreign_key_id = 123″ – so doing a similar query will give us additional data.

    Some of the other mysql database backends do return cached results, afaik. But activerecord defaults to innodb for transaction support.

    Have a look at this – “way back” from 2006.

    http://www.mysqlperformanceblog.com/2006/12/01/count-for-innodb-tables/ for more information.

    And this has some explanation of why:
    http://www.scribd.com/doc/2085411/MySQL-UC-2007-Innodb-Performance-Optimization

    Oskar

  8. Jan
    June 5, 2009

    It is widely known that COUNT(*) is not InnoDB’s strength. Try MyISAM just for kicks :)

    Other than that, there’s so much wrong with micro benchmarks like this. It gives people the wrong ideas. I hope these kinds of comparisons are only taken into account when *all* other things are equal — and they rarely are. Or more elaborate: http://jan.prima.de/~jan/plok/archives/175-Benchmarks-You-are-Doing-it-Wrong.html

  9. Paul
    June 5, 2009

    Can you repeat this for DB2 on the iSeries?

  10. Antonio Cangiano
    June 5, 2009

    @Marcus: An index on gender significantly speeds up “Count females” for both DB2 and MySQL. However this significantly slows down “Count males /w profession” too. (In both cases, DB2 would still be several times faster.)

  11. Robert Young
    June 5, 2009

    MyISAM cannot be called a RDBMS backend; it is nothing more than a file handler, which is why it has always been fast, it provides no Relational or Database Services. If that’s all you want, then fine.

  12. Jan
    June 5, 2009

    @Robert The only point that MyISAM makes is that it uses a cached index number where InnoDB will do an index scan. yet another proof of the silliness of micro benchmarks.

  13. Giuseppe Maxia
    June 6, 2009

    This benchmark can be translated into: “how long does it take to repeat the same COUNT query 100 times.” This is a naive (at best) benchmark that doesn’t tell me nothing about the database potential. But anyway, if you enable the query cache in MySQL, the repetition of 100 queries is at least three times faster than DB2.

    Try
    set global query_cache_size=1024*1024;
    and repeat this test.

    Moreover: the table structure doesn’t correspond to the data from the freebase project. These androids don’t strike me as extremely reliable …

  14. Alex
    June 6, 2009

    What about posting the database configuration settings? I fully agree with Jan, that this article is a primary example of how silly micro benchmarks are and how misleading. DB2 does NOT need this kind of posts to be recognized as a great database. It all depends on the context and the use-case and this article is ignoring context.

  15. Justin Swanhart
    June 6, 2009

    Now, I’m not a PostgreSQL expert, but it is my understanding that PG will create bitmap indexes /on the fly/ to satisfy such lookups on non-indexed columns.

    This would make a tremendous difference in such a scenario, especially when the query cache is disabled in MySQL.

    Gender is a low cardinality column, so bitmaps are extremely useful with such data.
    So if PG is creating the bitmap indexes dynamically, that shows why this is faster, and is a good argument for the feature.

    Would you kindly provide PG and MySQL explain plan for your queries so that we may figure out what is going on?

  16. Justin Swanhart
    June 6, 2009

    I took a look at the feature, and it seems that PG will dynamically combine indexes using bitmaps. I guess this is similar to the ‘index merge’ functionality provided by MySQL.

  17. Justin Swanhart
    June 6, 2009

    Wow. I shouldn’t read things in the middle of the night. DB2 not PG. Oops.

    *slaps forehead*

  18. Antonio Cangiano
    June 6, 2009

    Guys, I just want to clarify a few points. InnoDB was used because it’s the default – and most widely adopted – engine in the Rails community. You may notice that an id column was added for ActiveRecord, and that I only used 4 columns from the initial dataset for the sake of simplicity. To keep things simple, I used the default limits which are provided by ActiveRecord (e.g., varchar(255)). This is a simple, micro-benchmark which doesn’t need to prove much. It simply shows that with this data, under these conditions, DB2 is much faster than MySQL.

    The reasons for choosing DB2 over MySQL are many, and they are not all related to performance. And even when considering performance alone, this post didn’t set out to prove that DB2 is faster than MySQL in general. There are industry benchmarks that cost millions of dollars to run that take into account all the possible moving parts. You can’t expect that from a simple post. I observed some data under particular circumstances and reported the outcome. While it is undisputed that DB2’s counting abilities are still faster than MySQL with the InnoDB engine, this post only claims that with this data, for the query used, DB2 was faster than MySQL. And I think that’s fair.

    The “100 times repetition” freaked a few people out. I understand that. Please bear in mind though that I went for 100 iterations only after having seen somewhat comparable times obtained for the first execution of each query (i.e., n=1).

  19. Lonny Eachus
    June 6, 2009

    Yes, but you still didn’t address the point. The way your queries were performed acted precisely on the LEAST efficient operation you possibly could have chosen for MySQL. Regardless of whether it was done on purpose, it does skew the results.

    Were you to add indexes, and count on indexed fields rather than (*), the performance for InnoDB in MySQL goes up dramatically. In one case, where I had a huge database (millions of records), the difference was between a full 10 minutes using count (*) and only a couple of seconds using count (indexed column).

    Perhaps DB2 would outperform MySQL anyway, but, unintentional as it may be, this was still not a valid comparison. It is rather like racing a Ford sports car against a Chevy sports car, when the gas pedal on the Chevy has a block underneath that only lets the pedal go part way down.

  20. Leon Katsnelson
    June 6, 2009

    I am surprised and amazed how many people are saying that InnoDB is lousy at COUNT(*) so, Antonio, you should use MyISAM instead. Isn’t the point of measuring things to compare them and to identify strengths and weaknesses? So, if Antonio did a benchmark that found that MyISAM did not do subselects very well then people would cry wolf and say you should have done this with InnoDB?
    I think the benchmark is right on. If you are a Ruby on Rails programmer it is helpful to know how fast your object.count method will perform with MySQL vs. DB2. Majority of RoR programmers use MySQL and object.count is a common operation.

  21. Lonny Eachus
    June 6, 2009

    To be clear, what I am saying is: Yes, this was just a “micro-benchmark”, and one should not read too much into that. However, under the circumstances, this benchmark does in fact NOT show what it purports to show: that DB2 is GENERALLY faster than MySQL.

  22. Antonio Cangiano
    June 6, 2009

    Lonny wrote:
    Were you to add indexes, and count on indexed fields rather than (*), the performance for InnoDB in MySQL goes up dramatically.

    As I mentioned before, adding an index improves the results a lot for COUNT(indexed field) but it does so in a way that is proportionate to the results shown here. This means that DB2 still is several times faster than MySQL when counting records, regardless of whether an index has been defined on a column for both databases or not.

    It would be nice to do a follow up post which includes indexes and a few million records, but I suspect that people would still find other reasons to complain about the outcome. So I’m not so sure it is worth the effort.

    And just to clarify something once again, this post was not meant to demonstrate that DB2 is generally faster than MySQL, even if such is the case.

  23. Mark Callaghan
    June 6, 2009

    My initial reaction to this result is that it was a ‘bad benchmark’ because a significant difference was reported without explanation. And my first guess was that DB2 cached the table in the DBMS buffer cache and InnoDB did not because of different configurations.

    My reaction was probably wrong and Antonio may have found an interesting result and something we need to investigate for MySQL.

    Is DB2 that much more efficient than MySQL/InnoDB? Note that MyISAM is slower than InnoDB when there is anything in the WHERE clause that prevents it from using the codepath for fast count(*) queries.

    When I run ‘select count(*)’ on a table with a similar size and no indexes, the query takes ~0.4 seconds which is close to the result here. Maybe DB2 is that much faster on count queries and we need to fix MySQL.

    From oprofile, these are the top 4 functions:
    25.0744 rec_get_offsets_func
    19.3973 row_search_for_mysql
    11.1162 buf_page_optimistic_get_func
    10.1157 mtr_commit

  24. Neil Lalonde
    June 11, 2009

    I find it hard to believe that any Rails app would use MyISAM tables, unless the developers don’t plan to use transactions (scary!) AND they have no intention of using WHERE clauses. In which case, maybe they can replace both MySQL and DB2 with a flat file or two.

Leave a Reply

 

 

 

Back to top
mobile desktop