Do Androids Count Electric Sheep with DB2 or MySQL?

Counting rows is an ubiquitous operation on the web, so much so that it’s often overused. Regardless of misuse, there is no denying that the performance of counting operations has an impact on most applications. In this post I’ll discuss my findings about the performance of DB2 9.5 and MySQL 5.1 regarding counting records.

For those of you who are not into science fiction, let me clarify that the odd title of this post is a tongue-in-cheek reference to the great novel, Do Androids Dream of Electric Sheep?.

I connected to the database, created the table, imported the data and benchmarked counting operations using ActiveRecord in a standalone script. Here is the code I used:

#!/usr/bin/env ruby
require "rubygems"
require "active_record"
require 'benchmark'

ActiveRecord::Base.establish_connection(
  :adapter  => :mysql,
  :username => "myuser",
  :password => "mypass",
  :database => "mydb")

ActiveRecord::Schema.define do
  create_table :people, :force => true do |t|
    t.string :name, :null => false
    t.string :fbid, :null => false
    t.string :gender
    t.string :profession
  end
end

class Person < ActiveRecord::Base
end

# This can be sped up by performing an import instead
Person.transaction do
  File.open("person.tsv").each_line do |line|
    line = line.split(/\t/)
    p = Person.new
    p.name = line[0]
    p.fbid = line[1]
    p.gender = line[6]
    p.profession = line[17]
    p.save!
  end
end

n = 100
Benchmark.bm(26) do |x|
  x.report("Count all:") { n.times { Person.count } }
  x.report("Count profession:") { n.times { Person.count(:profession) } }
  
  x.report("Count females:") do
    n.times { Person.count(:conditions => "gender = 'Female'") }
  end

  x.report("Count males w/ profession:") do
    n.times { Person.count(:profession, :conditions => "gender = 'Male'") }
  end
end

Please note that importing records in a huge transaction containing hundreds of thousands of INSERT operations is far from the most efficient way to import. Massive imports of data using the load/import facilities provided by each database is the way to go (also see the ar-extensions plugin). The lengthy import wasn’t benchmarked here though, so it isn’t determinant for this article.

people.tsv is a 92.7 MB tab separated values file that contains 875,857 records from the Freebase project (in my file I removed the header line, leaving only records).

For those who are not familiar with ActiveRecord, the queries executed behind the scenes are (in order):

SELECT count(*) AS count_all FROM people

SELECT count(people.profession) AS count_profession FROM people

SELECT count(*) AS count_all FROM people WHERE (gender = 'Female')

SELECT count(people.profession) AS count_profession FROM people WHERE (gender = 'Male')

While the table definition (for MySQL) is:

CREATE TABLE `people` (
	`id` int(11) DEFAULT NULL auto_increment PRIMARY KEY,
	`name` varchar(255) NOT NULL,
	`fbid` varchar(255) NOT NULL,
	`gender` varchar(255),
	`profession` varchar(255)
) ENGINE=InnoDB

As easily verified by enabling logging with:

ActiveRecord::Base.logger = Logger.new(STDOUT)

Without much further ado, here are the times I obtained on my last generation MacBook Pro 2.66 GHz with 4 GB DDR3 RAM, and 320 GB @ 7200 rpm hard disk, running Mac OS X Leopard:


MySQL:

  Count all:                  42.467522
  Count profession:           52.130935
  Count females:              54.575469
  Count males w/ profession:  64.046631

DB2:

  Count all:                  5.818485
  Count profession:           7.714391
  Count females:              8.556377
  Count males w/ profession:  9.656739

Or in graph form:

COUNT performance graph

That’s an impressive difference. To be exact, in this example DB2 was between 6 and 7 times faster than MySQL. In the case of COUNT(*), DB2 counted almost a million records in 58 milliseconds, or in about the blink of an eye according to Wolfram Alpha.

For those who are skeptical, please note that DB2 was not manually fine-tuned in any way. The client codepage was set to 1252 to allow Greek letters, and the log size was increased to permit such a huge transaction during the import. That’s it, no optimizations were attempted. This is DB2 Express-C out of the box. It looks like smart androids count electric sheep with DB2 after all. 😛

The advantages of DB2 over MySQL when dealing with a massive volume of traffic are well known (and not limited to performance either), but DB2 can dramatically improve performance even for your average web application. And DB2 9.7, which will be released this month, increases the performance and the ability to self-tune itself to the available resources and required workload even further. If you’d like to try DB2 Express-C for yourself, you can download it here. It doesn’t cost you a dime to obtain and can be used for development, testing and production absolutely free of charge.

Get more stuff like this

Subscribe to my mailing list to receive similar updates about programming.

Thank you for subscribing. Please check your email to confirm your subscription.

Something went wrong.

24 Comments

  1. david June 5, 2009
  2. Tom K June 5, 2009
  3. john June 5, 2009
  4. Antonio Cangiano June 5, 2009
  5. Chris June 5, 2009
  6. Marcus Brito June 5, 2009
  7. Oskar Pearson June 5, 2009
  8. Jan June 5, 2009
  9. Paul June 5, 2009
  10. Antonio Cangiano June 5, 2009
  11. Robert Young June 5, 2009
  12. Jan June 5, 2009
  13. Giuseppe Maxia June 6, 2009
  14. Alex June 6, 2009
  15. Justin Swanhart June 6, 2009
  16. Justin Swanhart June 6, 2009
  17. Justin Swanhart June 6, 2009
  18. Antonio Cangiano June 6, 2009
  19. Lonny Eachus June 6, 2009
  20. Leon Katsnelson June 6, 2009
  21. Lonny Eachus June 6, 2009
  22. Antonio Cangiano June 6, 2009
  23. Mark Callaghan June 6, 2009
  24. Neil Lalonde June 11, 2009

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.