By Antonio Cangiano, Software Engineer & Technical Evangelist at IBM
Currently Browsing: DB2

Enabling LIMIT and OFFSET in DB2 9.7.2

In order to enable the new LIMIT and OFFSET syntax in DB2 9.7.2, you’ll need to execute the following commands:

$ db2set DB2_COMPATIBILITY_VECTOR=MYS
$ db2stop
$ db2start

With this enabled, you’ll be able to execute queries such as:

SELECT * FROM users LIMIT 5
SELECT * FROM users LIMIT 5 OFFSET 10

IBM releases DB2 Express-C 9.7.2

Download DB2 Express-C for freeI’m glad to report that we’ve just released a new version of DB2, namely DB2 Express-C 9.7.2. This version — of what is arguably the most advanced hybrid database system in the world — is (as usual) absolutely free to develop with, deploy in production, and even redistribute.

Unlike SQL Server Express which is limited to 10GB of data, there are no data limits with DB2 Express-C. This means no database size limits or any restrictions on the number of databases, connections, users, etc…

Oracle XE is even more restrictive than SQL Server Express and seriously insecure, having not been updated for a long time despite the presence of major security vulnerabilities. Unlike Oracle XE, DB2 Express-C uses the same core code as its commercial editions and is always kept up-to-date.

In case you are wondering, there’s no catch. We generate our revenue from affordable, but entirely optional, dedicated 24/7 technical support (a popular business model in the open source community). If you haven’t tried DB2 Express-C before or you currently have an older version installed, consider getting the latest one here and taking it for a spin.

What’s new

Some of the improvements and features of this FixPack 2 edition include:

  • Support for the MySQL syntax for LIMIT and OFFSET. This greatly simplifies the process of getting existing applications to work with DB2.
  • The IBM Data Server Provider for .NET has been enhanced in several ways that may help you with application performance, data server compatibility, and simplifying application development.
  • Both Rails 3 and Django 1.2.1 are supported from out of the gate thanks to our IBM provided, open source drivers and adapters.
  • Enhancements made to user-defined functions (UDFs), which now support both default values for parameters and named arguments.
  • The RESTORE command can now be used with the TRANSPORT option to copy table spaces and SQL schemas as a set from a database backup image to another active database.
  • Auditing improvements now allow replay of past database activities.

FREE DOWNLOAD


IBM_DB 2.5.0 with support for Rails 3 is out

This is a tiny post to let you know that IBM just released version 2.5.0 of the IBM_DB gem with support for the upcoming Rails 3. That’s what I call both proactive and a true testament of IBM’s commitment towards DB2 on Rails.

Aside from providing a working adapter and driver before the new framework release is even out, this release has a few improvements and fixes, such as getting rid of a minor bug related to prepared statements and has_many associations.

Finally, ibm_db 2.5 improves upon Unicode integration with support for any encoding format that’s permitted by Ruby 1.9.


Rails, DB2 and the Enterprise

The Enterprise :)

Recently Matt Aimonetti wrote an insightful article about Rails and the Enterprise. In it he identifies five core Enterprise application needs:

  • Reliability
  • Support
  • Performance
  • Advantage over the competition
  • Integration and transition path

Matt then proceeds to illustrate how Rails does a good job in regards to most of these points, despite a few existing challenges.

Among these challenges, I can clearly see the following:

  • There isn’t a 1-800-RAILS number; the community may be great, but there are not exactly yearly contracts in place. Furthermore, the author of the Ruby driver for, say, Oracle doesn’t owe you a thing. He may or may not be there for you when you need something to be fixed quickly.

  • XML support is less than ideal (but it is improving).

  • Integration with the Enterprise world is not easy, due in part to less than stellar SOAP support (but that is also improving).

  • If you’re taking advantage of ActiveRecord and an Enterprise database like Oracle, your DBA isn’t likely to be happy that you aren’t using prepared statements.

You may think these are small points, and in the startup world they generally are. However, in the Enterprise world they do make the difference between adoption and niche.

One thing that Matt forgot to mention is DB2, which should be the poster child for how Rails can be Enterprise ready. And as a bonus, you get to throw around IBM’s name (which is synonymous with Enterprise) because Rails is both supported by, and used within, IBM.

Let’s address each point above with DB2 on Rails in mind.

IBM is the only database vendor to provide a Ruby driver and ActiveRecord/Rails adapter for its databases. This means that you have a team that’s accountable when things don’t work as they’re supposed to. This team is accountable, regardless of whether you have a contract with IBM or not; it’s their job, not a hobby. This involvement with Rails dates back to 2006, with continuing releases and improvements ever since. Our IBM’s optional yearly 24/7 support contracts (e.g., for less than $3000 a year per server with DB2 Express-C) include support for DB2 on Rails as well.

DB2 supports native storage and querying of XML documents and data (plus it’s fast). This technology is known as pureXML.

DB2 can both consume and serve SOAP web services. This lets DB2 do the integration for you.

DB2 on Rails supports parameterized queries. (While on the subject of queries, if you are using JRuby you can take advantage of pureQuery, which is an IBM created Enterprise solution that’s aimed at making your queries fast, reliable, manageable and easy to debug.)

If you are trying to introduce Rails into your Enterprise job, chances are that DB2 will already be present within the company infrastructure. If not, you can use DB2 Express-C which is entirely free — thus making it easier to introduce than an expensive solution.

IBM is one of the most trusted brands on the market today, as it has been for decades now. Banks and Enterprise companies the world over trust DB2 with their most critical data. One way for the Rails community to increase the adoption of Rails in the Enterprise, is to acknowledge and embrace the great pair that is Rails and DB2.


DB2 support for Django 1.2 is here

The latest release of the IBM Adapter for Django now supports Django 1.2. Aside from enabling you to use the most recent version of Django, this release adds a few new goodies into the mix, that I’m sure many will appreciate.

For example, IBM’s adapter (through the underlying DBI wrapper) now uses persistent connections, which are especially helpful when dealing with Django – as it lacks connection pooling. (Of course DB2 also has the Connection Concentrator to aid in reducing the usage of server resources and improving scalability.)

Furthermore, the adapter adds support for the DECIMAL datatype, a necessary feature when dealing with money and currencies. Various enhancements and bug fixes were included too; check them out on Google Groups.

As a reminder, DB2 Express-C is an absolutely free of charge version of DB2 and it’s production ready (not a toy version). You can download it from here. Take it for a spin, experiment – chances are you’ll like it. If you need a guide to getting started, be sure to check out this free e-book by my colleagues Raul, Ian, and Rav.


Heads up: IBM is looking for top notch student hackers

As a thank you for following my blog, I’d like to introduce you to what I think is a great opportunity for the right students. My team is looking for two bright students for a 16 month, full-time internship opportunity with IBM.

Aside from being a bright and ambitious student, you should currently be working towards a Computer Science degree at any recognized University in the world (and have completed at least 2 years of your degree). You read that right, this opportunity is not limited to Canadian students. We are looking for the most talented students in the world, so as long as you’re ready to relocate to Toronto for 16 months, and are able to start the internship program in May 2010, we have the means to get you here. The location for this internship will be the Toronto Software Lab, in Markham, Ontario.

In particular, we are looking for students who are highly skilled in the area of web technologies and who are enthusiastic about cloud computing. This is NOT your run-of-the-mill, boring student job. You’ll be working with cutting edge technology on high visibility, fast paced projects that involve web technologies, cloud computing, Xp/Agile methodologies, and other related fields. You’ll be granted ample freedom to express yourself and your programming creativity. Should you be accepted for this well paid (in my opinion) internship program, you will report directly to my manager (a cool, down to earth guy) and I’ll be your mentor.

If you fit the criteria described above, please get in touch (cangiano at ca dot ibm dot com) and tell me a bit about yourself. You don’t need to attach your résumé at this stage, but if you drop me a line, I’ll be able to provide you with info on how to apply through the official IBM channels. We’ll take it from there.

Please note that I do NOT receive any form of compensation for referring you. I’m just hoping to provide a tiny subset of my readership with what could be a really great opportunity.


DB2 support for Ruby/Rails turns 2.0

The API development team just released a major version of the ibm_db gem. Detailed installation instructions are available on RubyForge (PDF). Among several improvements, there are three particularly newsworthy features:

  1. Support for Ruby 1.9;
  2. Support for mingw32, used by the latest version of the Ruby installer for Windows;;
  3. Performance and security enhancements through “automatic” parameterized queries.

As we approach the release of Rails 3, supporting Ruby 1.9 is becoming more of a necessity. Likewise, the so called “One-Click installer” on Windows has been replaced by a current project that uses mingw32, which offers a much needed performance boost on Windows. Having a mingw32 compatible gem is starting to become a requirement for many of our Windows users.

Finally, DB2 is now the only database that supports prepared statements in ActiveRecord (without changing any of the application’s code). This has important performance benefits, as I explained in my article Improve the security and performance of DB2 Ruby on Rails applications using parameterized queries, which was published today by developerWorks.


Grab DB2 Express-C 9.7.1 now and join the DB2Night Show

IBM just released DB2 Express-C 9.7.1 (aka DB2 Express-C 9.7 FixPack 1). This version includes a long list of enhancements. Don’t forget to download and install it; as per its previous versions, it’s entirely free of charge.

You can grab it directly from here:

The DB2Night Show

While we are on the subject, I want to mention a great new independent initiative called the DB2Night Show. In the words of its organizers:

The DB2Night Show has the simple mission of building a fun, interactive community around DB2 LUW. Each episode will include fun stuff, education, tips, and special guests. Guests will range from industry experts to “Joe the DBA”.

The first six episodes have been wonderful, and the seventh one, which is about a comparison of DB2 Express-C with other free versions like SQL Server, as well as Cloud Computing, will air tomorrow.

You can reserve your free virtual seat for Friday, December 4, 2009 at 9am PST/ 10 am CST / 11 am EST, here. Even if you can’t make it, register anyway, so you’ll have access to replays. Every participant who attends will receive a $50 voucher to use for the Amazon EC2 cloud. Not only that, but one lucky, randomly selected attendee will also receive an Amazon.com gift certificate. With the holiday season coming up, that could come in handy. :)


What is DB2 pureScale?

There are two main types of scalability: vertical and horizontal. Vertical scalability consists of potentiating the hardware specs of a given server. This is typically done by increasing the number (and to a minor extent, speed) of the processors, adding more RAM, and so on. Commodity hardware tends to impose a strong limit on the resources that can be augmented. There is only so much that can be stuffed into your typical x64_86 Dell server, even replacing the entire machine with the top of the line model (having the same architecture). For this very reason, many customers opt to purchase more expensive hardware that has the advantage of facilitating vertical scalability, by allowing much more headroom for additional resources. This is why Unix-based solutions like IBM System p + AIX and Sun’s Solaris + SPARC are common choices in demanding environments.

For all their merits, even premium hardware solutions can only do so much in terms of vertical scalability. They may postpone the problem, by allowing a large number of processors and a huge amount of RAM to be installed, but they are still not limitless. And that’s when it becomes cheaper and much more feasible to attempt horizontal scalability, or the process of adding extra servers that share data and the overall system load.

Commercial databases, particularly DB2, excel at vertical scalability, which incidentally is one of the weakest aspects of open source RDBMS like MySQL. When it comes to horizontal scalability though, even commercial databases tend do struggle a little. Scalability is achieved, but a lot of work is required to configure complex setups involving masters, slaves, replication, failover servers, and so on. Until now.

DB2 pureScale

IBM has just introduced a new technology for DB2 called pureScale, which accomplishes three incredible feats:

  1. Horizontal scalability: Virtually unlimited capacity by simply adding new nodes to the cluster.
  2. Application transparency: No code changes needed.
  3. Continuos data availability: regardless of which server fails, the system will continue to operate albeit with a proportionally reduced throughput.

DB2 pureScale applies to DB2 running on Unix (IBM System p running AIX only, for the moment), the same principles and rock solid technology employed by DB2 for z/OS Data Sharing and SYSPLEX. This technology happens to be employed by some of the largest companies in the world and it’s considered the golden standard when it comes to easily create OLTP database clusters, providing virtually unlimited scalability capabilities.

You can read more about it here and here. If you are going to IOD this year, don’t forget to attend this overview by one the architects of DB2 pureScale. And even if you can’t head over to Las Vegas next week, you can still attend online our Chat with the Lab today at 11:30am EST. Database scalability is a hot topic and this is going to be one of the most popular chats with the lab ever, so register for free now.


Installing and configuring DB2 Express-C 9.7 on Ubuntu 9.04 Server

DB2 Express-C 9.7 can easily be installed on Ubuntu 9.04 Desktop edition by simply issuing sudo ./db2setup. This will open up a launchpad and you’ll be able to install the product through a wizard. But what about setting up DB2 Express-C 9.7 on Ubuntu 9.04 Server edition? When you are ssh-ing into your VPS or dedicated sever, there are no GUIs to help you out.

IBM provides all the information on how to proceed through a series of detailed documents, which explain why each command is required and what your options are. But if you are looking for a single document that is slightly more concise in nature, you can read this excellent step by step guide that was published today on the blog bits.of.info.


Improve the speed and security of your SQL queries

An easy way to improve the performance and security of SQL queries is to replace literals with parameters. By replacing literal values with parameters, advanced relational databases will be able to compile your queries and have their execution plans cached. This saves time and precious resources when the same query (minus the actual values) is executed over and over.

Consider the following series of queries:

SELECT * FROM users WHERE karma BETWEEN 100 AND 499;
SELECT * FROM users WHERE karma BETWEEN 500 AND 999;
SELECT * FROM users WHERE karma BETWEEN 1000 AND 1999;
SELECT * FROM users WHERE karma BETWEEN 2000 AND 4999;
SELECT * FROM users WHERE karma BETWEEN 5000 AND 9999;
SELECT * FROM users WHERE karma BETWEEN 10000 AND 50000;

These each represent the same query and can be transformed into a single parameterized query:

SELECT * FROM users WHERE karma BETWEEN ? AND ?;

Trying to use clever tricks with quotes in order to inject arbitrary SQL code becomes futile. Parameters are considered values, and have no effect on the structure of the query itself.

Parameterized queries are therefore efficient and go a long way towards preventing SQL injection attacks in your applications. They have virtually no downside.

Newbie developers often ignore the existence of this feature and end up irritating seasoned DBAs who have to deal with the consequences of their incompetence. Leon Katsnelson argues that this is such an important matter, that every DBA should forward this Computerworld article to their developers. I tend to agree with how important of an issue that is.

That article provides the following example in Java:

String lastName = req.getParameter("lastName");
String query = "select * from customers where last_name = ?"
PreparedStatement pstmt = connection.prepareStatement(query);
pstmt.setString(1, lastName);
try { ResultSet results = pstmt.execute(); }

Here I’ll show you an example of how to work with parameterized queries from Ruby and Python. I’ll use the Ruby and Python drivers for DB2.

Ruby first:

require 'ibm_db'

conn = IBM_DB.connect("mydb", "db2inst1", "mypassword")

query = "SELECT * FROM users WHERE karma BETWEEN ? AND ?"
pstmt = IBM_DB.prepare(conn, query)

values = [500, 999]
IBM_DB.execute(pstmt, values)

while row = IBM_DB.fetch_array(pstmt)
  puts "#{row[0]}:#{row[1]}"
end

We load the driver (use mswin32/ibm_db on Windows, and ibm_db.bundle on Mac), create a prepared statement, and then bind the two parameter values to it through the execute method. We then fetch the resultset one row at a time and print the value of the first two fields for each record. For fine-tuned control we could have used the IBM_DB::bind_param method.

The Python version is very similar:

import ibm_db

conn = ibm_db.connect("mydb", "db2inst1", "mypassword")

query = "SELECT * FROM users WHERE karma BETWEEN ? AND ?"
pstmt = ibm_db.prepare(conn, query)

values = (500, 999)
ibm_db.execute(pstmt, values)

tuple = ibm_db.fetch_tuple(pstmt)
while tuple:
    print tuple[0] + ":" + tuple[1]
    tuple = ibm_db.fetch_tuple(pstmt)

As you can see, working with parameterized queries is not any harder than dynamically generating SQL queries. Yet the benefits of doing so are huge.

Unfortunately, despite being a very sound choice to base an Object-Relational Mapper (ORM) on, ActiveRecord does not use parameterized queries. Even when it looks like you are passing parameters to a given method, these are actually used to dynamically form an SQL query. Of course you are still free to use parameterized queries in your Rails applications by employing the driver directly. But I really think this is something ActiveRecord should be built upon.

Luckily for Django developers, Django’s ORM uses parameterized queries, thus improving both performance and security with a single design choice. In the Python world you couldn’t get away with ignoring parameterized queries.

For those of you using Rails, all is not lost. DB2 Express-C 9.7 has a killer feature known as the Statement Concentrator, which caches similar queries allowing them to use a shared access plan. It’s not as efficient as using prepared statements in your code, but it’s the best you can do when, as in the case of ActiveRecord, you can’t use parameterized queries directly. Leon’s article explains in greater detail how this feature actually works.


Enabling support for DB2 and Python/Django/SQLAlchemy on Mac OS X Snow Leopard

This is the Python version of a post I made about Ruby a few days ago.

Now that Mac OS X 10.6 is out, it’s time to leave the world of 32 bit computing behind. The pre-installed Python interpreter will run in 64 bit mode by default, so you may need to pay attention when installing some C-based eggs.

Assuming you have DB2 Express-C installed already, the ibm_db Python egg for DB2 can easily be installed by following these simple steps:

$ sudo -s
$ export IBM_DB_LIB=/Users/<username>/sqllib/lib64
$ export IBM_DB_DIR=/Users/<username>/sqllib
$ export ARCHFLAGS="-arch x86_64"
$ easy_install ibm_db

This will install the ibm_db C driver, and the ibm_db_dbi Python module that complies to the DB-API 2.0 specification.

You can verify that the installation was successful my running the following:

$ python
>>> import ibm_db
>>>

Now, for the Django adapter, install Django first (if you haven’t done so already):

$ sudo easy_install django

The Django adapter can then be installed as follows:

$ sudo easy_install ibm_db_django

Finally, if have installed SQLAlchemy and wish to install the DB2 adapter for it, run:

$ sudo easy_install ibm_db_sa

Please let me know if you encounter any issues, I’d be glad to help you.


Installing the Ruby driver for DB2 on Mac OS X Snow Leopard

Now that Mac OS X 10.6 is out, it’s time to leave the world of 32 bit computing behind. The pre-installed Ruby interpreter will run in 64 bit mode by default, so you may need to pay attention when installing some C-based gems. The ibm_db Ruby gem for DB2 can easily be installed or updated to the latest available version by following these simple steps:

$ sudo -s
$ export IBM_DB_LIB=/Users/<username>/sqllib/lib64
$ export IBM_DB_INCLUDE=/Users/<username>/sqllib/include
$ export ARCHFLAGS="-arch x86_64"
$ gem install ibm_db

You can verify that the installation was successful my running the following:

$ irb
>> require 'ibm_db.bundle'
=> true

Please let me know if you encounter any issues, I’d be glad to help you.


DB2 on Mac OS X Snow Leopard

Earlier today I headed over to the local Apple Store to purchase a copy of Snow Leopard, the newest version of Apple’s operating system. There was a decent line up, as I expected. Not the kind of line up you’d encounter with the launch of a new iPhone, but it was fairly busy for a Friday morning. When I arrived, I took my place at the end of the queue where rumors were swirling around about the store having sold out of single copies of Snow Leopard in its first hour. Luckily, that was just a rumor as I suspected, and they still had a few copies left. So I got my copy for $39.99 (CND) including taxes and left.

The installation was pretty much automatic and took roughly an hour. No versions to select from, no serial numbers to insert, no online activations, and a price that “keeps honest people honest”. Yes, it’s just an update, but $25 (US) for an improved operating system is definitely a fair price. Microsoft are you listening? My first impression is that Snow Leopard is a very polished version of Leopard and it’s darn fast. Well done Apple.

DB2 users may be wondering if it is safe to install this version of Mac OS X 10.6 or if their existing installation will go awry. I’m happy to report that DB2 Express-C 9.5 FixPack 2 for Mac works fine on Snow Leopard. Both pre-existing installations and brand new installations of DB2 work properly, I can attest to that. So install away my friends!

Installing DB2 on Snow Leopard


The DB2 adapter now supports Django 1.1

I’m glad to announce that the API team has just released version 1.0.2 of the adapter for Django. And on my birthday to boot, what a nice present. This version extends its support to the recently released Django 1.1, as well as incorporating the feedback that was received earlier on. :) (For installation instructions, take a look at the README file.)

IBM confirms its commitment to support Python and Django, and gives Django well deserved credentials in environments where having IBM’s support counts. Django is becoming an increasingly mature web framework with the potential to do well within the Enterprise world. Having support for DB2 will surely help.

The next step will be working with the Django team to bake DB2 support directly into Django’s releases. The code for the adapter is released under a liberal OSI-compliant license that is compatible with Django’s own BSD, and the API team is more than willing to work on the development and support of the adapter should it become part of Django. We love Django and ponies. Let’s make this happen, guys.


Reasons to switch to DB2 9.7

IBM recently put out a video with a few quotes from customers who’ve had a chance to try DB2 9.7 in production. At times the editing is a bit cheesy, but what our customers are saying is factual. DB2 9.7 is quite a breakthrough and it can save millions of dollars for large companies. If your needs are more modest, like a demanding Web application or a point of sale server, DB2 Express-C can still give you most of the same benefits at zero the cost. Wanna take it for a spin? You can download it here or sign up at RightScale.com to try it on the Cloud.


DB2 Express-C 9.7 and the Django adapter released

This is a great day for those of us who love DB2, as DB2 Express-C 9.7 has just been released. As mentioned before, this is the best DB2 ever, and an extremely important release.

To learn more about what’s new in this release, please check out the recording of our latest webinar:


If you run Linux, Unix or Windows, download it while it’s hot.

DB2 9.7 on the Cloud

Another great aspect of this release is that for the first time ever, DB2 has been released both as a product and as a deployment on the Cloud. If you pop over to RightScale, you can get a trial account for free and should see DB2 Express-C 9.7 on both CentOS and Ubuntu within the partner catalog. RightScale has been an amazing partner and they really do wonders to simplify Cloud Computing. In ten minutes time you can be up and running on the Cloud, thanks to the templates provided.

DB2 on the Cloud

DB2 support for Django

But the good times don’t stop there, we are also announcing the first official release of the Django adapter for DB2. It sounded crazy when I first proposed the idea within IBM back in 2006, but now it’s a reality.

You can download the .tar.gz archive from the Google Code homepage for the project, or simply by clicking here. This version fully supports the Django 1.0.2 API. For instructions on how to install it, please read the Getting started with the IBM DB Django adapter guide. The current version supports DB2 for Linux, Unix, Windows and MAC OS X, version 8.2 or higher (9.5 FP2 or higher for MAC OS X). In the future, IBM Cloudscape, Apache Derby, Informix (IDS) and both System i & z/OS will be supported.

ibm_db gem updated to 1.1

I’ll conclude this DB2-centric post with a smaller, but still interesting announcement. The ibm_db gem has been updated to version 1.1. This release includes support for ActiveRecord’s QueryCache mechanism, enhanced support for BigInt (and BigSerial), support for rename_column (requires DB2 9.7), parametrization of the timestamp datatype (requires DB2 9.7), and a few fixes and performance enhancements as well. It is recommended that you upgrade to this version.


Do Androids Count Electric Sheep with DB2 or MySQL?

Counting rows is an ubiquitous operation on the web, so much so that it’s often overused. Regardless of misuse, there is no denying that the performance of counting operations has an impact on most applications. In this post I’ll discuss my findings about the performance of DB2 9.5 and MySQL 5.1 regarding counting records.

For those of you who are not into science fiction, let me clarify that the odd title of this post is a tongue-in-cheek reference to the great novel, Do Androids Dream of Electric Sheep?.

I connected to the database, created the table, imported the data and benchmarked counting operations using ActiveRecord in a standalone script. Here is the code I used:

#!/usr/bin/env ruby
require "rubygems"
require "active_record"
require 'benchmark'

ActiveRecord::Base.establish_connection(
  :adapter  => :mysql,
  :username => "myuser",
  :password => "mypass",
  :database => "mydb")

ActiveRecord::Schema.define do
  create_table :people, :force => true do |t|
    t.string :name, :null => false
    t.string :fbid, :null => false
    t.string :gender
    t.string :profession
  end
end

class Person < ActiveRecord::Base
end

# This can be sped up by performing an import instead
Person.transaction do
  File.open("person.tsv").each_line do |line|
    line = line.split(/\t/)
    p = Person.new
    p.name = line[0]
    p.fbid = line[1]
    p.gender = line[6]
    p.profession = line[17]
    p.save!
  end
end

n = 100
Benchmark.bm(26) do |x|
  x.report("Count all:") { n.times { Person.count } }
  x.report("Count profession:") { n.times { Person.count(:profession) } }

  x.report("Count females:") do
    n.times { Person.count(:conditions => "gender = 'Female'") }
  end

  x.report("Count males w/ profession:") do
    n.times { Person.count(:profession, :conditions => "gender = 'Male'") }
  end
end

Please note that importing records in a huge transaction containing hundreds of thousands of INSERT operations is far from the most efficient way to import. Massive imports of data using the load/import facilities provided by each database is the way to go (also see the ar-extensions plugin). The lengthy import wasn’t benchmarked here though, so it isn’t determinant for this article.

people.tsv is a 92.7 MB tab separated values file that contains 875,857 records from the Freebase project (in my file I removed the header line, leaving only records).

For those who are not familiar with ActiveRecord, the queries executed behind the scenes are (in order):

SELECT count(*) AS count_all FROM people

SELECT count(people.profession) AS count_profession FROM people

SELECT count(*) AS count_all FROM people WHERE (gender = 'Female')

SELECT count(people.profession) AS count_profession FROM people WHERE (gender = 'Male')

While the table definition (for MySQL) is:

CREATE TABLE `people` (
	`id` int(11) DEFAULT NULL auto_increment PRIMARY KEY,
	`name` varchar(255) NOT NULL,
	`fbid` varchar(255) NOT NULL,
	`gender` varchar(255),
	`profession` varchar(255)
) ENGINE=InnoDB

As easily verified by enabling logging with:

ActiveRecord::Base.logger = Logger.new(STDOUT)

Without much further ado, here are the times I obtained on my last generation MacBook Pro 2.66 GHz with 4 GB DDR3 RAM, and 320 GB @ 7200 rpm hard disk, running Mac OS X Leopard:

MySQL:

  Count all:                  42.467522
  Count profession:           52.130935
  Count females:              54.575469
  Count males w/ profession:  64.046631

DB2:

  Count all:                  5.818485
  Count profession:           7.714391
  Count females:              8.556377
  Count males w/ profession:  9.656739

Or in graph form:

COUNT performance graph

That’s an impressive difference. To be exact, in this example DB2 was between 6 and 7 times faster than MySQL. In the case of COUNT(*), DB2 counted almost a million records in 58 milliseconds, or in about the blink of an eye according to Wolfram Alpha.

For those who are skeptical, please note that DB2 was not manually fine-tuned in any way. The client codepage was set to 1252 to allow Greek letters, and the log size was increased to permit such a huge transaction during the import. That’s it, no optimizations were attempted. This is DB2 Express-C out of the box. It looks like smart androids count electric sheep with DB2 after all. :-P

The advantages of DB2 over MySQL when dealing with a massive volume of traffic are well known (and not limited to performance either), but DB2 can dramatically improve performance even for your average web application. And DB2 9.7, which will be released this month, increases the performance and the ability to self-tune itself to the available resources and required workload even further. If you’d like to try DB2 Express-C for yourself, you can download it here. It doesn’t cost you a dime to obtain and can be used for development, testing and production absolutely free of charge.


TextMate bundle for DB2

Before leaving on a recent business trip to Italy I begun working on a TextMate bundle for DB2. Here I’ll introduce it in the hope that it will interest some TextMate and DB2 users.

Installation

There are two simple prerequisites for using this bundle: 1) Install DB2 as a regular user (not root); 2) Source the db2profile by, for example, adding . ~/sqllib/db2profile to your shell profile (e.g., in ~/.profile). Both of these will ensure that your user is able to issue DB2 commands.

To install the bundle, you can run the following (assuming you have git installed):

sudo mkdir -p /Library/Application\ Support/TextMate/Bundles
cd /Library/Application\ Support/TextMate/Bundles
git clone git://github.com/acangiano/db2-textmate-bundle.git "DB2.tmbundle"

Alternatively, you can download this file, unzip it and then double click on the DB2.tmbundle that was extracted. Of course, the previous method has the advantage of being able to easily update the bundle through git pull.

If TextMate is running while you executed the previous step, you may want to also execute the following line:

osascript -e 'tell app "TextMate" to reload bundles'

This is equivalent to selecting Bundles ? Bundle Editor ? Reload Bundles from within TextMate.

Using the bundle

If you followed the instructions above, at this point you should see a DB2 menu under Bundles:

DB2 Bundle for Textmate

TextMate is a text editor, and as such it expects you to have an open document, in order to execute commands from this bundle. The good news is that an empty, untitled, open by default window will suffice.

As you can see from the image above, you can start the DB2 server, stop it, run an arbitrary query or DB2 command and open up the Information Center for DB2 9.5 in your browser. Below the separating line there are also two submenus, Database and Tables.

Let’s consider Database first:

The Database submenu

Within the Database menu we can connect to a database (this step is required before being able to query a database), disconnect, create a database, drop it, and create the SAMPLE database that ships with DB2 so you can get some practice.

Likewise, while it’s at a very early stage of development, we have the Tables submenu:

The Tables submenu

So far we can list the tables within the database we are currently connected to, and drop an existing table.

You can use the Shift-Command-D combination to bring up a convenient contextual menu of the most common tasks, while you are within a document (unless you are editing an HTML file, in which case that combination brings up Safari):

DB2's menu

The result of each command executed is flashed to the user in the form of an informative tooltip:

The output of DB2START as a tooltip

Conclusion

The aim of this bundle is to provide a more convenient way to work with DB2 without having to switch between your coding editor and the command line. It’s admittedly just a “seed”, a very early effort, but I sincerely hope that even at this stage it can be useful to some people. Being an open source project that’s released under the MIT license, you are free to contribute to it and more than welcome to fork it on Github as well.


Screencast on how to install DB2 Express-C on Mac OS X

Over the weekend I recorded a screencast on how to install DB2 on Mac OS X. You can watch it below:


Installing DB2 on Mac OS X from Antonio Cangiano on Vimeo.


The blog post I reference within the video can be found here. Yes, that’s an Italian accent. I hope you don’t mind it. ;-)


« Previous Entries

Copyright © 2005-2010 Antonio Cangiano. All rights reserved.