Meditations on programming, startups, and technology
New Relic

Speeding up queries by a factor of 100 or more with DB2 Text Search

During a recent Rails project at IBM we had to deal with a large table consisting of customers. The table is made up of legacy enterprise data, and contains close to a million records.

Among many other fields, the table Customers includes a column name defined as VARCHAR. name is used to store company names.

One of the requirements for the project was to implement a Google Suggest-like feature. That is, when the user enters a few characters for the customer’s name, they should be prompted, via AJAX, with a list of possible suggestions containing that string (or at least beginning with that string.)

The naive approach would be to use a simple LIKE predicate, to generate queries such as:

SELECT name
FROM   Customers
WHERE  name LIKE '%micro%'
ORDER  BY name

One of the main problems with this approach is performance. Using LIKE this way doesn’t allow us to take advantage of indexes that are defined on the column name.

Unsurprisingly, on a modest server, such a query takes about 25 seconds. A UI that isn’t able to respond for 25 seconds will feel like an eternity for the end user, of course, so a simple query like this obviously cannot be seen as a viable solution.

Enter DB2 Text Search. You may be familiar with other full text search engines such as Ferret, Lucene, or Sphinx. DB2 V9.7.4 ships with an extremely powerful full text search engine that has plenty of useful features, including excellent integration with native XML columns.

Using the free edition of DB2 (DB2 Express-C) which includes Text Search, we were able to implement the autocomplete functionality we were after in a heartbeat, thanks to a query like the following:

SELECT name
FROM   Customers
WHERE  CONTAINS(name, 'micro') = 1
ORDER  BY name

This query was executed in mere fractions of a second for most searches, and behaved exactly as needed. In fact, not only did we match strings that began with the searched token, but also ones that contained it elsewhere (e.g., micro would match both Microsoft and Sun Microsystems.) The results where ordered alphabetically, but could have easily been ordered by relevance via the SCORE function, also available via DB2 Text Search.

As usual, it’s a matter of using the right tool for the right job, and DB2 Text Search was created exactly for these kind of scenarios.

Let’s briefly look at how you can also go about setting up and playing with it on Linux.

Installing DB2 Text Search

To start with, you’ll need to have a 64bit Linux distro, and then follow these simple steps.

Download DB2 Express-C. Make sure you grab the db2exc_974_LNX_x86_64.tar.gz file and not the Light edition.

Next, install DB2 by following these steps (the Installing DB2 section in particular.) Make sure you select a custom installation, and that you select everything including DB2 Text Search (which is not checked by default).

After you’ve started DB2 with db2start via the instance user (e.g., db2inst1,) launch the text search server by running:

db2ts start for text

To enable text search for an existing database, run:

db2ts enable database for text connect to mydb

Next, you’ll need to create a text search index and populate it. For example:

db2ts "create index customer_name for text on customers(name) connect to mydb"
db2ts "update index customer_name for text connect to mydb"

That’s it. From now on you’ll be able to use functions such as CONTAINS to quickly search for your data.

Such a function can also be easily invoked in Rails:

Customer.where("CONTAINS(name, ?) = 1", name)

Or wrapping it a little in a model:

class Customer < ActiveRecord::Base
  # ... 
  def self.search(name, max_res = 10)
    where('CONTAINS(name, ?) = 1', name).
    order('name').
    limit(max_res)
  end
end

Which can then be invoked as follows:

Customer.search("micro")

(Technically we wouldn’t have needed limit, so long as we passed RESULTLIMIT to the CONTAINS function.)

You can read more about DB2 Text Search at the Information Center.

No related posts.


If you enjoyed this post, then make sure you subscribe to my Newsletter and/or Feed.

receive my posts by email

3 Responses to “Speeding up queries by a factor of 100 or more with DB2 Text Search”

  1. johno says:

    Queries using like “name%” actually should use index on name. It’s called prefix search. And normally you would also have a limit clause there, since you really don’t need to retrieve all matches when someone queries for something general like “peter%”.

    What you probably wanted to write is a query with “%name%”. It has the same semantics as CONTAINS.

  2. duke says:

    DB2 text search cannot distinguish between C++ and C#

    Is this likely to be fixed any time?

    thanks

Leave a Reply

I sincerely welcome and appreciate your comments, whether in agreement or dissenting with my article. However, trolling will not be tolerated. Comments are automatically closed 15 days after the publication of each article.

Copyright © 2005-2012 Antonio Cangiano. All rights reserved.