During a recent Rails project at IBM we had to deal with a large table consisting of customers. The table is made up of legacy enterprise data, and contains close to a million records.
Among many other fields, the table
Customers includes a column
name defined as
name is used to store company names.
One of the requirements for the project was to implement a Google Suggest-like feature. That is, when the user enters a few characters for the customer’s name, they should be prompted, via AJAX, with a list of possible suggestions containing that string (or at least beginning with that string.)
The naive approach would be to use a simple
LIKE predicate, to generate queries such as:
SELECT name FROM Customers WHERE name LIKE '%micro%' ORDER BY name
One of the main problems with this approach is performance. Using
LIKE this way doesn’t allow us to take advantage of indexes that are defined on the column
Unsurprisingly, on a modest server, such a query takes about 25 seconds. A UI that isn’t able to respond for 25 seconds will feel like an eternity for the end user, of course, so a simple query like this obviously cannot be seen as a viable solution.
Enter DB2 Text Search. You may be familiar with other full text search engines such as Ferret, Lucene, or Sphinx. DB2 V9.7.4 ships with an extremely powerful full text search engine that has plenty of useful features, including excellent integration with native XML columns.
Using the free edition of DB2 (DB2 Express-C) which includes Text Search, we were able to implement the autocomplete functionality we were after in a heartbeat, thanks to a query like the following:
SELECT name FROM Customers WHERE CONTAINS(name, 'micro') = 1 ORDER BY name
This query was executed in mere fractions of a second for most searches, and behaved exactly as needed. In fact, not only did we match strings that began with the searched token, but also ones that contained it elsewhere (e.g.,
micro would match both
Sun Microsystems.) The results where ordered alphabetically, but could have easily been ordered by relevance via the
SCORE function, also available via DB2 Text Search.
As usual, it’s a matter of using the right tool for the right job, and DB2 Text Search was created exactly for these kind of scenarios.
Let’s briefly look at how you can also go about setting up and playing with it on Linux.
To start with, you’ll need to have a 64bit Linux distro, and then follow these simple steps.
Download DB2 Express-C. Make sure you grab the
db2exc_974_LNX_x86_64.tar.gz file and not the Light edition.
Next, install DB2 by following these steps (the Installing DB2 section in particular.) Make sure you select a custom installation, and that you select everything including DB2 Text Search (which is not checked by default).
After you’ve started DB2 with
db2start via the instance user (e.g.,
db2inst1,) launch the text search server by running:
db2ts start for text
To enable text search for an existing database, run:
db2ts enable database for text connect to mydb
Next, you’ll need to create a text search index and populate it. For example:
db2ts "create index customer_name for text on customers(name) connect to mydb" db2ts "update index customer_name for text connect to mydb"
That’s it. From now on you’ll be able to use functions such as
CONTAINS to quickly search for your data.
Such a function can also be easily invoked in Rails:
Customer.where("CONTAINS(name, ?) = 1", name)
Or wrapping it a little in a model:
class Customer < ActiveRecord::Base # ... def self.search(name, max_res = 10) where('CONTAINS(name, ?) = 1', name). order('name'). limit(max_res) end end
Which can then be invoked as follows:
(Technically we wouldn’t have needed
limit, so long as we passed
RESULTLIMIT to the
You can read more about DB2 Text Search at the Information Center.
I sincerely welcome and appreciate your comments, whether in agreement or dissenting with my article. However, trolling will not be tolerated. Comments are automatically closed 15 days after the publication of each article.