A close look at three Rails 2.1 bugs

Ruby on Rails 2.1 has been out for six weeks now. Let’s take a closer look at three database related bugs that affect this release.

1. SQLite’s db creation generates false warnings

This is an innocuous bug, and if you work with SQLite I’m sure that you encountered and safely ignored it. When you create a Rails application, the default adapter in use is sqlite3, unless you specify otherwise with the -d option. The config/database.yml will reference three databases by default: db/development.sqlite3, db/test.sqlite3 and db/production.sqlite3. Now if you try to create the databases through the db:create or db:create:all rake tasks, the database will be created, but you’ll also get a warning:


$ rake db:create:all
db/development.sqlite3 already exists
db/production.sqlite3 already exists
db/test.sqlite3 already exists		

That warning message is not actually true. The three databases didn’t exist before you ran rake, and were created by the task instead, as you’d expect. This is a small annoyance, but one that pops up way too often for my taste. Let’s see where things went wrong.

The db:create and db:create:all tasks invoke the method create_database defined within the file railties/lib/tasks/databases.rake in the repository (or the frozen vendor/rails folder). This is the definition of the method:

def create_database(config)
  begin
    ActiveRecord::Base.establish_connection(config)
    ActiveRecord::Base.connection
  rescue
    case config['adapter']
    when 'mysql'
      @charset = ENV['CHARSET'] || 'utf8'
      @collation = ENV['COLLATION'] || 'utf8_general_ci'
      begin
        ActiveRecord::Base.establish_connection(config.merge('database' => nil))
        ActiveRecord::Base.connection.create_database(config['database'], :charset => (config['charset'] || @charset), :collation => (config['collation'] || @collation))
        ActiveRecord::Base.establish_connection(config)
      rescue
        $stderr.puts "Couldn't create database for #{config.inspect}, charset: #{config['charset'] || @charset}, collation: #{config['collation'] || @collation} (if you set the charset manually, make sure you have a matching collation)"
      end
    when 'postgresql'
      @encoding = config[:encoding] || ENV['CHARSET'] || 'utf8'
      begin
        ActiveRecord::Base.establish_connection(config.merge('database' => 'postgres', 'schema_search_path' => 'public'))
        ActiveRecord::Base.connection.create_database(config['database'], config.merge('encoding' => @encoding))
        ActiveRecord::Base.establish_connection(config)
      rescue
        $stderr.puts $!, *($!.backtrace)
        $stderr.puts "Couldn't create database for #{config.inspect}"
      end
    when 'sqlite'
      `sqlite "#{config['database']}"`
    when 'sqlite3'
      `sqlite3 "#{config['database']}"`
    end
  else
    $stderr.puts "#{config['database']} already exists"
  end
end

At first glance, it looks OK: it tries to establish a successful connection, if it succeeds, it executes the else clause and prints a message stating that the database already exists; otherwise Ruby creates it in the rescue clause with the sqlite3 command line tool (or sqlite if using the sqlite adapter).

What truly happens though, is that the rescue clause is never executed (short of Ruby raising an unrelated error). The reason for this is that the two lines of code between begin and rescue, not only attempt to connect to the given database, but will create the database if this doesn’t exist already (well, the SQLite’s Active Record adapter creates it). This means that regardless of whether the database existed or not, those two lines will be successfully run and won’t raise any errors. The final outcome is that the database that didn’t exist, gets created, and the else clause gets executed, wrongly warning us that the database already existed.

I opened a ticket for this and submitted a patch. This is the new create_database method that works correctly:

def create_database(config)
begin
if config['adapter'] =~ /sqlite/
if File.exist?(config['database'])
$stderr.puts "#{config['database']} already exists"
else
begin
# Create the SQLite database
ActiveRecord::Base.establish_connection(config)
ActiveRecord::Base.connection
rescue
$stderr.puts $!, *($!.backtrace)
$stderr.puts "Couldn't create database for #{config.inspect}"
end
end
return # Skip the else clause of begin/rescue    
else
ActiveRecord::Base.establish_connection(config)
ActiveRecord::Base.connection
end
rescue
case config['adapter']
when 'mysql'
@charset   = ENV['CHARSET']   || 'utf8'
@collation = ENV['COLLATION'] || 'utf8_general_ci'
begin
ActiveRecord::Base.establish_connection(config.merge('database' => nil))
ActiveRecord::Base.connection.create_database(config['database'], :charset => (config['charset'] || @charset), :collation => (config['collation'] || @collation))
ActiveRecord::Base.establish_connection(config)
rescue
$stderr.puts "Couldn't create database for #{config.inspect}, charset: #{config['charset'] || @charset}, collation: #{config['collation'] || @collation} (if you set the charset manually, make sure you have a matching collation)"
end
when 'postgresql'
@encoding = config[:encoding] || ENV['CHARSET'] || 'utf8'
begin
ActiveRecord::Base.establish_connection(config.merge('database' => 'postgres', 'schema_search_path' => 'public'))
ActiveRecord::Base.connection.create_database(config['database'], config.merge('encoding' => @encoding))
ActiveRecord::Base.establish_connection(config)
rescue
$stderr.puts $!, *($!.backtrace)
$stderr.puts "Couldn't create database for #{config.inspect}"
end
end
else
$stderr.puts "#{config['database']} already exists"
end
end

I removed the creation of the SQLite (2 and 3) database from the case statement inside the rescue clause, but the actual deal is in these lines:

if config['adapter'] =~ /sqlite/
if File.exist?(config['database'])
$stderr.puts "#{config['database']} already exists"
else
begin
# Create the SQLite database
ActiveRecord::Base.establish_connection(config)
ActiveRecord::Base.connection
rescue
$stderr.puts $!, *($!.backtrace)
$stderr.puts "Couldn't create database for #{config.inspect}"
end
end
return # Skip the else clause of begin/rescue    
else
ActiveRecord::Base.establish_connection(config)
ActiveRecord::Base.connection
end

If the specified adapter is sqlite or sqlite3 then Ruby checks if the database file exists or not. If it exists, the warning message is printed; if it doesn’t, Ruby creates the database automatically by invoking the connection method after running establish_connection with the configuration as an argument. If there is an error during the creation of the database — or the connection to the database, if this already exists — the error and its backtrace are printed. Finally, we exit the method so as to not execute the else clause of the begin/rescue statement, which would erroneously print that the database already existed. If the adapter in use is any other adapter, then we just run the two lines as we did in the non-patched version of the method.

Notice also that there was a second problem with the original code. I couldn’t replace those connection lines with `sqlite3 "#{config['database']}"` because they would open an SQLite3 shell which remains open, so the rake task waits forever for that shell to exit (you can actually see this if you use system rather than “).

2. Preloading has_and_belongs_to_many associations generates non-standard SQL

This second bug is quite serious and relates to the generation of non-standard SQL in has_and_belongs_to_many associations. Imagine that you have two models and a finder as follows:

class Project < ActiveRecord::Base
has_and_belongs_to_many :developers
end

class Developer < ActiveRecord::Base
has_and_belongs_to_many :projects
end

p = Project.find(:all, :include => :developers)

That finder will generate a query like:

SELECT developers.*, t0.project_id as _parent_record_id 
FROM developers 
INNER JOIN developers_projects as t0 ON developers.id = t0.developer_id 
WHERE (t0.project_id IN (1,2))

Notice that _parent_record_id? That’s problematic because an SQL identifier cannot begin with an underscore, and this is true for all three versions of the standard (92, 99 and 2003). You can verify the non-conformity of the generated SQL through this validator. That single underscore broke support for DB2 and Oracle, and possibly other standard compliant databases.

I submitted a patch for this some time ago and it was immediately applied by Jeremy Kemper, so Edge Rails and the next version of Rails aren’t affected. Meanwhile the IBM_DB gem worked around the issue and I believe the Oracle enhanced adapter did the same.

3. DEFAULT NULL NULL

The third bug is somewhat similar to the second one and it’s caused once again by non-standard SQL. This too, like the previous bug, breaks support for DB2, Oracle and possibly some other databases. Imagine that we have the following migration file:

class CreatePosts < ActiveRecord::Migration
def self.up
create_table :posts do |t|
t.string :title
t.text :body

t.timestamps
end
end

def self.down
drop_table :posts
end
end

When the up method gets invoked by a rake task, the following query will be generated (this is the SQLite version of it):

CREATE TABLE "posts" (
"id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
"title" varchar(255) DEFAULT NULL NULL, 
"body" text DEFAULT NULL NULL,
"created_at" datetime DEFAULT NULL NULL,
"updated_at" datetime DEFAULT NULL NULL
) 

Exclude for a moment the primary key, whose syntax is entirely adapter specific. The rest of the query still shows a problem. DEFAULT NULL NULL is not standard SQL and will not be accepted by some databases. This is caused by the fact that in the column definition we didn’t specify a :default option, so we have the equivalent of :default => nil, which translates into DEFAULT NULL. This precedes the second NULL, which is there to indicate that the column accepts nulls. Again, even the second NULL is not part of the SQL standard. A field is nullable by default, therefore we only need to specify something if it must not accept NULLs (i.e. by appending NOT NULL).

The solution to this problem lies in changing the behavior of ColumnDefinition’s instance method to_sql in activerecord/lib/active_record/connection_adapters/abstract/schema_definitions.rb. This is the current definition:

def to_sql
column_sql = "#{base.quote_column_name(name)} #{sql_type}"
add_column_options!(column_sql, :null => null, :default => default) unless type.to_sym == :primary_key
column_sql
end

Notice how the second argument of add_column_options! is a hash that sets the null and default values in every case. If these values are nil, the value NULL gets pushed into the final SQL query. We need to prevent this from happening by passing the two options only when they are not nil.

def to_sql
column_sql = "#{base.quote_column_name(name)} #{sql_type}"
column_options = {}
column_options[:null] = null unless null.nil?
column.options[:default] = default unless default.nil?
add_column_options!(column_sql, column_options) unless type.to_sym == :primary_key
column_sql
end

The migration file we saw above will now generate the following query (again, in the case of SQLite):

CREATE TABLE "posts" (
"id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
"title" varchar(255),
"body" text,
"created_at" datetime,
"updated_at" datetime
)

This is standard SQL, if we exclude the primary key syntax which is specific to SQLite. More importantly, this would work and be accepted by other database systems like DB2 and Oracle. Version 0.9.5 of the IBM_DB gem monkey patches the issue in a similar fashion, in order to work around this problem and provide support for Rails 2.1. It is possible that the Oracle enhanced adapter 1.1.1 does the same thing. I didn’t submit a patch for this issue because several tickets regarding it are already open, and Nick Sieger already beat me to it. His patch was submitted more than a month ago but it hasn’t received any attention yet. I hope that this post will contribute towards changing that.

Conclusion

I think the last two bugs highlight a valid lesson: the importance of adhering to the SQL standard, particularly when implementing an Object-Relational mapper that supports so many databases. ActiveRecord takes the approach of generating SQL queries by mixing common SQL bits, defined outside of a particular adapter, with dialect-specific parts which are defined by each adapter. It is therefore crucial to ensure that the bits that end up in the SQL queries, independently from the adapter in use, are based on the SQL standard and not on the behavior of SQLite or MySQL.

The same principle applies when building a CMS system or a popular blog engine. If the project is aimed at being portable from one RDBMS to another with little hassle, attention should be payed to custom queries in order to keep them as close to the SQL standard as possible. Of course, it’s perfectly fine to create projects that are database specific and take advantage of that particular database’s strength. For example, DB2 Express-C offers the ability to handle XML in a native manner through a technology called pureXML. This is a fantastic feature, which can’t really be ported to a different RDBMS, but it’s worth its weight in gold, particularly when working in Ruby.

Lastly, don’t let this reflect negatively on your judgment of Rails’ code quality. A large and ambitious project like Rails is bound to have bugs here and there, no matter how scrupulous its developers are.

Get more stuff like this

Subscribe to my mailing list to receive similar updates about programming.

Thank you for subscribing. Please check your email to confirm your subscription.

Something went wrong.

9 Comments

  1. Frederick cheung July 14, 2008
  2. Nathan Zook July 14, 2008
  3. Antonio Cangiano July 15, 2008
  4. Nathan Zook July 16, 2008
  5. Antonio Cangiano July 17, 2008
  6. tmacedo July 28, 2008
  7. Mario Briggs August 1, 2008
  8. Nathan Zook August 4, 2008
  9. Science August 30, 2008

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.