By Antonio Cangiano, Software Engineer & Technical Evangelist at IBM

Who is accessing your Gmail account?

The Gmail team recently introduced a new feature (in the footer) that enables account holders to verify the latest login activities on their account. I routinely check mine and the results are usually boring, reminding me I check my email way too often (and I do so mostly via browser, through my Canadian IP).

An unwelcome surprise

If you don’t check yours regularly, you should (my version of Google Apps doesn’t have this feature though). In fact, tonight during a routine check, I discovered an unwelcome surprise: an entry that didn’t belong. The following screenshot shows the recent activity on my account (with some information blanked out):

My Gmail activity window

See that US, IMAP line? That wasn’t me. So did someone manage to access my account? Or was it a web application that I authorized? Before panicking, I decided to look into whatever information I could gather about that IP.

It turns out that it’s the IP of a server hosted by Slicehost (Rackspace), but I couldn’t find any website running on that IP address (173.203.211.51). To make things more interesting, I found two people (one German, one Japanese) complaining online about the same IP address and IMAP access to their Gmail accounts.

Was my account hacked into? I have a hard time believing that someone actually managed to log in by guessing my password, which was as secure as a password can be. I haven’t used my laptop on an unsecured Wi-Fi network. I use a Mac and am very cautious about what I install, so I doubt I have a keylogger or anything of that nature installed. And since I use 1Password, I’m even immune to so-called “tabnabbing” attacks.

Possible culprits

Assuming that this is not a misunderstanding and some SaaS application I authorized is not in fact using that server to perform a legitimate action, I think it’s likely that someone managed to get in through a vulnerability or backdoor in one such application.

I’m not pointing fingers here, nor accusing anyone, but it is interesting to find such an occurrence happening so shortly after granting the aforementioned authorizations. The websites I granted access to were:

  • Zoho Discussions (24 hours before the suspected intrusion happened)
  • Trendly (3 days before the intrusion)
  • Etacts (a few weeks before the intrusion)

It’s worth mentioning that in the past Etacts had scared the crap out of me with their American IP showing up in the recent activity list. However, a lookup has always shown the questionable IP to belong to them.

Do any of these services intentionally use the server with IP 173.203.211.51? Since I’m not the only one who suspects a violation from this IP, it would be interesting to hear what Slicehost has to say about it. Perhaps they know whether it’s a legitimate or illegitimate use of their server.

How to deal with an email intrusion

The perception of being intruded upon, whether it’s real or just a scare, is definitely not pleasant. Just in case the same happens to you, here is what I did to deal with the situation:

  • I verified that there were no messages sent on my behalf.
  • I checked that there weren’t any new filters that would forward emails to a possibly malicious user.
  • I verified that there weren’t any forwards and ensured that forwarding was disabled.
  • POP3 was already disabled, and I have now disabled IMAP as well.
  • I revoked access to my Google account for all listed web applications.
  • I changed my password to another humongous one on a different computer, with a brand new installation of Linux, directly wired to my DSL modem (bypassing the whole wireless infrastructure I set up at home).
  • I will, soon enough, format my Mac (I’ve been planning a DBAN wipe, plus a brand new installation for a while either way).
  • I will continue to monitor my account activity.

I felt it necessary to share this information even if this turns out to be a false alarm. I highly suggest that you keep an eye on your Gmail account activity and, if you find something suspicious, act accordingly.

UPDATE (June 17, 2010): Please read my follow up post.


Setup Ruby Enterprise Edition, nginx and Passenger (aka mod_rails) on Ubuntu

The following is a very short guide on setting up Ruby Enterprise Edition (REE), nginx and Passenger, for serving Ruby on Rails applications on Ubuntu. It also includes a few quick and easy optimization tips.

We start by setting up REE (x64), using the .deb file provided by Phusion:

wget http://rubyforge.org/frs/download.php/66163/ruby-enterprise_1.8.7-2009.10_amd64.deb
sudo dpkg -i ruby-enterprise_1.8.7-2009.10_amd64.deb
ruby -v

In the output you should see “ruby 1.8.7 (2009-06-12 patchlevel 174)…” or similar. If so, good; while you are there, update RubyGems and the installed gems:

sudo gem update --system
sudo gem update

Next, you’ll need to install nginx, which is a really fast web server. The Phusion team has made it very easy to install, but if you simply follow most instructions found elsewhere, you’ll get the following error:

checking for system md library ... not found
checking for system md5 library ... not found
checking for OpenSSL md5 crypto library ... not found

./configure: error: the HTTP cache module requires md5 functions
from OpenSSL library.  You can either disable the module by using
--without-http-cache option, or install the OpenSSL library in the system,
or build the OpenSSL library statically from the source with nginx by using
--with-http_ssl_module --with-openssl= options.

Instead, we are going to install libssl-dev first and then nginx and its Passenger module:

sudo aptitude install libssl-dev
sudo passenger-install-nginx-module

Follow the prompts and accept all the defaults (when prompted to choose between 1 and 2, pick 1).

Before I proceed with the configuration, I like to create an init script and have it run at startup (the script itself is adapted from one provided in the excellent articles at slicehost.com):

sudo vim /etc/init.d/nginx

Its content should be:

#! /bin/sh

### BEGIN INIT INFO
# Provides:          nginx
# Required-Start:    $all
# Required-Stop:     $all
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: starts the nginx web server
# Description:       starts nginx using start-stop-daemon
### END INIT INFO

PATH=/opt/nginx/sbin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
DAEMON=/opt/nginx/sbin/nginx
NAME=nginx
DESC=nginx

test -x $DAEMON || exit 0

# Include nginx defaults if available
if [ -f /etc/default/nginx ] ; then
    . /etc/default/nginx
fi

set -e

. /lib/lsb/init-functions

case "$1" in
  start)
    echo -n "Starting $DESC: "
    start-stop-daemon --start --quiet --pidfile /opt/nginx/logs/$NAME.pid \
        --exec $DAEMON -- $DAEMON_OPTS || true
    echo "$NAME."
    ;;
  stop)
    echo -n "Stopping $DESC: "
    start-stop-daemon --stop --quiet --pidfile /opt/nginx/logs/$NAME.pid \
        --exec $DAEMON || true
    echo "$NAME."
    ;;
  restart|force-reload)
    echo -n "Restarting $DESC: "
    start-stop-daemon --stop --quiet --pidfile \
        /opt/nginx/logs/$NAME.pid --exec $DAEMON || true
    sleep 1
    start-stop-daemon --start --quiet --pidfile \
        /opt/nginx/logs/$NAME.pid --exec $DAEMON -- $DAEMON_OPTS || true
    echo "$NAME."
    ;;
  reload)
      echo -n "Reloading $DESC configuration: "
      start-stop-daemon --stop --signal HUP --quiet --pidfile /opt/nginx/logs/$NAME.pid \
          --exec $DAEMON || true
      echo "$NAME."
      ;;
  status)
      status_of_proc -p /opt/nginx/logs/$NAME.pid "$DAEMON" nginx && exit 0 || exit $?
      ;;
  *)
    N=/etc/init.d/$NAME
    echo "Usage: $N {start|stop|restart|reload|force-reload|status}" >&2
    exit 1
    ;;
esac

exit 0

Make it executable and have it start at boot:

sudo chmod +x /etc/init.d/nginx
sudo /usr/sbin/update-rc.d -f nginx defaults

From now on, you’ll be able to start, stop and restart nginx with it. Start the server as follows:

sudo /etc/init.d/nginx start

Point your browser to your server’s IP and you should see “Welcome to nginx!”. If you do, great: we can move on to configuring nginx for your Rails app.

Edit nginx’s configuration file:

sudo vim /opt/nginx/conf/nginx.conf

Add a server section within the http section, as follows:

    server {
        listen 80;
        server_name example.com;
        root /somewhere/my_rails_app/public;
        passenger_enabled on;
        rails_spawn_method smart;
    }

The server name can also be a subdomain if you wish (e.g., blog.example.com). It’s important that you point the root to your Rails app’s public directory.

The smart spawn method set by the rails_spawn_method directive is very efficient: it allows Passenger to consume less memory per process and speeds up spawning, provided your Rails application is not affected by its limitations (for a discussion of these, read the relevant section in the official guide).

If you have lots of RAM on your server (e.g., more than 512 MB), you may want to consider increasing your maximum pool size from its default of 6, with the passenger_max_pool_size directive. Conversely, if you want to limit the number of processes running at any time and consume less memory on a small VPS (e.g., 128 to 256 MB), you can decrease that number down to 2 (or something in that range). Always test a few configurations to find one that works for you. You can read more about this directive in the official guide.
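To make that trade-off concrete, here is a back-of-the-envelope Python sketch. It assumes a hypothetical per-process footprint (`per_process_mb`, 50 MB by default) and a hypothetical amount of RAM reserved for the OS, nginx and the database (`reserved_mb`); neither figure comes from the Passenger documentation, and the function name is made up for illustration. Measure your own app with passenger-memory-stats before trusting the output.

```python
def suggest_pool_size(total_ram_mb, reserved_mb=100, per_process_mb=50,
                      minimum=1, maximum=None):
    """Rough guess for passenger_max_pool_size (hypothetical heuristic).

    Divides the RAM left after a reserve for other services by an
    assumed per-Rails-process footprint, then clamps the result.
    """
    available = max(total_ram_mb - reserved_mb, 0)
    size = available // per_process_mb
    size = max(size, minimum)
    if maximum is not None:
        size = min(size, maximum)
    return size


# The numbers depend entirely on the assumed reserve and footprint
print(suggest_pool_size(256))   # → 3
print(suggest_pool_size(1024))  # → 18
```

Clamping with `maximum` is a convenient way to keep the guess near a value you have already load-tested.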

While you are modifying nginx’s configuration, you may also want to increase the number of worker processes (worker_processes, e.g., to 4 on a typical VPS) and add a few more tweaks (such as enabling gzip compression):

# ...
worker_processes  4;
# ...
http {
    passenger_root /usr/local/lib/ruby/gems/1.8/gems/passenger-2.2.5;
    passenger_ruby /usr/local/bin/ruby;

    include       mime.types;
    default_type  application/octet-stream;

    access_log  logs/access.log;

    sendfile        on;
    keepalive_timeout  65;
    tcp_nodelay on;

    gzip on;
    gzip_comp_level 2;
    gzip_proxied any;   

    server {
    #...

When you are happy with the changes, save the file, and restart nginx:

sudo /etc/init.d/nginx restart

If you wish to restart your Rails application in the future without restarting the whole web server, you can simply run the following command:

touch /somewhere/my_rails_app/tmp/restart.txt
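Passenger notices that tmp/restart.txt has been touched and restarts the application on the next request. If you drive deployments from a Python 3 script, the same `touch` can be done with pathlib. The helper name `request_passenger_restart` is hypothetical, and the application path is the placeholder used above.

```python
from pathlib import Path


def request_passenger_restart(app_root):
    """Touch <app_root>/tmp/restart.txt, like `touch tmp/restart.txt`.

    Creates the tmp directory if needed, then creates the file or
    updates its modification time.
    """
    restart_file = Path(app_root) / "tmp" / "restart.txt"
    restart_file.parent.mkdir(parents=True, exist_ok=True)
    restart_file.touch()  # creates the file or bumps its mtime
    return restart_file


# Usage with the placeholder path from the example above:
# request_passenger_restart("/somewhere/my_rails_app")
```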

Passenger also provides a few handy monitoring tools. Check them out:

sudo passenger-status
sudo passenger-memory-stats

That’s it, you are ready to go! I hope that you find these few notes useful.


Add code highlighting to your Google Waves

Google Wave is still rough around the edges, but it has a lot of potential to become a great collaboration tool. As a developer, your first question will probably be: “How do I add code highlighting to my waves?” The answer is straightforward, but not very easy to find if you google it. I hope this post will help fellow developers who are experimenting with Google Wave.

The following steps are required to obtain syntax highlighting for your code:

  1. Create a new wave and add the Syntaxy robot to your wave. Use the wave address: kasyntaxy@appspot.com.
  2. Reply to your first message or within it, thereby creating a reply (called “blip” in Google lingo).
  3. Specify your code’s language, prefixing the name with a hash and exclamation mark, like #!python or #!ruby.

At this point, as you type the code in your blip it will be highlighted by the Syntaxy bot as shown in the picture below:

Highlight code on Google Wave

More advanced automatic syntax highlighting bots will probably appear as Google Wave progresses, but this one should do the trick for now. On a side note, if you copy and paste code from Xcode, the code formatting will be kept in your waves and blips without the need for bots.
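The `#!language` convention from step 3 is easy to picture in code. The following Python sketch only illustrates how a bot could split such a marker from the code that follows it; it is not Syntaxy’s actual implementation, and the function name `split_language_marker` is hypothetical.

```python
def split_language_marker(blip_text):
    """Split a leading '#!language' line from the rest of a blip.

    Returns (language, code). language is None when the first line
    is not a '#!' marker, in which case code is the unchanged text.
    """
    first_line, _, rest = blip_text.partition("\n")
    stripped = first_line.strip()
    if stripped.startswith("#!") and len(stripped) > 2:
        return stripped[2:].strip().lower(), rest
    return None, blip_text


language, code = split_language_marker("#!ruby\nputs 'Hello, Wave!'")
print(language)  # → ruby
print(code)      # → puts 'Hello, Wave!'
```

A real bot would then hand `code` to a highlighter chosen by `language`.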


Getting MacRuby’s compiler to work

There is major news in Rubyland today. MacRuby’s team just released their first beta of version 0.5 (an experimental, still incomplete version of Ruby), which brings to the table a JIT, the removal of the dreaded GIL (Global Interpreter Lock), native threads, GCD (Grand Central Dispatch) for multicore computing, and a whole set of other features listed in the release announcement.

The most important new feature is the presence of a compiler. That’s right, thanks to this release, Ruby code can now become highly optimized executable code. How awesome is that? I can sense that you’re pumped by this news, so why not head over to MacRuby.com and download the installation file for yourself? After you’ve done that, the next thing you’re going to want to do is run a small test like the following:

$ macrubyc world_domination.rb -o world_domination
Can't locate program `llc'

Oh noes! llc is a tool that ships with LLVM (upon which MacRuby is built); however, it’s not included with MacRuby’s installer (it will be in the future). But fear not, my friends, there is a solution:

$ svn co -r 82747 https://llvm.org/svn/llvm-project/llvm/trunk llvm-trunk
$ cd llvm-trunk
$ ./configure
$ UNIVERSAL=1 UNIVERSAL_ARCH="i386 x86_64" ENABLE_OPTIMIZED=1 make -j2
$ sudo env UNIVERSAL=1 UNIVERSAL_ARCH="i386 x86_64" ENABLE_OPTIMIZED=1 make install

If your machine does not have 2 cores, remove the -j2 option from the fourth line or adjust the number accordingly.
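If you would rather not count cores by hand, a small Python 3.8+ helper can build the same make invocation with -j set to the number of CPUs that `os.cpu_count()` reports. This is only a convenience sketch: the helper name is hypothetical, and it assumes you run the resulting command from inside llvm-trunk.

```python
import os
import shlex


def llvm_make_command(jobs=None):
    """Build the make command from the steps above with a -j<N> flag.

    Defaults to os.cpu_count(), falling back to 1 if that is unknown.
    """
    if jobs is None:
        jobs = os.cpu_count() or 1
    env_vars = ["UNIVERSAL=1", "UNIVERSAL_ARCH=i386 x86_64", "ENABLE_OPTIMIZED=1"]
    # 'env' passes the variables to make, mirroring the shell one-liner
    return ["env", *env_vars, "make", "-j%d" % jobs]


print(shlex.join(llvm_make_command(2)))
# → env UNIVERSAL=1 'UNIVERSAL_ARCH=i386 x86_64' ENABLE_OPTIMIZED=1 make -j2
# To build: import subprocess; subprocess.run(llvm_make_command(), check=True)
```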

The compilation phase may take a couple of centuries, depending on your machine’s speed, but it should eventually build the LLVM. :-P llc will be placed in your PATH, and you’ll finally be able to compile Ruby code and obtain an executable to help you carry out your world domination plans.

$ macrubyc world_domination.rb -o world_domination
$ ./world_domination
MUAHAHAHAHA!

Serving Django Static Files through Apache

Django’s development server is capable of serving static (media) files thanks to the view django.views.static.serve. Popular web servers like Apache, Lighttpd or NGINX are much faster though, and as such should be used in production mode. Our goal is to bypass Django and let Apache (or other valid alternatives) directly serve static files like images, videos, CSS, JavaScript files, and so on, for us.

Generally speaking, for performance reasons, it’s advised that you have two different web servers serving your dynamic requests and static files. In practice, for smaller sites, people often opt to simply use one web server. In this article, I’ll discuss how to serve the static files within your Django project through Apache.

The first thing we need to do is distinguish between development and production mode. We can do so by simply specifying DEBUG = True (development), or DEBUG = False (production) within our settings.py file.

settings.py may include (among others) the following declarations:

import os

# Absolute path to the project directory
BASE_PATH = os.path.dirname(os.path.abspath(__file__))

# Main URL for the project
BASE_URL = 'http://example.org'

DEBUG = False

# Absolute path to the directory that holds media
MEDIA_ROOT = '%s/media/' % BASE_PATH

# URL that handles the media served from MEDIA_ROOT
MEDIA_URL = '%s/site_media/' % BASE_URL

# URL prefix for admin media -- CSS, JavaScript and images.
ADMIN_MEDIA_PREFIX = "%sadmin/" % MEDIA_URL

*PATH constants indicate paths on your filesystem (e.g., /home/myuser/projects/myproject), while *URL constants indicate the actual URL needed to reach a given page or file.
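To see how the *PATH and *URL constants relate, here is the string arithmetic from the settings above evaluated on its own. BASE_PATH is hard-coded to a placeholder instead of being derived from `__file__`, so the snippet runs outside a Django project.

```python
# Same string formatting as in settings.py, with a placeholder project path
BASE_PATH = '/home/myuser/projects/myproject'   # filesystem path (placeholder)
BASE_URL = 'http://example.org'                 # public URL

MEDIA_ROOT = '%s/media/' % BASE_PATH
MEDIA_URL = '%s/site_media/' % BASE_URL
ADMIN_MEDIA_PREFIX = "%sadmin/" % MEDIA_URL

print(MEDIA_ROOT)          # → /home/myuser/projects/myproject/media/
print(MEDIA_URL)           # → http://example.org/site_media/
print(ADMIN_MEDIA_PREFIX)  # → http://example.org/site_media/admin/
```

Note that ADMIN_MEDIA_PREFIX comes out right only because MEDIA_URL ends with a slash.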

Notice that it’s not unusual to have a /site_media URL that corresponds to a /media folder. In the example above, I opted to separate regular media files for the project from the standard ones that ship with Django for the admin section. To do this, all we have to do is create a symbolic link as follows:

ln -s /usr/lib/python2.5/site-packages/django/contrib/admin/media /path/to/myproject/media/admin

When you’re in development mode, and DEBUG = True, you want to let Django serve your static files. This can be done by adding the following snippet (or similar) to your urls.py:

from django.conf import settings

if settings.DEBUG:
    urlpatterns += patterns('',
        (r'^site_media/(?P<path>.*)$', 'django.views.static.serve', {'document_root': settings.MEDIA_ROOT}),
    )

In production mode, the code contained within the if clause will not be executed as we’ve set DEBUG to False within settings.py.

From the Django side of things, we are good. We now need to instruct Apache. Within your virtual host file, you can specify something along the lines of:

<VirtualHost *:80>

  #...

  SetHandler python-program
  PythonHandler django.core.handlers.modpython
  SetEnv DJANGO_SETTINGS_MODULE myproject.settings
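  # Keep PythonDebug Off in production: On shows Python tracebacks to visitors
  PythonDebug Off
  PythonAutoReload Off
  # The parent of myproject must be on the path to import myproject.settings
  PythonPath "['/path/to', '/path/to/myproject'] + sys.path"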

  #...

  Alias /site_media "/path/to/myproject/media"
  <Location "/site_media">
    SetHandler None
  </Location>
</VirtualHost>

The first group of declarations essentially tells Apache to use mod_python to handle any incoming requests. However, we don’t want Django to deal with static files, so the second group of declarations maps the /site_media URL to the actual media directory on the server and tells Apache to treat it as static content (with SetHandler None), effectively bypassing Django.
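Conceptually, the Alias directive maps a URL prefix onto a directory, while every other URL keeps flowing to mod_python. The Python function below is only a model of that mapping to make the idea concrete; Apache does this itself, and the function name and the traversal check are illustrative assumptions, not Apache’s code.

```python
import posixpath

URL_PREFIX = "/site_media"
MEDIA_DIR = "/path/to/myproject/media"


def resolve_static(url_path):
    """Return the file Apache would serve for url_path, or None.

    None means the URL is outside /site_media and would be handed to
    Django through mod_python instead.
    """
    if url_path != URL_PREFIX and not url_path.startswith(URL_PREFIX + "/"):
        return None
    relative = url_path[len(URL_PREFIX):].lstrip("/")
    # Normalize and refuse anything that escapes the media directory
    target = posixpath.normpath(posixpath.join(MEDIA_DIR, relative))
    if target != MEDIA_DIR and not target.startswith(MEDIA_DIR + "/"):
        raise ValueError("path escapes the media directory: %r" % url_path)
    return target


print(resolve_static("/site_media/css/style.css"))
# → /path/to/myproject/media/css/style.css
print(resolve_static("/blog/2008/"))  # → None (Django handles it)
```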


Memoization in Ruby and Python

Wikipedia defines memoization as “an optimization technique used primarily to speed up computer programs by having function calls avoid repeating the calculation of results for previously-processed inputs.” This typically means caching the return value of a function in a dictionary of sorts, using the parameters passed to the function as a key. When the function is invoked again with the same arguments, the cached value is returned immediately instead of being calculated again. Even though we are trading space for time, it is often invaluable for speeding up certain recursive functions and when dealing with dynamic programming, where intermediate calls are often repeated many times.

Using memoization in Ruby is very easy thanks to the memoize gem. The first step to getting started is therefore to install it:

$ sudo gem install memoize
Successfully installed memoize-1.2.3
1 gem installed
Installing ri documentation for memoize-1.2.3...
Installing RDoc documentation for memoize-1.2.3...

Now we can use the memoize method as illustrated in the example below:

require 'rubygems'
require 'memoize'
require 'benchmark'
include Memoize

def fib(n)
  return n if n < 2
  fib(n-1) + fib(n-2)
end

Benchmark.bm(15) do |b|
  b.report("Regular fib:") { fib(35) }
  b.report("Memoized fib:") { memoize(:fib); fib(35)}
end

In the first block we simply invoke fib(35), while in the second one we first invoke the method memoize(:fib) to memoize the method fib. Running this code on my machine prints the following:

                     user     system      total        real
Regular fib:    55.230000   0.160000  55.390000 ( 55.819205)
Memoized fib:    0.000000   0.000000   0.000000 (  0.001305)

We went from almost a minute of run time to an instantaneous execution. Optionally, we could even pass a file path to memoize, which would then use marshaling to dump the cached values to disk and load them back from there.

For Python we can write a simple decorator that behaves in a similar manner. In its simplest form it can be implemented as follows:

# memoize.py

def memoize(function):
    cache = {}
    def decorated_function(*args):
        try:
            return cache[args]
        except KeyError:
            val = function(*args)
            cache[args] = val
            return val
    return decorated_function

Or, using an explicit membership test instead of catching KeyError:

# memoize.py

def memoize(function):
    cache = {}
    def decorated_function(*args):
        if args in cache:
            return cache[args]
        else:
            val = function(*args)
            cache[args] = val
            return val
    return decorated_function

When the memoized function is invoked, we look in the cache to see whether an entry for the given arguments already exists. If it does, we immediately return that value. If not, we call the function, cache the result and return it.
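The Ruby gem could keep its cache in a file. The same idea can be layered on top of the decorator just described using pickle: load the cache when the decorator is applied and write it back after each new result. This Python 3 sketch is a hypothetical helper (`persistent_memoize`), not part of any library, and rewriting the whole file on every cache miss is only reasonable for small caches.

```python
import functools
import os
import pickle


def persistent_memoize(path):
    """Memoize a function and keep its cache in a pickle file at `path`."""
    def decorator(function):
        cache = {}
        if os.path.exists(path):
            with open(path, "rb") as f:
                cache = pickle.load(f)

        @functools.wraps(function)
        def wrapper(*args):
            if args not in cache:
                cache[args] = function(*args)
                # Persist the whole cache after every new entry (simple, not fast)
                with open(path, "wb") as f:
                    pickle.dump(cache, f)
            return cache[args]
        return wrapper
    return decorator


# Usage:
# @persistent_memoize("fib_cache.pickle")
# def fib(n):
#     return n if n < 2 else fib(n - 1) + fib(n - 2)
```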

Truth be told, this approach is limited by the fact that, since we are using a dictionary, only hashable (in practice, immutable) objects can be used as keys. For example, we can pass a tuple as a parameter, but not a list. For the example within this article, this approach will suffice, but to take advantage of memoization with mutable arguments, you may want to consider the approach described in this recipe.
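One simple workaround, sketched here rather than taken from the recipe mentioned above, is to convert list, dict and set arguments into hashable equivalents before using them as a cache key. The names `_freeze` and `memoize_mutable_args` are hypothetical, and the sketch assumes that two arguments with equal contents should share a cached result.

```python
def _freeze(value):
    """Recursively turn lists, dicts and sets into hashable equivalents."""
    if isinstance(value, list):
        return ("list", tuple(_freeze(v) for v in value))
    if isinstance(value, dict):
        # frozenset of items: equal dicts give equal keys regardless of order
        return ("dict", frozenset((k, _freeze(v)) for k, v in value.items()))
    if isinstance(value, set):
        return ("set", frozenset(_freeze(v) for v in value))
    return value


def memoize_mutable_args(function):
    cache = {}

    def decorated_function(*args):
        key = tuple(_freeze(arg) for arg in args)
        if key not in cache:
            cache[key] = function(*args)
        return cache[key]
    return decorated_function


@memoize_mutable_args
def total(numbers):
    return sum(numbers)


print(total([1, 2, 3]))  # → 6 (a list argument no longer raises TypeError)
```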

We can now rewrite the Ruby example above in Python as follows:

import timeit
from memoize import memoize

def fib1(n):
    if n < 2:
        return n
    else:
        return fib1(n-1) + fib1(n-2)

@memoize
def fib2(n):
    if n < 2:
        return n
    else:
        return fib2(n-1) + fib2(n-2)	

t1 = timeit.Timer("fib1(35)", "from __main__ import fib1")
print t1.timeit(1)
t2 = timeit.Timer("fib2(35)", "from __main__ import fib2")
print t2.timeit(1)

Running this code on my machine prints the following:

9.32223105431
0.000314950942993

In Python 2.5’s case, by employing memoization we went from more than nine seconds of run time to an instantaneous result.

Granted, we don’t write Fibonacci applications for a living, but the benefits and principles behind these examples still stand and can be applied to everyday programming whenever the opportunity, and above all the need, arises.
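For what it is worth, Python versions newer than the 2.5 used here (3.2 and later) ship a memoizing decorator in the standard library, `functools.lru_cache`. It makes the hand-written decorator unnecessary when the arguments are hashable. A minimal sketch:

```python
import functools


@functools.lru_cache(maxsize=None)  # maxsize=None: unbounded, like the dict cache
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


print(fib(35))           # → 9227465
print(fib.cache_info())  # hit/miss statistics for the cache
```

With a numeric `maxsize`, the least recently used entries are evicted, which bounds the space side of the space-for-time trade.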


Resolving the gray window when running db2setup

You drank the Kool-Aid and downloaded the awesomeness which is DB2 Express-C. Good job! Next you proceed to install it on Linux with sudo ./db2setup and boom, instead of a launchpad all you see is a gray window. Now what?

This problem is a known Java bug (resolved in Java 6) that shows up on Linux distros where Compiz effects are enabled. For example, this problem manifests itself in recent Ubuntu releases, including 8.10, where Compiz is enabled by default.

There are a couple of easy ways to solve this problem though. The first is to temporarily disable these effects during the installation and turn them back on when you’ve finished installing. In Ubuntu, you can do this by clicking on the Appearance menu, Visual Effects tab and then selecting None. The second method is to run export AWT_TOOLKIT=MToolkit before running sudo ./db2setup.
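One caveat about the second workaround: sudo typically resets the environment (Ubuntu’s default sudoers file enables env_reset), so a variable exported in your shell may not reach db2setup. Passing it through env after sudo avoids that. The Python 3.8+ sketch below only builds that command line; the helper name is hypothetical, and it assumes you are in the directory that contains db2setup.

```python
import shlex


def db2setup_command(toolkit="MToolkit"):
    """Command line for running db2setup as root with AWT_TOOLKIT set.

    `sudo env VAR=value cmd` sets the variable after sudo has reset
    the environment, so it reaches the Java-based installer.
    """
    return ["sudo", "env", "AWT_TOOLKIT=%s" % toolkit, "./db2setup"]


print(shlex.join(db2setup_command()))
# → sudo env AWT_TOOLKIT=MToolkit ./db2setup
# To run it: import subprocess; subprocess.run(db2setup_command(), check=True)
```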

A new setup is in the works to solve this issue, but for the time being, you can use the workarounds above to install DB2 Express-C on Linux.


Integrating TextMate and Pygments

Like many, I don’t use TextMate just for coding. All of my posts are first drafted in my trusty editor before being published. One of the problems that I had, and that others probably face too, is the less than smooth process of publishing properly highlighted code in posts and HTML pages. A few solutions exist, including embedding gist snippets, using “Create HTML from Document” in TextMate, or adopting JavaScript libraries or WP plugins. But when it comes to highlighting code, for me Pygments is simply unbeatable.

Pygments is a Python library but ships as a command line tool as well. However, switching between TextMate and the command line is not as convenient as I’d like. So on the weekend I pulled out my big sharp razor and started yak shaving. The result of that brief session is a hack that integrates TextMate and Pygments, so that code can easily be converted to HTML and presented beautifully.

First, let’s see how I use it. When I select a snippet of Ruby code in TextMate and press ⌃⌥1, the selection is transformed into the proper HTML. ⌃⌥2 is for Python snippets, ⌃⌥3 for any other language, and ⌃⌥4 for any language as well, but with line numbers added. In practice, this means that I use 1 and 2 most of the time, and these shortcuts are easy enough to remember. Note that this is not necessarily the best arrangement, but it works well for me. I could, if so inclined, associate all 4 commands with the same shortcut and be prompted by a menu every time this combination is pressed, obtaining something along the lines of the image shown below:

A possible prompt menu for Pygments

Should I ever forget these 4 shortcuts, I can take a quick look at the Text bundle menu shown below. I placed these commands under the Text menu, since they are globally available for textual formats, whether I’m composing HTML, Textile, Markdown or ReST; but this is entirely arbitrary and I suspect that many would consider the HTML menu instead or place a “Convert to HTML” entry in the menu of the specific language.

The Text menu

Ruby and Python deserve their own command because they are the languages whose code I publish the most, but pressing ⌃⌥3 (or 4) prompts a long list of languages to choose from as shown below (the image is cut to reduce its length):

The select a language dialog

The following is a series of steps that you can take to reproduce the same results as mine. The HTML required to present the code nicely in this section was generated from within TextMate. In other words, I’m eating my own dog food.

Step 1: If you haven’t done so already, install Pygments. You can get it from the official site.

Step 2: Within TextMate click on the menu entry: Bundles -> Bundle Editor -> Show Bundle Editor and click on the triangle to open up Text in the left pane.

Step 3: Click on the +- button in the lower left corner of the window and select New Command, then name the command Pygmentize Ruby (assuming that you want a command for Ruby).

Step 4: Ensure that the Save, Input, Output and Activation options are set as shown below:

Step 5: Fill the Command(s) text area with the following code:

#!/usr/bin/env python

import os
import sys
from pygments import highlight
from pygments.lexers import RubyLexer
from pygments.formatters import HtmlFormatter

try:
    code = os.environ['TM_SELECTED_TEXT']
except KeyError:
    sys.exit()

formatter = HtmlFormatter()
print highlight(code, RubyLexer(), formatter)

Step 6: Repeat the process for Pygmentize Python, Pygmentize… and Pygmentize with line numbers… but select a different Activation key equivalent (replace 1 with 2, 3 and 4, respectively).

The command code for Pygmentize Python is as follows:

#!/usr/bin/env python

import os
import sys
from pygments import highlight
from pygments.lexers import PythonLexer
from pygments.formatters import HtmlFormatter

try:
    code = os.environ['TM_SELECTED_TEXT']
except KeyError:
    sys.exit()

formatter = HtmlFormatter()
print highlight(code, PythonLexer(), formatter)

For Pygmentize… use the following:

#!/usr/bin/env python

import os
import sys
from commands import getoutput
from pygments import highlight
from pygments.lexers import get_all_lexers, get_lexer_by_name
from pygments.formatters import HtmlFormatter

try:
    code = os.environ['TM_SELECTED_TEXT']
except KeyError:
    sys.exit()

available_languages = ", ".join(sorted('"'+lex[1][0]+'"' for lex in get_all_lexers()))
chosen_language = getoutput("""echo $(osascript <<'AS'
    tell app "TextMate"
        activate
        choose from list { %(languages)s } \
            with title "Pick a language" \
            with prompt "Select a language"
    end tell
AS)""" % {'languages':available_languages})
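os.system("osascript -e 'tell app \"TextMate\" to activate' &>/dev/null &")

# osascript prints "false" if the dialog is cancelled; exit quietly in that case
if chosen_language in ("", "false"):
    sys.exit()
lexer = get_lexer_by_name(chosen_language.lower())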
formatter = HtmlFormatter() # linenos=False
print highlight(code, lexer, formatter)
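The `available_languages` line builds an AppleScript list literal by wrapping each lexer alias in double quotes. That works because Pygments aliases normally contain no quotes or backslashes; if you adapt the script to arbitrary strings, a slightly more defensive version escapes them first. The helper names below are hypothetical, and the sketch takes the names as a plain list, so it runs without Pygments.

```python
def applescript_string(text):
    """Quote text as an AppleScript string literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"%s"' % escaped


def applescript_list(items):
    """Build the body of an AppleScript list, e.g. for `choose from list`."""
    return ", ".join(applescript_string(item) for item in sorted(items))


print(applescript_list(["ruby", "python", "c++"]))
# → "c++", "python", "ruby"
```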

And finally for Pygmentize with line numbers… use the almost identical script below:

#!/usr/bin/env python

import os
import sys
from commands import getoutput
from pygments import highlight
from pygments.lexers import get_all_lexers, get_lexer_by_name
from pygments.formatters import HtmlFormatter

try:
    code = os.environ['TM_SELECTED_TEXT']
except KeyError:
    sys.exit()

available_languages = ", ".join(sorted('"'+lex[1][0]+'"' for lex in get_all_lexers()))
chosen_language = getoutput("""echo $(osascript <<'AS'
    tell app "TextMate"
        activate
        choose from list { %(languages)s } \
            with title "Pick a language" \
            with prompt "Select a language"
    end tell
AS)""" % {'languages':available_languages})
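os.system("osascript -e 'tell app \"TextMate\" to activate' &>/dev/null &")

# osascript prints "false" if the dialog is cancelled; exit quietly in that case
if chosen_language in ("", "false"):
    sys.exit()
lexer = get_lexer_by_name(chosen_language.lower())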
formatter = HtmlFormatter(linenos=True)
print highlight(code, lexer, formatter)

Step 7: Click on Text and drag and drop the Pygmentize commands to arrange the menu as shown below:

Editing the Text menu

Step 8: At this point everything should work, whether you invoke the commands through a keyboard shortcut or through the Text menu. However, you will need to upload and include a Pygments stylesheet from within your site. To generate a stylesheet run the following from the command line:

pygmentize -S default -f html > pygmentize.css

In the above command, default is the name of the style. For example, the Python code you see in this article is styled with the style pastie (because I globally adopted that stylesheet for this site). For a comparison of the available styles check out this demo page.

Step 9: ????

Step 10: Profit!

I hope these hacked-together commands can be useful to others. Feel free to customize and improve upon them as suits your needs.

UPDATE: I made a Pygments TextMate Bundle out of this.


This Week in Ruby (May 29, 2008)

This is the 9th episode of This Week in Ruby. Please consider subscribing to my feed so as not to miss any weekly installments.

Ruby

Two days ago JRuby 1.1.2 was released. Amongst several bug fixes and improvements, this release is characterized by a focus on performance. Startup time, threading, method calling and YAML symbol parsing have all been drastically improved.

Huw Collingbourne of SapphireSteel has announced that he’ll be releasing a complete book on Ruby online, chapter by chapter, free of charge. After reading the first chapter, I can attest that it’s excellent. Keep an eye on it as new chapters get added.

The Pragmatic Programmers put out a series of screencasts for sale. The most relevant series for Ruby programmers is Everyday Active Record. The first two episodes (half an hour long each) are out and can be purchased for just $5 apiece. The preview, and Ryan Bates’s reputation, lead me to believe that they are entirely worth their very reasonable sticker price. Speaking of screencasts, a new one about merb-slices was released on Merbunity; check it out if you’re into Merb.

There were two important releases last week: Mack 0.5.5, which features a new rendering engine with support for Haml and Markaby, and DataMapper 0.9, a major reworking of the ORM. A third release, perhaps just as welcome, came from _why, who added a few graphical improvements to Shoes, his GUI application toolkit. Definitely neat stuff, which I invite you to take a look at if you’re working on a Mac.

Peter Cooper published 21 Ruby Tricks You Should Be Using In Your Own Code. You probably already know at least the most common ones, but they’re quick and fun, so if you haven’t checked out the post yet, take a moment and do so. Other must-read tutorials and articles were Ruby && DTrace! (really neat results), Ruby EventMachine – The Speed Demon by one of my favorite Ruby bloggers, and Will’s Guide to Mashing-up Remote Databases using Page Scraping.

In a post made a couple of days ago, Robert Fischer opened up a can of worms by bringing up the issue of Ruby and XML libraries. As most of you know, REXML is far from being issue-free (performance in primis), and in The Status of Ruby’s libxml Robert reveals that the author of LibXml Ruby is unable to actively pursue the development of his extension. This issue concerns me, but when I’m working with databases, I prefer to take advantage of DB2 Express-C’s fantastic pureXML features, which give me the sort of speed, flexibility and stability that I won’t find in a Ruby library anytime soon.

Before highlighting some of the news from Rails-land, I wanted to inform you that a new version of The Great Ruby Shootout will surface in June, as I intend to test a couple of special new entries.

Rails

Today RailsConf 2008 started, and it stands a great chance of being remembered as an exhilarating event. A few people asked whether they could meet me there, but unfortunately I couldn’t make it. Chances are that you’re reading this post from RailsConf. If that’s the case, say hi for me and don’t forget to visit the nice fellas from Engine Yard, Morph (my sponsor), Phusion and GemStone. Oh, and feel free to pass around the URL of this entry. ;)

Rails 2.1 RC1 is out, so you’ll find this article on upgrading to Rails 2.1.0_RC1 useful. Fabio Akita released a new version of his popular tutorials, Rolling with Rails 2.1 (part 1 and part 2). And if you are looking for an advanced authentication/authorization system for Rails 2, take a gander at Lockdown on RubyForge.

My friends at SeeSaw implemented a series of Rails Widgets which can easily be installed as a Rails plugin. Feel free to use them and/or contribute, in order to add further support for simplifying and reusing common UI elements. Speaking of shiny things, check out this Ruby on Rails icon pack; very pleasing to the eye, in my opinion.

RubyInside published a list of 28 mod_rails / Passenger Resources To Help You Deploy Rails Applications Faster. As DHH forecasted, “this could definitely become very popular, very fast ;) ”.

New Relic released RPM, their solution for monitoring and improving the performance of Rails applications, to the general public. You can get it here.

And finally, some great news just came in, IronRuby is running unmodified Rails. “Excellent” (said in Montgomery Burns’ voice, complete with characteristic hand gesture).


Using Python to detect the most frequent words in a file

Working with Python is nice. Just like Ruby, it usually doesn’t get in the way of my thought process and it comes “with batteries included”. Let’s consider the small task of printing a list of the N most frequent words within a given file:

from string import punctuation
from operator import itemgetter

N = 10
words = {}

words_gen = (word.strip(punctuation).lower() for line in open("test.txt")
                                             for word in line.split())

for word in words_gen:
    words[word] = words.get(word, 0) + 1

top_words = sorted(words.iteritems(), key=itemgetter(1), reverse=True)[:N]

for word, frequency in top_words:
    print "%s: %d" % (word, frequency)

I won’t provide a step-by-step explanation of what I believe is already rather understandable. There are, however, a few points worth clarifying for those who are not too familiar with the language. First and foremost, I love using Generator Expressions because they are lazily evaluated and have a math-like readability. They are simply a very convenient way of creating generator objects. Notice how in the snippet I favor them over reading the whole file into a single string by chaining the read() method onto open(). Iterating over the file line by line means that only one line at a time has to be held in memory, which makes a real difference with large files. Generator Expressions and List Comprehensions are extremely useful language features inherited from the world of functional programming, and I’m glad that Python fully embraces them.

In the explicit for loop (the third for in the script, after the two inside the generator expression) we count words, storing each word and its frequency in the words dictionary (similar to a Ruby Hash). Notice how the method get() lets us specify a default value of 0 before incrementing the counter, in case the given key doesn’t exist yet (which means that the word we are adding hasn’t been encountered before). We pass operator.itemgetter(1) to the sorted() function through the key keyword argument (another nice Python feature). itemgetter() returns a callable object that fetches the given item(s) from its operand; in our case, this means that we can tell sorted() to sort the dictionary’s (key, value) pairs by their value (the frequency of the words) rather than by their key (the words themselves).
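For readers who want to see exactly what itemgetter() hands to sorted(), here is a small sketch. It is written for Python 3 (where print() is a function), and the (word, count) pairs are made up for illustration:

```python
from operator import itemgetter

# Made-up (word, count) pairs, in the shape words.items() would produce.
pairs = [("apple", 2), ("zebra", 2), ("mango", 1)]

# itemgetter(1) is a callable that returns element 1 of its argument.
get_count = itemgetter(1)
print(get_count(("zebra", 2)))  # prints 2

# Sorting on the count alone, highest first: tied words keep their
# input order, because Python's sort is stable even with reverse=True.
by_count = sorted(pairs, key=itemgetter(1), reverse=True)
print(by_count)  # [('apple', 2), ('zebra', 2), ('mango', 1)]

# itemgetter(1, 0) returns a (count, word) tuple, so reverse=True also
# reverses the words: ties come out in descending alphabetical order.
by_count_then_word = sorted(pairs, key=itemgetter(1, 0), reverse=True)
print(by_count_then_word)  # [('zebra', 2), ('apple', 2), ('mango', 1)]
```

The last result is the behavior that makes a plain key=itemgetter(1, 0) unsatisfying when you want tied words in ascending order.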

Unfortunately there is a problem with this code. It correctly sorts the most frequent words in the file, but words with equal frequencies won’t be alphabetically ordered. Given that we specified a reverse order for the sorted() function, we could simply pass it key=itemgetter(1, 0) to order by value first and by key second, but both in descending order, so tied words would come out in reverse alphabetical order. In most cases, you’ll want words with equal frequencies to appear in ascending alphabetical order instead. With a few changes to the code, this can be easily achieved:

from string import punctuation

def sort_items(x, y):
    """Sort by value first, and by key (reverted) second."""
    return cmp(x[1], y[1]) or cmp(y[0], x[0])

N = 10
words = {}

words_gen = (word.strip(punctuation).lower() for line in open("test.txt")
                                             for word in line.split())

for word in words_gen:
    words[word] = words.get(word, 0) + 1

top_words = sorted(words.iteritems(), cmp=sort_items, reverse=True)[:N]

for word, frequency in top_words:
    print "%s: %d" % (word, frequency)

Previously we specified which key sorted() should use, while in this case we have much finer control. By defining the function sort_items() and passing it (functions are first-class objects in Python) as the cmp argument of sorted(), we get to define how two items of the dictionary should be compared. The function defined at the beginning of the script returns -1, 0 or 1, depending on how the two key-value pairs compare. The returned value is cmp(x[1], y[1]) or cmp(y[0], x[0]). This may seem complicated, but the trick is rather easy. The first part compares the frequencies of the two words and returns 1 or -1 if one is greater than the other. If they are equal, the expression to the left of the or evaluates to 0, which is false, so the value of the expression on the right of the or is returned. On the right we compare the keys (the words), but swap the order of the arguments, y and x, to cancel out the reversed ordering requested from sorted().
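Python 3 removed both the built-in cmp() and the cmp argument of sorted(), so the snippet above only runs on Python 2. To keep the comparison-function approach on Python 3, functools.cmp_to_key() turns an old-style comparison function into a key. A sketch, where the small cmp() helper is our own stand-in for the removed built-in and the word counts are made up:

```python
from functools import cmp_to_key

def cmp(a, b):
    """Stand-in for Python 2's built-in cmp(): returns -1, 0 or 1."""
    return (a > b) - (a < b)

def sort_items(x, y):
    """Sort by value first, and by key (reverted) second."""
    return cmp(x[1], y[1]) or cmp(y[0], x[0])

words = {"zebra": 2, "apple": 2, "mango": 1}

# cmp_to_key() wraps sort_items so that sorted() can use it as a key.
top_words = sorted(words.items(), key=cmp_to_key(sort_items), reverse=True)
print(top_words)  # [('apple', 2), ('zebra', 2), ('mango', 1)]
```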

Finally, for those who prefer to use a lambda expression, rather than to define a function, we can write the following:

from string import punctuation

N = 10
words = {}

words_gen = (word.strip(punctuation).lower() for line in open("test.txt")
                                             for word in line.split())

for word in words_gen:
    words[word] = words.get(word, 0) + 1

top_words = sorted(words.iteritems(),
                   cmp=lambda x, y: cmp(x[1], y[1]) or cmp(y[0], x[0]),
                   reverse=True)[:N]

for word, frequency in top_words:
    print "%s: %d" % (word, frequency)

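Or, simplified further, by getting rid of reverse=True and using key rather than cmp. Negating the count sorts frequencies in descending order, while the words themselves stay in ascending order: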

from string import punctuation

N = 10
words = {}

words_gen = (word.strip(punctuation).lower() for line in open("test.txt")
                                             for word in line.split())

for word in words_gen:
    words[word] = words.get(word, 0) + 1

top_words = sorted(words.iteritems(),
                   key=lambda(word, count): (-count, word))[:N] 

for word, frequency in top_words:
    print "%s: %d" % (word, frequency)
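On Python 3, dict.iteritems(), the print statement and tuple unpacking in a lambda’s parameter list are all gone. Here is a minimal Python 3 sketch of the same program built on collections.Counter. top_words() is a hypothetical helper name, and it also discards the empty strings produced by tokens made only of punctuation (such as “--”), which the versions above would count as a word:

```python
from collections import Counter
from string import punctuation

def top_words(lines, n=10):
    """Return the n most frequent words as (word, count) pairs.

    Ties are broken alphabetically, as in the last version above.
    """
    counts = Counter(word.strip(punctuation).lower()
                     for line in lines
                     for word in line.split())
    counts.pop("", None)  # tokens made only of punctuation strip to ""
    # Negated count: most frequent first; word: ascending on ties.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:n]

sample = ["The cat and the hat.", "The end -- and THE cat!"]
print(top_words(sample, 3))  # [('the', 4), ('and', 2), ('cat', 2)]
```

Counter.most_common(n) would be shorter, but it orders equal counts by first appearance rather than alphabetically, hence the explicit sort. To read a file, pass an open file object, for instance top_words(f) inside a with open("test.txt") as f: block.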

Please bear in mind that the code makes a few assumptions so as to keep things simple. As it stands, the script would consider “l’amore” as a single word, and an accidental lack of spaces wouldn’t be accounted for (e.g. “word.Another” would be a single word too). The replace() method can be used to address these sorts of special cases.
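Instead of patching individual cases with replace(), another option is the re module: treat any run of letters as a word, so that punctuation anywhere acts as a separator. A sketch (tokenize() is a hypothetical helper); note the trade-off, since contractions such as “don’t” get split as well:

```python
import re

# [^\W\d_] matches a "word character" that is neither a digit nor an
# underscore, i.e. a letter (Unicode-aware for str patterns in Python 3).
WORD_RE = re.compile(r"[^\W\d_]+")

def tokenize(text):
    """Split text into lowercase runs of letters."""
    return WORD_RE.findall(text.lower())

print(tokenize("word.Another"))  # ['word', 'another']
print(tokenize("l'amore"))       # ['l', 'amore']
```

The resulting list can be fed to the counting loop (or to a Counter) in place of the split()/strip() generator.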

Sure, this was a rather trivial example, born from an IPython session, but I think it shows off Python’s expressiveness and flexibility when dealing with problems that, approached in some other languages, would call for much more verbose and error-prone code. Batteries included indeed.


Installing Django with PostgreSQL on Ubuntu

This how-to is essentially the same as my previous one, only this time I’ve provided step-by-step instructions for installing Django with PostgreSQL on Ubuntu 7.10.

First and foremost, we are going to install Django from its svn repository, as opposed to obtaining the 0.96 release archive. The reason for this is that the trunk version implements a few new features. The development code is also rather stable and is used in production by many people, even for sites like the Washington Post.

Install Subversion

sudo apt-get install subversion

Checkout Django

svn co http://code.djangoproject.com/svn/django/trunk django_trunk

Tell Python where Django is

Ubuntu already ships with Python 2.5.1, so you won’t have to install it. You can verify this by running python in your shell (use exit() to get out of the Python shell). What you need to do is tell Python where your django_trunk directory is. To do this, create the following file:

/usr/lib/python2.5/site-packages/django.pth

Within this file, place only one line containing the path to your django_trunk folder. In my case, this is:

/home/antonio/django_trunk

Of course, change it to the full path location of the directory on your filesystem.
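To see what the .pth file actually does, here is a small Python 3 sketch that reproduces the mechanism with temporary directories. The paths are stand-ins, not your real site-packages or checkout; at startup Python’s site module processes the .pth files in its own site-packages directories, and site.addsitedir() applies the same processing to any directory you give it:

```python
import os
import site
import sys
import tempfile

# Hypothetical stand-ins for site-packages and for ~/django_trunk.
site_packages = tempfile.mkdtemp()
django_trunk = tempfile.mkdtemp()

# django.pth holds a single line: the directory to add to sys.path.
with open(os.path.join(site_packages, "django.pth"), "w") as f:
    f.write(django_trunk + "\n")

# Adds site_packages to sys.path, then appends every existing directory
# listed in the .pth files found there.
site.addsitedir(site_packages)
print(django_trunk in sys.path)  # True
```

Because only existing directories are appended, a typo in the path inside django.pth fails silently, which is worth remembering if import django does not work afterwards.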

Add django-admin.py to your PATH

The bin directory within the django folder (which is inside django_trunk itself) contains several management utilities. We therefore need to add the following to the PATH (again, change it to your own location):

/home/antonio/django_trunk/django/bin

How you go about doing this depends on the shell you are using, and I’m assuming you are able to export a shell variable on your own. If you are using the bash shell (as I do), you could export it in .bashrc. Alternatively, you could just create a symlink to the django-admin.py utility in /usr/bin, but I recommend the former approach.

Install PostgreSQL and Psycopg2

sudo apt-get install postgresql pgadmin3 python-psycopg2

This will install PostgreSQL 8.2.5, PgAdmin III and the driver Psycopg2 for you. Most people at this point will ask, what’s the default password for PostgreSQL on Ubuntu? You can use the following instructions to set the password for the user postgres both in Ubuntu and within PostgreSQL:

sudo su -
passwd postgres
su postgres
psql template1

The last instruction should open the psql shell, where you can run the following:

ALTER USER postgres WITH ENCRYPTED PASSWORD 'mypassword';

Verify the installation

You should be all set now, but let’s verify this right away. Open the shell, start the Python interpreter with the python command, and run the following:

>>> import django
>>> print django.VERSION
(0, 97, 'pre')
>>> import psycopg2
>>> psycopg2.apilevel
'2.0'

Exit the Python shell by running exit(), and verify that django-admin.py is in your path:

django-admin.py
Type 'django-admin.py help' for usage.

If you obtain a similar output for all three of them, you are really set.
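If you would rather run these checks in one go, a small script can do it. This is a Python 3 sketch (check_modules() is a hypothetical helper, not part of Django) that reads the same attributes used above: django.VERSION and the DB-API apilevel exposed by drivers such as psycopg2:

```python
import importlib

def check_modules(names):
    """Import each module; map its name to a version marker or None."""
    results = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:  # also covers ModuleNotFoundError
            results[name] = None
            continue
        # Django exposes VERSION, DB-API drivers expose apilevel; fall
        # back to a plain marker when neither attribute exists.
        results[name] = getattr(module, "VERSION",
                                getattr(module, "apilevel", "installed"))
    return results

for name, info in check_modules(["django", "psycopg2"]).items():
    print("%s: %s" % (name, info if info is not None else "NOT FOUND"))
```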

Where to go from here

Now that Django is installed, you can go read the Django Book 1.0, which is available for free online. Something equally well done and useful is really missing from the Rails community. Above all, experiment: Django (and programming in general) is learnt by doing. The Definitive Guide to Django: Web Development Done Right is also available for purchase in its dead-tree version, which just came out. It’s cheap and it’s already a best seller on Amazon. Despite the availability of a free version online, I like having paper versions of tech books so that I can read without staring at the monitor. Furthermore, I like to reward the authors (who are also the framework creators), while encouraging publishers that allow authors to make their books available for free on the web. Well done guys!


How to install Django with MySQL on Mac OS X

Installing Django on Mac OS X Leopard is supposed to be very straightforward, but if you are new to it, you may encounter a few puzzling questions and, in the case of MySQL, even a couple of headaches. I’m writing about this for the benefit of those of you who may attempt and struggle with this feat. MacPorts is not required for this how-to.

First and foremost, we are going to install Django from its svn repository, as opposed to obtaining the 0.96 release archive. The reason for this is that the trunk version implements a few new features. The development code is also rather stable and is used in production by many people, even for sites like the Washington Post.

Checkout Django

svn co http://code.djangoproject.com/svn/django/trunk django_trunk

Tell Python where Django is

Mac OS X 10.5 already ships with Python 2.5.1, so you won’t have to install it. You can verify this by running python in the Terminal (use exit() to get out of the Python shell). What you need to do is tell Python where your django_trunk directory is. To do this, create the following file:

/Library/Python/2.5/site-packages/django.pth

Within this file, place only one line containing the path to your django_trunk folder. In my case, this is:

/Users/Antonio/Code/django_trunk

Of course, change it to the full path location of the directory on your filesystem.

Add django-admin.py to your PATH

The bin directory within the django folder (which is inside django_trunk itself) contains several management utilities. We therefore need to add the following to the PATH (again, change it to your own location):

/Users/Antonio/Code/django_trunk/django/bin

How you go about doing this depends on the shell you are using, and I’m assuming you are able to export a shell variable on your own. If you are using the bash shell (as I do), then you should have a .profile file in your home directory where you can export it. Alternatively, you could just create a symlink to the django-admin.py utility in /usr/bin, but I recommend the former approach.
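After opening a new Terminal window, you can double-check the result from Python. A minimal Python 3 sketch: dir_on_path() is a hypothetical helper, the bin directory below is only an example to adjust, and shutil.which() is the standard library function that searches PATH the way the shell does:

```python
import os
import shutil

def dir_on_path(directory, path=None):
    """Return True if directory is one of the entries in a PATH string."""
    if path is None:
        path = os.environ.get("PATH", "")
    # normpath() makes "/a/bin/" and "/a/bin" compare equal.
    wanted = os.path.normpath(os.path.expanduser(directory))
    return any(os.path.normpath(entry) == wanted
               for entry in path.split(os.pathsep) if entry)

# Example location; change it to your own checkout.
django_bin = "~/Code/django_trunk/django/bin"
print("bin directory on PATH:", dir_on_path(django_bin))

# First executable named django-admin.py found on PATH, or None.
print("django-admin.py found at:", shutil.which("django-admin.py"))
```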

Grab and install MySQL

I would normally recommend PostgreSQL, at least until we have DB2 on the Mac, but I realize that many of you use and prefer MySQL. It also seems to be the only one that requires special instructions, due to a few issues that come up when trying to get MySQL and Python to work together. You can install MySQL by grabbing and running one of the packages available on the official site. Choose the one for x86 and Mac OS X 10.4.

Install the MySQLdb driver

Get MySQL-python-1.2.2.tar.gz from SourceForge. Please follow these instructions exactly, because the source code won’t compile out of the box; trying to build it gives the following error:

/usr/include/sys/types.h:92: error: duplicate 'unsigned'
/usr/include/sys/types.h:92: error: two or more data types
in declaration specifiers
error: Setup script exited with error: command 'gcc' failed

Run the following:

tar xvfz MySQL-python-1.2.2.tar.gz
cd MySQL-python-1.2.2

At this point, edit the _mysql.c file and comment out lines 37, 38 and 39 as follows:

//#ifndef uint
//#define uint unsigned int
//#endif

Now, from the MySQL-python-1.2.2 folder run:

python setup.py build
sudo python setup.py install

If you still get an error (and only in that case), you’ll need to edit the site.cfg file within the same folder and set threadsafe = False, before running the two commands above once again.
If instead you don’t receive an error but see warnings about files not required on this architecture, don’t be concerned about them. The last step required is to create a symbolic link with the following command:

sudo ln -s /usr/local/mysql/lib/ /usr/local/mysql/lib/mysql

All these adjustments are required because we are building and installing the driver on Mac and not on Linux.

Verify the installation

You should be all set now, but let’s verify this right away. Open Terminal and run the following commands in the python shell (start this with the python command).

Verify that MySQLdb is correctly installed:

>>> import MySQLdb
>>> MySQLdb.apilevel
'2.0'

Now, verify that Django is working:

>>> import django
>>> print django.VERSION
(0, 97, 'pre')

Exit the Python shell by running exit(), and verify that django-admin.py is in your path:

django-admin.py
Type 'django-admin.py help' for usage.

If you obtain a similar output for all three of them, you are really set to write the next YouTube.

Where to go from here

Now that Django is installed, you can go read the Django Book 1.0, which is available for free online. Something equally well done and useful is really missing from the Rails community. Above all, experiment: Django (and programming in general) is learnt by doing. The Definitive Guide to Django: Web Development Done Right is also available for purchase in its dead-tree version, which just came out. It’s cheap and it’s already a best seller on Amazon. Despite the availability of a free version online, I like having paper versions of tech books so that I can read without staring at the monitor. Furthermore, I like to reward the authors (who are also the framework creators), while encouraging publishers that allow authors to make their books available for free on the web. Well done guys!


Installing DB2 9.5 on Ubuntu 7.10

The official IBM site now has DB2 Express-C 9.5 available for download. Like its previous versions it’s entirely free, so why don’t you give it a try?

When running db2setup, after extracting the archive you downloaded, you may be disappointed to see that on Ubuntu 7.10 the setup program gives you the following error right out of the box:

ERROR:
 The required library file libstdc++.so.5 is not found on the system.
ERROR:
 The required library file libaio.so.1 is not found on the system.
 Check the following web site for the up-to-date system requirements
 of IBM DB2 9.5

http://www.ibm.com/software/data/db2/udb/sysreqs.html

http://www.software.ibm.com/data/db2/linux/validate

/home/antonio/Desktop/exp/db2/linux/install/../bin/db2usrinf:
error while loading shared libraries: libstdc++.so.5:
cannot open shared object file: No such file or directory
[: 609: 0: unexpected operator
/home/antonio/Desktop/exp/db2/linux/install/../bin/db2langdir:
error while loading shared libraries: libstdc++.so.5:
cannot open shared object file: No such file or directory
/home/antonio/Desktop/exp/db2/linux/install/../bin/db2langdir:
error while loading shared libraries: libstdc++.so.5:
cannot open shared object file: No such file or directory
DBI1055E The message file db2install.cat cannot be found.

Explanation:  The message file required by this
script is missing from the system; it may have been
deleted or the database products may have been loaded
incorrectly.

User Response:  Verify that the product option containing
the message file is installed correctly.  If there are
verification errors; reinstall the product option.


Here is a handy tip, just in case you get stuck with this error. You can install the prerequisites and run the DB2 setup program simply by executing the three commands below (in order).

sudo apt-get install libstdc++5
sudo apt-get install libaio-dev
sudo ./db2setup
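If you want to confirm that the two libraries are in place before launching the installer again, a quick pre-flight check is easy to write. This Python 3 sketch is hypothetical: missing_libraries() is our own helper, and the directory list is an assumption, since the dynamic linker’s real search path comes from /etc/ld.so.conf and may include other directories:

```python
import os

# Library files db2setup complained about, from the error output above.
REQUIRED = ["libstdc++.so.5", "libaio.so.1"]

# Common library directories (an assumption, not the linker's full list).
LIB_DIRS = ["/lib", "/usr/lib",
            "/lib/i386-linux-gnu", "/usr/lib/i386-linux-gnu",
            "/lib/x86_64-linux-gnu", "/usr/lib/x86_64-linux-gnu"]

def missing_libraries(required=REQUIRED, lib_dirs=LIB_DIRS):
    """Return the required library files not found in any of lib_dirs."""
    return [lib for lib in required
            if not any(os.path.exists(os.path.join(d, lib))
                       for d in lib_dirs)]

missing = missing_libraries()
print("missing:", ", ".join(missing) if missing else "none, run db2setup")
```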


I hope this quick tip saves you some time and a headache.


Redirecting Atom feeds from Typo and WordPress to FeedBurner

With the recent switch from Typo to WordPress I had to address the issue of handling my existing and new feeds. On top of that I decided that it was the right moment to switch to FeedBurner, so I had to handle the redirect for both Typo and WordPress at the same time. If you prefer Atom 1.0 over RSS 2.0 (you should), this brief post will tell you how to migrate to FeedBurner and Atom.

Where is your Atom feed?

When you join FeedBurner, you may want to create two feeds, one for your posts and another one for the comments. When you do so, make sure to give FeedBurner your Atom 1.0 feeds for both of them. Typo and WordPress use RSS as their default (auto-discovered) feed format, so you may wonder where the Atom feeds are located. Use the following:

  • Typo: http://yourdomain.com/xml/atom/feed.xml and http://yourdomain.com/xml/comments/atoms/comments/feed.xml
  • WordPress: http://yourdomain.com/wp-atom.php and http://yourdomain.com/comments/feed/atom/

If you already have feeds at FeedBurner, you can always edit them in order to add the Atom URLs. This will provide your readers with an Atom feed served directly from FeedBurner, but you are still left with a problem. Existing subscribers, visitors who arrive at your blog and find links to the RSS feed, and readers who know the URL of the Typo/WordPress Atom feed will all bypass FeedBurner. One of the main reasons for using FeedBurner in the first place is to access statistics about your readership, so you want all your subscribers to go through FeedBurner.

Using Typo

mod_rewrite comes to the rescue and with a few changes to your main .htaccess file, both RSS and Atom feeds will redirect to FeedBurner. If you are using Typo, your users are subscribing to one of the following feeds:

  • http://yourdomain.com/xml/atom/feed.xml
  • http://yourdomain.com/xml/rss/feed.xml
  • http://yourdomain.com/xml/rss20/feed.xml

You will then need to add the following to the .htaccess file located in your public_html directory:

RewriteCond %{HTTP_USER_AGENT} !^FeedBurner.*$
RewriteRule ^xml/(atom|rss|rss20)/feed.xml$ http://feeds.feedburner.com/YourSite [R=301,L]



Using WordPress

With WordPress the whole process is simplified by a plugin called FeedBurner FeedSmith. Once you have installed it (by copying it into the wp-content/plugins directory) and activated it from the Plugins page, you will have to fill in the details of your feeds (on the Options -> FeedBurner page), as shown in the figure below.

FeedBurner option page

This plugin is handy because it does all the dirty work for you, but should you want to handle this from .htaccess, you can do so by just adapting the technique used above for Typo.

Switching from Typo to WordPress

If you are switching from Typo to WordPress and decide to adopt FeedBurner, you can combine the two above so that both existing and new users obtain the right feeds for your articles and comments. It is however unlikely that you want to create a feed at FeedBurner for each category and tag existing in your blog. It is very likely that you are going to adopt a different URL structure in WordPress from Typo’s default, and therefore a few extra redirects are in order. The following example assumes that you went from Typo to WordPress, and that you configured FeedBurner FeedSmith:

# Redirects Typo tags to WordPress Categories
RewriteRule ^articles/tag/(.*)$ /category/$1 [R=301,L]
# Redirects Typo permalinks for articles and pages
RewriteRule ^(pages|articles)/(.*)$ /$2 [R=301,L]
# Redirects Posts and Comments feeds to FeedBurner
RewriteRule ^xml/(atom|rss|rss20)/feed\.xml$ http://feeds.feedburner.com/YourSite [R=301,L]
RewriteRule ^xml/(atom|rss|rss20)/comments/feed\.xml$ http://feeds.feedburner.com/CommentsYourSite [R=301,L]
# Redirects Typo feeds for tags and categories to WordPress category feeds
RewriteRule ^xml/(atom|rss|rss20)/(category|tag)/(.*)/feed\.xml$ /category/$3/feed [R=301,L]


As you can see, the first two lines of code (excluding comments) take care of redirecting the existing links to articles, pages and tags. The following two lines redirect the old Typo feeds for posts and comments to FeedBurner, and finally the last line redirects the feeds for the tags and categories in Typo to their respective category feeds in WordPress.
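One detail deserves care when writing these rules: alternation has the lowest precedence in a regular expression, so an ungrouped pattern like ^xml/atom|rss|rss20/feed.xml$ means “^xml/atom” OR “rss” OR “rss20/feed.xml$”, and matches far more than intended. Python’s re module follows the same precedence rule as the PCRE engine behind mod_rewrite, which makes it a convenient place to test a pattern before deploying it. A small sketch (the sample URLs are made up):

```python
import re

ungrouped = re.compile(r"^xml/atom|rss|rss20/feed.xml$")
grouped = re.compile(r"^xml/(atom|rss|rss20)/feed\.xml$")

tag_feed = "xml/atom/tag/ruby/feed.xml"
article = "articles/why-rss-matters"

# The ungrouped pattern catches a tag feed and even an ordinary
# article whose URL merely contains "rss".
print(bool(ungrouped.search(tag_feed)), bool(ungrouped.search(article)))
# prints: True True

# The grouped pattern only matches the main feed URLs.
print(bool(grouped.search(tag_feed)), bool(grouped.search(article)))
# prints: False False
print(bool(grouped.search("xml/rss20/feed.xml")))  # prints: True
```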


Ruby and Rails books, Textmate and FastRi

  • Ruby and Rails books keep popping up on the (virtual and real) shelves, which means that it may be slightly puzzling for newcomers to decide which books to spend their hard-earned cash on. In the spirit of providing guidance in this process, I’ve prepared the Recommended Books for Ruby and Rails page. The recommendations are organized by skill level and should provide the reader with a logical sequence of increasingly challenging reading material, making it easier for the new developer or student to identify worthwhile books.
  • The more I become acquainted with Textmate, the more I’m impressed by such a nice little editor. I’ve had next to no time to play with it, and yet I’ve already added new functionalities to some of the existing bundles. In particular, I’ve committed a patch that extends the Textile bundle features, and I’ve become the maintainer of that bundle, with SVN access. Textmate bundles are very easy to customize and extend, and this flexibility is really appreciated from a development standpoint.
  • ri is an indispensable tool for the Ruby programmer, but you may have noticed that it’s a bit sluggish. If you haven’t done so already, do yourself a big favor and install the FastRi gem. FastRi provides various enhancements and advanced functionalities over the standard ri tool. But even when it’s used simply as a local replacement for ri, it is significantly faster and, in my opinion, more practical. For example, if you look up String#o, ri lists all the methods of the String class that contain the letter ‘o’, while FastRi outputs only the methods starting with ‘o’, which is what you actually want in most cases. Install it with $ sudo gem install fastri and $ sudo fastri-server -b. Then use it like this (qri is used in stand-alone mode, fri to connect to a remote server): $ qri String#scan.

Top 10 Ruby on Rails performance tips

Please note that this article is now obsolete.

The performance of Ruby on Rails is influenced by many factors, particularly the configuration of your deployment server(s). However, the application code can make a big difference and determine whether your site is slow or highly responsive. This short article covers some tips and coding best practices for improving performance within the Rails application itself, and won’t attempt to cover server configuration improvements for the various deployment options.

  1. Optimize your Ruby code: this may seem obvious, but a Rails application is essentially Ruby code that has to be run. Make sure your code is efficient from a Ruby standpoint. Take a look at your code and ask yourself if some refactoring is in order, keeping in mind performance considerations and algorithmic efficiency. Profiling tools are, of course, very helpful in identifying slow code, but the following are some general considerations (some of them may admittedly appear obvious to you):
    • When available use the built-in classes and methods, rather than rolling your own;
    • Use Regular Expressions rather than costly loops, when you need to parse and process all but the smallest text;
    • Use Libxml rather than the slower REXML if you are processing XML documents;
    • Sometimes you may want to trade off just a bit of elegance and abstraction for speed (e.g. define_method and yield can be costly);
    • The best way to resolve slow loops is to remove them if possible. Not always, but in a few cases you can avoid loops by restructuring your code;
    • Simplify and reduce nested if/unless as much as you can and remember that the operator ||= is your friend;
    • Hashes are expensive data structures. Consider storing the value for a given key in a local variable if you need to recall the value a few times. More generally, it’s a good idea to store any frequently accessed data structure in a variable (local, instance or class variable).
  2. Caching is good: caching can significantly speed up your application. In particular:
  3. Use your database to the full extent of the law :) : don’t be afraid of using the cool features provided by your database, even if they are not directly supported by Rails and doing so means bypassing ActiveRecord. For example define stored procedures and functions, knowing that you can use them by communicating directly with the database through driver calls, rather than ActiveRecord high level methods. This can hugely improve the performance of a data bound Rails application.
  4. Finders are great but be careful: finders are very pleasant to use, enable you to write readable code and don’t require in-depth SQL knowledge. But the nice high-level abstraction comes with a computational cost. Follow these rules of thumb:
    • Retrieve only the information that you need. A lot of execution time can be wasted by running selects for data that is not really needed. When using the various finders, make sure to provide the right options to select only the fields required (:select), and if you only need a numbered subset of records from the result set, specify a limit (with the :limit and :offset options).
    • Don’t kill your database with too many queries, use eager loading of associations through the include option:
      # This generates only one query,
      # rather than Post.count + 1 queries
      for post in Post.find(:all,
                            :include => [ :author, :comments ])
        # Do something with post
      end
    • Avoid dynamic finders like MyModel.find_by_*. While using something like User.find_by_username is very readable and easy, it can also cost you a lot. In fact, ActiveRecord dynamically generates these methods within method_missing, and this can be quite slow. Furthermore, once the method is defined and invoked, the mapping to the model attribute (username in our example) is ultimately achieved through a select query that has to be built before being sent to the database. Using MyModel.find_by_sql directly, or even MyModel.find, is much more efficient;
    • Be sure to use MyModel.find_by_sql whenever you need to run an optimized SQL query. Even if the final SQL statement ends up being the same, find_by_sql is more efficient than the equivalent find (there is no need to build the actual SQL string from the various options passed to the method). If you are building a plugin that needs to be cross-platform, though, verify that the SQL queries will run on all Rails-supported databases, or just use find instead. In general, using find is more readable and leads to more maintainable code, so before filling your application with find_by_sql, do some profiling and identify the slow queries that may need to be customized and optimized manually.
  5. Group operations in a transaction: ActiveRecord wraps the creation or update of a record in a single transaction. Multiple inserts will then generate many transactions (one for each insert). Grouping multiple inserts in one single transaction will speed things up.

    Instead of:

     my_collection.each do |q|
       Quote.create({:phrase => q})
     end

    Use:

    Quote.transaction do
     my_collection.each do |q|
       Quote.create({:phrase => q})
     end
    end

    or for rolling back the whole transaction if any insert fails, use:

    Quote.transaction do
     my_collection.each do |q|
       quote = Quote.new({:phrase => q})
       quote.save!
     end
    end
  6. Control your controllers: filters are expensive, so don’t abuse them. Also, don’t set instance variables that your views don’t actually need (they are not lightweight).
  7. Use HTML for your views: in your view templates don’t overuse helpers. Every time you use form helpers you are introducing an extra step. Do you really need a helper to write the HTML for a link, a textbox or a form for you? (You may even make your designer, who doesn’t know Ruby, happy!)
  8. Logging: configure your applications so that they log only the information that is absolutely vital to you. Logging is an expensive operation and an inappropriate level (e.g. Logger::DEBUG) can cripple your production application.
  9. Patch the GC: OK, not really a coding issue, but patching Ruby’s Garbage Collection is strongly advised and will improve the speed of your Ruby and Rails applications significantly.
  10. A final note: I don't advocate premature optimization, but if you can, work on your code with these principles in mind (without overdoing it). Last-minute changes and tweaks are possible but less desirable than a “performance aware” style of coding. Profile your applications, benchmark them and have fun experimenting.
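To illustrate tip 8 outside of Rails, here is a minimal sketch using Ruby's standard Logger (in a Rails app the level is normally set per environment instead). It shows two things: raising the level to WARN discards DEBUG messages, and passing the message as a block means the possibly expensive string is never even built when the message won't be logged. The StringIO destination and the sample messages are just illustrative assumptions.

```ruby
require 'logger'
require 'stringio'

# Illustrative log destination; a real app would log to a file.
output = StringIO.new
logger = Logger.new(output)
logger.level = Logger::WARN # DEBUG and INFO messages are discarded

evaluated = false
# Block form: Logger only calls the block if the message will be written,
# so at WARN level this expensive string is never constructed.
logger.debug { evaluated = true; "expensive dump: #{(1..1000).to_a.inspect}" }
logger.warn("disk space low")

puts output.string # contains only the WARN line
```

The block form is worth adopting for any DEBUG message that interpolates large objects, since the string-building cost disappears entirely in production.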

Typo errors on postponed articles

This is a tip that I'm writing in the hope of helping Typo users who google for the following error messages: uninitialized constant ActionController::TestRequest or undefined method `publish!' for nil:NilClass.

While publishing an article on db2onrails.com, I selected a future publishing date for the post (just to round it off to the half-hour mark). I had done this in the past without any problems, but for some bizarre reason (most likely a Typo 4.0.3 bug) it didn't work this time. The article wouldn't show up and the /admin section would raise the typical Rails Application Error. There was obviously an issue with the application, and the homepage only appeared to work because its content was cached.

At this point, I restarted fcgi, reloaded the homepage and boom: uninitialized constant ActionController::TestRequest. If you open phpMyAdmin or access your database by any other means, you should delete the ‘offending’ article that you intended to publish (you can find it in the contents table). If you reload your application now, you should see another error: undefined method `publish!' for nil:NilClass. This happens because the triggers table still holds a publishing action waiting to be executed on an article that no longer exists.

If you encountered these errors in Typo right after creating a post dated in the future, follow the instructions below for a quick fix:

  1. Delete the problematic article from the contents table. First take note of its id and make a copy of its body if you don't have it already (you are going to publish it again in a few moments).
  2. Delete the corresponding record from the triggers table. You'll recognize the row by its pending_item_id (the same number as the id of the record you just deleted from contents) and by its trigger_method field, which is set for publishing.
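If you prefer to run the two deletions as SQL statements (for example in phpMyAdmin's SQL tab), here is a small Ruby sketch that builds them for a given article id. The helper name typo_cleanup_sql is hypothetical, and the 'publish!' value for trigger_method is an assumption based on the error message; check the actual row in your triggers table before running anything.

```ruby
# Hypothetical helper: returns the two DELETE statements for the quick fix,
# given the id of the article that was scheduled for future publication.
# Table/column names follow the steps above; the 'publish!' trigger_method
# value is an assumption, so verify it against your own data first.
def typo_cleanup_sql(article_id)
  id = Integer(article_id) # raises ArgumentError for anything but an integer
  [
    "DELETE FROM contents WHERE id = #{id};",
    "DELETE FROM triggers WHERE pending_item_id = #{id} AND trigger_method = 'publish!';"
  ]
end

puts typo_cleanup_sql(42)
```

Coercing the id with Integer() keeps arbitrary text out of the generated SQL.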

Your blog should be back up and you’ll be able to access the /admin section as well. You can now repost the initial message, just make sure not to set a future date and time again. ;-)


A step by step guide on how to install Django with PostgreSQL on Windows

This is a step by step tip about how to install Django with PostgreSQL on Windows. The links below point directly to the downloads, so the whole procedure should be extremely fast and easy.

  1. Install Python 2.5
  2. Install PostgreSQL 8.1.4
  3. Install the eGenix MX Base package
  4. Extract win-psycopg25.zip and copy libpq.dll and psycopg.pyd to c:\python25\DLLs (assuming c:\python25 is where you installed Python).
    You can test that the installation was successful by firing up python and running: import psycopg. If no error message appears, it worked.
  5. Get Django 0.95 and uncompress the tar.gz file. If you have problems with this, please use 7-Zip.
  6. If you don't have setuptools already installed, the Django installation will download and install it for you. However, the version that Django attempts to download doesn't exist on the server, so we need to specify a different one (the latest).

    Run:

    python ez_setup.py -U setuptools

    within the Django directory and this will install the latest setuptools for you. You can now go to step 7.

    Alternatively, if you wish to install setuptools and Django in one step, open ez_setup.py (located within the Django folder that you just uncompressed) in your favorite editor and replace the line:

    DEFAULT_VERSION = "0.6c1"

    with

    DEFAULT_VERSION = "0.6c3"

    Now, if the Windows box you're using has an Internet connection, you're all set; otherwise, just download this file manually into the Django folder.

  7. At this point, open a command prompt and run:

    python setup.py install

NOTE: Although it is not recommended, if you want to use Psycopg2 for some reason, you can skip steps 3 and 4 and use this installer instead.
You will then need to specify DATABASE_ENGINE = 'postgresql_psycopg2' in settings.py rather than 'postgresql'.


Typo and Lucid theme error

The Typo theme ‘Lucid’ seems to generate an error when used with the latest version (trunk) of Typo. In order to solve this, simply replace the following code in your /typo/themes/lucid/layouts/default.rhtml file:

<%= render_component(:controller => 'sidebars/sidebar',
:action => 'display_plugins') %>

with

<% benchmark "BENCHMARK: layout/sidebars" do %>
  <%= render_sidebars %>
<% end %>

This should fix it. ;-)


How to parse decimal numbers within a string

INPUT: a string containing decimal numbers.

OUTPUT: an array containing all the decimal numbers within the given string.

You can accomplish this task very quickly with the String#scan method and the right Regular Expression (regex).

Given a string s, you can use:

numbers = s.scan(/[-+]?\d*\.?\d+/)

numbers will be an array of strings, one for each decimal number within the string s (call to_f on the elements if you need actual Floats). Note how the regex accounts for an optional + or - sign in front of each number.

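If you also wish to match floating point numbers with exponents (scientific notation, e.g. 2.54e-07), use the following instead. Note the non-capturing group (?:...): with a regular capturing group, String#scan would return the contents of the group rather than the whole numbers.

numbers = s.scan(/[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?/)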
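Here is a quick check of both expressions on a made-up sample string. It shows why the exponent version matters (the simple regex splits 2.54e-07 into two numbers) and why the exponent group must be non-capturing.

```ruby
# Both regexes from the tip. The exponent group uses (?:...) so it is
# non-capturing and String#scan returns whole matches.
DECIMAL    = /[-+]?\d*\.?\d+/
SCIENTIFIC = /[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?/

s = "Temp -3.5, offset +2, rate .75, tiny 2.54e-07"

p s.scan(DECIMAL)                # ["-3.5", "+2", ".75", "2.54", "-07"]
p s.scan(SCIENTIFIC)             # ["-3.5", "+2", ".75", "2.54e-07"]
p s.scan(SCIENTIFIC).map(&:to_f) # [-3.5, 2.0, 0.75, 2.54e-07]

# With a capturing group, scan returns only the group contents:
p s.scan(/[-+]?\d*\.?\d+([eE][-+]?\d+)?/) # [[nil], [nil], [nil], ["e-07"]]
```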


Copyright © 2005-2010 Antonio Cangiano. All rights reserved.