Importing Legacy Data to Your Rails Database

May 02, 2008

A while back, I needed to migrate some old data out of a simple web application built by another group within my company. The app had been designed in a hurry, and basically just dumped all input into one table, without much (any?) sanity-checking. I sat down with my trusty text editor, ready to whip out a few lines of Perl, as I’d done so many times before, to scrub this data and load it into its new home. Then it occurred to me that I was about to reimplement a bunch of validations and transformations I’d already taken care of once in Ruby. Yes, I had temporarily taken leave of my senses. There had to be a better way, and there was.

The Running Man

Every Rails application has a file called script/runner. You can call it from the command line (or from a command scheduler, like cron), and it will run whatever Ruby code you supply within the context of your application, with full access to your application’s models. If you place your code in a separate script, all that is required to run it is something like:

    script/runner -e <environmentname> "load 'filename.rb'"

Putting the pieces together

That takes care of inserting data into the new database, but we still need a way to read the data from the existing one. ActiveRecord is easy enough to bend to our will: just toss something like the following into the top of the file you load with script/runner:

    class LegacyObject < ActiveRecord::Base
      set_table_name "goofy_table_name_from_legacy_schema"
      establish_connection(
        :adapter => "mysql",  # Or whatever
        :host => "legacydatabase.company.com",
        :username => "username",
        :password => "ICanHasSekrit",
        :database => "legacy_database_name"
      )
      # ...
    end

Hiding your naughty bits

You’ll pick up all the usual autogenerated ActiveRecord methods for each column in the table. Keep in mind that the methods are generated to match the case of the columns in the database, in case your DBAs love their mixed case. From there, you can set up any special handling with overrides or virtual attributes. This is often where data migration gets messy, and it’s really nice to be able to tuck this stuff away inside your models.

For instance, in my class, I defined alternate reader methods for first and last names. The old application had no validation, so I wanted to insert obvious placeholders, like “(blank)”, that would pass validation in my new app. I added a few convenience methods for use in conditional expressions as well. You’ll note that all column names in the legacy table were uppercase, hence the goofy-looking attributes.

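      # Reader with a placeholder: returns "(blank)" when NAME1 is nil,
      # empty, or whitespace-only
      def first_name
        if self.NAME1.blank?
          "(blank)"
        else
          self.NAME1
        end
      end

      # Same treatment for the last name, stored in NAME2
      def last_name
        if self.NAME2.blank?
          "(blank)"
        else
          self.NAME2
        end
      end

      # True when the text before the first ':' in SITE is not a
      # non-zero integer
      def site_num_invalid?
        self.SITE.split(':')[0].to_i.zero?
      end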
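
To see what the site check accepts and rejects, here is a minimal plain-Ruby sketch that runs without Rails. It takes the value as an argument instead of reading the SITE column, the sample values are made up, and the `to_s` guard against nil is an addition for this sketch, not something the original method does.

```ruby
# Stand-alone version of the site check; `site` plays the role of the SITE column.
def site_num_invalid?(site)
  # to_s turns nil into "" (an assumption added here so nil does not raise).
  # The text before the first ':' must parse to a non-zero integer.
  site.to_s.split(':')[0].to_i.zero?
end

# Made-up sample values of the kind an unvalidated form might have stored
samples = ["42:Main Office", "0:Unknown", "n/a", "", nil]

samples.each do |site|
  puts "#{site.inspect} invalid? #{site_num_invalid?(site)}"
end
```

Of these samples, only the first is treated as valid. Anything that does not start with a non-zero number, including an empty string, is flagged.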

Define whatever simplifies the job of munging the data into a format you can work with. After that, you can just use the standard ActiveRecord methods to move data from the model representing your legacy data to your shiny new Rails app, with all of your data scrubbing and munging nicely tucked away in models.
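
The overall shape of the import can be sketched in plain Ruby. Everything here is a stand-in so that it runs without Rails or a database: `LegacyRow` and `Contact` are hypothetical Structs rather than ActiveRecord models, `import` is an illustrative helper, and the sample rows are invented. In the real script, the rows would come from the legacy model and the new objects would be saved through your application’s models.

```ruby
# Hypothetical stand-in for the legacy model: same upper-case readers,
# plus the cleaned-up accessors described above.
LegacyRow = Struct.new(:NAME1, :NAME2, :SITE) do
  def first_name
    self.NAME1.to_s.strip.empty? ? "(blank)" : self.NAME1
  end

  def last_name
    self.NAME2.to_s.strip.empty? ? "(blank)" : self.NAME2
  end

  def site_num_invalid?
    self.SITE.to_s.split(':')[0].to_i.zero?
  end

  # Convenience reader: the integer before the first ':'
  def site_number
    self.SITE.to_s.split(':')[0].to_i
  end
end

# Hypothetical stand-in for a model in the new application
Contact = Struct.new(:first_name, :last_name, :site_number)

# Walk the legacy rows, skip the ones that cannot be salvaged,
# and build new objects from the cleaned-up readers.
def import(legacy_rows)
  imported = []
  skipped = []
  legacy_rows.each do |row|
    if row.site_num_invalid?
      skipped << row
      next
    end
    imported << Contact.new(row.first_name, row.last_name, row.site_number)
  end
  [imported, skipped]
end

rows = [
  LegacyRow.new("Ada", "", "12:HQ"),
  LegacyRow.new(nil, "Smith", "0:None"),
  LegacyRow.new("  ", "Jones", "7")
]

imported, skipped = import(rows)
puts "imported #{imported.size}, skipped #{skipped.size}"
imported.each { |c| puts "#{c.first_name} #{c.last_name} (site #{c.site_number})" }
```

The loop itself stays short because all the scrubbing lives on the legacy side, which is the point of tucking it into the model.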

Ernie Miller

No, I don't work in NYC, DC, or the valley, and I'm cool with that.
