Monday, May 25, 2009

Quicker Rails Seed Data Loading

The word is that Rails 3.0 will feature a way to load seed data. This is sure to be a handy and needed feature. However, when loading large amounts of seed data you are probably going to have to abandon ActiveRecord and/or fixture-style loading in your db/seeds.rb file in order to get the kind of performance you want. Recently I set up a way to load seed data for a Rails 2.2.2 project which exploits the "LOAD DATA INFILE..." command that MySQL provides. This cut the data loading time for my particular data set from over 15 minutes (I gave up waiting) to less than 30 seconds. This technique is likely to remain relevant for your future db/seeds.rb files.

The following rake tasks set up my Rails application to load ".psv" and ".yml" seed data files from the db/seed directory. The ".yml" files are normal YAML fixture files and are loaded via my rake tasks using an ordinary fixture technique. The ".psv" files are pipe-separated files which are loaded using the above-mentioned "LOAD DATA INFILE..." command. The way I have this set up, the order of the columns in your ".psv" file needs to match the column order in your database, so you may want to tweak this code a bit and provide parameters to specify a different column order. In other words, YMMV. Anyway, the code follows:

One thing to look at carefully when using this technique is what your database product does with "empty" values in your data set. MySQL didn't seem to want to let an empty numeric value be NULL, which led me to do some data massaging before and/or after the load.
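
To make the two ideas above a bit more concrete, here is a minimal sketch in Ruby. It is not the rake tasks from this project. The helper names (load_data_sql, nullify_empty_fields) and the example file and table names are made up for illustration. The first helper builds a "LOAD DATA INFILE" statement for a pipe-separated file, with an optional column list for the case where the file's column order differs from the table's. The second rewrites empty fields as \N, which MySQL's LOAD DATA reads as NULL under its default escape settings. It assumes your data never contains a literal pipe inside a field.

```ruby
# Hypothetical helpers for illustration only; these names are not from
# the original rake tasks.

# Build a LOAD DATA INFILE statement for a pipe-separated (.psv) file.
# Pass an array of column names when the file's column order differs
# from the table's. Depending on where the file lives, MySQL may need
# the LOCAL keyword (LOAD DATA LOCAL INFILE) instead.
def load_data_sql(path, table, columns = nil)
  column_list = columns ? " (#{columns.join(', ')})" : ""
  "LOAD DATA INFILE '#{path}' INTO TABLE #{table} " \
    "FIELDS TERMINATED BY '|' LINES TERMINATED BY '\\n'#{column_list}"
end

# Replace every empty field in one .psv line with \N, the marker that
# LOAD DATA treats as NULL. The -1 limit keeps trailing empty fields.
def nullify_empty_fields(line)
  line.chomp.split('|', -1).map { |field| field.empty? ? '\N' : field }.join('|')
end
```

Inside a rake task, the generated statement could be run with something like ActiveRecord::Base.connection.execute(load_data_sql("db/seed/products.psv", "products")), after first passing each line of the file through nullify_empty_fields and writing the result to a temporary file. Whether to massage the file before the load or fix up the rows afterwards is a judgment call that depends on your data.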