Tuesday, May 26, 2009

Rails ActsAsList - Initializing position Values

Recently I worked on a project that had a model acting as a tree with the children acting as a list within the scope of the child's parent. The old school ActsAsTree and ActsAsList plugins used in combination, handle this scenario nicely. There was one problem however - we had a large "tree" to load in where only some of the children were ordered under their parent. The rest of the children had a NULL value sitting in the position column. Because ActsAsList works by setting "position = position + 1 WHERE position >= x" when insert_at(x) is called this wasn't going to work with the data as is. "NULL = NULL + 1" just doesn't seem to work out too well. I decided to pre-process the data putting the children in their natural position order. In this case the natural position order would be by the provided position with a secondary sort on short_name if the position was not given. Also, when a child list contains both numeric positions and NULLs the NULL children should go to the bottom. Given the number of rows I was processing I knew this was going to take some time so I decided to see how this would perform when implemented as a MySQL stored procedure. Here is the code I came up with shown in the migration that creates the stored proc:

class AddSetPositionsStoredProc < ActiveRecord::Migration
STORED_PROC_NAME = 'set_positions'
CREATE_STORED_PROC_SQL = %Q{
CREATE PROCEDURE #{STORED_PROC_NAME}()
BEGIN
DECLARE done INT DEFAULT 0;
DECLARE pid INT;
DECLARE curs CURSOR FOR SELECT DISTINCT parent_id FROM units;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
OPEN curs;
REPEAT
FETCH curs INTO pid;
IF NOT done THEN
SET @rownum := 0;
UPDATE units SET position = @rownum := @rownum + 1
WHERE parent_id = pid
ORDER BY CASE WHEN position IS NULL THEN 1 ELSE 0 END, position, short_name;
END IF;
UNTIL done END REPEAT;
CLOSE curs;
SET @rownum := 0;
UPDATE units SET position = @rownum := @rownum + 1
WHERE parent_id IS NULL
ORDER BY CASE WHEN position IS NULL THEN 1 ELSE 0 END, position, short_name;
END
}
def self.up
ActiveRecord::Base.connection.execute(CREATE_STORED_PROC_SQL)
end
def self.down
sql = "DROP PROCEDURE #{STORED_PROC_NAME}"
ActiveRecord::Base.connection.execute(sql)
end
end
view raw gistfile1.rb hosted with ❤ by GitHub


And here is the rake task to call it:

namespace :data do
desc 'Set positions per parent_id, position, short_name via stored procedure'
task :sp_set_positions => :environment do
sql = 'CALL set_positions()'
puts sql
ActiveRecord::Base.connection.execute(sql)
end
end
view raw gistfile1.rb hosted with ❤ by GitHub


The stored procedure took just over a minute to do the renumbering. After implementing I decided to compare this to a straight SQL version that would be portable to different databases - but not as fast. I tried to avoid an ActiveRecord tax here and did the SQL in raw execute statements. Here is the rake task:

namespace :data do
# on my slow machine takes 5m09.209s
desc 'Set positions per parent_id, position, short_name via SQL'
task :sql_set_positions => :environment do
parents_sql = 'SELECT DISTINCT parent_id FROM units'
children_sql = %q{
SELECT
id FROM units
WHERE
parent_id = ?
ORDER BY
CASE WHEN position IS NULL THEN 1 ELSE 0 END, position, short_name
}
update_position_sql = 'UPDATE units SET position = ? WHERE id = ?'
parents_rs = Unit.connection.execute(parents_sql)
parents_rs.each do |parent|
parent_id = parent.first
sql = Unit.send(:sanitize_sql, [children_sql, parent_id])
children_rs = Unit.connection.execute(sql)
position = 1
children_rs.each do |child|
sql = Unit.send(:sanitize_sql, [update_position_sql, position,
child.first])
Unit.connection.execute(sql)
position += 1
end
end
# WHERE parent_id = NULL doesn't work so handle separately
sql = 'SELECT id FROM units WHERE parent_id IS NULL'
children_rs = Unit.connection.execute(sql)
position = 1
children_rs.each do |child|
sql = Unit.send(:sanitize_sql, [update_position_sql, position,
child.first])
Unit.connection.execute(sql)
position += 1
end
end
end
view raw gistfile1.rb hosted with ❤ by GitHub


The final results showed the stored procedure technique to be about 5 times faster than the SQL technique.