Monday, March 30, 2009

Rails Caching and MySQL GROUP_CONCAT

A recent caching solution that we came up with involved using the hash of the concatenated "children" ids of another resource to identify when the identical and previously stored children payload could be delivered instead of constructing the payload from scratch (which is kind of expensive). Our current solution uses action caching and a custom ActionController::Caching::Fragments::FileStore. In our custom FileStore we override the real_file_path(name) method to use this "children signature" as part of the name. The class looks something like this:

class DsesFileStore < ActionController::Caching::Fragments::FileStore
private
def real_file_path(name)
# bunch of code removed that parses name pulling parameters out of the URL
# path, uic, date ... retrieves the unit ...
children_sig = ids_to_signature(unit.children_ids(date))
name = "#{path}/#{uic}/#{children_sig}/children.#{extension}"
# the "*" in a uic won't work as a filename character
super(name).gsub('*', '$')
end
def ids_to_signature(ids)
Digest::SHA1.hexdigest(ids.join(':'))
end
end
view raw gistfile1.rb hosted with ❤ by GitHub


This seems to work well and the next step is likely to look at using the children_sig value for the ETag as well.

One interesting thing that came out of this investigation was that I wondered if I could produce the hash using straight SQL - should be much faster right? Turns out that with MySQL there is a built in aggregate function that concatenates a column and it is called GROUP_CONCAT. To generate that hash in the database with MySQL the following will work:

SELECT MD5(GROUP_CONCAT(id ORDER BY id SEPARATOR ':'))
FROM unit_dimension
WHERE parent_uic = ? AND ? BETWEEN effective_date AND expiration_date
GROUP BY id
view raw gistfile1.sql hosted with ❤ by GitHub


Pretty cool! I'm not sure which other database products support this as a built in aggregate function but it doesn't appear that SQL Server does.

I found the following recent blog posts talking about other uses for GROUP_CONCAT as well:

http://tempe.st/2009/03/the-thrill-of-a-new-technology-couchdb/
http://www.christianmontoya.com/2007/09/14/mysql-group_concat-this-query-is-insane/
http://db4free.blogspot.com/2006/01/hail-to-groupconcat.html