Tuesday, February 03, 2009

Backup Blogger Blog with Ruby Mechanize Script

After writing a cURL script to download a backup of my Blogger blog in my previous post, I started wondering what the best way to do this in Ruby would be. I looked at some Ruby HTTP clients and narrowed my choices down to two: Mechanize and HTTPClient. Both tools look pretty good, but Mechanize seemed to offer a very simple interface for scripting a web interaction like this, while HTTPClient would give me something closer to my original cURL script. So I decided to write the same code with Mechanize:

#!/usr/bin/env ruby
%w(rubygems mechanize).each {|l| require l}
fail "Usage is: #{File.basename($0)} login password" if ARGV.length != 2
login, password = *ARGV
agent = WWW::Mechanize.new
# grab the page with the login form in it
start_page = agent.get('https://www.blogger.com/start')
# grab the login form and fill in the credentials
login_form = start_page.form('login')
login_form.Email = login
login_form.Passwd = password
# login
agent.submit(login_form, login_form.buttons.first)
# now get the home page, follow links to the download, puts to stdout
home_page = agent.get('http://blogger.com/home')
settings_page = agent.click(home_page.link_with(:text => 'Settings'))
export_page = agent.click(settings_page.link_with(:text => 'Export blog'))
xml_page = agent.click(export_page.link_with(:text => 'Download Blog'))
puts xml_page.body


Obviously I need to beef this up with some error handling, but I think you'll find Mechanize to be pretty slick if you try it.
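
As a rough idea of what that error handling might look like, here is a minimal sketch. It relies on the fact that Mechanize lookups like form and link_with return nil when nothing matches, so if Blogger renames a link the script would otherwise die with a confusing NoMethodError on nil. The helper name found! is my own invention for illustration, not something Mechanize provides, and the usage lines in the comments assume the agent and page variables from the script.

```ruby
# Hypothetical helper (not part of Mechanize): pass the element through
# if the lookup found something, otherwise stop with a readable message.
def found!(element, description)
  raise "Could not find #{description}; has the page layout changed?" if element.nil?
  element
end

# How the script's lookups might be wrapped (assumes the agent and
# pages from the script):
#
#   login_form    = found!(start_page.form('login'), 'the login form')
#   settings_link = found!(home_page.link_with(:text => 'Settings'), "the 'Settings' link")
#   settings_page = agent.click(settings_link)
```

This only covers missing forms and links. Network failures and bad HTTP responses would still need their own rescue around the agent calls.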