David Burger's Courtesy Flush: blogger

Tuesday, February 03, 2009

Backup Blogger Blog with Ruby Mechanize Script

After writing up a cURL script to download a backup of my Blogger blog in my previous post, I started wondering what the best way to do this in Ruby would be. I looked at some Ruby HTTP clients and narrowed my choices down to two: Mechanize and HTTPClient. Both tools look pretty good, but Mechanize seemed to offer a very simple interface for scripting a web interaction like this, while HTTPClient would offer something more like my original cURL script. So I decided to write the same code with a Mechanize approach:

Obviously I need to beef this up with some error handling - but I think you'll find Mechanize to be pretty slick if you try it.

Sunday, February 01, 2009

Backup Blogger Blog with cURL Script

I read Dave Winer's Where's Your Data? post today, which was a response to a blog post by Craig Burton titled The State of Blogging Sucks. These articles refer to the problems that may arise when someone else is hosting your data. The problem could be a massive data loss (see Ma.gnolia), a company going out of business (apparently a Userland product?), or just data lock-in.

With that in mind, I thought about my blog here. This is primarily a dumping ground for small but handy techniques I learn, so that once I figure out a way to do something I can quickly google for it and use it again later. At this point there are enough helpful hints in here that I would hate to lose it. Blogger provides a way to export your blog as XML, but it involves logging in and clicking a button - the last thing a programmer wants to have to remember to do on a recurring basis.

So...I cooked up this bash cURL script to back up a Blogger blog. This script demonstrates the basics of using cURL. The first curl command fetches the login page and pulls out a security token using grep and sed. The second curl command logs in, providing the security token, login, and password among other arguments. The third curl command fetches the home page so that your blogID can be determined, which is used in the final request. The final curl command requests the actual backup, dumping the result to standard out - you can redirect this to the file of your choice or modify the script to also accept an output file. Here is the script:
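What follows is a minimal sketch of the four-step flow described above, not the original script. The URLs are passed in by the caller, and the form field names (`token`, `Email`, `Passwd`), the hidden-field layout, and the `blogID=<digits>` pattern are all assumptions made for illustration. The real login page may use different names, and this sketch assumes the login form posts back to the login URL itself.

```shell
#!/usr/bin/env bash
# Sketch only: URLs, form field names, and the patterns matched below are
# hypothetical placeholders, not taken from Blogger's actual pages.

# Read HTML on stdin and print the value of the hidden field named $1.
# Assumes the name attribute appears before the value attribute.
extract_hidden_field() {
  local name="$1"
  grep -o "name=\"$name\"[^>]*value=\"[^\"]*\"" \
    | sed -e 's/.*value="\([^"]*\)".*/\1/' \
    | head -n 1
}

# Read HTML on stdin and print the first blogID=<digits> value found.
extract_blog_id() {
  grep -o 'blogID=[0-9][0-9]*' | head -n 1 | sed -e 's/blogID=//'
}

# backup_blog LOGIN_URL HOME_URL EXPORT_URL_PREFIX EMAIL PASSWORD
# Writes the exported blog to standard out.
backup_blog() {
  local login_url="$1" home_url="$2" export_url_prefix="$3"
  local email="$4" password="$5"
  local jar token blog_id

  # Cookie jar shared by all four requests so the session carries over.
  jar="$(mktemp)" || return 1

  # 1. Fetch the login page and pull out the security token.
  token="$(curl -s -c "$jar" "$login_url" | extract_hidden_field token)"

  # 2. Log in with the token, login, and password (response discarded).
  curl -s -L -b "$jar" -c "$jar" -o /dev/null \
    --data-urlencode "token=$token" \
    --data-urlencode "Email=$email" \
    --data-urlencode "Passwd=$password" \
    "$login_url"

  # 3. Fetch the home page and determine the blogID.
  blog_id="$(curl -s -b "$jar" "$home_url" | extract_blog_id)"

  # 4. Request the backup itself and dump it to standard out.
  curl -s -b "$jar" "${export_url_prefix}${blog_id}"

  rm -f "$jar"
}
```

With the three URLs filled in for your own account, a call would look something like `backup_blog "$LOGIN_URL" "$HOME_URL" "$EXPORT_URL_PREFIX" you@example.com "$PASSWORD" > backup.xml`. Keeping the grep and sed extraction in small functions makes it easy to check them against a saved copy of a page before pointing the script at the live site.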

In other news, the Cardinals almost pulled it off today. This was the first time in a long time that I really wanted one team to win over the other in the Super Bowl. Alas, it wasn't meant to be. I've been waiting for the Chiefs to make it for almost 25 years and it doesn't look like I'll be rooting for them in the Super Bowl anytime soon.