
I decided to put a script together to notify me in case I start approaching the monthly limit. My first thought was that this was a perfect task for mechanize, a Ruby library for interacting with web sites. I put that aside, however, to make an attempt to script the scraping with curl.
Those folks at Comcast are whack. When you log in from the home page, what comes back is a redirect, but not an HTTP redirect. Instead you get a page containing a form with a "cima.ticket" field, and the form submits itself from the body onload event. Here is the somewhat functional script to pull your Comcast bandwidth usage:
#!/usr/bin/env bash
# $1 and $2 are the Comcast username and password.

# Make a pipeline fail if any stage fails (curl, grep or sed), so the
# $? checks below notice a failed request or a grep that matched nothing.
set -o pipefail

USER=$1
PASSWD=$2
AGENT="Mozilla/5.0 (X11; U; Linux x86_64; en-US) AppleWebKit/534.13 (KHTML, like Gecko) Chrome/9.0.597.45 Safari/534.13"

# Print a message and exit with the given status.
function bail {
  echo "$1"
  exit "$2"
}

# Step 1: log in and pull the cima.ticket value out of the returned form.
# The raw response is also saved to one.html for debugging.
ticket=$(curl -s \
  --user-agent "${AGENT}" \
  --cookie-jar cjar \
  --location \
  --referer ";auto" \
  -d "user=${USER}&passwd=${PASSWD}&rm=2&deviceAuthn=false" \
  -d "forceAuthn=true&s=ccentral-cima&r=comcast.net" \
  -d "continue=https://customer.comcast.com/Secure/Home.aspx" \
  https://login.comcast.net/login | tee one.html | \
  grep 'cima\.ticket' | \
  sed -e 's/.*cima.ticket" value="\([^"]*\).*$/\1/')
[ $? -eq 0 ] || bail "could not login and grab the ticket" $?

# Step 2: post the ticket, which is what the page's onload handler would do.
curl -s \
  --user-agent "${AGENT}" \
  --cookie-jar cjar \
  --cookie cjar \
  --location \
  --referer "https://customer.comcast.com/Secure/Home.aspx;auto" \
  --data-urlencode "cima.ticket=${ticket}" \
  "https://customer.comcast.com/Secure/Home.aspx" > /dev/null
[ $? -eq 0 ] || bail "could not post the ticket" $?

# Step 3: fetch the users page and extract the text of the "GB of" element.
usage=$(curl -s \
  --user-agent "${AGENT}" \
  --cookie-jar cjar \
  --cookie cjar \
  --location \
  --referer "https://customer.comcast.com/Secure/Home.aspx;auto" \
  https://customer.comcast.com/Secure/Users.aspx | \
  grep "GB of" | sed 's/^[^>]*>\([^<]*\)<.*$/\1/')
[ $? -eq 0 ] || bail "could not find the usage" $?

echo "usage: ${usage}"
I say somewhat functional because it doesn't work every time. It will fail to pull a result every now and then. There is a lot of wonkiness in the way the Comcast login works - and it appears this results in some kind of timing issue. Hopefully I have a chance to investigate a solution for this in the future.
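Until the timing issue is tracked down, one stopgap is simply to try again when a run fails. What follows is only a sketch of that idea, not a fix for the underlying problem. The `retry` function is my own helper, and it assumes the scraping script exits non-zero whenever it fails to pull a result. The script name in the usage comment is hypothetical.

```shell
#!/usr/bin/env bash

# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits 0 or MAX_ATTEMPTS runs have failed.
# Returns 0 on the first success, 1 if every attempt failed.
retry() {
  local max_attempts=$1 delay=$2
  shift 2
  local attempt=1
  while true; do
    "$@" && return 0
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${attempt} failed, retrying in ${delay}s" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Hypothetical usage, assuming the script above is saved as ./comcast-usage.sh:
#   retry 3 10 ./comcast-usage.sh "$user" "$passwd"
```

A pause between attempts seemed sensible given that the failures look timing related, but the right delay is a guess.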
Oh - back to mechanize. I googled for another solution to this problem and found this. It shows off the elegance and ease of use of mechanize; however, it seems to fail intermittently, just like my script.
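Whichever scraper ends up being reliable, the original goal was a warning when usage approaches the monthly limit. Here is a rough sketch of that last step. It assumes the scraped string looks something like "187GB of 250GB", which is a guess based on the "GB of" pattern the script greps for. The `check_usage` function and the 80 percent default threshold are my own inventions.

```shell
#!/usr/bin/env bash

# check_usage "USAGE_STRING" [THRESHOLD_PERCENT]
# Assumes a usage string such as "187GB of 250GB" (format is a guess).
# Returns 0 if below the threshold, 1 if at or above it, 2 if unparseable.
check_usage() {
  local usage=$1 threshold=${2:-80}
  local re='([0-9]+)[[:space:]]*GB[[:space:]]+of[[:space:]]+([0-9]+)[[:space:]]*GB'
  local used limit percent

  if [[ $usage =~ $re ]]; then
    used=${BASH_REMATCH[1]}
    limit=${BASH_REMATCH[2]}
  else
    echo "could not parse usage: ${usage}" >&2
    return 2
  fi

  # Force base 10 so values with leading zeros are not read as octal.
  used=$((10#$used))
  limit=$((10#$limit))
  if [ "$limit" -le 0 ]; then
    echo "limit is zero in: ${usage}" >&2
    return 2
  fi

  percent=$((used * 100 / limit))
  if [ "$percent" -ge "$threshold" ]; then
    echo "WARNING: ${used}GB of ${limit}GB used (${percent}%)"
    return 1
  fi
  echo "OK: ${used}GB of ${limit}GB used (${percent}%)"
  return 0
}
```

Called as `check_usage "${usage}" 80` from a cron job, the non-zero return could trigger whatever notification is handy, such as an email.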