Sunday, August 10, 2008

Converting from REXML to LibXML-Ruby

With the recent resurrection of LibXML-Ruby, I decided to investigate converting one of our more XML-processing-heavy applications from REXML to LibXML-Ruby. LibXML-Ruby is touted to be much faster than REXML, and I found this to be the case. In the process I kept track of some of the differences between the two, which should help you if you decide to do the same. Here are some of the command equivalents between the two (REXML first, then LibXML-Ruby):

create doc from
create doc from
grab the root of a doc | doc.root | doc.root
create doc from
parser =
parser.string = string
doc = parser.parse
(no, you can pass a StringIO to
return all elements (not text nodes) | node.elements | node.find('*')
xpath from element | elem[xpath] (annoyingly can return a single item or an array) | elem.find(xpath)
xpath to the first match | REXML::XPath.first(elem, xpath) | elem.find_first(xpath)
grab text content of node | elem.text | elem.content
working with attributes
(elem[...] reserved for xpath)
creating nodes, attr_hash), content)
(can't set attrs on create)
deep clone a node | elem.deep_clone | elem.copy(true)
add a child element
(child can be node or string)
node.elements.add_element(name, attr_hash)
node << child_node
node.child = child.node
removing elements
(child may be Element, String, or Integer)
jump to the next
can XPath node not in a document? | yes | no
can add node directly from one document to another | yes | no
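
To make a few of the REXML entries concrete, here is a minimal sketch that exercises doc.root, node.elements, REXML::XPath.first, elem.text and a deep clone (the method is REXML's deep_clone). The sample document and the helper name rexml_tour are made up for illustration; only the REXML calls themselves come from the library.

```ruby
require 'rexml/document'

# Hypothetical sample document: one text node followed by two child elements.
TOUR_XML = '<list>text<item id="1">one</item><item id="2">two</item></list>'

# Hypothetical helper that walks through a few REXML calls from the table.
def rexml_tour(xml)
  doc   = REXML::Document.new(xml)            # create doc from a string
  root  = doc.root                            # grab the root of the doc
  kids  = root.elements.to_a                  # child elements only, no text nodes
  first = REXML::XPath.first(root, 'item')    # XPath to the first match
  copy  = first.deep_clone                    # deep clone a node
  copy.text = 'changed'                       # changing the copy leaves the original alone
  {
    root:       root.name,
    children:   kids.map(&:name),
    first_text: first.text,                   # grab text content of a node
    copy_text:  copy.text
  }
end

puts rexml_tour(TOUR_XML).inspect
```

The LibXML-Ruby side of each row is left out here on purpose so the sketch runs with nothing but REXML installed.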

Ok, for those of you that actually read all the way through the table, the bonus is right down here, because the biggest difference between REXML and LibXML-Ruby is in the handling of default namespaces. A default namespace is a namespace placed on an XML document that acts as the default; that is, it doesn't use a prefix. A good example of this is KML documents, which are often defined like this:
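
As a stand-in illustration, here is a minimal sketch of a KML-style document with a default namespace, held in a Ruby string and inspected with REXML. The namespace URI, the placemark contents, and the helper name root_info are my assumptions for the example, not taken from any particular KML file.

```ruby
require 'rexml/document'

# Hypothetical minimal KML document. The xmlns attribute on the root has no
# prefix, so it becomes the default namespace for every element below it.
KML_SAMPLE = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <kml xmlns="http://www.opengis.net/kml/2.2">
    <Placemark>
      <name>Home</name>
    </Placemark>
  </kml>
XML

# Hypothetical helper: report the root element's name and namespace URI.
def root_info(xml)
  root = REXML::Document.new(xml).root
  { name: root.name, namespace: root.namespace }
end

puts root_info(KML_SAMPLE).inspect
```

The root element is written as plain kml, with no prefix, yet it still lives in the namespace named by the xmlns attribute.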

With REXML, you can use XPath expressions with the assumption that you are referencing the default namespace and they will just work - no prefix necessary. With LibXML-Ruby, this is not the case. Say you have a reference to a node and you want to run some XPath on it. With LibXML-Ruby you will be forced to do something like this:
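
Here is a minimal sketch of the contrast, under a few assumptions: the sample document, the prefix "kml", and the helper names are invented for illustration, and the LibXML-Ruby half uses the optional namespace argument to find, which is one way to do it rather than necessarily the one I used at the time. The LibXML-Ruby helper returns nil if the libxml-ruby gem is not installed, so the REXML half still runs on its own.

```ruby
require 'rexml/document'

# Hypothetical KML-style sample; the URI and element values are illustrative.
KML_NS  = 'http://www.opengis.net/kml/2.2'
KML_DOC = <<~XML
  <kml xmlns="#{KML_NS}">
    <Placemark><name>Home</name></Placemark>
  </kml>
XML

# REXML: an unprefixed XPath matches elements in the default namespace.
def rexml_names(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc.root, 'Placemark/name').map(&:text)
end

# LibXML-Ruby: bind a prefix of our choosing ("kml", an assumption) to the
# default namespace URI and use that prefix in every step of the expression.
# Returns nil when the libxml-ruby gem is not available.
def libxml_names(xml)
  begin
    require 'libxml'
  rescue LoadError
    return nil
  end
  doc = LibXML::XML::Document.string(xml)
  doc.root.find('kml:Placemark/kml:name', "kml:#{KML_NS}").map(&:content)
end

puts rexml_names(KML_DOC).inspect
puts libxml_names(KML_DOC).inspect
```

Note that the prefix mapping is handed to find on the call itself, so it has to be repeated for each node you query.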

I found an approach of registering a prefix for the default namespace on Bogle's Blog. While this is nice, you still can't register this once for the whole document, but must do it on each node you will be running an XPath expression on.

(On another note, did I just remove all carriage returns from my table to make blogger happy? Why yes, yes I did.)