julik live

XMLRPC and Builder, a marriage made in heaven

This one is an example of why, if you do XML, you had better do it right. I wrote this blog aiming for spotless MetaWeblog API support (I hate web UIs and I own both MarsEdit and Ecto licenses). I also don't have an aversion to Dave Winer, and AtomPub is not there yet, IMO.

So I implemented a small MetaWeblog responder based on Ruby's XML-RPC module. The module is somewhat antiquated and the docs do not always tell you everything you need, but the really serious problem is that it uses a bozo XML writer.

Recently I posted an entry about caches. When I posted it and retrieved it via MarsEdit, everything went well. But then an update to MarsEdit came along, and lo and behold - the updated MarsEdit was not able to retrieve the very entry the old one had made.

Error message

Hmm... let's see up close. I copied the XML response into a standalone XML document and ran it through xmllint.

invalid CDATA

Even more interesting. The character in question is ASCII 005, the ENQ character. Don't know how MacOS X managed to type it into the entry box in MarsEdit, but it certainly was there!


This is an ASCII "control" non-printable character, and putting it into XML is anything but responsible. Even though it might be present in the blog entry itself, it should never bleed through into the XML representation for an RPC call! So I went out to investigate what writes XML for Ruby's XML RPC. I wish I didn't - to spare you a search, the file you need is create.rb.
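For reference, here is a minimal sketch of what "does not belong in XML" means in practice. It encodes the character ranges XML 1.0 allows in text; the helper name illegal_xml_codepoints is made up for this illustration and is not part of any library.

```ruby
# Characters allowed by the XML 1.0 "Char" production: tab, LF, CR, and
# everything from U+0020 up, minus the surrogates and U+FFFE/U+FFFF.
# The regexp matches whatever falls outside of that set.
XML10_ILLEGAL = /[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\u{10000}-\u{10FFFF}]/

# Hypothetical helper: list the code points that may not appear in XML 1.0 text.
def illegal_xml_codepoints(str)
  str.encode("UTF-8").scan(XML10_ILLEGAL).map(&:ord)
end

illegal_xml_codepoints("caches\005 are hard") # the list contains 5, the ENQ character
illegal_xml_codepoints("tab\there")           # empty list, a tab is allowed
```

Note that escaping does not help here: in XML 1.0 a numeric character reference to such a character is just as illegal, so ENQ has to be dropped or replaced before serialization.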

Let's put it this way - the only sanitization done is replacing the obligatory entities. The rest... well, the rest is expected to somehow happen when you pass your results to XML RPC. The XML writer in Ruby's RPC works by text concatenation.

Fixing the bitch

The library we all love and use is Jim's XML Builder - an extremely versatile and friendly block-based XML generator, which also happens to have one of the most extensive XML character sanitization routines available - the so-called XChar harness by Sam Ruby. If there is a person who can be trusted with gremlins in XML character encoding world, it's Sam, and while XChar is a burden when you want to read your own Russian XML feed in plain-text, it's a perfect fit for the problem we are facing here (generating well-formed XML no matter how great the cost or readability of the product).
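To give an idea of what such a sanitizer does, here is a simplified sketch in the spirit of XChar. It is not Sam Ruby's code: the CP1252 remapping is left out, and the names xml_safe? and xchar_like are hypothetical. The three rules shown (escape the predefined entities, replace illegal characters with a visible placeholder, write non-ASCII as numeric entities) are the behaviour described above as far as this sketch assumes it.

```ruby
# Simplified, hypothetical take on the XChar idea (no CP1252 remapping).
PREDEFINED = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;" }.freeze

# True when the code point is allowed in XML 1.0 text.
def xml_safe?(cp)
  cp == 0x9 || cp == 0xA || cp == 0xD ||
    (0x20..0xD7FF).cover?(cp) ||
    (0xE000..0xFFFD).cover?(cp) ||
    (0x10000..0x10FFFF).cover?(cp)
end

def xchar_like(str)
  str.encode("UTF-8").each_char.map do |ch|
    cp = ch.ord
    if PREDEFINED.key?(ch)
      PREDEFINED[ch]      # the obligatory entities
    elsif !xml_safe?(cp)
      "*"                 # gremlins become a visible placeholder
    elsif cp > 0x7F
      "&##{cp};"          # non-ASCII goes out as a numeric entity
    else
      ch
    end
  end.join
end

xchar_like("a<b\005") # the ENQ comes out as "*"
```

The numeric entities are exactly why a Russian feed becomes unreadable as plain text: every Cyrillic letter turns into something like &#1070;.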

Let's take a look at how Ruby's RPC writes out a method response (if you are into clean Ruby code take your handkerchief now):

    def methodResponse(is_ret, *params)

      if is_ret
        resp = params.collect do |param|
          @writer.ele("param", conv2value(param))
        end

        resp = [@writer.ele("params", *resp)]
      else
        if params.size != 1 or params[0] === XMLRPC::FaultException
          raise ArgumentError, "no valid fault-structure given"
        end
        resp = @writer.ele("fault", conv2value(params[0].to_h))
      end

      tree = @writer.document(
        @writer.pi("xml", 'version="1.0"'),
        @writer.ele("methodResponse", resp)
      )

      @writer.document_to_str(tree) + "\n"
    end

Thus the way an XML writer is expected to operate in XML-RPC is based on return values, roughly like this:

    make_element("param", make_element_with_text("value", blabla..))

A method generating a node has to return this node in text form, after which it's going to be used as child content for the next method call.
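A minimal sketch of that style, to make the problem visible. ConcatWriter is a made-up stand-in and not the actual XMLRPC code; it only assumes what was said above, namely that every call returns a finished string and that only the obligatory entities are escaped.

```ruby
# Hypothetical stand-in for a concatenating writer; not the actual XMLRPC code.
class ConcatWriter
  # Only the three obligatory entities are escaped.
  def text(txt)
    txt.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;")
  end

  # Children arrive as finished strings and are simply glued together.
  def ele(name, *children)
    "<#{name}>#{children.join}</#{name}>"
  end
end

w = ConcatWriter.new
xml = w.ele("param", w.ele("value", w.ele("string", w.text("caches\005 & co"))))
# xml is one flat String by now, and the ENQ byte is still sitting inside it
```

Once a child has been flattened into text, the outer call has no way to tell markup from content, so there is no later point at which the output could still be cleaned up.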

Maaaan, if there is a way to properly generate XML, this is certainly not the one.

To transform this into Builder's calls we will need to trick XML-RPC into believing that we are passing around text chunks, while in fact we will pass around XML node commands. When the whole node tree has been assembled, we will use the document_to_str method call to "replay" the commands to Builder. Let's begin.

# An XML writer for Ruby's XML RPC module
require 'rubygems'
require 'builder'

class RPCBuilder
  # This is what is going to be lugged around as XML RPC results. It's actually
  # a node element
  class Command
    attr_accessor :text, :name, :children

    def initialize
      @text, @name, @children = nil, nil, []
      yield(self) if block_given?
    end

    def inspect
      '#<Command @name=%s @text=%s @children=%s>' % [ name, text, children.length]
    end
  end

I am using an inner class here, because I do not agree with the Rails' Clique view of "Namespaces have been invented by idiots" for a second (I take it more as "Namespace support in Rails sucks balls").

The Command is what we are going to lug around instead of the text chunks the default writer sends. Now let's reimplement some methods that an XML writer needs to support for XML-RPC to accept it.

This one is going to make an element with some text in it:

  # Should return a prebaked element. Luckily native Builder uses tag!,
  # so the names do not clash
  def tag(name, txt)
    Command.new { |c| c.name = name; c.text = txt }
  end

This one is a no-op (XML-RPC is riddled with no-ops all over the place, so we will maintain the tradition).

  # Make a document and stuff things into it.
  # Document is actually a noop
  def document(*stuff)
    stuff # reconstructed body: hand the pieces back untouched
  end

We don't need PIs because Builder has instruct!, which we are going to call anyway. Besides, Ruby's XMLRPC never writes a prolog that has UTF-8 in it.

  # noop, a processing instruction
  def pi(name, *params); end

This is the one we want the most - make a sane CDATA chunk.

  # Should return the text escaped as XML cdata
  def text(txt)
    Command.new { |c| c.text = txt }
  end

This one is going to accept an array of Commands (instead of text) as children. If we look closely at what XMLRPC does, sometimes it sums the two returns of previous methods to get a whole list. Arrays in Ruby can handle that (they will be merged), so this is a good enough solution.

  def ele(name, *children)
    # Make an element with a name and child commands
    Command.new { |c| c.name = name; c.children = children }
  end

Now let's move on to the meat of the writer. First of all, it's not the best idea to inherit from Builder::XmlMarkup itself: it's a blank slate with no methods that swallows any method call, which is bad for debugging. We'll encapsulate the builder instead. The document_to_str method is going to be the place where we process our Command tree and instantiate a contained XmlMarkup.

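  def document_to_str(command_tree)
    @builder = Builder::XmlMarkup.new
    # Play the command tree to builder inside, do not run twice
    @builder.instruct!; command_to_builder_call(command_tree) # reconstructed line
    @builder.target! # reconstructed line: return the markup built so far
  end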

Now the "do not run twice" bit is important. If you call to_s on the XmlMarkup twice, it's going to create an XML element for the first call, like so:

<to_s />
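The mechanism behind such a stray element can be sketched without Builder at all. TinyMarkup below is a hypothetical miniature of a "blank slate" class, not Builder's real code: it inherits from BasicObject, so almost nothing is defined on it and every unknown message, to_s included, lands in method_missing.

```ruby
# Hypothetical miniature of a "blank slate" builder; not Builder's real code.
class TinyMarkup < BasicObject
  def initialize
    @out = ::String.new
  end

  # Every unknown message becomes an empty element.
  def method_missing(name, *_args)
    @out << "<#{name}/>"
    @out
  end

  # The one explicit way to get at the accumulated markup.
  def target!
    @out
  end
end

m = TinyMarkup.new
m.entry
m.to_s    # not a conversion: this emits an element named to_s
m.target! # "<entry/><to_s/>"
```

So anything that innocently stringifies the builder, a debugger or a logger for instance, can leave an element like the one above in the output.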

This is not what we want. And now the final piece of the puzzle - the replayer. Our task here is to transform a node tree that the other method calls generated (with Command objects with child nodes) into a tree of Builder calls wrapped into blocks.

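  # This will convert a tree of commands into XML Builder calls
  def command_to_builder_call(cmd)
    case cmd
      when NilClass
        # pass
      when Array
        cmd.compact.each{|c| command_to_builder_call(c) }
      when Command
        if cmd.name
          @builder.tag!(cmd.name) do
            @builder.text!(cmd.text.to_s.strip) if cmd.text
            # reconstructed: drill down into the child commands
            command_to_builder_call(cmd.children)
          end
        else
          # reconstructed: a nameless Command is a bare text chunk
          @builder.text!(cmd.text.to_s.strip) if cmd.text
        end
      when String
        @builder.text!(cmd.strip) unless cmd.empty?
    end
  end
end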

This method is basically a huge recursion - it will drill down the Command#children until it finds the bottom nodes, and issue wrapping calls to Builder underneath. It also can consume arrays (which XMLRPC returns in abundance).
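To see the replay idea in isolation, here is a self-contained sketch that runs without Builder or XMLRPC installed. Everything in it is hypothetical: Node plays the role of Command, a plain String stands in for the builder, and escape is a crude substitute for Builder's sanitization (entities plus "*" for control characters).

```ruby
# Hypothetical miniature of the command-tree idea; a String stands in for Builder.
Node = Struct.new(:name, :text, :children)

def ele(name, *children)
  Node.new(name, nil, children)
end

def text(txt)
  Node.new(nil, txt, [])
end

# Crude stand-in for Builder's escaping: entities plus "*" for control characters.
def escape(str)
  str.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;")
     .gsub(/[\x00-\x08\x0B\x0C\x0E-\x1F]/, "*")
end

# Walk the tree once; markup is only produced here, at replay time.
def replay(cmd, out = String.new)
  case cmd
  when nil
    # pass, this is what the no-op pi leaves behind
  when Array
    cmd.compact.each { |c| replay(c, out) }
  when String
    out << escape(cmd.strip)
  when Node
    if cmd.name
      out << "<#{cmd.name}>"
      out << escape(cmd.text.to_s.strip) if cmd.text
      replay(cmd.children, out)
      out << "</#{cmd.name}>"
    elsif cmd.text
      out << escape(cmd.text.to_s.strip)
    end
  end
  out
end

# Shaped like the methodResponse call: a nil from pi, then nested elements,
# one of them wrapped in an extra Array.
tree = [nil, ele("methodResponse", [ele("params", ele("param", ele("value", text("caches\005 <3"))))])]
replay(tree)
# "<methodResponse><params><param><value>caches* &lt;3</value></param></params></methodResponse>"
```

Because the tree stays a tree until the very end, the text is escaped exactly once and nested arrays cost nothing extra.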

Then we will need to plug the end result into XMLRPC. Although XMLRPC sports a slew of WriterChooserMixin modules and writer class iterators, I could not find a way to override the writer for my own RPC service - not in the docs and not in the code. So we just go and globally subvert the whole writer infrastructure in XMLRPC::Config:

    $VERBOSE, yuck = nil, $VERBOSE
    XMLRPC::Config.send(:const_set, :DEFAULT_WRITER, RPCBuilder)
    $VERBOSE = yuck
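The $VERBOSE dance is only there to suppress Ruby's "already initialized constant" warning. The same trick in isolation, as a sketch: FakeConfig is a made-up module standing in for XMLRPC::Config, and silently is a hypothetical helper that also restores the warning level if the block raises.

```ruby
# Hypothetical module standing in for XMLRPC::Config.
module FakeConfig
  DEFAULT_WRITER = :simple
end

# Run a block with warnings switched off, then restore the previous level.
def silently
  old_verbose, $VERBOSE = $VERBOSE, nil
  yield
ensure
  $VERBOSE = old_verbose
end

silently { FakeConfig.send(:const_set, :DEFAULT_WRITER, :builder) }
FakeConfig::DEFAULT_WRITER # now :builder, and no warning was printed
```

One thing to keep in mind: code that has already read the old constant and stored the value keeps the old writer, so a swap like this has to happen before anything is instantiated.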

A much simpler solution

Plug XChar into the standard XMLWriter that RPC uses:

module XMLRPC::XMLWriter
  class XCharred < Simple
    def text(txt)
      # body reconstructed: assumes String#to_xs from builder/xchar is loaded
      txt.to_xs
    end
  end
end
However, using the whole Builder as a serializer just seemed cleaner to me from the OOP pluggability standpoint.


If you are wondering if the same ASCII 005 can happen to you in ActionWebService in Rails - I think so, because it uses the same RPC module. And this actually brings us to a little caveat with XML-RPC which is handy to know if you want to do it properly. This:

 assert_equal "foo\005", 
    @unicode_rpc.call("mine.noopMethodThatAcceptsString", "foo\005")

should always fail. If you need to transport non-UTF-8 blobs across, use the Base64-encoded binary type. The string type is UTF-8 only.
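A minimal sketch of why Base64 is the safe container. It uses only core Ruby (Array#pack and String#unpack1), and to_base64 and from_base64 are hypothetical helper names. In Ruby's XMLRPC the corresponding wrapper should be XMLRPC::Base64, which is not used here so that the example runs on its own.

```ruby
# Core-only Base64 via Array#pack / String#unpack1, so no extra require is needed.
def to_base64(bytes)
  [bytes].pack("m0") # "m0" means Base64 without line breaks
end

def from_base64(encoded)
  encoded.unpack1("m0")
end

blob    = "foo\005".b       # treat the payload as raw bytes
encoded = to_base64(blob)   # "Zm9vBQ==", plain ASCII that is legal in any XML text node
decoded = from_base64(encoded)
decoded == blob             # true, the ENQ byte survives the round trip
```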

And not only that, but it's ultimately the responsibility of the client to kill the offending text (which MarsEdit didn't do for me, although it should have), because I would have gotten an error message when posting the entry if XMLRPC's Parser module were not bozo.

Suspects: Web construction, Unicode


Aspirine not included.