XSLT with Python

I thought it had been a long time between posts last time. This gap beats it by even more.

I still really like Lighttpd and until recently was only using Apache for mod_svn and mod_xslt. I don't have much choice with mod_svn short of using svnserve (which I may end up doing), but a few months ago (December 12 by the file date) I took up the challenge of replacing mod_xslt.

I did enjoy mod_xslt and can't complain about its performance or memory usage. The fact that the project is dead is disconcerting, but any time the module stops compiling I'm able to get it working again by looking around, posting on the mailing list, or fixing it myself. My only real qualm is that it requires Apache.

As an aside, my love for XML has long since passed, so I just want the system to keep working and I won't make any future enhancements. In general, I am now anti-XML and pro-JSON and pro-Bencode. My opinion is that there are still uses for XML, but that it is generally overused.

After some time, I developed this CGI script in Python:

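#!/usr/bin/env python
from __future__ import print_function

import cgi
import cgitb
import sys

from lxml import etree

# Shared secret (placeholder); must match the key= parameter in the rewrite rules.
KEY = "SOMESECRETKEY"


def transform(xml, xslt):
    """Apply the stylesheet at path `xslt` to the document at path `xml`."""
    doc = etree.parse(xml)
    style = etree.XSLT(etree.parse(xslt))

    # Calling the XSLT object replaces the deprecated apply()/tostring() pair.
    return str(style(doc))


if __name__ == "__main__":
    cgitb.enable()  # show tracebacks in the browser while debugging

    form = cgi.FieldStorage()
    if "key" not in form or form["key"].value != KEY:
        print("Status: 403 Forbidden")
        print("Content-Type: text/html")
        print()  # blank line ends the headers
        print("<html><title>403 Forbidden</title><body><h1>403 Forbidden</h1></body></html>")
        sys.exit(0)  # stop here instead of falling through to the transform

    xml = form["xmlfile"].value
    xslt = form["xsltfile"].value
    contenttype = form["contenttype"].value

    print("Content-Type:", contenttype)
    print()  # blank line ends the headers
    print(transform(xml, xslt))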

Luckily I didn't use many mod_xslt-specific features, so everything seemed to "just work." I did lose Content-Type support, so I have to hard-code it as a GET parameter. Notice that I added a secret key in there, since I didn't want to bother with proper security.
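
Since the script opens whatever paths arrive in the query string, a slightly more careful version might compare the key in constant time and refuse paths that escape the document root. This is only a sketch under assumptions: it targets Python 3, `HTDOCS` is a hypothetical root directory, and `key_ok` and `safe_path` are made-up helper names that are not part of the script above.

```python
import hmac
import os

# Hypothetical values for illustration; not taken from the real setup.
KEY = "SOMESECRETKEY"
HTDOCS = "/path/to/htdocs"


def key_ok(supplied, expected=KEY):
    """Compare the supplied key to the expected one in constant time."""
    return hmac.compare_digest(supplied.encode("utf-8"), expected.encode("utf-8"))


def safe_path(requested, root=HTDOCS):
    """Return the normalized absolute path if it stays inside root, else None."""
    root = os.path.abspath(root)
    candidate = os.path.abspath(os.path.join(root, requested))
    # commonpath equals root only when candidate is root or lives below it.
    if os.path.commonpath([root, candidate]) != root:
        return None
    return candidate
```

With helpers like these, the script could answer 403 whenever `key_ok` fails or `safe_path` returns None for `xmlfile` or `xsltfile`. It still would not be proper security, just a smaller hole.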

Now for the Lighttpd configuration. Since I can no longer use .htaccess files in different directories to change which XSLT is used, I get this uglier config:

url.redirect = ( "^/$" => "/recent/" )
url.rewrite-once = (
    "^(/recent/rss/)(?:index\.html|index\.xml)?$" => "/cgi-bin/ejona-xslt.py?key=SOMESECRETKEY&xsltfile=/path/to/htdocs/shared/xsl/application-rss%%2Bxml.xsl&contenttype=application/xml&xmlfile=../$1/index.xml",
    "^(/recent/atom/)(?:index\.html|index\.xml)?$" => "/cgi-bin/ejona-xslt.py?key=SOMESECRETKEY&xsltfile=/path/to/htdocs/shared/xsl/application-atom%%2Bxml.xsl&contenttype=application/atom%%2Bxml&xmlfile=../$1/index.xml",
    "^(/recent/atom/summary/)(?:index\.html|index\.xml)?$" => "/cgi-bin/ejona-xslt.py?key=SOMESECRETKEY&xsltfile=/path/to/htdocs/shared/xsl/application-atom%%2Bxml.summary.xsl&contenttype=application/atom%%2Bxml&xmlfile=../$1/index.xml",
    "^(/recent/atom/0\.3/)(?:index\.html|index\.xml)?$" => "/cgi-bin/ejona-xslt.py?key=SOMESECRETKEY&xsltfile=/path/to/htdocs/shared/xsl/application-atom%%2Bxml.0.3.xsl&contenttype=application/xml&xmlfile=../$1/index.xml",
    "^((?:/recent/|/archive/).*)\.(?:html|xml)$" => "/cgi-bin/ejona-xslt.py?key=SOMESECRETKEY&xsltfile=/path/to/htdocs/shared/xsl/text-html.xsl&contenttype=text/html&xmlfile=../$1.xml",
    "^((?:/recent|/archive)/(?:.*/)?)$" => "/cgi-bin/ejona-xslt.py?key=SOMESECRETKEY&xsltfile=/path/to/htdocs/shared/xsl/text-html.xsl&contenttype=text/html&xmlfile=../$1/index.xml",
)
index-file.names = ( "index.xml" )
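
To see what those rules do to a request, here is a rough Python 3 imitation of the matching, assuming (as I understand url.rewrite-once) that the first matching rule wins and nothing after it is tried. Only three of the rules are copied over, `$1` is written as `\1` and `%%2B` as `%2B` to suit Python's `re` module, and `PREFIX`, `RULES`, and `rewrite_once` are names invented for this sketch.

```python
import re

PREFIX = "/cgi-bin/ejona-xslt.py?key=SOMESECRETKEY&xsltfile=/path/to/htdocs/shared/xsl/"

# (pattern, replacement) pairs: a subset of the config, translated to re syntax.
RULES = [
    (r"^(/recent/rss/)(?:index\.html|index\.xml)?$",
     PREFIX + r"application-rss%2Bxml.xsl&contenttype=application/xml&xmlfile=../\1/index.xml"),
    (r"^((?:/recent/|/archive/).*)\.(?:html|xml)$",
     PREFIX + r"text-html.xsl&contenttype=text/html&xmlfile=../\1.xml"),
    (r"^((?:/recent|/archive)/(?:.*/)?)$",
     PREFIX + r"text-html.xsl&contenttype=text/html&xmlfile=../\1/index.xml"),
]


def rewrite_once(path, rules=RULES):
    """Return the rewritten URL for the first matching rule, or path unchanged."""
    for pattern, replacement in rules:
        match = re.match(pattern, path)
        if match:
            # expand() substitutes \1 with the captured directory or file stem.
            return match.expand(replacement)
    return path
```

The order matters: `/recent/rss/index.xml` also matches the generic `.html`/`.xml` rule, so the feed-specific rules have to come first or the feed would be rendered with text-html.xsl.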

Notice the %%2B's in some of the URLs. Those make it additionally ugly, but I still prefer that stuff over dealing with Apache.
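
The reason for the %2B at all is that a bare "+" in a query string gets decoded as a space, which would turn application/atom+xml into something useless. The doubled percent sign is, as far as I can tell, just how Lighttpd wants a literal "%" written in a rewrite target; treat that part as my assumption. A small Python 3 check of the decoding behavior, with variable names picked only for this example:

```python
from urllib.parse import parse_qs, quote

content_type = "application/atom+xml"

# A raw "+" in a query string is decoded as a space by form parsers.
raw = parse_qs("contenttype=" + content_type)["contenttype"][0]

# Percent-encoding the "+" as %2B lets it survive the round trip.
encoded = quote(content_type, safe="/")
decoded = parse_qs("contenttype=" + encoded)["contenttype"][0]

print(repr(raw))      # the "+" has become a space
print(encoded)        # the "+" is now %2B
print(decoded)        # the original value again
```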

All in all, it feels like a fairly hackish solution, but it works great. I don't care about the loss in performance (honestly, who reads a not-updated-in-over-two-years blog?), and if I ever do care I could convert the script into a FastCGI or WSGI script. It is nice to know that this proof-of-concept of a blog is somewhat portable now.
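
For the curious, a WSGI version might look roughly like the following. It is a sketch I have not deployed, written for Python 3; `make_app` is an invented name, the key is the same placeholder as in the config, and the transform is passed in as a callable so the lxml-based function from the CGI script could be plugged in unchanged.

```python
from urllib.parse import parse_qs

KEY = "SOMESECRETKEY"  # placeholder shared secret, as in the rewrite rules

FORBIDDEN = b"<html><title>403 Forbidden</title><body><h1>403 Forbidden</h1></body></html>"


def make_app(transform, key=KEY):
    """Build a WSGI app around a transform(xml_path, xslt_path) -> str callable."""

    def app(environ, start_response):
        params = parse_qs(environ.get("QUERY_STRING", ""))

        def first(name):
            # parse_qs returns lists; take the first value or an empty string.
            return params.get(name, [""])[0]

        if first("key") != key:
            start_response("403 Forbidden", [("Content-Type", "text/html"),
                                             ("Content-Length", str(len(FORBIDDEN)))])
            return [FORBIDDEN]

        body = transform(first("xmlfile"), first("xsltfile")).encode("utf-8")
        start_response("200 OK", [("Content-Type", first("contenttype")),
                                  ("Content-Length", str(len(body)))])
        return [body]

    return app
```

For a quick local test, `wsgiref.simple_server.make_server("", 8000, make_app(transform))` from the standard library should be enough to serve it; the win over CGI is that the interpreter and lxml are loaded once rather than on every request.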