7/15/2012

XML, DTD, XML Schema, XSLT

In this series, we've already seen XHTML, which is based on XML; HTML5, which can be written in an XML syntax; and SGML, of which XML is a subset and classic HTML an application. You have probably encountered XML apart from HTML, too.

Other technologies built on the Extensible Markup Language include the news feed formats RSS and Atom, the data exchange protocol SOAP, the instant messaging protocol XMPP (Jabber), the OpenDocument formats as well as the newer Microsoft Office formats, the vector graphics format SVG and many, many more. XML is also commonly used for things like configuration files and general data exchange.

Let's use this flexible all-purpose tool to define the JDBC, the John Doe Business Card format. ;)

<?xml version="1.0" encoding="UTF-8"?>
<person name="John Doe">
  <work>DOD</work>
  <hobbies>
    <hobby>Fishing</hobby>
    <hobby>Cars</hobby>
  </hobbies>
</person>

The first line is the XML declaration. It is optional and states the XML version (there is a version 1.1, but 1.0 is the only one widely implemented) and the character encoding, usually UTF-8, sometimes UTF-16. The declaration can be used with XHTML as well, but be careful with Internet Explorer: IE 6 falls back to quirks mode when the declaration precedes the doctype.
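To see that the declaration really is optional, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The helper name summarize is made up for this illustration; the card is the one from above.

```python
import xml.etree.ElementTree as ET

# The card with the optional XML declaration. The declaration must be the
# very first thing in the document, so the literal starts right after the quotes.
WITH_DECLARATION = b"""<?xml version="1.0" encoding="UTF-8"?>
<person name="John Doe">
  <work>DOD</work>
  <hobbies>
    <hobby>Fishing</hobby>
    <hobby>Cars</hobby>
  </hobbies>
</person>
"""

# The same card with the first line (the declaration) cut off.
WITHOUT_DECLARATION = WITH_DECLARATION.split(b"\n", 1)[1]


def summarize(document: bytes) -> tuple:
    """Parse a card and return (root tag, name attribute, hobby texts)."""
    root = ET.fromstring(document)
    hobbies = [hobby.text for hobby in root.iter("hobby")]
    return root.tag, root.get("name"), hobbies


print(summarize(WITH_DECLARATION))
print(summarize(WITHOUT_DECLARATION))
```

Both calls should print the same summary, since a parser assumes XML 1.0 and UTF-8 when the declaration is missing.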

The rest of the document looks very similar to HTML, but with arbitrary elements and attributes. To enable others to implement the same format and to verify their documents, we need a formal definition. That's what XML schema languages are for. One of them is the Document Type Definition (DTD), a classic schema language inherited from SGML. All HTML (up to 4.01) and XHTML standards include a formal DTD, which a correct doctype declaration references. For example, take a look at the DTD of XHTML 1.1. It is split into several modules. In the Block Phrasal Module you can find the following part:

<!ENTITY % h1.element  "INCLUDE" >
<![%h1.element;[
<!ENTITY % h1.qname  "h1" >
<!ELEMENT %h1.qname;  %Heading.content; >
]]>

<!ENTITY % h1.attlist  "INCLUDE" >
<![%h1.attlist;[
<!ATTLIST %h1.qname;
      %Common.attrib;
>
]]>

This formally defines the h1 element, which may contain any valid heading content and carry the common attributes. Ugly? Agreed. Let's move on to a more recent, XML-specific schema language called XML Schema, or XSD for XML Schema Definition. XML Schema is able to define XML schemas and is itself defined using an XML Schema. Think about it, then have a look at the JDBC definition:

<?xml version="1.0" encoding="utf-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="person">
    <complexType>
      <all>
        <element name="work" type="string" minOccurs="0" maxOccurs="1"/>
        <element name="hobbies" minOccurs="0" maxOccurs="1">
          <complexType>
            <sequence>
              <element name="hobby" minOccurs="0" maxOccurs="unbounded" type="string"/>
            </sequence>
          </complexType>
        </element>
      </all>
      <attribute name="name" type="string" use="required"/>
    </complexType>
  </element>
</schema>

This schema defines the root element <person>, which contains zero or one <work> element (which in turn contains a string) and zero or one <hobbies> element, as well as a mandatory name attribute of type string. <hobbies> may include an arbitrary number of <hobby> elements of type string. Given this definition, everybody is able to create a JDBC, which can then be processed by an appropriate parser.
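To make the rules tangible, here is a hand-rolled sketch in Python that checks some of them. It is not an XSD validator (the standard library does not ship one), and the function name check_card is an assumption made up for this example. It covers only the element and attribute rules described above, not extra attributes or the string content type.

```python
import xml.etree.ElementTree as ET


def check_card(root: ET.Element) -> list:
    """Return a list of rule violations; an empty list means the card passes."""
    if root.tag != "person":
        return [f"root element is <{root.tag}>, expected <person>"]
    problems = []
    # <attribute name="name" use="required"/>
    if "name" not in root.attrib:
        problems.append("missing required attribute 'name'")
    # <all>: <work> and <hobbies> in any order, each at most once
    for child_name in ("work", "hobbies"):
        if len(root.findall(child_name)) > 1:
            problems.append(f"<{child_name}> occurs more than once")
    for child in root:
        if child.tag not in ("work", "hobbies"):
            problems.append(f"unexpected element <{child.tag}> in <person>")
    # <sequence>: <hobbies> holds any number of <hobby> elements, nothing else
    for hobbies in root.findall("hobbies"):
        for child in hobbies:
            if child.tag != "hobby":
                problems.append(f"unexpected element <{child.tag}> in <hobbies>")
    return problems


valid = ET.fromstring(
    '<person name="John Doe"><work>DOD</work>'
    "<hobbies><hobby>Fishing</hobby></hobbies></person>"
)
invalid = ET.fromstring(
    "<person><work>DOD</work><work>NSA</work><pet>Cat</pet></person>"
)

print(check_card(valid))
print(check_card(invalid))
```

The first card passes without complaints. The second one lacks the name attribute, repeats <work> and contains an element the schema does not know. A real XSD validator reports such problems straight from the schema, without any hand-written checks.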

Now, let's use an Extensible Stylesheet Language (XSL) Transformation (XSLT), another member of the XML family, to transform such a JDBC into HTML:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="person">
    <html>
      <body>
        <h1><xsl:value-of select="@name"/></h1>
        <xsl:apply-templates select="work"/>
        <xsl:apply-templates select="hobbies"/>
      </body>
    </html>
  </xsl:template>
  
  <xsl:template match="work">
    <h2>Work</h2>
    <xsl:value-of select="."/>
  </xsl:template>
  
  <xsl:template match="hobbies">
    <h2>Hobbies</h2>
    <ul>
      <xsl:apply-templates select="hobby"/>
    </ul>
  </xsl:template>
  
  <xsl:template match="hobby">
    <li><xsl:value-of select="."/></li>
  </xsl:template>
</xsl:stylesheet>

This is a nice little technique supported by every recent browser (and it has been for the last five to ten years; yes, even by IE 6). The XSL transformation defines a template for each element, and each template defines the HTML structure to output for it. The language used to select elements in the original XML document is called XPath, one more XML-related standard. For example, @name within the person element selects its name attribute. Extend John Doe's card by referencing the stylesheet to use:
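XPath is not tied to XSLT. As a rough sketch, Python's xml.etree.ElementTree understands a limited subset of it, which is enough to try the selections used in the stylesheet. The variable names are chosen for this example only.

```python
import xml.etree.ElementTree as ET

CARD = """<person name="John Doe">
  <work>DOD</work>
  <hobbies>
    <hobby>Fishing</hobby>
    <hobby>Cars</hobby>
  </hobbies>
</person>"""

person = ET.fromstring(CARD)

# XPath "@name": ElementTree cannot return attribute nodes from a path,
# so the attribute is read through get() instead.
name = person.get("name")

# XPath "work": the direct <work> child of the context node.
work = person.findtext("work")

# XPath "hobbies/hobby": every <hobby> below the <hobbies> child.
hobbies = [hobby.text for hobby in person.findall("hobbies/hobby")]

# XPath ".//hobby[2]": the second <hobby> among its siblings (XPath counts from 1).
second = person.find(".//hobby[2]").text

print(name, work, hobbies, second)
```

A full XPath 1.0 engine, like the one inside every XSLT processor, additionally offers attribute axes, functions and arbitrary predicates.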

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="jdbc.xsl"?>
<person name="John Doe">
  <work>DOD</work>
  <hobbies>
    <hobby>Fishing</hobby>
    <hobby>Cars</hobby>
  </hobbies>
</person>

Save the XSLT stylesheet as jdbc.xsl next to it and open the XML file in your browser. (I just realized that Chrome refuses to load the stylesheet when the files are opened from the local file system, so try a web server if you get an empty page.) Nice, isn't it? It's a complete HTML page based on the pure XML data, which can still be used for data exchange. CSS stylesheets can be attached to XML documents in the same way; just replace the type with text/css.
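If no browser is at hand, the following Python sketch shows roughly what the transformation produces. It is not an XSLT processor: the function transform is a hypothetical hand-written stand-in that mirrors the four templates one by one, using only the standard library.

```python
import xml.etree.ElementTree as ET


def transform(person: ET.Element) -> ET.Element:
    """Mirror the 'person' template: html/body with a heading and sections."""
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = person.get("name")
    # Mirror the 'work' template: a heading followed by the element's text.
    for work in person.findall("work"):
        heading = ET.SubElement(body, "h2")
        heading.text = "Work"
        heading.tail = work.text
    # Mirror the 'hobbies' and 'hobby' templates: a heading and a list.
    for hobbies in person.findall("hobbies"):
        ET.SubElement(body, "h2").text = "Hobbies"
        ul = ET.SubElement(body, "ul")
        for hobby in hobbies.findall("hobby"):
            ET.SubElement(ul, "li").text = hobby.text
    return html


card = ET.fromstring(
    '<person name="John Doe"><work>DOD</work>'
    "<hobbies><hobby>Fishing</hobby><hobby>Cars</hobby></hobbies></person>"
)
page = ET.tostring(transform(card), encoding="unicode", method="html")
print(page)
```

The printed markup has the same structure the browser builds from jdbc.xsl: an h1 with the name, then the Work and Hobbies sections.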

That should be enough on XML. Dynamic topics are coming up, but first, I'll cover HTML and CSS helpers like HAML, LESS and SASS in the next post.
