XML

Perfect provides XML & HTML parsing support via the Perfect-XML module.

Getting Started

In addition to the PerfectLib, you will need the Perfect-XML dependency in the Package.swift file:

.Package(url: "https://github.com/PerfectlySoft/Perfect-XML.git", majorVersion: 2)

In each Swift source file that references the XML classes, include the import directive:

import PerfectXML

macOS Build Notes

If you receive a compile error that says the following, you need to install and link libxml2.

note: you may be able to install libxml-2.0 using your system-packager:

    brew install libxml2

Compile Swift Module 'PerfectXML' (2 sources)
<module-includes>:1:9: note: in file included from <module-includes>:1:
#import "libxml2.h"

To install and link libxml2 with Homebrew, use the following two commands:

brew install libxml2
brew link --force libxml2

Linux Setup note

Ensure that you have installed libxml2-dev and pkg-config.

sudo apt-get install libxml2-dev pkg-config

Parsing an XML string

Instantiate an XDocument object with your XML string:

let xDoc = XDocument(fromSource: <XMLSource>)

Now you can get the root node of the XML structure by using the documentElement property.

Convert the node tree to String

Converting the XML node tree to a string, with optional pretty-print formatting:

let xDoc = XDocument(fromSource: rssXML)
let prettyString = xDoc?.string(pretty: true)
let plainString = xDoc?.string(pretty: false)

Working with Nodes

XML is a structured document standard consisting of nodes in the format <A>B</A>. Each node has a tag name and either a value, or sub nodes we call "children". Each node has several important properties:

  • nodeValue
  • nodeName
  • parentNode
  • childNodes

Each node also has a getElementsByTagName method that recursively searches through it and its children to return an array of all nodes that have that name. This method makes it easy to find a single value in the XML file.

To demonstrate, this recursive function iterates through all elements in the node:

func printChildrenName(_ xNode: XNode) {

    // try treating the current node as a text node
    guard let text = xNode as? XText else {

            // print out the node info
            print("Name:\t\(xNode.nodeName)\tType:\(xNode.nodeType)\t(children)\n")

            // find out children of the node
            for n in xNode.childNodes {

                // call the function recursively and take the same produces
                printChildrenName(n)
            }
            return
    }

    // it is a text node and print it out
    print("Name:\t\(xNode.nodeName)\tType:\(xNode.nodeType)\tValue:\t\(text.nodeValue!)\n")
}

printChildrenName( xDoc!)

Access Node by Tag Name

The snippet below shows the method of getElementsByTagName in a general manner:

func testTag(tagName: String) {

    // use .getElementsByTagName to get this node.
    // check if there is such a tag in the xml document
        guard let node = xDoc?.documentElement?.getElementsByTagName(tagName) else {
            print("There is no such a tag: `\(tagName)`\n")
            return
        }

        // if is, get the first matched
        guard let value = node.first?.nodeValue else {
            print("Tag `\(tagName)` has no value\n")
            return
        }

        // show the node value
        print("Tag '\(tagName)' has a value of '\(value)'\n")
}

testTag(tagName: "link")
testTag(tagName: "description")

Access Node by ID

Alternatively querying a node by its id using .getElementById():

func testID(id: String) {

        // Access node by its id, if available
        guard let node = xDoc?.getElementById(id) else {
            print("There is no such a id: `\(id)`\n")
            return
        }

        guard let value = node.nodeValue else {
            print("id `\(id)` has no value\n")
            return
        }

        print("id '\(id)' has a value of '\(value)'\n")
}

testID(id: "rssID")
testID(id: "xmlID")

XElement

The method .getElementsByTagName() returns an array of nodes of type XElement.

The following code demonstrates how to iterate all element in this array:

func showItems() {
        // get all items with tag name of "item"
        let feedItems = xDoc?.documentElement?.getElementsByTagName("item")

        // checkout how many items indeed.
        let itemsCount = feedItems?.count
        print("There are totally \(itemsCount!) Items Found\n")

        // iterate all items in the result set
        for item in feedItems!
        {
                let title = item.getElementsByTagName("title").first?.nodeValue
                let link = item.getElementsByTagName("link").first?.nodeValue
                let description = item.getElementsByTagName("description").first?.nodeValue
                print("Title: \(title!)\tLink: \(link!)\tDescription: \(description!)\n")
        }
}

showItems()

Relationships of a Node Family

PerfectXML provides a convenient way to access all relationships of a specific XML node: Parent, Siblings & Children.

Parent of a Node

Parent node can be accessed using parentNode:

func showParent(tag: String) {

        guard let node = xDoc?.documentElement?.getElementsByTagName(tag).first else {
            print("There is no such a tag '\(tag)'.\n")
            return
        }

        // Accessing the parent node
        guard let parent = node.parentNode else {
            print("Tag '\(tag)' is the root node.\n")
            return
        }
        let name = parent.nodeName
        print("Parent of '\(tag)' is '\(name)'\n")
}

showParent(tag: "link")

Node Siblings

Each XML node can have two siblings: previousSibling and nextSibling.

func showSiblings (tag: String) {

        let node = xDoc?.documentElement?.getElementsByTagName(tag).first

        // Check the previous sibling of current node
        let previousNode = node?.previousSibling
        var name = previousNode?.nodeName
        var value = previousNode?.nodeValue
        print("Previous Sibling of \(tag) is \(name!)\t\(value!)\n")

        // Check the next sibling of current node
        let nextNode = node?.nextSibling
        name = nextNode?.nodeName
        value = nextNode?.nodeValue
        print("Next Sibling of \(tag) is \(name!)\t\(value!)\n")
}

showSiblings(tag: "link")
showSiblings(tag: "description")

First & Last Child

If an XML node has a child, .firstChild and .lastChild can be used:

func firstLast() {
        let node = xDoc?.documentElement?.getElementsByTagName("channel").first

        /// retrieve the first child
        let firstChild = node?.firstChild
        var name = firstChild?.nodeName
        var value = firstChild?.nodeValue
        print("First Child of Channel is \(name!)\t\(value!)\n")

        /// retrieve the last child
        let lastChild = node?.lastChild
        name = lastChild?.nodeName
        value = lastChild?.nodeValue
        print("Last Child of Channel is \(name!)\t\(value!)\n")
}

firstLast()

Node Attributes

Any XML node or element can have attributes:

<node attribute1="value of attribute1" attribute2="value of attribute2">
</node>

Node method .getAttribute(name: String) provides the functionality of accessing attributes:

func showAttributes() {
        let node = xDoc?.documentElement?.getElementsByTagName("title").first

        /// get some attributes of a node
        let att1 = node?.getAttribute(name: "attribute1")
        print("attribute1 of title is \(att1)\n")

        let att2 = node?.getAttributeNode(name: "attribute2")
        print("attribute2 of title is \(att2?.value)\n")
}

showAttributes()

Namespaces

XML namespaces are used for providing uniquely named elements and attributes in an XML document. An XML instance may contain element or attribute names from more than one XML vocabulary. If each vocabulary is given a namespace, the ambiguity between identically named elements or attributes can be resolved.

Both .getElementsByTagName() and .getAttributeNode() have namespace versions: .getElementsByTagNameNS() and .getAttributeNodeNS(). In these cases namespaceURI and localName are required.

The following code demonstrates the usage of .getElementsByTagNameNS() and .getNamedItemNS():

func showNamespaces(){
    let deeper = xDoc?.documentElement?.getElementsByTagName("deeper").first
    let atts = deeper?.firstChild?.attributes;
    let item = atts?.getNamedItemNS(namespaceURI: "foo:bar", localName: "atr2")
    print("Namespace of deeper has an attribute of \(item?.nodeValue)\n")

    let foos = xDoc?.documentElement?.getElementsByTagNameNS(namespaceURI: "foo:bar", localName: "fool")
    var count = foos?.count
    let node = foos?.first
    let name = node?.nodeName
    let localName = node?.localName
    let prefix = node?.prefix
    let nuri = node?.namespaceURI

    print("Namespace of 'foo:bar' has \(count!) element:\n")
    print("Node Name: \(name!)\n")
    print("Local Name: \(localName!)\n")
    print("Prefix: \(prefix!)\n")
    print("Namespace URI: \(nuri!)\n")

    let children = node?.childNodes

    count = children?.count

    let a = node?.firstChild
    let b = node?.lastChild

    let na = a?.nodeName
    let nb = b?.nodeName
    let va = a?.nodeValue
    let vb = b?.nodeValue

    print("This node also has \(count!) children.\n")
    print("The first one is called '\(na!)' with value of '\(va!)'.\n")
    print("And the last one is called '\(nb!)' with value of '\(vb!)'\n")
}

showNamespaces()

XPath

XPath (XML Path Language) is a query language for selecting nodes from an XML document. In addition, XPath may be used to compute values (e.g. strings, numbers, or Boolean values) from the content of an XML document.

The following code demonstrates extracting a specific path:

func showXPath(xpath: String) {
    /// Use .extract() method to deal with the XPath request.
        let pathResource = xDoc?.extract(path: xpath)
        print("XPath '\(xpath)':\n\(pathResource!)\n")
}

showXPath(xpath: "/rss/channel/item")
showXPath(xpath: "/rss/channel/title/@attribute1")
showXPath(xpath: "/rss/channel/link/text()")
showXPath(xpath: "/rss/channel/item[2]/deeper/foo:bar")