XML Parsing

Adam D. Ruppe destructionator at gmail.com
Mon Mar 19 21:32:12 PDT 2012


I know very little about std.xml (I looked at it and
said 'meh' and wrote my own lib), but my lib
makes this pretty simple.

https://github.com/adamdruppe/misc-stuff-including-D-programming-language-web-stuff

Grab dom.d and characterencodings.d from there.

This has a bit of an html bias, but it works for xml too.

===
import arsd.dom;
import std.file;
import std.stdio;
import std.conv;

void main() {
	auto document = new Document(readText("test12.xml"), true, true);

	auto map = document.requireSelector("map");

	writeln(to!int(map.width), "x", to!int(map.height));

	foreach(tile; document.getElementsByTagName("tile"))
		writeln(tile.gid);
}
===

$ dmd test12.d dom.d characterencodings.d
$ test12
25x19
<snip tile data>





Let me explain the lines:

	auto document = new Document(readText("test12.xml"), true, true);

We use std.file.readText to read the file as a string. Document's
constructor signature is (string data, bool caseSensitive, bool strictMode).

So passing true, true makes it act like an XML parser instead of
trying to correct for HTML tag soup.
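Because the constructor takes the markup as a plain string, the XML doesn't have to come from a file. Here is a minimal sketch, assuming dom.d is compiled in as in the dmd line above. The parseXml helper and the inline snippet are hypothetical:

```d
import arsd.dom;
import std.stdio;

// Hypothetical helper: parse text as case-sensitive, strict XML,
// the same way the main example treats readText's result.
Document parseXml(string text) {
	return new Document(text, true, true);
}

void main() {
	// Made-up inline document instead of a file on disk.
	auto document = parseXml(`<map width="25" height="19"></map>`);

	// document.root is the outermost element of the parsed tree.
	writeln(document.root.tagName); // prints: map
}
```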


Now, document is a DOM, like the W3C DOM you use in web browsers
(via JavaScript), though it is expanded with a lot of convenience
and sugar.

	auto map = document.requireSelector("map");

querySelector and requireSelector use CSS selector syntax
to fetch one element. querySelector may return null, whereas
requireSelector will throw an exception if the element is not
found.
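The difference between the two lookups can be checked directly. This is a minimal sketch with a made-up document. The helper names are hypothetical, and catching the base Exception (rather than the library's specific exception class) is a deliberate choice to avoid depending on the exact type requireSelector throws:

```d
import arsd.dom;
import std.stdio;

// Hypothetical helper: querySelector gives null when nothing matches.
bool hasMatch(Document doc, string selector) {
	return doc.querySelector(selector) !is null;
}

// Hypothetical helper: true when requireSelector throws for this selector.
bool requireFails(Document doc, string selector) {
	try {
		doc.requireSelector(selector);
		return false;
	} catch (Exception) {
		return true;
	}
}

void main() {
	auto doc = new Document(`<map><layer></layer></map>`, true, true);
	writeln(hasMatch(doc, "layer"));    // true
	writeln(hasMatch(doc, "tile"));     // false
	writeln(requireFails(doc, "tile")); // true
}
```

Use querySelector when a missing element is normal, and requireSelector when it means the file is broken.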

You can learn more about CSS selector syntax on the web. I tried
to cover a good chunk of the standard, including most of CSS2 and
some of CSS3.

Here, I'm asking for the first element with tag name "map".


You can also use querySelectorAll to get all the elements that
match, returned as an array, which is great for looping.
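A minimal sketch of looping over querySelectorAll, using a made-up snippet shaped loosely like the tile data above. The tileGids helper is hypothetical:

```d
import arsd.dom;
import std.stdio;

// Hypothetical helper: collect the gid attribute of every <tile>.
string[] tileGids(Document doc) {
	string[] gids;
	// querySelectorAll returns all matches as an array.
	foreach (tile; doc.querySelectorAll("tile"))
		gids ~= tile.getAttribute("gid");
	return gids;
}

void main() {
	auto doc = new Document(
		`<map><layer><tile gid="1"/><tile gid="2"/><tile gid="3"/></layer></map>`,
		true, true);
	writeln(tileGids(doc)); // ["1", "2", "3"]
}
```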

	writeln(to!int(map.width), "x", to!int(map.height));


The attributes on an element are exposed via dot syntax,
or you can use element.getAttribute("name") if you
prefer.

They are returned as strings. Using std.conv.to, we can
easily convert them to integers.
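Since to!int throws on an empty string, a missing attribute needs care. This is a sketch using getAttribute with a fallback value. It assumes getAttribute returns null (or an empty string) for an attribute that isn't there. The intAttribute helper and the "depth" attribute are hypothetical:

```d
import arsd.dom;
import std.conv : to;
import std.stdio;

// Hypothetical helper: read an integer attribute, or return fallback
// when the attribute is absent (assumed to come back null/empty).
int intAttribute(Element e, string name, int fallback) {
	string raw = e.getAttribute(name);
	if (raw.length == 0)
		return fallback;
	return to!int(raw); // throws ConvException if raw is not a valid int
}

void main() {
	auto doc = new Document(`<map width="25" height="19"></map>`, true, true);
	auto map = doc.requireSelector("map");
	writeln(intAttribute(map, "width", 0), "x", intAttribute(map, "height", 0)); // 25x19
	writeln(intAttribute(map, "depth", 1)); // 1, since there is no depth attribute
}
```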


	foreach(tile; document.getElementsByTagName("tile"))
		writeln(tile.gid);

And finally, we get all the tile tags in the document and
print out their gid attribute.
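If you want the gids as numbers rather than printing them, the array from getElementsByTagName works with Phobos range functions. A minimal sketch; the tileGidNumbers helper and the snippet are hypothetical:

```d
import arsd.dom;
import std.algorithm : map;
import std.array : array;
import std.conv : to;
import std.stdio;

// Hypothetical helper: every <tile>'s gid attribute, converted to int.
int[] tileGidNumbers(Document doc) {
	return doc.getElementsByTagName("tile")
		.map!(tile => to!int(tile.getAttribute("gid")))
		.array;
}

void main() {
	auto doc = new Document(
		`<map><layer><tile gid="1"/><tile gid="0"/><tile gid="7"/></layer></map>`,
		true, true);
	writeln(tileGidNumbers(doc)); // [1, 0, 7]
}
```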

Note that you can also call the element search functions
on individual elements. Then they only look at that element
and everything nested inside it.
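For example, calling getElementsByTagName on each layer element counts only that layer's tiles. A minimal sketch; the two-layer document and the tilesPerLayer helper are made up:

```d
import arsd.dom;
import std.stdio;

// Hypothetical helper: count the <tile> elements inside each <layer>,
// keyed by the layer's name attribute.
size_t[string] tilesPerLayer(Document doc) {
	size_t[string] counts;
	foreach (layer; doc.getElementsByTagName("layer")) {
		// Searching from the layer element only sees that layer's contents.
		counts[layer.getAttribute("name")] = layer.getElementsByTagName("tile").length;
	}
	return counts;
}

void main() {
	auto doc = new Document(
		`<map>` ~
		`<layer name="ground"><tile gid="1"/><tile gid="2"/></layer>` ~
		`<layer name="objects"><tile gid="9"/></layer>` ~
		`</map>`,
		true, true);
	auto counts = tilesPerLayer(doc);
	writeln(counts["ground"], " ", counts["objects"]); // 2 1
}
```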



This example didn't need it, but you can also use
element.innerText to get the text inside a tag, which
pretty much covers basic data retrieval.
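A minimal sketch of reading text content with innerText; the name element and the mapName helper are hypothetical:

```d
import arsd.dom;
import std.stdio;

// Hypothetical helper: the text inside the first <name> element.
string mapName(Document doc) {
	return doc.requireSelector("name").innerText;
}

void main() {
	auto doc = new Document(`<map><name>Forest</name></map>`, true, true);
	writeln(mapName(doc)); // Forest
}
```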




Note: my library is not good at handling huge files;
it eats a good chunk of memory and loads the whole
document at once. But, it is the easiest way I've
seen (I'm biased though) to work with xml files,
so I like it.


More information about the Digitalmars-d-learn mailing list