XML Parsing

Chris Pons cmpons at gmail.com
Tue Mar 20 11:15:32 PDT 2012


On Tuesday, 20 March 2012 at 04:32:13 UTC, Adam D. Ruppe wrote:
> I know very little about std.xml (I looked at it and
> said 'meh' and wrote my own lib), but my lib
> makes this pretty simple.
>
> https://github.com/adamdruppe/misc-stuff-including-D-programming-language-web-stuff
>
> Grab dom.d and characterencodings.d.
>
> This has a bit of an HTML bias, but it works for XML too.
>
> ===
> import arsd.dom;
> import std.file;
> import std.stdio;
> import std.conv;
>
> void main() {
> 	auto document = new Document(readText("test12.xml"), true, true);
>
> 	auto map = document.requireSelector("map");
>
> 	writeln(to!int(map.width), "x", to!int(map.height));
>
> 	foreach(tile; document.getElementsByTagName("tile"))
> 		writeln(tile.gid);
> }
> ===
>
> $ dmd test12.d dom.d characterencodings.d
> $ test12
> 25x19
> <snip tile data>
>
> Let me explain the lines:
>
> 	auto document = new Document(readText("test12.xml"), true, true);
>
> We use std.file.readText to read the file as a string. Document's
> constructor signature is (string data, bool caseSensitive, bool strictMode).
>
> So passing "true, true" makes it act like an XML parser instead of
> trying to correct for HTML tag soup.
>
>
> Now, document is a DOM like the one the W3C specifies and web
> browsers expose to JavaScript, extended with a lot of convenience
> methods and syntactic sugar.
>
> 	auto map = document.requireSelector("map");
>
> querySelector and requireSelector use CSS selector syntax
> to fetch one element. querySelector may return null, whereas
> requireSelector will throw an exception if the element is not
> found.
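A minimal sketch of the difference between the two lookups, assuming dom.d and characterencodings.d are compiled in as in the dmd command above. The inline XML string stands in for test12.xml, and the helper optionalText is hypothetical, not part of arsd.dom. Since the post only says that requireSelector throws "an exception", the sketch catches the base Exception class rather than naming a specific type.

```d
import arsd.dom;
import std.stdio;

// Hypothetical helper (not part of arsd.dom): return the text of the first
// element matching `selector`, or `fallback` when querySelector returns null.
string optionalText(Document document, string selector, string fallback) {
    auto element = document.querySelector(selector);
    return element is null ? fallback : element.innerText;
}

void main() {
    // Inline XML standing in for test12.xml; parsed case-sensitively
    // and in strict mode, as in the post.
    auto document = new Document(
        `<map width="25" height="19"><layer name="ground"></layer></map>`,
        true, true);

    // There is no <title> element, so the fallback is used.
    writeln(optionalText(document, "title", "untitled"));

    try {
        // requireSelector throws instead of returning null.
        document.requireSelector("objectgroup");
    } catch (Exception e) {
        writeln("requireSelector failed: ", e.msg);
    }
}
```

querySelector suits optional elements. requireSelector suits elements whose absence means the file is malformed.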
>
> You can learn more about CSS selector syntax on the web. I tried
> to cover a good chunk of the standard, including most of CSS2 and
> some of CSS3.
>
> Here, I'm asking for the first element with tag name "map".
>
>
> You can also use querySelectorAll to get all the elements that
> match, returned as an array, which is great for looping.
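The following sketch loops over the array that querySelectorAll returns. It assumes the same dom.d setup. The XML is a hypothetical inline stand-in loosely shaped like a Tiled map, and sumGids is an illustrative helper. The selector "layer tile" uses the CSS descendant combinator to match every tile element nested anywhere inside a layer element.

```d
import arsd.dom;
import std.conv : to;
import std.stdio;

// Hypothetical helper: sum the gid attribute of every <tile> inside a <layer>.
int sumGids(Document document) {
    int total = 0;
    // querySelectorAll returns all matches as an array, so it loops directly.
    foreach (tile; document.querySelectorAll("layer tile"))
        total += to!int(tile.getAttribute("gid"));
    return total;
}

void main() {
    // Inline XML loosely shaped like the tile list in test12.xml.
    auto document = new Document(
        `<map><layer><data><tile gid="1"></tile><tile gid="7"></tile></data></layer></map>`,
        true, true);
    writeln(sumGids(document));
}
```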
>
> 	writeln(to!int(map.width), "x", to!int(map.height));
>
>
> The attributes on an element are exposed via dot syntax,
> or you can use element.getAttribute("name") if you
> prefer.
>
> They are returned as strings. Using std.conv.to, we can
> easily convert them to integers.
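Attribute values come back as strings, so std.conv.to throws a ConvException when a value is not a plain integer (for example "25px" or an empty string). The sketch below is Phobos-only and guards that conversion. It assumes that a missing attribute is passed in as null, which is what getAttribute is expected to yield for an absent attribute. The helper name attributeToInt is hypothetical.

```d
import std.conv : to, ConvException;
import std.stdio;

// Hypothetical helper: convert an attribute string to int, returning
// `fallback` for a missing (null) or non-numeric value instead of throwing.
int attributeToInt(string value, int fallback) {
    if (value is null)
        return fallback; // attribute not present (assumed to arrive as null)
    try {
        return to!int(value);
    } catch (ConvException) {
        return fallback; // e.g. "25px" or "" is not a plain integer
    }
}

void main() {
    writeln(attributeToInt("25", 0));
    writeln(attributeToInt("25px", -1));
    writeln(attributeToInt(null, -1));
}
```

In the post's example, a call would look like attributeToInt(map.getAttribute("width"), 0).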
>
>
> 	foreach(tile; document.getElementsByTagName("tile"))
> 		writeln(tile.gid);
>
> And finally, we get all the tile tags in the document and
> print out their gid attribute.
>
> Note that you can also call the element search functions on
> individual elements. The search is then limited to that element
> and its descendants.
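The following sketch shows a scoped search, assuming the same dom.d setup. It counts tiles per layer by calling getElementsByTagName on each layer element rather than on the whole document. The two-layer XML and the helper tilesPerLayer are hypothetical.

```d
import arsd.dom;
import std.stdio;

// Hypothetical helper: map each layer's name attribute to the number of
// <tile> elements found within that layer only.
size_t[string] tilesPerLayer(Document document) {
    size_t[string] counts;
    foreach (layer; document.getElementsByTagName("layer"))
        // Called on `layer`, the search does not see other layers' tiles.
        counts[layer.getAttribute("name")] = layer.getElementsByTagName("tile").length;
    return counts;
}

void main() {
    auto document = new Document(
        `<map><layer name="ground"><data><tile gid="1"></tile><tile gid="2"></tile></data></layer>`
        ~ `<layer name="objects"><data><tile gid="3"></tile></data></layer></map>`,
        true, true);
    writeln(tilesPerLayer(document));
}
```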
>
>
>
> You didn't need it here, but element.innerText returns the text
> inside a tag, which pretty much covers basic data retrieval.
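Here is a short sketch of reading element text with innerText, assuming the same dom.d setup. The description element is hypothetical, and the Tiled format may not contain it. std.string.strip trims the whitespace around the text. The helper descriptionOf is illustrative only.

```d
import arsd.dom;
import std.stdio;
import std.string : strip;

// Hypothetical helper: the text inside the first <description> element,
// with surrounding whitespace trimmed.
string descriptionOf(Document document) {
    return strip(document.requireSelector("description").innerText);
}

void main() {
    // Hypothetical document with a text-only element.
    auto document = new Document(
        `<map><description>  First level  </description></map>`, true, true);
    writeln(descriptionOf(document));
}
```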
>
>
>
>
> Note: my library is not good at handling huge files; it loads the
> whole document at once and eats a good chunk of memory. But it is
> the easiest way I've seen (I'm biased, though) to work with XML
> files, so I like it.

Thank you. I'll check it out.




More information about the Digitalmars-d-learn mailing list