[improve-it] Parsing NG archive and sorting by post-count

Andrej Mitrovic none at none.none
Tue Mar 15 15:24:06 PDT 2011


I thought about making a kind of code-golf contest (stackoverflow usually has these contests), only with the focus on improving each other's code.

So here's my idea of the day: Parse the newsgroup archive files from http://www.digitalmars.com/NewsGroup.html, and for each .html file output another .html file which has a list of topics sorted in post count order. Sure, there is NG software which does this automatically. But this is about doing it in D.

Here's my implementation: https://gist.github.com/871631

Download a few .html files and save them in their own folder. Then copy my script into a .d file in the same folder, and just run it with RDMD. It will output the files in an `output` subfolder. I've only tested it on Windows.

There are a few things I've noticed: Using just a simple hash with the post count as the Key type wouldn't work. Many topics have the same post count, and AAs can't hold duplicate keys. So I worked around this by making a wrapper which hides all the details of storing duplicates and traversal. I've called it `CommonAA`.
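
One way to picture the duplicate-key problem is an AA whose values are arrays, so that topics with the same post count share a bucket. This is only a minimal sketch of that idea, not the `CommonAA` from the gist; `groupByPostCount` and the sample titles are made up for illustration.

```d
import std.algorithm : sort;
import std.stdio : writeln;

/// Groups topic titles by post count. Topics with equal counts
/// land in the same bucket instead of overwriting each other.
/// (Hypothetical helper, not part of the gist.)
string[][int] groupByPostCount(int[string] postCounts)
{
    string[][int] buckets;
    foreach (title, count; postCounts)
        buckets[count] ~= title; // appending creates the bucket on first use
    return buckets;
}

void main()
{
    // Made-up sample data: two topics share the count 12.
    int[string] postCounts = ["Topic A": 12, "Topic B": 3, "Topic C": 12];
    auto buckets = groupByPostCount(postCounts);

    // AAs are unordered, so sort the keys to walk the buckets by post count.
    auto counts = buckets.keys;
    sort!((a, b) => a > b)(counts);
    foreach (count; counts)
        writeln(count, ": ", buckets[count]);
}
```

The order of titles inside one bucket follows the AA's iteration order, so it is not guaranteed.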

I've also implemented an `allSatisfy` function which works on runtime arguments. There's a similar function in std.typetuple, but it's only useful for compile-time arguments. There's probably a similar function somewhere in std.algorithm, but I was too lazy to check. I thought it would be nice to have.
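
For comparison, current Phobos releases do ship a runtime counterpart: `all` in std.algorithm checks a predicate against every element of a range. The sketch below shows how it might be used for this kind of check; `containsAll` and the sample strings are made up for illustration and are not the gist's `allSatisfy`.

```d
import std.algorithm : all;
import std.stdio : writeln;
import std.string : indexOf;

/// True when every marker occurs somewhere in `line`.
/// (Hypothetical helper, not the gist's allSatisfy.)
bool containsAll(string line, string[] markers)
{
    // indexOf returns -1 when the substring is not found.
    return markers.all!(m => line.indexOf(m) != -1);
}

void main()
{
    auto line = `<a href="1.html">Some topic</a>`; // made-up sample line
    writeln(containsAll(line, ["<a ", "</a>"]));
    writeln(containsAll(line, ["<a ", "<ul>"]));
}
```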

I can see some ways to improve this. For one, I could have used Regex instead of indexOf. I could have also tried to avoid using a wrapper, but I haven't figured out a way to do that while allowing duplicate keys and sorting them with each key still linked to its values.
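
Both ideas could look roughly like the sketch below: a regex pulls out the title, and an array of (title, posts) structs is sorted so each count stays attached to its topic without a wrapper. Everything here is an assumption for illustration: the anchor markup is not the real archive layout, every post of a topic is assumed to carry the same title, and `Topic`, `extractTitle` and `topicsByPostCount` are made-up names.

```d
import std.algorithm : sort;
import std.regex : matchFirst, regex;
import std.stdio : writefln;

/// One topic together with its post count, so sorting keeps them linked.
struct Topic
{
    string title;
    int posts;
}

/// Returns the link text of the first anchor in `line`, or null if none.
/// The pattern is an assumed format, not the real archive markup.
string extractTitle(string line)
{
    auto m = matchFirst(line, regex(`<a href="[^"]*">([^<]*)</a>`));
    return m.empty ? null : m[1];
}

/// Counts posts per title and returns the topics, most-discussed first.
Topic[] topicsByPostCount(string[] lines)
{
    int[string] counts; // keyed by title, so duplicate counts are no problem
    foreach (line; lines)
    {
        auto title = extractTitle(line);
        if (title !is null)
            counts[title]++; // a missing key starts at 0
    }

    Topic[] topics;
    foreach (title, posts; counts)
        topics ~= Topic(title, posts);
    sort!((a, b) => a.posts > b.posts)(topics);
    return topics;
}

void main()
{
    // Made-up input lines.
    auto lines = [
        `<a href="1.html">Foo</a>`,
        `<a href="2.html">Bar</a>`,
        `<a href="3.html">Foo</a>`,
    ];
    foreach (t; topicsByPostCount(lines))
        writefln("%s: %s", t.posts, t.title);
}
```

Topics with equal post counts come out in no particular order, since `sort` is not stable by default.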

Anywho, let's see you improve my code! It's just for fun and maybe we'll learn some tricks from one another. Have fun!


More information about the Digitalmars-d-learn mailing list