Reading a file of words line by line

mark mark at qtrac.eu
Tue Jan 14 16:39:16 UTC 2020


As part of learning D I want to read a file that contains one 
word per line (plus optional junk after the word) and create a 
set of all the unique words of a particular length (uppercased).

D doesn't appear to have a set type so I'm faking one using an 
associative array whose values are always 0.
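
To make the idea concrete, here is a minimal sketch of set-like 
operations on top of an associative array. WordSet is the alias used 
in the program in this post; addWord and hasWord are hypothetical 
helper names made up for illustration only.

```d
import std.stdio;

// Same alias as in the program in this post: key = word; value = dummy 0.
alias WordSet = int[string];

// Hypothetical helper: insert a word; only the key matters.
void addWord(ref WordSet set, string word) {
    set[word] = 0;
}

// Hypothetical helper: `in` yields a pointer to the value, or null.
bool hasWord(const WordSet set, string word) {
    return (word in set) !is null;
}

void main() {
    WordSet words;
    words.addWord("ABLE");
    words.addWord("ABLE"); // a duplicate key collapses into one entry
    words.addWord("BAKE");
    writeln(words.length);          // prints 2
    writeln(words.hasWord("ABLE")); // prints true
    writeln(words.hasWord("ZZZZ")); // prints false
}
```

Membership is tested with `in` rather than indexing, since indexing 
a missing key throws a RangeError.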

I can't help feeling that the foreach loop's block is rather more 
verbose than it could be?

----
#!/usr/bin/env rdmd
import std.stdio;

immutable WORDFILE = "/usr/share/hunspell/en_GB.dic";
immutable WORDSIZE = 4; // Should be even

alias WordSet = int[string]; // key = word; value = 0

void main() {
     import core.time;

     auto start = MonoTime.currTime;
     auto words = getWords(WORDFILE, WORDSIZE);
     // TODO
     writeln(words.length, " words");
     writeln(MonoTime.currTime - start);
}

WordSet getWords(string filename, int wordsize) {
     import std.conv;
     import std.regex;
     import std.uni;

     WordSet words;
     auto rx = ctRegex!(r"^[a-z]+", "i");
     auto file = File(filename);
     foreach (line; file.byLine) {
         auto match = matchFirst(line, rx);
         if (!match.empty()) {
             auto word = match.hit().to!string; // I hope this assumes UTF-8?
             if (word.length == wordsize) {
                 words[word.toUpper] = 0;
             }
         }
     }
     return words;
}
----
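
One way the loop body might be tightened, offered only as a sketch: 
fold the two nested ifs into a single condition and uppercase the 
match directly. collectWords is a hypothetical name; it takes any 
range of lines so that it can be tried on an array of strings as 
well as on File.byLine. It assumes that toUpper accepts the char[] 
slices produced by byLine and that .idup is enough to copy the 
result out of byLine's reused buffer.

```d
import std.regex : ctRegex, matchFirst;
import std.stdio : File, writeln;
import std.uni : toUpper;

alias WordSet = int[string]; // same alias as in the program above

// Hypothetical helper: collect the unique uppercased words of a given
// length from any range of lines.
WordSet collectWords(Lines)(Lines lines, size_t wordsize) {
    WordSet words;
    auto rx = ctRegex!(r"^[a-z]+", "i");
    foreach (line; lines) {
        auto match = line.matchFirst(rx);
        // [a-z] matches only ASCII letters, so the length in code units
        // equals the number of letters.
        if (!match.empty && match.hit.length == wordsize)
            words[match.hit.toUpper.idup] = 0; // idup makes an immutable copy
    }
    return words;
}

void main() {
    // Made-up sample lines standing in for a dictionary file.
    auto words = collectWords(["able/S", "Bake 12", "toolong", "able"], 4);
    writeln(words.length); // prints 2 (ABLE and BAKE)
}
```

With a real file the call would be along the lines of 
collectWords(File(filename).byLine, wordsize).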

PS I'm using ldc on Linux and think that rdmd is excellent. I have 
lots of small Python programs and I'm wondering how many would 
be faster using D and rdmd (which I think caches binaries). Also 
I've now got Mike Parker's "Learning D" on order.

