Does something like std.algorithm.iteration:splitter with multiple separators exist?

Simen Kjaeraas via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Wed Mar 23 08:23:38 PDT 2016


On Wednesday, 23 March 2016 at 11:57:49 UTC, ParticlePeter wrote:
> I need to parse an ASCII file with multiple tokens. The tokens 
> can be seen as keys. After every token there is a bunch of lines 
> belonging to that token, the values.
> The order of tokens is unknown.
>
> I would like to read the file in as a whole string, and split 
> the string with:
> splitter(fileString, [token1, token2, ... tokenN]);
>
> And would like to get a range of strings each starting with 
> tokenX and ending before the next token.
>
> Does something like this exist?
>
> I know how to parse the string line by line and create new 
> strings and append the appropriate lines, but I don't know how 
> to do this with a lazy result range and without new allocations.

Without more detail, it's a bit hard to help.

std.algorithm.splitter has an overload that takes a predicate 
instead of a separator:

     import std.algorithm;
     auto a = "a,b;c";
     auto b = a.splitter!(e => e == ';' || e == ',');
     assert(equal(b, ["a", "b", "c"]));

However, not only are the separators lost in the process, it also 
only allows single-element separators. This might be good enough 
given the information you've divulged, but I'll hazard a guess it 
isn't.
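
For multi-character tokens that should stay at the front of each 
piece, one option is a small hand-written range. The following is 
only a sketch: TokenChunks, matchLength and tokenIndex are 
hypothetical names, not part of Phobos. It assumes the tokens are 
non-empty for matching purposes (empty ones are ignored) and that 
slicing at code-unit boundaries is acceptable, as it is for ASCII 
input. Each front is a slice of the original string, so nothing is 
copied.

```d
import std.algorithm.comparison : equal;

// Length of the longest token matching `s` at index `i`, or 0 if none does.
// Empty tokens never match.
size_t matchLength(string s, size_t i, string[] tokens)
{
    size_t best = 0;
    foreach (t; tokens)
        if (t.length > best && s.length - i >= t.length
                && s[i .. i + t.length] == t)
            best = t.length;
    return best;
}

// Index of the first token at or after `from`, or s.length if there is none.
size_t tokenIndex(string s, size_t from, string[] tokens)
{
    foreach (i; from .. s.length)
        if (matchLength(s, i, tokens) > 0)
            return i;
    return s.length;
}

// Hypothetical lazy input range: every element starts with a token and
// ends right before the next token. Text before the first token is skipped.
struct TokenChunks
{
    private string rest;      // unconsumed input; empty or starts with a token
    private string[] tokens;

    this(string text, string[] tokens)
    {
        this.tokens = tokens;
        rest = text[tokenIndex(text, 0, tokens) .. $];
    }

    @property bool empty() const { return rest.length == 0; }

    @property string front() { return rest[0 .. chunkEnd()]; }

    void popFront() { rest = rest[chunkEnd() .. $]; }

    // End of the current chunk: search starts after the leading token.
    private size_t chunkEnd()
    {
        return tokenIndex(rest, matchLength(rest, 0, tokens), tokens);
    }
}

unittest
{
    auto text = "KEY1\nfoo\nbar\nKEY2\nbaz\nKEY1\n";
    auto chunks = TokenChunks(text, ["KEY1", "KEY2"]);
    assert(equal(chunks, ["KEY1\nfoo\nbar\n", "KEY2\nbaz\n", "KEY1\n"]));
}
```

The search is a plain nested loop, so it is not fast for many or 
long tokens, and front recomputes the chunk end on every call. 
Caching that index would be a natural refinement.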

My next stop is std.algorithm.chunkBy:

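     import std.algorithm;
     import std.typecons : tuple;

     auto a = ["a","b","c", "d", "e"];
     // b is a lazy range of (predicate result, group) tuples
     auto b = a.chunkBy!(e => e == "a" || e == "d");
     // what b contains, with each group written out as an array
     auto result = [
         tuple(true, ["a"]), tuple(false, ["b", "c"]),
         tuple(true, ["d"]), tuple(false, ["e"])
         ];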

No assert here, since the ranges in the tuples are not arrays. My 
immediate concern is that two consecutive tokens with no 
intervening values will end up in the same chunk, since the 
predicate returns true for both. Also, the result looks a bit 
messy. A little more involved, and according to the documentation 
not guaranteed to work (the predicate keeps state between calls):

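// True for the strings that act as tokens (keys).
bool isToken(string s) {
     return s == "a" || s == "d";
}

// Stateful predicate: its result flips whenever a token different from
// the previous one is seen, so chunkBy starts a new group at that token.
bool tokenCounter(string s) {
     static string oldToken;      // last token seen so far
     static bool counter = true;  // value shared by the current group
     if (s.isToken && s != oldToken) {
         oldToken = s;
         counter = !counter;
     }
     return counter;
}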

unittest {
     import std.algorithm;
     import std.stdio;
     import std.typecons;
     import std.array;

     auto a = ["a","b","c", "d", "e", "a", "d"];
     auto b = a.chunkBy!tokenCounter.map!(e=>e[1]);
     auto result = [
         ["a", "b", "c"],
         ["d", "e"],
         ["a"],
         ["d"]
         ];
     writeln(b);
     writeln(result);
}

Again no assert, but b and result have basically the same 
contents. It also handles consecutive tokens neatly (though 
consecutive identical tokens will be grouped together).
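
If an assert is wanted for the first chunkBy example, one way is 
to copy each lazy group into an array before comparing. This is a 
sketch rather than the only approach, and it gives up laziness for 
the sake of the comparison:

```d
import std.algorithm : chunkBy, equal, map;
import std.array : array;
import std.typecons : tuple;

unittest
{
    auto a = ["a", "b", "c", "d", "e"];
    // Turn every (key, lazy group) tuple into (key, string[]).
    auto b = a.chunkBy!(e => e == "a" || e == "d")
              .map!(t => tuple(t[0], t[1].array));
    assert(equal(b, [
        tuple(true, ["a"]), tuple(false, ["b", "c"]),
        tuple(true, ["d"]), tuple(false, ["e"])
    ]));
}
```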

Hope this helps.

--
   Simen

