Does something like std.algorithm.iteration:splitter with multiple separators exist?
Simen Kjaeraas via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Wed Mar 23 08:23:38 PDT 2016
On Wednesday, 23 March 2016 at 11:57:49 UTC, ParticlePeter wrote:
> I need to parse an ASCII file with multiple tokens. The tokens
> can be seen as keys. After every token there is a bunch of
> lines belonging to that token: the values.
> The order of tokens is unknown.
>
> I would like to read the file in as a whole string, and split
> the string with:
> splitter(fileString, [token1, token2, ... tokenN]);
>
> And would like to get a range of strings each starting with
> tokenX and ending before the next token.
>
> Does something like this exist?
>
> I know how to parse the string line by line, create new
> strings, and append the appropriate lines, but I don't know how
> to do this with a lazy result range and no new allocations.
Without more detail it's a bit hard to help.
std.algorithm.splitter has an overload that takes a function
instead of a separator:
import std.algorithm;
auto a = "a,b;c";
auto b = a.splitter!(e => e == ';' || e == ',');
assert(equal(b, ["a", "b", "c"]));
However, not only are the separators lost in the process, but it
also only allows single-element separators. This might be good
enough given the information you've divulged, but I'll hazard a
guess it isn't.
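If your tokens are whole words rather than single characters,
std.regex.splitter with an alternation of the tokens handles
multi-character separators, though it also throws the separators
away. A rough, untested sketch (the token names are placeholders):

import std.algorithm.comparison : equal;
import std.regex : regex, splitter;

unittest {
    auto text = "foo TOKEN1 bar baz TOKEN2 quux";
    // Split wherever either token matches; as with the lambda above,
    // the tokens themselves are dropped from the result.
    auto parts = splitter(text, regex("TOKEN1|TOKEN2"));
    assert(equal(parts, ["foo ", " bar baz ", " quux"]));
}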
My next stop is std.algorithm.chunkBy:
auto a = ["a","b","c", "d", "e"];
auto b = a.chunkBy!(e => e == "a" || e == "d");
auto result = [
tuple(true, ["a"]), tuple(false, ["b", "c"]),
tuple(true, ["d"]), tuple(false, ["e"])
];
No assert here, since the ranges in the tuples are not arrays. My
immediate concern is that two consecutive tokens with no
intervening values will be lumped into a single chunk, and the
result looks a bit messy. Here is something a little more
involved, which according to the documentation is not guaranteed
to work (the predicate keeps state between calls):
bool isToken(string s) {
    return s == "a" || s == "d";
}

// Flips between true and false each time a new token shows up, so a
// token and the values following it share the same predicate value.
bool tokenCounter(string s) {
    static string oldToken;
    static bool counter = true;
    if (s.isToken && s != oldToken) {
        oldToken = s;
        counter = !counter;
    }
    return counter;
}
unittest {
    import std.algorithm;
    import std.stdio;
    import std.typecons;
    import std.array;

    auto a = ["a", "b", "c", "d", "e", "a", "d"];
    auto b = a.chunkBy!tokenCounter.map!(e => e[1]);
    auto result = [
        ["a", "b", "c"],
        ["d", "e"],
        ["a"],
        ["d"]
    ];
    writeln(b);
    writeln(result);
}
Again no assert, but b and result have essentially the same
contents. This version also handles consecutive tokens neatly
(though consecutive identical tokens will still be grouped
together).
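To get closer to your actual use case (read the whole file into one
string, split it lazily into lines, and chunk the lines by token),
something along these lines might work. This is an untested sketch,
and the token names are placeholders:

unittest {
    import std.algorithm : chunkBy, map, startsWith;
    import std.stdio : writeln;
    import std.string : lineSplitter;

    // Same idea as tokenCounter above, but a "token" is now any line
    // that starts with one of the known token strings.
    static bool sectionCounter(string line) {
        static immutable tokens = ["token1", "token2", "token3"];
        static string oldToken;
        static bool counter = true;
        foreach (t; tokens) {
            if (line.startsWith(t) && line != oldToken) {
                oldToken = line;
                counter = !counter;
                break;
            }
        }
        return counter;
    }

    auto fileString = "token1\nvalue a\nvalue b\ntoken2\nvalue c";
    auto sections = fileString.lineSplitter.chunkBy!sectionCounter.map!(e => e[1]);
    writeln(sections);
    // Two chunks: ["token1", "value a", "value b"] and ["token2", "value c"]
}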
Hope this helps.
--
Simen