std.gregorian contribution
Andrei Alexandrescu
SeeWebsiteForEmail at erdani.org
Mon May 17 10:44:52 PDT 2010
On 05/17/2010 12:32 PM, negerns wrote:
> On 5/18/2010 1:03 AM, Tomek Sowiński wrote:
>> negerns wrote:
>>
>>> Also, I have introduced a unjoin() function as a helper function. It
>>> splits a string into an array of lines using the specified array of
>>> characters as delimiters. I am not sure if there is already an
>>> existing function that does the same but I could not find it. For
>>> lack of a better word I opted for the opposite of the join() function
>>> in std.string.
>>>
>>> string[] unjoin(string s, char[] ch)
>>> {
>>>     uint start = 0;
>>>     uint i = 0;
>>>     string[] result;
>>>
>>>     for (i = 0; i < s.length; i++) {
>>>         if (indexOf(ch, s[i]) != -1) {
>>>             result ~= s[start..i];
>>>             start = i + 1;
>>>         }
>>>     }
>>>     if (start < i) {
>>>         result ~= s[start..$];
>>>     }
>>>     return result;
>>> }
>>>
>>> unittest {
>>>     string s = "2010-05-31";
>>>     string[] r = unjoin(s, ['/', '-', '.', ',', '\\']);
>>>     assert(r[0] == "2010");
>>>     assert(r[1] == "05");
>>>     assert(r[2] == "31");
>>> }
>>
>> Thanks, it's useful. There's std.string.split but it takes only one
>> delimiter. It'd be nice to have it as an overload that takes any range
>> of delims. Yet, a delim can be a string (an array) and there would be
>> problems how to understand split(..., "://"). So I suggest calling it
>> splitBy to disambiguate. Like it?
>>
>>
>> Tomek
>
> I wish it wouldn't be too long like splitByChar :)
> I'm out of ideas.

I have two unrelated suggestions about unjoin.

First, you may want to follow the model set by splitter() instead of
split() when defining unjoin(). split() allocates memory for the whole
result array, whereas splitter() splits lazily and so does not need to.
If you do want split()'s behavior, just call array(splitter()).
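
To make the lazy/eager distinction concrete, here is a minimal sketch. It assumes a current Phobos where splitter lives in std.algorithm and array in std.array; the helper names lazyFields and eagerFields are made up for illustration.

```d
import std.algorithm : equal, splitter;
import std.array : array;

// Lazy: returns a range that yields slices of `s` one at a time.
// No result array is allocated. (lazyFields is a hypothetical name.)
auto lazyFields(string s)
{
    return splitter(s, "-");
}

// Eager: materializes the lazy range into a string[] when an
// actual array is needed. (eagerFields is a hypothetical name.)
string[] eagerFields(string s)
{
    return array(splitter(s, "-"));
}

void main()
{
    import std.stdio : writeln;

    // Iterate without building an intermediate array.
    foreach (field; lazyFields("2010-05-31"))
        writeln(field);

    // Build the array only when it is really wanted.
    string[] r = eagerFields("2010-05-31");
    assert(r == ["2010", "05", "31"]);
}
```

The caller decides whether to pay for the allocation, which is the reason for modeling unjoin() on splitter() rather than on split().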

Second, there is an ambiguity between splitting using a string as
separator and splitting using a set of characters as separator. This
could be solved by simply using different names:

string str = ...;
foreach (s; splitByOneOf(str, "; ")) { ... }
foreach (s; splitter(str, "; ")) { ... }

The first loop splits at either of the two characters, whereas the
second splits only at the exact string "; ".
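
One possible way to get the "one of" behavior today is the predicate overload of splitter. The sketch below is only an illustration: splitByOneOf is the hypothetical name from this message, not an existing Phobos function, and the lambda-based implementation is my assumption.

```d
import std.algorithm : canFind, equal, splitter;

// Hypothetical helper: splits wherever any single character of
// `delims` occurs, using splitter's predicate overload.
auto splitByOneOf(string s, string delims)
{
    return s.splitter!(c => delims.canFind(c));
}

void main()
{
    // Any one of '/', '-', '.' acts as a delimiter.
    assert(equal(splitByOneOf("2010/05-31", "/-."),
                 ["2010", "05", "31"]));

    // Same separator argument, two different meanings:
    // by the exact string "; " ...
    assert(equal(splitter("a; b", "; "), ["a", "b"]));
    // ... versus by ';' or ' ', which leaves an empty token
    // between the two adjacent delimiters.
    assert(equal(splitByOneOf("a; b", "; "), ["a", "", "b"]));
}
```

The last two assertions show why the distinct names matter: the same "; " argument produces different results depending on how it is interpreted.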

An idea I am toying with is to factor things out into the data types.
After all, if I'm splitting by "one of" the elements in a set, that
should be reflected in the set's type. For example:

foreach (s; splitter(str, either(';', ' '))) { ... }
foreach (s; splitter(str, "; ")) { ... }

or, using a more general notion of a set:

foreach (s; splitter(str, set(';', ' '))) { ... }

One nice outcome is that we can then reuse the same pattern in other
signatures.
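
A rough sketch of how the type-based variant could look, under stated assumptions: Either, either(), and splitOn() are all hypothetical names invented here (splitOn stands in for an overloaded splitter, since user code cannot add overloads to the Phobos one).

```d
import std.algorithm : canFind, equal, splitter;

// Hypothetical wrapper type: marks its payload as
// "any one of these characters".
struct Either
{
    dchar[] chars;
}

// Hypothetical factory mirroring the either(';', ' ') spelling.
Either either(dchar[] chars...)
{
    // Copy, because variadic argument storage may not outlive the call.
    return Either(chars.dup);
}

// A plain string separator means "split at this exact string".
auto splitOn(string s, string sep)
{
    return splitter(s, sep);
}

// An Either separator means "split at any single character in the set".
auto splitOn(string s, Either sep)
{
    return s.splitter!(c => sep.chars.canFind(c));
}

void main()
{
    assert(equal(splitOn("a;b c", either(';', ' ')), ["a", "b", "c"]));
    assert(equal(splitOn("a; b", "; "), ["a", "b"]));
}
```

Here the overload is selected by the separator's type rather than by the function's name, so other functions that take a separator could accept the same Either type.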
Andrei