std.gregorian contribution

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Mon May 17 10:44:52 PDT 2010


On 05/17/2010 12:32 PM, negerns wrote:
> On 5/18/2010 1:03 AM, Tomek Sowiński wrote:
>> negerns wrote:
>>
>>> Also, I have introduced an unjoin() function as a helper function. It
>>> splits a string into an array of lines using the specified array of
>>> characters as delimiters. I am not sure if there is already an existing
>>> function that does the same but I could not find it. For lack of a
>>> better word I opted for the opposite of the join() function in
>>> std.string.
>>>
>>> string[] unjoin(string s, char[] ch)
>>> {
>>>     uint start = 0;
>>>     uint i = 0;
>>>     string[] result;
>>>
>>>     for (i = 0; i < s.length; i++) {
>>>         if (indexOf(ch, s[i]) != -1) {
>>>             result ~= s[start..i];
>>>             start = i + 1;
>>>         }
>>>     }
>>>     if (start < i) {
>>>         result ~= s[start..$];
>>>     }
>>>     return result;
>>> }
>>>
>>> unittest {
>>>     string s = "2010-05-31";
>>>     string[] r = unjoin(s, ['/', '-', '.', ',', '\\']);
>>>     assert(r[0] == "2010");
>>>     assert(r[1] == "05");
>>>     assert(r[2] == "31");
>>> }
>>
>> Thanks, it's useful. There's std.string.split but it takes only one
>> delimiter. It'd be nice to have it as an overload that takes any range
>> of delims. Yet a delim can itself be a string (an array), so it would be
>> ambiguous how to interpret split(..., "://"). So I suggest calling it
>> splitBy to disambiguate. Like it?
>>
>>
>> Tomek
>
> I wish it wouldn't be too long like splitByChar :)
> I'm out of ideas.

I have two unrelated suggestions about unjoin.

First, you may want to follow the model set by splitter() instead of
split() when defining unjoin(). This is because split() allocates an
array for its result, whereas splitter() splits lazily and so does not
need to. If you do want split()'s eager result, just call
array(splitter(...)).
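
To illustrate the difference, here is a minimal sketch. It assumes the
splitter() and array() functions as found in current Phobos
(std.algorithm and std.array), and borrows the date string from the
unittest quoted above.

```d
import std.algorithm : equal, splitter;
import std.array : array;

void main()
{
    string s = "2010-05-31";

    // Lazy: splitter() returns a range that yields slices of s on demand,
    // without allocating an array to hold the results.
    auto lazyParts = splitter(s, '-');
    assert(equal(lazyParts, ["2010", "05", "31"]));

    // Eager: array() walks the range and collects the slices into a string[].
    string[] parts = array(splitter(s, '-'));
    assert(parts == ["2010", "05", "31"]);
}
```

The lazy form suits a foreach over the pieces; the eager form is only
needed when the pieces must be indexed or stored.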

Second, there is an ambiguity between splitting using a string as 
separator and splitting using a set of characters as separator. This 
could be solved by simply using different names:

string str = ...;
foreach (part; splitByOneOf(str, "; ")) { ... }
foreach (part; splitter(str, "; ")) { ... }

The first loop splits by either of the two characters, whereas the
second splits by the exact string "; ".
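
A rough sketch of the two behaviours follows. splitByOneOf() is only a
proposed name and does not exist, so the sketch approximates it with the
predicate form of splitter() from current Phobos; the input string is
made up for illustration.

```d
import std.algorithm : canFind, equal, splitter;

void main()
{
    string str = "a; b";

    // Split at every character that is one of ';' or ' '. The empty slice
    // comes from the two adjacent delimiters in "; ".
    auto byOneOf = splitter!(c => "; ".canFind(c))(str);
    assert(equal(byOneOf, ["a", "", "b"]));

    // Split only at the exact two-character string "; ".
    auto byExact = splitter(str, "; ");
    assert(equal(byExact, ["a", "b"]));
}
```

The same separator argument "; " gives different results under the two
readings, which is the ambiguity the distinct names are meant to remove.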

An idea I am toying with is to factor things out into the data types. 
After all, if I'm splitting by "one of" an element in a set of elements, 
that should be reflected in the set's type. For example:

foreach (part; splitter(str, either(';', ' '))) { ... }
foreach (part; splitter(str, "; ")) { ... }

or, using a more general notion of a set:

foreach (part; splitter(str, set(';', ' '))) { ... }

One nice outcome is that we can then reuse the same pattern in other 
signatures.
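
One possible shape for this, as a sketch only: Either, either() and
splitBy() below are hypothetical names invented for illustration (none
of them is in Phobos), and the implementation simply forwards to the
splitter() overloads in current Phobos.

```d
import std.algorithm : canFind, equal, splitter;

// Hypothetical wrapper type: marks its contents as "any one of these".
struct Either(T)
{
    T[] items;
}

// Hypothetical helper that builds an Either from a list of elements.
Either!T either(T)(T[] items...)
{
    // Copy, because a variadic argument array may live on the caller's stack.
    return Either!T(items.dup);
}

// Hypothetical splitBy: an Either separator means "split at any one element".
auto splitBy(R, T)(R r, Either!T seps)
{
    return splitter!(c => seps.items.canFind(c))(r);
}

// Any other separator is forwarded unchanged, so a string means "exact match".
auto splitBy(R, S)(R r, S sep)
    if (!is(S == Either!T, T))
{
    return splitter(r, sep);
}

void main()
{
    string str = "a; b";

    // The separator's type, not the function name, selects the behaviour.
    assert(equal(splitBy(str, either(';', ' ')), ["a", "", "b"]));
    assert(equal(splitBy(str, "; "), ["a", "b"]));
}
```

Because the "one of" meaning lives in the Either type, other functions
could accept the same wrapper without needing a second name each.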


Andrei

