Splitter quiz / survey

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Mon Apr 27 04:53:26 PDT 2009


Brad Roberts wrote:
> Without looking at the docs, code, or compiling and running a test, what will
> this do:
> 
>     foreach (x; splitter(",a,b,", ","))
>         writefln("x = %s", x);
> 
> I'll make it multiple choice:
> 
> choice 1)
>   x = a
>   x = b
> 
> choice 2)
>   x =
>   x = a
>   x = b
> 
> choice 3)
>   x = a
>   x = b
>   x =
> 
> choice 4)
>   x =
>   x = a
>   x = b
>   x =
> 
> Later,
> Brad

Thanks for bringing this to attention, Brad. Splitter does what Perl's 
split does: choice 2. This means the comma is an item terminator, not an 
item separator. Why did I think this was a good idea? Because in most 
cases I was thankful that Perl's split does exactly the right thing.

Whenever I read text from linguistic corpora, I see that words (or other 
word properties) are separated by spaces. There is never a space before 
the first word on a line, but there is often a trailing space at the end 
of the line. Why? Because the text was produced by a program that 
output "word, ' '" or "tag, ' '" for each word or tag. If I then split 
the text by whitespace, I'd be annoyed to find that trailing spaces 
matter.

For the same reason, C accepts enum X { a, b, } but not 
enum X { , a, b }. Mechanically generating enum values is easier if 
each value has a trailing comma.

Similarly, when you split a text by '\n', a leading empty line is 
important, whereas you wouldn't expect a final '\n' to introduce an 
empty line.
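
To make the three cases above concrete, here is a minimal sketch of 
terminator semantics. The helper name splitTerminated is hypothetical 
and is not a Phobos function; it only illustrates the behavior being 
described, and it differs from Perl's split in one respect: Perl drops 
all trailing empty fields, while this sketch drops only the one that a 
final terminator would otherwise create.

```d
import std.stdio : writefln;
import std.string : indexOf;

// Hypothetical helper (not part of Phobos): every item is ended by `delim`,
// so a delimiter at the very end of `text` does not start a new, empty item.
string[] splitTerminated(string text, string delim)
{
    assert(delim.length > 0, "delimiter must not be empty");
    string[] items;
    while (text.length > 0)
    {
        immutable i = text.indexOf(delim);
        if (i < 0)
        {
            // No terminator left: the remainder is a final, unterminated item.
            items ~= text;
            break;
        }
        items ~= text[0 .. i];              // may be empty (leading delimiter)
        text = text[i + delim.length .. $]; // continue after the delimiter
    }
    return items;
}

void main()
{
    // Brad's quiz input: a leading empty item is kept, the trailing one is not.
    foreach (x; splitTerminated(",a,b,", ","))
        writefln("x = [%s]", x);

    // Corpus line with a trailing space, and text with leading/trailing '\n'.
    assert(splitTerminated("the cat sat ", " ") == ["the", "cat", "sat"]);
    assert(splitTerminated("\nfirst\nsecond\n", "\n") == ["", "first", "second"]);
}
```

For ",a,b," the sketch yields an empty item, then a, then b, which is 
choice 2 in the quiz.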

Now clearly there are cases in which leading and trailing empty items 
are both important. I'm just saying they are rarer. We could add an 
enumerated parameter to Splitter:

enum PleaseFindAGoodName { terminator, separator }

foreach (line; splitter(",a,b,", ","))
     ... terminator is implicit ...
foreach (line; splitter(",a,b,", ",", PleaseFindAGoodName.separator))
     ... separator ...

We might just go with the terminator semantics and ask people who need 
separator semantics to use a stripl() or a munch() prior to splitting. 
I'd personally prefer having an enum there.
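
As a rough illustration of how such an enum parameter could behave, 
here is a self-contained sketch. It is not the Splitter implementation: 
the function splitItems is hypothetical, it returns an array eagerly 
rather than a lazy range, and the enum keeps the placeholder name used 
above.

```d
import std.stdio : writeln;
import std.string : indexOf;

// Placeholder name taken from the proposal above; a real API would rename it.
enum PleaseFindAGoodName { terminator, separator }

// Hypothetical eager splitter, only meant to show the two semantics.
string[] splitItems(string text, string delim,
                    PleaseFindAGoodName role = PleaseFindAGoodName.terminator)
{
    assert(delim.length > 0, "delimiter must not be empty");
    string[] items;
    for (;;)
    {
        immutable i = text.indexOf(delim);
        if (i < 0)
            break;
        items ~= text[0 .. i];
        text = text[i + delim.length .. $];
    }
    // `text` now holds whatever follows the last delimiter. A separator is
    // always followed by an item, even an empty one; a terminator leaves an
    // item behind only if some text actually remains.
    if (role == PleaseFindAGoodName.separator || text.length > 0)
        items ~= text;
    return items;
}

void main()
{
    // Terminator semantics are the default in this sketch.
    writeln(splitItems(",a,b,", ","));
    // Separator semantics keep the trailing empty item as well.
    writeln(splitItems(",a,b,", ",", PleaseFindAGoodName.separator));
}
```

With the default, ",a,b," gives three items (empty, a, b); with 
PleaseFindAGoodName.separator it gives four, the last one empty. Making 
terminator the default mirrors the preference stated above, but that 
choice is part of the sketch, not a settled design.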


Andrei


