Splitter quiz / survey

Jason House jason.james.house at gmail.com
Mon Apr 27 05:35:06 PDT 2009


Andrei Alexandrescu Wrote:

> Brad Roberts wrote:
> > Without looking at the docs, code, or compiling and running a test, what will
> > this do:
> > 
> >     foreach (x; splitter(",a,b,", ","))
> >         writefln("x = %s", x);
> > 
> > I'll make it multiple choice:
> > 
> > choice 1)
> >   x = a
> >   x = b
> > 
> > choice 2)
> >   x =
> >   x = a
> >   x = b
> > 
> > choice 3)
> >   x = a
> >   x = b
> >   x =
> > 
> > choice 4)
> >   x =
> >   x = a
> >   x = b
> >   x =
> > 
> > Later,
> > Brad
> 
> Thanks for bringing this to attention, Brad. Splitter does what Perl's 
> split does: 2. This means comma is an item terminator and not an item 
> separator. Why did I think this is a good idea? Because in most cases, I 
> was thankful to Perl's split that it does exactly the right thing.

Before reading your post, I was going to say that I'd expect 4, would accept 1, and consider 2 or 3 to be buggy! Notice how under your new proposal everyone would still get the behavior wrong when reading the code.



 
> Whenever I read text from linguistic corpora, I see that words (or other 
> word properties) are separated by spaces. There is never a space before 
> the first word on a line, but there is often a trailing space at the end 
> of the line. Why? Because the text was processed by a program that 
> output "word, ' '" or "tag, ' '" for each word or tag. Then if I split 
> the text by whitespace, I'd be annoyed to see that trailing spaces do 
> matter.
> 
> For the same reason, C accepts enum X { a, b, } but not enum X { ,a ,b }. 
> Mechanically generating enum values is easier if each value has a 
> trailing comma.
> 
> Similarly, when you split a text by '\n', a leading empty line is 
> important, whereas you wouldn't expect a final '\n' to introduce an 
> empty line.
> 
> Now clearly there are cases in which leading or trailing empty items are 
> both important. I'm just saying they are more rare. We could add an 
> enumerated parameter to Splitter:
> 
> enum PleaseFindAGoodName { terminator, separator }
> 
> foreach (line; splitter(",a,b,", ","))
>      ... terminator is implicit ...
> foreach (line; splitter(",a,b,", ",", PleaseFindAGoodName.separator))
>      ... separator ...
> 
> We might just go with the terminator semantics and ask people who need 
> separator semantics to use a stripl() or a munch() prior to splitting. 
> I'd personally prefer having an enum there.
> 
> 
> Andrei
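
To make the two semantics concrete, here is a minimal sketch of the enum idea quoted above. It is not the Phobos splitter: it is eager, it returns an array, and the names SplitMode and splitItems are hypothetical stand-ins (the proposal deliberately leaves the enum unnamed).

```d
import std.stdio : writefln;

// Hypothetical name for the enum that the proposal leaves unnamed.
enum SplitMode { terminator, separator }

// Eager, array-returning sketch (assumption: this is NOT how Phobos's
// lazy splitter is implemented; it only illustrates the two semantics).
string[] splitItems(string text, string delim,
                    SplitMode mode = SplitMode.terminator)
{
    assert(delim.length > 0, "delimiter must not be empty");

    string[] items;
    size_t start = 0; // start of the item currently being scanned
    size_t i = 0;

    while (i + delim.length <= text.length)
    {
        if (text[i .. i + delim.length] == delim)
        {
            // A delimiter closes the current item, even an empty one.
            items ~= text[start .. i];
            i += delim.length;
            start = i;
        }
        else
        {
            ++i;
        }
    }

    // Text after the last delimiter: separator mode always keeps it,
    // terminator mode keeps it only when it is non-empty.
    if (mode == SplitMode.separator || start < text.length)
        items ~= text[start .. $];

    return items;
}

void main()
{
    // Terminator semantics: "", "a", "b" (choice 2 in the quiz).
    foreach (x; splitItems(",a,b,", ","))
        writefln("x = %s", x);

    // Separator semantics: "", "a", "b", "" (choice 4 in the quiz).
    foreach (x; splitItems(",a,b,", ",", SplitMode.separator))
        writefln("x = %s", x);
}
```

In this sketch a leading delimiter always yields a leading empty item. The two modes differ only in whether a trailing delimiter yields a trailing empty item.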



