Splitter quiz / survey
Jason House
jason.james.house at gmail.com
Mon Apr 27 05:35:06 PDT 2009
Andrei Alexandrescu Wrote:
> Brad Roberts wrote:
> > Without looking at the docs, code, or compiling and running a test, what will
> > this do:
> >
> > foreach (x; splitter(",a,b,", ","))
> >     writefln("x = %s", x);
> >
> > I'll make it multiple choice:
> >
> > choice 1)
> > x = a
> > x = b
> >
> > choice 2)
> > x =
> > x = a
> > x = b
> >
> > choice 3)
> > x = a
> > x = b
> > x =
> >
> > choice 4)
> > x =
> > x = a
> > x = b
> > x =
> >
> > Later,
> > Brad
>
> Thanks for bringing this to attention, Brad. Splitter does what Perl's
> split does: 2. This means the comma is an item terminator, not an item
> separator. Why did I think this is a good idea? Because in most cases I
> have been thankful that Perl's split does exactly the right thing.
Before reading your post, I was going to say that I'd expect 4, would accept 1, and consider 2 or 3 to be buggy! Notice that under your new proposal, with terminator as the implicit default, everyone reading the code would still guess the behavior wrong.
> Whenever I read text from linguistic corpora, I see that words (or other
> word properties) are separated by spaces. There is never a space before
> the first word on a line, but there is often a trailing space at the end
> of the line. Why? Because the text was processed by a program that
> output "word, ' '" or "tag, ' '" for each word or tag. So if I split
> the text by whitespace, I'd be annoyed if the trailing space produced
> an extra empty item.
>
> For the same reason, C accepts enum X { a, b, } but not enum X { ,a, b }.
> Mechanically generating enum values is easier if each value has a
> trailing comma.
>
> Similarly, when you split a text by '\n', a leading empty line is
> important, whereas you wouldn't expect a final '\n' to introduce an
> empty line.
>
> Now clearly there are cases in which both leading and trailing empty
> items are important. I'm just saying they are rarer. We could add an
> enumerated parameter to Splitter:
>
> enum PleaseFindAGoodName { terminator, separator }
>
> foreach (line; splitter(",a,b,", ","))
> ... terminator is implicit ...
> foreach (line; splitter(",a,b,", ",", PleaseFindAGoodName.separator))
> ... separator ...
>
> We might just go with the terminator semantics and ask people who need
> separator semantics to use a stripl() or a munch() prior to splitting.
> I'd personally prefer having an enum there.
>
>
> Andrei
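To make the terminator/separator distinction concrete, here is a minimal sketch in D. It is not Phobos's splitter and not Andrei's implementation: the names SplitPolicy and splitWith are hypothetical (SplitPolicy stands in for the unnamed "PleaseFindAGoodName" enum), and it returns an eager array rather than a lazy range. Terminator semantics here means a final separator ends the last item instead of starting an empty one, so exactly one trailing empty item is dropped. Perl's split differs slightly: it strips all trailing empty fields, so the two disagree on input such as ",a,b,,".

```d
import std.string : indexOf;

// Hypothetical names: SplitPolicy stands in for the enum left unnamed
// ("PleaseFindAGoodName"); splitWith is not a Phobos function.
enum SplitPolicy { terminator, separator }

// Splits s on sep and returns the pieces eagerly as an array.
// separator:  every piece is kept, so ",a,b," gives "", "a", "b", "".
// terminator: a final sep ends the last item instead of starting a new
//             empty one, so ",a,b," gives "", "a", "b".
string[] splitWith(string s, string sep,
                   SplitPolicy policy = SplitPolicy.terminator)
{
    assert(sep.length > 0, "separator must be non-empty");
    string[] pieces;
    while (true)
    {
        immutable i = s.indexOf(sep);
        if (i < 0)
        {
            pieces ~= s;  // remainder after the last separator
            break;
        }
        immutable pos = cast(size_t) i;
        pieces ~= s[0 .. pos];
        s = s[pos + sep.length .. $];
    }
    // pieces is never empty here. Under terminator semantics, drop the
    // single empty item that a trailing separator would otherwise create.
    if (policy == SplitPolicy.terminator && pieces[$ - 1].length == 0)
        pieces = pieces[0 .. $ - 1];
    return pieces;
}

unittest
{
    // Choice 2 from the quiz (terminator) versus choice 4 (separator).
    assert(splitWith(",a,b,", ",") == ["", "a", "b"]);
    assert(splitWith(",a,b,", ",", SplitPolicy.separator) == ["", "a", "b", ""]);
    // A leading newline still yields an empty line; a final one does not.
    assert(splitWith("\nx\ny\n", "\n") == ["", "x", "y"]);
}
```

Compiled with -unittest (and -main, since the sketch has no main function), the unittest block checks the quiz input under both policies. Making terminator the default parameter value mirrors the "terminator is implicit" line in the proposal.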