std.d.lexer requirements
Jonathan M Davis
jmdavisProg at gmx.com
Thu Aug 2 15:38:42 PDT 2012
On Thursday, August 02, 2012 15:14:17 Walter Bright wrote:
> Remember, it's the consumer doing the decoding, not the input range.
But that's the problem. The consumer has to treat character ranges specially
to make this work. It's not generic. If it were generic, then it would simply
be using front, popFront, etc. It's going to have to special case strings to
do the buffering that you're suggesting. And if you have to special case
strings, then how is that any different from what we have now?
If you're arguing that strings should be treated as ranges of code units, then
pretty much _every_ range-based function will have to special case strings to
even work correctly - otherwise it'll be operating on individual code units
rather than code points (e.g. filtering code units rather than code points,
which would generate an invalid string). This makes the default behavior
incorrect, forcing _everyone_ to special case strings _everywhere_ if they
want correct behavior with ranges which are strings. And efficiency means
nothing if the result is wrong.
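A quick sketch of the hazard described above (in Python rather than D, purely for illustration): filtering a string at the code-unit level can split a multi-byte UTF-8 sequence and yield an invalid string, whereas filtering at the code-point level cannot.

```python
s = "café"                # 'é' is two UTF-8 code units: 0xC3 0xA9
units = s.encode("utf-8")

# Filter at the code-unit (byte) level: drop every unit equal to 0xC3.
# This splits the two-byte sequence for 'é', leaving a stray 0xA9.
bad = bytes(u for u in units if u != 0xC3)
try:
    bad.decode("utf-8")
except UnicodeDecodeError:
    print("code-unit filtering produced an invalid string")

# Filter at the code-point level: dropping whole characters stays valid.
good = "".join(c for c in s if c != "é")
print(good)  # "caf"
```

The same failure mode applies to any range-based algorithm that treats a string as a sequence of code units without re-validating the result.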
As it is now, the default behavior of strings with range-based functions is
correct but inefficient, so at least we get correct code. And if someone wants
their string processing to be efficient, then they special case strings and do
things like the buffering that you're suggesting. So, we have correct by
default with efficiency as an option. The alternative that you seem to be
suggesting (making strings be treated as ranges of code units) means that it
would be fast by default but correct as an option, which is completely
backwards IMHO. Efficiency is important, but it doesn't matter how efficient
something is if it's wrong, and expecting the average programmer to write
Unicode-aware code that functions correctly is completely unrealistic.
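A hypothetical sketch (again in Python, for illustration only) of "correct by default with efficiency as an option": a generic function that iterates whatever elements the range yields, plus an opt-in string specialization that drops to code units only where that is provably safe.

```python
def count_spaces(r):
    # Generic path: works on any iterable. For a Python str this
    # iterates code points, so it is correct for any Unicode text.
    return sum(1 for x in r if x == " ")

def count_spaces_fast(s: str) -> int:
    # Opt-in specialization: scan raw UTF-8 code units. Safe here
    # because 0x20 (' ') can never occur inside a multi-byte UTF-8
    # sequence, so counting bytes equals counting code points.
    return s.encode("utf-8").count(0x20)

text = "café au lait"
print(count_spaces(text), count_spaces_fast(text))  # 2 2
```

The point is that the specialization is something the performance-conscious author writes deliberately, with the Unicode reasoning spelled out; the default path stays correct for everyone else.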
- Jonathan M Davis
More information about the Digitalmars-d mailing list