std.d.lexer requirements

Jonathan M Davis jmdavisProg at gmx.com
Thu Aug 2 15:38:42 PDT 2012


On Thursday, August 02, 2012 15:14:17 Walter Bright wrote:
> Remember, it's the consumer doing the decoding, not the input range.

But that's the problem. The consumer has to treat character ranges specially 
to make this work. It's not generic. If it were generic, then it would simply 
be using front, popFront, etc. It's going to have to special case strings to 
do the buffering that you're suggesting. And if you have to special case 
strings, then how is that any different from what we have now?
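
To illustrate what that special casing ends up looking like (a rough, untested
sketch with made-up names, not code from any actual proposal): a truly generic
consumer only ever touches empty, front, and popFront, whereas under the
code-unit scheme every consumer of character data grows a separate, string-only
path that does its own decoding:

import std.range : isInputRange;
import std.traits : isSomeString;
import std.utf : decodeFront;

dchar nextChar(R)(ref R r)
    if (isInputRange!R)
{
    static if (isSomeString!R)
    {
        // String special case: the consumer decodes the code units itself.
        // This non-generic branch has to be written for every such consumer.
        return r.decodeFront();
    }
    else
    {
        // Generic path: front/popFront are all that's needed.
        auto c = r.front;
        r.popFront();
        return cast(dchar) c;
    }
}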

If you're arguing that strings should be treated as ranges of code units, then 
pretty much _every_ range-based function will have to special case strings to 
even work correctly - otherwise it'll be operating on individual code units 
rather than code points (e.g. filtering code units rather than code points, 
which would generate an invalid string). This makes the default behavior 
incorrect, forcing _everyone_ to special case strings _everywhere_ if they 
want correct behavior with ranges which are strings. And efficiency means 
nothing if the result is wrong.
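
Here's a concrete (untested) sketch of that failure mode: 'é' is the two UTF-8
code units 0xC3 0xA9, and a per-code-unit filter can strip just the 0xA9,
leaving a lone 0xC3 and therefore invalid UTF-8, which a per-code-point filter
can never do:

import std.algorithm : filter;
import std.array : array;
import std.utf : validate, toUTF8;

void main()
{
    string s = "héllo"; // 'é' is the two code units 0xC3 0xA9 in UTF-8

    // Default behavior: the string is a range of dchar, so the predicate
    // sees whole code points and the result re-encodes to valid UTF-8.
    auto byPoint = s.filter!(c => c != 'l').array; // dchar[]
    validate(byPoint.toUTF8());                    // passes

    // Code-unit view: the predicate can drop 0xA9 on its own, leaving a
    // dangling 0xC3, which is not a valid UTF-8 sequence.
    auto units = cast(immutable(ubyte)[]) s;
    auto byUnit = units.filter!(b => b != 0xA9).array;
    // cast(string)byUnit would now fail std.utf.validate.
}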

As it is now, the default behavior of strings with range-based functions is 
correct but inefficient, so at least we get correct code. And if someone wants 
their string processing to be efficient, then they special case strings and do 
things like the buffering that you're suggesting. So, we have correct by 
default with efficiency as an option. The alternative that you seem to be 
suggesting (making strings be treated as ranges of code units) means that it 
would be fast by default but correct as an option, which is completely 
backwards IMHO. Efficiency is important, but it doesn't matter how efficient 
something is if it's wrong, and expecting the average programmer to write 
Unicode-aware code that functions correctly is completely unrealistic.
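
And that "efficiency as an option" is straightforward to opt into. As an
untested sketch (names made up), a function can keep the correct, decoding
default and add a string overload that skips decoding where that's provably
safe:

import std.range : isInputRange;
import std.traits : isSomeString;

// Correct by default: works on any input range of characters, relying on
// front/popFront to hand back whole code points.
bool containsSpace(R)(R r)
    if (isInputRange!R && !isSomeString!R)
{
    for (; !r.empty; r.popFront())
        if (r.front == ' ')
            return true;
    return false;
}

// Opt-in efficiency: ' ' is a single ASCII code unit and can never appear
// inside a multi-unit UTF-8 or UTF-16 sequence, so the string overload can
// scan code units directly without decoding.
bool containsSpace(S)(S s)
    if (isSomeString!S)
{
    foreach (i; 0 .. s.length)
        if (s[i] == ' ')
            return true;
    return false;
}

Only the code that actually needs the speed has to take on the extra,
Unicode-aware work.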

- Jonathan M Davis

