The Case Against Autodecode

tsbockman via Digitalmars-d digitalmars-d at puremagic.com
Thu Jun 2 14:51:51 PDT 2016


On Thursday, 2 June 2016 at 21:38:02 UTC, default0 wrote:
> On Thursday, 2 June 2016 at 21:30:51 UTC, tsbockman wrote:
>> 1) It does not say that level 2 should be opt-in; it says that 
>> level 2 should be toggle-able. Nowhere does it say which of 
>> level 1 and 2 should be the default.
>>
>> 2) It says that working with graphemes is slower than UTF-16 
>> code UNITS (level 1), but says nothing about streaming 
>> decoding of code POINTS (what we have).
>>
>> 3) That document is from 2000, and its claims about 
>> performance are surely extremely outdated anyway. Computers 
>> and the Unicode standard have both changed much since then.
>
> 1) Right, because a special toggleable syntax is definitely not 
> "opt-in".

It is not "opt-in" unless it is toggled off by default. The only 
reason it doesn't talk about toggling in the level 1 section is 
that the section is written with the assumption that many 
programs will *only* support level 1.

> 2) Several people in this thread noted that working on 
> graphemes is way slower (which makes sense, because it's yet 
> another processing step you need to do after you've decoded - 
> therefore more work - therefore slower) than working on code 
> points.

And working on code points is way slower than working on code 
units (the actual level 1).

> 3) Not an argument - doing more work makes code slower.

What do you think I'm arguing for? It's not graphemes-by-default.

What I actually want to see: permanently deprecate the 
auto-decoding range primitives. Force the user to explicitly 
specify whichever of `by!dchar`, `byCodePoint`, or `byGrapheme` 
their specific algorithm actually needs. Removing the implicit 
conversions between `char`, `wchar`, and `dchar` would also be 
nice, but isn't really necessary, I think.
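
To make the difference concrete, here is a minimal sketch of what 
explicit selection could look like. It assumes `byCodeUnit` and 
`byDchar` from `std.utf` and `byGrapheme` from `std.uni` as the 
explicit forms; the helper names `countCodeUnits`, `countCodePoints`, 
and `countGraphemes` are hypothetical and exist only for illustration.

```d
import std.range : walkLength;
import std.uni : byGrapheme;
import std.utf : byCodeUnit, byDchar;

// Hypothetical helpers: each one names its iteration level explicitly.
size_t countCodeUnits(string s)  { return s.byCodeUnit.walkLength; }
size_t countCodePoints(string s) { return s.byDchar.walkLength; }
size_t countGraphemes(string s)  { return s.byGrapheme.walkLength; }

void main()
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT: one visible character.
    string s = "e\u0301";

    assert(countCodeUnits(s) == 3);  // UTF-8 code units: 1 + 2
    assert(countCodePoints(s) == 2); // two code points
    assert(countGraphemes(s) == 1);  // one grapheme

    // Auto-decoding: the string range primitives silently pick code points.
    assert(s.walkLength == 2);

    // Implicit char -> dchar conversion compiles, but it turns a lone
    // UTF-8 code unit into an unrelated code point.
    string t = "\u00E9";  // "é", encoded in UTF-8 as 0xC3 0xA9
    dchar d = t[0];       // first code unit only
    assert(d == 0xC3);    // U+00C3, not U+00E9
}
```

The same string yields three different lengths depending on the 
level, which is why the choice arguably belongs in the code rather 
than in a default.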

That would be a standards-compliant solution (one of several 
possible). What we have now is non-standard, at least going by 
the old version Walter linked.

