The Case Against Autodecode

H. S. Teoh via Digitalmars-d digitalmars-d at puremagic.com
Tue May 31 14:11:17 PDT 2016


On Tue, May 31, 2016 at 05:01:17PM -0400, Andrei Alexandrescu via Digitalmars-d wrote:
> On 05/31/2016 04:01 PM, Jonathan M Davis via Digitalmars-d wrote:
> > Wasn't the whole point of operating at the code point level by
> > default to make it so that code would be operating on full
> > characters by default instead of chopping them up as is so easy to
> > do when operating at the code unit level?
> 
> The point is to operate on representation-independent entities
> (Unicode code points) instead of low-level representation-specific
> artifacts (code units).

This is basically saying that we operate on dchar[] by default, except
that we disguise its detrimental memory usage consequences by
compressing to UTF-8/UTF-16 and incurring the cost of decompression
every time we access its elements.  Perhaps you love the idea of running
an OS that stores all files in compressed form and always decompresses
upon every syscall to read(), but I prefer a higher-performance system.
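To put a number on the memory point: dchar[] is essentially UTF-32, four bytes per code point. A quick sketch (in Python purely for illustration; the encoded sizes are the same in any language) comparing the footprints for the common mostly-ASCII case:

```python
# Compare the storage cost of UTF-8 against UTF-32 (which is what a
# dchar[] amounts to) for ASCII-heavy text -- the common case for
# source code, logs, and markup.
s = "The quick brown fox jumps over the lazy dog"

utf8 = s.encode("utf-8")
utf32 = s.encode("utf-32-le")  # no BOM; exactly 4 bytes per code point

print(len(utf8))   # 43 bytes
print(len(utf32))  # 172 bytes -- 4x the footprint for ASCII text
```

Hence the "compression" framing: keeping the string as UTF-8 saves the memory, but then every element access through an autodecoding range has to pay for decoding a variable-length sequence back into a dchar.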


> That's the contract, and it seems meaningful
> seeing how Unicode is defined in terms of code points as its abstract
> building block.

Where's this contract stated, and when did we sign up for this?


> If user code needs to go lower at the code unit level, they can do so.
> If user code needs to go upper at the grapheme level, they can do so.

Only with much pain, by using workarounds to bypass the
meticulously-crafted autodecoding machinery in Phobos.
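For the record, the three levels under discussion are genuinely distinct even for trivial text. A sketch in Python (the same facts hold for D strings, where std.utf.byCodeUnit and std.uni.byGrapheme are the escape hatches in question):

```python
import unicodedata

# 'e' followed by U+0301 COMBINING ACUTE ACCENT: one visible "é"
s = "e\u0301"

print(len(s.encode("utf-8")))  # 3 -- code units (UTF-8 bytes)
print(len(s))                  # 2 -- code points
# One grapheme: NFC normalization fuses the pair into the single
# precomposed code point U+00E9, confirming it is one user-perceived
# character.
print(len(unicodedata.normalize("NFC", s)))  # 1
```

Autodecoding picks the middle level (2 elements here) by default, which is neither the raw representation nor what the user actually sees.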


> If anything this thread strengthens my opinion that autodecoding is a
> sweet spot. -- Andrei

No, autodecoding is a stalemate that's neither fast nor correct.
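"Not correct" is easy to demonstrate: reversing a range of code points tears combining sequences apart. A Python sketch of the failure (reversing an autodecoded D string, e.g. with std.range.retro, runs into the same issue, since both operate at the code-point level):

```python
# "née" spelled with a combining acute: n, e, U+0301, e
s = "ne\u0301e"

# Reverse at the code-point level -- what autodecoding gives you.
rev = s[::-1]
print(rev)  # 'e\u0301en' -- the accent now sits on the wrong 'e'

# A grapheme-aware reverse would keep 'e' + U+0301 together and
# produce 'ee\u0301n' ("eén"), the intended result.
```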


T

-- 
"Real programmers can write assembly code in any language. :-)" -- Larry Wall
