The Case Against Autodecode
H. S. Teoh via Digitalmars-d
digitalmars-d at puremagic.com
Mon May 30 12:52:19 PDT 2016
On Mon, May 30, 2016 at 03:28:38PM -0400, Andrei Alexandrescu via Digitalmars-d wrote:
> On 05/30/2016 03:04 PM, Timon Gehr wrote:
> > On 30.05.2016 18:01, Andrei Alexandrescu wrote:
> > > On 05/28/2016 03:04 PM, Walter Bright wrote:
> > > > On 5/28/2016 5:04 AM, Andrei Alexandrescu wrote:
> > > > > So it harkens back to the original mistake: strings should NOT
> > > > > be arrays with the respective primitives.
> > > >
> > > > An array of code units provides consistency, predictability,
> > > > flexibility, and performance. It's a solid base upon which the
> > > > programmer can build what he needs as required.
> > >
> > > Nope. Not buying it.
> >
> > I'm buying it. IMO alias string=immutable(char)[] is the most useful
> > choice, and auto-decoding ideally wouldn't exist.
>
> Wouldn't D then be seen (and rightfully so) as largely not supporting
> Unicode, seeing as its many, many core generic algorithms seem to
> randomly work or not on arrays of characters?
They already randomly work or not work on ranges of dchar. I hope we
don't have to rehash all the examples of why things that seem to work,
like count, filter, map, etc., actually *don't* work outside of a very
narrow set of languages. The best part of it all is that they *both*
don't work properly *and* make your program pay the performance
overhead even when you aren't using them -- thanks to ubiquitous
autodecoding.
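
To make that concrete, here's a quick sketch against today's Phobos
(walkLength from std.range, byGrapheme from std.uni); the autodecoded
count is neither the fastest answer nor the one the user usually means:

    import std.range : walkLength;
    import std.stdio : writeln;
    import std.uni : byGrapheme;

    void main()
    {
        // "é" spelled as 'e' + U+0301 COMBINING ACUTE ACCENT:
        // one user-perceived character.
        string s = "e\u0301";

        writeln(s.length);                // 3 -- UTF-8 code units
        writeln(s.walkLength);            // 2 -- code points (autodecoded default)
        writeln(s.byGrapheme.walkLength); // 1 -- graphemes, what the user means
    }
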
> Wouldn't ranges - the most important artifact of D's stdlib - default
> for strings on the least meaningful approach to strings (dumb code
> units)?
No, ideally there should *not* be a default range type for strings --
the user needs to specify what he wants to iterate by: code unit, code
point, or grapheme.
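
Something along these lines (a sketch; byCodeUnit/byDchar live in
std.utf and byGrapheme in std.uni in today's Phobos -- the point is
only that the caller states the unit, and nothing gets decoded behind
his back):

    import std.stdio : writeln;
    import std.uni : byGrapheme;
    import std.utf : byCodeUnit, byDchar;

    void main()
    {
        string s = "日本語";
        size_t units, points, graphemes;

        foreach (c; s.byCodeUnit) ++units;     // c is char: raw UTF-8, no decoding
        foreach (d; s.byDchar)    ++points;    // d is dchar: decoded on demand
        foreach (g; s.byGrapheme) ++graphemes; // g is a Grapheme: user-perceived unit

        writeln(units, " ", points, " ", graphemes); // 9 3 3
    }
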
> Would a smattering of Unicode primitives in std.utf and friends
> entitle us to claim D had dyed Unicode in its wool? (All are not
> rhetorical.)
I have no idea what this means.
> I.e. wouldn't we be in a worse place than now? (This is rhetorical.) The
> best argument for autodecoding is to contemplate where we'd be without
> it: the ghetto of Unicode string handling.
I've no idea what you're talking about. Without autodecoding we'd
actually have faster string handling, and forcing the user to specify
the unit of iteration would bring more Unicode-awareness, which would
improve the quality of string-handling code instead of proliferating
today's wrong code that just happens to work in some languages but
makes a hash of things everywhere else.
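
The opt-out that exists today already hints at the cost. A sketch
(assuming std.utf.byCodeUnit, and modulo whatever special-casing
individual algorithms may do for narrow strings) of what string-heavy
code ends up writing just to get the decoding out of the way:

    import std.algorithm.searching : canFind;
    import std.stdio : writeln;
    import std.utf : byCodeUnit;

    void main()
    {
        string s = "key: value";

        // The generic path walks the string through the decoding range
        // primitives just to compare against an ASCII character;
        // byCodeUnit runs the same search over the raw code units.
        writeln(s.canFind(':'));            // true, decoding as it goes
        writeln(s.byCodeUnit.canFind(':')); // true, no decoding step
    }
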
> I'm not going to debate this further (though I'll look for meaningful
> answers to the questions above). But this thread has been informative
> in that it did little to change my conviction that autodecoding is a
> good thing for D, all things considered (i.e. given the wrong decision
> to not encapsulate string as a separate type distinct from a bare
> array of code units). I'd be lying if I said it did nothing. It did,
> but only a little.
>
> Funny thing is that's not even what's important. What's important is
> that autodecoding is here to stay - there's no realistic way to
> eliminate it from D. So the focus should be making autodecoding the
> best it could ever be.
[...]
If I ever had to write string-heavy code, I'd probably fork Phobos just
so I can get decent performance. Just sayin'.
T
--
People walk. Computers run.