The Case Against Autodecode

Daniel Kozak via Digitalmars-d digitalmars-d at puremagic.com
Thu May 12 16:23:01 PDT 2016


On Thursday, 12 May 2016 at 20:15:45 UTC, Walter Bright wrote:
> On 5/12/2016 9:29 AM, Andrei Alexandrescu wrote:
> > I am as unclear about the problems of autodecoding as I am about the necessity
> > to remove curl. Whenever I ask I hear some arguments that work well emotionally
> > but are scant on reason and engineering. Maybe it's time to rehash them? I just
> > did so about curl, no solid argument seemed to come together. I'd be curious of
> > a crisp list of grievances about autodecoding. -- Andrei
>
> Here are some that are not matters of opinion.
>
> 1. Ranges of characters do not autodecode, but arrays of 
> characters do. This is a glaring inconsistency.
>
> 2. Every time one wants an algorithm to work with both strings 
> and ranges, you wind up special casing the strings to defeat 
> the autodecoding, or to decode the ranges. Having to constantly 
> special case it makes for more special cases when plugging 
> together components. These issues often escape detection when 
> unittesting because it is convenient to unittest only with 
> arrays.
>
> 3. Wrapping an array in a struct with an alias this to an array 
> turns off autodecoding, another special case.
>
> 4. Autodecoding is slow and has no place in high speed string 
> processing.
>
> 5. Very few algorithms require decoding.
>
> 6. Autodecoding has two choices when encountering invalid code 
> units - throw or produce an error dchar. Currently, it throws, 
> meaning no algorithms using autodecode can be made nothrow.
>
> 7. Autodecode cannot be used with Unicode path/filenames, 
> because it is legal (at least on Linux) to have invalid UTF-8 
> as filenames. It turns out in the wild that pure Unicode is not 
> universal - there's lots of dirty Unicode that should remain 
> unmolested, and autodecode does not play with that.
>
> 8. In my work with UTF-8 streams, dealing with autodecode has 
> caused me considerably extra work every time. A convenient 
> timesaver it ain't.
>
> 9. Autodecode cannot be turned off, i.e. it isn't practical to 
> avoid importing std.array one way or another, and then 
> autodecode is there.
>
> 10. Autodecoded arrays cannot be RandomAccessRanges, losing a 
> key benefit of being arrays in the first place.
>
> 11. Indexing an array produces different results than 
> autodecoding, another glaring special case.
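Points 1, 10 and 11 above can be seen in a few lines of D. This is a minimal sketch, assuming a Phobos release that still autodecodes narrow strings (as it did at the time of this thread). It contrasts a plain string with the range returned by std.utf.byCodeUnit, which is one way to opt out of decoding.

```d
import std.range.primitives : ElementType, front, isRandomAccessRange, walkLength;
import std.utf : byCodeUnit;

// Assumes a Phobos that autodecodes narrow strings.
// A string is immutable(char)[], yet range primitives treat it as a range of dchar.
static assert(is(ElementType!string == dchar));
// Because decoding is variable-length, a string is not a random-access range.
static assert(!isRandomAccessRange!string);
// byCodeUnit wraps the same data as a range of code units, with random access.
static assert(is(ElementType!(typeof("".byCodeUnit)) == immutable(char)));
static assert(isRandomAccessRange!(typeof("".byCodeUnit)));

void main()
{
    string s = "é";                        // U+00E9, encoded in UTF-8 as 0xC3 0xA9
    assert(s.length == 2);                 // .length counts code units
    assert(s[0] == 0xC3);                  // indexing yields a code unit...
    assert(s.front == 'é');                // ...but front decodes a whole code point
    assert(s.walkLength == 1);             // iterating the string sees one element
    assert(s.byCodeUnit.walkLength == 2);  // iterating code units sees two
}
```

The same array thus behaves as two different sequences depending on whether it is indexed or iterated as a range.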

For me it is not about autodecoding. I would like to have 
something like a String type that does that. But what I am really 
pissed off about is that the current string type is an alias to 
immutable(char)[] (so it is not usable at all). This is a real 
problem for me, because it makes working on arrays of chars almost 
impossible.

Even char[] is unusable. So I am forced to use ubyte[], but that 
is not really an array of chars.

At the moment D supports neither full Unicode strings nor even a 
basic array of chars :(.

I hope this will be fixed one day, so that I can start to expand D 
in Czech; until then I am unable to do that.

