D's Auto Decoding and You

Vladimir Panteleev via Digitalmars-d-announce digitalmars-d-announce at puremagic.com
Tue May 17 10:18:35 PDT 2016


On Tuesday, 17 May 2016 at 14:06:37 UTC, Jack Stouffer wrote:
> http://jackstouffer.com/blog/d_auto_decoding_and_you.html

Thanks for writing this. Great article.

Some remarks:

>    static assert(is(typeof(s.front()) == dchar));

I believe .front is a property (so some ranges can implement it 
as a field, not a @property function). Hence, no parens.
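
To illustrate the point, here is a minimal sketch. The `Countdown` struct is a hypothetical range invented for this example (it is not from the article or Phobos); it exposes `front` as a plain field, so the parenthesized form does not compile for it.

```d
import std.range.primitives : front, isInputRange;

// Hypothetical range whose `front` is a plain field, not a function.
struct Countdown
{
    int front;                                   // field: `r.front` works, `r.front()` does not
    @property bool empty() const { return front <= 0; }
    void popFront() { --front; }
}

// A field-based `front` still satisfies the input range interface.
static assert(isInputRange!Countdown);

// True when `r.front()` (with parens) compiles for the given range type.
enum frontIsCallable(R) = __traits(compiles, R.init.front());

void main()
{
    string s = "abc";
    static assert(is(typeof(s.front) == dchar)); // no parens: works for strings

    auto r = Countdown(3);
    static assert(is(typeof(r.front) == int));   // no parens: works for fields too
    static assert(!frontIsCallable!Countdown);   // with parens: an int is not callable
}
```

Generic code that writes `r.front` without parens accepts both kinds of range.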

> So, why is typeof(s.front) == dchar.

Question mark?

> In plain English, this means when iterating over strings in D, 
> D will look ahead in the string and combine any code units that 
> make up a single code point.

Perhaps clarify that this only applies to ranges. `foreach` on a 
string will iterate over chars, but you can iterate over code 
points if you specify the dchar type explicitly.
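
A small sketch of the difference, assuming a test string of my own choosing ("a" followed by the euro sign, which takes three UTF-8 code units). The helper function names are made up for this example.

```d
import std.range : walkLength;
import std.stdio : writeln;

// Plain foreach: the element type is inferred as char, so this counts code units.
size_t countWithForeach(string s)
{
    size_t n;
    foreach (c; s)
        ++n;
    return n;
}

// foreach with an explicit dchar: the loop decodes, so this counts code points.
size_t countWithForeachDchar(string s)
{
    size_t n;
    foreach (dchar c; s)
        ++n;
    return n;
}

// Range primitives auto decode narrow strings, so this also counts code points.
size_t countWithRange(string s)
{
    return s.walkLength;
}

void main()
{
    string s = "a\u20AC";                 // 'a' plus EURO SIGN (E2 82 AC in UTF-8)
    writeln(countWithForeach(s));         // 4
    writeln(countWithForeachDchar(s));    // 2
    writeln(countWithRange(s));           // 2
}
```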

There is more confusing text on the same issue further down, and
in the intro:

> Iterating a char array with C style for loops produces 
> different results than foreach loops due to auto decoding.

> One feature of D that is confusing to a lot of new comers is 
> the behavior of strings in relation to range based features 
> like the foreach statement and range algorithms.

---

> E.g. for ë the code units C3 AB (for UTF-8) would turn into a 
> single code point.

Perhaps choose a character that is not also expressible via 
composite characters, to avoid potential confusion.
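
To show where the confusion comes from, here is a sketch comparing the two encodings of ë. The helper names are my own; `byGrapheme` and `normalize` are from std.uni.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme, normalize, NFC;

// Code points, as seen through auto decoding.
size_t codePoints(string s) { return s.walkLength; }

// User-perceived characters (grapheme clusters).
size_t graphemes(string s) { return s.byGrapheme.walkLength; }

void main()
{
    string precomposed = "\u00EB";    // ë as one code point (C3 AB in UTF-8)
    string composite   = "e\u0308";   // 'e' + COMBINING DIAERESIS (65 CC 88)

    writeln(precomposed.length, " ", composite.length);                // 2 3
    writeln(codePoints(precomposed), " ", codePoints(composite));      // 1 2
    writeln(graphemes(precomposed), " ", graphemes(composite));        // 1 1

    // The two strings compare unequal until they are normalized.
    writeln(precomposed == composite);                                 // false
    writeln(normalize!NFC(composite) == precomposed);                  // true
}
```

Auto decoding yields one dchar for the first form and two for the second, even though both render as the same character.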

> string s = "cassé";

Ditto (unless the goal was to complement the example from my .d 
file below).

>  These glaring inconsistencies are the cause of a lot of 
> confusion for new comers.

(Opinion) I would say that they also cause issues in generic code.

> Every time one wants a generic algorithm to work with both 
> strings and ranges, you wind up special casing via static 
> if-ing narrow strings to defeat the auto decoding, or to decode 
> the ranges. Case in point.

Link to the exact SHA to prevent the link from getting outdated. 
On GitHub, just hit 'y' on your keyboard to go to the "permalink" 
version.

> Auto decoding has two choices when encountering invalid code 
> units: throw, or produce an error dchar like std.utf.byUTF does.

(Aside) This was an interesting discussion on the subject: 
https://issues.dlang.org/show_bug.cgi?id=14519
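
For reference, a sketch of the two behaviors on an invalid code unit, as I understand current Phobos: auto decoding via `front` throws, while `byUTF` substitutes the replacement character. The helper names and the single-byte test input are assumptions for the example.

```d
import std.range.primitives : front;
import std.stdio : writeln;
import std.utf : byUTF, replacementDchar, UTFException;

// True if auto decoding the first code point throws UTFException.
bool autoDecodeThrows(string s)
{
    try
    {
        dchar c = s.front;   // auto decoding happens here
        return false;
    }
    catch (UTFException e)
    {
        return true;
    }
}

// First code point as produced by byUTF, which does not throw.
dchar firstViaByUTF(string s)
{
    return s.byUTF!dchar.front;
}

void main()
{
    // 0xFF can never appear in valid UTF-8.
    immutable(ubyte)[] raw = [0xFF];
    string bad = cast(string) raw;

    writeln(autoDecodeThrows(bad));                      // true
    writeln(firstViaByUTF(bad) == replacementDchar);     // true
}
```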

> However, in my opinion D is too far along to to suddenly ask 
> people

"to to"

---

Some more info / links on the subject I collected a few years ago:

http://wiki.dlang.org/Language_issues#Unicode_and_ranges


