Proposal for fixing dchar ranges

Mon Mar 10 13:58:01 PDT 2014

On Monday, 10 March 2014 at 19:48:34 UTC, H. S. Teoh wrote:
> On Mon, Mar 10, 2014 at 07:49:04PM +0100, Johannes Pfau wrote:
>> Am Mon, 10 Mar 2014 11:30:07 -0700
>> schrieb Walter Bright <newshound2 at digitalmars.com>:
>> 
>> > On 3/10/2014 6:35 AM, Steven Schveighoffer wrote:
>> > > An idea to fix the whole set of problems I see with char[] being
>> > > treated specially by Phobos: introduce an actual string type,
>> > > with char[] as backing, that is a dchar range and that actually
>> > > dictates the rules we want. Then, make the compiler use this type
>> > > for literals.
>> > 
>> > Proposals to make a string class for D have come up many times. I
>> > have a kneejerk dislike for it. It's a really strong feature for D
>> > to have strings be an array type, and I'll go to great lengths to
>> > keep it that way.
>
> I'm on the fence about this one. The nice thing about strings being an
> array type is that it is a familiar concept to C coders, and it allows
> array slicing for extracting substrings, etc., which fits nicely with
> the C view of strings as character arrays. As a C coder myself, I like
> it this way too. But the bad thing about strings being an array type is
> that it's a holdover from C, and it allows slicing for extracting
> substrings -- malformed substrings, by permitting slicing through the
> middle of a multibyte (multiword) character.
>
> Basically, the nice aspects of strings being arrays only apply when
> you're dealing with ASCII (or mostly-ASCII) strings. These very same
> "nice" aspects turn into problems when dealing with anything non-ASCII.
> The only way the user can get it right using only array operations is
> if they understand the whole of Unicode in their head and are willing to
> reinvent Unicode algorithms every time they slice a string or do some
> operation on it. Since D purportedly supports Unicode by default, it
> shouldn't be this way. D should *actually* support Unicode all the way
> -- use proper Unicode algorithms for substring extraction, collation,
> line-breaking, normalization, etc. Being a systems language, of course,
> means that D should allow you to get under the hood and do things
> directly with the raw string representation -- but this shouldn't be the
> *default* modus operandi. The default should be a properly-encapsulated
> string type with Unicode algorithms to operate on it (with the option of
> reaching into the raw representation where necessary).

You started off on the fence, but you seem pretty convinced by the end!
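
For anyone following along, here is a minimal sketch of the slicing hazard described in the quoted text. It is only an illustration, not code from anyone in this thread. It assumes the string is UTF-8, and `isWellFormed` is a hypothetical helper name of my own that wraps `std.utf.validate`.

```d
import std.range : walkLength;
import std.utf : validate, UTFException;

// Hypothetical helper (not part of Phobos): true if `s` is valid UTF-8.
bool isWellFormed(string s)
{
    try
    {
        validate(s);   // throws UTFException on malformed input
        return true;
    }
    catch (UTFException e)
    {
        return false;
    }
}

void main()
{
    string s = "h\u00E9llo";      // U+00E9 takes two UTF-8 code units

    assert(s.length == 6);        // .length counts code units
    assert(s.walkLength == 5);    // range iteration decodes to dchar

    // Array slicing works on code units, so it can cut a character in half.
    assert(!isWellFormed(s[0 .. 2]));   // 'h' plus a lone lead byte
    assert(isWellFormed(s[0 .. 3]));    // 'h' plus the complete character
}
```

The slice itself never complains; the damage only shows up when something later tries to decode the result, which is the part that makes plain array operations easy to get wrong on non-ASCII text.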