Proposal for fixing dchar ranges
John Colvin
john.loughran.colvin at gmail.com
Mon Mar 10 13:58:01 PDT 2014
On Monday, 10 March 2014 at 19:48:34 UTC, H. S. Teoh wrote:
> On Mon, Mar 10, 2014 at 07:49:04PM +0100, Johannes Pfau wrote:
>> On Mon, 10 Mar 2014 11:30:07 -0700,
>> Walter Bright <newshound2 at digitalmars.com> wrote:
>>
>> > On 3/10/2014 6:35 AM, Steven Schveighoffer wrote:
>> > > An idea to fix all the problems I see with char[] being treated
>> > > specially by Phobos: introduce an actual string type, with char[]
>> > > as backing, that is a dchar range and actually dictates the rules
>> > > we want. Then, make the compiler use this type for literals.
>> >
>> > Proposals to make a string class for D have come up many times. I
>> > have a knee-jerk dislike for it. It's a really strong feature for D
>> > to have strings be an array type, and I'll go to great lengths to
>> > keep it that way.
>
> I'm on the fence about this one. The nice thing about strings being an
> array type is that it is a familiar concept to C coders, and it allows
> array slicing for extracting substrings, etc., which fits nicely with
> the C view of strings as character arrays. As a C coder myself, I like
> it this way too. But the bad thing about strings being an array type
> is that it's a holdover from C, and that it allows slicing to extract
> malformed substrings by cutting through a multibyte (multiword)
> character.
>
> Basically, the nice aspects of strings being arrays only apply when
> you're dealing with ASCII (or mostly-ASCII) strings. These very same
> "nice" aspects turn into problems when dealing with anything non-ASCII.
> The only way the user can get it right using only array operations is
> if they understand the whole of Unicode in their head and are willing
> to reinvent Unicode algorithms every time they slice a string or do
> some operation on it. Since D purportedly supports Unicode by default,
> it shouldn't be this way. D should *actually* support Unicode all the
> way, using proper Unicode algorithms for substring extraction,
> collation, line-breaking, normalization, etc. Being a systems language,
> of course, means that D should allow you to get under the hood and
> work directly with the raw string representation, but this shouldn't
> be the *default* modus operandi. The default should be a properly
> encapsulated string type with Unicode algorithms to operate on it
> (with the option of reaching into the raw representation where
> necessary).
>
You started off on the fence, but you seem pretty convinced by the end!
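As a rough illustration of the slicing problem described in the quoted text, here is a minimal D sketch, assuming the source file is saved as UTF-8. The helper isWellFormed is hypothetical; validate and toUTFindex are from Phobos' std.utf. It shows that plain array slicing indexes code units, so a slice can split a multibyte character, while converting a code-point count to a code-unit index first keeps the slice well-formed.

```d
import std.utf : toUTFindex, UTFException, validate;

/// Hypothetical helper: returns true if `s` is well-formed UTF-8.
bool isWellFormed(string s)
{
    try
    {
        validate(s); // throws UTFException on malformed input
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    string s = "héllo";    // 'é' (U+00E9) takes two UTF-8 code units
    assert(s.length == 6); // .length counts code units, not characters

    // Plain array slicing indexes code units, so this cuts 'é' in half.
    string bad = s[0 .. 2];
    assert(!isWellFormed(bad));

    // Translating a code-point count (2) into a code-unit index keeps
    // the slice on a character boundary.
    string good = s[0 .. toUTFindex(s, 2)];
    assert(isWellFormed(good));
    assert(good == "hé");
}
```

Doing the index translation by hand on every slice is exactly the kind of per-call Unicode bookkeeping the quoted message argues should not be the default.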
More information about the Digitalmars-d mailing list