Jonathan M Davis
jmdavisProg at gmx.com
Wed Jun 27 14:28:58 PDT 2012
On Wednesday, June 27, 2012 17:11:56 Steven Schveighoffer wrote:
> On Wed, 27 Jun 2012 16:55:49 -0400, Timon Gehr <timon.gehr at gmx.ch> wrote:
> > On 06/27/2012 10:22 PM, Steven Schveighoffer wrote:
> >> On Wed, 27 Jun 2012 15:20:26 -0400, Timon Gehr <timon.gehr at gmx.ch>
> >> wrote:
> >>> There is no reason for anyone to be confused about this endlessly. It
> >>> is simple to understand. Furthermore, think about the implications of a
> >>> library-defined string type: it just introduces the problem of what the
> >>> type of built-in string literals should be. This would cause endless
> >>> pain with type deduction, ifti, string mixins, ... A library-defined
> >>> string type cannot be a full string type. Pretending that it can has no
> >>> value.
> >> Default type of the literal should be the library type.
> > Then it is not a library type, but a built-in type. Are you planning to
> > inject a dependency on Phobos into the compiler?
> No, druntime, and include minimal utf support. We do the same thing with
> >> If you want immutable(char)[], use "abc".codeunits or equivalent.
> > I really don't want to type .codeunits, but I want to use
> > immutable(char)[] everywhere. This 'library type' is just an interface
> > change that makes writing nice and efficient code a kludge.
> When most string functions take strings, why would you want to use
> immutable(char)[] everywhere?
> >> Of course, it should by default work as a zero-terminated char * for C
> >> compatibility.
> >> The current situation is not simple to understand.
> > It is simple, even if not immediately obvious. It does not have to be
> > immediately obvious without explanation. It needs to be convenient.
> Try sorting an array of ascii characters.
Cast it to ubyte[]. Problem solved. I honestly don't think that operating on
code units like that should be encouraged at all, so if it's a bit hard to do,
then that's a _good_ thing (but since all that's required is casting to
ubyte[], it's still quite easy - you just have to tell the compiler that
that's what you really want to do rather than it being the default behavior).
The problem that we have is the inconsistencies between how the language
treats strings and how the library does, not the fact that operating on char
as if it were ASCII rather than UTF-8 requires some casting.
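A minimal sketch of the cast described above, assuming the array is mutable and
is known to contain only ASCII. The helper name sortAscii is made up for
illustration. Passing the char[] to sort directly should be rejected, since
Phobos does not consider a narrow string to be a random-access range.

```d
import std.algorithm : sort;

/// Sorts the code units of `s` in place and returns it.
/// Hypothetical helper: only meaningful when `s` is pure ASCII.
char[] sortAscii(char[] s)
{
    // The cast reinterprets the same memory as ubyte[], which is an
    // ordinary random-access range, so sort accepts it.
    sort(cast(ubyte[]) s);
    return s;
}

void main()
{
    char[] letters = "hello".dup;
    sortAscii(letters);
    assert(letters == "ehllo");
}
```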
> >> Generic code that accepts arrays has to special-case narrow-width
> >> strings if you plan to
> >> use phobos with them in some cases. That is a horrible situation.
> > Generic code accepts ranges, not arrays. All necessary (or maybe
> > unnecessary, I don't know) special casing is already done for you in
> > Phobos. The _only_ thing that is problematic is the inconsistent
> > 'foreach' behaviour.
> Plenty of generic code specializes on arrays.
You're stuck doing that regardless of how strings are represented. You have to
operate on them as ranges of code points (or even graphemes) if you want
correct string processing, but that's inefficient, so any code which cares
about efficiency and can gain it by using knowledge of how Unicode works and
operating on the code units directly will need to special-case strings. Whether
string is an array or a struct has zero effect on that. All that it affects is
which code operates on it as an array of code units vs a range of code points.
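As a rough illustration of that kind of special-casing, the sketch below counts
spaces in any input range but takes a code-unit fast path for narrow strings.
The function name countSpaces is an assumption made for this example. The fast
path relies on the fact that the code unit for ' ' never occurs inside a
multi-unit UTF-8 or UTF-16 sequence, so no decoding is needed.

```d
import std.range;
import std.traits : isNarrowString;

/// Counts the spaces in `r`. Hypothetical example function.
size_t countSpaces(R)(R r)
    if (isInputRange!R)
{
    size_t n = 0;
    static if (isNarrowString!R)
    {
        // Special case: index the code units directly, skipping decoding.
        foreach (i; 0 .. r.length)
            if (r[i] == ' ')
                ++n;
    }
    else
    {
        // Generic path: use only the range primitives.
        for (; !r.empty; r.popFront())
            if (r.front == ' ')
                ++n;
    }
    return n;
}

void main()
{
    assert(countSpaces("a b c") == 2);            // narrow string path
    assert(countSpaces("a b c"d) == 2);           // generic path (dstring)
    assert(countSpaces("\u00E9 \u00E9") == 1);    // non-ASCII input
}
```

The same split would be needed if string were a struct; only the spelling of
the fast path would change.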
> >>> alias immutable(char)[] string is just fine.
> >> That is technically fine, but if phobos wants to treat immutable(char)[]
> >> as something other than an array, it is not fine.
> >> -Steve
> > Phobos does not treat immutable(char)[] as something other than an
> > array. It does not treat all arrays uniformly though.
> It certainly does. An array by definition is a random-access range. It
> does not treat strings as random access ranges.
Well, now you're getting into a semantics argument. isRandomAccessRange defines
what a random access range is. All arrays which aren't narrow strings qualify.
Narrow strings do not. Yes, they do have random-access operations, but they
aren't random-access ranges, because they're ranges of code points, not code
units.
Yes, this makes it so that character arrays are treated inconsistently from
other arrays, but the library is very consistent in how it handles them,
because it _never_ deals with strings as being made of code units. If it's
operating on them as arrays, then it takes unicode into account, and if it's
operating on them as ranges, it treats them as ranges of code points. It
_always_ makes sure that it's operating on code points. Plenty of code
specializes on strings so that it can deal with the code units in an efficient
manner rather than having to decode them all the time, but Phobos is
completely consistent with regard to how it treats strings. The _only_
inconsistencies are between the language and the library - namely how foreach
iterates on code units by default and the fact that while the language defines
length, slicing, and random-access operations for strings, the library
effectively does not consider strings to have them.
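A short sketch of both inconsistencies, assuming a Phobos in which narrow
strings are decoded to dchar when used as ranges. The helper names codeUnits
and codePoints are made up for this example. The static asserts show the
library's view of a string, and the two foreach loops show the language's
default versus explicit decoding.

```d
import std.range;

// Library view: arrays in general are random-access ranges, but narrow
// strings are ranges of dchar without length, slicing, or random access.
static assert( isRandomAccessRange!(int[]));
static assert( isRandomAccessRange!dstring);
static assert(!isRandomAccessRange!string);
static assert(!hasLength!string);
static assert(!hasSlicing!string);
static assert(is(ElementType!string == dchar));

/// Language default: foreach over a string yields code units.
size_t codeUnits(string s)
{
    size_t n = 0;
    foreach (c; s)
        ++n;
    return n;
}

/// Asking for dchar makes foreach decode code points.
size_t codePoints(string s)
{
    size_t n = 0;
    foreach (dchar c; s)
        ++n;
    return n;
}

void main()
{
    string s = "h\u00E9llo";           // U+00E9 takes two UTF-8 code units
    assert(codeUnits(s) == 6);
    assert(codePoints(s) == 5);
    assert(s.length == 6);             // language: length in code units
    assert(s.walkLength == 5);         // library: counts code points
}
```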
- Jonathan M Davis