standard ranges

Thu Jun 28 06:05:44 PDT 2012

On 06/28/2012 11:28 AM, Christophe Travert wrote:
> Jonathan M Davis, in message (digitalmars.D:170872), wrote:
>> On Thursday, June 28, 2012 08:05:19 Christophe Travert wrote:
>>> "Jonathan M Davis", in message (digitalmars.D:170852), wrote:
>>>> completely consistent with regard to how it treats strings. The _only_
>>>> inconsistencies are between the language and the library - namely how
>>>> foreach iterates on code units by default and the fact that while the
>>>> language defines length, slicing, and random-access operations for
>>>> strings, the library effectively does not consider strings to have them.
>>
>>> char[] is not treated as an array by the library
>>
>> Phobos _does_ treat char[] as an array. isDynamicArray!(char[]) is true, and
>> char[] works with the functions in std.array. It's just that they're all
>> special-cased appropriately to handle narrow strings properly. What it doesn't
>> do is treat char[] as a range of char.
>>
>>> and is not treated as a RandomAccessRange.
>
> All arrays are treated as RandomAccessRanges, except for char[] and
> wchar[]. So I think I am entitled to say that strings are not treated as
> arrays.

"Not treated like other arrays", is the strongest claim that can be
made there.
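The difference is easy to check at compile time. The following is a minimal sketch that assumes a present-day Phobos. There, the traits live in std.traits and std.range.primitives, and std.range re-exports the latter. The 2012 library may differ in details.

```d
import std.range : ElementEncodingType, ElementType,
    isBidirectionalRange, isRandomAccessRange;
import std.traits : isDynamicArray;

// To the language and to std.traits, narrow strings are ordinary arrays
// whose elements (code units) are char or wchar.
static assert(isDynamicArray!(char[]) && isDynamicArray!(wchar[]));
static assert(is(ElementEncodingType!(char[]) == char));

// As ranges, however, they yield decoded code points (dchar).
static assert(is(ElementType!(char[]) == dchar));
static assert(is(ElementType!(wchar[]) == dchar));

// Every other array type is a random-access range...
static assert(isRandomAccessRange!(int[]));
static assert(isRandomAccessRange!(dchar[]));

// ...but narrow strings are only bidirectional ranges.
static assert(!isRandomAccessRange!(char[]) && isBidirectionalRange!(char[]));
static assert(!isRandomAccessRange!(wchar[]) && isBidirectionalRange!(wchar[]));
```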

> And I would say I am also entitled to say strings are not normal
> ranges, since they define length, but have isLength as true,

It is hasLength, and it is false. They define length, but it is not part
of the range interface.

It is analogous to the following:

class charArray : ForwardRange!dchar{
     /* interface ForwardRange!dchar */
     dchar front();
     bool empty();
     void popFront();
     charArray save();

     /* other methods */
     size_t length();                        // counts code units
     char opIndex(size_t i);                 // returns a code unit
     charArray opSlice(size_t a, size_t b);  // slices by code units
}
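In Phobos terms, the analogy corresponds to the checks below. A narrow string has a .length that compiles, yet hasLength reports false, because that length counts code units. Finding the number of code points means walking the string. This is a sketch against a present-day Phobos, and the sample strings are arbitrary.

```d
import std.range : hasLength, walkLength;

// .length compiles on a narrow string, but it counts code units ...
static assert(__traits(compiles, "abc".length));
// ... so Phobos does not consider it part of the range interface.
static assert(!hasLength!string && !hasLength!wstring);
// For arrays of fixed-width elements the two notions agree.
static assert(hasLength!(int[]) && hasLength!dstring);

// U+00EF takes 2 UTF-8 code units; U+1F600 takes 4 UTF-8 code units
// and 2 UTF-16 code units (a surrogate pair).
enum utf8  = "na\u00EFve\U0001F600";
enum utf16 = "na\u00EFve\U0001F600"w;

unittest
{
    assert(utf8.length == 10);     // code units, O(1)
    assert(utf16.length == 7);     // code units, O(1)
    assert(utf8.walkLength == 6);  // code points, O(n): must decode
    assert(utf16.walkLength == 6);
}
```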

> and define opIndex and opSlice,

[] and [..] operate on code units, but in a random access range of dchar,
as defined by Phobos, they would have to operate on code points.
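In present-day Phobos, isRandomAccessRange requires r[i] to have the range's element type. For a narrow string it does not: indexing yields a code unit, while front yields a decoded code point. Here is a small sketch with an arbitrary sample string.

```d
import std.range : ElementType, front;

// U+00EF is encoded in UTF-8 as the two code units 0xC3 0xAF.
enum string sample = "\u00EFx";

// Indexing yields a code unit, not the range's element type ...
static assert(is(typeof(sample[0]) == immutable(char)));
static assert(!is(typeof(sample[0]) == ElementType!string));
// ... while the range primitive front yields a decoded code point.
static assert(is(typeof(sample.front) == dchar));

unittest
{
    assert(sample[0] == 0xC3);        // first code unit only
    assert(sample.front == '\u00EF'); // the whole first code point
}
```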

> but are not RandomAccessRanges.
>
> The fact that isDynamicArray!(char[]) is true, but
> isRandomAccessRange is not is just another aspect of the schizophrenia.
> The behavior of a templated function on a string will depend on which
> was used as a guard.
>

No, it won't.
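The following illustrates one reading of this point. It uses a hypothetical function, middle, which is not part of Phobos. A template guarded by isRandomAccessRange does not quietly process a narrow string differently. It refuses to instantiate, so the guard decides whether the call compiles, not how it behaves.

```d
import std.range : hasLength, isRandomAccessRange;

// Hypothetical example function (not part of Phobos): returns the middle
// element of any finite random-access range.
auto middle(R)(R r)
    if (isRandomAccessRange!R && hasLength!R)
{
    return r[r.length / 2];
}

unittest
{
    assert(middle([1, 2, 3]) == 2);
    assert(middle("abc"d) == 'b'); // dstring: indexed by code point
    // A narrow string fails the constraint, so the call does not compile;
    // it is rejected rather than handled differently.
    static assert(!__traits(compiles, middle("abc")));
}
```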

>>
>> Which is what I already said.
>>
>>> That is a second inconsistency, and it would be avoided if string were a
>>> struct.
>>
>> No, it wouldn't. It is _impossible_ to implement length, slicing, and indexing
>> for UTF-8 and UTF-16 strings in O(1). Whether you're using an array or a
>> struct to represent them is irrelevant. And if you can't do those operations
>> in O(1), then they can't be random access ranges.
>
> I never said strings should support length and slicing. I even said
> they should not. foreach is inconsistent with the way strings are
> treated in Phobos, but opIndex, opSlice and length are inconsistent too.
> string[0] and string.front do not even return the same thing.
>
> Please read my posts a little more carefully before
> answering them.
>

This is very impolite.

On Thursday, June 28, 2012 08:05:19 Christophe Travert wrote:
> Slicing is provided for convenience, but not as opSlice, since it is not O(1), but
> as a method with a separate name.
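A named, O(n) slicing operation of the kind described here can be sketched in D. Both helpers below are hypothetical, not Phobos functions. They rely only on the language's decoding foreach over a string.

```d
// Hypothetical helper: code-unit offset at which the n-th code point
// of s starts (s.length if n equals the number of code points). O(n).
size_t codeUnitOffset(const(char)[] s, size_t n)
{
    size_t k = 0;
    foreach (i, dchar c; s) // decodes; i is the code-unit index of c
    {
        if (k == n)
            return i;
        ++k;
    }
    assert(k == n, "code-point index out of range");
    return s.length;
}

// Hypothetical named slicing method: the code points in [a, b). It is
// O(n), which is why it is not spelled opSlice.
string sliceCodePoints(string s, size_t a, size_t b)
{
    assert(a <= b);
    return s[codeUnitOffset(s, a) .. codeUnitOffset(s, b)];
}

unittest
{
    string s = "na\u00EFve";                      // 5 code points, 6 code units
    assert(sliceCodePoints(s, 2, 3) == "\u00EF"); // the whole U+00EF
    assert(sliceCodePoints(s, 0, 5) == s);
}
```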

> About the rest of your post, I basically say the same as you in shorter
> terms, except that I am in favor of changing things (but I didn't even
> say they should be changed in my conclusion).
>

When read carefully, the conclusion says that code compatibility is
important only a couple of sentences before it says that breaking code for
the fun of it may be a good thing.

> Newcomers are troubled by this problem, and I think it is important.

Newcomers sometimes become seasoned D programmers. Sometimes they know
what Unicode is about even before that.

> They will make mistakes when using both array and range functions on
> strings in the same algorithm, or when using array functions without
> knowing about UTF-8 encoding issues (the fact that array functions are
> also valid range functions, except for strings, does not help). But I
> also think experienced programmers can be affected, through
> inattention, reuse of code written by inexperienced programmers, or
> inappropriate use of template guards.

In the 7-bit ASCII subset, UTF-8 strings are actually random access, and
iterating a UTF-8 string by code unit is safe if you are, e.g., just
going to treat some ASCII characters specially.
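This works because in UTF-8 every byte of a multi-byte sequence is 0x80 or above. A char below 0x80 is therefore always a complete ASCII character. The sketch below is minimal, and countAsciiCommas is a hypothetical name.

```d
// Hypothetical example: counts ASCII commas without decoding.
// Iterating by char (code unit) is safe here: bytes belonging to
// multi-byte UTF-8 sequences are all >= 0x80 and never equal ','.
size_t countAsciiCommas(const(char)[] s)
{
    size_t n = 0;
    foreach (char c; s) // code units, no decoding
        if (c == ',')
            ++n;
    return n;
}

unittest
{
    assert(countAsciiCommas("a,\u00EF,b") == 2);
    // In pure ASCII text, code units and characters coincide, so s[i] is
    // the i-th character.
    string ascii = "abc";
    assert(ascii[2] == 'c');
}
```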

I don't care much whether or not (bad?) code handles Unicode correctly,
but it is important that code correctly documents whether or not it
does so, and to what extent it does. The new std.regex has good Unicode
support, and to enable that, it had to add some pretty large tables to
Phobos, the functionality of which is not exposed to the library user
as of now. It is therefore safe to say that many/most existing D
programs do not handle the whole Unicode standard correctly.

Unicode has to be _actively_ supported. There are distinct issues that
are hard to abstract away efficiently. Treating a Unicode string as a
range of code points does not solve them. (dchar[] indexing is still
not guaranteed to give back the i-th character!) Why build this
interpretation into the language?
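For instance, a single user-perceived character may consist of several code points. The sketch below uses std.uni.byGrapheme, which was added to Phobos after this thread was written. The sample string is arbitrary.

```d
import std.range : walkLength;
import std.uni : byGrapheme;

unittest
{
    // "e" followed by U+0301 COMBINING ACUTE ACCENT renders as one character.
    dstring d = "e\u0301"d;

    assert(d.length == 2);    // two code points ...
    assert(d[0] == 'e');      // ... so d[0] is not the accented character
    assert(d[1] == '\u0301'); // the accent is a separate code point

    // Counting user-perceived characters needs grapheme segmentation.
    assert(d.byGrapheme.walkLength == 1);
}
```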

>
> As a more general comment, I think having a consistent language is a
> very important goal when designing a language. It makes everything
> simpler, from language design through compiler and library development
> to the user. It may not be too late for D.
>

The language is consistent here. The library treats some language
features specially. It is not the language that is "confusing". The
library behaviour was probably introduced for reasons similar to those
given in your post. The special casing has not
caused me any trouble, and sometimes it was useful.