Ranges

spir denis.spir at gmail.com
Fri Mar 18 03:32:35 PDT 2011


On 03/18/2011 10:29 AM, Peter Alexander wrote:
> On 13/03/11 12:05 AM, Jonathan M Davis wrote:
>> So, when you're using a range of char[] or wchar[], you're really using a range
>> of dchar. These ranges are bi-directional. They can't be sliced, and they can't
>> be indexed (since doing so would likely be invalid). This generally works very
>> well. It's exactly what you want in most cases. The problem is that that means
>> that the range that you're iterating over is effectively of a different type
>> than the original char[] or wchar[].
>
> This has to be the worst language design decision /ever/.
>
> You can't just mess around with fundamental principles like "the first element
> in an array of T has type T" for the sake of a minor convenience. How are we
> supposed to do generic programming if common sense reasoning about types
> doesn't hold?
>
> This is just std::vector<bool> from C++ all over again. Can we not learn from
> mistakes of the past?

I partially agree, but compare with a simple ASCII text: you could iterate 
over its chars (= codes = bytes), words, lines... Or according to schemes 
specific to your app (e.g. reverse order, every number in it, every word at 
the start of a line...). A piece of text is not only a stream of codes.

The problem is that there is no good decision in the case of char[] or 
wchar[]. We would have to choose a kind of "natural" sense of what it means to 
iterate over a text, but there is no such thing. What does it *mean*? What is 
the natural unit of a text?
Bytes or 16-bit words are code units, which mean nothing on their own. Code 
points (<-> dchars) are not guaranteed to mean anything either (as shown by 
past discussion: one code point may be the base 'a' and the following one the 
combining '^', both together forming "â"). Code points do not represent 
"characters" in the common sense. So it is very clear that implicitly 
iterating over dchars is a wrong choice. But what else?
I would rather get rid of wchar and dchar and deal with a plain stream of 
bytes assumed to represent UTF-8, until we get a good solution for operating 
at the level of "human" characters.
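
To make the three levels concrete, here is a minimal sketch, assuming a 
current Phobos where std.uni.byGrapheme is available and strings are still 
auto-decoded by the range primitives. The count* helper names are hypothetical 
and exist only for this illustration; they are not part of any library.

```d
import std.range : ElementType, walkLength;
import std.uni : byGrapheme;

// Hypothetical helpers, for illustration only.
// Code units: the raw UTF-8 bytes stored in the array.
size_t countCodeUnits(string s)  { return s.length; }
// Code points: what range iteration over a string yields (dchar).
size_t countCodePoints(string s) { return s.walkLength; }
// Grapheme clusters: closest to "human" characters.
size_t countGraphemes(string s)  { return s.byGrapheme.walkLength; }

// Indexing yields a code unit, but the range interface yields a code point.
static assert(is(typeof("x"[0]) == immutable(char)));
static assert(is(ElementType!string == dchar));

void main()
{
    // 'a' followed by U+0302 COMBINING CIRCUMFLEX ACCENT, displayed as one "â".
    string s = "a\u0302";
    assert(countCodeUnits(s) == 3);  // 1 byte for 'a' + 2 bytes for U+0302
    assert(countCodePoints(s) == 2); // two dchars
    assert(countGraphemes(s) == 1);  // one user-perceived character
}
```

The two static asserts show the mismatch Peter objects to (element type by 
indexing vs. element type as a range), and the runtime asserts show why 
neither of those two units matches what a reader would call a character.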

Denis
-- 
_________________
vita es estrany
spir.wikidot.com



More information about the Digitalmars-d-learn mailing list