VLERange: a range in between BidirectionalRange and RandomAccessRange
Michel Fortin
michel.fortin at michelf.com
Mon Jan 17 14:54:04 PST 2011
On 2011-01-17 15:49:26 -0500, Andrei Alexandrescu
<SeeWebsiteForEmail at erdani.org> said:
> On 1/17/11 2:29 PM, Michel Fortin wrote:
>> The problem I see currently is that you rely on dchar being the element
>> type. That should be an implementation detail, not something client code
>> can see or rely on.
>
> But at some point you must be able to talk about individual characters
> in a text. It can't be something that client code doesn't see!!!
It seems that it can. NSString only exposes individual UTF-16 code
units directly (or semi-directly via an accessor method), even though
searching and comparing are grapheme-aware. I'm not saying it's a good
design, but it certainly can work in practice.
In any case, I didn't mean to say the client code shouldn't be aware of
the characters in a string. I meant that the client shouldn't assume
the algorithm works at the same layer as ElementType!(string) for a
given string type. Even if ElementType!(string) is dchar, the default
function you get if you don't use any of toCodeUnit, toDchar, or
toGrapheme can work at the dchar or grapheme level if it makes more
sense that way.
In other words, the client says: "I have two strings, compare them!"
The client didn't specify if they should be compared by char, wchar,
dchar, or by normalized grapheme; so we do what's sensible. That's what
I call the 'default' string functions, those you get when you don't ask
for anything specific. They should have signatures that let them work
at the grapheme level, even if for practical reasons (performance) they
don't actually do so at first. This way, if supporting graphemes becomes
more important or more practical, it's easy to evolve toward them.
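To make the "compare them!" case concrete, here is a minimal sketch assuming present-day Phobos (which postdates this message). The same visible text "é" can be stored as one precomposed code point (U+00E9) or as 'e' followed by U+0301 COMBINING ACUTE ACCENT. A plain array comparison works on UTF-8 code units and sees two different strings; comparing after normalizing both sides to NFC with std.uni.normalize sees the same text. The helper sameText is hypothetical, standing in for a "default" comparison that does the sensible thing without the caller naming a layer.

```d
import std.uni : normalize, NFC;

// Hypothetical "default" comparison: the caller doesn't say whether to
// compare by char, wchar, dchar or grapheme; both sides are brought to
// Unicode canonical composition (NFC) before comparing.
bool sameText(string a, string b)
{
    return normalize!NFC(a) == normalize!NFC(b);
}

void main()
{
    string precomposed = "\u00E9"; // U+00E9 LATIN SMALL LETTER E WITH ACUTE
    string decomposed = "e\u0301";  // 'e' + U+0301 COMBINING ACUTE ACCENT

    // Array comparison looks at UTF-8 code units: not equal
    assert(precomposed != decomposed);
    // Normalized comparison treats them as the same text
    assert(sameText(precomposed, decomposed));
}
```

Normalizing on every comparison has a cost, which is the kind of performance trade-off that might keep a default function at the dchar level for a while; the point is that its signature doesn't prevent doing it later.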
> SuperDuperText txt;
> auto c = giveMeTheFirstCharacter(txt);
>
> What is the type of c? That is visible to the client!
That depends on how you implement the giveMeTheFirstCharacter function. :-)
More seriously, you have four choices:
1. code unit
2. code point
3. grapheme
4. require the client to state explicitly which kind of 'character' he
wants; 'character' being an overloaded word, it's reasonable to ask for
disambiguation.
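The four options can be sketched as follows, assuming present-day Phobos, where std.utf.byCodeUnit, std.utf.byDchar and std.uni.byGrapheme expose a string as code units, code points and graphemes respectively. The names CharKind and firstCharacter are hypothetical and only illustrate option 4: the caller states which kind of "character" it wants, and the return type follows from that choice (options 1 to 3 each correspond to fixing one branch as the only behavior).

```d
import std.uni : byGrapheme;
import std.utf : byCodeUnit, byDchar;

// Hypothetical: the three meanings of "character" from choices 1 to 3.
enum CharKind { codeUnit, codePoint, grapheme }

// Hypothetical accessor for choice 4: the caller names the kind of
// character it wants, and the return type depends on that choice.
// `s` must not be empty.
auto firstCharacter(CharKind kind)(string s)
{
    assert(s.length > 0, "empty string has no first character");
    static if (kind == CharKind.codeUnit)
        return s.byCodeUnit.front;  // one UTF-8 code unit (char)
    else static if (kind == CharKind.codePoint)
        return s.byDchar.front;     // one decoded code point (dchar)
    else
        return s.byGrapheme.front;  // one user-perceived character (Grapheme)
}

void main()
{
    string s = "e\u0301"; // 'e' + U+0301 COMBINING ACUTE ACCENT, shown as "é"
    assert(firstCharacter!(CharKind.codeUnit)(s) == 'e');
    assert(firstCharacter!(CharKind.codePoint)(s) == 'e');
    // Only the grapheme keeps the accent together with its base letter
    assert(firstCharacter!(CharKind.grapheme)(s).length == 2);
}
```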
You and Walter can't seem to agree on 1 versus 2 for foreach and
ranges. To keep things consistent with what I said above I'd tend to
say 4, but that's weird for something that looks like an array. Failing
that, I'd pick 1 for consistency, 3 for correctness, or 2 for
practicality.
Given that something is going to be inconsistent either way, I'd say any of
the above is acceptable. But please make sure you and Walter agree on
the default element type for ranges and foreach.
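For readers who haven't followed the foreach/range disagreement, a minimal sketch of the inconsistency, assuming a current D compiler and Phobos: foreach over a string with an untyped loop variable yields UTF-8 code units (choice 1), while the range primitives decode to dchar code points (choice 2). Typing the loop variable as dchar makes foreach decode as well.

```d
import std.range.primitives : ElementType, walkLength;
import std.traits : Unqual;

void main()
{
    string s = "\u00E9t\u00E9"; // "été": 3 code points, 5 UTF-8 code units

    // Untyped foreach iterates the array's elements: UTF-8 code units
    size_t units;
    foreach (c; s)
    {
        static assert(is(Unqual!(typeof(c)) == char));
        ++units;
    }
    assert(units == 5);

    // The range interface auto-decodes: element type is dchar
    static assert(is(ElementType!string == dchar));
    assert(s.walkLength == 3);

    // foreach decodes too, but only when the loop variable is typed dchar
    size_t points;
    foreach (dchar c; s)
        ++points;
    assert(points == 3);
}
```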
--
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/