VLERange: a range in between BidirectionalRange and RandomAccessRange

Michel Fortin michel.fortin at michelf.com
Mon Jan 17 14:54:04 PST 2011


On 2011-01-17 15:49:26 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail at erdani.org> said:

> On 1/17/11 2:29 PM, Michel Fortin wrote:
>> The problem I see currently is that you rely on dchar being the element
>> type. That should be an implementation detail, not something client code
>> can see or rely on.
> 
> But at some point you must be able to talk about individual characters 
> in a text. It can't be something that client code doesn't see!!!

It seems that it can. NSString only exposes individual UTF-16 code 
units directly (or semi-directly via an accessor method), even though 
searching and comparing are grapheme-aware. I'm not saying it's a good 
design, but it certainly can work in practice.

In any case, I didn't mean to say the client code shouldn't be aware of 
the characters in a string. I meant that the client shouldn't assume 
the algorithm works at the same layer as ElementType!(string) for a 
given string type. Even if ElementType!(string) is dchar, the default 
function you get if you don't use any of toCodeUnit, toDchar, or 
toGrapheme can work at the dchar or grapheme level if it makes more 
sense that way.

In other words, the client says: "I have two strings, compare them!" 
The client didn't specify if they should be compared by char, wchar, 
dchar, or by normalized grapheme; so we do what's sensible. That's what 
I call the 'default' string functions, those you get when you don't ask 
for anything specific. They should have a signature making them able to 
work at the grapheme level, even though they might not for practical 
reasons (performance). This way if it becomes more important or 
practical to support graphemes, it's easy to evolve to them.
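
To make the 'default' idea concrete, here is a rough sketch. It is 
written in Python only because its standard library ships Unicode 
normalization; it is not D and not a proposed Phobos API. The names 
`compare` and `Level` are made up for illustration, and NFC 
normalization stands in for a real grapheme-aware comparison, which 
would need more than that.

```python
import unicodedata
from enum import Enum


class Level(Enum):
    # Hypothetical names, loosely mirroring toCodeUnit / toDchar / toGrapheme.
    CODE_UNIT = 1   # compare raw UTF-8 code units
    CODE_POINT = 2  # compare code points exactly as stored
    GRAPHEME = 3    # compare after canonical normalization (approximation)


def compare(a: str, b: str, level: Level = Level.GRAPHEME) -> bool:
    """Return True if a and b are equal at the requested level.

    The default is what the caller gets when no level is asked for. It is
    assumed here to be the normalization-aware one; an implementation could
    pick a cheaper default without changing the signature.
    """
    if level is Level.CODE_UNIT:
        return a.encode("utf-8") == b.encode("utf-8")
    if level is Level.CODE_POINT:
        return a == b
    # Assumption: NFC normalization approximates grapheme-level equality.
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


precomposed = "\u00e9"   # 'é' as a single code point
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

print(compare(precomposed, decomposed))                    # True
print(compare(precomposed, decomposed, Level.CODE_POINT))  # False
```

The caller who just says "compare them" gets the sensible answer, and 
the caller who cares about a specific layer has to say so.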


> SuperDuperText txt;
> auto c = giveMeTheFirstCharacter(txt);
> 
> What is the type of c? That is visible to the client!

That depends on how you implement the giveMeTheFirstCharacter function. :-)

More seriously, you have four choices:

1. code unit
2. code point
3. grapheme
4. require the client to state explicitly which kind of 'character' he 
wants; 'character' being an overloaded word, it's reasonable to ask for 
disambiguation.
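
As an illustration of how the first three answers differ, here is a 
small Python sketch (Python rather than D, purely for convenience). All 
three function names are hypothetical. The grapheme version is only an 
approximation: it groups a base code point with the combining marks that 
follow it, which is not the full Unicode segmentation algorithm. Each 
function assumes a non-empty string.

```python
import unicodedata


def first_code_unit(text: str) -> int:
    """Choice 1: the first UTF-8 code unit, as an integer in 0..255."""
    return text.encode("utf-8")[0]


def first_code_point(text: str) -> str:
    """Choice 2: the first code point."""
    return text[0]


def first_grapheme(text: str) -> str:
    """Choice 3 (approximation): the first code point plus any
    combining marks that immediately follow it."""
    end = 1
    while end < len(text) and unicodedata.combining(text[end]) != 0:
        end += 1
    return text[:end]


txt = "e\u0301t\u00e9"  # "été" with a decomposed first letter

print(first_code_unit(txt))         # 101, the code unit for 'e'
print(repr(first_code_point(txt)))  # 'e', without its accent
print(len(first_grapheme(txt)))     # 2 code points: 'e' + combining acute
```

Having three separately named functions instead of a single 
giveMeTheFirstCharacter is, in effect, choice 4: the client has to say 
which kind of 'character' is meant.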

You and Walter can't seem to agree between 1 and 2 when it comes to 
foreach and ranges. To stay consistent with what I said above I'd tend 
to say 4, but that's weird for something that looks like an array. My 
second choice would be 1 for consistency, 3 for correctness, or 2 for 
practicality.

Given something is going to be inconsistent either way, I'd say any of 
the above is acceptable. But please make sure you and Walter agree on 
the default element type for ranges and foreach.


-- 
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/



More information about the Digitalmars-d mailing list