VLERange: a range in between BidirectionalRange and RandomAccessRange

Steven Schveighoffer schveiguy at yahoo.com
Sat Jan 15 08:45:11 PST 2011


On Fri, 14 Jan 2011 15:54:19 -0500, Gerrit Wichert <gwichert at yahoo.com>  
wrote:

> On 14.01.2011 15:34, Steven Schveighoffer wrote:
>>
>> Is it common to have multiple modifiers on a single character?  The
>> problem I see with using decomposed canonical form for strings is that
>> we would have to return a dchar[] for each 'element', which severely
>> complicates code that, for instance, only expects to handle English.
>>
>> I was hoping to lazily transform a string into its composed canonical
>> form, allowing the (hopefully rare) exception when a composed
>> character does not exist.  My thinking was that this at least gives a
>> useful string representation for 90% of usages, leaving the remaining
>> 10% of usages to find a more complex representation (like your Text
>> type).  If we only get like 20% or 30% there by making dchar the
>> element type, then we haven't made it useful enough.
>>
> I'm afraid that this is not a proper way to handle this problem. It may
> be better for a language not to 'translate' by default.
> If the user wants to convert the codepoints this can be requested on
> demand. But premature default conversion is a subtle way to lose
> information that may be important.
> Imagine we want to write a tool for dealing with the in/output of some
> other ignorant legacy software. Even if it is only text files, that
> software may choke on some converted input. So I believe that it is very
> important that we are able to reproduce strings in exactly the form in
> which we read them.

Actually, this would only lazily *and temporarily* convert the string per  
grapheme.  Essentially, the original is left alone, so no harm there.
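
To make the idea concrete, here is a rough sketch in Python rather than D, so it can be run with nothing but a standard library. It is not a proposed implementation. The names `clusters` and `composed_view` are made up for illustration, and the clustering rule (a base code point plus any combining marks that follow it) is a simplifying assumption standing in for real grapheme segmentation. Each cluster is composed only at the moment it is yielded, and the source string is never modified.

```python
import unicodedata
from typing import Iterator


def clusters(text: str) -> Iterator[str]:
    """Yield a base code point together with the combining marks after it.

    Simplified assumption: real grapheme segmentation has more rules.
    """
    start = 0
    for i in range(1, len(text)):
        # A code point with combining class 0 starts a new cluster.
        if unicodedata.combining(text[i]) == 0:
            yield text[start:i]
            start = i
    if text:
        yield text[start:]


def composed_view(text: str) -> Iterator[str]:
    """Lazily yield each cluster in composed (NFC) form.

    The composition is temporary: `text` itself is left untouched.
    """
    for cluster in clusters(text):
        yield unicodedata.normalize("NFC", cluster)


# 'e' + combining acute has a composed form; 'q' + combining acute does not.
original = "cafe\u0301 q\u0301"
view = list(composed_view(original))
```

In this sketch the `e` plus accent comes out as a single code point, while the `q` plus accent stays as two code points, which is the (hopefully rare) exception case from my earlier message. The original string still holds the decomposed sequence, so it can be written back out exactly as it was read.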

-Steve.

