VLERange: a range in between BidirectionalRange and RandomAccessRange
Steven Schveighoffer
schveiguy at yahoo.com
Fri Jan 14 06:34:55 PST 2011
On Fri, 14 Jan 2011 08:59:35 -0500, spir <denis.spir at gmail.com> wrote:
> On 01/14/2011 02:37 PM, Steven Schveighoffer wrote:
>>
>> * I don't even know how to make a grapheme that is more than one
>> code-unit, let alone more than one code-point :) Every time I try, I
>> get 'invalid utf sequence'.
>>
>> I feel significantly ignorant on this issue, and I'm slowly getting
>> enough knowledge to join the discussion, but being a dumb American who
>> only speaks English, I have a hard time grasping how this shit all
>> works.
>
> 1. See my text at
> https://bitbucket.org/denispir/denispir-d/src/c572ccaefa33/U%20missing%20level%20of%20abstraction
I can't read that document; it has a black background with
super-dark-grey text.
> 2.
> writeln ("A\u0308\u0330");
> <A + umlaut above + tilde below>
> If it does not display properly, either set your terminal to UTF* or use
> a more unicode-aware font (eg DejaVu series).
OK, I'll have to remember this so I can use it to test my string type ;)
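To make the three levels concrete, here is a minimal sketch that counts the same literal as code units, code points, and graphemes. It assumes a Phobos release that provides std.uni.byGrapheme; the helper names are hypothetical and exist only for illustration.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;

// Hypothetical helpers, named here only for illustration.
size_t codeUnits(string s)  { return s.length; }                // UTF-8 code units
size_t codePoints(string s) { return s.walkLength; }            // decoded dchars
size_t graphemes(string s)  { return s.byGrapheme.walkLength; } // user-perceived characters

void main()
{
    string s = "A\u0308\u0330";
    writeln(codeUnits(s));  // 5: 'A' is 1 byte, each combining mark is 2 bytes
    writeln(codePoints(s)); // 3: U+0041, U+0308, U+0330
    writeln(graphemes(s));  // 1: a base letter plus two combining marks
}
```

A string type whose element is dchar would iterate this as three elements, while a grapheme-level type would iterate it as one.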
> The point is not playing like that with Unicode flexibility. Rather that
> composite characters are just normal thingies in most languages of the
> world. Actually, on this point, English is a rare exception (discarding
> letters imported from foreign languages like French 'à'); to the point
> of being, I guess, the only western language without any diacritic.
Is it common to have multiple modifiers on a single character? The
problem I see with using decomposed canonical form for strings is that we
would have to return a dchar[] for each 'element', which severely
complicates code that, for instance, only expects to handle English.
I was hoping to lazily transform a string into its composed canonical
form, allowing the (hopefully rare) exception when a composed character
does not exist. My thinking was that this at least gives a useful string
representation for 90% of usages, leaving the remaining 10% of usages to
find a more complex representation (like your Text type). If we only get
like 20% or 30% there by making dchar the element type, then we haven't
made it useful enough.
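As a rough illustration of that exception case, the sketch below composes two strings with std.uni.normalize and counts the code points that remain. It is eager rather than lazy, it assumes a Phobos release that ships std.uni.normalize, and composedLength is a hypothetical helper name.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : normalize, NFC;

// Hypothetical helper: number of code points left after canonical composition.
size_t composedLength(string s)
{
    return normalize!NFC(s).walkLength;
}

void main()
{
    // 'A' + combining diaeresis has a precomposed form, U+00C4.
    writeln(composedLength("A\u0308"));
    // No single code point exists for 'A' + diaeresis + tilde below,
    // so composition leaves U+00C4 followed by U+0330.
    writeln(composedLength("A\u0308\u0330"));
}
```

The second string is the kind of element that would still need more than one dchar even in composed form.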
Either way, we need a string type that can be compared canonically for
things like searches or opEquals.
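One way such a comparison could look, as a sketch only: normalize both sides to the same form and then compare code units. It assumes std.uni.normalize is available; canonEquals is a hypothetical name, not an existing Phobos function, and a real string type would likely avoid allocating on every comparison.

```d
import std.uni : normalize, NFC;

// Hypothetical helper: true if a and b are canonically equivalent.
bool canonEquals(const(char)[] a, const(char)[] b)
{
    return normalize!NFC(a) == normalize!NFC(b);
}

void main()
{
    // Precomposed and decomposed forms differ as raw code units...
    assert("\u00C4" != "A\u0308");
    // ...but compare equal once both are normalized.
    assert(canonEquals("\u00C4", "A\u0308"));
    // The order of the two combining marks does not matter either,
    // because normalization puts them in canonical order first.
    assert(canonEquals("A\u0308\u0330", "A\u0330\u0308"));
}
```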
-Steve