Unicode handling comparison

Wyatt wyatt.epp at gmail.com
Wed Nov 27 08:18:32 PST 2013


On Wednesday, 27 November 2013 at 15:43:11 UTC, Jakob Ovrum wrote:
>
> The author also doesn't seem to understand the Unicode 
> definitions of character and grapheme, which is a shame, 
> because the difference is more or less the whole point of the 
> post.
>
I agree that people SHOULD know how Unicode works if they want to 
work with it, but our docs as they stand are off-putting enough 
that most readers probably won't learn anything from them.  If 
they know, they know; if they don't, the wall of jargon is 
intimidating and hard to grasp.  It needs more examples up front, 
covering more of the things you'd actually use std.uni for.  Even 
though I'm decently familiar with Unicode, I had trouble following 
all of it (e.g. isn't "noe\u0308l" a grapheme cluster according to 
std.uni?).  On the flip side, std.utf has a serious dearth of 
examples, and the relationship between the two modules isn't clear.
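
To make the "noe\u0308l" question concrete: as far as I can tell, 
std.uni treats only the "e\u0308" pair as a single grapheme cluster, 
so the whole string comes out as four clusters rather than one.  A 
minimal sketch of the three different "lengths" involved, assuming 
a UTF-8 string; countGraphemes is a hypothetical helper written for 
this example, not something Phobos provides:

```d
import std.range : walkLength;
import std.uni : graphemeStride;

// Hypothetical helper (not part of Phobos): counts grapheme clusters by
// asking graphemeStride how many code units the cluster at index i spans.
size_t countGraphemes(string s)
{
    size_t count = 0;
    for (size_t i = 0; i < s.length; i += graphemeStride(s, i))
        ++count;
    return count;
}

void main()
{
    string s = "noe\u0308l";

    assert(s.length == 6);          // UTF-8 code units (U+0308 takes two)
    assert(s.walkLength == 5);      // code points (strings decode to dchar)
    assert(countGraphemes(s) == 4); // n, o, e + U+0308, l
}
```

Roughly speaking, std.utf deals with the first two numbers (code 
units and code points) and std.uni with the third.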

> On that note, I tried to use std.uni to write a simple example 
> of how to correctly handle this in D, but it became apparent 
> that std.uni should expose something like `byGrapheme` which 
> lazily transforms a range of code points to a range of 
> graphemes (probably needs a `byCodePoint` to do the converse 
> too). The two extant grapheme functions, `decodeGrapheme` and 
> `graphemeStride`, are *awful* for string manipulation (granted, 
> they are probably perfect for text rendering).

Yes, please.  While operations on single code points and 
characters seem pretty robust (i.e. you can do lots of things 
with and to them), the module feels like it falls apart as soon as 
you try to work with strings.  It honestly surprised me how many 
things in std.uni don't seem to work on ranges.
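
For what it's worth, the lazy wrapper described in the quote can be 
approximated on top of decodeGrapheme.  This is only a rough sketch 
under my own assumptions, not a proposal for the real API: 
GraphemeWalker is a made-up name, and it only handles string input.

```d
import std.uni : Grapheme, decodeGrapheme;

// Hypothetical input range (not part of Phobos) that yields one
// Grapheme at a time from a UTF-8 string.
struct GraphemeWalker
{
    private string rest;      // text not yet decoded
    private Grapheme current; // cluster returned by front
    private bool done;

    this(string s)
    {
        rest = s;
        popFront(); // prime the first cluster
    }

    @property bool empty() const { return done; }
    @property Grapheme front() { return current; }

    void popFront()
    {
        if (rest.length == 0)
            done = true;
        else
            current = decodeGrapheme(rest); // advances rest past one cluster
    }
}

void main()
{
    size_t[] sizes;
    foreach (g; GraphemeWalker("noe\u0308l"))
        sizes ~= g.length; // number of code points in this cluster

    size_t[] expected = [1, 1, 2, 1];
    assert(sizes == expected);
}
```

Decoding one cluster ahead keeps front cheap, at the cost of copying 
a Grapheme on every access.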

-Wyatt

