what's the correct way to handle unicode? - trying to print out graphemes here.

aliak something at something.com
Wed Jul 4 15:12:17 UTC 2018


On Tuesday, 3 July 2018 at 14:43:37 UTC, Steven Schveighoffer 
wrote:
> On 7/3/18 10:37 AM, ag0aep6g wrote:
>> On Tuesday, 3 July 2018 at 13:32:52 UTC, aliak wrote:
>>> foreach (c; "👩‍👩‍👦‍👦🏳️‍🌈") {
>>>   writeln(c);
>>> }
>>>
>>> So basically the above just doesn't work. Prints gibberish.
>> 
>> Because you're printing one UTF-8 code unit (`char`) per line.
>> 
>>> So I figured, std.uni.byGrapheme would help, since that's 
>>> what they are, but I can't get it to print them back out? Is 
>>> there a way?
>>>
>>> foreach (c; "👩‍👩‍👦‍👦🏳️‍🌈".byGrapheme) {
>>>   writeln(c.<????>);
>>> }
>> 
>> You're looking for `c[]`. But that won't work, because std.uni 
>> apparently doesn't recognize those as grapheme clusters. The 
>> emojis may be too new. std.uni is based on Unicode version 
>> 6.2, which is a couple years old.
>
> Oops! I didn't realize this, ignore my message about reporting 
> a bug.
>
> I still think it's very odd for printing a grapheme to print 
> the data structure.
>
> -Steve


Aha, ok I see. Many gracias!

Though, it seems that by "a couple years old" you mean 6 years! :)
Is updating the Unicode stuff to the latest version a matter of
updating some config file somewhere that lists which code point
combinations form specific graphemes? Feels kinda ... quite bad
that we're 6 years behind the current standard.

Also, is there any reason (technical or otherwise) that we have to
slice a grapheme to get it printed? Or has no one implemented
something like toString yet? It's quite non-intuitive as it is
right now IMO. I can't really imagine anyone figuring out that
they have to slice a grapheme to get it to print 🤔
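
For anyone landing here later, this is a minimal sketch of the slicing approach suggested above, as I understand it. `graphemeStrings` is a hypothetical helper of my own, not something in Phobos, and the input uses a combining accent instead of the emoji sequences, on the assumption that the older Unicode rules in std.uni already cover that case:

```d
import std.array : array;
import std.conv : to;
import std.stdio : writeln;
import std.uni : byGrapheme;

// Hypothetical helper (not part of Phobos): returns each grapheme
// cluster of `s` as its own UTF-8 string.
string[] graphemeStrings(string s)
{
    string[] result;
    foreach (g; s.byGrapheme)
    {
        // g[] is a range over the cluster's code points (dchar).
        // Copy them out right away, then re-encode as UTF-8.
        result ~= g[].array.to!string;
    }
    return result;
}

void main()
{
    // "e" followed by U+0301 COMBINING ACUTE ACCENT should come
    // back as a single cluster rather than two separate pieces.
    foreach (piece; graphemeStrings("noe\u0301l"))
        writeln(piece);
}
```

The slice is copied inside the loop on purpose: as far as I can tell, `g[]` refers back to the `Grapheme` it came from, so it seems safer not to let it outlive `g`.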

Cheers,
- Ali

