what's the correct way to handle unicode? - trying to print out graphemes here.
aliak
something at something.com
Wed Jul 4 15:12:17 UTC 2018
On Tuesday, 3 July 2018 at 14:43:37 UTC, Steven Schveighoffer wrote:
> On 7/3/18 10:37 AM, ag0aep6g wrote:
>> On Tuesday, 3 July 2018 at 13:32:52 UTC, aliak wrote:
>>> foreach (c; "👩👩👦👦🏳️🌈") {
>>>     writeln(c);
>>> }
>>>
>>> So basically the above just doesn't work. Prints gibberish.
>>
>> Because you're printing one UTF-8 code unit (`char`) per line.
>>
>>> So I figured, std.uni.byGrapheme would help, since that's
>>> what they are, but I can't get it to print them back out? Is
>>> there a way?
>>>
>>> foreach (c; "👩👩👦👦🏳️🌈".byGrapheme) {
>>>     writeln(c.<????>);
>>> }
>>
>> You're looking for `c[]`. But that won't work, because std.uni
>> apparently doesn't recognize those as grapheme clusters. The
>> emojis may be too new. std.uni is based on Unicode version
>> 6.2, which is a couple years old.
>
> Oops! I didn't realize this, ignore my message about reporting
> a bug.
>
> I still think it's very odd for printing a grapheme to print
> the data structure.
>
> -Steve
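For reference, here is a minimal D sketch of the three levels discussed in the quoted replies: UTF-8 code units (`char`), decoded code points (`dchar`), and grapheme clusters via `std.uni.byGrapheme`, printed by slicing each grapheme with `g[]`. It assumes a string with a combining mark (`"noe\u0308l"`, i.e. "noël" written as `e` followed by U+0308) instead of the emoji, because how those emoji sequences get grouped depends on the Unicode version std.uni was built with. The helpers `codeUnits`, `codePoints` and `graphemes` are hypothetical names for illustration only.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;

// Hypothetical helpers: count the same text at three levels of Unicode.
size_t codeUnits(string s) { return s.length; }               // UTF-8 bytes (char)
size_t codePoints(string s) { return s.walkLength; }          // auto-decoded dchar
size_t graphemes(string s) { return s.byGrapheme.walkLength; } // grapheme clusters

void main()
{
    // "noël" spelled with 'e' followed by U+0308 COMBINING DIAERESIS
    string s = "noe\u0308l";

    // foreach over `char` visits raw UTF-8 code units; printing them one per
    // line splits multi-byte sequences (the "gibberish"), so print byte values.
    foreach (char c; s)
        writeln(cast(ubyte) c);

    // foreach over `dchar` decodes code points, but the combining mark
    // still comes out separately from its base letter.
    foreach (dchar c; s)
        writeln(c);

    // byGrapheme groups code points into clusters; g[] slices the cluster's
    // code points so writeln prints the text rather than the struct.
    foreach (g; s.byGrapheme)
        writeln(g[]);

    writeln(codeUnits(s), " ", codePoints(s), " ", graphemes(s)); // prints 6 5 4
}
```

The counts differ because U+0308 takes two bytes in UTF-8 and, being a combining mark, joins the preceding `e` into a single grapheme.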
Aha, OK, I see. Many thanks!
Though it seems that by "a couple years old" you mean 6 years! :) Is
updating the Unicode support to the latest version just a matter of
updating some data file somewhere that defines which code point
sequences form graphemes? It feels quite bad that we're 6 years
behind the current standard.
Also, is there any reason (technical or otherwise) that we have to
slice a grapheme to print it? Or has no one gotten around to
implementing toString or something similar? It's quite unintuitive
as it is right now, IMO. I can't really imagine anyone figuring out
that they have to slice a grapheme to get it to print 🤔
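Until something like a `toString` exists, one possible workaround is a small helper that copies each grapheme's code points into an ordinary UTF-8 `string`, which `writeln` and the rest of Phobos handle as usual. This is only a sketch: `graphemeToString` and `graphemeStrings` are hypothetical helper names, not part of std.uni, and the code relies only on slicing with `g[]`, `std.array.array` and `std.conv.to`.

```d
import std.algorithm.iteration : map;
import std.array : array;
import std.conv : to;
import std.stdio : writeln;
import std.uni : byGrapheme, Grapheme;

// Hypothetical helper: copy a Grapheme's code points into a UTF-8 string.
// g[] yields a range of dchar; .array makes a dchar[] and to!string re-encodes it.
string graphemeToString(Grapheme g)
{
    return g[].array.to!string;
}

// Hypothetical helper: split a string into one UTF-8 string per grapheme cluster.
string[] graphemeStrings(string s)
{
    return s.byGrapheme.map!graphemeToString.array;
}

void main()
{
    // "noël" with a combining diaeresis; prints one grapheme per line
    foreach (piece; graphemeStrings("noe\u0308l"))
        writeln(piece);
}
```

Returning copied `string`s, rather than the `g[]` slice itself, avoids handing out a slice that points into a temporary `Grapheme`, at the cost of one allocation per grapheme.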
Cheers,
- Ali
More information about the Digitalmars-d-learn mailing list