Converting Unicode Escape Sequences to UTF-8

Nordlöw via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Thu Oct 22 12:13:18 PDT 2015


On Thursday, 22 October 2015 at 18:40:06 UTC, anonymous wrote:
> On Thursday, October 22, 2015 08:10 PM, Nordlöw wrote:
>
>> How do I convert a `string` containing Unicode escape 
>> sequences such as "\uXXXX" into UTF-8?
>
> Ali explained that "\uXXXX" is already UTF-8.
>
> But if you actually want to interpret such escape sequences 
> from user input or some such, then find all occurrences, and 
> for each of them do:

Yep, that's exactly what I want to do.

I want to use this to correctly decode DBpedia downloads, since they
encode their Unicode characters with these sequences.
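The distinction Ali drew can be shown directly. An escape written inside a D string literal is decoded by the compiler, so the resulting `string` already holds UTF-8. Data read from a file, such as a DBpedia dump, keeps the backslash sequence as ordinary characters. The following is a minimal sketch written as a `unittest` block, runnable with e.g. `dmd -unittest -main -run file.d`. The variable names are illustrative only.

```d
unittest
{
    // In a D string literal the compiler resolves \u00E9 (é) and stores the
    // code point UTF-8 encoded: two code units, 0xC3 0xA9.
    string fromLiteral = "\u00E9";
    assert(fromLiteral == "é");
    assert(fromLiteral.length == 2);
    assert(fromLiteral[0] == 0xC3 && fromLiteral[1] == 0xA9);

    // Text read at runtime still contains the escape as six ASCII
    // characters; a WYSIWYG (backquoted) literal stands in for that here.
    string fromDump = `\u00E9`;
    assert(fromDump.length == 6);
    assert(fromDump != fromLiteral);
}
```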

> * Drop the backslash and the 'u'.
> * Parse XXXX as a hexadecimal integer, and cast to dchar.
> * Use std.utf.encode to convert to UTF-8. std.conv.to can
> probably do it too, and possibly simpler, but would allocate.
>
> Also be aware of the longer variant with a capital U: 
> \UXXXXXXXX (8 Xs)
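The steps above, including the capital-U variant, can be sketched in D as follows. This is a minimal sketch, not a Phobos API. `unescapeUnicode` is a hypothetical name, and malformed escapes (too few or non-hex digits) are copied through unchanged. UTF-16 surrogate pairs such as `\uD83D\uDE00` (the form JSON uses) are not combined, so a lone surrogate makes `std.utf.encode` throw a `UTFException`.

```d
import std.algorithm.searching : all;
import std.array : appender;
import std.ascii : isHexDigit;
import std.conv : to;
import std.utf : encode;

/// Hypothetical helper (not in Phobos): replaces every \uXXXX and
/// \UXXXXXXXX escape in `input` with the UTF-8 encoding of that code point.
/// Throws std.utf.UTFException for invalid code points (e.g. lone
/// surrogates or values above U+10FFFF).
string unescapeUnicode(string input)
{
    auto result = appender!string();
    size_t i = 0;
    while (i < input.length)
    {
        if (input[i] == '\\' && i + 1 < input.length
            && (input[i + 1] == 'u' || input[i + 1] == 'U'))
        {
            // 'u' takes exactly 4 hex digits, 'U' exactly 8.
            immutable size_t width = input[i + 1] == 'u' ? 4 : 8;
            immutable size_t start = i + 2;
            if (start + width <= input.length
                && input[start .. start + width].all!isHexDigit)
            {
                // Drop "\u"/"\U", parse the digits as hex, cast to dchar.
                immutable code = input[start .. start + width].to!uint(16);
                // Encode into a stack buffer: no allocation per escape.
                char[4] buf;
                immutable len = encode(buf, cast(dchar) code);
                result.put(buf[0 .. len]);
                i = start + width;
                continue;
            }
        }
        // Not a well-formed escape: copy the code unit unchanged.
        result.put(input[i]);
        ++i;
    }
    return result.data;
}

unittest
{
    assert(unescapeUnicode(`caf\u00E9`) == "café");
    assert(unescapeUnicode(`\U0001F600!`) == "\U0001F600!");
    assert(unescapeUnicode(`no escapes`) == "no escapes");
    // Malformed escapes are left as-is.
    assert(unescapeUnicode(`\u12`) == `\u12`);
}
```

Encoding into a `char[4]` stack buffer follows the suggestion to use `std.utf.encode` rather than `std.conv.to`, which would allocate a new string for every escape.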

Hmm, why isn't this already in Phobos?

