Converting Unicode Escape Sequences to UTF-8
Nordlöw via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Thu Oct 22 12:13:18 PDT 2015
On Thursday, 22 October 2015 at 18:40:06 UTC, anonymous wrote:
> On Thursday, October 22, 2015 08:10 PM, Nordlöw wrote:
>
>> How do I convert a `string` containing Unicode escape
>> sequences such as "\uXXXX" into UTF-8?
>
> Ali explained that "\uXXXX" in a D string literal is already UTF-8.
>
> But if you actually want to interpret such escape sequences
> from user input or some such, then find all occurrences, and
> for each of them do:
Yep, that's exactly what I want to do.
I want to use this to correctly decode DBpedia downloads, since it
encodes its Unicode characters with these sequences.
> * Drop the backslash and the 'u'.
> * Parse XXXX as a hexadecimal integer, and cast to dchar.
> * Use std.utf.encode to convert to UTF-8. std.conv.to can
>   probably do it too, and possibly simpler, but would allocate.
>
> Also be aware of the longer variant with a capital U:
> \UXXXXXXXX (8 Xs)
Hmm, why isn't this already in Phobos?
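
For reference, the steps quoted above could be sketched roughly as
follows. This is only a minimal sketch, not anything from Phobos: the
names decodeUnicodeEscapes and isAllHex are made up for illustration,
and it assumes that malformed or truncated escapes should be copied
through unchanged. It also assumes std.utf.encode rejects invalid code
points (e.g. lone surrogates written as \uD800) by throwing a
UTFException, which this sketch does not catch.

```d
import std.array : appender;
import std.ascii : isHexDigit;
import std.conv : to;
import std.utf : encode;

/// Hypothetical helper: true if every code unit in `s` is an ASCII hex digit.
bool isAllHex(const(char)[] s)
{
    foreach (char c; s) // iterate code units, no UTF decoding
        if (!isHexDigit(c))
            return false;
    return true;
}

/// Hypothetical helper (not part of Phobos): replaces \uXXXX and
/// \UXXXXXXXX escape sequences in `input` with the UTF-8 encoding of
/// the code point they name. Anything else is copied unchanged.
string decodeUnicodeEscapes(string input)
{
    auto result = appender!string();
    size_t i = 0;
    while (i < input.length)
    {
        // An escape is a backslash followed by 'u' (4 digits) or 'U' (8 digits).
        if (input[i] == '\\' && i + 1 < input.length
            && (input[i + 1] == 'u' || input[i + 1] == 'U'))
        {
            immutable size_t digits = (input[i + 1] == 'u') ? 4 : 8;
            immutable size_t start = i + 2; // drop the backslash and the 'u'/'U'
            if (start + digits <= input.length
                && isAllHex(input[start .. start + digits]))
            {
                // Parse the digits as a hexadecimal integer and cast to dchar.
                immutable code = input[start .. start + digits].to!uint(16);
                char[4] buf;
                // Encode into a stack buffer; throws on invalid code points.
                immutable len = encode(buf, cast(dchar) code);
                result.put(buf[0 .. len]);
                i = start + digits;
                continue;
            }
        }
        result.put(input[i]); // ordinary code unit (or malformed escape)
        ++i;
    }
    return result.data;
}

unittest
{
    // Backtick strings are raw, so the backslashes reach the function as-is.
    assert(decodeUnicodeEscapes(`caf\u00E9`) == "caf\u00E9");
    assert(decodeUnicodeEscapes(`\U0001F600!`) == "\U0001F600!");
    assert(decodeUnicodeEscapes(`\u12`) == `\u12`); // too short, left alone
}
```

The result still goes through an Appender, so this allocates once for
the output; only the per-character encoding step uses the stack buffer
that std.utf.encode fills.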