Converting Unicode Escape Sequences to UTF-8

anonymous via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Thu Oct 22 11:40:05 PDT 2015


On Thursday, October 22, 2015 08:10 PM, Nordlöw wrote:

> How do I convert a `string` containing Unicode escape sequences
> such as "\uXXXX" into UTF-8?

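Ali explained that a "\uXXXX" escape inside a string literal is already UTF-8: the compiler decodes the escape and stores the code point UTF-8 encoded.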
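A tiny illustration of that point (the name `escaped` is just for this example): the literal below contains the escape `\u00E9` (é), and the resulting string already holds the two UTF-8 code units 0xC3 0xA9.

```d
import std.stdio : writeln;

// The compiler decodes \u00E9 at compile time and stores it UTF-8 encoded.
immutable escaped = "caf\u00E9";

void main()
{
    // Print the raw code units: 'c', 'a', 'f', then 0xC3 0xA9 for 'é'.
    writeln(cast(immutable(ubyte)[]) escaped); // prints [99, 97, 102, 195, 169]
}
```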

But if you actually need to interpret such escape sequences at run time, e.g. in user input, then find all occurrences, and for each of them:

* Drop the backslash and the 'u'.
* Parse XXXX as a hexadecimal integer, and cast to dchar.
* Use std.utf.encode to convert the dchar to UTF-8. std.conv.to can probably
do it too, possibly more simply, but it would allocate.
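The steps above could look roughly like this. It is a minimal sketch, not a library function: `unescapeU4` is a hypothetical name, it assumes every `\u` is followed by exactly four hex digits (std.conv.to throws a ConvException otherwise), it ignores `\U`, and it does not treat `\\u` as an escaped backslash. std.utf.encode writes into a fixed `char[4]` buffer, so the per-escape encoding itself does not allocate.

```d
import std.conv : to;
import std.stdio : writeln;
import std.utf : encode;

/// Hypothetical helper: replaces each `\uXXXX` in `input` with the
/// UTF-8 encoding of that code point. All other bytes are copied as-is.
string unescapeU4(string input)
{
    char[] result;
    size_t i = 0;
    while (i < input.length)
    {
        if (input[i] == '\\' && i + 6 <= input.length && input[i + 1] == 'u')
        {
            // Drop the backslash and the 'u', parse XXXX as hex, cast to dchar.
            dchar c = cast(dchar) to!uint(input[i + 2 .. i + 6], 16);
            // Encode into a stack buffer; throws UTFException for surrogates.
            char[4] buf;
            size_t n = encode(buf, c);
            result ~= buf[0 .. n];
            i += 6;
        }
        else
        {
            result ~= input[i];
            ++i;
        }
    }
    return result.idup;
}

void main()
{
    // Backtick strings are WYSIWYG, so the backslashes stay literal here.
    writeln(unescapeU4(`caf\u00E9 \u20AC`)); // prints café €
}
```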

Also be aware of the longer variant with a capital U: \UXXXXXXXX (eight hex digits).
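One way to cover both forms is to pick the digit count from the letter after the backslash. Again only a sketch under stated assumptions: `unescapeUnicode` is a hypothetical name, malformed hex makes std.conv.to throw, code points that are surrogates or above 0x10FFFF make std.utf.encode throw a UTFException, and `\\u` is not treated as an escaped backslash.

```d
import std.conv : to;
import std.stdio : writeln;
import std.utf : encode;

/// Hypothetical helper: decodes `\uXXXX` (4 hex digits) and
/// `\UXXXXXXXX` (8 hex digits) escapes into UTF-8.
string unescapeUnicode(string input)
{
    char[] result;
    size_t i = 0;
    while (i < input.length)
    {
        // Decide how many hex digits follow, if this is an escape at all.
        size_t digits = 0;
        if (input[i] == '\\' && i + 1 < input.length)
        {
            if (input[i + 1] == 'u')
                digits = 4;
            else if (input[i + 1] == 'U')
                digits = 8;
        }

        if (digits != 0 && i + 2 + digits <= input.length)
        {
            // Parse the hex digits after "\u" or "\U" as a code point.
            dchar c = cast(dchar) to!uint(input[i + 2 .. i + 2 + digits], 16);
            char[4] buf;
            size_t n = encode(buf, c);
            result ~= buf[0 .. n];
            i += 2 + digits;
        }
        else
        {
            result ~= input[i];
            ++i;
        }
    }
    return result.idup;
}

void main()
{
    writeln(unescapeUnicode(`\U0001F600 and \u00E9`)); // prints 😀 and é
}
```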

