string to character code hex string
ag0aep6g via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Sun Sep 3 03:03:26 PDT 2017
On 09/03/2017 01:39 AM, Ali Çehreli wrote:
> Ok, I see that I made a mistake but I still don't think the conversion
> is one way. If we can convert byte-by-byte, we should be able to convert
> back byte-by-byte, right?
You weren't converting byte-by-byte. You were only converting the
significant bytes of the code points, throwing away leading zeroes.
> What I failed to ensure was to iterate by code
> units.
A UTF-8 code unit is a byte, so "%02x" is enough, yes. But for UTF-16
and UTF-32 code units, it's not. You need to match the format width to
the size of the code unit.
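A rough sketch of what matching the width could look like. `toHexPadded` is a made-up name, not something from Ali's code, and I'm assuming `ElementEncodingType` from std.range plus the `*` (width taken from an argument) form of std.format:

```d
import std.algorithm : equal, joiner, map;
import std.format : format;
import std.range : ElementEncodingType;
import std.utf : byCodeUnit;

// Hypothetical variant of toHex: two hex digits per byte of the code
// unit, i.e. 2 for char, 4 for wchar and 8 for dchar.
auto toHexPadded(R)(R input)
{
    enum int width = 2 * ElementEncodingType!R.sizeof;
    return input.byCodeUnit
        .map!(c => format("%0*x", width, c))
        .joiner;
}

void main()
{
    assert("…".toHexPadded.equal("e280a6"));    // UTF-8: three 1-byte units
    assert("…"w.toHexPadded.equal("2026"));     // UTF-16: one 2-byte unit
    assert("…"d.toHexPadded.equal("00002026")); // UTF-32: one 4-byte unit
}
```

The decoding side would then have to read the same number of digits per code unit, and the output still depends on which encoding the input happened to use.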
Or maybe just convert everything to UTF-8 first. That also sidesteps any
endianness issues.
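For the UTF-8 route, something along these lines might do. `toHexUtf8` is a hypothetical name; the sketch assumes `std.utf.byChar` to transcode whatever comes in to UTF-8 code units before formatting:

```d
import std.algorithm : equal, joiner, map;
import std.format : format;
import std.utf : byChar;

// Hypothetical helper: transcode any string type to UTF-8 code units
// first, then emit two hex digits per byte.
auto toHexUtf8(R)(R input)
{
    return input.byChar.map!(c => format("%02x", c)).joiner;
}

void main()
{
    // All three encodings produce the same hex string.
    assert("…".toHexUtf8.equal("e280a6"));
    assert("…"w.toHexUtf8.equal("e280a6"));
    assert("…"d.toHexUtf8.equal("e280a6"));
}
```

With this, a byte-wise decoder that reads two digits at a time always gets UTF-8 back, regardless of the original string type.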
> The following is able to get the same string back:
>
> import std.stdio;
> import std.string;
> import std.algorithm;
> import std.range;
> import std.utf;
> import std.conv;
>
> auto toHex(R)(R input) {
>     // As Moritz Maxeiner says, this format is expensive
>     return input.byCodeUnit.map!(c => format!"%02x"(c)).joiner;
> }
>
> int hexValue(C)(C c) {
>     switch (c) {
>     case '0': .. case '9':
>         return c - '0';
>     case 'a': .. case 'f':
>         return c - 'a' + 10;
>     default:
>         assert(false);
>     }
> }
>
> auto fromHex(R, Dst = char)(R input) {
>     return input.chunks(2).map!((ch) {
>             auto high = ch.front.hexValue * 16;
>             ch.popFront();
>             return high + ch.front.hexValue;
>         }).map!(value => cast(Dst)value);
> }
>
> void main() {
>     assert("AAA".toHex.fromHex.equal("AAA"));
>
>     assert("ö…".toHex.fromHex.equal("ö…".byCodeUnit));
>     // Alternative check:
>     assert("ö…".toHex.fromHex.text.equal("ö…"));
> }
Still fails with UTF-16 and UTF-32 strings:
----
writeln("…"w.toHex.fromHex.text); /* prints " &" */
writeln("…"d.toHex.fromHex.text); /* prints " &" */
----
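To spell out where the " &" comes from, here is a small trace of a single code unit. It only uses `format` and `to` with a radix argument; the variable names are mine:

```d
import std.conv : to;
import std.format : format;

void main()
{
    wchar unit16 = '\u2026'; // "…" as one UTF-16 code unit
    dchar unit32 = '\u2026'; // "…" as one UTF-32 code unit

    // "%02x" only sets a minimum width, so both come out as four
    // digits; the leading zeroes of the UTF-32 unit are dropped.
    string hex16 = format("%02x", unit16);
    string hex32 = format("%02x", unit32);
    assert(hex16 == "2026");
    assert(hex32 == "2026");

    // Reading that back two digits at a time gives 0x20 and 0x26,
    // which are a space and an ampersand.
    assert(cast(char) hex16[0 .. 2].to!ubyte(16) == ' ');
    assert(cast(char) hex16[2 .. 4].to!ubyte(16) == '&');
}
```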