Using decodeFront with a generalised input range
Dennis
dkorpel at gmail.com
Fri Nov 9 11:11:40 UTC 2018
On Friday, 9 November 2018 at 10:45:49 UTC, Vinay Sajip wrote:
> As I see it, a ubyte 0x20 could be decoded to an ASCII char '
> ', and likewise to wchar or dchar. It doesn't (to me) make
> sense to decode a char to a wchar or dchar. Anyway, you've
> shown me how decodeFront can be used, so great!
The character ' ' is simply the number 0x20 in char, wchar and
dchar. The difficulty arises when you use non-ASCII characters:
if ("€"[0] == '€')
The code point of € is U+20AC, but a char only holds values up to
0xFF. To work around that, UTF-8 encodes higher code points as
multiple bytes (code units). The € sign is represented as [0xE2,
0x82, 0xAC]. So the code above actually checks 0xE2 == 0x20AC,
which is false. If you call decodeFront on [0xE2, 0x82, 0xAC], it
returns 0x20AC and modifies the range to be [], since it consumed
all three code units. That way you can handle code points
properly.
See: https://en.wikipedia.org/wiki/UTF-8#Examples
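To make the point about code units versus code points concrete, here is a minimal sketch. The helper name decodeFirst is hypothetical (it is not from Phobos or from this thread), and the sketch assumes that mapping each ubyte to char is an acceptable way to present raw bytes to decodeFront.

```d
import std.algorithm : map;
import std.utf : decodeFront;

// Hypothetical helper (not from the thread): decodes the first code point
// of a UTF-8 byte sequence and reports how many code units are left over.
dchar decodeFirst(const(ubyte)[] bytes, out size_t remaining)
{
    // decodeFront only accepts ranges of char, wchar or dchar,
    // so present each ubyte as a char code unit first.
    auto r = bytes.map!(x => cast(char) x);
    dchar c = r.decodeFront; // pops the consumed code units off r
    remaining = r.length;
    return c;
}

void main()
{
    // A lone code unit is not the code point: 0xE2 != 0x20AC.
    assert("€"[0] == 0xE2);
    assert("€"[0] != '€');

    size_t rest;
    assert(decodeFirst([0xE2, 0x82, 0xAC], rest) == 0x20AC);
    assert(rest == 0); // all three code units were consumed

    // ASCII: one code unit decodes to one code point.
    assert(decodeFirst([0x20, 0x41], rest) == ' ');
    assert(rest == 1);
}
```

Because decodeFront takes its range by ref, the range has to be stored in a variable (r above) before the call.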
On Friday, 9 November 2018 at 10:45:49 UTC, Vinay Sajip wrote:
> Supplementary question: is an operation like r.map!(x =>
> cast(char) x) effectively a run-time no-op and just to keep the
> compiler happy, or does it actually result in code being
> executed? I came across a similar issue with ranges recently
> where the answer was to map immutable(byte) to byte in the same
> way.
On dmd without optimization, the lambda passed to map will compile to:
push RBP //
mov RBP,RSP //
sub RSP,010h // build stack frame
mov -8[RBP],EDI // put argument0 on the stack
mov AL,-8[RBP] // put the stack value in the lower 8 bits of the return register
leave // delete stack frame
ret // return
So some code does get executed, but it is essentially a run-time
no-op. If you pass -O -inline to dmd, I'm pretty sure it will be
optimized away entirely.
GDC and LDC with -O1 or higher will certainly eliminate all
run-time cost.
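As a small illustration of what the cast in map does at the type level, here is a sketch with made-up sample bytes. It only shows that the element type changes while the values stay the same; it makes no claim about the code a particular compiler generates.

```d
import std.algorithm : equal, map;
import std.range : ElementType;

void main()
{
    immutable(ubyte)[] data = [0x48, 0x69]; // the bytes of "Hi"

    // map is lazy: nothing is converted here, the cast is applied
    // to each element only when it is accessed.
    auto chars = data.map!(x => cast(char) x);

    // Only the static element type changes; the values are untouched.
    static assert(is(ElementType!(typeof(chars)) == char));
    assert(chars.length == data.length);
    assert(chars.equal("Hi"));
}
```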