Using decodeFront with a generalised input range

Fri Nov 9 11:11:40 UTC 2018

On Friday, 9 November 2018 at 10:45:49 UTC, Vinay Sajip wrote:
> As I see it, a ubyte 0x20 could be decoded to an ASCII char ' 
> ', and likewise to wchar or dchar. It doesn't (to me) make 
> sense to decode a char to a wchar or dchar. Anyway, you've 
> shown me how decodeFront can be used, so great!

The character ' ' simply is the number 0x20 in char, wchar and 
dchar. The difficulty arises when you use non-ASCII characters:

if ("€"[0] == '€')

The code point of € is U+20AC, but a char only goes up to 0xFF. 
To work around that, UTF-8 encodes higher code points as multiple 
bytes (code units). The € sign is represented as [0xE2, 0x82, 
0xAC]. So the code above actually checks 0xE2 == 0x20AC, which is 
false. If you call decodeFront on [0xE2, 0x82, 0xAC], it returns 
0x20AC and leaves the range empty ([]), since it consumed all 
three code units. That way you can handle code points properly.
See: https://en.wikipedia.org/wiki/UTF-8#Examples
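
To make that concrete, here is a minimal sketch of both cases: a 
plain string, and a ubyte array presented as a range of char. The 
variable names (s, bytes, r) are just for illustration, and I'm 
assuming the std.utf.decodeFront overload that takes the range by 
ref, so it needs an lvalue.

```d
import std.algorithm.iteration : map;
import std.range.primitives : empty;
import std.utf : decodeFront;

void main()
{
    // A string literal is UTF-8, so "€" holds three code units.
    string s = "€";
    assert(s.length == 3);
    assert(s[0] == 0xE2 && s[1] == 0x82 && s[2] == 0xAC);

    // Comparing the first code unit with the code point fails:
    // 0xE2 != 0x20AC.
    assert(s[0] != '€');

    // decodeFront returns the whole code point and consumes all
    // three code units.
    dchar c = s.decodeFront;
    assert(c == 0x20AC);
    assert(s.empty);

    // Same idea for ubyte input, once its elements are presented
    // as char. decodeFront takes its argument by ref, so the
    // mapped range is stored in a variable first.
    ubyte[] bytes = [0xE2, 0x82, 0xAC];
    auto r = bytes.map!(x => cast(char) x);
    assert(r.decodeFront == '€');
    assert(r.empty);
}
```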

On Friday, 9 November 2018 at 10:45:49 UTC, Vinay Sajip wrote:
> Supplementary question: is an operation like r.map!(x => 
> cast(char) x) effectively a run-time no-op and just to keep the 
> compiler happy, or does it actually result in code being 
> executed? I came across a similar issue with ranges recently 
> where the answer was to map immutable(byte) to byte in the same 
> way.

On dmd without optimization, the lambda passed to map compiles to:
	push	RBP          //
	mov	RBP,RSP      //
	sub	RSP,010h     // build stack frame
	mov	-8[RBP],EDI  // put argument0 on the stack
	mov	AL,-8[RBP]   // put the stack value in the lower 8 bits
	                     // of the return register
	leave                // delete stack frame
	ret                  // return

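So the function body is essentially a run-time no-op: it only 
moves the argument into the return register. Without optimization 
the call itself is still executed, but if you pass -O -inline to 
dmd I'm pretty sure it will be optimized away. GDC and LDC with 
-O1 or higher will certainly eliminate all run-time cost.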
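
If you want to convince yourself that the cast only changes the 
element type and not the values, a small sketch like this should 
do (data and chars are made-up names, not from your code). Note 
that map is lazy, so the lambda only runs for elements you 
actually access.

```d
import std.algorithm.iteration : map;
import std.range.primitives : ElementType;

void main()
{
    immutable(ubyte)[] data = [0x48, 0x69, 0x20];

    // Lazily present each byte as a char; nothing is copied here.
    auto chars = data.map!(x => cast(char) x);

    // Only the static element type changes ...
    static assert(is(ElementType!(typeof(chars)) == char));

    // ... the numeric value of every element stays the same.
    foreach (i, b; data)
        assert(chars[i] == b);
}
```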