Using decodeFront with a generalised input range

Vinay Sajip vinay_sajip at yahoo.co.uk
Fri Nov 9 12:22:27 UTC 2018


On Friday, 9 November 2018 at 11:24:42 UTC, Jonathan M Davis 
wrote:
> decode and decodeFront are for converting a UTF code unit to a 
> Unicode code point. So, you're taking a UTF-8 code unit (char), 
> a UTF-16 code unit (wchar), or a UTF-32 code unit (dchar) and 
> decoding it. In the case of UTF-32, that's a no-op, since 
> UTF-32 code units are already code points, but for UTF-8 and 
> UTF-16, they're not the same at all.

> I would advise against doing much with decode or decodeFront 
> without having a decent understanding of the basics of Unicode.
>

I think I understand enough of the basics of Unicode, at least 
for my application; my unfamiliarity is with the D language and 
standard library, to which I am very new.

There are applications where one needs to decode a stream of 
bytes into Unicode text. Perhaps distinguishing between "a ubyte" 
and "a UTF-8 code unit" is just a semantic quibble, since the two 
are identical at the level of bits and bytes (as I understand 
it - please tell me if you think otherwise). If I open a file 
using mode "rb", I get a sequence of bytes, which may contain 
structured binary data, parts of which are to be interpreted as 
text encoded in UTF-8. Is there something in the D standard 
library which enables incremental decoding of such (parts of) a 
byte stream? Or does one have to resort to the `map!(x => 
cast(char) x)` method for this?
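
For concreteness, here is a minimal sketch of what that map-based approach might look like. It assumes the bytes are already in memory as an array; `decodePrefix` is a hypothetical helper name, not something in Phobos. It decodes a fixed number of code points from the front of the bytes and reports how many bytes they occupied, so that the remaining (possibly binary) data can be handled separately.

```d
import std.algorithm.iteration : map;
import std.utf : decodeFront;

// Hypothetical helper (not part of Phobos): decodes `count` code points
// from the start of `raw` and reports how many bytes they occupied.
dstring decodePrefix(const(ubyte)[] raw, size_t count, out size_t bytesUsed)
{
    // Lazily reinterpret each ubyte as a UTF-8 code unit; no copy is made.
    auto units = raw.map!(b => cast(char) b);

    dstring text;
    foreach (_; 0 .. count)
    {
        size_t n;
        // decodeFront pops the decoded code units off the range and
        // throws a UTFException if the input is malformed.
        text ~= units.decodeFront(n);
        bytesUsed += n;
    }
    return text;
}

void main()
{
    // 'H', 'i', then U+00E9 encoded as 0xC3 0xA9, then a non-text byte.
    ubyte[] raw = [0x48, 0x69, 0xC3, 0xA9, 0xFF];
    size_t used;
    auto text = decodePrefix(raw, 3, used);
    assert(text == "Hi\u00E9"d);
    assert(used == 4);
}
```

Since decodeFront only asks for an input range of char, the same `map` wrapper should in principle also work over a lazily read byte range rather than an array, though I have only sketched the in-memory case here.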

