Using decodeFront with a generalised input range
Vinay Sajip
vinay_sajip at yahoo.co.uk
Fri Nov 9 12:22:27 UTC 2018
On Friday, 9 November 2018 at 11:24:42 UTC, Jonathan M Davis wrote:
> decode and decodeFront are for converting a UTF code unit to a
> Unicode code point. So, you're taking a UTF-8 code unit (char),
> UTF-16 code unit (wchar), or a UTF-32 code unit (dchar) and
> decoding it. In the case of UTF-32, that's a no-op, since
> UTF-32 code units are already code points, but for UTF-8 and
> UTF-16, they're not the same at all.
> I would advise against doing much with decode or decodeFront
> without having a decent understanding of the basics of Unicode.
>
I think I understand enough of the basics of Unicode, at least
for my application; my unfamiliarity is with the D language and
standard library, to which I am very new.
There are applications where one needs to decode a stream of
bytes into Unicode text. Perhaps distinguishing between "a ubyte"
and "a UTF-8 code unit" is just semantic quibbling, since they're
the same at the level of bits and bytes (as I understand it;
please tell me if you think otherwise). If I open a file using
mode "rb", I get a sequence of bytes, which may contain
structured binary data, parts of which are to be interpreted as
text encoded in UTF-8. Is there something in the D standard
library that enables incremental decoding of such (parts of a)
byte stream? Or does one have to resort to the
`map!(x => cast(char) x)` method for this?
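A minimal sketch of the `map!(x => cast(char) x)` approach, assuming the UTF-8 portion has already been isolated as a `ubyte[]` slice. `std.utf.decodeFront` accepts any input range whose element type is a character type, so lazily reinterpreting each byte as a `char` lets it decode one code point per call. The helper name `decodeUtf8Bytes` is hypothetical, not part of Phobos.

```d
import std.algorithm.iteration : map;
import std.stdio : writeln;
import std.utf : decodeFront;

// Hypothetical helper: decode a slice of raw bytes, assumed to hold
// UTF-8 text, into code points one at a time.
dchar[] decodeUtf8Bytes(const(ubyte)[] bytes)
{
    // decodeFront needs a range of char, so reinterpret each ubyte
    // as a UTF-8 code unit; no bytes are copied.
    auto units = bytes.map!(b => cast(char) b);
    dchar[] result;
    while (!units.empty)
    {
        // Consumes the one to four code units of the next code point
        // and returns it; throws a UTFException on malformed input.
        result ~= units.decodeFront;
    }
    return result;
}

void main()
{
    // 'A' is 0x41, 'é' is 0xC3 0xA9 and '€' is 0xE2 0x82 0xAC in UTF-8.
    immutable ubyte[] raw = [0x41, 0xC3, 0xA9, 0xE2, 0x82, 0xAC];
    writeln(decodeUtf8Bytes(raw)); // prints Aé€
}
```

Because `decodeFront` consumes only as many code units as the next code point needs, the same loop should also work on a non-array input range of bytes (for example, buffers from `File.byChunk` joined together), though that variant is not shown here.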