Using lazy code to process large files
kdevel via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Wed Aug 2 11:28:52 PDT 2017
On Wednesday, 2 August 2017 at 17:37:09 UTC, Steven Schveighoffer
wrote:
> What is expected? What I see on the screen when I run my code
> is:
>
> [Ü]
Upper case?
> What I see when I run your "working" code is:
>
> [?]
Your terminal cannot render Latin-1 encoded output. The program
prints a single byte with the value 0xfc. You can verify this by
piping the output into hexdump -C:
00000000  5b fc 5d 0a                                       |[ü].|
00000004
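To make the byte-level difference concrete, here is a minimal sketch. The helper name isValidUtf8 is made up for this illustration and is not from the earlier posts. It assumes std.utf.validate throws a UTFException on malformed input, and it contrasts the Latin-1 byte 0xfc with the two-byte UTF-8 encoding of the same character:

```d
import std.stdio : writeln;
import std.utf : validate, UTFException;

/// Hypothetical helper: true if `s` holds well-formed UTF-8.
bool isValidUtf8(string s)
{
    try
    {
        validate(s);
        return true;
    }
    catch (UTFException e)
    {
        return false;
    }
}

void main()
{
    // Latin-1 "[ü]": ü is the single byte 0xfc, which is not valid UTF-8.
    immutable(ubyte)[] raw = [0x5b, 0xfc, 0x5d];
    string latin1 = cast(string) raw;

    // UTF-8 "[ü]": ü is encoded as the two bytes 0xc3 0xbc.
    string utf8 = "[\u00fc]";

    assert(!isValidUtf8(latin1));
    assert(isValidUtf8(utf8));
    assert(utf8.length == 4); // '[' + two bytes for ü + ']'

    // writeln copies the bytes of a string to stdout as they are.
    writeln(latin1);
}
```

Piping the output of this sketch into hexdump -C should show the same four bytes as above.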
> You are missing the point that your input string is invalid.
It's perfectly okay to put any value an octet can take into an
octet. I did not claim that the data in the string memory is
syntactically valid UTF-8. Read the comment in line 9 of my post
of 15:02:22.
> std.algorithm is not validating the entire string,
True and it should not. So this is what I want.
> and so it doesn't throw an error like string.stripLeft does.
That is the point. You wrote
| I wouldn't expect good performance from this, as there is
| auto-decoding all over the place.
I erroneously thought that using byCodeUnit disables all UTF-8
processing and enforces operation on (u)bytes. But this is not
the case, at least not for stripLeft, and probably not for other
string functions either.
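One way to really operate on (u)bytes is to drop the char type altogether. The following is only a sketch under assumptions, not something taken from the earlier posts: the helper name stripLeftBytes is made up, and the code relies on std.string.representation yielding an immutable(ubyte)[] view of the string and on the predicate overload of stripLeft in std.algorithm.mutation. Since the elements are ubytes, there is nothing left for a string function to decode:

```d
import std.algorithm.mutation : stripLeft;
import std.string : representation;

/// Hypothetical helper: strips leading ASCII spaces and tabs
/// from the raw bytes of `s` without any UTF-8 decoding.
immutable(ubyte)[] stripLeftBytes(string s)
{
    return s.representation.stripLeft!(b => b == ' ' || b == '\t');
}

void main()
{
    // Two spaces followed by the Latin-1 byte 0xfc and ']'.
    immutable(ubyte)[] raw = [0x20, 0x20, 0xfc, 0x5d];
    string input = cast(string) raw;

    immutable(ubyte)[] expected = [0xfc, 0x5d];
    assert(stripLeftBytes(input) == expected);
}
```

Note that this only recognizes the two ASCII characters listed in the predicate, not Unicode white space in general.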
> writeln doesn't do any decoding of individual strings. It
> avoids the problem and just copies your bad data directly.
That is what I expected.