Using lazy code to process large files
Steven Schveighoffer via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Wed Aug 2 08:52:13 PDT 2017
On 8/2/17 11:02 AM, kdevel wrote:
> On Wednesday, 2 August 2017 at 13:45:01 UTC, Steven Schveighoffer wrote:
>> As Daniel said, using byCodeUnit will help.
>
> stripLeft seems to autodecode even when fed with CodeUnits. How do I
> prevent this?
>
>   1 void main ()
>   2 {
>   3    import std.stdio;
>   4    import std.string;
>   5    import std.conv;
>   6    import std.utf;
>   7    import std.algorithm;
>   8
>   9    string [] src = [ " \xfc" ]; // blank + latin-1 encoded u umlaut
>  10    auto result = src
>  11       .map!(a => a.byCodeUnit)
>  12       .map!(a => a.stripLeft);
>  13    result.writeln;
>  14 }
>
> Crashes with a C++-like dump.
>
First, as a tip, please either post a link to a paste site or leave out
the line numbers. It's much easier to copy-paste your code into an
editor if it doesn't have line numbers.

What has happened is that you injected a non-encoded code point: the lone
byte 0xfc is not valid UTF-8. In UTF-8, any code point above 0x7f must be
encoded as a sequence of several code units. See the table on this page:
https://en.wikipedia.org/wiki/%C3%9C
If we use a correctly encoded sequence (0xc3 0x9c, which is Ü; your
lowercase ü, U+00FC, would be 0xc3 0xbc), then it works:
https://run.dlang.io/is/4umQoo
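
To make the encoding point concrete, here is a minimal sketch (not
necessarily what the run.dlang.io link contains). The helper names
isValidUtf8 and utf8Of are made up for illustration; validate, encode,
byCodeUnit and stripLeft are the Phobos functions. It checks that the
lone 0xfc byte is rejected, shows the code units Phobos produces for
U+00FC, and then runs the original pipeline on correctly encoded input:

```d
import std.algorithm : map;
import std.stdio : writeln;
import std.string : stripLeft;
import std.utf : byCodeUnit, encode, validate, UTFException;

// Hypothetical helper: true if s is well-formed UTF-8.
bool isValidUtf8(string s)
{
    try
    {
        validate(s); // throws UTFException on malformed input
        return true;
    }
    catch (UTFException e)
    {
        return false;
    }
}

// Hypothetical helper: the UTF-8 code units for one code point.
string utf8Of(dchar c)
{
    char[4] buf;
    immutable n = encode(buf, c); // number of code units written
    return buf[0 .. n].idup;
}

void main()
{
    // A lone Latin-1 byte is not valid UTF-8.
    assert(!isValidUtf8(" \xfc"));

    // U+00FC (u umlaut) needs two code units in UTF-8.
    assert(utf8Of('\u00fc') == "\xc3\xbc");
    assert(isValidUtf8(" \xc3\xbc"));

    // Same pipeline as in the question, with correctly encoded input.
    string[] src = [ " \xc3\xbc" ];
    auto result = src
        .map!(a => a.byCodeUnit)
        .map!(a => a.stripLeft);
    result.writeln;
}
```

If the data really is Latin-1, it has to be converted to UTF-8 before it
reaches any of the string functions; byCodeUnit only avoids
autodecoding in range primitives, it does not make invalid input valid.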
-Steve