Parsing a UTF-16LE file line by line, BUG?

Daniel Kozák via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Sun Jan 15 08:29:23 PST 2017


On Sun, 15 Jan 2017 14:48:12 +0000
Nestor via Digitalmars-d-learn <digitalmars-d-learn at puremagic.com> wrote:

> On Friday, 6 January 2017 at 11:42:17 UTC, Mike Wey wrote:
> > On 01/06/2017 11:33 AM, pineapple wrote:  
> >> On Friday, 6 January 2017 at 06:24:12 UTC, rumbu wrote:  
> >>>>
> >>>> I'm not sure if this works quite as intended, but I was at least
> >>>> able to produce a UTF-16 decode error rather than a UTF-8 decode
> >>>> error by setting the file orientation before reading it.
> >>>>
> >>>>     import std.stdio;
> >>>>     import core.stdc.wchar_ : fwide;
> >>>>     void main(){
> >>>>         auto file = File("UTF-16LE encoded file.txt");
> >>>>         fwide(file.getFP(), 1);
> >>>>         foreach(line; file.byLine){
> >>>>             writeln(file.readln);
> >>>>         }
> >>>>     }  
> >>>
> >>> fwide is not implemented in Windows:
> >>> https://msdn.microsoft.com/en-us/library/aa985619.aspx  
> >>
> >> That's odd. It was on Windows 7 64-bit that I put together and
> >> tested that example, and calling fwide definitely had an effect on
> >> program behavior.
> >
> > Are you compiling a 32-bit binary? Because in that case you
> > would be using the Digital Mars C runtime, which might have an
> > implementation of fwide.
> 
> After some testing I realized that byLine was not the one 
> failing, but any string manipulation done to the obtained line. 
> Compile the following example with and without -debug and run to 
> see what I mean:
> 
> import std.stdio, std.string;
> 
> enum
>    EXIT_SUCCESS = 0,
>    EXIT_FAILURE = 1;
> 
> int main() {
>    version(Windows) {
>      import core.sys.windows.wincon;
>      SetConsoleOutputCP(65001);
>    }
>    auto f = File("utf16le.txt", "r");
>    foreach (line; f.byLine()) try {
>      string s;
>      debug s = cast(string)strip(line); // this is the one causing problems
>      if (1 > s.length) continue;
>      writeln(s);
>    } catch(Exception e) {
>      writefln("Error. %s\nFile \"%s\", line %s.", e.msg, e.file, e.line);
>      return EXIT_FAILURE;
>    }
>    return EXIT_SUCCESS;
> }

This is because byLine returns a lazy range of raw lines; nothing is
decoded until you do something with a line (strip, in your case), so until
then it does no harm :)
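
One way around the decode error might be to skip byLine for this file, read the raw bytes, transcode them from UTF-16LE to UTF-8 yourself, and only then split into lines and strip. The following is a minimal sketch under assumptions, not a tested fix for your exact file: decodeUtf16le is a hypothetical helper name, the file name is taken from your example, and the whole file is assumed to fit in memory.

```d
import std.conv : to;
import std.exception : enforce;
import std.file : read;
import std.stdio : writeln;
import std.string : lineSplitter, strip;

/// Hypothetical helper: turns raw UTF-16LE bytes into a UTF-8 string.
string decodeUtf16le(const(ubyte)[] raw)
{
    // Drop the byte order mark (FF FE) if the file starts with one.
    if (raw.length >= 2 && raw[0] == 0xFF && raw[1] == 0xFE)
        raw = raw[2 .. $];
    enforce(raw.length % 2 == 0, "UTF-16 data must have an even byte count");

    // Assemble the code units by hand so the result does not depend
    // on the endianness of the machine running the program.
    auto units = new wchar[raw.length / 2];
    foreach (i, ref u; units)
        u = cast(wchar)(raw[2 * i] | (raw[2 * i + 1] << 8));

    // to!string transcodes UTF-16 to UTF-8 and throws on invalid input.
    return units.to!string;
}

void main()
{
    auto text = decodeUtf16le(cast(const(ubyte)[]) read("utf16le.txt"));
    foreach (line; text.lineSplitter)
    {
        auto s = line.strip; // safe now: the data is valid UTF-8
        if (s.length == 0) continue;
        writeln(s);
    }
}
```

Since the text is already UTF-8 by the time strip sees it, the string functions should behave the same with and without -debug.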


