Parsing a UTF-16LE file line by line, BUG?
Daniel Kozák via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Sun Jan 15 08:29:23 PST 2017
On Sun, 15 Jan 2017 14:48:12 +0000
Nestor via Digitalmars-d-learn <digitalmars-d-learn at puremagic.com> wrote:
> On Friday, 6 January 2017 at 11:42:17 UTC, Mike Wey wrote:
> > On 01/06/2017 11:33 AM, pineapple wrote:
> >> On Friday, 6 January 2017 at 06:24:12 UTC, rumbu wrote:
> >>>>
> >>>> I'm not sure if this works quite as intended, but I was at least
> >>>> able to produce a UTF-16 decode error rather than a UTF-8 decode
> >>>> error by setting the file orientation before reading it.
> >>>>
> >>>> import std.stdio;
> >>>> import core.stdc.wchar_ : fwide;
> >>>> void main(){
> >>>>     auto file = File("UTF-16LE encoded file.txt");
> >>>>     fwide(file.getFP(), 1);
> >>>>     foreach(line; file.byLine){
> >>>>         writeln(file.readln);
> >>>>     }
> >>>> }
> >>>
> >>> fwide is not implemented in Windows:
> >>> https://msdn.microsoft.com/en-us/library/aa985619.aspx
> >>
> >> That's odd. It was on Windows 7 64-bit that I put together and tested
> >> that example, and calling fwide definitely had an effect on program
> >> behavior.
> >
> > Are you compiling a 32bit binary? Because in that case you
> > would be using the digital mars c runtime which might have an
> > implementation for fwide.
>
> After some testing I realized that byLine was not the one
> failing, but any string manipulation done to the obtained line.
> Compile the following example with and without -debug and run to
> see what I mean:
>
> import std.stdio, std.string;
>
> enum
>     EXIT_SUCCESS = 0,
>     EXIT_FAILURE = 1;
>
> int main() {
>     version(Windows) {
>         import core.sys.windows.wincon;
>         SetConsoleOutputCP(65001);
>     }
>     auto f = File("utf16le.txt", "r");
>     foreach (line; f.byLine()) try {
>         string s;
>         debug s = cast(string)strip(line); // this is the one causing problems
>         if (1 > s.length) continue;
>         writeln(s);
>     } catch(Exception e) {
>         writefln("Error. %s\nFile \"%s\", line %s.", e.msg, e.file, e.line);
>         return EXIT_FAILURE;
>     }
>     return EXIT_SUCCESS;
> }
This is because byLine just returns a range, so until you actually do
something with the lines it yields, nothing goes wrong :)
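
One way around it, as far as I can tell, is to decode the UTF-16LE data
yourself before doing any string manipulation. The following is only a
minimal sketch under a few assumptions: the helper name fromUtf16le is
made up for this example, the input is well-formed UTF-16LE with an
optional BOM, and the in-memory byte array stands in for what you would
get from cast(const(ubyte)[]) std.file.read("utf16le.txt").

```d
import std.exception : enforce;
import std.stdio : writeln;
import std.string : lineSplitter, strip;
import std.utf : toUTF8;

// Hypothetical helper (not part of Phobos): assemble UTF-16 code units
// from little-endian bytes and drop a leading byte order mark, if any.
wchar[] fromUtf16le(const(ubyte)[] bytes)
{
    enforce(bytes.length % 2 == 0, "UTF-16 input has an odd number of bytes");
    auto units = new wchar[bytes.length / 2];
    foreach (i, ref u; units)
        u = cast(wchar)(bytes[2 * i] | (bytes[2 * i + 1] << 8));
    if (units.length && units[0] == 0xFEFF)
        units = units[1 .. $];
    return units;
}

void main()
{
    // Sample data instead of a real file: BOM, " hi\r\n", "ok\n" in UTF-16LE.
    immutable ubyte[] raw = [
        0xFF, 0xFE,
        0x20, 0x00, 0x68, 0x00, 0x69, 0x00, 0x0D, 0x00, 0x0A, 0x00,
        0x6F, 0x00, 0x6B, 0x00, 0x0A, 0x00,
    ];

    // toUTF8 transcodes to UTF-8 and throws a UTFException on invalid input.
    string text = toUTF8(fromUtf16le(raw));

    // From here on the data is valid UTF-8, so strip and friends are safe.
    foreach (line; text.lineSplitter)
    {
        auto s = line.strip;
        if (s.length == 0) continue;
        writeln(s);
    }
}
```

This reads the whole input at once rather than line by line, which keeps
the sketch short but is probably not what you want for very large files.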