Parsing a UTF-16LE file line by line, BUG?

Nestor via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Sun Jan 15 06:48:12 PST 2017


On Friday, 6 January 2017 at 11:42:17 UTC, Mike Wey wrote:
> On 01/06/2017 11:33 AM, pineapple wrote:
>> On Friday, 6 January 2017 at 06:24:12 UTC, rumbu wrote:
>>>>
>>>> I'm not sure if this works quite as intended, but I was at least
>>>> able to produce a UTF-16 decode error rather than a UTF-8 decode
>>>> error by setting the file orientation before reading it.
>>>>
>>>>     import std.stdio;
>>>>     import core.stdc.wchar_ : fwide;
>>>>     void main(){
>>>>         auto file = File("UTF-16LE encoded file.txt");
>>>>         // A positive mode asks the C runtime for a wide-oriented stream.
>>>>         fwide(file.getFP(), 1);
>>>>         foreach(line; file.byLine){
>>>>             writeln(line); // print the line byLine just produced
>>>>         }
>>>>     }
>>>
>>> fwide is not implemented on Windows:
>>> https://msdn.microsoft.com/en-us/library/aa985619.aspx
>>
>> That's odd. It was on Windows 7 64-bit that I put together and tested
>> that example, and calling fwide definitely had an effect on program
>> behavior.
>
> Are you compiling a 32-bit binary? Because in that case you would
> be using the Digital Mars C runtime, which might have an
> implementation for fwide.
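Mike Wey's question about which C runtime is in use can be answered from inside the program. The sketch below is an editorial assumption, not code from the thread: it relies on D's predefined version identifiers (`CRuntime_DigitalMars`, `CRuntime_Microsoft`, `CRuntime_Glibc`) and on `size_t.sizeof` to report the pointer width and C runtime a build was compiled for. The helper name `cRuntimeName` is hypothetical.

```d
import std.stdio : writeln;

/// Hypothetical helper: names the C runtime this binary was compiled
/// against, using D's predefined version identifiers.
string cRuntimeName()
{
    version (CRuntime_DigitalMars)
        return "Digital Mars C runtime";
    else version (CRuntime_Microsoft)
        return "Microsoft C runtime";
    else version (CRuntime_Glibc)
        return "glibc";
    else
        return "another C runtime";
}

void main()
{
    // size_t is 4 bytes in a 32-bit build and 8 bytes in a 64-bit build.
    writeln(size_t.sizeof * 8, "-bit build using the ", cRuntimeName());
}
```

Because `version` blocks are resolved at compile time, only one branch of `cRuntimeName` ends up in the binary, so the answer reflects how the program was actually built rather than the machine it runs on.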

After some testing, I realized that byLine itself is not what fails;
the failure comes from any string manipulation done on the line it
returns. Compile the following example with and without -debug and
run it to see what I mean:

import std.stdio, std.string;

enum
   EXIT_SUCCESS = 0,
   EXIT_FAILURE = 1;

int main() {
   version(Windows) {
     import core.sys.windows.wincon;
     SetConsoleOutputCP(65001);
   }
   auto f = File("utf16le.txt", "r");
   foreach (line; f.byLine()) try {
     string s;
     // Only compiled with -debug; strip() is the call causing problems.
     debug s = strip(line).idup; // idup: byLine reuses its buffer
     if (s.length == 0) continue;
     writeln(s);
   } catch(Exception e) {
     writefln("Error. %s\nFile \"%s\", line %s.",
              e.msg, e.file, e.line);
     return EXIT_FAILURE;
   }
   return EXIT_SUCCESS;
}
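A likely explanation, offered as an assumption since the thread has not confirmed it: `File.byLine` treats the file as UTF-8 and splits on the byte `'\n'`, so each `char[]` it yields holds raw UTF-16LE bytes (the `0xFF 0xFE` byte-order mark on the first line, and a stray `0x00` at the start of every following line). Without `-debug`, `s` stays empty and every line is skipped, so nothing is ever decoded; with `-debug`, `strip` has to decode the line as UTF-8 and presumably throws. One way around this is to read the raw bytes, decode them as UTF-16LE, and only then split into lines. The sketch below does that with Phobos' `std.file.read`, `std.utf.validate`, `std.conv.to` and `std.string.lineSplitter`; the helper `decodeUtf16le` is hypothetical, and it loads the whole file into memory, which may not suit very large files.

```d
import std.conv : to;
import std.exception : enforce;
import std.file : exists, read;
import std.stdio : writeln;
import std.string : lineSplitter, strip;
import std.utf : validate;

/// Hypothetical helper: decodes a UTF-16LE byte buffer into a UTF-8
/// string, skipping a leading byte-order mark (0xFF 0xFE) if present.
string decodeUtf16le(const(ubyte)[] bytes)
{
    enforce(bytes.length % 2 == 0, "odd byte count, not valid UTF-16");
    if (bytes.length >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE)
        bytes = bytes[2 .. $];

    // Assemble each code unit from its low and high byte explicitly,
    // so the result does not depend on the host's byte order.
    auto units = new wchar[](bytes.length / 2);
    foreach (i, ref u; units)
        u = cast(wchar)(bytes[2 * i] | (bytes[2 * i + 1] << 8));

    validate(units);        // throws UTFException on malformed UTF-16
    return units.to!string; // re-encode as UTF-8
}

void main(string[] args)
{
    const path = args.length > 1 ? args[1] : "utf16le.txt";
    if (!exists(path))
    {
        writeln("File not found: ", path);
        return;
    }

    auto content = decodeUtf16le(cast(const(ubyte)[]) read(path));
    // lineSplitter recognises "\n", "\r\n" and other line terminators.
    foreach (line; content.lineSplitter)
    {
        auto s = line.strip;
        if (s.length == 0) continue;
        writeln(s);
    }
}
```

Decoding the whole buffer up front means `strip` and any other string function only ever see valid UTF-8. Files that may arrive in other encodings would need a wider BOM check (UTF-8, UTF-16BE) before choosing a decoder.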

