Parsing a UTF-16LE file line by line, BUG?
Nestor via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Sun Jan 15 06:48:12 PST 2017
On Friday, 6 January 2017 at 11:42:17 UTC, Mike Wey wrote:
> On 01/06/2017 11:33 AM, pineapple wrote:
>> On Friday, 6 January 2017 at 06:24:12 UTC, rumbu wrote:
>>>>
>>>> I'm not sure if this works quite as intended, but I was at least
>>>> able to produce a UTF-16 decode error rather than a UTF-8 decode
>>>> error by setting the file orientation before reading it.
>>>>
>>>> import std.stdio;
>>>> import core.stdc.wchar_ : fwide;
>>>> void main(){
>>>>     auto file = File("UTF-16LE encoded file.txt");
>>>>     fwide(file.getFP(), 1);
>>>>     foreach(line; file.byLine){
>>>>         writeln(file.readln);
>>>>     }
>>>> }
>>>
>>> fwide is not implemented in Windows:
>>> https://msdn.microsoft.com/en-us/library/aa985619.aspx
>>
>> That's odd. It was on Windows 7 64-bit that I put together and tested
>> that example, and calling fwide definitely had an effect on program
>> behavior.
>
> Are you compiling a 32bit binary? Because in that case you
> would be using the digital mars c runtime which might have an
> implementation for fwide.

After some testing I realized that byLine itself was not failing;
the failure comes from any string manipulation applied to the line
it returns. Compile the following example with and without -debug
and run it to see what I mean:

import std.stdio, std.string;

enum
    EXIT_SUCCESS = 0,
    EXIT_FAILURE = 1;

int main() {
    version(Windows) {
        import core.sys.windows.wincon;
        SetConsoleOutputCP(65001);
    }
    auto f = File("utf16le.txt", "r");
    foreach (line; f.byLine()) try {
        string s;
        debug s = cast(string)strip(line); // this is the one causing problems
        if (1 > s.length) continue;
        writeln(s);
    } catch(Exception e) {
        writefln("Error. %s\nFile \"%s\", line %s.", e.msg, e.file, e.line);
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
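
A possible workaround, offered only as a sketch: byLine hands back the raw bytes of the file as char[], and strip most likely fails because it then tries to decode those UTF-16LE bytes (for instance a leading FF FE byte order mark) as UTF-8. Decoding the whole file to UTF-8 first and splitting into lines afterwards avoids that. The helper name decodeUtf16le is hypothetical, and the code assumes the file is small enough to read into memory and really is little-endian UTF-16.

```d
import std.file : read;
import std.stdio : writeln;
import std.string : lineSplitter, strip;
import std.utf : toUTF8;

/// Hypothetical helper: decode a UTF-16LE byte buffer, with or without
/// a leading byte order mark, into a UTF-8 string.
string decodeUtf16le(const(ubyte)[] raw) {
    // UTF-16 code units are two bytes each, so an odd length means
    // the input is truncated or not UTF-16 at all.
    if (raw.length % 2 != 0)
        throw new Exception("odd number of bytes in UTF-16 input");

    // Assemble the code units explicitly as little-endian, so the result
    // does not depend on the byte order of the host machine.
    auto units = new wchar[](raw.length / 2);
    foreach (i, ref u; units)
        u = cast(wchar)(raw[2 * i] | (raw[2 * i + 1] << 8));

    // Drop the byte order mark (U+FEFF) if the file starts with one.
    if (units.length && units[0] == 0xFEFF)
        units = units[1 .. $];

    // toUTF8 throws a UTFException on invalid UTF-16 (e.g. a lone surrogate).
    return toUTF8(units);
}

void main() {
    // Assumption: same file name as in the example above.
    auto text = decodeUtf16le(cast(const(ubyte)[]) read("utf16le.txt"));
    foreach (line; text.lineSplitter) {
        auto s = line.strip; // operates on valid UTF-8 now
        if (s.length == 0) continue;
        writeln(s);
    }
}
```

Since the text is UTF-8 by the time strip sees it, the usual std.string functions should behave the same with and without -debug.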