Strange behavior in console with UTF-8

Steven Schveighoffer via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Fri Mar 25 06:58:44 PDT 2016


On 3/24/16 8:54 PM, Jonathan Villa wrote:
> I prefer to post this here because it could be that I'm doing
> something wrong.
>
> I'm using std.stdio -> readln() to read whatever I'm typing in the console.
> BUT, if the line contains some UTF-8 characters, the data obtained is
> EMPTY and
>
> <code>
>
> module runnable;
>
> import std.stdio;
> import std.string : chomp;
> import std.experimental.logger;
>
> void doSomethingElse(wchar[] data)
> {
>      writeln("hello!");
> }
>
> int main(string[] args)
> {
>      /* A fix I found for UTF-8 related problems; I'm using Windows 10 */
>      version(Windows)
>      {
>          import core.sys.windows.windows;
>          if (SetConsoleCP(65001) == 0)
>              throw new Exception("failure");
>          if (SetConsoleOutputCP(65001) == 0)
>              throw new Exception("failure");
>      }
>      FileLogger fl = new FileLogger("log.log");
>      wchar[] readerBuffer;
>
>      readln(readerBuffer);
>      readerBuffer = chomp(readerBuffer);
>
>      /* If the string read contains at least one UTF-8 character this
>         prints 0; otherwise it prints its length. */
>      fl.info(readerBuffer.length);
>
>      if (readerBuffer != "exit"w)
>          doSomethingElse(readerBuffer);
>
>      /* Also, none of the following code runs as expected: the program
>         doesn't wait for you, it executes readln() even without a key
>         being pressed/sent */
>      readln(readerBuffer);
>      fl.info(readerBuffer.length);
>      readln(readerBuffer);
>      fl.info(readerBuffer.length);
>      readln(readerBuffer);
>      fl.info(readerBuffer.length);
>      readln(readerBuffer);
>      fl.info(readerBuffer.length);
>      readln(readerBuffer);
>      fl.info(readerBuffer.length);
>
>      return 0;
> }
> </code>
> The real code is bigger but this describes the bug. Also, if it needs to
> print UTF-8 there's no problem.
>
> My main problem is that the line is going to be sent through a TCP socket
> and I want it to work with UTF-8. I'm using WCHAR instead of CHAR
> in the hope of having fewer problems in the future.
>
> If you comment out the Windows fix, the program crashes:
> http://prntscr.com/ajmy14
>
> Also I tried stdin.flush() right after the first readln() but nothing
> seems to fix it.
>
> Am I doing something wrong?
> Many thanks.

D's File i/o uses C's FILE * i/o system. At least on Windows, this has 
literally zero support for wchar (you can set stream width, and the 
library just ignores it).

What is likely happening is that it is putting the char code units into
the wchar buffer directly, which is not what you want.
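
To make that guess concrete, here is a small sketch of the difference between copying UTF-8 code units into wchar slots and actually re-encoding. It only illustrates the hypothesis and is not a confirmed description of what the runtime does; widenCodeUnits is a made-up helper name.

```d
import std.conv : to;

/// Hypothetical helper: copies each UTF-8 code unit into its own wchar
/// slot without decoding, which is the suspected faulty behavior.
wchar[] widenCodeUnits(const(char)[] utf8)
{
    wchar[] result;
    foreach (char c; utf8)        // iterate code units, not code points
        result ~= cast(wchar) c;
    return result;
}

void main()
{
    string utf8 = "\u00E9";       // U+00E9, stored as two UTF-8 code units

    // Widening yields two bogus UTF-16 units (0xC3, 0xA9) ...
    auto widened = widenCodeUnits(utf8);
    assert(widened.length == 2);
    assert(widened[0] == 0xC3 && widened[1] == 0xA9);

    // ... whereas a real re-encoding yields the single unit 0x00E9.
    wstring transcoded = utf8.to!wstring;
    assert(transcoded.length == 1 && transcoded[0] == 0xE9);
}
```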

I am not certain this is the cause, but I would steer clear of any i/o that
is not char-based. What you can do is read into a char buffer, and then
re-encode using std.conv.to to get wchar strings if you need that.
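
A minimal sketch of that approach, assuming the console really delivers UTF-8 (for example after the SetConsoleCP(65001) call in your code). toUtf16 is a hypothetical helper name, and the loop is just a stand-in for your real program.

```d
import std.conv : to;
import std.stdio;
import std.string : chomp;

/// Hypothetical helper: strip the line terminator and re-encode one
/// UTF-8 line as UTF-16 via std.conv.to.
wstring toUtf16(const(char)[] utf8Line)
{
    return utf8Line.chomp.to!wstring;
}

void main()
{
    char[] lineBuffer;            // char-based buffer, reused by readln

    // readln returns the number of code units read; 0 means end of input.
    while (readln(lineBuffer) > 0)
    {
        wstring line = toUtf16(lineBuffer);
        if (line == "exit"w)
            break;
        writeln("UTF-16 code units: ", line.length);
    }
}
```

If the line is only going to a TCP socket as UTF-8 anyway, you may be able to skip the conversion and send the char data as it is.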

-Steve

