[Issue 1448] New: UTF-8 output to console is seriously broken

d-bugmail at puremagic.com
Tue Aug 28 20:51:08 PDT 2007


http://d.puremagic.com/issues/show_bug.cgi?id=1448

           Summary: UTF-8 output to console is seriously broken
           Product: D
           Version: 1.020
          Platform: PC
        OS/Version: Windows
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: bugzilla at digitalmars.com
        ReportedBy: a.solovey at gmail.com


If the Windows console code page is set to 65001 (UTF-8) and a program outputs
non-ASCII characters in UTF-8 encoding, no further output appears after the
first newline that follows an accented character. I believe the problem is in
the underlying DMC stdio, but it is more disturbing in D, which has good Unicode
support and is very convenient for working with international text.
This problem has been reported in the newsgroup several times before; see, for
example,
http://www.digitalmars.com/d/archives/digitalmars/D/announce/openquran_v0.21_8492.html
Here is the code to illustrate the problem:
////////
import std.c.stdio;
import std.c.windows.windows;

extern(Windows) export BOOL SetConsoleOutputCP( UINT );

void main() {
    SetConsoleOutputCP( 65001 ); // or use "chcp 65001" instead
    // Codepoint 00e9 is "Latin small letter e with acute"
    puts( "Output utf-8 accented char \u00e9\n... and the rest is cut off!\n" );
}
/////////
If you run it, "... and the rest is cut off!" is not displayed. Remember to set
the console font to Lucida Console before trying this.
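
When reproducing this, it may help to first confirm that the string really
contains well-formed UTF-8, so that the truncation cannot be blamed on the
encoding itself. The following is only a minimal sketch, not part of the
original report. It assumes a current D2 compiler and Phobos (std.format,
std.utf) rather than the D 1.020 release named above, and hexBytes is a
hypothetical helper introduced here for illustration. It prints only ASCII, so
its own output should not be affected by the console code page.

```d
import std.stdio : writeln;
import std.format : format;
import std.utf : count;

// Hypothetical helper (not from the report): renders each UTF-8 code unit
// of s as two uppercase hex digits, separated by single spaces.
string hexBytes(string s) {
    string result;
    foreach (i, ubyte b; cast(immutable(ubyte)[]) s) {
        if (i > 0) result ~= " ";
        result ~= format("%02X", b);
    }
    return result;
}

void main() {
    // Same code point as in the report: U+00E9,
    // "Latin small letter e with acute".
    string accented = "\u00e9";

    // D string literals are stored as UTF-8, so U+00E9 occupies two bytes.
    writeln(hexBytes(accented));                       // prints "C3 A9"

    // .length counts UTF-8 code units (bytes); count() counts code points.
    writeln(accented.length, " ", count(accented));    // prints "2 1"
}
```

If the bytes come out as C3 A9, the program is handing correct UTF-8 to puts,
which is consistent with the suspicion that the fault lies further down, in
the runtime or the console, rather than in the D string itself.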

