Fix bugs caused by encoding in the DMD compiler under Windows

mm comatmsam at sina.com
Mon May 8 15:16:49 UTC 2023


This post belongs in the DMD compiler area, but after struggling 
for 3 hours I could not post there. I am posting here to see 
whether it goes through.

Fix bugs caused by encoding in the DMD compiler under Windows

The following problems are present in dmd 2.103.1, 99.1, and 100.1.

Linux normally uses UTF-8, so the problem does not occur there.

It occurs only on Windows. On Windows 10 and later it also does not 
occur when the Windows ANSI code page is set to UTF-8.

Because the behavior differs from Linux, I consider this a bug.

Let's reproduce the bug and then fix it.


Assumption:

The system's Windows ANSI code page != utf8
------------------------
There are two source files: a.d and 你好.d

The content of a.d is as follows:

import 你好;

-------------------------
Now, in cmd.exe, enter:

dmd a.d   // fails: 你好.d cannot be found (the name is garbled)
---------------------------

The reason is that when dmd accesses a file, it has to convert the 
file name to UTF-16, but it passes the wrong code page parameter 
to that conversion.
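
To make the garbling concrete, here is a minimal sketch of what happens when UTF-8 bytes are decoded with the wrong code page. Latin-1 is used only as a stand-in for "some single-byte ANSI code page" (an assumption for illustration; the real ANSI code page depends on the system), and `decodeAsLatin1` is a hypothetical helper, not part of dmd.

```d
import std.stdio : writeln;
import std.string : representation;

/// Decode bytes as if each byte were one Latin-1 character.
/// Latin-1 is only a stand-in for a single-byte ANSI code page.
dstring decodeAsLatin1(const(ubyte)[] bytes)
{
    dstring result;
    foreach (b; bytes)
        result ~= cast(dchar) b; // Latin-1 byte value == Unicode code point
    return result;
}

void main()
{
    string name = "你好.d";                 // D string literals are UTF-8
    auto raw = name.representation;        // the underlying bytes
    dstring garbled = decodeAsLatin1(raw); // wrong code page: one char per byte
    assert(raw.length == 8);               // 3 + 3 + 1 + 1 bytes
    assert(garbled.length == 8);           // 8 characters instead of 4
    writeln(garbled);                      // prints a garbled name
}
```

The file name has 4 characters, but the wrong decoding produces 8, none of which match the two Chinese characters, so a lookup with that name cannot find the file.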

Let's fix this:

1
   1.1 Open ..\dmd\dmd\common\string.d

   1.2 Search for toWStringz

   1.3 Modify it as follows:

  version(Windows) wchar[] toWStringz(const(char)[] narrow, ref SmallBuffer!wchar buffer) nothrow
{
     //import core.sys.windows.winnls : CP_ACP, MultiByteToWideChar;
     import core.sys.windows.winnls : CP_UTF8, MultiByteToWideChar;
     // assume filenames encoded in system default Windows ANSI code page
     //enum CodePage = CP_ACP;
     enum CodePage = CP_UTF8;
     // ... the rest of the function is not shown

   1.4 Save and compile dmd
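
For comparison, a minimal portable sketch of the conversion that step 1 relies on: UTF-8 input turned into UTF-16. It uses `std.utf.toUTF16` from Phobos as a stand-in for `MultiByteToWideChar` with `CP_UTF8`; dmd itself does not use this function, so treat it only as an illustration of the expected result.

```d
import std.utf : toUTF16;

void main()
{
    string narrow = "你好.d";        // 8 bytes of UTF-8
    wstring wide = toUTF16(narrow); // 4 UTF-16 code units

    assert(narrow.length == 8);
    assert(wide.length == 4);
    assert(wide[0] == 0x4F60);      // U+4F60, first character of the name
    assert(wide[1] == 0x597D);      // U+597D, second character of the name
    assert(wide == "你好.d"w);
}
```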

--------------------------
   Now entering   dmd a.d      succeeds

   But entering   dmd 你好.d   fails

   The reason is that cmd passes its arguments in the ANSI encoding, 
   so the arguments converted with toWStringz are now wrong as well. 
   toWStringz can no longer be used on them
--------------------------
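
A minimal sketch of why ANSI-encoded arguments break once the conversion expects UTF-8. It assumes the ANSI code page is GBK (code page 936), which is only an example; `isValidUtf8` is a hypothetical helper built on `std.utf.validate`.

```d
import std.utf : validate, UTFException;

/// Returns true when `bytes` is well-formed UTF-8.
bool isValidUtf8(const(ubyte)[] bytes)
{
    try
    {
        validate(cast(const(char)[]) bytes);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    // "你好" encoded in GBK, assumed here as the ANSI code page.
    immutable ubyte[] gbk = [0xC4, 0xE3, 0xBA, 0xC3];
    // The same text encoded in UTF-8.
    immutable ubyte[] utf8 = [0xE4, 0xBD, 0xA0, 0xE5, 0xA5, 0xBD];

    assert(!isValidUtf8(gbk)); // ANSI bytes are not valid UTF-8
    assert(isValidUtf8(utf8));
}
```

The same two characters give different byte sequences in the two encodings, so arguments typed in cmd have to be converted before a UTF-8 based conversion can handle them.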

Now fix that problem

2
   2.1 Open ..\dmd\dmd\common\string.d

   2.2 Add the following function:

// Convert between encodings using the Windows API
version(Windows) char* Encodingconversion(const(char)* buffer, int CodePage, int toCodePage)
{
    import core.sys.windows.winnls : MultiByteToWideChar, WideCharToMultiByte;
    import core.stdc.string : strlen;

    int bufferlen = cast(int) strlen(buffer);

    // Step 1: source code page -> UTF-16
    int utf16len = MultiByteToWideChar(CodePage, 0, buffer, bufferlen, null, 0);
    wchar[] utf16 = new wchar[utf16len];
    utf16len = MultiByteToWideChar(CodePage, 0, buffer, bufferlen, utf16.ptr, utf16len);

    // Step 2: UTF-16 -> target code page
    int len = WideCharToMultiByte(toCodePage, 0, utf16.ptr, utf16len, null, 0, null, null);

    // One extra byte for the terminating '\0'
    char[] utfx = new char[len + 1];

    WideCharToMultiByte(toCodePage, 0, utf16.ptr, utf16len, utfx.ptr, len, null, null);
    utfx[len] = '\0';

    return utfx.ptr;
}
   2.3 Save
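
One detail of such a conversion is the buffer size: a C string of len bytes needs len + 1 bytes of storage, because the terminating '\0' is written at index len. A minimal portable sketch of the second half of the conversion (UTF-16 to a NUL-terminated UTF-8 string), using `std.utf.toUTF8` instead of the Windows API; `toUtf8z` is a hypothetical name used only here.

```d
import core.stdc.string : strlen;
import std.utf : toUTF8;

/// Convert UTF-16 text to a NUL-terminated UTF-8 C string.
char* toUtf8z(const(wchar)[] wide)
{
    string utf8 = toUTF8(wide);
    char[] buf = new char[utf8.length + 1]; // +1 for the terminator
    buf[0 .. utf8.length] = utf8[];
    buf[utf8.length] = '\0';
    return buf.ptr;
}

void main()
{
    char* p = toUtf8z("你好"w);
    assert(strlen(p) == 6); // 2 characters, 3 UTF-8 bytes each
}
```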
--------------------------------------------
   2.4 Open   ..\dmd\dmd\mars.d

   2.5 Search for main(int

   2.6 Modify it as follows:

     extern (C) int main(int argc, char** argv)
     {
         bool lowmem = false;
         foreach (i; 1 .. argc)
         {
             if (strcmp(argv[i], "-lowmem") == 0)
             {
                 lowmem = true;
                 break;
             }
         }
         if (!lowmem)
         {
             __gshared string[] disable_options = [ "gcopt=disable:1" ];
             rt_options = disable_options;
             mem.disableGC();
         }
         version(Windows)
         {
             // Do not put this code in the loop above:
             // it goes wrong when lowmem == true
             import dmd.common.string;
             import core.sys.windows.winnls : GetACP, CP_UTF8;
             foreach (i; 0 .. argc)
             {
                 int CodePage = GetACP();
                 if (CodePage != CP_UTF8)
                 {
                     argv[i] = Encodingconversion(argv[i], CodePage, cast(int) CP_UTF8);
                 }
             }
         }
         // initialize druntime and call _Dmain() below
         return _d_run_main(argc, argv, &_Dmain);
     }


   2.7 Save
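
The control flow of the argument conversion can be sketched in portable form: convert every argument unless the source code page is already UTF-8. Everything here is hypothetical and for illustration only: `normalizeArgs`, the `convert` delegate that stands in for the real encoding conversion, and the constant name `CP_UTF8_ID` (65001 is the numeric value of the Windows `CP_UTF8` constant).

```d
enum uint CP_UTF8_ID = 65001; // numeric value of the Windows CP_UTF8 constant

/// Run every argument through `convert`, unless the source code page
/// is already UTF-8, in which case the arguments are returned as is.
string[] normalizeArgs(string[] args, uint sourceCodePage,
                       string delegate(string) convert)
{
    if (sourceCodePage == CP_UTF8_ID)
        return args;

    auto result = new string[args.length];
    foreach (i, arg; args)
        result[i] = convert(arg);
    return result;
}

void main()
{
    // Stand-in converter that only marks the arguments it touched.
    string delegate(string) mark = (string s) => "<" ~ s ~ ">";

    assert(normalizeArgs(["dmd", "a.d"], 65001, mark) == ["dmd", "a.d"]);
    assert(normalizeArgs(["dmd", "a.d"], 936, mark) == ["<dmd>", "<a.d>"]);
}
```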

------------------------------
    Now   dmd 你好.d   fails at link time

    The reason is that the command line dmd emits for the linker is 
    in the wrong encoding
------------------------------

   2.8 Open   ..\dmd\dmd\link.d

   2.9 Search for  executecmd

      Find:
      private int executecmd(const(char)* cmd, const(char)* args)

      Change it to:
      private int executecmd1(const(char)* cmd, const(char)* args)


   2.10 Add this function above the modified code:

     private int executecmd(const(char)* cmd, const(char)* args)
     {
         // When the compiler calls the external linker, the command line
         // must be converted from utf8 to the Windows ANSI code page
         import dmd.common.string;
         import core.sys.windows.winnls : GetACP, CP_UTF8;

         int CodePage = GetACP();
         if (CodePage != CP_UTF8)
         {
             char* args1 = Encodingconversion(args, cast(int) CP_UTF8, CodePage);
             char* cmd1 = Encodingconversion(cmd, cast(int) CP_UTF8, CodePage);
             return executecmd1(cmd1, args1);
         }
         return executecmd1(cmd, args);
     }
    2.11 Save and rebuild the dmd compiler


---------------
Now, in cmd.exe:

dmd a.d succeeds

dmd 你好.d succeeds

The bug is fixed.

--------------------------------------


One more problem, which is probably in the standard library:

It is present in dmd 2.103.1 on Windows


extern (C) int main(int argc, char** argv)
{
   argv[i]  // encoding == current system encoding
}
extern (D) int main(string[] argv)
{
   argv[i]  // encoding == utf8
}
extern (C++) int main(int argc, char** argv)
{
   argv[i]  // no longer an encoding problem: the data is unusable
}
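
A minimal sketch for checking the D main case, assuming the arguments arrive as UTF-8 as described above. `codePoints` is a hypothetical helper; `std.utf.validate` throws a `UTFException` if an argument is not well-formed UTF-8, so a run that completes without an exception is consistent with that observation.

```d
import std.stdio : writefln;
import std.utf : count, validate;

/// Returns the number of code points in `arg`, after checking that
/// it is well-formed UTF-8 (validate throws UTFException otherwise).
size_t codePoints(string arg)
{
    validate(arg);
    return count(arg);
}

void main(string[] args)
{
    foreach (arg; args)
        writefln("%s: %s bytes, %s code points",
                 arg, arg.length, codePoints(arg));
}
```

For an argument such as 你好.d, UTF-8 input gives 8 bytes and 4 code points.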


More information about the Digitalmars-d-ide mailing list