Fix bugs caused by encoding in the DMD compiler under Windows
mm
comatmsam at sina.com
Mon May 8 15:16:49 UTC 2023
This post belongs in the DMD compiler area, but I struggled for three
hours and could not post there. I am posting here to see whether it
goes through.
The following issues exist in dmd versions 2.103.1, 99.1, and 100.1.
On Linux, which normally uses UTF-8, the problem does not occur.
It occurs only on Windows, and even there it does not occur on
Windows 10 or later when the Windows ANSI code page is UTF-8.
Because the behavior is inconsistent with Linux, I treat this as a bug.
Let's reproduce the bug and then fix it.
Assumption:
The Windows ANSI code page is not UTF-8 (ANSI code page != utf8).
------------------------
There are two source files, a.d and 你好.d.
The content of a.d is:
import 你好;
-------------------------
Now, in cmd.exe, enter:
dmd a.d    // fails: 你好.d cannot be found (the file name is garbled)
---------------------------
The problem arises because dmd has to convert a file name to UTF-16
before it accesses the file, and it passes the wrong code page to that
conversion: the name is decoded as if it were in the ANSI code page.
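To make the failure mode concrete, here is a minimal, platform-independent sketch of what happens when UTF-8 bytes are decoded with the wrong code page. It is not dmd code. The helper decodeAsLatin1 is hypothetical and uses Latin-1 as a stand-in for "some ANSI code page that is not UTF-8"; a real Chinese Windows system would more likely use a multi-byte code page, but the effect (a different UTF-16 string, hence a file that cannot be found) is the same kind of mismatch.

```d
import std.utf : toUTF16;

// Hypothetical helper: decode bytes as if every byte were one character,
// the way a single-byte code page such as Latin-1 does.
wstring decodeAsLatin1(const(char)[] bytes)
{
    wstring result;
    foreach (ubyte b; cast(const(ubyte)[]) bytes)
        result ~= cast(wchar) b;
    return result;
}

void main()
{
    import std.stdio : writeln;

    string name = "你好.d";                // D strings hold UTF-8
    wstring right = toUTF16(name);         // bytes decoded as UTF-8
    wstring wrong = decodeAsLatin1(name);  // bytes decoded with the wrong code page

    writeln(right.length); // 4 UTF-16 code units
    writeln(wrong.length); // 8 code units, one per UTF-8 byte
    assert(right != wrong);
}
```

The two UTF-16 strings differ, so a file lookup with the second one would look for a name that does not exist on disk.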
Let's fix this issue:
1
1.1 Open ..\dmd\dmd\common\string.d
1.2 Search for toWStringz
1.3 Modify it as follows:
version(Windows) wchar[] toWStringz(const(char)[] narrow, ref SmallBuffer!wchar buffer) nothrow
{
    //import core.sys.windows.winnls : CP_ACP, MultiByteToWideChar;
    import core.sys.windows.winnls : CP_UTF8, MultiByteToWideChar;
    // assume filenames encoded in system default Windows ANSI code page
    //enum CodePage = CP_ACP;
    enum CodePage = CP_UTF8;
    // (the rest of the function is not shown)
1.4 Save and rebuild dmd
--------------------------
Now dmd a.d succeeds.
Now dmd 你好.d fails.
The reason is that cmd.exe passes command-line arguments in the ANSI
code page, so the arguments converted by toWStringz are now decoded
incorrectly. They can no longer be handed to this function directly.
--------------------------
Fix the problem:
2
2.1 Open ..\dmd\dmd\common\string.d
2.2 Add the following function:
// Convert a zero-terminated string from one code page to another,
// going through UTF-16 with the Windows API.
version(Windows) char* Encodingconversion(const(char)* buffer, int CodePage, int toCodePage)
{
    import core.sys.windows.winnls : MultiByteToWideChar, WideCharToMultiByte;
    import core.stdc.string : strlen;

    int bufferlen = cast(int) strlen(buffer);
    // First call only asks how many UTF-16 code units are needed.
    int utf16len = MultiByteToWideChar(CodePage, 0, buffer, bufferlen, null, 0);
    wchar[] utf16 = new wchar[utf16len];
    utf16len = MultiByteToWideChar(CodePage, 0, buffer, bufferlen, utf16.ptr, utf16len);
    // Same two-step pattern for the target code page.
    int len = WideCharToMultiByte(toCodePage, 0, utf16.ptr, utf16len, null, 0, null, null);
    // Allocate one extra byte for the terminating '\0'.
    char* utfx = (new char[len + 1]).ptr;
    WideCharToMultiByte(toCodePage, 0, utf16.ptr, utf16len, utfx, len, null, null);
    utfx[len] = '\0';
    return utfx;
}
2.3 Save
--------------------------------------------
2.4 Open ..\dmd\dmd\mars.d
2.5 Search for main(int
2.6 Modify it as follows:
extern (C) int main(int argc, char** argv)
{
    bool lowmem = false;
    foreach (i; 1 .. argc)
    {
        if (strcmp(argv[i], "-lowmem") == 0)
        {
            lowmem = true;
            break;
        }
    }
    if (!lowmem)
    {
        __gshared string[] disable_options = [ "gcopt=disable:1" ];
        rt_options = disable_options;
        mem.disableGC();
    }
    version(Windows)
    {
        // Do not put this code in the loop above; when lowmem == true
        // it goes wrong (that loop breaks out early).
        import dmd.common.string;
        import core.sys.windows.winnls : GetACP, CP_UTF8;
        int CodePage = GetACP();
        if (CodePage != CP_UTF8)
        {
            // Convert every argument from the ANSI code page to UTF-8.
            foreach (i; 0 .. argc)
                argv[i] = Encodingconversion(argv[i], CodePage, cast(int) CP_UTF8);
        }
    }
    // initialize druntime and call _Dmain() below
    return _d_run_main(argc, argv, &_Dmain);
}
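As a point of comparison only, and not what the patch above does: another way to get UTF-8 arguments on Windows is to ask the system for the UTF-16 command line and convert that, which avoids going through the ANSI code page at all. The sketch below assumes the druntime bindings core.sys.windows.shellapi and core.sys.windows.winbase; the helper name utf8Args is hypothetical.

```d
version (Windows)
{
    import core.sys.windows.shellapi : CommandLineToArgvW;
    import core.sys.windows.winbase : GetCommandLineW, LocalFree;
    import core.sys.windows.winnls : CP_UTF8, WideCharToMultiByte;

    // Hypothetical helper: the process arguments as UTF-8 strings,
    // taken from the UTF-16 command line instead of the ANSI argv.
    string[] utf8Args()
    {
        int wargc;
        wchar** wargv = CommandLineToArgvW(GetCommandLineW(), &wargc);
        if (wargv is null)
            return null;
        scope (exit) LocalFree(wargv);

        string[] result;
        foreach (i; 0 .. wargc)
        {
            // A length of -1 means the input is zero-terminated; the size
            // returned then includes the terminating zero.
            int n = WideCharToMultiByte(CP_UTF8, 0, wargv[i], -1, null, 0, null, null);
            if (n <= 0)
            {
                result ~= "";
                continue;
            }
            auto buf = new char[n];
            WideCharToMultiByte(CP_UTF8, 0, wargv[i], -1, buf.ptr, n, null, null);
            result ~= cast(string) buf[0 .. n - 1]; // drop the terminator
        }
        return result;
    }
}

void main()
{
    import std.stdio : writeln;

    version (Windows)
    {
        foreach (arg; utf8Args())
            writeln(arg);
    }
    else
    {
        writeln("this sketch only applies to Windows");
    }
}
```

The practical difference is that a conversion starting from the ANSI argv can only recover characters the ANSI code page is able to represent, while the UTF-16 command line carries the original characters.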
2.7 Save
------------------------------
dmd 你好.d now fails at the link step.
The reason is that the command line dmd passes to the linker is in the
wrong encoding.
------------------------------
2.8 Open ..\dmd\dmd\link.d
2.9 Search for executecmd
Find:
private int executecmd(const(char)* cmd, const(char)* args)
Rename it to:
private int executecmd1(const(char)* cmd, const(char)* args)
2.10 Add the following function above the renamed one:
private int executecmd(const(char)* cmd, const(char)* args)
{
    // When the compiler invokes the external linker, the command must be
    // converted from UTF-8 to the Windows ANSI code page.
    import dmd.common.string;
    import core.sys.windows.winnls : GetACP, CP_UTF8;
    int CodePage = GetACP();
    if (CodePage != CP_UTF8)
    {
        char* args1 = Encodingconversion(args, cast(int) CP_UTF8, CodePage);
        char* cmd1 = Encodingconversion(cmd, cast(int) CP_UTF8, CodePage);
        return executecmd1(cmd1, args1);
    }
    return executecmd1(cmd, args);
}
2.11 Save and rebuild the dmd compiler
---------------
Now, in cmd.exe:
dmd a.d succeeds.
dmd 你好.d succeeds.
The bug is fixed.
--------------------------------------
One more issue, which is probably a standard library problem.
It exists in dmd 2.103.1 on Windows:
extern (C) int main(int argc, char** argv)
{
    // argv[i] is encoded in the current system (ANSI) code page
}
extern (D) int main(string[] argv)
{
    // argv[i] is encoded in UTF-8
}
extern (C++) int main(int argc, char** argv)
{
    // argv[i]: no longer an encoding problem, the data is unusable
}
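One way to check which encoding an entry point actually receives is to dump the raw bytes of each argument. The sketch below is only an illustration for the extern (D) case; hexBytes is a hypothetical helper. For comparison, 你好 is E4 BD A0 E5 A5 BD in UTF-8, and it would be C4 E3 BA C3 in code page 936 (GBK), assuming that is the system's ANSI code page.

```d
import std.format : format;

// Hypothetical helper: render the raw bytes of a string as hex.
string hexBytes(const(char)[] s)
{
    string result;
    foreach (i, ubyte b; cast(const(ubyte)[]) s)
        result ~= format(i == 0 ? "%02X" : " %02X", b);
    return result;
}

void main(string[] args)
{
    import std.stdio : writeln;

    // Print the bytes of every argument after the program name.
    foreach (arg; args[1 .. $])
        writeln(hexBytes(arg));

    // UTF-8 encoding of the file name stem used in this post.
    assert(hexBytes("你好") == "E4 BD A0 E5 A5 BD");
}
```

Running it with 你好 as an argument and comparing the output with the two byte sequences above shows which encoding arrived.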