Multibyte support on Windows, Phobos vs Tango, which is right?
Kris
foo at bar.com
Thu Apr 10 00:51:59 PDT 2008
Yidabu:
Tango has a multi-platform API based around Unicode, so it is not biased
toward Windows, Linux, or Darwin. All the items you mention appear to be
reasonably specific to Win32, so keep that in mind when reading this reply:
1) You'll find something functionally similar in tango.sys.win32.CodePage
2) Like many O/S, Tango expects file names to be Unicode. This helps make
the library portable. On Win32 the blahW() functions are used, with utf8 to
utf16 conversion applied internally, except when you explicitly stipulate
the version=Win32SansUnicode compiler option. If you do that, Tango
currently does no internal conversion for file names. In short, if you
explicitly disable Unicode support within the library then you currently
need to handle Win32 code-page conversion yourself (see #1). This might be a
problem if you're running Tango on Win95 or an old Win32S hybrid.
3) You have a recent ticket open for this specific issue, and it is somewhat
related to #2 above. By default, Tango should happily handle Unicode names
in a portable manner across O/S. Your ticket has identified a problem with
the zip package, which does need to be fixed. Perhaps you'd like to try
fixing the bug in the zip package yourself? Tango is open-source, and
patches are always welcome. If you'd like to add some more multibyte
testcases to the codebase, we'd certainly be happy to run them.
Hope that helps
"yidabu" <yidabu.nospam at gmail.com> wrote in message
news:20080410071434.587eb8e9.yidabu.nospam at gmail.com...
> Multibyte support on Windows, Phobos vs Tango, which is right?
>
> 1 Phobos has a toMBSz function that converts the UTF-8 string s into a
> null-terminated string in a Windows 8-bit character set,
> like this:
>
> char* toMBSz(char[] s, uint codePage = 0)
> {
>     // Only need to do this if any chars have the high bit set
>     foreach (char c; s)
>     {
>         if (c >= 0x80)
>         {
>             // do convert
>         }
>     }
>     return std.string.toStringz(s);
> }
>
> Tango does not have this function. Is it necessary?
>
> 2 Is toMBSz(char[]) same as char[] ~ '\0' ?
>
> for example, FileCreateA
>
> Phobos way:
> char[] name;
> CreateFileA(toMBSz(name) ...)
>
> Tango way:
> char[] name;
> FileCreateA( name ~ '\0' ...)
>
> Is toMBSz(char[]) always same as char[] ~ '\0' ?
> Is toMBSz("ChineseººÓï"c) always same as "ChineseººÓï"c ~ '\0' ?
>
> If Phobos is right, there are many bugs in Tango: Tango uses char[] ~ '\0'
> everywhere when calling the A versions of the Windows API!
>
>
> 3 Phobos zip vs Tango Zip
>
> I used the Phobos zip module and it works fine. The trick is that
> zip.ArchiveMember.name should be encoded in the locale code page in a
> multibyte environment.
>
> Tango way:
> char[][] files = [r"D:\ChineseÖÐÎÄ.txt"];
> createArchive(r"test.zip", Method.Deflate, files);
>
> This causes an exception:
> object.Exception: cannot encode character "20013" in codepage 437.
>
> Tango seems to lack multibyte support on Windows,
> and does not seem to have run dedicated unittests for multibyte
> environments on Windows before publishing a new version.
>
>
>
>
> --
> yidabu <yidabu.nospam at gmail.com>
> D语言 中文支持 (D Chinese Support)
> http://www.d-programming-language-china.org/
> http://bbs.d-programming-language-china.org/
> http://dwin.d-programming-language-china.org/
> http://scite4d.d-programming-language-china.org/
More information about the Digitalmars-d
mailing list