Multibyte support on Windows, Phobos vs Tango, which is right?

Kris foo at bar.com
Thu Apr 10 00:51:59 PDT 2008


Yidabu:

Tango has a multi-platform API based around Unicode, so it is not biased 
toward Windows, Linux, or Darwin. All the items you mention appear to be 
reasonably specific to Win32, so keep that in mind when reading this reply:


1) You'll find something functionally similar in tango.sys.win32.CodePage.


2) Like many operating systems, Tango expects file names to be Unicode. This 
helps make the library portable. On Win32 the blahW() functions are used, 
with utf8 to utf16 conversion applied internally, unless you explicitly 
specify the version=Win32SansUnicode compiler option. If you do that, Tango 
currently does no internal conversion for file names. In short, if you 
explicitly disable Unicode support within the library, then you currently 
need to handle Win32 code-page conversion yourself (see #1). This might be a 
problem if you're running Tango on Win95 or an old Win32S hybrid.


3) You have a recent ticket open for this specific issue, and it is somewhat 
related to #2 above. By default, Tango should happily handle Unicode names 
in a portable manner across operating systems. Your ticket has identified a 
problem with the zip package, which does need to be fixed. Perhaps you'd 
like to try fixing the bug in the zip package yourself? Tango is 
open-source, and patches are always welcome. If you'd like to add some more 
multibyte testcases to the codebase, we'd certainly be happy to run them.


Hope that helps




"yidabu" <yidabu.nospam at gmail.com> wrote in message 
news:20080410071434.587eb8e9.yidabu.nospam at gmail.com...
> Multibyte support on Windows, Phobos vs Tango, which is right ?
>
> 1  Phobos has a toMBSz function that converts the UTF-8 string s into a 
> null-terminated string in a Windows
>   8-bit character set,
>   like this:
>
>    char* toMBSz(char[] s, uint codePage = 0)
>    {
>        // Only need to do this if any chars have the high bit set
>        foreach (char c; s)
>        {
>            if (c >= 0x80)
>            {
>                //do convert
>            }
>        }
>        return std.string.toStringz(s);
>    }
>
>   Tango does not have this function. Is it necessary?
>
> 2  Is toMBSz(char[]) same as char[] ~ '\0' ?
>
>    for example, CreateFileA
>
>    Phobos way:
>    char[] name;
>    CreateFileA(toMBSz(name) ...)
>
>    Tango way:
>    char[] name;
>    CreateFileA( name ~ '\0' ...)
>
>    Is toMBSz(char[]) always same as char[] ~ '\0' ?
>    Is toMBSz("ChineseººÓï"c) always same as "ChineseººÓï"c ~ '\0' ?
>
>    If Phobos is right, there are many bugs in Tango: Tango uses 
> char[] ~ '\0' everywhere when calling the A versions of the Windows API!
>
>
> 3   Phobos zip vs Tango Zip
>
>    I used the Phobos zip module and it works fine. The trick is that 
> zip.ArchiveMember.name should be encoded in the locale code page in a 
> multibyte environment.
>
>    Tango way:
>    char[][] files = [r"D:\ChineseÖÐÎÄ.txt"];
>    createArchive(r"test.zip", Method.Deflate, files);
>
>    cause Exception:
>    object.Exception: cannot encode character "20013" in codepage 437.
>
>    Tango seems to lack multibyte support on Windows,
>    and does not appear to run unittests for a multibyte environment on 
> Windows before publishing a new version.
>
>
>
>
> -- 
> yidabu <yidabu.nospam at gmail.com>
> D语言　中文支持(D Chinese Support)
> http://www.d-programming-language-china.org/
> http://bbs.d-programming-language-china.org/
> http://dwin.d-programming-language-china.org/
> http://scite4d.d-programming-language-china.org/ 