Multibyte support on Windows, Phobos vs Tango, which is right?

yidabu yidabu.nospam at gmail.com
Thu Apr 10 00:58:45 PDT 2008


On Wed, 9 Apr 2008 23:51:59 -0800
"Kris" <foo at bar.com> wrote:

> Yidabu:
> 
> Tango has a multi-platform API based around Unicode, thus it is not biased 
> for windows, linux, or darwin. All the items you mention appear to be 
> reasonably specific to Win32, so keep that in mind when reading this reply:
> 
> 
> 1) You'll find something functionally similar in tango.sys.win32.CodePage
> 
> 
> 2) Like many O/S, Tango expects file names to be Unicode. This helps make 
> the library portable. On Win32 the blahW() functions are used, with utf8 to 
> utf16 conversion applied internally, except when you explicitly stipulate 
> the version=Win32SansUnicode compiler option. If you do that, Tango 
> currently does no internal conversion for file names. In short, if you 
> explicitly disable Unicode support within the library then you currently 
> need to handle Win32 code-page conversion yourself (see #1). This might be a 
> problem if you're running Tango on Win95 or an old Win32S hybrid
> 
> 
> 3) You have a recent ticket open for this specific issue, and it is somewhat 
> related to #2 above. By default, Tango should happily handle Unicode names 
> in a portable manner between O/S. Your ticket has identified a problem with 
> the zip package, which does need to be fixed. Perhaps you'd like to try 
> fixing the bug in the zip package yourself? Tango is open-source, and 
> patches are always welcome. If you'd like to add some more multibyte 
> testcases to the codebase, we'd certainly be happy to run them.
> 
> 
> Hope that helps
> 
> 
> 
> 
> "yidabu" <yidabu.nospam at gmail.com> wrote in message 
> news:20080410071434.587eb8e9.yidabu.nospam at gmail.com...
> > Multibyte support on Windows, Phobos vs Tango, which is right ?
> >
> > 1  Phobos has a toMBSz function that converts the UTF-8 string s into a 
> > null-terminated string in a Windows
> >   8-bit character set,
> >   like this:
> >
> >    char* toMBSz(char[] s, uint codePage = 0)
> >    {
> >        // Only need to do this if any chars have the high bit set
> >        foreach (char c; s)
> >        {
> >            if (c >= 0x80)
> >            {
> >                //do convert
> >            }
> >        }
> >        return std.string.toStringz(s);
> >    }
> >
> >   Tango does not have this function. Is it necessary?
> >
> > 2  Is toMBSz(char[]) the same as char[] ~ '\0'?
> >
> >    For example, with CreateFileA:
> >
> >    Phobos way:
> >    char[] name;
> >    CreateFileA(toMBSz(name) ...)
> >
> >    Tango way:
> >    char[] name;
> >    CreateFileA( name ~ '\0' ...)
> >
> >    Is toMBSz(char[]) always the same as char[] ~ '\0'?
> >    Is toMBSz("Chinese汉语"c) always the same as "Chinese汉语"c ~ '\0'?
> >
> >    If Phobos is right, there are too many bugs in Tango: Tango uses char[] ~ '\0' 
> > everywhere when calling the A versions of the Windows API!
> >
> >
> > 3   Phobos zip vs Tango Zip
> >
> >    I used the Phobos zip module and it works fine; the trick is that 
> > zip.ArchiveMember.name must be in the locale encoding in a multibyte environment.
> >
> >    Tango way:
> >    char[][] files = [r"D:\Chinese中文.txt"];
> >    createArchive(r"test.zip", Method.Deflate, files);
> >
> >    causes an exception:
> >    object.Exception: cannot encode character "20013" in codepage 437.
> >
> >    Tango seems to lack multibyte support on Windows,
> >    and does not appear to have run special unit tests for a multibyte 
> > environment on Windows before publishing a new version.
> >

Kris,
    Thanks for your reply.
    
    1) I know about the CodePage module; the issue is that Tango does not use it to convert file names. 
    
    2) Since passing (char[] ~ '\0') to an ANSI Win32 API is not the right way, why not use the Phobos way instead? 
    Does passing toMBSz(char[]) to an ANSI Win32 API affect the library's portability? 
    Does using the ANSI Win32 API at all affect the library's portability? (My code is Unicode; it is only the ANSI Win32 API that needs the local code page encoding, not me :)

    Some Tango modules only have an ANSI Win32 API implementation. What can Tango users do? Copy the module somewhere and change (char[] ~ '\0') to toMBSz(char[]) before using it? 
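
To make the difference in 2) concrete, here is a minimal sketch. It is written for a current D2 compiler with Phobos rather than the 2008 toolchain discussed in this thread, which is an assumption on my part. isPlainAscii is a hypothetical helper, not part of Phobos or Tango; it mirrors the high-bit check in the toMBSz excerpt quoted above. The Windows-only branch assumes std.windows.charset.toMBSz is available, as it is in present-day Phobos.

```d
import std.stdio : writefln, writeln;

// Hypothetical helper (not part of Phobos or Tango): true when s holds only
// 7-bit ASCII, so its UTF-8 bytes are identical in any Windows ANSI code page
// and appending '\0' is enough for an A function.
bool isPlainAscii(const(char)[] s)
{
    foreach (char c; s)
    {
        if (c >= 0x80)
            return false;
    }
    return true;
}

void main()
{
    string ascii = "test.zip";
    string mixed = "Chinese汉语";

    writeln(isPlainAscii(ascii)); // ASCII only: name ~ '\0' is safe
    writeln(isPlainAscii(mixed)); // high-bit bytes present: conversion needed

    // Appending '\0' only terminates the string; the payload is still UTF-8.
    auto terminated = mixed ~ '\0';
    writefln("%(%02X %)", cast(const(ubyte)[]) terminated);

    version (Windows)
    {
        import core.stdc.string : strlen;
        import std.windows.charset : toMBSz;

        // toMBSz re-encodes into an ANSI code page (0 selects the system default).
        const(char)* ansi = toMBSz(mixed);
        writefln("%(%02X %)", cast(const(ubyte)[]) ansi[0 .. strlen(ansi)]);
    }
}
```

On a machine whose ANSI code page can represent the Chinese characters, the two hex dumps should differ after the ASCII prefix, which is why the two calls in 2) are only interchangeable for pure ASCII names.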
    
   3) Since Tango passes (char[] ~ '\0') to the ANSI Win32 API everywhere, it is sometimes difficult to debug the code.
   
   Thanks to the Tango team for the exciting library you offer to all of us. 

-- 

yidabu <yidabu.nospam at gmail.com>
DWin http://www.dsource.org/projects/dwin

D语言 中文支持(D Chinese Support)
http://www.d-programming-language-china.org/
http://bbs.d-programming-language-china.org/
http://dwin.d-programming-language-china.org/
http://scite4d.d-programming-language-china.org/



More information about the Digitalmars-d mailing list