Multibyte support on Windows, Phobos vs Tango, which is right?
yidabu
yidabu.nospam at gmail.com
Thu Apr 10 00:58:45 PDT 2008
On Wed, 9 Apr 2008 23:51:59 -0800
"Kris" <foo at bar.com> wrote:
> Yidabu:
>
> Tango has a multi-platform API based around Unicode, thus it is not biased
> for windows, linux, or darwin. All the items you mention appear to be
> reasonably specific to Win32, so keep that in mind when reading this reply:
>
>
> 1) You'll find something functionally similar in tango.sys.win32.CodePage
>
>
> 2) Like many O/S, Tango expects file names to be Unicode. This helps make
> the library portable. On Win32 the blahW() functions are used, with utf8 to
> utf16 conversion applied internally, except when you explicitly stipulate
> the version=Win32SansUnicode compiler option. If you do that, Tango
> currently does no internal conversion for file names. In short, if you
> explicitly disable Unicode support within the library then you currently
> need to handle Win32 code-page conversion yourself (see #1). This might be a
> problem if you're running Tango on Win95 or an old Win32S hybrid
>
>
> 3) you have a recent ticket open for this specific issue, and it is somewhat
> related to #2 above. By default, Tango should happily handle Unicode names
> in a portable manner between O/S. Your ticket has identified a problem with
> the zip package, which does need to be fixed. Perhaps you'd like to try
> fixing the bug in the zip package yourself? Tango is open-source, and
> patches are always welcome. If you'd like to add some more multibyte
> testcases to the codebase, we'd certainly be happy to run them.
>
>
> Hope that helps
>
>
>
>
> "yidabu" <yidabu.nospam at gmail.com> wrote in message
> news:20080410071434.587eb8e9.yidabu.nospam at gmail.com...
> > Multibyte support on Windows, Phobos vs Tango, which is right ?
> >
> > 1 Phobos has a toMBSz function that converts the UTF-8 string s into a
> > null-terminated string in a Windows
> > 8-bit character set,
> > like this:
> >
> > char* toMBSz(char[] s, uint codePage = 0)
> > {
> >     // Only need to do this if any chars have the high bit set
> >     foreach (char c; s)
> >     {
> >         if (c >= 0x80)
> >         {
> >             //do convert
> >         }
> >     }
> >     return std.string.toStringz(s);
> > }
> >
> > Tango does not have this function. Is it necessary?
> >
> > 2 Is toMBSz(char[]) the same as char[] ~ '\0' ?
> >
> > for example, CreateFileA
> >
> > Phobos way:
> > char[] name;
> > CreateFileA(toMBSz(name) ...)
> >
> > Tango way:
> > char[] name;
> > CreateFileA( name ~ '\0' ...)
> >
> > Is toMBSz(char[]) always the same as char[] ~ '\0' ?
> > Is toMBSz("Chinese汉语"c) always the same as "Chinese汉语"c ~ '\0' ?
> >
> > If Phobos is right, there are many bugs in Tango: Tango uses char[] ~ '\0'
> > everywhere when calling the A versions of the Windows API!
> >
> >
> > 3 Phobos zip vs Tango Zip
> >
> > I used the Phobos zip module and it works fine; the one trick is that
> > zip.ArchiveMember.name should be encoded in the locale code page in a
> > multibyte environment.
> >
> > Tango way:
> > char[][] files = [r"D:\Chinese中文.txt"];
> > createArchive(r"test.zip", Method.Deflate, files);
> >
> > causes an exception:
> > object.Exception: cannot encode character "20013" in codepage 437.
> >
> > Tango seems to lack multibyte support on Windows,
> > and it seems no dedicated unit tests for multibyte environments on Windows
> > were run before publishing a new version.
> >
Kris,
Thanks for your reply.
1) I know the CodePage module; the issue is that Tango does not use it to convert file names.
2) Since passing (char[] ~ '\0') to an ANSI Win32 API is not the right way, why not do it the Phobos way instead?
Does passing toMBSz(char[]) to an ANSI Win32 API affect the library's portability?
Does the ANSI Win32 API affect the library's portability at all? (My code is Unicode; only the ANSI Win32 API needs local code-page encoding, not me :)
Some Tango modules only have an ANSI Win32 API implementation. What can Tango users do? Copy the module somewhere and change (char[] ~ '\0') to toMBSz(char[]) before using it?
3) Since Tango passes (char[] ~ '\0') to ANSI Win32 APIs everywhere, the code is sometimes difficult to debug.
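To illustrate point 2, here is a minimal sketch. It is written in current D rather than the D1 of this thread, and needsCodePageConversion is a hypothetical helper of mine, not part of Phobos or Tango. It mirrors the high-bit test in the Phobos toMBSz excerpt quoted above: for a pure ASCII name, appending '\0' is enough, but once any byte is >= 0x80 the string is multi-byte UTF-8, which generally does not match what an ANSI Win32 function expects in the active code page.

```d
import std.stdio : writeln;

// Hypothetical helper (not a Phobos or Tango function): returns true when
// the UTF-8 string s contains at least one non-ASCII code unit, i.e. when
// simply appending '\0' would hand raw UTF-8 bytes to an ANSI ("A") API.
bool needsCodePageConversion(const(char)[] s)
{
    // Iterate over UTF-8 code units (bytes), not decoded characters.
    foreach (char c; s)
    {
        if (c >= 0x80)
            return true;
    }
    return false;
}

void main()
{
    // Pure ASCII: name ~ '\0' and a code-page conversion give the same bytes.
    writeln(needsCodePageConversion("test.zip"));      // false

    // Contains Chinese characters: the UTF-8 bytes need converting first.
    writeln(needsCodePageConversion("Chinese汉语"));   // true
}
```

So the two forms agree only for ASCII names; for anything else the name would have to go through a code-page conversion (as toMBSz does) or through the W functions after a UTF-8 to UTF-16 conversion.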
Thanks to the Tango team for the exciting library you have offered to all of us.
--
yidabu <yidabu.nospam at gmail.com>
DWin http://www.dsource.org/projects/dwin
D语言 中文支持(D Chinese Support)
http://www.d-programming-language-china.org/
http://bbs.d-programming-language-china.org/
http://dwin.d-programming-language-china.org/
http://scite4d.d-programming-language-china.org/