Ansi vs Unicode API
Denis Koroskin
2korden at gmail.com
Mon Nov 16 03:10:42 PST 2009
On Mon, 16 Nov 2009 12:36:30 +0300, Walter Bright
<newshound1 at digitalmars.com> wrote:
> Denis Koroskin wrote:
>> I'd like to raise 2 issues for a discussion.
>> First, Phobos makes calls to different functions based on the OS it is
>> running on (e.g. CreateFileA vs. CreateFileW), and I wonder if that's
>> *really* necessary, since Microsoft provides a Unicode layer for the
>> legacy 9x operating systems.
>> All an application needs to do to call the W API on those OSes is link
>> with unicows.lib (which could be part of Phobos). It does nothing on
>> Win2k+ and only kicks in on the 9x OS family.
>> A very good overview of it is written here:
>> http://msdn.microsoft.com/en-us/goglobal/bb688166.aspx
>
> unicows doesn't do anything more than what Phobos already does in
> attempting to translate Unicode into the local code page. All that using
> unicows would accomplish is confusion and installation problems, as the
> user would have to get a copy of unicows and install it; unicows doesn't
> exist on default Windows 9x installations.
>
> There is simply no advantage to unicows.
>
>
End users don't have to worry about it at all. They would just use the W
functions all the time, and unicows would kick in and translate UTF-16
strings into ANSI strings automatically on those operating systems. The
change would be transparent to them. There is also a redistributable
version of unicows, so users who want to deploy their software on Win9x
could ship that instead of forcing a manual install of the .dll.
I was initially going to propose dropping Win9x support altogether, but
thought that might get a hostile reception...
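
To make the first point concrete, here is a minimal sketch of the always-W
approach (not actual Phobos code; the helper name openForReading is made up,
and I'm assuming whichever module provides the CreateFileW prototype --
core.sys.windows.windows here -- plus std.utf.toUTF16z):

import core.sys.windows.windows;   // CreateFileW, HANDLE, GENERIC_READ, ...
import std.utf : toUTF16z;         // UTF-8 string -> zero-terminated UTF-16

// Sketch only: convert once and call the W entry point unconditionally;
// on Win9x the unicows layer would map this onto the ANSI API for us.
HANDLE openForReading(string path)
{
    return CreateFileW(toUTF16z(path),
                       GENERIC_READ,
                       FILE_SHARE_READ,
                       null,                   // default security attributes
                       OPEN_EXISTING,
                       FILE_ATTRIBUTE_NORMAL,
                       null);                  // no template file
}

No OS check anywhere in user code; the A-vs-W decision disappears entirely.
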
>> Second, "A" API accepts ansi strings as parameters, not UTF-8 strings.
>> I think this should be reflected in the function signatures, since D
>> encourages distinguishing between UTF-8 and ANSI strings and not store
>> the latter as char[].
>> LPCSTR currently resolves to char*/const(char)*, but it could be
>> better for it to be an alias to ubyte*/const(ubyte)* so that user
>> couldn't pass unicode string to an API that doesn't expect one. The
>> same is applicable to other APIs, too, for example, how does C stdlib
>> co-operate with Unicode? I.e. is core.stdc.stdio.fopen() unicode-aware?
>
> Calling C functions means one needs to pass them what the host C system
> expects. C itself doesn't define what character set char* holds. If you
> use the Phobos functions, those are required to work with Unicode.
Since char*/char[] denotes a sequence of Unicode characters in D, I see no
reason for an API that works with ANSI characters to accept it. For
example, there is a std.windows.charset.toMBSz() function that returns an
ANSI variant of a Unicode string; I think it would be preferable for it to
return a ubyte sequence instead of a char sequence.
Ideally, I'd like to see all the functions that aren't guaranteed to work
with UTF-8 strings accept ubyte*/ubyte[] instead.
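
To sketch what I mean (the names AnsiStringz, toAnsiZ and the stricter
DeleteFileA declaration below are hypothetical illustrations, not a proposed
Phobos API; toMBSz is the existing function mentioned above):

import std.windows.charset : toMBSz;  // existing Phobos helper (Windows only)

alias AnsiStringz = const(ubyte)*;    // what an ANSI LPCSTR could resolve to

// Hypothetical wrapper that makes the code-page conversion explicit
// and visible in the type system.
AnsiStringz toAnsiZ(in char[] utf8, uint codePage = 0)
{
    return cast(AnsiStringz) toMBSz(utf8, codePage);
}

// An "A" import declared with the stricter parameter type (illustrative).
extern (Windows) int DeleteFileA(AnsiStringz lpFileName);

void removeIt(string utf8Path)
{
    DeleteFileA(toAnsiZ(utf8Path));   // fine: the conversion is explicit
    // DeleteFileA(utf8Path.ptr);     // wouldn't compile: char* is not ubyte*
}

With LPCSTR itself aliased to const(ubyte)*, the second call would be
rejected at compile time instead of silently handing UTF-8 to an ANSI API.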