ANSI vs Unicode API

Denis Koroskin 2korden at gmail.com
Mon Nov 16 03:10:42 PST 2009


On Mon, 16 Nov 2009 12:36:30 +0300, Walter Bright  
<newshound1 at digitalmars.com> wrote:

> Denis Koroskin wrote:
>> I'd like to raise 2 issues for a discussion.
>>  First, Phobos makes calls to different functions, based on the OS we  
>> are running on (e.g. CreateFileA vs. CreateFileW) and I wonder if it's  
>> *really* necessary, since Microsoft has a Unicode Layer for those  
>> Operating Systems.
>>  All an application needs to do to call W API on those OS'es is link  
>> with unicows.lib (which could be a part of Phobos). It does nothing on  
>> Win2k+ and only triggers on 9x OS family.
>>  A very good overview of it is written here:
>> http://msdn.microsoft.com/en-us/goglobal/bb688166.aspx
>
> The unicows library doesn't do anything more than what Phobos does in  
> attempting to translate Unicode into the local code page. All that using  
> unicows would do is cause confusion and installation problems, as the user  
> would have to get a copy of unicows and install it; unicows doesn't exist  
> on default Windows 9x installations.
>
> There is simply no advantage to unicows.
>
>

End users don't have to worry about it at all. They would just call the W  
functions all the time, and on those operating systems unicows would kick in  
and translate the UTF-16 strings into ANSI strings automatically. The change  
would be transparent to them. There is also a redistributable version of  
unicows, so users who want to deploy their software on Win9x could ship it  
instead of forcing a manual install of the .dll.
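
To illustrate the difference, here is a rough sketch of the two approaches  
(this is not the actual Phobos source; it assumes the usual Windows  
declarations plus std.utf.toUTF16z and std.windows.charset.toMBSz):

import core.sys.windows.windows;
import std.utf : toUTF16z;
import std.windows.charset : toMBSz;

// What Phobos-style code does today: pick the A or W entry point at runtime.
HANDLE openFileDispatch(string name)
{
    if (GetVersion() < 0x80000000)   // high bit clear: NT family, wide API is native
        return CreateFileW(toUTF16z(name), GENERIC_READ, 0, null,
                           OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, null);
    else                             // 9x family: convert to the ANSI code page first
        return CreateFileA(toMBSz(name), GENERIC_READ, 0, null,
                           OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, null);
}

// With unicows.lib linked in, the wide call alone would do on every OS;
// on 9x the layer translates the UTF-16 arguments to ANSI behind the scenes.
HANDLE openFileAlwaysW(string name)
{
    return CreateFileW(toUTF16z(name), GENERIC_READ, 0, null,
                       OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, null);
}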

I was initially going to propose dropping Win9x support altogether, but  
thought that might get a hostile reception...

>> Second, the "A" API accepts ANSI strings as parameters, not UTF-8 strings.  
>> I think this should be reflected in the function signatures, since D  
>> encourages distinguishing between UTF-8 and ANSI strings and not storing  
>> the latter as char[].
>>  LPCSTR currently resolves to char*/const(char)*, but it could be better  
>> for it to be an alias to ubyte*/const(ubyte)* so that a user couldn't pass  
>> a Unicode string to an API that doesn't expect one. The same applies to  
>> other APIs, too; for example, how does the C stdlib cooperate with  
>> Unicode? I.e. is core.stdc.stdio.fopen() Unicode-aware?
>
> Calling C functions means one needs to pass them what the host C system  
> expects. C itself doesn't define what character set char* uses. If you use  
> the Phobos functions, those are required to work with Unicode.

Since char*/char[] denotes a sequence of Unicode characters in D, I see no  
reason for an API that works with ANSI characters to accept it. For example,  
the std.windows.charset.toMBSz() function returns an ANSI variant of a  
Unicode string; I think it would be preferable for it to return a ubyte  
sequence instead of a char sequence.

Ideally, I'd like all functions that aren't guaranteed to work with UTF-8  
strings to accept ubyte*/ubyte[] instead.
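
A rough sketch of what I mean (AnsiStringz, ansiPrint and toAnsiz are  
made-up names purely for illustration; only std.windows.charset.toMBSz is  
real):

import core.stdc.stdio : printf;
import std.windows.charset : toMBSz;

alias const(ubyte)* AnsiStringz;        // what LPCSTR would alias to under this proposal

// An API that genuinely expects ANSI bytes, declared with the byte-based type:
void ansiPrint(AnsiStringz s)
{
    printf("%s\n", cast(const(char)*) s);   // hand the raw bytes to C
}

// A toMBSz-style conversion that returns ANSI bytes rather than char*:
AnsiStringz toAnsiz(string s)
{
    return cast(AnsiStringz) toMBSz(s);     // reuse the existing converter, retype the result
}

void main()
{
    string utf8 = "résumé";            // char[]/string is UTF-8 in D
    // ansiPrint(utf8.ptr);            // wouldn't compile: char* doesn't convert to ubyte*
    ansiPrint(toAnsiz(utf8));          // the conversion is now explicit and visible
}

The point is that the compiler would catch accidental char*-to-ANSI leaks  
instead of the API silently misinterpreting the bytes.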


