Planning to migrate SDWF to Unicode

Stewart Gordon smjg_1998 at yahoo.com
Sat Jan 21 16:56:47 PST 2012


Those who've been following SDWF will by now have realised that it abuses char[] for ANSI 
strings, whereas D strings are meant to be in Unicode.  It's high time I did something 
about this.

When I started on it, I was still using Windows 98, which has very limited Unicode 
support.  But that was years ago now, and it must be getting on for 6 years since MS 
discontinued support for it.  So I might as well drop Windows 9x support, just like 
16-bit support was dropped with the creation of D (which was only 4 years after Windows 
95, after all).

As such, I plan to change SDWF to work in Unicode.  Probably using UTF-16 internally, but 
possibly giving the programmer the choice between UTF-8 and UTF-16.

But this raises the question of what to do with the existing char-based API.  Possibilities 
I've thought of:

(a) Just get rid of it.  Programmers upgrading to the new SDWF version will be forced to 
change instances of char to wchar; what more there is to do depends on what else the 
program does with character/string data.

(b) Keep functions that take a char or char[] parameter, make them convert from ANSI to 
UTF-16, but deprecate them.  Thinking about it now, there are problems:
- In order to have versions of each function that return an ANSI string and that return a 
Unicode string, I would need to name them differently, which could get ugly.
- When returning ANSI, what would happen to characters outside the code page?
- Mixing ANSI and Unicode could also have adverse effects on the interpretation of string 
literals.
So maybe this isn't a good plan at all.
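
To make the code page problem concrete, here is a rough sketch in Python rather than D, 
and nothing to do with SDWF's actual code.  It assumes the ANSI code page is Windows-1252 
(cp1252); the helper name to_ansi_lossy is made up for the illustration.  It shows the 
two obvious behaviours a deprecated ANSI-returning function could have: substitute a 
placeholder for anything that doesn't fit, or fail.

```python
# Illustration only: cp1252 stands in for "the ANSI code page" (an assumption).
text = "na\u00efve \u4e2d\u6587"  # "naive" with a diaeresis, then two CJK characters


def to_ansi_lossy(s, codepage="cp1252"):
    """Hypothetical helper: convert to ANSI, replacing unrepresentable characters with '?'."""
    return s.encode(codepage, errors="replace")


# Lossy path: the two CJK characters are silently turned into question marks.
ansi = to_ansi_lossy(text)
round_tripped = ansi.decode("cp1252")

# Strict path: the conversion refuses instead of losing data.
try:
    text.encode("cp1252")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

print(ansi, round_tripped, strict_ok)
```

Either way the caller loses something: data in the first case, or the guarantee that the 
call succeeds in the second.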

(c) Use versioning to give the programmer the choice of an ANSI API or a UTF-16 API, 
rather like the WindowsAPI bindings themselves.

(d) Change char functions to use UTF-8.  This would break any code that relies on the 
characters being ANSI, or even that manipulates text on a one-character, one-byte basis. 
As with (c), versioning could be used to give a choice between UTF-8 and UTF-16.
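
As a rough illustration of the kind of breakage meant in (d), again in Python rather than 
D and again assuming Windows-1252 as the ANSI code page: the same four-character string 
occupies a different number of code units in each encoding, and code that cuts a buffer 
at a byte count can split a character in two.

```python
# Illustration only: cp1252 stands in for "the ANSI code page" (an assumption).
text = "caf\u00e9"  # four characters, the last one an e with an acute accent

ansi = text.encode("cp1252")      # one byte per character
utf8 = text.encode("utf-8")       # the accented letter takes two bytes
utf16 = text.encode("utf-16-le")  # two bytes (one 16-bit code unit) per character here

# Code written for ANSI might "keep the first 4 characters" by keeping 4 bytes.
# In UTF-8 that cuts the accented letter in half, leaving invalid data.
truncated = utf8[:4]
try:
    truncated.decode("utf-8")
    truncation_valid = True
except UnicodeDecodeError:
    truncation_valid = False

print(len(ansi), len(utf8), len(utf16) // 2, truncation_valid)
```

UTF-16 is not immune either, since characters outside the Basic Multilingual Plane take 
two code units, but it bites far less often in practice.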


If path (b) or (c) is taken, the ANSI API could later be removed.  Once this is done, or 
if path (a) is taken, we could add UTF-8 support, thereby ending up at (d).

It's early days yet, but the thread I started a few hours ago ("D1, D2 and the future of 
libraries") could still lead to my migrating SDWF to D2.  If it does, I'll likely combine 
the migration to Unicode with this.

Thoughts?

Stewart.

