Wide characters support in D

Mon Jun 7 22:26:56 PDT 2010

> You only need to do that where you are shipping closed
> source and for that, it should be trivial to get the
> compiler to generate all three versions. 

You will also need to do it in open source projects if you want to ship the generated template code in a dynamic library rather than in the user's program (otherwise the same code is repeated over and over across user programs, which is an unnecessary space burden).

But yes, closed source programs are a good example. True, you can compile all 3 versions. But the whole argument was about the additional generated code, which someone claimed would not happen.
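To make the "all 3 versions" point concrete, here is a minimal sketch. It is written in C++ rather than D, on the assumption that char, char16_t and char32_t play roughly the roles of D's char, wchar and dchar. The helper name count_units is hypothetical. A library that wants to export such a template from a compiled binary has to instantiate it once per character type, and each instantiation is a separate copy of the function body.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts the code units in `text` equal to `target`.
template <typename Char>
std::size_t count_units(const std::basic_string<Char>& text, Char target) {
    std::size_t n = 0;
    for (Char c : text) {
        if (c == target) ++n;
    }
    return n;
}

// Explicit instantiations: the compiler emits one copy of the function
// body per character type, so the library carries all three versions.
template std::size_t count_units<char>(const std::basic_string<char>&, char);
template std::size_t count_units<char16_t>(const std::basic_string<char16_t>&, char16_t);
template std::size_t count_units<char32_t>(const std::basic_string<char32_t>&, char32_t);
```

Without the explicit instantiations, the code is generated in each user program that calls the template, which is the repetition mentioned above.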

> 
> You're right: it depends. In the few cases I can think of
> where more of the D code will be interacting with non D code
> than just processing the text, you could almost use void[]
> as your type. Where would you care about the encoding but
> not do much worth it?
> 
> Also unless you have large amounts of text, you are going
> to have to work hard to get perf problems. If you do have
> large amounts of text, you are going to be I/O bound (cache
> misses etc.) and at that point, the cost of any operation,
> is its I/O. From that, reading in some data, doing a single
> pass of processing on it and writing it back out would only
> take 2/3 long with translations on both side.
> 

True. But even simple string handling is faster for UTF-16. Reading one 2-byte code unit from a UTF-16 string takes the same time as reading one 1-byte code unit from a UTF-8 string. Generally, we have to read one code unit after another (never more than that at a time), since the data is only guaranteed to be aligned on a 2-byte boundary for wchar and a 1-byte boundary for char. Not to mention that decoding a code point that spans 2 code units takes less time in UTF-16. And why not use this opportunity if the system already supports UTF-16 natively?
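As a rough illustration of the decoding argument, here is a minimal sketch in C++ (standing in for D, with char and char16_t in place of char and wchar). The function names decode_utf8 and decode_utf16 are hypothetical, and both functions assume well-formed input and do no validation. It shows only the shape of the work, not measured timings: the UTF-16 decoder has two cases (a single unit or a surrogate pair), while the UTF-8 decoder has to handle sequences of one to four bytes.

```cpp
#include <cstddef>
#include <string>

// Decode one code point from UTF-8 starting at index i, advancing i.
// Assumes well-formed input; no validation is performed.
inline char32_t decode_utf8(const std::string& s, std::size_t& i) {
    unsigned char b0 = static_cast<unsigned char>(s[i++]);
    if (b0 < 0x80) return b0;                        // 1-byte sequence (ASCII)
    int extra = (b0 >= 0xF0) ? 3 : (b0 >= 0xE0) ? 2 : 1;
    char32_t cp = b0 & (0x3F >> extra);              // payload bits of the lead byte
    for (int k = 0; k < extra; ++k) {
        // Each continuation byte contributes its low 6 bits.
        cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
    }
    return cp;
}

// Decode one code point from UTF-16 starting at index i, advancing i.
// Assumes well-formed input; no validation is performed.
inline char32_t decode_utf16(const std::u16string& s, std::size_t& i) {
    char16_t u0 = s[i++];
    if (u0 < 0xD800 || u0 > 0xDBFF) return u0;       // single code unit
    char16_t u1 = s[i++];                            // low surrogate of the pair
    return 0x10000 + ((static_cast<char32_t>(u0) - 0xD800) << 10)
                   + (static_cast<char32_t>(u1) - 0xDC00);
}
```

For text in the Basic Multilingual Plane, decode_utf16 returns after the first comparison, whereas decode_utf8 takes the multi-byte path for anything outside ASCII.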

In addition, I want to mention that reading and writing files in text mode is very transparent. For instance, on Windows, the conversion from multibyte to Unicode happens automatically for open, fopen, etc. when text mode is specified. In general, this is good practice, since 1-byte char text is not necessarily UTF-8 anyway and can be ANSI as well.

Also, some other operating systems use 2-byte UTF-16 natively, so this is not just about Windows. If I am not mistaken, Symbian is one such example.