UTF-8 Everywhere

Sun Jun 19 23:36:09 PDT 2016

To me it seems that processing is often more efficient with UCS-4
(which I treat as UTF-32). Storage is clearly more efficient with
UTF-8, but access is more direct with UCS-4: every code point occupies
exactly four bytes, so the nth code point can be indexed directly,
whereas UTF-8 has to be scanned from the start. I agree that UTF-8 is
generally preferable where it can be used efficiently, but that isn't
everywhere. The real problem is efficient bidirectional conversion,
which D already appears to handle fairly well with text() and
dtext(). (I don't see any use for UTF-16. To me it looks like a first
attempt that should have been deprecated.)
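To make the access-versus-storage trade-off concrete, here is a minimal
language-neutral sketch in Python (not D; the helper names
utf8_codepoint_offset and utf32_codepoint are hypothetical). Finding the
nth code point in UTF-8 needs a linear scan over lead bytes, while
UTF-32 is a fixed-width multiply-and-index:

```python
def utf8_codepoint_offset(data: bytes, n: int) -> int:
    """Byte offset of the n-th (0-based) code point in UTF-8 data.

    O(len(data)): a code point takes 1 to 4 bytes, so we count lead
    bytes, i.e. any byte that is not a continuation byte (0b10xxxxxx).
    """
    count = 0
    for offset, byte in enumerate(data):
        if byte & 0xC0 != 0x80:  # lead byte: a new code point starts here
            if count == n:
                return offset
            count += 1
    raise IndexError("code point index out of range")


def utf32_codepoint(data: bytes, n: int) -> int:
    """The n-th code point of UTF-32LE data in O(1): fixed 4-byte units."""
    start = 4 * n
    if n < 0 or start + 4 > len(data):
        raise IndexError("code point index out of range")
    return int.from_bytes(data[start:start + 4], "little")


s = "naïve €"                  # 7 code points
u8 = s.encode("utf-8")         # 10 bytes: 'ï' takes 2, '€' takes 3
u32 = s.encode("utf-32-le")    # 28 bytes: always 4 per code point, no BOM

print(len(u8), len(u32))
print(u8[utf8_codepoint_offset(u8, 6):].decode("utf-8"))
print(hex(utf32_codepoint(u32, 6)))
```

Both encodings round-trip to the same string, so the choice is purely
about which cost (bytes of storage or scanning on access) matters more
for a given workload.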

On 06/19/2016 05:49 PM, Walter Bright via Digitalmars-d wrote:
> http://utf8everywhere.org/
>
> It has a good explanation of the issues and problems, and how these 
> things came to be.
>
> This is pretty much in line with my current (!) opinion on Unicode. 
> What it means for us is I don't think it is that important anymore for 
> algorithms to support strings of UTF-16 or UCS-4.