First Impressions!

H. S. Teoh hsteoh at quickfur.ath.cx
Fri Dec 1 23:16:45 UTC 2017


On Fri, Dec 01, 2017 at 03:04:44PM -0800, Walter Bright via Digitalmars-d wrote:
> On 11/30/2017 9:23 AM, Kagamin wrote:
> > On Tuesday, 28 November 2017 at 03:37:26 UTC, rikki cattermole wrote:
> > > Be aware Microsoft is alone in thinking that UTF-16 was awesome.
> > > Everybody else standardized on UTF-8 for Unicode.
> > 
> > UCS2 was awesome. UTF-16 is used by Java, JavaScript, Objective-C,
> > Swift, Dart and ms tech, which is 28% of tiobe index.
> 
> "was" :-) Those are pretty much pre-surrogate pair designs, or based
> on them (Dart compiles to JavaScript, for example).
> 
> UCS2 has serious problems:
> 
> 1. Most strings are in ascii, meaning UCS2 doubles memory consumption.
> Strings in the executable file are twice the size.

This is not true in Asia, especially where the CJK block is used
extensively. A CJK block character takes 3 bytes in UTF-8 versus 2 bytes
in UCS2, so string sizes are 150% of the UCS2 encoding.  If your code
contains a lot of CJK text, that's a lot of bloat.
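
To make the arithmetic concrete, here is a rough sketch in Python (not D,
purely for illustration). The sample strings and the helper name
encoded_sizes are my own, hypothetical choices. Python has no UCS2 codec,
so this assumes "utf-16-le" as a stand-in, which produces the same bytes
as UCS2 for characters in the Basic Multilingual Plane:

```python
# Compare encoded sizes of an ASCII string and a CJK string.
# Assumption: "utf-16-le" stands in for UCS2; the two are byte-identical
# for BMP characters such as the ones used here.

def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, with no BOM counted."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

ascii_sample = "hello"      # 5 ASCII characters
cjk_sample = "日本語漢字"    # 5 characters from the CJK Unified Ideographs block

ascii_sizes = encoded_sizes(ascii_sample)
cjk_sizes = encoded_sizes(cjk_sample)

print(ascii_sizes)  # (5, 10): the 2-byte encoding doubles ASCII text
print(cjk_sizes)    # (15, 10): UTF-8 is 150% of the 2-byte encoding
```

So the same encoding choice that halves the size of ASCII text costs 50%
extra on text drawn from the CJK block.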

But then again, in non-Latin locales you'd generally store your strings
separately from the executable (usually in l10n files), so this may not be
that big an issue. But the blanket statement "Most strings are in ASCII"
is not correct.


T

-- 
Bare foot: (n.) A device for locating thumb tacks on the floor.

