The Case Against Autodecode

H. S. Teoh via Digitalmars-d digitalmars-d at puremagic.com
Tue May 31 12:46:35 PDT 2016


On Tue, May 31, 2016 at 07:40:13PM +0000, Wyatt via Digitalmars-d wrote:
> On Tuesday, 31 May 2016 at 19:20:19 UTC, Timon Gehr wrote:
> > 
> > The 'length' of a character is not one in all contexts.
> > The following text takes six columns in my terminal:
> > 
> > 日本語
> > 123456
> 
> That's a property of your font and font rendering engine, not Unicode.
> (Also, it's probably not quite six columns; in most fonts I've tested,
> 漢字 are rendered as something like 1.5 characters wide, assuming your
> terminal doesn't overlap them.)
[...]

I believe he was talking about a console terminal that uses two columns
to render so-called "double-width" characters. Unicode does contain
"fullwidth" versions of selected characters (e.g., most of printable
ASCII, in the Halfwidth and Fullwidth Forms block), intended for use
alongside such characters.

Of course, using string length to measure string width is a risky
venture fraught with pitfalls, because your terminal may not actually
render the characters the way you expect. Nevertheless, it does serve to
highlight why a construct like s.walkLength, which under autodecoding
silently returns the number of code points, is essentially buggy: there
is not enough information to determine which length it should return.
The length of the buffer in bytes? The number of code points? The number
of graphemes? The display width of the string? No matter which choice
you make, it only works for a subset of cases and is wrong for the rest.
This is a prime illustration of why forcing autodecoding on every string
in D is the wrong design.
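To make the ambiguity concrete, here is a rough sketch (in Python rather
than D, purely for illustration; the helper name `lengths` is
hypothetical) that computes all four candidate "lengths" of the same
string. It assumes UTF-8 for the byte count, approximates graphemes by
folding combining marks into the preceding character (not full UAX #29
segmentation), and approximates terminal width from the East Asian Width
property the way a simplified wcwidth() would. Real terminals and fonts
may still disagree.

```python
import unicodedata


def lengths(s: str) -> dict:
    """Four different answers to "how long is s?" (hypothetical helper).

    The grapheme count is a rough approximation: it only folds
    combining marks into the preceding character and does NOT
    implement the full UAX #29 rules (ZWJ emoji sequences, Hangul
    jamo, regional-indicator pairs, ...).
    """
    utf8_bytes = len(s.encode("utf-8"))  # size of the UTF-8 buffer
    code_points = len(s)                 # Python str indexes by code point
    graphemes = sum(1 for ch in s if unicodedata.combining(ch) == 0)
    # Simplified terminal width: combining marks take 0 columns,
    # East Asian Wide/Fullwidth characters take 2, everything else 1.
    width = sum(
        0 if unicodedata.combining(ch)
        else 2 if unicodedata.east_asian_width(ch) in ("W", "F")
        else 1
        for ch in s
    )
    return {"bytes": utf8_bytes, "code_points": code_points,
            "graphemes": graphemes, "width": width}


print(lengths("日本語"))
print(lengths("e\u0301"))  # 'e' followed by COMBINING ACUTE ACCENT
```

For 日本語 the four answers are 9, 3, 3 and 6; for "e" plus a combining
acute accent they are 3, 2, 1 and 1. In D terms, s.length gives the UTF-8
code unit count, s.walkLength (autodecoded) gives the code point count,
and s.byGrapheme.walkLength (std.uni) gives the grapheme count.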


T

-- 
It is not the gift that is dear, but the love.

