The Case Against Autodecode

Sun May 29 13:47:32 PDT 2016

On Sun, May 29, 2016 at 03:55:22PM -0400, Andrei Alexandrescu via Digitalmars-d wrote:
> On 05/29/2016 09:42 AM, Tobias M wrote:
> > On Friday, 27 May 2016 at 19:43:16 UTC, H. S. Teoh wrote:
> > > On Fri, May 27, 2016 at 03:30:53PM -0400, Andrei Alexandrescu via
> > > Digitalmars-d wrote:
> > > > On 5/27/16 3:10 PM, ag0aep6g wrote:
> > > > > I don't think there is value in distinguishing by language.
> > > > > The point of Unicode is that you shouldn't need to do that.
> > > > 
> > > > It seems code points are kind of useless because they don't
> > > > really mean anything, would that be accurate? -- Andrei
> > > 
> > > That's what we've been trying to say all along! :-P  They're a
> > > kind of low-level Unicode construct used for building "real"
> > > characters, i.e., what a layperson would consider to be a
> > > "character".
> > 
> > Code points are *the fundamental unit* of Unicode. AFAIK most (all?)
> > algorithms in the Unicode spec are defined in terms of code points.
> > Sure, some algorithms also work on the code unit level. That can be
> > used as an optimization, but they are still defined on code points.
> > 
> > Code points also abstract over the different representations
> > (UTF-...), providing a uniform "interface".
> 
> So now code points are good? -- Andrei

It depends on what you're trying to accomplish. That's the point we're
trying to get at. For some operations, working with code points makes
the most sense; for others, it does not. No single representation is
best for all situations; the choice has to be made case by case. That
is why forcing everything to decode to code points, as autodecoding
does, eventually leads to problems.
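To make the point concrete, here is a small illustration. It is written in Python rather than D purely as a neutral sketch, and the helper name `views` is hypothetical. It counts the same user-visible "é" at several levels: UTF-8 code units, UTF-16 code units, code points, and code points after NFC normalization. Both strings render as a single character, yet their code-unit and code-point counts differ. NFC normalization is used here only as a rough stand-in for "what a layperson calls a character". It is not grapheme segmentation, and many grapheme clusters have no single precomposed code point.

```python
import unicodedata


def views(s: str) -> dict:
    """Measure one string at several levels of Unicode representation."""
    return {
        # bytes in UTF-8 (each byte is one code unit)
        "utf8_units": len(s.encode("utf-8")),
        # 16-bit code units in UTF-16 (no BOM, so divide the byte count by 2)
        "utf16_units": len(s.encode("utf-16-le")) // 2,
        # Python's str length counts code points
        "code_points": len(s),
        # code points after canonical composition (NFC)
        "nfc_code_points": len(unicodedata.normalize("NFC", s)),
    }


precomposed = "\u00e9"  # "é" as the single code point U+00E9
decomposed = "e\u0301"  # "e" followed by U+0301 COMBINING ACUTE ACCENT

print(views(precomposed))
print(views(decomposed))
# Code-point comparison treats the two spellings as different strings...
print(precomposed == decomposed)
# ...while comparison after normalization treats them as equal.
print(unicodedata.normalize("NFC", decomposed) == precomposed)
```

Which count is "right" depends on the operation. Byte offsets matter for slicing a buffer, code points matter for looking up character properties, and equality that users expect needs normalization (or grapheme-level handling).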

T
