Today's programming challenge - How's your Range-Fu ?

H. S. Teoh via Digitalmars-d digitalmars-d at puremagic.com
Sat Apr 18 11:28:12 PDT 2015


On Sat, Apr 18, 2015 at 10:50:18AM -0700, Walter Bright via Digitalmars-d wrote:
> On 4/18/2015 4:35 AM, Jacob Carlborg wrote:
> >\u0301 is the "combining acute accent" [1].
> >
> >[1] http://www.fileformat.info/info/unicode/char/0301/index.htm
> 
> I won't deny what the spec says, but it doesn't make any sense to have
> two different representations of eacute, and I don't know why anyone
> would use the two code point version.

Well, *somebody* has to convert it to the single code point eacute,
whether it's the human (if the keyboard has a single key for it), or the
code interpreting keystrokes (the user may have typed it as e +
combining acute), or the program that generated the combination, or the
program that receives the data. When we don't know the provenance of
incoming data, we have to assume the worst and run normalization to be
sure that we got it right.
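
To make that concrete, here is a minimal sketch. It is in Python rather
than D, purely for illustration, and uses the standard unicodedata
module; the names composed, decomposed and to_nfc are made up for this
example. It shows that the two representations compare unequal as raw
strings and only compare equal once both have been normalized (NFC is
assumed here as the target form):

```python
import unicodedata

# Two representations of the same user-perceived character.
composed = "\u00e9"      # one code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # two code points: 'e' + COMBINING ACUTE ACCENT


def to_nfc(text: str) -> str:
    """Normalize incoming text of unknown provenance to NFC."""
    return unicodedata.normalize("NFC", text)


# A raw code-point comparison treats the two strings as different...
raw_equal = composed == decomposed

# ...while comparing after normalization treats them as the same text.
normalized_equal = to_nfc(composed) == to_nfc(decomposed)
```

Here raw_equal comes out false and normalized_equal comes out true,
which is why data of unknown origin has to go through normalization
before any comparison.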

The two code-point version may also arise from string concatenation, in
which case normalization has to be run again on the result (or, given
the right algorithms, possibly only from the point of concatenation).
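
A small sketch of the concatenation case, again in Python with the
standard unicodedata module for illustration only (is_nfc, left, right
and joined are hypothetical names). Each piece is already in NFC on its
own, yet the joined string is not, so the result has to be normalized
again. This sketch simply renormalizes the whole string rather than
attempting the smarter "from the point of concatenation" approach:

```python
import unicodedata


def is_nfc(text: str) -> bool:
    """True if normalizing to NFC would leave the text unchanged."""
    return unicodedata.normalize("NFC", text) == text


left = "cafe"      # plain ASCII, already in NFC
right = "\u0301"   # a lone COMBINING ACUTE ACCENT, also already in NFC

# Concatenation places the combining mark directly after the 'e'.
joined = left + right

# The simple (not the cheapest) fix: renormalize the whole result.
renormalized = unicodedata.normalize("NFC", joined)
```

After this, is_nfc(left) and is_nfc(right) both hold but is_nfc(joined)
does not; renormalized is "caf" followed by the single code point
U+00E9.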


T

-- 
Mediocrity has been pushed to extremes.

