The Case Against Autodecode

Sun May 29 06:13:36 PDT 2016

On Sunday, 29 May 2016 at 12:41:50 UTC, Chris wrote:
> Ok, you have a point there. To be precise, <sh> is a multigraph 
> (a digraph) (cf. [1]). In French you can have multigraphs 
> consisting of three or more characters <eau> /o/, as in Irish 
> <aoi> => /i:/. However, a phoneme is not necessarily a spoken 
> "character" as <sh> represents one phoneme but consists of two 
> "characters" or graphemes. <th> can represent two different 
> phonemes (voiced and unvoiced "th" as in `this` vs. `thorough`).

What I meant was that a phoneme is the "character" (the smallest 
unit) of a spoken language, not that it corresponds to a written 
character (whatever that means).

> My point was that we have to be _very_ careful not to mix our 
> cultural experience with written text with machine 
> representations. There's bound to be confusion. That's why we 
> should always make clear what we refer to when we use the words 
> grapheme, character, code point etc.

I put 'character' in quotes because it's not a well-defined 
term. Code point, grapheme and phoneme are well defined.
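
To make the machine-level terms concrete, here is a rough Python sketch that counts the same text at three levels. The helper name `count_units` is made up for illustration, and the grapheme count is only an approximation: it treats every code point that is not a combining mark as the start of a new grapheme, which is far simpler than the full Unicode segmentation rules (UAX #29).

```python
import unicodedata


def count_units(text):
    """Return (UTF-8 code units, code points, approximate graphemes)."""
    # Code units: the bytes of the UTF-8 encoding.
    code_units = len(text.encode("utf-8"))
    # Code points: Python strings are sequences of code points.
    code_points = len(text)
    # Approximation (assumption): a grapheme starts at each code point
    # whose canonical combining class is 0, i.e. not a combining mark.
    graphemes = sum(1 for ch in text if unicodedata.combining(ch) == 0)
    return code_units, code_points, graphemes


# "e" followed by U+0301 COMBINING ACUTE ACCENT: one grapheme, two code points.
print(count_units("e\u0301"))
# Precomposed U+00E9: one grapheme, one code point, two UTF-8 code units.
print(count_units("\u00e9"))
# The digraph <sh>: two graphemes, even though it spells a single phoneme.
print(count_units("sh"))
```

The last call is the case from the quoted text: the counts say nothing about phonemes, which is why mixing the spoken-language and machine-level terms causes confusion.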