The Case Against Autodecode
Tobias M via Digitalmars-d
digitalmars-d at puremagic.com
Sun May 29 06:13:36 PDT 2016
On Sunday, 29 May 2016 at 12:41:50 UTC, Chris wrote:
> Ok, you have a point there; to be precise, <sh> is a multigraph
> (a digraph) (cf. [1]). In French you can have multigraphs
> consisting of three or more characters <eau> /o/, as in Irish
> <aoi> => /i:/. However, a phoneme is not necessarily a spoken
> "character" as <sh> represents one phoneme but consists of two
> "characters" or graphemes. <th> can represent two different
> phonemes (voiced and unvoiced "th" as in `this` vs. `thorough`).
What I meant was, a phoneme is the "character" (smallest unit) in
a spoken language, not that it corresponds to a character
(whatever that means).
> My point was that we have to be _very_ careful not to mix our
> cultural experience with written text with machine
> representations. There's bound to be confusion. That's why we
> should always make clear what we refer to when we use the words
> grapheme, character, code point etc.
I used 'character' in quotes because it's not a well-defined
term. Code point, grapheme and phoneme are well-defined.
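
To make the distinction concrete on the machine side, here is a minimal
sketch in Python (chosen only for illustration; it is not from the thread).
The helper approx_graphemes is hypothetical and is a rough approximation
that attaches combining marks to the preceding base code point; it is not
the full Unicode grapheme cluster algorithm from UAX #29.

```python
import unicodedata


def approx_graphemes(text):
    """Group each base code point with the combining marks that follow it.

    Assumption: a simplified stand-in for grapheme clustering, not UAX #29.
    """
    clusters = []
    for ch in text:
        # combining() returns a non-zero class for combining marks
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


s = "e\u0301"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT

code_units = len(s.encode("utf-8"))   # UTF-8 code units (bytes)
code_points = len(s)                  # a Python str is indexed by code point
graphemes = len(approx_graphemes(s))  # approximate grapheme count

print(code_units, code_points, graphemes)
```

The same string gives three different lengths depending on the unit being
counted. By contrast, approx_graphemes("sh") returns two clusters: the
digraph <sh> is a single phoneme, but it is still two code points and two
graphemes in the encoding sense.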