Walter's Famous German Language Essentials Guide

H. S. Teoh via Digitalmars-d digitalmars-d at puremagic.com
Thu May 5 16:47:15 PDT 2016


On Thu, May 05, 2016 at 05:20:00PM +0000, Chris via Digitalmars-d wrote:
> As a note on the side, there are those who say that letter-to-sound
> systems should never be rule based, they should purely be based on
> machine learning.  The proponents of this are usually native English
> speakers. For English you do need machine learning. For Spanish not so
> much. If you can feed the computer the rule "ch" = /tʃ/, why would you
> want to train it :)

Rule-based letter-to-sound systems don't work too well for English
precisely because you have to basically reproduce 500 years' worth of
sound change plus all the exceptions introduced by words borrowed from
other contemporary languages across the centuries. A rule-based system
could possibly work, provided the rules were extensive enough (and
multi-layered, to account for borrowed exceptions and other oddities).
But there comes a point where even the most industrious programmer would
throw up his hands and say, forget this exercise in futility, let's just
have the machine teach itself instead.

Rule-based systems work better for Spanish because the orthography is
much closer to actual pronunciation, and other parameters such as stress
are more predictable.  I'd venture to guess that rule-based systems might
not work as well for Russian, in spite of the orthography being almost
1-to-1 with actual pronunciation, because of unpredictable stress
positions, which can fundamentally alter vowel values. At best, you'd
need a database of stress patterns for various words so that the accent
would fall in the correct places, plus a set of exceptions for certain
archaic word combinations that have unusual stress.  If you had a
database of English stress positions, I think half the battle would
already be won.
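To make the "rules plus exceptions" idea concrete, here is a minimal D sketch of a rule-based letter-to-sound converter for Spanish. It is an illustration under loose assumptions, not a real system: the rule table, the exception table, and the names `Rule`, `rules`, and `toPhonemes` are all hypothetical, and the rules ignore stress, regional variation, and context-dependent letters such as "c" and "g" before "e"/"i". The exception table stands in for the kind of exception list or stress database described above and is consulted before the general rules.

```d
import std.algorithm.searching : startsWith;
import std.stdio : writeln;
import std.utf : stride;

// Hypothetical rule: a grapheme and the phoneme string it maps to.
struct Rule
{
    string grapheme;
    string phoneme;
}

// Heavily simplified, hypothetical Spanish rules. Digraphs come first so
// that "ch" is matched before any single-letter rule could apply.
immutable Rule[] rules = [
    Rule("ch", "tʃ"),
    Rule("ll", "ʝ"),
    Rule("qu", "k"),
    Rule("ñ", "ɲ"),
    Rule("j", "x"),
    Rule("h", ""),   // silent
    Rule("v", "b"),
    Rule("z", "θ"),  // Castilian value; Latin American Spanish uses /s/
];

string toPhonemes(string word)
{
    // Hypothetical exception table, checked before the general rules.
    string[string] exceptions = ["méxico": "mexiko"];
    if (auto p = word in exceptions)
        return *p;

    string result;
    size_t i = 0;
    outer: while (i < word.length)
    {
        // Try each rule in order at the current position.
        foreach (r; rules)
        {
            if (word[i .. $].startsWith(r.grapheme))
            {
                result ~= r.phoneme;
                i += r.grapheme.length;
                continue outer;
            }
        }
        // No rule matched: copy one UTF-8 code point through unchanged.
        immutable n = stride(word, i);
        result ~= word[i .. i + n];
        i += n;
    }
    return result;
}

void main()
{
    foreach (w; ["leche", "hola", "queso", "niño", "llave", "méxico"])
        writeln(w, " -> /", toPhonemes(w), "/");
}
```

Ordering the digraphs first gives a crude longest-match-first behaviour, and putting the exception lookup in front of the rules is the simplest form of the multi-layered approach: anything the rules get wrong can be overridden one word at a time.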

French would have the same problem as English, except that you could
just do as a first approximation:

	if (rand() > someFactor)
		word = word[0 .. $/2];

and then touch it up with a small set of exceptions.  :-P


T

-- 
English is useful because it is a mess. Since English is a mess, it maps
well onto the problem space, which is also a mess, which we call
reality. Similarly, Perl was designed to be a mess, though in the nicest
of all possible ways. -- Larry Wall

