Walter's Famous German Language Essentials Guide

Fri May 6 03:24:25 PDT 2016

On Thursday, 5 May 2016 at 23:47:15 UTC, H. S. Teoh wrote:
>
> Rule-based letter-to-sound systems don't work too well for 
> English precisely because you have to basically reproduce 500 
> years' worth of sound change plus all the exceptions introduced 
> by words borrowed from other contemporaneous languages across the 
> centuries. A rule-based system possibly could work, provided 
> the rules were extensive enough (and multi-layered, to account 
> for borrowed exceptions and other oddities). But there comes a 
> point where even the most industrious programmer would throw up 
> his hands and say, forget this exercise in futility, let's just 
> have the machine teach itself instead.

It's not just sound changes; English is just weird from a 
non-native speaker's point of view. As Kurt Tucholsky, one of the 
best German writers ever, once said, English is a simple and a 
difficult language at the same time. It consists of foreign words 
that are pronounced wrongly. English pronunciation makes any 
speaker of a Latin language cringe. In many European languages, 
and certainly in Latin languages, the letter-to-sound 
correspondence is more or less one-to-one: <a> is /a/, <e> is /e/ 
etc. In English they are often /ei/ and /i:/, and <i> is often /ai/ 
(oh, for f**k's sake!): "emeritus", a Latin word, is pronounced 
/e.'me(:).ri.tus/ in Latin; in English it's /em@.'rai.d@s/. This just 
makes you cringe. Native speakers of English often don't realize 
how weird their pronunciation sounds to those who natively speak 
the language they borrowed the words from (around 60% of the 
words). Makes me laugh when I hear English speakers who say "Oh, 
there is no Irish word for 'afterhours'!?" - Well, what's the 
English for "restaurant", "evict", "condone", "depot", "deposit" 
... and what's the English for "language"?

> Rule-based systems work better for Spanish because the 
> orthography is much closer to actual pronunciation, and other 
> parameters such as stress are more predictable.  I'd venture to 
> guess that rule-based systems might not work as well for 
> Russian, in spite of the orthography being almost 1-to-1 with 
> actual pronunciation, because of unpredictable stress positions 
> which can fundamentally alter vowel values. At best, you'd need 
> a database of stress patterns for various words so that the 
> accent would fall in the correct places. Plus a set of 
> exceptions for certain archaic word combinations that have 
> unusual stress.  If you had a database of English stress 
> positions, I think half the battle is already won.
>
> French would have the same problem as English, except that you 
> could just do as a first approximation:
>
> 	if (rand() > someFactor)
> 		word = word[0 .. $/2];
>
> and then touch it up with a small set of exceptions.  :-P
>
>
> T

Are Russian stress rules based on context (long vs. short vowels, 
palatalized vs. velarized consonants, etc.)? If so, you can 
program the rules.
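
As a rough illustration of what "programming the rules" could look 
like, here is a minimal Python sketch using the well-known Spanish 
stress convention (a written accent wins; otherwise words ending in a 
vowel, n or s stress the second-to-last syllable, and all others the 
last), with an exception table checked first. The names 
stressed_syllable and EXCEPTIONS are made up for this example, and 
treating every run of vowels as one syllable is a crude assumption, 
not real syllabification.

```python
import re

ACCENTED = "áéíóú"
# Assumption: each run of vowels counts as one syllable nucleus.
VOWEL_GROUP = re.compile(r"[aeiouáéíóúü]+")

# Hypothetical exception table: word -> 0-based index of the stressed
# syllable. It is consulted before any rule is applied.
EXCEPTIONS = {}


def stressed_syllable(word, exceptions=EXCEPTIONS):
    """Return the 0-based index of the stressed syllable of `word`."""
    word = word.lower()

    # Layer 1: explicit exceptions (e.g. unadapted loanwords).
    if word in exceptions:
        return exceptions[word]

    nuclei = VOWEL_GROUP.findall(word)
    if not nuclei:
        raise ValueError(f"no vowel found in {word!r}")

    # Layer 2: a written accent marks the stressed syllable directly.
    for index, nucleus in enumerate(nuclei):
        if any(ch in ACCENTED for ch in nucleus):
            return index

    # Layer 3: the context rule, which looks only at the final letter.
    if len(nuclei) == 1:
        return 0
    if word[-1] in "aeiouns":
        return len(nuclei) - 2  # second-to-last syllable
    return len(nuclei) - 1      # last syllable


if __name__ == "__main__":
    for w in ["casa", "hablar", "hablan", "teléfono"]:
        print(w, stressed_syllable(w))
```

If stress turns out not to follow from context, as the quoted post 
suspects for Russian, the exception table would end up doing most of 
the work, which is the "database of stress patterns" mentioned above.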