[Semi OT] The programming language wars

Joakim via Digitalmars-d digitalmars-d at puremagic.com
Sat Mar 21 12:13:10 PDT 2015


On Saturday, 21 March 2015 at 14:07:28 UTC, FG wrote:
> On 2015-03-21 at 06:30, H. S. Teoh via Digitalmars-d wrote:
>> On Sat, Mar 21, 2015 at 04:17:00AM +0000, Joakim via Digitalmars-d wrote:
>> [...]
>>> What I was going to say too, neither CLI or GUI will win, speech
>>> recognition will replace them both, by providing the best of both.
>>> Rather than writing a script to scrape several shopping websites for
>>> the price of a Galaxy S6, I'll simply tell the intelligent agent on my
>>> computer "Find me the best deal on a S6" and it will go find it.
>>
>> I dunno, I find that I can express myself far more precisely and
>> concisely on the keyboard than I can verbally. Maybe for everyday tasks
>> like shopping for the best deals voice recognition is Good Enough(tm),
>> but for more complex tasks, I have yet to find something more expressive
>> than the keyboard.
>
> "Find me the best deal on a S6" is only a little more complex 
> than "make me a cup of coffee." Fine for doing predefined tasks 
> but questionable as an ubiquitous input method. It's hard 
> enough for mathematicians to dictate a theorem without using 
> any symbolic notation. There is too much ambiguity and room for 
> interpretation in speech to make it a reliable and easy input 
> method for all tasks. Even in your example:
>
> You say: "Find me the best deal on a S6."
> I hear: "Fine me the best teal on A.S. six."
> Computer: "Are you looking for steel?"

Just tried it on Google's voice search: it thought I said "Find 
me the best deal on a last sex" the first time I tried.  After 
3-4 more tries ("a sex," "nsx," etc.) it finally got it right.  
But it never messed up anything before "on," only the 
intentionally difficult S6, which requires context to understand.  
Ask that question of the wrong person and they'd have no idea 
what you meant by S6 either.

My point is that the currently deployed, state-of-the-art systems 
are already much better than what you'd hear or what you think 
the computer would guess, and soon they will get that last bit 
right too.

> Now imagine the extra trouble if you mix languages. Also, how 
> do you include meta-text control sequences in a message? By 
> raising your voice or tilting your head when you say the magic 
> words? Cf.:
>
> "There was this famous quote QUOTE to be or not to be END QUOTE 
> on page six END PARAGRAPH..."

Just read that out normally and it will use grammar and word 
frequency heuristics to work out that the upper-case terms you 
highlighted are punctuation marks and not part of the sentence.  
In the rare case of real ambiguity, you'll be able to step down 
to a lower-level editing mode and correct it.

Mixing languages is already hellish with keyboards and will be a 
lot easier with speech recognition.

> Very awkward, if talking to oneself wasn't awkward already.

Put a headset on and speak a bit lower and nobody watching will 
know what you're saying or who you're saying it to.

> Therefore I just cannot imagine voice being used anywhere where 
> exact representation is required, especially in programming:
>
> "Define M1 as a function that takes in two arguments. The state 
> of the machine labelled ES and an integer number in range 
> between two and six inclusive labelled X. The result of M1 is a 
> boolean. M1 shall return true if and only if the ES member 
> labelled squat THATS SQUAT WITH A T AT THE END is equal to zero 
> modulo B. OH SHIT IT WAS NOT B BUT X. SCRATCH EVERYTHING."

As Paulo alludes to, the current textual representation of 
programming languages is optimized for keyboard entry.  
Programming languages themselves will change to allow fluid 
speech input.

On Saturday, 21 March 2015 at 15:13:13 UTC, Piotrek wrote:
> Just for fun. A visualization of the problem from 2007 (I doubt 
> there was breakthrough meanwhile)
>
> https://www.youtube.com/watch?v=MzJ0CytAsec

I only had to watch a couple of minutes of that to know that 
current speech recognition is much better: it has progressed by 
leaps and bounds over the intervening eight years.  That doesn't 
mean it's good enough to throw away your keyboard yet, but it's 
nowhere near that bad anymore.

On Saturday, 21 March 2015 at 15:47:14 UTC, H. S. Teoh wrote:
> It's about the ability to abstract, that's
> currently missing from today's ubiquitous GUIs. I would willingly leave
> my text-based interfaces behind if you could show me a GUI that gives me
> the same (or better) abstraction power as the expressiveness of a CLI
> script, for example. Contemporary GUIs fail me on the following counts:
>
> 1) Expressiveness: there is no simple way of conveying complex
--snip--
> 5) Precision: Even when working with graphical data, I prefer text-based
> interfaces where practical, not because text is the best way to work
> with them -- it's quite inefficient, in fact -- but because I can
> specify the exact coordinates of object X and the exact displacement(s)
> I desire, rather than fight with the inherently imprecise mouse movement
> and getting myself a wrist aneurysm trying to position object X
> precisely in a GUI. I have yet to see a GUI that allows you to specify
> things in a precise way without essentially dropping back to a
> text-based interface (e.g., an input field that requires you to type in
> numbers... which is actually not a bad solution; many GUIs don't even
> provide that, but instead give you the dreaded slider control which is
> inherently imprecise and extremely cumbersome to use. Or worse, the text
> box with the inconveniently-small 5-pixel up/down arrows that changes
> the value by 0.1 per mouse click, thereby requiring an impractical
> number of clicks to get you to the right value -- if you're really
> unlucky, you can't even type in an explicit number but can only use
> those microscopic arrows to change it).

A lot of this is simply that you are a different kind of computer 
user than the vast majority.  You want to drive a Mustang with a 
manual transmission and a beast of an engine, whereas most 
computer users are perfectly happy with their Taurus with 
automatic transmission.  A touch screen or WIMP GUI suits their 
mundane tasks best, while you need more expressiveness and 
control, so you use the CLI.

The great promise of voice interfaces is that they will be _both_ 
simple enough for casual users and expressive enough for power 
users, while being very efficient and powerful for both.  We 
still have some work to do to get these speech recognition 
engines there, but once we do, the entire visual interface to 
your computer will have to be redone to best suit voice input, and 
nobody will use touch, mice, _or_ keyboards after that.

