More about speech recognition

Pretending to think intelligently!

 That’s right! I just need to pretend to think intelligently! Just like speech recognition – it just pretends. And sometimes it slips badly.

It is true that I wrote yesterday’s entry with Dragon NaturallySpeaking speech recognition, except for a few words. What I didn’t mention was all the corrections I made. When I say that the software has the capabilities of a young adult, this is partly praise for the software but also partly reflects my cynicism about the human race. Humans lose concentration, get distracted and make typographical errors; computers do not. Instead, computers completely lack the ability to understand what is said, so they will without hesitation write the most insane things if that’s what your speech most sounds like.

I am sure most of my readers will get much better results from Dragon NaturallySpeaking (or other speech recognition) than I do. My Norwegian accent is quite noticeable, and my large vocabulary mostly comes from reading. There are, to put it bluntly, a number of words I use regularly which I can’t say for sure that I have ever heard spoken. English is my third language, and back in the 1970s even my English teachers did not actually speak English like a native. Well, they spoke it like a native Norwegian! Almost the only actual English I heard while growing up was pop songs. Perhaps as a result of this, I can sometimes reduce the error rate of my dictation by singing difficult passages. (I am however reluctant to do this if the neighbors upstairs can hear me…)

Another difference from most of you is that my voice gets hoarse* after only a couple of paragraphs. (*And yes, Dragon of course wrote “horse” there originally.) I simply speak so little that my body can’t take the strain of speaking out loud for more than a couple of minutes without a long break. This means I can only take a few phone calls each day at work, but it also means that dictating an entry for my journal takes much longer than typing it. I can, however, speak more softly to my computer than I can to the customers, which is another bonus point for speech recognition over humans. But for most of you, this problem does not exist. Almost every human I have met seems able to talk continually for hours… ^_^

Finally, there’s the question of training. If you speak clearly and without too much accent, the software works OK right out of the box. But the more you use it, the more reliable it becomes. This was quite noticeable with version 9, which I used extensively. At that time, I had serious problems with my wrists. For this reason I found myself using speech recognition even though it was less than perfect (and less perfect than now, for certain). After weeks of use, it actually started to get used to my pronunciation and my choice of words. I am sure the same would happen with versions 10 and 11, but in the meantime my wrists have become much better, while my throat has become worse. (In fact, Nuance Communications claims that its learning abilities have been significantly improved. In my very limited experience, this seems to be true.)

So when you see the many YouTube videos of people using Dragon NaturallySpeaking quickly and perfectly, you should take into account that they have probably spent weeks training the system, in addition to having almost perfect pronunciation. Even then, I would guess that some of those videos are not the first try, or perhaps even the second.

But under those conditions, the software is indeed able to take dictation faster and more reliably than the vast majority of human beings. Let’s face it: Even with an error or two or three, you would not be able to transcribe that fast unless you’re a highly trained professional.

I have dictated this entry as well, and with two space heaters humming loudly in the background. That we try to dictate this paragraph without making any corrections, just to show you the difference. My throat is starting to get sore, but on the other hand this to fairly long entries have given the software the charms to get better used to my pronunciation and vocabulary. As you can see, it still makes a number of mistakes. But at least it doesn’t make typos. Back when I included links to my year ago entries, I would lead to throw then after a year and almost without exception find several typos in them. Then the next year I would read through them again, I still find a couple typos. It is really hard to read what I have actually written, not what I intended to write.

Of course, this is true with speech recognition as well. And it can get even more creative with its arrows than typos. <– – Error included as proof. If you are dictating a business proposal, you should definitely either let it rest overnight and read it again in the morning, or let someone else read through it before you send it. Then again, that’s probably a good rule anyway.

3 thoughts on “More about speech recognition”

  1. Jenna and I spent a couple of HOURS the other day saying things to the Dragon on my phone (amusing mental image) and giggling about what it actually heard. So many of our words that should be monosyllabic are polysyllabic. Hysterical. Not so amusing when you try to use it, but I’ve found that by speaking with large spaces between the words and moving our jaws and lips exaggeratedly (for us), it works fairly well.

  2. Speech recognition on phones today is similar to what it was on desktop computers when I first discovered it. I try it from time to time, but quickly give up. But smartphones double their processing power every year and a half or so, just like the computers did before them, so it should only be a matter of time. In four years or so it should be genuinely useful, I expect, if things continue roughly like now. And people are used to talking to phones anyway, so it may seem less strange than talking to a computer.
