Dragon NaturallySpeaking 15

Screenshot from the anime Overlord, season 3, episode 8, last scene

Why is there a Dragon here? For speaking, naturally! Dragon NaturallySpeaking is the world’s premier speech recognition software, now with Deep Learning Artificial Intelligence that adjusts to your accent and the common cold. Fire breathing not included.

Today I upgraded (in a manner of speaking) from Dragon NaturallySpeaking version 13 Professional Individual to Dragon NaturallySpeaking version 15 Home. I virtually never used the more advanced features of the earlier version.

The most important part for me is accuracy of recognition, and I have to say that version 15 is almost indistinguishable from magic in that regard. And I mean right out of the box: There is no longer even an option to train the program by reading a text for it. Version 13 was pretty good after training and a few days of practice. Version 15 is that good right out of the box. (At least I believe it doesn’t have access to my previous training, as it required me to uninstall the previous version and reboot the computer before I was allowed to install the newest version.)

I have used and reviewed many different versions of Dragon NaturallySpeaking over the years, both before and after it was acquired by Nuance. There has definitely been progress! I believe the first version I reviewed was either six or seven, and I generously compared it to a homesick Asian high school exchange student. I could probably have added seasick as well, as its performance was unimpressive, to say the least. If you had functioning hands, you were better off using those, even if you typed with one finger.

Those days are definitely gone! Dragon NaturallySpeaking 15 takes dictation like a highly trained secretary, only faster. Actually, Dragon has outpaced secretaries for at least a couple of versions now, but this required you to speak clearly and train the program first. And the results were less impressive for me, as I have a strong Scandinavian accent. Actually, “accent” might be too weak a word. If you are familiar with the computer game “Skyrim”, the pronunciation by the Nord bandits in that game is pretty close to how I speak in real life. I am not sure how a highly trained secretary would handle that, but Dragon NaturallySpeaking 15 has well over 99% accuracy, right out of the box, with that kind of foreign accent.

***

There are still some challenges. In my experience, they are not too bad, but I see a lot of one-star reviews on Amazon. Most notably, Dragon is squeamish about working with applications it doesn’t know. Supposedly this includes earlier versions of Microsoft Office. When I started writing in LibreOffice, Dragon NaturallySpeaking automatically popped up the “Dictation Box”, where you can dictate and edit your text before transferring it to the target application. It’s an okay solution in my opinion, but it can be distracting, and you cannot interact directly with the target program using your voice, for instance “click file save”, the way you can in supported programs. Removing the checkmark for automatically opening the Dictation Box lets me dictate directly in LibreOffice, but it still struggles with commands, and you cannot edit the text with Dragon after you dictate it.

I have the same problem with my favorite browser, Vivaldi. Admittedly that is not very common browser, So I installed the Dragon Web extension For chrome.As you can see from the previous sentence, that didn’t work too well, and it doesn’t work too well in Google Chrome either. Luckily I have fingers, and so Dictation Box it is. But Google Chrome is by far the most popular browser for Windows, and not having native support for that makes the program seem rushed, at best. Especially when you consider that Dragon NaturallySpeaking is a very expensive program. It is not so bad by Norwegian standards, since both salaries and living expenses here are already very high. Even so, I only buy Dragon NaturallySpeaking when it is discounted, as it was in this case. In the USA, a single person could eat for a month for this much money, and in the actual developing world even more. So in that perspective, you would expect a more polished product than this.

But what it does well is take dictation. And at that, it is the best in the world. No software and no human can match it for the combination of speed, accuracy, and fast learning.

Dragon Professional Individual 15

Dragon from video game Skyrim

No need to shout, the Dragon understands my Nordic dialect right away!

Over the years, I have made a habit of reviewing the various versions of Dragon NaturallySpeaking. Lately, Nuance has stopped using the phrase NaturallySpeaking in most contexts, but it is still the same product, and it is now up to version 15.

As the software has become more expensive again, and as it is already good enough for my limited use, I have started skipping some versions. Dragon version 13 was already good enough that I did not really expect it to get any better. Impressively, Dragon version 15 is actually noticeably better right out of the box.

Dragon version 15 uses a new “deep learning” technology similar to what is used in the most successful artificial intelligence projects. Dragon has always (or at least for as long as I have used it) had the ability to improve based on feedback from the user, as well as adapt its vocabulary and writing style by reading through documents. While these options still exist, there is less focus on them now as Dragon quietly adjusts in the background during everyday use.

Dragon has also clearly had some opportunity to acquaint itself with human speech in general before shipping to the customer: The product is amazingly accurate right out of the box. Longtime readers (if any) may remember that I compared some of the early versions to homesick exchange students from other continents. That time is long gone. Dragon version 15 understands even my “Skyrim” pronunciation of English. (I grew up in Norway in the 1960s, where even the English teachers had rarely if ever been to England, let alone America or Australia.)

There is one problem that has dogged this software from the start, and it still remains, even if just barely. When we speak, we don’t actually pronounce periods at the end of the sentence; rather, we slightly change the tone of our pronunciation toward the end, typically speaking less forcefully. Conversely, we don’t actually pronounce a capital character at the beginning of a sentence; instead, we pronounce the first sound slightly differently from the rest. Ideally, speech recognition software might be able to use this to take dictation without requiring us to specify punctuation. Dragon NaturallySpeaking used to have this functionality, but I gave up on it pretty quickly. What actually happens is that even when I dictate punctuation, there is a slight increase in mistakes at the very beginning and end of the sentences. This is especially true if I don’t pronounce some form of punctuation at the end of my string of words, for instance because I run out of breath during a long sentence. I have to say, however, that this problem has been almost eradicated in the latest version of Dragon.

To me, recognition accuracy is by far the most important part of any speech recognition engine. But Dragon 15 also has some other features in addition to the improved accuracy. It has better support for various modern software, and it allows voice-activated macros. (I believe this feature was also in version 13, but I did not use it then and I don’t use it now. In any case, functions like “insert signature” should be part of your email software, rather than your speech recognition software.) Also, the big unnecessarily helpful sidebar with examples no longer starts up by default. It used to, and it also used to permanently displace any windows that happened to be in its way.

As usual, I am including a paragraph where I don’t in any way correct this transcription. This is that paragraph. (It may not be obvious to the reader, but that should be “the transcription” in the first line above.) Dragon used to be available in a few languages besides English; I am pretty sure I saw touch at some point, and Japanese? I can’t find any trace of that now, but I will admit that I have not looked very carefully.

Not too bad, huh? That should of course not be “touch” in the previous paragraph, but rather Dutch, the language in the Netherlands. (It actually got it right this time without correction. Go figure.)
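For readers who want to put a number on a test paragraph like the one above, here is a minimal sketch in Python. It is an illustration only, not how Nuance measures accuracy: the function name `word_accuracy` and the sample sentences are assumptions made for this example, and `difflib` gives an approximate word alignment rather than a formal word error rate.

```python
from difflib import SequenceMatcher


def word_accuracy(intended: str, transcribed: str) -> float:
    """Share of the intended words that show up, in order, in the transcription."""
    ref = intended.split()
    hyp = transcribed.split()
    if not ref:
        return 1.0  # nothing was said, so nothing could be misheard
    # Align the two word lists and count the words that line up.
    matcher = SequenceMatcher(None, ref, hyp, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref)


# Hypothetical sample: what was meant versus what ended up on screen.
intended = "I am pretty sure I saw Dutch at some point"
transcribed = "I am pretty sure I saw touch at some point"
print(f"{word_accuracy(intended, transcribed):.1%}")  # prints 90.0%
```

One wrong word in a ten-word sentence scores 90%, while the same single slip in a hundred-word paragraph would score 99%, so a longer sample gives a fairer picture.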

Dragon NaturallySpeaking 12 – part 2

“If you don’t listen to everything, you won’t understand anything.” When dictating, speak in statements, or at least phrases. Don’t stop randomly, for instance between “the” and the noun.

I have now had the new version of Dragon NaturallySpeaking for a couple days. With my throat condition, that probably corresponds to a couple hours for those of you who talk a lot. I intend to use Dragon to dictate this entry, but I will still need to make corrections. Perhaps you won’t, if you are a native English speaker without too much accent or dialect.

I am impressed by how quickly  Dragon has adapted to my voice.  It certainly happened much faster than with any earlier version. In all fairness, I also have more experience with Dragon now. For instance, as I mentioned in my previous entry,  I have made sure to perform training at different times of the day and at the beginning and end of a “speech”.

(I actually dictated the previous paragraph without making any corrections, but that’s not the rule for longer paragraphs yet.)

*** 

A problem with browsers: I haven’t heard about this from anyone else, but I have found Dragon to operate erratically in text entry fields in browsers. This could be a serious drawback, considering how much time we spend on the Internet these days, both at home and in the office. At first I thought the problem was only with Opera, which is my browser of choice. This program is not explicitly supported by Dragon, and in version 11 the text field where I write my journal was marked as unrecognized. While I could try to dictate there, the result was usually pretty bad. In version 12, Dragon alternates between “unknown text field” and “normal mode”. If I dictate while in normal mode, it seems to work well enough. If it is in unknown mode, I can usually just wait and it will switch to normal mode after a few seconds. Even so, the hotkeys don’t work, and corrections frequently mess up the text a little. So for longer texts, I tend to use the DragonPad and just paste the result into the browser.

Unfortunately, I have similar problems in Internet Explorer when using Google+. Again, this may be a problem with that particular application – even typing can sometimes be sluggish in Google+ – but there are tens of millions of people who use that application frequently. Then again, it might be just me. Since I am one of the first to actually buy the product, there isn’t much in the way of reviews for me to compare with.

Is this a big deal? After a few days, you would probably not need to make corrections every time you post. A more serious problem might be if parts of the text are missing because you dictated while it was in “unknown field” mode. Again, this could be peculiar to my computer – there certainly don’t seem to be any problems in the demonstrations on YouTube. (Then again, they use neither Opera nor Google+.)

***

I haven’t had any problems with other programs. Dragon works beautifully with yWriter, the program I use when writing fiction. It seems to work fine with all kinds of notepads, whether plaintext or rich text. The commands for opening programs, switching between programs or clicking on buttons work as expected. And the on-screen help which came with version 11 makes it unnecessary to memorize the handbook with its dozens and dozens of commands. I am sure there are a number of features that I am never going to use, but better that than the other way around. And in version 12 you can even turn off features at a very detailed level if you’re afraid of activating them by mistake or if you simply need more speed.

You guys, I really feel like I can’t get across how smart this program is. When I first tried Dragon NaturallySpeaking approximately a decade ago, I compared it to a drunk and homesick high school exchange student. I compared version 11 to a native English speaker with a college education. But version 12… It is like a professional secretary with a genius IQ. Oh, it still has problems now and then, but it has only spent a couple hours with me, and there are several sounds in English that Scandinavians of my generation simply cannot pronounce. I am not sure any of my English-speaking readers would be able to understand me that well after listening to me for a couple hours.

Because I have spent decades mostly in silence, I cannot dictate a long entry like this without taking breaks. My voice simply dries up. If not for this physical handicap, I would be sorely tempted to do exactly what Nuance proposes in its slogan: “Stop typing, start talking.” It really is that impressive.

The Dragon has landed!

 

Dragon NaturallySpeaking 12 became ready for download today for us existing Dragon users who had pre-ordered. I’ll come back to the installation shortly.

For those who do not know, Dragon NaturallySpeaking is a voice input program for Windows computers, and the leader in this category. It takes dictation but also allows you to open programs, search the web, compose mail and edit existing texts without using your hands. As such, it solves an acute problem for those who don’t have hands or can’t use them. For those of us who have hands, it is most useful for dictation. It is fast and, with a little practice, amazingly accurate. The new version claims a 20% increase in accuracy, putting it well above 99% accuracy with 15 minutes of training. In practice, it takes longer, but the program keeps learning the more you use it. When you see an experienced user work with Dragon 11.5 (the previous version) it is “indistinguishable from magic”.

Installation: The download link from Nuance arrived by email before I woke up in the morning. A separate mail also contained a link to the training video. While I am personally a fan of reading, the training video will surely be welcomed by dyslexic users, another core customer group. (The program can also read text out loud, even text you have not dictated.)

The download process proceeds in several steps. You first download a tiny download manager program. It does not really matter much where you save this, since it is very small. This program must be run to start the main download. The main download is a compressed file, but still close to 3 gigabytes. This must again be unpacked to a larger set of files before the actual installation. During the unpacking process, both the compressed file and the unpacked files take up space simultaneously, and that’s before the actual install into the Program Files directory. This program is not recommended for people with small disks!

It is recommended that you back up the compressed file so that you can install from this if your computer suddenly crashes or if you simply decide to buy a new one at some point.

The download went without glitches, but the install itself caused me some trouble. A ways into the installation, the program warned me that several processes had to be closed down before it could continue. Three of these were unknown to me, and did not appear with the given names in Windows Task Manager. I had to break off the installation and reboot the computer, then run the install again. The install did not automatically resume, and if I had not taken note of where the unpacked file was saved, I would have had to restart from the compressed file. I would recommend you reboot your PC before you start downloading, and not start any unnecessary programs until after the install is complete.

After installation, the software offers to let you register the product online. There is also an online activation which is necessary to continue using the program. The registration and the activation are unrelated tasks.

As a user of version 11, I had my existing program removed automatically and my user account upgraded to the new version. This takes some time even on a fast computer. New users will be led through creating an account instead, and the system checks the quality of your microphone input before asking you to read a text to attune the program to your voice and reading rhythm. You can skip this step and train the program by correcting mistakes if you want. New users also get an offer to let the program read through their email and documents to adapt to their vocabulary. This is a separate task from adapting to your voice. Again, you can skip this and just train the program through use, if you are impatient, but there will be more errors during your first few days of use if so.

Accuracy training: Since Dragon was complaining about my microphone, I bought another, an analog headset to replace the digital USB headset. I established a new user account and started over from scratch with the new hardware. This microphone passes Dragon’s test with flying colors, but the new account doesn’t have any of the accumulated experience with my speaking. Newsflash: It certainly wasn’t useful right out of the box!

My experience is probably not typical, since I am a foreigner to the English language and also have a chronic problem with my vocal cords – my voice grows “rusty” many times faster than a normal human’s – but I think we should still consider this. After all, most people aren’t native English speakers, or if they are, they have dialects or accents. And your voice does change with use, even if more slowly than mine. And my experience is that it takes several hours for a new user before Dragon NaturallySpeaking 12 becomes truly useful. So don’t buy this program an hour before you need it. Set aside a couple days at least to become good friends with it before you start working together.

Not only does your voice change after you have used it for a while, but it is also slightly different from morning to evening. So it may be a good idea to do some reading training at different times, to help the computer get familiar with your voice. It is not necessary to read all the way through the exercises; you can click Finish at any time. Also, try to make sure that you read the exercises in the same way that you speak to the computer when you dictate. For my part, I have found that I have a tendency to speak faster and in longer stretches when I read something, compared to when I dictate my own thoughts. For some reason I also tend to read louder – perhaps a habit from my school days? We used to be required to read aloud in class.

Features: The previous version mostly improved the user interface, introducing context-sensitive help in the form of the “Dragon Sidebar”. It also expanded support for more programs, and the engine was made more efficient. Version 12 has very few changes in the user interface; it supposedly includes 100 new features, but I don’t expect to need more than a few of them. Most of the development this time seems to have concentrated on the technical: In addition to the improved accuracy, the program also runs much faster, especially on new computers where it now takes advantage of multicore processors and extra memory. Additionally, even the home version can now take advantage of mobile phones as microphones: If you have an iPhone or an Android smartphone and it’s on the same Wi-Fi network as your computer, you can dictate to your smartphone and have the text appear on your computer screen!

One feature I thought was included in the home version, but which evidently isn’t, is playback of your own dictation. On the other hand, the program includes an excellent synthetic voice which can read what you have dictated (or any other normal text). This will begin to come in handy when the accuracy approaches 100%. Dragon doesn’t make typos; when it makes a mistake, it writes valid words, usually words that make sense next to each other, but not the words you intended to say. We who have been typing for decades will naturally look for typos when we proofread our text. It is all too easy for us to overlook that a wrong word has been used, such as “is” instead of “isn’t”. But chances are we catch it when we hear it out loud!

That’s all for this time, but I hope to be back with glowing praise when the accuracy approaches 100%. ^_^