The recently disseminated article on Verbal Compass was fascinating. But I
think the researchers don't know speech recognition products very well. 

NaturallySpeaking has several "verbal compass" commands (of sorts) built in,
yet I suspect most people don't use them. (The people I train are the
exception!) I doubt the researchers knew about these commands. (The last I
checked, ViaVoice had very little in the way of this kind of command.)  

The two primary commands in NaturallySpeaking to move around in this way are
"Insert Before X" and "Insert After X." For example, saying "Insert After
University of Maryland" places the cursor between Maryland and the comma in
the first big paragraph below. Once you have placed the cursor in this way,
the need to move the cursor letter by letter (or word by word) is zero. You
are where you want to be. 

In addition, the "Select" command can be used to navigate to a word or
expression when revising texts. To change "modern fad" to "recent
phenomenon," say "Select modern fad," pause a fraction of a second, and say,
"recent phenomenon." It's not even necessary to issue a Delete command. It is
often beneficial to choose two or more target words because NaturallySpeaking
recognizes expressions better than single words, especially single-syllable
words. I even encourage my students to include in the target one or more
words that they don't want to delete. It's simply faster to dictate a word
again than to move the cursor exactly where it is needed. So for example, to
replace "the modern fad" with "the modern phenomenon," I would say "Select
the modern fad," pause, and say, "the modern phenomenon."  

I encourage people to lose their "typewriter mentality" when using speech
recognition: e.g., moving the cursor a line or a word at a time is how one
revises with a typewriter. But zipping directly to one's target is a strategy
that may be counterintuitive because it is not a familiar mechanical
approach. To make it work well, the user has to be thinking about the
likelihood that target words will be found. "Insert Before an" will probably
fail. "Insert Before an obvious alternative" will probably work. "Select for"
may work, but will likely find the wrong instance of the word. "Select for
further improvement" is not only more likely to be found, but may be easier
to say. 
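
To illustrate why a longer target is more likely to land where you want, here
is a minimal Python sketch that counts how many times a target occurs in a
passage. It is not how NaturallySpeaking searches for text; the function name
count_matches and the sample passage (two sentences borrowed from the article
below) are only for illustration.

```python
import re


def count_matches(text: str, target: str) -> int:
    """Count whole-word, case-insensitive occurrences of target in text.

    Hypothetical helper for illustration only; it does not reproduce how
    any speech recognition product locates text.
    """
    pattern = r"\b" + re.escape(target) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


# Two sentences taken from the article quoted below.
sample = (
    "This is significant compared with other error-correction techniques "
    "and it is promising, because this work suggests the means for further "
    "improvement. Speech recognition, but for its high error rate and long "
    "correction times, is an obvious alternative."
)

# A single short word matches in more than one place ...
print(count_matches(sample, "for"))                      # 2
# ... while the longer expression matches exactly once.
print(count_matches(sample, "for further improvement"))  # 1
```

A target that matches only once leaves no doubt about where the selection or
cursor should end up, which is the habit I try to teach.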

Submitted by Alan Cantor
acantor@cantoraccess.com

---

Verbal Compass article:

Verbal Compass
Better speech-based error correction for dictation tools
From: Technology Review - March 2005 - pages 80-81

Context: Extreme multitasking is the modern fad, but no person has enough
hands to manage a cell phone, a digital organizer, a steering wheel, and
coffee all at the same time. Accordingly, people want a hands-free way to
interact with computers. Although speech recognition systems are more
accurate than ever, typical users still spend more time correcting errors
than dictating text; half of their correction time is spent just moving a
cursor to errors identified in, say, a dictated e-mail. Confidence scores (the
software's estimates of how likely it is to have captured the right word) can
be used to identify possible errors. Now Jinjuan Feng and Andrew Sears at
the University of Maryland, Baltimore County, have shown that confidence
scores can also be used to accelerate the correction process.  

Methods and Results: Twelve participants dictated 400-word documents using a
speech recognition system. It interpreted 17 percent of the words
incorrectly, a typical rate; it was the correction process that was atypical.
The software used confidence scores to tag words throughout the text as
navigation anchors. Users could quickly jump to each anchor with short voice
commands and then move a cursor word by word to the error. The researchers
measured the number of navigation commands the participants used, the failure
rates of the navigation commands, and the time spent dictating and
navigating. Average failure rates reported for other techniques are about 5
percent for direction-based navigation ("move right") and 10 to 20 percent for
word-based navigation ("select December"). In a test of Feng and Sears's
technique, the failure rate was only 3.2 percent. Even better, the time users
spent navigating to errors was cut by nearly a fifth. This is significant
compared with other error-correction techniques and it is promising, because
this work suggests the means for further improvement.  

Why it Matters: The Lilliputian buttons on PDAs and other pocket-sized
wonders are quickly shrinking under a constant-sized thumb. Multitasking is
on the rise, and more people with physical disabilities are entering the
workforce. Both trends will steer users away from computer systems with
manual interfaces. Speech recognition, but for its high error rate and long
correction times, is an obvious alternative. 

This work clearly shows that using confidence scores for navigation can
shrink users' correction times. With further improvements, the technique
promises to boost the usability of hands-free error correction and so
engender a surge of new gadgets and applications. 

Source: Feng, J., and A. Sears. 2004. Using confidence scores to improve
hands-free speech based navigation in continuous dictation systems. ACM
Transactions on Computer-Human Interaction 11:329-356. 

From:
http://www2.technologyreview.com/articles/05/03/issue/synopsis_info.asp

Links:
Andrew L. Sears
http://userpages.umbc.edu/~asears/
