Speech Recognition, Digitization, Generation


Speech technology
For designers of human/computer
interaction systems, speech and audio
technologies have at least five
variations:

Discrete-word recognition
Continuous-speech recognition
Voice information systems
Speech generation
Non-speech auditory interfaces
Discrete-word recognition
Discrete-word recognition devices recognize individual words
spoken by a specific person.

They can work with 90 to 98% reliability for vocabularies of
100 to 1,000 words or larger.

Speaker-dependent training, in which users repeat the full
vocabulary once or twice, is part of many systems. Such
training yields higher accuracy than speaker-independent
systems achieve, but eliminating training expands the scope of
commercial applications.
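The difference between speaker-dependent and speaker-independent systems can be made concrete with a classic approach to discrete-word recognition: template matching with dynamic time warping (DTW). This is an illustrative sketch, not how any particular product works. It assumes each utterance has already been turned into a sequence of feature vectors (the made-up numbers stand in for real acoustic features), and `TemplateRecognizer`, `train`, and `recognize` are hypothetical names. Training simply stores the user's own repetitions of each word, which is what ties such a system to one speaker.

```python
import math
from typing import Dict, List, Sequence

# A frame is one feature vector for a short slice of audio. Feature extraction is
# assumed to have happened already; the numbers used below are made up.
Frame = Sequence[float]


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a: List[Frame], seq_b: List[Frame]) -> float:
    """Dynamic time warping cost, which tolerates the same word spoken faster or slower."""
    n, m = len(seq_a), len(seq_b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


class TemplateRecognizer:
    """Hypothetical speaker-dependent recognizer that stores the user's training repetitions."""

    def __init__(self) -> None:
        self.templates: Dict[str, List[List[Frame]]] = {}

    def train(self, word: str, utterance: List[Frame]) -> None:
        # Each repetition of a vocabulary word during training is kept as a template.
        self.templates.setdefault(word, []).append(utterance)

    def recognize(self, utterance: List[Frame]) -> str:
        # Choose the word whose closest stored template has the lowest warping cost.
        return min(
            self.templates,
            key=lambda w: min(dtw_distance(utterance, t) for t in self.templates[w]),
        )


rec = TemplateRecognizer()
rec.train("yes", [[1.0, 0.0], [1.0, 0.2], [0.8, 0.1]])
rec.train("no", [[0.0, 1.0], [0.1, 0.9]])
print(rec.recognize([[0.9, 0.1], [1.0, 0.1], [1.0, 0.2], [0.8, 0.1]]))  # yes
```

Because every template comes from one user's voice, a different speaker's utterances may match poorly; speaker-independent systems instead rely on models trained on many voices.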

Quiet environments, head-mounted microphones, and careful
choice of vocabularies improve recognition rates.

Telephone companies offer voice-dialing services, even on
cell phones, that allow users simply to say "Call Mom" and be
connected.

Phone-based recognition of numbers, yes/no answers, and
selections from voice menus is successful and increasingly
widely applied.

However, full-sentence commands such as "Reserve two
seats on the first flight tomorrow from New York to
Washington" are just moving from a research challenge to
commercial use.

Current research projects are devoted to improving
recognition rates in difficult conditions and eliminating the need
for speaker-dependent training.

Speech recognition for discrete words works well for special-
purpose applications, but it does not serve as a general
interaction medium.
Continuous-speech recognition
Continuous-speech-recognition systems enable users to
dictate letters and compose reports verbally for automatic
transcription.

Review, correction, and revision are usually accomplished with
keyboards and displays.

Users need practice in dictation and seem to do best with
speech input when preparing standard reports.

Continuous-speech-recognition systems also enable automatic
scanning of, and retrieval from, radio or television programs, court
proceedings, lectures, or telephone calls for specific words or
topics.
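Retrieval for specific words becomes straightforward once a recognizer has produced a transcript in which each word carries a time stamp. The sketch below assumes such a transcript already exists (the `Word` records are hypothetical and written by hand) and builds a simple inverted index so a user can jump to each moment a word was spoken. Topic search would need more than exact word matching; this covers only the word case.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Word:
    text: str
    start_sec: float  # when the recognizer reports that this word began


def build_index(transcript: List[Word]) -> Dict[str, List[float]]:
    """Map each lower-cased, punctuation-stripped word to every time it was spoken."""
    index: Dict[str, List[float]] = {}
    for w in transcript:
        key = w.text.lower().strip(".,;:!?\"'")
        index.setdefault(key, []).append(w.start_sec)
    return index


def find(index: Dict[str, List[float]], term: str) -> List[float]:
    """Return the start times at which `term` was spoken (empty if never)."""
    return index.get(term.lower(), [])


lecture = [Word("The", 0.0), Word("budget", 0.4), Word("vote", 0.9),
           Word("failed.", 1.3), Word("Budget", 5.1), Word("talks", 5.6)]
index = build_index(lecture)
print(find(index, "budget"))  # [0.4, 5.1]
```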

Difficulties in implementation
A major difficulty for software designers is recognizing the boundaries between
spoken words, because normal speech patterns blur the boundaries.
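A simple way to look for word boundaries is energy-based endpoint detection: treat frames whose loudness reaches a threshold as speech, and split wherever enough quiet frames occur in a row. The sketch below uses a hypothetical function name and made-up energy values. It shows why this works for discrete words spoken with pauses but not for continuous speech: when words run together, there is no silence to split on.

```python
from typing import List, Optional, Tuple


def find_word_segments(energy: List[float], threshold: float,
                       min_gap: int = 2) -> List[Tuple[int, int]]:
    """Split a recording into (start, end) frame ranges of sound separated by silence.

    A frame counts as sound when its energy reaches `threshold`; a segment closes
    only after `min_gap` consecutive quiet frames. `end` is exclusive.
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    quiet = 0
    for i, e in enumerate(energy):
        if e >= threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_gap:
                # The last loud frame was i - quiet, so the exclusive end is one past it.
                segments.append((start, i - quiet + 1))
                start, quiet = None, 0
    if start is not None:
        segments.append((start, len(energy) - quiet))
    return segments


# Two words with a pause between them (discrete speech)...
print(find_word_segments([0, 5, 6, 0, 0, 0, 7, 8, 0, 0], threshold=1))  # [(1, 3), (6, 8)]
# ...and the same two words run together (continuous speech): one blurred segment.
print(find_word_segments([0, 5, 6, 4, 7, 8, 0], threshold=1))  # [(1, 6)]
```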

Other problems are diverse accents, variable speaking rates, disruptive
background noise, and changing emotional intonation.

The most difficult problem is matching the semantic interpretation and contextual
understanding that humans apply easily to predict and disambiguate words.
Voice information systems
Stored speech is commonly used to provide telephone-based
information about tourist sites and government services, and
for after-hours messages from organizations.

These voice information systems, often called Interactive
Voice Response (IVR) systems, can provide good customer service at
minimum cost if proper development methods and metrics
are used.

Voice prompts guide users so they can press keys to check,
for example, airline flight departure or arrival times.
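A voice menu of this kind can be modeled as a small state machine in which each state plays a prompt and each touch-tone key leads to another state. The menu contents and function names below are hypothetical, and a real IVR platform would add timeouts, retries, and input handling that this sketch leaves out.

```python
from typing import Dict, List

# Hypothetical menu: each state has a spoken prompt and maps keypad digits to next states.
MENU: Dict[str, Dict] = {
    "main": {
        "prompt": "For departures, press 1. For arrivals, press 2.",
        "keys": {"1": "departures", "2": "arrivals"},
    },
    "departures": {"prompt": "Enter the departing flight number.", "keys": {}},
    "arrivals": {"prompt": "Enter the arriving flight number.", "keys": {}},
}


def next_state(state: str, key: str) -> str:
    """'*' returns to the main menu; an unrecognized key repeats the current prompt."""
    if key == "*":
        return "main"
    return MENU[state]["keys"].get(key, state)


def run_call(keys: str) -> List[str]:
    """Replay a caller's key presses and return the states visited, starting at 'main'."""
    states = ["main"]
    for key in keys:
        states.append(next_state(states[-1], key))
    return states


for state in run_call("2"):
    print(MENU[state]["prompt"])
```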

Voice information technologies are also used in popular
personal voicemail systems.
Speech generation
Speech generation is a successful technology with widespread
application in consumer products and on telephones.

When algorithms are used to generate the sound (synthesis),
the intonation may sound robot-like and distracting. The
quality of the sound can be improved when phonemes, words,
and phrases from digitized human speech are smoothly
integrated into meaningful sentences.
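The smoothing step can be illustrated with the simplest possible join: overlap the end of one recorded segment with the start of the next and fade between them, so there is no abrupt jump at the seam. This is a sketch under stated assumptions (mono float samples at one sampling rate, a hypothetical `concatenate` helper). A linear crossfade is only one choice, and it does nothing to match pitch or timing across segments, which real concatenative systems must also handle.

```python
from typing import List, Sequence


def concatenate(segments: Sequence[Sequence[float]], overlap: int) -> List[float]:
    """Join recorded segments, blending `overlap` samples at each seam with a linear crossfade.

    Assumes all segments are mono samples at the same sampling rate.
    """
    if not segments:
        return []
    out: List[float] = list(segments[0])
    for seg in segments[1:]:
        n = min(overlap, len(out), len(seg))
        for k in range(n):
            w = (k + 1) / (n + 1)  # weight of the incoming segment, rising toward 1
            idx = len(out) - n + k
            out[idx] = (1 - w) * out[idx] + w * seg[k]
        out.extend(seg[n:])
    return out


word_a = [1.0, 1.0, 1.0, 1.0]  # stand-ins for digitized recordings of two words
word_b = [0.0, 0.0, 0.0, 0.0]
print(concatenate([word_a, word_b], overlap=2))
```

The seam samples step down gradually from the first word's level to the second's instead of jumping, which is the audible difference between a smooth join and a click.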

Text-to-speech utilities like the built-in Microsoft Windows
Narrator can be used to read passages of text in web
browsers and word processors.
Speech generation and digitized speech
segments are usually preferable when:

the messages are simple and short,
the messages deal with events in time,
the messages require an immediate response,
users' visual channels are overloaded,
users must be free to move around, or
the environment is too brightly lit, too poorly lit, subject
to severe vibration, or otherwise unsuitable for visual displays.
Non-speech auditory interfaces
Auditory outputs include individual audio tones and more complex information
presentation by combinations of sound and music.

Early Teletypes included a bell tone to alert users that a message was coming or
that paper had run out. Later computer systems added a range of tones to indicate
warnings or to acknowledge the completion of an action.

Auditory icons, such as a door opening, liquid pouring, or ball bouncing, help
reinforce the visual metaphors in a graphical user interface or the product concepts
for a toy.
Game designers know that sounds can add realism, heighten
tension, and engage users in powerful ways.

Research continues on auditory methods for emphasizing the
distributions of data in information visualization or drawing
attention to patterns, outliers, and clusters.
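One common sonification mapping assigns pitch by value and gives outliers a distinct sound. The sketch below is an illustration built on assumptions rather than a method from the text: the two-octave range (220 to 880 Hz), the z-score threshold of 2, and the `sonify` name are all choices made for the example. Pitch is computed on an exponential scale because listeners hear equal frequency ratios, not equal differences, as equal steps.

```python
import statistics
from typing import List, Tuple


def sonify(values: List[float], low_hz: float = 220.0, high_hz: float = 880.0,
           z_limit: float = 2.0) -> List[Tuple[float, bool]]:
    """Map each value to a pitch and flag outliers so they can be played with a distinct sound.

    Pitch rises exponentially with the value, so equal steps in the data are heard
    as equal musical intervals. A value is an outlier when it lies more than
    `z_limit` standard deviations from the mean.
    """
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all values are equal
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    result = []
    for v in values:
        pitch = low_hz * (high_hz / low_hz) ** ((v - lo) / span)
        outlier = sd > 0 and abs(v - mean) / sd > z_limit
        result.append((pitch, outlier))
    return result


readings = [10, 10, 10, 10, 10, 10, 10, 10, 10, 50]
for pitch, outlier in sonify(readings):
    print(f"{pitch:.0f} Hz{'  <- outlier' if outlier else ''}")
```

Played in order, these tones would let a listener hear the flat run of readings and then the jump to the flagged value.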

Auditory web browsers for blind users or for telephone use
have been developed. Users can hear text and link labels,
and then make selections by key entry.

Auditory file browsers continue to be refined: each file might
have a sound whose frequency is related to its size, and
might be assigned an instrument.

When the directory is opened, each file might play its sound
simultaneously or sequentially. Alternatively, files might have
sounds associated with their file types, so that users can hear
whether there are spreadsheets, graphics, or text files.
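The size-to-frequency and type-to-instrument ideas can be sketched as two small mappings. Everything specific here is an assumption: the instrument names, the choice that larger files sound lower (the text says only that frequency is related to size), and the scale of half an octave per tenfold increase in size. Generating the actual audio is left out.

```python
import math
import os
from typing import Dict, List, Tuple

# Hypothetical assignment of instruments to file types; unknown types get a bell.
INSTRUMENT_BY_TYPE: Dict[str, str] = {
    ".xlsx": "marimba",  # spreadsheets
    ".png": "strings",   # graphics
    ".txt": "piano",     # text files
}


def file_sound(name: str, size_bytes: int, top_hz: float = 1760.0,
               octaves_per_decade: float = 0.5) -> Tuple[float, str]:
    """Return (frequency, instrument) for one file.

    Each tenfold increase in size lowers the pitch by `octaves_per_decade` octaves,
    so large files sound low, much as large physical objects do.
    """
    decades = math.log10(max(size_bytes, 1))
    frequency = top_hz * 2 ** (-octaves_per_decade * decades)
    instrument = INSTRUMENT_BY_TYPE.get(os.path.splitext(name)[1].lower(), "bell")
    return frequency, instrument


def directory_sounds(files: List[Tuple[str, int]]) -> List[Tuple[str, float, str]]:
    """Sounds for every file in a directory, in order, ready for sequential playback."""
    return [(name, *file_sound(name, size)) for name, size in files]


for name, hz, instrument in directory_sounds([("notes.txt", 100), ("budget.xlsx", 1_000_000)]):
    print(f"{name}: {hz:.0f} Hz on {instrument}")
```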
The potential for novel musical instruments seems
especially attractive.

With touch-sensitive and haptic devices, it is possible to
offer appropriate feedback to give musicians an
experience similar to a piano keyboard, a drum, or a
woodwind or stringed instrument.

It is also possible to invent new instruments whose
frequencies, amplitudes, and effects are governed by
the placement of the touch, as well as by its direction
and speed.
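An instrument of this kind is essentially a mapping from touch measurements to sound parameters. In the sketch below, the particular assignments are assumptions made for illustration: horizontal placement sets pitch over three octaves, vertical placement sets vibrato depth, stroke speed sets loudness, and stroke direction bends the pitch by up to a semitone. The `Touch` and `Sound` types are hypothetical; a real device would supply its own touch events.

```python
import math
from dataclasses import dataclass


@dataclass
class Touch:
    x: float   # horizontal placement, 0.0 (left edge) to 1.0 (right edge)
    y: float   # vertical placement, 0.0 (bottom) to 1.0 (top)
    vx: float  # finger velocity in surface widths per second (hypothetical units)
    vy: float


@dataclass
class Sound:
    frequency_hz: float
    amplitude: float      # 0.0 to 1.0
    vibrato_depth: float  # 0.0 to 1.0


def touch_to_sound(t: Touch, low_hz: float = 110.0, octaves: float = 3.0,
                   max_speed: float = 2.0) -> Sound:
    speed = math.hypot(t.vx, t.vy)
    # Placement: left-to-right position spans `octaves` octaves above low_hz.
    # Direction: a rightward stroke bends the pitch up, a leftward one down, by up to a semitone.
    bend_semitones = t.vx / speed if speed > 0 else 0.0
    frequency = low_hz * 2 ** (t.x * octaves + bend_semitones / 12)
    # Speed: faster strokes play louder, from a quiet floor of 0.2 up to full volume.
    amplitude = 0.2 + 0.8 * min(speed / max_speed, 1.0)
    # Placement again: touching higher on the surface adds more vibrato.
    return Sound(frequency, amplitude, vibrato_depth=t.y)


print(touch_to_sound(Touch(x=1.0, y=0.5, vx=0.0, vy=0.0)))
```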

Music composition using computers expanded as
Musical Instrument Digital Interface (MIDI) hardware and
software became widely available at reasonable prices.
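MIDI itself does not carry audio; it carries short event messages that tell an instrument or synthesizer what to play. The sketch below builds the Note On and Note Off channel messages of the MIDI 1.0 format (a status byte of 0x9n or 0x8n, where n is the channel 0 to 15, followed by a note number and a velocity, each 0 to 127) and converts a note number to its equal-tempered frequency. The function names are hypothetical; sending the bytes to a device would go through a MIDI library or the operating system's MIDI interface.

```python
def _check(channel: int, note: int, velocity: int) -> None:
    """Reject values that do not fit the 4-bit channel or 7-bit data fields."""
    if not (0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("channel must be 0-15; note and velocity must be 0-127")


def note_on(channel: int, note: int, velocity: int) -> bytes:
    """Three-byte MIDI Note On message: status 0x90 plus the channel, then note and velocity."""
    _check(channel, note, velocity)
    return bytes([0x90 | channel, note, velocity])


def note_off(channel: int, note: int, velocity: int = 0) -> bytes:
    """Three-byte MIDI Note Off message: status 0x80 plus the channel, then note and velocity."""
    _check(channel, note, velocity)
    return bytes([0x80 | channel, note, velocity])


def note_frequency(note: int) -> float:
    """Equal-tempered pitch of a MIDI note number, with note 69 (A4) tuned to 440 Hz."""
    return 440.0 * 2 ** ((note - 69) / 12)


# Middle C (note 60) on the first channel, which music software usually labels channel 1.
print(note_on(0, 60, 100).hex(), note_off(0, 60).hex(), round(note_frequency(60), 2))
# 903c64 803c00 261.63
```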
