Efficient empirical determination, computation, and use of acoustic confusability measures

US 8,959,019 B2
Filed: 10/31/2007
Issued: 02/17/2015
Est. Priority Date: 10/31/2002
Status: Active Grant

Abstract

Efficient empirical determination, computation, and use of an acoustic confusability measure comprises: (1) an empirically derived acoustic confusability measure, comprising a means for determining the acoustic confusability between any two textual phrases in a given language, where the measure of acoustic confusability is empirically derived from examples of the application of a specific speech recognition technology, where the procedure does not require access to the internal computational models of the speech recognition technology, and does not depend upon any particular internal structure or modeling technique, and where the procedure is based upon iterative improvement from an initial estimate; (2) techniques for efficient computation of an empirically derived acoustic confusability measure, comprising means for efficient application of an acoustic confusability score, allowing practical application to very large-scale problems; and (3) a method for using acoustic confusability measures to make principled choices about which specific phrases to make recognizable by a speech recognition application.

5 Claims

1. A method for determining an empirically derived acoustic confusability measure, comprising the steps of:
- using a computer for performing corpus processing by initially processing an original corpus, comprising both audio information and a true transcription thereof, with an automatic speech recognition system of interest once, one utterance at a time, to produce a recognized corpus comprising a machine transcription of audio information; and
- developing a family of phoneme confusability models by repeatedly processing said recognized corpus with said computer, after the corpus is initially processed by said automatic speech recognition system once, wherein each repetition comprises the steps of:
  - setting all phoneme pair counts to zero; and
  - analyzing each pair of phoneme sequences in said recognized corpus to collect information regarding the confusability of any two phonemes, wherein said information is collected by:
    - constructing a lattice from each said pair of phoneme sequences;
    - labeling each arc of the lattice with the appropriate value from the current family of decoding costs;
    - computing the minimum cost path through this lattice; and
    - traversing said minimum cost path and incrementing the phoneme pair count for each arc that is traversed; and
  - upon completion for each said pair of phoneme sequences of said minimum cost path traversal and associated incrementing of phoneme pair counts, using said accumulated phoneme pair counts to deliver a family of phoneme confusability models.
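
The inner loop of claim 1 (build a lattice for one pair of phoneme sequences, label its arcs with the current costs, find the minimum cost path, and count the arcs on that path) can be sketched as follows. This is an illustrative reading, not the patent's implementation: it assumes the lattice is the usual alignment grid with substitution, deletion, and insertion arcs. The names `EPS`, `arc_cost`, `align_and_count`, and the fallback cost of 1.0 for unseen pairs are hypothetical choices made only for this example.

```python
from collections import Counter

EPS = "<eps>"  # hypothetical "no phoneme" symbol for deletion/insertion arcs


def arc_cost(costs, true_ph, decoded_ph, default=1.0):
    """Cost of one lattice arc: a table lookup, else 0 for a match and
    an assumed default for any pair not yet in the table."""
    if (true_ph, decoded_ph) in costs:
        return costs[(true_ph, decoded_ph)]
    return 0.0 if true_ph == decoded_ph else default


def align_and_count(true_seq, decoded_seq, costs, counts):
    """Find the minimum cost path through the alignment lattice of one
    (true, decoded) pair and increment counts for each arc on that path."""
    n, m = len(true_seq), len(decoded_seq)
    inf = float("inf")
    best = [[inf] * (m + 1) for _ in range(n + 1)]   # best cost to reach node
    back = [[None] * (m + 1) for _ in range(n + 1)]  # arc used to reach node
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            arcs = []
            if i < n and j < m:  # substitution or match
                arcs.append((i + 1, j + 1, (true_seq[i], decoded_seq[j])))
            if i < n:            # true phoneme dropped by the recognizer
                arcs.append((i + 1, j, (true_seq[i], EPS)))
            if j < m:            # phoneme inserted by the recognizer
                arcs.append((i, j + 1, (EPS, decoded_seq[j])))
            for ni, nj, pair in arcs:
                cost = best[i][j] + arc_cost(costs, *pair)
                if cost < best[ni][nj]:
                    best[ni][nj] = cost
                    back[ni][nj] = (i, j, pair)
    # Walk the minimum cost path backwards, counting each traversed arc.
    i, j = n, m
    while (i, j) != (0, 0):
        i, j, pair = back[i][j]
        counts[pair] += 1
    return best[n][m]


counts = Counter()  # "setting all phoneme pair counts to zero"
total = align_and_count(["k", "ae", "t"], ["k", "ae", "p"], {}, counts)
```

With an empty cost table, the example pair aligns along the diagonal, so `counts` records one match each for `k` and `ae` and one `("t", "p")` confusion. Running the same function over every pair in the recognized corpus accumulates the counts that the final step of the claim turns into a family of confusability models.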
- Dependent claims:
  - 2. The method of claim 1, said corpus processing step comprising the steps of:
    - for each input utterance, said recognition system generating both a decoding, in a decoded frame sequence, wherein a frame comprises a brief audio segment of the input utterance, and a confidence score which comprises a measure, determined by said recognition system, of the likelihood that a given decoding is correct;
    - transforming said decoded frame sequence into a shorter decoded phoneme sequence;
    - inspecting said true transcription of said input utterance, wherein said true transcription comprises regular text in a particular human language;
    - transforming said true transcription into a true phoneme sequence comprising phonemes in a sequence derived from the true transcription;
    - for each utterance, outputting, collectively, said recognized corpus containing a large number of pairs of phoneme sequences, and comprising a confidence score and a pair of phoneme sequences which comprise the decoded phoneme sequence and the true phoneme sequence.
  - 3. The method of claim 2, wherein said decoded frame sequence comprises said recognizer's best guess, for each frame of an utterance, of a phoneme being enunciated in that audio frame; and wherein said phoneme comprises one of a finite number of basic sound units of a human language.
  - 4. The method of claim 1, said developing step comprising the steps of:
    - iterating until there is no further change in said family of confusability models, or the change becomes negligible;
    - outputting said family of confusability models, which estimates acoustic confusability of any two members of an augmented phoneme alphabet; and
    - deriving an acoustic confusability measure.
  - 5. The method of claim 1, said corpus comprising:
    - a representative set of utterances, in a given human language, with said true transcription comprising regular text in a particular human language;
    - wherein an utterance comprises a sound recording, represented in a suitable computer-readable form;
    - wherein said true transcription comprises a human-created conventional textual representation of said utterance; and
    - wherein said true transcription is one that may be regarded as accurate.
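
Two of the dependent claims describe steps that are easy to illustrate: shortening a decoded frame sequence into a phoneme sequence (claims 2 and 3) and iterating until the models stop changing (claim 4). The sketch below rests on assumptions the claims do not state: that the shortening merges consecutive frames carrying the same phoneme label, and that costs are negative log relative frequencies of the pair counts. The function names, the tolerance, and the iteration cap are all hypothetical.

```python
import math
from itertools import groupby


def frames_to_phonemes(frame_labels):
    """Collapse each run of identical per-frame labels into one phoneme.
    Assumed rule; it would also merge a phoneme genuinely spoken twice."""
    return [label for label, _ in groupby(frame_labels)]


def counts_to_costs(counts):
    """Assumed re-estimation rule: cost = -log(count / total for the same
    true phoneme). Pairs with zero count are simply left out."""
    totals = {}
    for (true_ph, _), c in counts.items():
        totals[true_ph] = totals.get(true_ph, 0) + c
    return {pair: -math.log(c / totals[pair[0]])
            for pair, c in counts.items() if c > 0}


def iterate_until_stable(reestimate, costs, tol=1e-4, max_iters=50):
    """Apply reestimate(costs) repeatedly until the largest change in any
    cost is at most tol, or until max_iters passes have been made."""
    iteration = 0
    for iteration in range(1, max_iters + 1):
        new_costs = reestimate(costs)
        if set(new_costs) != set(costs):
            change = float("inf")  # the set of pairs itself changed
        else:
            change = max((abs(new_costs[k] - costs[k]) for k in costs),
                         default=0.0)
        costs = new_costs
        if change <= tol:
            break
    return costs, iteration


phonemes = frames_to_phonemes(["k", "k", "ae", "ae", "ae", "t"])
costs = counts_to_costs({("t", "t"): 3, ("t", "p"): 1})
```

In a full pipeline, `reestimate` would zero the pair counts, align every pair in the recognized corpus under the current costs, and return `counts_to_costs` of the result; here it is left as a parameter so that the stopping rule can be read on its own.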

Current Assignee: Promptu Systems Corporation
Original Assignee: Promptu Systems Corporation
Inventors: Printz, Harry; Chittar, Narren
Primary Examiner(s): Lerner, Martin
Application Number: US11/932,122
Publication Number: US 20080126089A1
Time in Patent Office: 2,666 Days
Field of Search: 704/236, 704/240, 704/243, 704/244, 704/245, 704/254, 704/255, 704/242
US Class Current: 704/243
CPC Class Codes

G06F 16/95   Retrieval from the web

G06F 16/9535   Search customisation based ...

G06Q 30/02   Marketing; Price estimation...

G10L 15/02   Feature extraction for spee...

G10L 15/142   Hidden Markov Models [HMMs]

G10L 15/18   using natural language mode...

G10L 15/187   Phonemic context, e.g. pron...

G10L 15/22   Procedures used during a sp...

G10L 17/26   Recognition of special voic...

G10L 2015/025   Phonemes, fenemes or fenone...
