Efficient empirical determination, computation, and use of acoustic confusability measures

US 10,121,469 B2
Filed: 03/13/2017
Issued: 11/06/2018
Est. Priority Date: 10/31/2002
Status: Active Grant

Abstract

Efficient empirical determination, computation, and use of an acoustic confusability measure comprises: (1) an empirically derived acoustic confusability measure, comprising a means for determining the acoustic confusability between any two textual phrases in a given language, where the measure of acoustic confusability is empirically derived from examples of the application of a specific speech recognition technology, where the procedure does not require access to the internal computational models of the speech recognition technology, and does not depend upon any particular internal structure or modeling technique, and where the procedure is based upon iterative improvement from an initial estimate; (2) techniques for efficient computation of empirically derived acoustic confusability measure, comprising means for efficient application of an acoustic confusability score, allowing practical application to very large-scale problems; and (3) a method for using acoustic confusability measures to make principled choices about which specific phrases to make recognizable by a speech recognition application.

7 Claims

1. A method for generating an acoustic confusability measure, said method comprising the steps of:
- receiving as input a corpus, comprising a set of utterances with corresponding reliable transcriptions;
- recognizing, via an automatic speech recognition system, at least one utterance among the set of utterances to yield a recognized utterance, wherein said recognized utterance includes at least one decoded frame sequence;
- coalescing identical sequential phonemes of said at least one decoded frame sequence to yield at least one decoded phoneme sequence;
- determining, for each corresponding reliable transcription, at least one pronunciation, wherein said at least one pronunciation includes at least one true phoneme sequence;
- generating as output said recognized corpus comprising for each said recognized utterance at least said at least one decoded phoneme sequence and said at least one true phoneme sequence; and
- generating an empirically derived acoustic confusability measure from said recognized corpus, said empirically derived acoustic confusability measure comprising a family of probability models Π = {p(d|t)} wherein each of d and t are phonemes drawn from an augmented phoneme alphabet Φ′.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The method of claim 1, wherein said recognizing, via an automatic speech recognition system, at least one utterance is performed in an N-best manner, wherein up to N variant decodings are produced for each recognized utterance, and wherein each said variant decoding includes a distinct decoded frame sequence.
  - 3. The method of claim 2, wherein each said variant decoding includes a corresponding confidence score.
  - 4. The method of claim 3, wherein in said method for generating said empirically derived acoustic confusability measure, a count associated with processing each said variant decoding is incremented by said corresponding confidence score.
  - 5. The method of claim 1, further comprising:
    - applying a phoneme map to frames of said at least one decoded frame sequence to yield at least one mapped decoded frame sequence, said at least one mapped decoded frame sequence thereafter replacing said at least one decoded frame sequence.
  - 6. The method of claim 5, in which said applying the phoneme map comprises at least one of the steps of:
    - mapping a phoneme of each frame of said at least one decoded frame sequence to yield a mapped phoneme, said mapped phoneme being either the same as or different from an original phoneme, thereby yielding said at least one decoded frame sequence; or
    - mapping said phoneme of each frame of said at least one decoded frame sequence to yield said mapped phoneme, said mapped phoneme being either the same as or different from the original phoneme, to include left phonetic context, right phonetic context or both into the mapped phoneme, thereby yielding said at least one decoded frame sequence.
  - 7. The method of claim 1, wherein said determining for each reliable transcription said at least one pronunciation comprises any of the steps of:
    - for each word w of said each reliable transcription, utilizing as a pronunciation of w, from a set of all pronunciations of w, its most popular pronunciation;
    - for each word w of said each reliable transcription, utilizing as the pronunciation of w, from the set of all pronunciations of w, a pronunciation selected at random;
    - for each word w of said each reliable transcription, utilizing as the pronunciation of w, from the set of all pronunciations of w, the pronunciation that is closest, in the sense of string edit distance, to the at least one decoded phoneme sequence for w within the said at least one decoded phoneme sequence; or
    - for each word w of said each reliable transcription, utilizing as the pronunciation of w each of a plurality of pronunciations from the set of all pronunciations of w, thereby generating a multiplicity of pronunciations for each reliable transcription.
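
The steps recited in claim 1, together with the confidence-weighted counts of claim 4 and the edit-distance option of claim 7, can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the patented procedure: it assumes a single unit-cost edit-distance alignment between each decoded and true phoneme sequence, and it assumes the augmented alphabet adds one empty symbol (written `<eps>` here) to account for insertions and deletions. The abstract describes iterative improvement from an initial estimate, which this single-pass sketch omits. All function names and the sample phoneme labels are hypothetical.

```python
from collections import defaultdict
from itertools import groupby

EPS = "<eps>"  # assumed empty phoneme; the source only says the alphabet is "augmented"


def coalesce(frames):
    """Collapse each run of identical adjacent frame labels into one phoneme."""
    return [label for label, _ in groupby(frames)]


def align(decoded, true):
    """Unit-cost edit-distance alignment; returns a list of (d, t) pairs."""
    n, m = len(decoded), len(true)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = cost[i - 1][j - 1] + (decoded[i - 1] != true[j - 1])
            cost[i][j] = min(sub, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (decoded[i - 1] != true[j - 1])):
            pairs.append((decoded[i - 1], true[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((decoded[i - 1], EPS))  # decoded phoneme with no true counterpart
            i -= 1
        else:
            pairs.append((EPS, true[j - 1]))  # true phoneme that was not decoded
            j -= 1
    pairs.reverse()
    return pairs


def estimate(corpus):
    """Estimate p(d|t) from (decoded_frames, true_phonemes, weight) triples.

    The weight plays the role of a per-decoding confidence score; use 1.0
    when no confidence score is available.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for frames, true, weight in corpus:
        for d, t in align(coalesce(frames), true):
            counts[t][d] += weight
    return {t: {d: c / sum(ds.values()) for d, c in ds.items()}
            for t, ds in counts.items()}


def closest_pronunciation(decoded, pronunciations):
    """Pick the pronunciation with the smallest edit distance to the decoding."""
    return min(pronunciations,
               key=lambda p: sum(d != t for d, t in align(decoded, p)))


corpus = [
    (["k", "k", "ae", "t", "t"], ["k", "ae", "t"], 1.0),
    (["k", "eh", "eh", "t"], ["k", "ae", "t"], 1.0),
]
model = estimate(corpus)
print(model["ae"])
```

In this toy corpus the true phoneme `ae` is decoded once as `ae` and once as `eh`, so the estimated family assigns each a probability of one half given `ae`. Passing a confidence score below 1.0 as the weight of the second decoding would reduce its share accordingly.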

Current Assignee: Promptu Systems Corporation
Original Assignee: Promptu Systems Corporation
Inventors: Printz, Harry; Chittar, Naren
Primary Examiner(s): Lerner, Martin
Application Number: US15/457,964
Publication Number: US 20170186421A1
Time in Patent Office: 603 Days
Field of Search: 704235, 704236, 704242, 704243, 7042562
CPC Class Codes

G06F 16/95   Retrieval from the web

G06F 16/9535   Search customisation based ...

G06Q 30/02   Marketing; Price estimation...

G10L 15/02   Feature extraction for spee...

G10L 15/142   Hidden Markov Models [HMMs]

G10L 15/18   using natural language mode...

G10L 15/187   Phonemic context, e.g. pron...

G10L 15/22   Procedures used during a sp...

G10L 17/26   Recognition of special voic...

G10L 2015/025   Phonemes, fenemes or fenone...
