Efficient empirical computation and utilization of acoustic confusability

US 9,626,965 B2
Filed: 12/17/2014
Issued: 04/18/2017
Est. Priority Date: 10/31/2002
Status: Active Grant


1 Assignment


0 Petitions


Abstract

Efficient empirical determination, computation, and use of an acoustic confusability measure comprises: (1) an empirically derived acoustic confusability measure, comprising a means for determining the acoustic confusability between any two textual phrases in a given language, where the measure of acoustic confusability is empirically derived from examples of the application of a specific speech recognition technology, where the procedure does not require access to the internal computational models of the speech recognition technology, and does not depend upon any particular internal structure or modeling technique, and where the procedure is based upon iterative improvement from an initial estimate; (2) techniques for efficient computation of an empirically derived acoustic confusability measure, comprising means for efficient application of an acoustic confusability score, allowing practical application to very large-scale problems; and (3) a method for using acoustic confusability measures to make principled choices about which specific phrases to make recognizable by a speech recognition application.


7 Claims

1. In a computer implemented method for determining an empirically derived acoustic confusability measure, an iterative method for development of a probability model family $\Pi = \{p(d \mid t)\}$, comprising:
  
  providing a recognized corpus;
  
  establishing a termination condition which depends on any of:

    a number of iterations executed; and

    closeness of match between the previous and current probability model families;
  
  defining a family of decoding costs;
  
  setting an iteration count to 0;
  
  setting a phoneme pair count to 0;
  
  for each entry in the recognized corpus, performing the following steps:

    constructing a lattice;

    populating lattice arcs with values drawn from a current family of decoding costs;

    applying a Bellman-Ford dynamic programming algorithm, or a Dijkstra's shortest path algorithm, to find a shortest path through said lattice, from a source node to a terminal node; and

    traversing said determined shortest path, wherein for each arc that is traversed, the phoneme pair count is incremented by 1;
  
  for each transcription, computing a confidence score which is the sum of a phoneme pair value over all transcriptions paired with an utterance;
  
  estimating said probability model family;
  
  if the iteration count exceeds 0, testing said termination condition;
  
  if said termination condition is satisfied, returning a desired probability model family and stopping;
  
  if said termination condition is not satisfied, defining a new family of decoding costs and therefrom a new probability model family; and
  
  incrementing said iteration count and repeating.
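The iterative procedure of claim 1 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patentee's implementation: each corpus entry is taken to be one (decoded, true) pair of phoneme sequences with weight 1 (the claim's per-transcription confidence scores are not modeled), the initial costs are a hypothetical 0/1 edit-distance scheme, and each new cost family is assumed to be δ(d|t) = -log p(d|t) with a floor for unseen pairs. The names `align`, `estimate`, `train`, `edit_cost`, `neg_log_cost`, `EPS`, and `FLOOR` are hypothetical.

```python
import math
from collections import defaultdict

EPS = "ε"     # hypothetical symbol for the empty phoneme
FLOOR = 1e-6  # hypothetical floor probability for unseen phoneme pairs


def edit_cost(d, t):
    """Assumed initial decoding costs: free match, unit cost otherwise."""
    return 0.0 if d == t else 1.0


def neg_log_cost(model):
    """Assumed cost family derived from a model: delta(d|t) = -log p(d|t)."""
    def cost(d, t):
        return -math.log(model.get((d, t), FLOOR))
    return cost


def align(decoded, true, cost):
    """Return the (d, t) phoneme pairs on a minimum-cost path through the lattice."""
    n, q = len(decoded), len(true)
    best = [[math.inf] * (q + 1) for _ in range(n + 1)]
    back = [[None] * (q + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):  # row-major order visits every node after its predecessors
        for j in range(q + 1):
            moves = []
            if j < q:
                moves.append((i, j + 1, (EPS, true[j])))             # deletion
            if i < n:
                moves.append((i + 1, j, (decoded[i], EPS)))          # insertion
            if i < n and j < q:
                moves.append((i + 1, j + 1, (decoded[i], true[j])))  # substitution or match
            for ni, nj, pair in moves:
                c = best[i][j] + cost(*pair)
                if c < best[ni][nj]:
                    best[ni][nj] = c
                    back[ni][nj] = (i, j, pair)
    pairs, (i, j) = [], (n, q)
    while (i, j) != (0, 0):  # walk back from the terminal node to the source node
        i, j, pair = back[i][j]
        pairs.append(pair)
    return pairs[::-1]


def estimate(counts):
    """Relative-frequency estimate p(d|t) = count(d, t) / sum over d' of count(d', t)."""
    totals = defaultdict(float)
    for (d, t), c in counts.items():
        totals[t] += c
    return {(d, t): c / totals[t] for (d, t), c in counts.items()}


def train(corpus, max_iters=10, tol=1e-4):
    """Alternate alignment, counting and re-estimation until a termination test passes."""
    cost, prev, model = edit_cost, None, {}
    for _ in range(max_iters):                   # stop after max_iters iterations
        counts = defaultdict(float)
        for decoded, true in corpus:
            for pair in align(decoded, true, cost):
                counts[pair] += 1                # +1 per traversed arc
        model = estimate(counts)
        if prev is not None:                     # closeness of previous and current models
            keys = set(model) | set(prev)
            if max(abs(model.get(k, 0.0) - prev.get(k, 0.0)) for k in keys) < tol:
                break
        prev, cost = model, neg_log_cost(model)
    return model
```

Because every arc moves down, right, or diagonally, the lattice is acyclic, so a single row-major pass finds the same shortest path that Bellman-Ford or Dijkstra's algorithm would.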
2. The method of claim 1, said step of estimating said probability model family comprising either of:

  if the confidence value is non-zero for every transcription, then setting the probability to a ratio of confidence for a phoneme pair over confidence for said utterance; and

  if the confidence value is zero for any transcription, then applying a desired zero-count probability estimator to estimate probability.
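The estimation step of claim 2 can be illustrated as follows. The sketch reads "confidence for a phoneme pair over confidence for said utterance" as the accumulated count of a pair (d, t) divided by the total count of the true phoneme t, and it applies the zero test per true phoneme rather than per transcription. That reading, the function name `estimate_model`, and the choice of add-k (Lidstone) smoothing as the "desired zero-count probability estimator" are assumptions, not details given in the claim.

```python
from collections import defaultdict


def estimate_model(counts, inventory, k=0.5):
    """Estimate p(d|t) from phoneme-pair counts.

    counts: dict mapping (d, t) -> accumulated count (or confidence).
    inventory: every symbol d that may appear as a decoding, including the empty phoneme.
    k: hypothetical Lidstone (add-k) pseudo-count used as the zero-count estimator.
    """
    by_true = defaultdict(dict)
    for (d, t), c in counts.items():
        by_true[t][d] = c
    model = {}
    for t, row in by_true.items():
        total = sum(row.values())
        if all(row.get(d, 0) > 0 for d in inventory):
            # Every pair was observed: plain ratio of pair count to total count.
            for d in inventory:
                model[(d, t)] = row[d] / total
        else:
            # Some pair has a zero count: fall back to add-k smoothing.
            denom = total + k * len(inventory)
            for d in inventory:
                model[(d, t)] = (row.get(d, 0) + k) / denom
    return model
```

For example, counts of 3 for (a, a) and 0 for (b, a) give p(a|a) = 3.5/4 and p(b|a) = 0.5/4 with k = 0.5, so the unseen pair keeps a small non-zero probability.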
3. The method of claim 1, the steps of constructing said lattice, populating said lattice arcs with values drawn from a current family of decoding costs, applying said Bellman-Ford dynamic programming algorithm, or said Dijkstra's shortest path algorithm, to find said shortest path through said lattice, and traversing said determined shortest path, wherein for each arc that is traversed, the phoneme pair count is incremented by 1, comprising:

  for an entry in the recognized corpus with a decoded phoneme sequence containing N phonemes, and a true phoneme sequence containing Q phonemes, constructing a rectangular lattice of dimension (N+1) rows by (Q+1) columns, and with an arc from a node (i, j) to each of nodes (i+1, j), (i, j+1), and (i+1, j+1), when present in said lattice, where “node (i, j)” refers to the node in row i, column j of the lattice;

  labeling:

    each arc from node (i, j) to node (i, j+1) with the cost $\delta_m(\varepsilon \mid t_j)$;

    each arc from node (i, j) to node (i+1, j) with the cost $\delta_m(d_i \mid \varepsilon)$;

    each arc from node (i, j) to node (i+1, j+1) with the cost $\delta_m(d_i \mid t_j)$,

    where $\delta_m$ is the associated decoding cost at the mth iteration of the algorithm; and

  applying the Bellman-Ford dynamic programming algorithm or Dijkstra's shortest path algorithm to find a shortest path from the source node, which is defined as node (0, 0), to the terminal node, which is defined as node (N, Q);

  outputting a sequence of arcs $A = a_1, a_2, \ldots, a_K$ in said lattice corresponding to the aforesaid minimum cost path from the source node to the terminal node; and

  for each arc $a_i$ in the minimum cost path A, labeled with a phoneme pair, incrementing the associated phoneme pair count by 1.
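The lattice of claim 3 can be built explicitly and searched with Dijkstra's algorithm, one of the two options the claim names. In this sketch the helper names (`build_lattice`, `shortest_path`, `count_pairs`), the `EPS` symbol, and the use of Python's `heapq` priority queue are illustrative choices; the costs passed in are assumed to be non-negative, which Dijkstra's algorithm requires.

```python
import heapq
from collections import Counter

EPS = "ε"  # hypothetical symbol for the empty phoneme


def build_lattice(decoded, true, cost):
    """Map each node (i, j) to its outgoing arcs (destination, phoneme pair, cost).

    Indices are 0-based, so the arc leaving row i consumes decoded[i] and the
    arc leaving column j consumes true[j]."""
    n, q = len(decoded), len(true)
    arcs = {}
    for i in range(n + 1):
        for j in range(q + 1):
            out = []
            if j < q:
                out.append(((i, j + 1), (EPS, true[j])))
            if i < n:
                out.append(((i + 1, j), (decoded[i], EPS)))
            if i < n and j < q:
                out.append(((i + 1, j + 1), (decoded[i], true[j])))
            arcs[(i, j)] = [(dst, pair, cost(*pair)) for dst, pair in out]
    return arcs


def shortest_path(arcs, source, terminal):
    """Dijkstra's algorithm; returns the phoneme pairs labeling the cheapest path."""
    dist, prev, done = {source: 0.0}, {}, set()
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node == terminal:
            break
        for dst, pair, c in arcs[node]:
            if d + c < dist.get(dst, float("inf")):
                dist[dst] = d + c
                prev[dst] = (node, pair)
                heapq.heappush(heap, (d + c, dst))
    path, node = [], terminal
    while node != source:
        node, pair = prev[node]
        path.append(pair)
    return path[::-1]


def count_pairs(corpus, cost):
    """Add 1 to a phoneme pair's count for every arc on each entry's shortest path."""
    counts = Counter()
    for decoded, true in corpus:
        arcs = build_lattice(decoded, true, cost)
        counts.update(shortest_path(arcs, (0, 0), (len(decoded), len(true))))
    return counts
```

With unit costs for mismatches, aligning the decoded sequence k-ae-ae-t against the true sequence k-ae-t yields one insertion pair (ae, ε) plus three matches, whichever of the two equally cheap paths Dijkstra's algorithm returns.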
4. The method of claim 1, further comprising computing an empirically derived acoustic confusability of two phrases by:

  determining said desired probability model family $\Pi = \{p(d \mid t)\}$;

  using $\Pi$ to compute acoustic confusability of two arbitrary phrases w and v by:

    computing a raw phrase acoustic confusability measure, which is a measure of the acoustic similarity of phrases v and w; and

    computing a grammar-relative confusion probability measure, which is an estimate of the probability that a grammar-constrained recognizer returns the phrase v as a decoding, when the true phrase is w.
5. The method of claim 4, said step of computing a phrase acoustic confusability measure further comprising:

  given pronunciations q(w) and q(v), computing the raw pronunciation acoustic confusability by:

    defining decoding costs for each phoneme;

    constructing a lattice $L = q(v) \times q(w)$, and labeling it with said phoneme decoding costs, depending upon the phonemes of q(v) and q(w);

    finding a minimum cost path $A = a_1, a_2, \ldots, a_K$ from a source node to a terminal node of L;

    computing a cost of said minimum cost path A, as a sum of the decoding costs for each arc $a \in A$; and

    computing a raw pronunciation acoustic confusability measure of q(v) and q(w).
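Claim 5 leaves open how the minimum path cost becomes the raw confusability measure. The sketch below assumes decoding costs δ(d|t) = -log p(d|t) and takes r(q(v)|q(w)) = exp(-cost), so that the measure equals the product of the phoneme probabilities along the best path; that transform, the floor for unseen pairs, and the names `min_path_cost`, `raw_confusability`, and `EPS` are hypothetical. Rows follow q(v) (treated as the decoding) and columns follow q(w) (treated as the truth), matching L = q(v) × q(w).

```python
import math

EPS = "ε"  # hypothetical symbol for the empty phoneme


def min_path_cost(qv, qw, cost):
    """Cost of the minimum-cost path from node (0, 0) to node (len(qv), len(qw))."""
    n, q = len(qv), len(qw)
    best = [[math.inf] * (q + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):  # acyclic lattice: one row-major pass suffices
        for j in range(q + 1):
            if i > 0:
                best[i][j] = min(best[i][j], best[i - 1][j] + cost(qv[i - 1], EPS))
            if j > 0:
                best[i][j] = min(best[i][j], best[i][j - 1] + cost(EPS, qw[j - 1]))
            if i > 0 and j > 0:
                best[i][j] = min(best[i][j], best[i - 1][j - 1] + cost(qv[i - 1], qw[j - 1]))
    return best[n][q]


def raw_confusability(qv, qw, prob, floor=1e-6):
    """Hypothetical raw measure r(q(v)|q(w)) = exp(-cost), with cost(d, t) = -log p(d|t)."""
    def cost(d, t):
        return -math.log(prob.get((d, t), floor))
    return math.exp(-min_path_cost(qv, qw, cost))
```

With p(k|k) = p(t|t) = 0.9 and p(eh|ae) = 0.1, the decoded pronunciation k-eh-t scores about 0.9 × 0.1 × 0.9 = 0.081 against the true pronunciation k-ae-t.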
6. The method of claim 4, further comprising:

  computing a phrase acoustic confusability measure with no reference to pronunciations by any one of the following:

    worst case;

    most common;

    average case;

    random; and

    a combination of the worst case, most common, average case, and random methods into additional hybrid variants.
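Claim 6 names the aggregation strategies but not their definitions. One plausible reading, offered only as an assumption, collapses the pronunciation-level scores of two phrases into a single phrase-level number: worst case as the maximum over all pronunciation pairs, average case as their mean, most common as the score of each phrase's most frequent pronunciation, and random as the score of one randomly drawn pair. The `freq` table, the method strings, and the function name are hypothetical; hybrid variants are not shown.

```python
import random
import statistics


def phrase_confusability(prons_v, prons_w, r, method="worst", freq=None, rng=None):
    """Collapse pronunciation-level scores r(q(v), q(w)) into one phrase-level score.

    prons_v, prons_w: lists of pronunciations (tuples of phonemes) for phrases v and w.
    r: raw pronunciation confusability, called as r(q(v), q(w)).
    freq: hypothetical dict of pronunciation frequencies, needed by "most_common".
    rng: optional random.Random instance, used by "random".
    """
    if method == "worst":
        return max(r(qv, qw) for qv in prons_v for qw in prons_w)
    if method == "average":
        return statistics.mean(r(qv, qw) for qv in prons_v for qw in prons_w)
    if method == "most_common":
        if freq is None:
            raise ValueError("most_common needs pronunciation frequencies")
        qv = max(prons_v, key=lambda p: freq.get(p, 0))
        qw = max(prons_w, key=lambda p: freq.get(p, 0))
        return r(qv, qw)
    if method == "random":
        rng = rng or random.Random()
        return r(rng.choice(prons_v), rng.choice(prons_w))
    raise ValueError(f"unknown method: {method}")
```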
7. The method of claim 4, said step of computing a grammar-relative pronunciation confusion probability comprising:

  letting L(G) be a set of all phrases admissible by a grammar G, and letting Q(L(G)) be a set of all pronunciations of all such phrases;

  letting two pronunciations $q(v), q(w) \in Q(L(G))$ be given;

  estimating a probability that an utterance corresponding to a pronunciation q(w) is decoded by a recognizer $R_G$ as q(v), as follows:

    computing a normalizer of q(w) relative to G, written Z(q(w), G), as

    $$Z(q(w), G) = \sum_{q(x) \in Q(L(G))} r(q(x) \mid q(w)),$$

    where the sum extends over all $q(x) \in Q(L(G))$; and

    setting a probability $p(q(v) \mid q(w), G) = r(q(v) \mid q(w)) / Z(q(w), G)$.
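The normalization of claim 7 is a direct computation once some raw pronunciation confusability r is available. The sketch below takes r as a caller-supplied function and represents the grammar by an explicit list of its pronunciations Q(L(G)); both representations and the names `confusion_distribution` and `grammar_relative_probability` are assumptions for illustration.

```python
def confusion_distribution(qw, grammar_prons, r):
    """Return p(q(v) | q(w), G) for every q(v) in Q(L(G)).

    qw: the true pronunciation q(w).
    grammar_prons: list of all pronunciations Q(L(G)) admitted by the grammar G.
    r: raw pronunciation confusability, called as r(q(v), q(w)).
    """
    z = sum(r(qx, qw) for qx in grammar_prons)  # normalizer Z(q(w), G)
    return {qv: r(qv, qw) / z for qv in grammar_prons}


def grammar_relative_probability(qv, qw, grammar_prons, r):
    """p(q(v) | q(w), G) = r(q(v) | q(w)) / Z(q(w), G); q(v) must be in Q(L(G))."""
    return confusion_distribution(qw, grammar_prons, r)[qv]
```

Because Z(q(w), G) sums r over every pronunciation the grammar admits, the resulting probabilities for a fixed q(w) sum to 1. For example, if r is 1 for an exact match and 0.25 otherwise, a three-pronunciation grammar gives Z = 1.5, so the correct decoding receives probability 2/3 and each competitor 1/6.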


Current Assignee: Promptu Systems Corporation
Original Assignee: Promptu Systems Corporation
Inventors: Printz, Harry; Chittar, Naren
Primary Examiner(s): Lerner, Martin

Application Number: US14/574,314
Publication Number: US 20150106100A1
Time in Patent Office: 853 Days
Field of Search: 704/236, 704/240, 704/243
CPC Class Codes

G06F 16/95   Retrieval from the web

G06F 16/9535   Search customisation based ...

G06Q 30/02   Marketing; Price estimation...

G10L 15/02   Feature extraction for spee...

G10L 15/142   Hidden Markov Models [HMMs]

G10L 15/18   using natural language mode...

G10L 15/187   Phonemic context, e.g. pron...

G10L 15/22   Procedures used during a sp...

G10L 17/26   Recognition of special voic...

G10L 2015/025   Phonemes, fenemes or fenone...
