Searching in Audio Speech

US 20100324900A1
Filed: 06/19/2009
Published: 12/23/2010
Est. Priority Date: 06/19/2009
Status: Active Grant

Abstract

A computerized method of detecting a target word in a speech signal. A speech recognition engine and a previously constructed phoneme model are provided. The speech signal is input into the speech recognition engine. Based on the phoneme model, the input speech signal is indexed. A time-ordered list is stored representing n-best phoneme candidates of the input speech signal and phonemes of the input speech signal in multiple phoneme frames. The target word is transcribed into a transcription of target phonemes. The time-ordered list of n-best phoneme candidates is searched for a locus of said target phonemes. While searching, scoring is based on the ranking of the phoneme candidates among the n-best phoneme candidates and on the number of the target phonemes found. A composite score of the probability of an occurrence of the target word is produced. When the composite score is higher than a threshold, start and finish times that bound the locus are output. The start and finish times are input into an algorithm adapted for sequence alignment based on dynamic programming, which aligns a portion of the phoneme frames with the target phonemes.
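The search and scoring steps of the abstract can be illustrated with a minimal Python sketch. The abstract does not give a scoring formula, so everything concrete here is an assumption made for illustration: the `Frame` layout, the `rank_weight` function, the `max_gap` parameter, the threshold value, and the choice to average rank weights over all target phonemes so that both rank and the number of phonemes found affect the composite score.

```python
from typing import List, Tuple

# Hypothetical index entry: (start_time, end_time, candidates ranked best-first).
Frame = Tuple[float, float, List[str]]


def rank_weight(rank: int, n: int) -> float:
    """Assumed weighting: the top candidate scores 1.0, the n-th scores 1/n."""
    return (n - rank) / n


def find_loci(index: List[Frame], target: List[str], n: int = 3,
              max_gap: int = 2, threshold: float = 0.6):
    """Return (start, finish, score) for each locus scoring above threshold."""
    hits = []
    for begin in range(len(index)):
        pos, total, found = begin, 0.0, 0
        first = last = None
        for phoneme in target:
            # Look for this target phoneme within max_gap frames of pos.
            for i in range(pos, min(pos + max_gap + 1, len(index))):
                candidates = index[i][2][:n]
                if phoneme in candidates:
                    total += rank_weight(candidates.index(phoneme), n)
                    found += 1
                    first = i if first is None else first
                    last = i
                    pos = i + 1
                    break
        # Report a locus only from the frame where it starts (avoids duplicates).
        if found == 0 or first != begin:
            continue
        # Dividing by the full target length penalizes missing phonemes.
        score = total / len(target)
        if score > threshold:
            hits.append((index[first][0], index[last][1], score))
    return hits


frames = [
    (0.00, 0.10, ["s", "z", "f"]),
    (0.10, 0.20, ["k", "g", "t"]),
    (0.20, 0.30, ["ae", "eh", "ah"]),
    (0.30, 0.40, ["d", "t", "p"]),
    (0.40, 0.50, ["m", "n", "ng"]),
]
print(find_loci(frames, ["k", "ae", "t"]))
```

In this toy index the target phonemes for "cat" are found in frames two through four, with "t" only as the second-ranked candidate, so the reported locus spans 0.10 to 0.40 seconds with a score slightly below 1. A real indexer would produce far more frames, and the weighting would need tuning against labeled data.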

11 Claims

1. A computerized method of detecting a target word in a speech signal, the method comprising:
- providing a speech recognition engine and a previously constructed phoneme model;
- inputting the speech signal into the speech recognition engine;
- based on the phoneme model, indexing the input speech signal, thereby storing a time-ordered list representing n-best phoneme candidates of the input speech signal and phonemes in a plurality of phoneme frames, wherein n is an integer between two and eight;
- transcribing the target word into a transcription of target phonemes;
- searching through said time-ordered list of n-best phoneme candidates for a locus of said target phonemes;
- while said searching, scoring based on the ranking of the phoneme candidates among the n-best phoneme candidates and based on the number of said target phonemes found, thereby producing a composite score of the probability of an occurrence of the target word;
- when said composite score is higher than a threshold, outputting start and finish times bounding said locus;
- inputting said start and finish times into an algorithm adapted for sequence alignment based on dynamic programming; and
- using said algorithm aligning a first sequence with a second sequence, wherein said first sequence is a portion of said phoneme frames, wherein said portion is based on said start and finish times and wherein said second sequence is said target phonemes.

2. The method of claim 1, wherein said phoneme model explicitly includes as a parameter a length of the phonemes.
3. The method of claim 1, wherein said phoneme model includes dividing a plurality of phonemes into two or three sub-phonemes and modeling said sub-phonemes acoustically using a plurality of Gaussian parameters based on a mixture model of Gaussian functions and an explicit length dependency.
4. The method of claim 3, wherein said length dependency is a Poisson length dependency.
5. The method of claim 1, wherein fewer than fifty phonemes are modeled in said phoneme model.
6. The method of claim 1, wherein said sequence alignment algorithm is based on a Smith-Waterman algorithm.
7. The method of claim 1, wherein said sequence alignment is time constrained to a portion of the speech signal substantially between said start and finish times.
8. The method of claim 1, further comprising, prior to performing said sequence alignment, reducing in said first sequence repetitive frames of the same phoneme to a single frame of the same phoneme.
9. The method of claim 1, wherein storing said phonemes in said phoneme frames includes storing k best candidates of said phonemes in said phoneme frames and wherein said aligning is performed over said k-best candidates, wherein k is an integer between two and eight.
10. The method of claim 1, wherein said threshold is previously determined.
11. A computer readable medium encoded with processing instructions for causing a processor to execute the method of claim 1.
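
The alignment stage named in claims 6 and 8 can be sketched in Python as follows. This is a textbook Smith-Waterman local alignment applied to phoneme labels, not the implementation described in the specification. The function names, the scoring values (`match=2`, `mismatch=-1`, `gap=-1`), and the use of a single phoneme label per frame are assumptions chosen for illustration.

```python
from itertools import groupby


def collapse_repeats(frames):
    """Reduce runs of the same phoneme to a single frame (as in claim 8)."""
    return [phoneme for phoneme, _ in groupby(frames)]


def smith_waterman(seq_a, seq_b, match=2, mismatch=-1, gap=-1):
    """Local alignment of two phoneme sequences; returns (score, aligned pairs)."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def sub(i, j):
        # Substitution score for aligning seq_a[i-1] with seq_b[j-1].
        return match if seq_a[i - 1] == seq_b[j - 1] else mismatch

    # Fill the dynamic programming matrix; cells never drop below zero.
    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub(i, j),
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    aligned = []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub(i, j):
            aligned.append((seq_a[i - 1], seq_b[j - 1]))
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            aligned.append((seq_a[i - 1], None))  # gap in the target
            i -= 1
        else:
            aligned.append((None, seq_b[j - 1]))  # gap in the frames
            j -= 1
    aligned.reverse()
    return best, aligned


# Frames between the start and finish times, one top phoneme per frame.
portion = ["s", "k", "k", "ae", "ae", "ae", "t", "m"]
score, pairs = smith_waterman(collapse_repeats(portion), ["k", "ae", "t"])
print(score, pairs)
```

Collapsing repeats first keeps a long vowel from being scored as several insertions. Extending the sketch to the k-best candidates of claim 9 would mean treating a frame as a match when the target phoneme appears anywhere among its k candidates, possibly weighted by rank.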

Current Assignee: LNTS Linguistech Solutions Limited
Original Assignee: LNTS Linguistech Solutions Limited
Inventors: Simone, Adam; Cohen-Tov, Rabin; Faifkov, Ronen
Granted Patent: US 8,321,218 B2
US Class Current: 704/254
CPC Class Codes:
G10L 15/12 using dynamic programming t...
G10L 2015/025 Phonemes, fenemes or fenone...
