Searching in Audio Speech
First Claim
1. A computerized method of detecting a target word in a speech signal, the method comprising:
providing a speech recognition engine and a previously constructed phoneme model;
inputting the speech signal into the speech recognition engine;
based on the phoneme model, indexing the input speech signal, thereby storing a time-ordered list representing n-best phoneme candidates of the input speech signal and phonemes in a plurality of phoneme frames, wherein n is an integer between two and eight;
transcribing the target word into a transcription of target phonemes;
searching through said time-ordered list of n-best phoneme candidates for a locus of said target phonemes;
while said searching, scoring based on the ranking of the phoneme candidates among the n-best phoneme candidates and based on the number of said target phonemes found, thereby producing a composite score of the probability of an occurrence of the target word;
when said composite score is higher than a threshold, outputting start and finish times bounding said locus;
inputting said start and finish times into an algorithm adapted for sequence alignment based on dynamic programming; and
using said algorithm aligning a first sequence with a second sequence, wherein said first sequence is a portion of said phoneme frames, wherein said portion is based on said start and finish times and wherein said second sequence is said target phonemes.
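The search and scoring steps of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. The `Frame` class, the function names, the one-frame-per-target-phoneme sliding window, and the equal weighting of rank and coverage in the composite score are all assumptions made for illustration; the claim only requires that the score depend on candidate ranking and on the number of target phonemes found.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Frame:
    """One phoneme frame of the index (hypothetical structure)."""
    start: float            # frame start time in seconds
    end: float              # frame end time in seconds
    candidates: List[str]   # n-best phoneme labels, best first (2 <= n <= 8)


def composite_score(window: List[Frame], target: List[str]) -> float:
    """Score a window of frames against the target phonemes.

    Assumed formula: the mean rank credit of the phonemes found,
    averaged with the fraction of target phonemes found.
    """
    rank_credit = 0.0
    found = 0
    for frame, phoneme in zip(window, target):
        if phoneme in frame.candidates:
            n = len(frame.candidates)
            rank = frame.candidates.index(phoneme)  # 0 is the best candidate
            rank_credit += (n - rank) / n
            found += 1
    if found == 0:
        return 0.0
    rank_part = rank_credit / found
    coverage = found / len(target)
    return 0.5 * rank_part + 0.5 * coverage


def find_loci(frames: List[Frame], target: List[str],
              threshold: float) -> List[Tuple[float, float, float]]:
    """Return (start, finish, score) for every window scoring above threshold."""
    hits = []
    width = len(target)
    for i in range(len(frames) - width + 1):
        window = frames[i:i + width]
        score = composite_score(window, target)
        if score > threshold:
            hits.append((window[0].start, window[-1].end, score))
    return hits


# Toy index with n = 3; the target word "cat" is transcribed as K AE T.
frames = [
    Frame(0.0, 0.1, ["S", "Z", "SH"]),
    Frame(0.1, 0.2, ["K", "G", "T"]),
    Frame(0.2, 0.3, ["EH", "AE", "AH"]),
    Frame(0.3, 0.4, ["T", "D", "P"]),
    Frame(0.4, 0.5, ["S", "Z", "SH"]),
]
target = ["K", "AE", "T"]
hits = find_loci(frames, target, threshold=0.8)
```

In this toy index only the window spanning 0.1 s to 0.4 s clears the threshold, so its start and finish times are what would be passed on to the alignment step.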
Abstract
A computerized method of detecting a target word in a speech signal. A speech recognition engine and a previously constructed phoneme model are provided. The speech signal is input into the speech recognition engine. Based on the phoneme model, the input speech signal is indexed. A time-ordered list is stored representing n-best phoneme candidates of the input speech signal and phonemes of the input speech signal in multiple phoneme frames. The target word is transcribed into a transcription of target phonemes. The time-ordered list of n-best phoneme candidates is searched for a locus of said target phonemes. While searching, scoring is based on the ranking of the phoneme candidates among the n-best phoneme candidates and based on the number of the target phonemes found. A composite score of the probability of an occurrence of the target word is produced. When the composite score is higher than a threshold, start and finish times are output which bound the locus. The start and finish times are input into an algorithm adapted for sequence alignment based on dynamic programming for aligning a portion of the phoneme frames with the target phonemes.
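The final alignment step can be sketched as follows. The source says only that the algorithm is "adapted for sequence alignment based on dynamic programming"; this example assumes a Needleman-Wunsch style global alignment, which is one common choice and not necessarily the one used in the patent. The function name `align`, the match, mismatch, and gap values, and the rule that a frame matches a target phoneme when that phoneme appears anywhere in the frame's n-best list are all hypothetical.

```python
from typing import List, Optional, Tuple

Pair = Tuple[Optional[int], Optional[str]]


def align(frames: List[List[str]], target: List[str],
          match: int = 1, mismatch: int = -1,
          gap: int = -1) -> Tuple[int, List[Pair]]:
    """Globally align phoneme frames (first sequence) with target phonemes.

    Each frame is its list of n-best candidate labels. Returns the
    alignment score and a list of (frame index, target phoneme) pairs,
    with None marking a gap on either side.
    """
    rows, cols = len(frames) + 1, len(target) + 1

    def pair_score(i: int, j: int) -> int:
        # Assumed rule: a hit if the target phoneme is among the candidates.
        return match if target[j - 1] in frames[i - 1] else mismatch

    # Fill the dynamic programming table.
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(score[i - 1][j - 1] + pair_score(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover the alignment.
    pairs: List[Pair] = []
    i, j = rows - 1, cols - 1
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            pairs.append((i - 1, target[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((i - 1, None))   # frame with no target phoneme
            i -= 1
        else:
            pairs.append((None, target[j - 1]))   # target phoneme with no frame
            j -= 1
    pairs.reverse()
    return score[-1][-1], pairs


# Frames taken from a locus, n = 2; the second frame is left unaligned.
locus = [["K", "G"], ["AE", "EH"], ["AE", "AH"], ["T", "D"]]
total, pairs = align(locus, ["K", "AE", "T"])
```

Restricting the alignment to the frames between the start and finish times keeps the table small, which is presumably why the claim runs the cheaper n-best search first and aligns only the loci that pass the threshold.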
11 Claims
Claim 1 is the independent claim and is set out in full under First Claim above. Claims 2 to 11 depend on claim 1.
Specification