Wordspotting for voice editing and indexing
Abstract
A technique for wordspotting based on hidden Markov models (HMMs). The technique allows a speaker to specify keywords dynamically and to train the associated HMMs via a single repetition of a keyword. Non-keyword speech is modeled using an HMM trained from a prerecorded sample of continuous speech. The wordspotter is intended for interactive applications, such as the editing of voice mail or mixed-media documents, and for keyword indexing in single-speaker audio or video recordings.
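Training a keyword model from a single repetition can be illustrated with a minimal sketch. It assumes, hypothetically, that each keyword state emits an isotropic Gaussian whose mean is the average of a few consecutive feature frames from the one spoken example, and that every state has a fixed self-loop probability. `KeywordHMM`, `keyword_hmm_from_example`, `frames_per_state`, `var`, and `self_loop` are illustrative names, not taken from the patent.

```python
import math
from dataclasses import dataclass


@dataclass
class KeywordHMM:
    """Left-to-right keyword model (hypothetical representation)."""
    means: list        # one mean feature vector per state
    var: float         # shared isotropic output variance (assumption)
    self_loop: float   # probability of remaining in the same state

    @property
    def num_states(self) -> int:
        return len(self.means)

    @property
    def forward(self) -> float:
        # Probability of moving on to the next state in the chain.
        return 1.0 - self.self_loop

    def log_output(self, state: int, x: list) -> float:
        """Log density of frame x under an isotropic Gaussian at the state mean."""
        mu = self.means[state]
        sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mu))
        return -0.5 * (sq / self.var + len(x) * math.log(2 * math.pi * self.var))


def keyword_hmm_from_example(frames, frames_per_state=2, var=1.0, self_loop=0.5):
    """Build a keyword HMM from ONE spoken example of the keyword.

    Each state's mean is the average of `frames_per_state` consecutive
    feature vectors; this initialization is an assumption, not the patent's.
    """
    if not frames:
        raise ValueError("need at least one feature frame")
    dim = len(frames[0])
    means = []
    for start in range(0, len(frames), frames_per_state):
        group = frames[start:start + frames_per_state]
        means.append([sum(f[k] for f in group) / len(group) for k in range(dim)])
    return KeywordHMM(means=means, var=var, self_loop=self_loop)
```

Because the model is built directly from the example's frames, no iterative re-estimation is needed, which suits a speaker who defines keywords on the fly.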
19 Claims
1. A wordspotting method for determining the location of a word in recorded continuous voice-speech using a single spoken version of the word as a keyword, comprising:
(a) creating a first hidden Markov model (HMM) representing features of the voice that recorded the speech,
(b) providing the keyword,
(c) creating a second hidden Markov model (HMM) representing features of the keyword to be spotted, said first and second HMMs comprising a sequence of states each having an output representative of an acoustic unit and transition probabilities to succeeding states and having a normalized forward probability for each state and having an endstate for the keyword,
(d) using the first and second HMMs, scanning through the recorded speech computing the normalized forward probability for each state searching for a peak in the a posteriori probability of the endstate of the keyword, thus hypothesizing that the end time of a keyword has been found,
(e) if a peak is found in step (d), stopping the scanning and backtracking through the recorded speech to the most probable hypothesized keyword beginning time while computing a score for that hypothesized keyword, where the score represents a measure of how well the second HMM representing the keyword fits the hypothesized keyword beginning and end times relative to how well the first HMM fits the said beginning and end times,
(f) indicating the keyword has been spotted in the recorded speech when the score computed in step (e) exceeds a pre-set value.
Dependent claims: 2, 3, 4, 5, 6, 7, 8.
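Steps (d) and (e) rely on normalized forward probabilities. The sketch below runs a scaled forward pass over a combined state set (background states plus keyword states). Because each frame's forward vector is rescaled to sum to one, the keyword end state's entry is its a posteriori probability given the speech heard so far. The dense transition matrix, the precomputed per-frame log output likelihoods, and the "local maximum above a threshold" peak rule are assumptions for illustration; `normalized_forward` and `endstate_peaks` are hypothetical names.

```python
import math


def normalized_forward(log_emis, trans, init):
    """Return per-frame filtered posteriors P(state_t = j | frames 0..t).

    log_emis[t][j]: log output likelihood of frame t in state j
    trans[i][j]:    transition probability from state i to state j
    init[j]:        initial state probabilities
    """
    n = len(init)
    posts = []
    prev = None
    for t, row in enumerate(log_emis):
        m = max(row)  # shift by the max; the constant cancels in normalization
        b = [math.exp(v - m) for v in row]
        if prev is None:
            alpha = [init[j] * b[j] for j in range(n)]
        else:
            alpha = [sum(prev[i] * trans[i][j] for i in range(n)) * b[j]
                     for j in range(n)]
        z = sum(alpha)
        if z == 0.0:
            raise ValueError(f"forward probability vanished at frame {t}")
        prev = [a / z for a in alpha]  # normalized: sums to 1 at every frame
        posts.append(prev)
    return posts


def endstate_peaks(posts, end_state, threshold=0.5):
    """Frames where the end-state posterior is a local maximum above threshold."""
    p = [frame[end_state] for frame in posts]
    peaks = []
    for t in range(len(p)):
        left = p[t - 1] if t > 0 else -1.0
        right = p[t + 1] if t + 1 < len(p) else -1.0
        if p[t] >= threshold and p[t] > left and p[t] >= right:
            peaks.append(t)
    return peaks
```

In a full wordspotter, each reported frame would be a hypothesized keyword end time, at which the scan pauses for backtracking and scoring as in step (e).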
9. A wordspotting method for determining the location of a word in recorded continuous voiced-speech using a single spoken version of the word as a keyword, comprising:
(a) creating as background a first hidden Markov model (HMM) representing features of the voice that recorded the speech, said first HMM being formed by a parallel connection of states and having a duration normalized likelihood,
(b) providing a spoken keyword,
(c) creating a second hidden Markov model (HMM) representing features of the spoken keyword to be spotted and having a duration normalized likelihood, said first and second HMMs comprising a sequence of states each having an output representative of an acoustic unit and forward transition probabilities to succeeding states,
(d) computing feature vectors for the recorded continuous speech,
(e) using the first and second HMMs, scanning through the recorded speech and computing for each feature vector determined in step (d) its normalized forward probability of being in the keyword endstate to search for a peak in said forward probability for hypothesizing that a keyword has been found,
(f) if a peak is found in step (e), stopping the scanning and backtracking through the recorded speech looking for the hypothesized keyword beginning time while computing a score for that hypothesized keyword, where the score represents a ratio of the duration normalized keyword likelihood to the sum of the duration normalized keyword and background likelihoods,
(g) indicating the keyword has been spotted in the recorded speech when the score computed in step (f) exceeds a pre-set value.
Dependent claims: 10, 11, 12, 13, 14, 15, 16, 17, 18, 19.
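The score in step (f) is a ratio of the duration-normalized keyword likelihood to the sum of the duration-normalized keyword and background likelihoods. A minimal sketch, assuming "duration normalized" means the per-frame geometric mean (each likelihood raised to the power 1/T for a hypothesized segment of T frames), computes it in the log domain. `keyword_score`, `is_spotted`, and the default threshold of 0.5 are hypothetical.

```python
import math


def keyword_score(log_lik_keyword, log_lik_background, num_frames):
    """Score in [0, 1]: Lk^(1/T) / (Lk^(1/T) + Lb^(1/T)), computed from log-likelihoods.

    Assumes duration normalization is the per-frame geometric mean.
    """
    if num_frames <= 0:
        raise ValueError("segment must contain at least one frame")
    k = log_lik_keyword / num_frames     # log of normalized keyword likelihood
    b = log_lik_background / num_frames  # log of normalized background likelihood
    m = max(k, b)                        # shift by the max to avoid underflow
    return math.exp(k - m) / (math.exp(k - m) + math.exp(b - m))


def is_spotted(log_lik_keyword, log_lik_background, num_frames, threshold=0.5):
    """Step (g): report a hit when the score exceeds the preset value."""
    return keyword_score(log_lik_keyword, log_lik_background, num_frames) > threshold
```

Working with per-frame log averages keeps the arithmetic well scaled: the raw likelihood of a segment tens of frames long is far too small to represent directly, while the normalized ratio depends only on the per-frame difference of log-likelihoods.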
Specification