Wordspotting for voice editing and indexing

US 5,199,077 A
Filed: 09/19/1991
Issued: 03/30/1993
Est. Priority Date: 09/19/1991
Status: Expired due to Fees

4 Assignments

0 Petitions

Abstract

A technique for wordspotting based on hidden Markov models (HMMs). The technique allows a speaker to specify keywords dynamically and to train the associated HMMs via a single repetition of a keyword. Non-keyword speech is modeled using an HMM trained from a prerecorded sample of continuous speech. The wordspotter is intended for interactive applications, such as the editing of voice mail or mixed-media documents, and for keyword indexing in single-speaker audio or video recordings.
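
The abstract does not say how an HMM can be trained from a single repetition. One plausible reading of claims 16 and 17 is that a vector-quantized codebook built from earlier speech of the same talker supplies the states: the background model is a parallel connection of one state per codeword, and the keyword model is a left-to-right chain (claim 5) labelled with the codewords nearest to the keyword's frames. The Python sketch below illustrates that reading only. The function names, the squared Euclidean distance, one state per keyword frame, the uniform background weights and the transition values `p_self`, `p_next` and `p_skip` are hypothetical choices, not taken from the patent.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]

# Hypothetical helper names and parameter values; the patent does not specify them.


def nearest_codeword(frame: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the codeword with the smallest squared Euclidean distance to `frame`."""
    return min(range(len(codebook)),
               key=lambda k: sum((f - c) ** 2 for f, c in zip(frame, codebook[k])))


def background_weights(codebook: Sequence[Vector]) -> List[float]:
    """Background HMM as a parallel connection of states, one per codeword.
    Hypothetical choice: every state is entered with equal probability."""
    return [1.0 / len(codebook)] * len(codebook)


def keyword_model(frames: Sequence[Vector], codebook: Sequence[Vector],
                  p_self: float = 0.5, p_next: float = 0.3,
                  p_skip: float = 0.2) -> Tuple[List[int], List[List[float]]]:
    """Build a keyword HMM from a single spoken repetition.

    Each frame is quantized to its nearest codeword and becomes one state
    (assumption: repeated codewords are not merged). Transitions are left to
    right: a self transition plus moves to the next two states, renormalized
    where fewer successors exist, so the last state only loops on itself.
    """
    labels = [nearest_codeword(f, codebook) for f in frames]
    n = len(labels)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        moves = [(j, p) for j, p in ((i, p_self), (i + 1, p_next), (i + 2, p_skip)) if j < n]
        total = sum(p for _, p in moves)
        for j, p in moves:
            A[i][j] = p / total
    return labels, A
```

Keeping the keyword states tied to codewords from the talker's own codebook is what lets one repetition suffice in this sketch: no new output distributions have to be estimated, only the order in which existing ones are visited.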

19 Claims

1. A wordspotting method for determining the location of a word in recorded continuous voice-speech using a single spoken version of the word as a keyword, comprising:
- (a) creating a first hidden Markov model (HMM) representing features of the voice that recorded the speech,
- (b) providing the keyword,
- (c) creating a second hidden Markov model (HMM) representing features of the keyword to be spotted, said first and second HMMs comprising a sequence of states each having an output representative of an acoustic unit and transition probabilities to succeeding states and having a normalized forward probability for each state and having an endstate for the keyword,
- (d) using the first and second HMMs, scanning through the recorded speech computing the normalized forward probability for each state searching for a peak in the a posteriori probability of the endstate of the keyword thus hypothesizing that the end time of a keyword has been found,
- (e) if a peak is found in step (d), stopping the scanning and backtracking through the recorded speech to the most probable hypothesized keyword beginning time while computing a score for that hypothesized keyword, where the score represents a measure of how well the second HMM representing background speech fits the hypothesized keyword beginning and end times relative to how well the first HMM fits the said beginning and end times,
- (f) indicating the keyword has been spotted in the recorded speech when the score computed in step (e) exceeds a pre-set value.

Dependent claims (2, 3, 4, 5, 6, 7, 8):
- 2. The method of claim 1, wherein steps (d)-(f) are repeated until the recorded speech finishes.
- 3. The method of claim 1, wherein the keyword has plural syllables.
- 4. The method of claim 1, wherein the first HMM is formed by a parallel connection of states.
- 5. The method of claim 4, wherein the second HMM is formed by a left to right model wherein each state has a self transition and transitions to the next two following states.
- 6. The method of claim 1, wherein step (d) includes the steps of forming a network made up of a parallel connection of the first HMM and the second HMM, and conducting a forward search through the network to hypothesize keyword endstates.
- 7. The method of claim 6, wherein step (e) includes the step of conducting a backward search separately through each of the first and second HMMs.
- 8. The method of claim 1, wherein step (d) includes the step of inputting the recorded speech and searching for locations in the recorded speech where the end of a candidate keyword is probable, where such locations are determined by peaks of sufficient amplitude in the a posteriori probability that the endstate of the second HMM has been found.
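
Step (d) of claim 1, together with claims 4 to 6 and 8, can be sketched as a forward pass over a network that places the background and keyword HMMs in parallel. Normalizing the forward probabilities at every frame turns the value for the keyword endstate into its a posteriori probability given the speech so far, and local peaks above a threshold hypothesize keyword end times. This is a minimal Python sketch under assumptions not stated in the claims: frame likelihoods `b[t][s]` are supplied precomputed, the loop grammar that re-enters the network after a background state or the keyword endstate and every transition probability are illustrative, and `threshold` stands in for the unspecified "sufficient amplitude". Unlike the claimed method, which stops scanning at each peak to backtrack, the sketch collects all peaks after one pass for brevity. All names are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def build_network(n_bg: int, n_kw: int, p_bg_stay: float = 0.9,
                  p_kw_entry: float = 0.1,
                  p_kw: Tuple[float, float, float] = (0.5, 0.3, 0.2)) -> Tuple[Matrix, List[float]]:
    """Parallel connection of a background HMM (states 0..n_bg-1, themselves in
    parallel) and a left-to-right keyword HMM (remaining states; the last one is
    the keyword endstate). All probabilities are illustrative assumptions."""
    n = n_bg + n_kw
    # Re-entry distribution: keyword start with p_kw_entry, else a background state.
    entry = [(1.0 - p_kw_entry) / n_bg] * n_bg + [p_kw_entry] + [0.0] * (n_kw - 1)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n_bg):
        # Background state: stay, or leave and re-enter the network.
        A[i][i] += p_bg_stay
        for j in range(n):
            A[i][j] += (1.0 - p_bg_stay) * entry[j]
    for k in range(n_kw):
        i = n_bg + k
        if k == n_kw - 1:
            # Keyword endstate: self transition, or leave and re-enter the network.
            A[i][i] += p_kw[0]
            for j in range(n):
                A[i][j] += (1.0 - p_kw[0]) * entry[j]
        else:
            # Self transition plus transitions to the next two states (claim 5).
            steps = [d for d in range(3) if k + d < n_kw]
            total = sum(p_kw[d] for d in steps)
            for d in steps:
                A[i][i + d] = p_kw[d] / total
    return A, entry


def normalized_forward(b: Sequence[Sequence[float]], A: Matrix, pi: Sequence[float]) -> Matrix:
    """b[t][s] is the likelihood of frame t in state s. Each frame's forward
    probabilities are rescaled to sum to 1, so alpha[t][s] is the a posteriori
    probability of being in state s at frame t given frames 0..t."""
    n = len(pi)
    alphas: Matrix = []
    for t, frame in enumerate(b):
        if t == 0:
            cur = [pi[s] * frame[s] for s in range(n)]
        else:
            prev = alphas[-1]
            cur = [sum(prev[i] * A[i][j] for i in range(n)) * frame[j] for j in range(n)]
        z = sum(cur)
        if z == 0.0:
            raise ValueError(f"frame {t} has zero probability under the network")
        alphas.append([c / z for c in cur])
    return alphas


def endstate_peaks(posterior: Sequence[float], threshold: float) -> List[int]:
    """Frames where the keyword-endstate posterior is a local maximum of at
    least `threshold`; each one hypothesizes a keyword end time."""
    peaks = []
    for t, p in enumerate(posterior):
        left = posterior[t - 1] if t > 0 else 0.0
        right = posterior[t + 1] if t + 1 < len(posterior) else 0.0
        if p >= threshold and p > left and p >= right:
            peaks.append(t)
    return peaks
```

Rescaling at every frame also keeps the forward values from underflowing over a long recording, which is what allows the endstate posterior to be compared against a fixed threshold.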

9. A wordspotting method for determining the location of a word in recorded continuous voiced-speech using a single spoken version of the word as a keyword, comprising:
- (a) creating as background a first hidden Markov model (HMM) representing features of the voice that recorded the speech, said first HMM being formed by a parallel connection of states and having a duration normalized likelihood,
- (b) providing a spoken keyword,
- (c) creating a second hidden Markov model (HMM) representing features of the spoken keyword to be spotted and having a duration normalized likelihood, said first and second HMMs comprising a sequence of states each having an output representative of an acoustic unit and forward transition probabilities to succeeding states,
- (d) computing feature vectors for the recorded continuous speech,
- (e) using the first and second HMMs, scanning through the recorded speech and computing for each feature vector determined in step (d) its normalized forward probability of being in the keyword endstate to search for a peak in said forward probability for hypothesizing that a keyword has been found,
- (f) if a peak is found in step (e), stopping the scanning and backtracking through the recorded speech looking for the hypothesized keyword beginning time while computing a score for that hypothesized keyword, where the score represents a ratio of the duration normalized keyword likelihood to the sum of the duration normalized keyword and background likelihoods,
- (g) indicating the keyword has been spotted in the recorded speech when the score computed in step (f) exceeds a pre-set value.

Dependent claims (10, 11, 12, 13, 14, 15, 16, 17, 18, 19):
- 10. The method of claim 9, wherein the indication of step (g) is carried out on a display of the recorded utterance.
- 11. The method of claim 9, wherein the indication of step (g) is carried out by stopping the recorded utterance where the word has been found.
- 12. The method of claim 9, wherein the HMMs are formed by digitizing, dividing words into frames, deriving feature vectors for each frame, weighting the feature vectors, concatenating the weighted feature vectors to form the HMMs.
- 13. The method of claim 9, wherein steps (e)-(g) are repeated until the recorded speech finishes.
- 14. The method of claim 9, wherein step (f) includes the step of computing for each feature vector determined in step (d) starting from the determined endtime and continuing back through 1.5 times the length of the spoken keyword used to generate the second HMM, the time where the keyword computation is maximum being used as the keyword beginning time.
- 15. The method of claim 9, wherein the second HMM is formed by a left to right model wherein each state has a self transition and transitions to the next two following states, and step (e) includes the step of forming a network made up of a parallel connection of the first HMM and the second HMM.
- 16. The method of claim 9, wherein step (a) includes the step of analyzing previous utterances of the same talker and creating a vector quantized codebook in which each codeword in the quantization sequence represents a state in an HMM.
- 17. The method of claim 16, wherein step (c) includes the step of using the vectorized quantized codebook to create the second HMM.
- 18. The method of claim 9, further comprising the step of varying the pre-set value to vary the number of indications that a keyword has been spotted when step (g) is carried out.
- 19. The method of claim 9, wherein step (a) includes the step of creating a mapping of feature vectors of the voice that recorded the speech to feature vectors of the spoken keyword for use in step (c) in creating the second HMM.
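
Steps (f) and (g) of claim 9, with claims 7, 14 and 18, can be sketched as two separate backward passes from a hypothesized end frame, one through the keyword HMM and one through the background HMM. Each pass yields, for every candidate start frame, the likelihood L of the segment between that start and the end frame. Reading "duration normalized" as the per-frame geometric mean L^(1/d) for a segment of d frames (an assumption), the score is Lk^(1/d) / (Lk^(1/d) + Lb^(1/d)). The Python below is a sketch under further stated assumptions: frame likelihoods are supplied as inputs, the keyword must start in its first state and finish in its endstate, the 1.5 × keyword-length window of claim 14 is rounded up to whole frames, and "the keyword computation is maximum" is read as the maximum score. All function and parameter names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

NEG_INF = -math.inf
Matrix = Sequence[Sequence[float]]


def safe_log(p: float) -> float:
    """Natural log that maps a zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def logsumexp(xs: Sequence[float]) -> float:
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def backward_pass(b: Matrix, A: Matrix, end: int, earliest: int,
                  final: Set[int]) -> Dict[int, List[float]]:
    """Sweep backward through one HMM from frame `end` to frame `earliest`.
    g[t][i] = log P(frames t..end, last frame in a `final` state | state i at frame t)."""
    n = len(A)
    log_a = [[safe_log(p) for p in row] for row in A]
    g = {end: [safe_log(b[end][i]) if i in final else NEG_INF for i in range(n)]}
    for t in range(end - 1, earliest - 1, -1):
        g[t] = [safe_log(b[t][i])
                + logsumexp([log_a[i][j] + g[t + 1][j] for j in range(n)])
                for i in range(n)]
    return g


def normalized_score(log_kw: float, log_bg: float, d: int) -> float:
    """Lk**(1/d) / (Lk**(1/d) + Lb**(1/d)), computed as a stable logistic in log space."""
    if log_kw == NEG_INF:
        return 0.0
    x = (log_kw - log_bg) / d
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    return math.exp(x) / (1.0 + math.exp(x))


def backtrack_and_score(b_kw: Matrix, A_kw: Matrix, b_bg: Matrix, A_bg: Matrix,
                        pi_bg: Sequence[float], end: int, kw_len: int,
                        threshold: float) -> Tuple[int, float, bool]:
    """Return (start frame, best score, spotted?) for a keyword hypothesized to end at `end`."""
    window = math.ceil(1.5 * kw_len)           # claim 14 bound, rounded up (assumption)
    earliest = max(0, end - window + 1)
    # Separate backward searches through the keyword and background HMMs (claim 7).
    g_kw = backward_pass(b_kw, A_kw, end, earliest, {len(A_kw) - 1})
    g_bg = backward_pass(b_bg, A_bg, end, earliest, set(range(len(A_bg))))
    best_start, best_score = end, 0.0
    for s in range(end, earliest - 1, -1):
        log_kw = g_kw[s][0]                     # keyword assumed to start in its first state
        log_bg = logsumexp([safe_log(p) + g for p, g in zip(pi_bg, g_bg[s])])
        score = normalized_score(log_kw, log_bg, end - s + 1)
        if score > best_score:
            best_start, best_score = s, score
    return best_start, best_score, best_score > threshold
```

Raising or lowering `threshold` changes how many hypotheses are reported as spotted keywords, which is the trade-off claim 18 describes.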

Current Assignee: Xerox Corporation (Xerox Holdings Corp.)
Original Assignee: Xerox Corporation (Xerox Holdings Corp.)
Inventors: Bush, Marcia A.; Wilcox, Lynn D.
Primary Examiner(s): Kemeny, Emanuel S.
Assistant Examiner(s): Tung, Kee M.
Application Number: US07/762,290
Time in Patent Office: 558 days
Field of Search: 381/41-43
US Class Current: 704/256
CPC Class Codes:
- G10L 15/142 Hidden Markov Models [HMMs]
- G10L 2015/088 Word spotting
