Audio search conducted through statistical pattern matching

US 20040024599A1
Filed: 07/31/2002
Published: 02/05/2004
Est. Priority Date: 07/31/2002
Status: Abandoned Application

Abstract

A technique for audio searches by statistical pattern matching is disclosed. The audio to be located is processed for feature extraction and decoded using a maximum likelihood (“ML”) search. A left-right Hidden Markov Model (“HMM”) is constructed from the ML state sequence. Transition probabilities are defined as normalized state occupancies from the most likely state sequence of the decoding operation. Utterance duration is measured from the search sample. Other model parameters are gleaned from an acoustic model. An ML search of an audio corpus is conducted with respect to the HMM and a garbage model. New start states are added at each frame. Low scoring and long state sequences (with respect to the search sample duration) are discarded at each frame. Locations where scores of the new model are higher than those of the garbage model are marked as potential matches. The highest scoring matches are presented as results.
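
The model-construction step in the abstract can be illustrated with a small sketch. It collapses an ML state sequence into left-right segments and derives transition probabilities from the occupancy durations. The abstract does not give the exact normalization, so the rule used here (a state occupied for d frames gets a self-loop probability of (d - 1)/d and an advance probability of 1/d) is an assumption, as are the function name `left_right_transitions` and the state labels.

```python
from itertools import groupby


def left_right_transitions(state_sequence):
    """Build the skeleton of a left-right model from a decoded state sequence.

    Returns (states, durations, transitions), where transitions[i] is the
    (self_loop_prob, advance_prob) pair for the i-th segment.
    """
    if not state_sequence:
        raise ValueError("state sequence must not be empty")
    # Collapse runs of identical labels into (label, duration) segments.
    segments = [(label, len(list(run))) for label, run in groupby(state_sequence)]
    states = [label for label, _ in segments]
    durations = [d for _, d in segments]
    # Assumed normalization: a state held for d frames stays d - 1 times
    # and leaves once, giving self-loop (d - 1)/d and advance 1/d.
    transitions = [((d - 1) / d, 1 / d) for d in durations]
    return states, durations, transitions


# Hypothetical decoded sequence: 4 frames in s3, 2 in s7, 5 in s1.
decoded = ["s3"] * 4 + ["s7"] * 2 + ["s1"] * 5
states, durations, transitions = left_right_transitions(decoded)
utterance_frames = sum(durations)  # duration of the search sample in frames
```

The emission parameters of each segment would be copied from the matching state of the general acoustic model; that lookup is omitted here because the source does not describe the acoustic model's layout.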

20 Claims

1. A system for audio searches, comprising:
- a general acoustic model, representing speech sounds; and
  
  a garbage model, representing speech and non-speech sounds, wherein the system is capable of:
  
  performing feature extraction on an audio corpus and on an audio search term;
  
  decoding the audio search term using a maximum likelihood search;
  
  using a resulting state sequence from the maximum likelihood search and parameters from the general acoustic model to construct a new model with a plurality of states;
  
  assigning state transition probabilities to the new model given maximum likelihood state occupancy durations from the maximum likelihood search;
  
  conducting an audio corpus maximum likelihood search with respect to the new model and the garbage model;
  
  discarding low scoring and long state sequences at each of a plurality of frames, with respect to duration of the audio search term; and
  
  recording locations and scores of matches and presenting results of the search.
- Dependent claims (2, 3, 4, 5, 6):
  - 2. The system of claim 1, wherein the feature extraction converts a speech waveform into a parametric representation that is used for analysis and processing.
  - 3. The system of claim 1, wherein the maximum likelihood search is used to find a most probable sequence of hidden states given a sequence of observed data, and a maximum likelihood score is calculated with respect to the general acoustic model.
  - 4. The system of claim 1, wherein the new model is a left-right hidden Markov model.
  - 5. The system of claim 1, wherein the garbage model is trained on speech and background noise.
  - 6. The system of claim 1, wherein locations of matches are determined at places in which scores of the new model are substantially higher than scores of the garbage model.
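
Claim 3 describes the maximum likelihood search as finding the most probable sequence of hidden states given the observed data. A common way to do this is Viterbi decoding; the sketch below is a generic log-domain version, not the applicant's implementation. The function name `viterbi`, the table layout, and the toy probabilities are all assumptions made for illustration.

```python
import math

NEG_INF = float("-inf")


def log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def viterbi(log_start, log_trans, log_emit):
    """Return (best_log_score, best_state_path).

    log_start[s]    : log probability of starting in state s
    log_trans[p][s] : log probability of moving from state p to state s
    log_emit[t][s]  : log likelihood of frame t under state s
    """
    n_states = len(log_start)
    score = [log_start[s] + log_emit[0][s] for s in range(n_states)]
    back = []  # back[t - 1][s] = best predecessor of state s at frame t
    for t in range(1, len(log_emit)):
        new_score, pointers = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda p: score[p] + log_trans[p][s])
            new_score.append(score[best_prev] + log_trans[best_prev][s]
                             + log_emit[t][s])
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    last = max(range(n_states), key=lambda s: score[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return score[last], path


# Toy two-state left-right model with made-up probabilities.
start = [log(1.0), log(0.0)]
trans = [[log(0.5), log(0.5)], [log(0.0), log(1.0)]]
emit = [[log(0.9), log(0.1)], [log(0.8), log(0.2)],
        [log(0.1), log(0.9)], [log(0.2), log(0.8)]]
best_score, best_path = viterbi(start, trans, emit)
```

The returned path is the state sequence that the model-construction step would consume, and the returned score is the maximum likelihood score referred to in claim 3.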

7. A method of conducting audio searches, comprising:
- performing feature extraction on an audio corpus;
  
  processing an audio search term to perform feature extraction;
  
  decoding the audio search term using a maximum likelihood technique;
  
  generating a model, that has at least one state, from parameters of an acoustic model and from a result of the maximum likelihood technique, including state durations;
  
  allocating state transition probabilities to the model given maximum likelihood state occupancy durations from the maximum likelihood technique;
  
  performing an audio corpus maximum likelihood search with respect to the model and a garbage model;
  
  pruning low scoring and long state sequences at each of a plurality of frames, with respect to the search duration;
  
  recording locations and scores of matches; and
  
  introducing the locations of matches as results of the search.
- Dependent claims (8, 9, 10, 11, 12, 13, 14):
  - 8. The method of claim 7, wherein the maximum likelihood technique is carried out with respect to the acoustic model that produces a maximum likelihood score.
  - 9. The method of claim 8, wherein the maximum likelihood technique is used to find a most probable sequence of hidden states, given a sequence of observed data, and a maximum likelihood score is calculated with respect to the acoustic model.
  - 10. The method of claim 7, wherein the model is a left-right hidden Markov model.
  - 11. The method of claim 7, wherein the garbage model is trained on speech and background noise.
  - 12. The method of claim 11, wherein the garbage model generates a score that serves as a best path point of reference.
  - 13. The method of claim 7, wherein feature extraction converts a speech waveform into a parametric representation for analysis and processing.
  - 14. The method of claim 7, wherein locations of matches are determined at places in which scores of the model are higher than scores of the garbage model.
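
The corpus search, pruning, and match-recording steps of claim 7 can be sketched as a token-passing loop. This is a simplified reading, not the method as filed: scores are kept as log-likelihood ratios against the garbage model (one way to use it as the reference described in claim 12), a new start is allowed at every frame, and hypotheses are dropped when they fall outside a beam or run longer than a multiple of the search-term duration. The names `search_corpus`, `beam`, `max_stretch`, and `margin`, and all numeric values, are hypothetical.

```python
import math


def search_corpus(model_ll, garbage_ll, log_stay, log_next, term_frames,
                  beam=20.0, max_stretch=1.5, margin=0.0):
    """Return matches as (score, start_frame, end_frame), best first.

    model_ll[t][i] : log likelihood of frame t under search-model state i
    garbage_ll[t]  : log likelihood of frame t under the garbage model
    log_stay[i], log_next[i] : log self-loop / advance probabilities
    """
    n_states = len(log_stay)
    max_frames = int(max_stretch * term_frames)
    tokens = [None] * n_states  # tokens[i] = (score, start_frame) or None
    matches = []
    for t, frame_ll in enumerate(model_ll):
        new_tokens = [None] * n_states
        for i in range(n_states):
            candidates = []
            if tokens[i] is not None:                 # stay in state i
                candidates.append((tokens[i][0] + log_stay[i], tokens[i][1]))
            if i > 0 and tokens[i - 1] is not None:   # advance from i - 1
                candidates.append((tokens[i - 1][0] + log_next[i - 1],
                                   tokens[i - 1][1]))
            if i == 0:                                # new start at this frame
                candidates.append((0.0, t))
            # Drop hypotheses that are too long for the search term.
            candidates = [c for c in candidates if t - c[1] + 1 <= max_frames]
            if candidates:
                score, start = max(candidates)
                new_tokens[i] = (score + frame_ll[i] - garbage_ll[t], start)
        # Drop hypotheses that score too far below the best one.
        best = max(tok[0] for tok in new_tokens if tok is not None)
        tokens = [tok if tok is not None and tok[0] >= best - beam else None
                  for tok in new_tokens]
        final = tokens[-1]
        if final is not None:
            exit_score = final[0] + log_next[-1]
            if exit_score > margin:  # search model beats the garbage model
                matches.append((exit_score, final[1], t))
    return sorted(matches, reverse=True)


# Toy corpus: frames 2-3 fit state 0, frames 4-5 fit state 1, rest is noise.
half = math.log(0.5)
corpus_ll = [[-5.0, -5.0], [-5.0, -5.0], [-0.5, -5.0], [-0.5, -5.0],
             [-5.0, -0.5], [-5.0, -0.5], [-5.0, -5.0], [-5.0, -5.0]]
results = search_corpus(corpus_ll, [-2.0] * 8, [half, half], [half, half],
                        term_frames=4)
```

In this toy run the top-ranked entry spans frames 2 to 5, the region planted in the corpus. Overlapping hits on neighbouring frames are all reported; a fuller version would merge them before presenting results.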

15. An article comprising:
- a storage medium having stored thereon instructions that when executed by a machine result in the following:
  
  processing an audio search term for feature extraction;
  
  performing maximum likelihood decoding on the audio search term;
  
  generating a model, having one or more search model states, from a resulting state sequence from the maximum likelihood decoding and from an acoustic model;
  
  assigning state transition probabilities to the model, given maximum likelihood state occupancy durations from the maximum likelihood decoding;
  
  performing feature extraction on an audio corpus;
  
  performing maximum likelihood decoding on the audio corpus with respect to the model and a garbage model;
  
  removing low scoring and long state sequences with respect to search sample duration;
  
  logging locations and scores of matches; and
  
  presenting results of the matches.
- Dependent claims (16, 17, 18, 19, 20):
  - 16. The article of claim 15, wherein feature extraction converts a speech waveform into a parametric representation that is used for analysis and processing.
  - 17. The article of claim 15, wherein the maximum likelihood decoding finds a most probable sequence of hidden states from a sequence of observed data, and a maximum likelihood score is calculated with respect to the acoustic model.
  - 18. The article of claim 15, wherein the one or more search model states proceed from left to right in the model.
  - 19. The article of claim 15, wherein locations of matches are determined at places in which scores of the model are higher than scores of the garbage model.
  - 20. The article of claim 15, wherein the garbage model is trained on speech and background noise.
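
Claim 16 describes feature extraction as converting a speech waveform into a parametric representation. The claims do not say which parameters are used, so the sketch below substitutes two very simple per-frame features, log energy and zero-crossing rate, purely to show the framing step; speech systems more commonly use cepstral features. The function name `extract_features`, the 25 ms frame, the 10 ms hop, and the 16 kHz sample rate are assumptions.

```python
import math


def extract_features(samples, frame_len=400, hop=160):
    """Return one (log_energy, zero_crossing_rate) pair per frame.

    Defaults correspond to 25 ms frames every 10 ms at an assumed
    16 kHz sample rate.
    """
    features = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        log_energy = math.log(energy + 1e-10)  # small floor avoids log(0)
        crossings = sum(1 for a, b in zip(frame, frame[1:])
                        if (a < 0) != (b < 0))
        features.append((log_energy, crossings / (frame_len - 1)))
    return features


# One second of a 440 Hz tone stands in for a speech waveform.
rate = 16000
tone = [math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
feats = extract_features(tone)
```

The same routine would be applied to both the search term and the corpus so that their frames are comparable during decoding.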

Current Assignee: Intel Corporation
Original Assignee: Intel Corporation
Inventors: Deisher, Michael E.
Application Number: US10/210,754
Publication Number: US 20040024599A1
US Class Current: 704/256
CPC Class Codes:
G10L 15/142 Hidden Markov Models [HMMs]
G10L 2015/025 Phonemes, fenemes or fenone...
