Audio search conducted through statistical pattern matching
Abstract
A technique for audio searches by statistical pattern matching is disclosed. The audio to be located is processed for feature extraction and decoded using a maximum likelihood (“ML”) search. A left-right Hidden Markov Model (“HMM”) is constructed from the ML state sequence. Transition probabilities are defined as normalized state occupancies from the most likely state sequence of the decoding operation. Utterance duration is measured from the search sample. Other model parameters are gleaned from an acoustic model. An ML search of an audio corpus is conducted with respect to the HMM and a garbage model. New start states are added at each frame. Low scoring and long state sequences (with respect to the search sample duration) are discarded at each frame. Locations where scores of the new model are higher than those of the garbage model are marked as potential matches. The highest scoring matches are presented as results.
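As an illustration of how transition probabilities might be derived from state occupancies, the following minimal sketch collapses a frame-level ML state sequence into runs and gives each run a self-loop and an advance probability. The abstract does not spell out the normalization, so the rule used here (a state occupied for d frames gets a self-loop probability of (d - 1)/d and an advance probability of 1/d) is an assumption, as are the function names and the toy state labels.

```python
from itertools import groupby


def occupancy_durations(state_sequence):
    """Collapse a frame-level state sequence into (state, duration) runs."""
    return [(state, sum(1 for _ in run)) for state, run in groupby(state_sequence)]


def left_right_transitions(durations):
    """Return (state, p_stay, p_advance) for each run of a left-right model.

    Assumption: a state occupied for d frames made d - 1 self-transitions
    and one exit, so p_stay = (d - 1) / d and p_advance = 1 / d.
    """
    transitions = []
    for state, d in durations:
        p_stay = (d - 1) / d
        transitions.append((state, p_stay, 1.0 - p_stay))
    return transitions


# Hypothetical ML state sequence for a nine-frame search term.
ml_path = ["s3", "s3", "s3", "s7", "s7", "s1", "s1", "s1", "s1"]
runs = occupancy_durations(ml_path)
model = left_right_transitions(runs)
query_frames = sum(d for _, d in runs)  # utterance duration in frames
```

Under this assumption, the expected time spent in each state of the new model equals the duration observed in the search sample, which is one plausible reason to tie the transitions to the occupancies.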
20 Claims
1. A system for audio searches, comprising:
a general acoustic model, representing speech sounds; and
a garbage model, representing speech and non-speech sounds, wherein the system is capable of:
performing feature extraction on an audio corpus and on an audio search term;
decoding the audio search term using a maximum likelihood search;
using a resulting state sequence from the maximum likelihood search and parameters from the general acoustic model to construct a new model with a plurality of states;
assigning state transition probabilities to the new model given maximum likelihood state occupancy durations from the maximum likelihood search;
conducting an audio corpus maximum likelihood search with respect to the new model and the garbage model;
discarding low scoring and long state sequences at each of a plurality of frames, with respect to duration of the audio search term; and
recording locations and scores of matches and presenting results of the search.
7. A method of conducting audio searches, comprising:
performing feature extraction on an audio corpus;
processing an audio search term to perform feature extraction;
decoding the audio search term using a maximum likelihood technique;
generating a model, that has at least one state, from parameters of an acoustic model and from a result of the maximum likelihood technique, including state durations;
allocating state transition probabilities to the model given maximum likelihood state occupancy durations from the maximum likelihood technique;
performing an audio corpus maximum likelihood search with respect to the model and a garbage model;
pruning low scoring and long state sequences at each of a plurality of frames, with respect to the search duration;
recording locations and scores of matches; and
introducing the locations of matches as results of the search.
15. An article comprising:
a storage medium having stored thereon instructions that when executed by a machine result in the following:
processing an audio search term for feature extraction;
performing maximum likelihood decoding on the audio search term;
generating a model, having one or more search model states, from a resulting state sequence from the maximum likelihood decoding and from an acoustic model;
assigning state transition probabilities to the model, given maximum likelihood state occupancy durations from the maximum likelihood decoding;
performing feature extraction on an audio corpus;
performing maximum likelihood decoding on the audio corpus with respect to the model and a garbage model;
removing low scoring and long state sequences with respect to search sample duration;
logging locations and scores of matches; and
presenting results of the matches.
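The corpus search recited in the independent claims can be sketched as a frame-synchronous pass that opens a new start state at every frame, prunes low scoring and overlong state sequences, and logs locations where the search model outscores the garbage model. This is a toy sketch under stated assumptions, not the patented implementation: the per-frame log-likelihoods are taken as precomputed inputs, and the function name, the `max_stretch` and `beam` parameters, and the use of a log-likelihood ratio against the garbage model as the pruning score are all hypothetical choices.

```python
import math


def search_corpus(kw_loglik, garbage_loglik, log_stay, log_adv,
                  query_frames, max_stretch=1.5, beam=10.0):
    """Return matches as (score, start_frame, end_frame), best first.

    kw_loglik[t][j]   : log-likelihood of frame t under search-model state j
    garbage_loglik[t] : log-likelihood of frame t under the garbage model
    """
    n_states = len(log_stay)
    max_frames = int(max_stretch * query_frames)
    # Prefix sums give the garbage score of any frame span in O(1).
    g_cum = [0.0]
    for g in garbage_loglik:
        g_cum.append(g_cum[-1] + g)

    active = {}   # (start_frame, state) -> best log score so far
    matches = {}  # start_frame -> (end_frame, best log-likelihood ratio)
    for t, frame in enumerate(kw_loglik):
        nxt = {(t, 0): frame[0]}  # new start state added at this frame
        for (start, j), score in active.items():
            steps = [(j, score + log_stay[j] + frame[j])]
            if j + 1 < n_states:
                steps.append((j + 1, score + log_adv[j] + frame[j + 1]))
            for state, s in steps:
                if s > nxt.get((start, state), -math.inf):
                    nxt[(start, state)] = s
        # Score each sequence relative to the garbage model over its own span.
        ratio = {k: s - (g_cum[t + 1] - g_cum[k[0]]) for k, s in nxt.items()}
        best = max(ratio.values())
        # Discard long sequences (relative to the query) and low scoring ones.
        active = {k: s for k, s in nxt.items()
                  if t - k[0] + 1 <= max_frames and ratio[k] >= best - beam}
        for (start, state) in active:
            if state == n_states - 1:
                r = ratio[(start, state)] + log_adv[state]  # include exit
                if r > 0 and r > matches.get(start, (None, -math.inf))[1]:
                    matches[start] = (t, r)
    return sorted(((r, s, e) for s, (e, r) in matches.items()), reverse=True)


# Toy corpus: frames 2-3 fit state 0 and frames 4-5 fit state 1.
GOOD, BAD = -1.0, -10.0
kw = [[BAD, BAD], [BAD, BAD], [GOOD, BAD], [GOOD, BAD],
      [BAD, GOOD], [BAD, GOOD], [BAD, BAD], [BAD, BAD]]
garbage = [-3.0] * len(kw)
half = math.log(0.5)
hits = search_corpus(kw, garbage, [half, half], [half, half], query_frames=4)
```

In this toy run the highest scoring hit spans frames 2 through 5, the region where the search model fits better than the garbage model. Keying the active set by start frame and state keeps one best path per candidate start, which is one simple way to make the length check against the search sample duration possible.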
Specification