IDENTIFYING KEYWORD OCCURRENCES IN AUDIO DATA
Abstract
Occurrences of one or more keywords in audio data are identified using a speech recognizer employing a language model to derive a transcript of the keywords. The transcript is converted into a phoneme sequence. The phonemes of the phoneme sequence are mapped to the audio data to derive a time-aligned phoneme sequence that is searched for occurrences of keyword phoneme sequences corresponding to the phonemes of the keywords. Searching includes computing a confusion matrix. The language model used by the speech recognizer is adapted to keywords by increasing the likelihoods of the keywords in the language model. For each potential keyword occurrence detected, a corresponding subset of the audio data may be played back to an operator to confirm whether the potential occurrence corresponds to an actual occurrence of the keyword.
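The search over a time-aligned phoneme sequence described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `TimedPhoneme` type, the `find_keyword_occurrences` function, the confusion scores, and the 0.8 threshold are all hypothetical, and the confusion table is assumed to be precomputed rather than estimated from data. The sketch slides a fixed-length window over the sequence and does not handle inserted or deleted phonemes.

```python
from typing import Dict, List, NamedTuple, Sequence, Tuple


class TimedPhoneme(NamedTuple):
    """One recognized phoneme with its time span in the audio (seconds)."""
    phoneme: str
    start: float
    end: float


# Hypothetical confusion table: (intended phoneme, recognized phoneme) -> similarity in [0, 1].
Confusion = Dict[Tuple[str, str], float]


def phoneme_score(expected: str, observed: str, confusion: Confusion) -> float:
    """Score 1.0 for an exact match, else the confusion score (0.0 if unlisted)."""
    if expected == observed:
        return 1.0
    return confusion.get((expected, observed), 0.0)


def find_keyword_occurrences(
    aligned: Sequence[TimedPhoneme],
    keyword_phonemes: Sequence[str],
    confusion: Confusion,
    threshold: float = 0.8,  # assumed cut-off, not taken from the source
) -> List[Tuple[float, float, float]]:
    """Return (start, end, score) for each window whose mean score meets the threshold."""
    n = len(keyword_phonemes)
    hits = []
    for i in range(len(aligned) - n + 1):
        window = aligned[i:i + n]
        total = sum(
            phoneme_score(expected, observed.phoneme, confusion)
            for expected, observed in zip(keyword_phonemes, window)
        )
        score = total / n
        if score >= threshold:
            hits.append((window[0].start, window[-1].end, score))
    return hits


# Usage: the recognizer heard "g ae t" where the keyword is "k ae t".
aligned = [
    TimedPhoneme("dh", 0.0, 0.1),
    TimedPhoneme("ah", 0.1, 0.2),
    TimedPhoneme("g", 0.2, 0.3),
    TimedPhoneme("ae", 0.3, 0.45),
    TimedPhoneme("t", 0.45, 0.6),
]
potential = find_keyword_occurrences(aligned, ["k", "ae", "t"], {("k", "g"): 0.7})
```

Each returned tuple carries the time span of the potential occurrence, which is the kind of location data the claims rely on when selecting audio for playback.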
31 Claims
1. A method for processing audio data conveying speech information, said method comprising:
a) providing a computer based processing entity having an input, the processing entity being programmed with software to perform speech recognition on the audio data;
b) providing at the input a signal indicative of at least one keyword;
c) performing speech recognition on the audio data with the processing entity to determine if the audio data contains one or more potential occurrences of the keyword;
d) when the performing identifies a potential occurrence of a keyword in the audio data, generating location data indicative of a location of a spoken utterance in the audio data corresponding to the potential occurrence;
e) processing the location data with the processing entity to select a subset of audio data from the audio data for playing to an operator, the subset containing at least a portion of the spoken utterance corresponding to the potential occurrence;
f) playing the selected subset of audio data to the operator;
g) receiving at the input verification data from the operator confirming that the selected subset of audio data contains the keyword or indicating that the selected subset of audio data does not contain the keyword;
h) processing the verification data with the processing entity to generate a label indicating whether or not the audio data contains the keyword;
i) storing the label in a machine readable storage medium.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
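Steps d) through i) of claim 1 can be sketched in Python as follows. This is an illustration under stated assumptions, not the claimed system: the `Occurrence` type, the function names, the half-second padding around each utterance, the `play` and `ask_operator` callbacks, and the JSON storage format are all hypothetical choices made for the example.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, TextIO


@dataclass
class Occurrence:
    """Location data for one potential keyword occurrence (times in seconds)."""
    keyword: str
    start: float
    end: float


def select_subset(samples: Sequence, sample_rate: int, occ: Occurrence,
                  padding: float = 0.5) -> Sequence:
    """Step e): pick the samples covering the utterance plus assumed padding."""
    first = max(0, round((occ.start - padding) * sample_rate))
    last = min(len(samples), round((occ.end + padding) * sample_rate))
    return samples[first:last]


def verify_occurrences(samples: Sequence, sample_rate: int, keyword: str,
                       occurrences: Sequence[Occurrence],
                       play: Callable[[Sequence], None],
                       ask_operator: Callable[[Occurrence], bool]) -> Dict:
    """Steps f) to h): play each clip, collect the operator's answer, build a label."""
    confirmed = False
    for occ in occurrences:
        play(select_subset(samples, sample_rate, occ))
        if ask_operator(occ):  # True means the operator heard the keyword
            confirmed = True
            break
    return {"keyword": keyword, "contains_keyword": confirmed}


def store_label(label: Dict, destination: TextIO) -> None:
    """Step i): write the label to a machine readable medium (JSON here)."""
    json.dump(label, destination)
```

Passing `play` and `ask_operator` as callbacks keeps the sketch independent of any particular audio library or user interface.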
12. A method of identifying occurrences of a keyword within audio data, the method comprising:
a) providing a computer based processing entity programmed with software, the software implementing a language model to perform speech recognition;
b) inputting in the processing entity data conveying the keyword;
c) processing the data conveying the keyword with the software to adapt the language model to the keyword and generate an adapted language model;
d) processing the audio data with the adapted language model to determine if the audio data contains the keyword;
e) releasing result data at an output of the processing entity conveying results of the processing of the audio data with the adapted language model.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30
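The adaptation in step c) of claim 12, which the abstract describes as increasing the likelihoods of the keywords in the language model, can be illustrated on a toy unigram model. The sketch below is a simplification under assumptions: real recognizers typically use higher-order n-gram models, and the `adapt_language_model` name, the `boost` factor, and the `floor` probability for unseen keywords are hypothetical.

```python
from typing import Dict, Iterable


def adapt_language_model(unigram: Dict[str, float], keywords: Iterable[str],
                         boost: float = 10.0, floor: float = 1e-6) -> Dict[str, float]:
    """Return a copy of the model with keyword probabilities boosted, then renormalized.

    Keywords missing from the model start from the assumed `floor` probability.
    """
    adapted = dict(unigram)
    for keyword in keywords:
        adapted[keyword] = adapted.get(keyword, floor) * boost
    total = sum(adapted.values())
    return {word: prob / total for word, prob in adapted.items()}


# Usage: boost "cat" in a three-word toy model.
base_model = {"the": 0.5, "cat": 0.3, "sat": 0.2}
adapted_model = adapt_language_model(base_model, ["cat"], boost=2.0)
```

Renormalizing after the boost keeps the result a valid probability distribution, so every other word becomes slightly less likely as the keyword becomes more likely.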
31. A method of identifying occurrences of keywords within audio recordings containing speech information, the method comprising:
a) providing a computer based processing entity programmed with software, the software implementing a language model to perform speech recognition;
b) inputting in the processing entity first data conveying a first keyword;
c) processing the first data with the software to adapt the language model to the first keyword and generate a language model adapted to the first keyword;
d) processing a first set of recordings with the language model adapted to the first keyword to determine if the first set of recordings contains the first keyword;
e) inputting in the processing entity second data conveying a second keyword;
f) processing the second data with the software to adapt the language model to the second keyword and generate a language model adapted to the second keyword;
g) processing a second set of recordings with the language model adapted to the second keyword to determine if the second set of recordings contains the second keyword;
h) releasing data at the output of the processing entity conveying results of the processing of the first and second sets of recordings with the language models adapted to the first and second keywords, respectively.
Specification