Method and system for real-time keyword spotting for speech analytics
First Claim
1. A computerized method for real-time spotting of predetermined keywords in an audio stream, in an automatic speech recognition system, wherein said system comprises at least a speech recognition engine, the method comprising the steps of:
a) developing a keyword model for the predetermined keywords;
b) comparing, in real-time, the keyword model and the audio stream to recognize candidates of the predetermined keywords in the audio stream;
c) computing, by the speech recognition engine, a probability that a portion of the audio stream matches one of the predetermined keywords from the keyword model, wherein the probability is determined utilizing a posterior based probability approach which comprises analysis of monophone models over an audio feature space;
d) comparing the computed probability of a keyword match to a predetermined threshold and declaring a match if the computed probability meets the predetermined threshold;
e) computing further data to aid in determination of mismatches, wherein said further data comprises empirical metrics, and determining if the candidates are mismatches; and
f) reporting spotted keywords if a mismatch is not identified at step (e), wherein the reporting comprises generating a report, via a microprocessor and software program, that is presented as a start and end time of the spotted keywords in the audio stream with the computed probability that the keywords were found.
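As an informal illustration of steps (d) through (f), the following Python sketch filters keyword candidates by a threshold, rejects mismatches, and reports the survivors with their start time, end time, and probability. It is a minimal sketch under stated assumptions, not the claimed implementation: the `Candidate` type, the `is_mismatch` duration heuristic, and the `min_s_per_char` parameter are all hypothetical, since the claim does not say which empirical metrics are used. The candidates and their probabilities are assumed to come from steps (b) and (c).

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Candidate:
    """A keyword candidate, assumed to be produced by steps (b) and (c)."""
    keyword: str
    start_s: float      # start time in the audio stream, in seconds
    end_s: float        # end time in the audio stream, in seconds
    probability: float  # probability from step (c), assumed to lie in [0, 1]


def is_mismatch(c: Candidate, min_s_per_char: float = 0.03) -> bool:
    """Hypothetical empirical metric: reject hits too short for their spelling."""
    return (c.end_s - c.start_s) < min_s_per_char * len(c.keyword)


def spot(candidates: Iterable[Candidate], threshold: float) -> List[Dict]:
    """Apply steps (d), (e), and (f) to a sequence of candidates."""
    report = []
    for c in candidates:
        if c.probability < threshold:  # step (d): must meet the threshold
            continue
        if is_mismatch(c):             # step (e): drop likely false alarms
            continue
        report.append({                # step (f): times plus probability
            "keyword": c.keyword,
            "start_s": c.start_s,
            "end_s": c.end_s,
            "probability": c.probability,
        })
    return report


example = [
    Candidate("refund", 1.00, 1.50, 0.90),  # passes both checks
    Candidate("refund", 2.00, 2.05, 0.95),  # too short: treated as a mismatch
    Candidate("cancel", 3.00, 3.40, 0.40),  # below the threshold
]
spotted = spot(example, threshold=0.8)
```

Keeping the threshold test and the mismatch test separate mirrors the order of steps (d) and (e), so a different empirical metric can be substituted without changing the reporting step.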
Abstract
A system and method are presented for real-time speech analytics. Real-time audio is fed, along with a keyword model, into a recognition engine. The recognition engine computes the probability of the audio stream data matching keywords in the keyword model. The probability is compared to a threshold to determine whether the keyword has been spotted. Empirical metrics are computed, and any false alarms are identified and rejected. The keyword may be reported as found when it passes the threshold for detection and is deemed not to be a false alarm.
26 Claims
14. A computerized method of speech recognition, in an automatic speech recognition system wherein said system comprises at least a speech recognition engine, for real-time spotting of predetermined keywords in an audio stream, comprising the steps of:
a) developing a keyword model for the predetermined keywords;
b) dividing, by the speech recognition engine, the audio stream into a series of points in an acoustic space that spans all possible sounds created in a particular language;
c) determining, by the speech recognition engine, a posterior probability that a first trajectory of each keyword model for the predetermined keywords in the acoustic space matches a second trajectory of a portion of the series of points in the acoustic space, wherein the posterior probability is determined utilizing the mathematical equation;
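The equation referred to in step (c) is not reproduced in the text above, and the following Python sketch does not attempt to reconstruct it. It only illustrates the general idea of scoring a keyword trajectory against a series of points in an acoustic space using per-frame posteriors. Everything in it is an assumption made for illustration: each sound unit is modeled by a diagonal-covariance Gaussian with a prior, the posterior is obtained by Bayes' rule, the frame-to-unit alignment is taken as given, and the keyword score is the geometric mean of the per-frame posteriors. The names `frame_log_posteriors` and `trajectory_score` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Assumed model format: unit name -> (mean vector, variance vector, prior)
Model = Tuple[Sequence[float], Sequence[float], float]


def log_gaussian(x: Sequence[float], mean: Sequence[float],
                 var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def frame_log_posteriors(x: Sequence[float],
                         models: Dict[str, Model]) -> Dict[str, float]:
    """Bayes' rule in the log domain: log P(unit | x) for every unit."""
    log_joint = {name: math.log(prior) + log_gaussian(x, mean, var)
                 for name, (mean, var, prior) in models.items()}
    top = max(log_joint.values())
    # Log-sum-exp normalizer, shifted by the maximum for numerical stability.
    log_norm = top + math.log(sum(math.exp(v - top) for v in log_joint.values()))
    return {name: v - log_norm for name, v in log_joint.items()}


def trajectory_score(frames: List[Sequence[float]], unit_path: List[str],
                     models: Dict[str, Model]) -> float:
    """Geometric mean of per-frame posteriors along a pre-aligned unit path."""
    if not frames or len(frames) != len(unit_path):
        raise ValueError("frames and unit_path must be non-empty and aligned")
    logs = [frame_log_posteriors(x, models)[u]
            for x, u in zip(frames, unit_path)]
    return math.exp(sum(logs) / len(logs))


# Toy one-dimensional acoustic space with two hypothetical units.
toy_models = {"a": ([0.0], [1.0], 0.5), "b": ([4.0], [1.0], 0.5)}
toy_frames = [[0.0], [4.0]]
good = trajectory_score(toy_frames, ["a", "b"], toy_models)
bad = trajectory_score(toy_frames, ["b", "a"], toy_models)
```

In this toy setup the path that follows the frames scores close to 1 and the reversed path scores close to 0, which is the kind of separation a later threshold comparison would rely on.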
Specification