METHODS AND APPARATUS FOR ENHANCING SPEECH ANALYTICS
Abstract
Methods and apparatus for enhancing speech-to-text (STT) engines by providing indications of the correctness of the found words, based on sources other than the internal indication provided by the STT engine. The enhanced indications draw on sources of data such as acoustic features, CTI features, phonetic search, and others. The apparatus and methods also enable the detection of important or significant keywords found in audio files, thus enabling more efficient uses, such as further processing, transfer of interactions to relevant agents, escalation of issues, or the like. The methods and apparatus employ a training phase in which a word model and a key phrase model are generated for determining an enhanced correctness indication for a word and an enhanced importance indication for a key phrase, based on the additional features.
38 Claims
1. A method for enhancing the analysis of at least one test word extracted from a test audio source, the test audio source captured within an environment and having an acoustic environment, the method comprising:
a first receiving step for receiving at least one training word extracted from a training audio source;
a first feature extraction step for extracting at least one first feature from each of the at least one training word, from the environment, or from the acoustic environment;
a second receiving step for receiving an indication whether the at least one training word appears in the training audio source; and
a model generation step for generating a model using the at least one training word and the at least one first feature, and the indication;
a third receiving step for receiving at least one test word extracted from the test audio source;
a second feature extraction step for extracting at least one second feature from the test audio source, from the environment, or from the acoustic environment; and
a classification step for applying the word training model on the test word and the at least one second feature, thus obtaining a confidence score for the at least one test word.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
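
As an illustration only, and not the patented implementation, the train-then-classify flow of claim 1 can be sketched in Python. The two features (an STT engine score and a normalized signal-to-noise ratio), the sample values, the function names, and the choice of logistic regression are all assumptions made for this example; the claim does not name a specific model or feature set.

```python
import math
from typing import List, Sequence

def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def train_word_model(features: Sequence[Sequence[float]],
                     labels: Sequence[int],
                     epochs: int = 2000,
                     lr: float = 0.5) -> List[float]:
    """Fit a logistic regression by batch gradient descent.

    features: one feature vector per training word (hypothetical features).
    labels:   1 if the word really appears in the training audio, else 0
              (the "indication" of the second receiving step).
    Returns the weights, with the bias stored as the last element.
    """
    n = len(features[0])
    weights = [0.0] * (n + 1)
    for _ in range(epochs):
        grad = [0.0] * (n + 1)
        for x, y in zip(features, labels):
            p = _sigmoid(sum(w * v for w, v in zip(weights, x)) + weights[-1])
            err = p - y
            for i, v in enumerate(x):
                grad[i] += err * v
            grad[-1] += err
        weights = [w - lr * g / len(features) for w, g in zip(weights, grad)]
    return weights

def confidence_score(weights: Sequence[float], x: Sequence[float]) -> float:
    """Classification step: a confidence in [0, 1] for one test word."""
    return _sigmoid(sum(w * v for w, v in zip(weights, x)) + weights[-1])

# Made-up training data: [stt_engine_score, normalized_snr] per training word.
train_x = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.7],
           [0.3, 0.2], [0.2, 0.4], [0.4, 0.1]]
train_y = [1, 1, 1, 0, 0, 0]

model = train_word_model(train_x, train_y)
clean_word = confidence_score(model, [0.85, 0.9])   # strong score, clean audio
noisy_word = confidence_score(model, [0.2, 0.1])    # weak score, noisy audio
```

In this sketch a test word with a strong engine score in clean audio receives a higher confidence than one with a weak score in noisy audio; any supervised classifier that outputs a probability could stand in for the logistic regression.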

11. A method for enhancing the analysis of at least one test word extracted from a test audio source, the method operating within an environment having an acoustic environment, the method comprising the steps of:
a first receiving step for receiving at least one training word extracted from a training audio source;
a first key phrase extraction step for extracting a training key phrase from the at least one training word according to a linguistic rule;
a first feature extraction step for extracting at least one first feature from each of the at least one training word, from the environment, or from the acoustic environment;
a second receiving step for receiving tagging information relating to a significance level or an importance level of the training key phrase;
a key phrase model generation step for generating a key phrase training model based on the training key phrase and the at least one first feature, and the tagging;
a third receiving step for receiving at least one test word extracted from a test audio source;
a second key phrase extraction step for extracting a test key phrase from the at least one test word according to the linguistic rule;
a second feature extraction step for extracting at least one second feature from each of the at least one test key phrase, from the environment, or from the acoustic environment; and
applying the key phrase training model on the test key phrase and the at least one second feature, thus obtaining an importance indication for the test key phrase.
Dependent claims: 12, 13, 14, 15, 16, 17
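
The key phrase flow of claim 11 can be sketched in the same hedged way. Everything specific here is an assumption for illustration: the linguistic rule (optional adjectives followed by one or more nouns, at least two words in total), the part-of-speech tags supplied by some upstream tagger, the averaging "model", and the use of a word confidence value as the single second feature. The claim itself does not fix any of these choices.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

TaggedWord = Tuple[str, str]  # (word, part-of-speech tag from an assumed tagger)

def extract_key_phrases(words: List[TaggedWord]) -> List[str]:
    """Hypothetical linguistic rule: ADJ* NOUN+, keeping phrases of 2+ words."""
    phrases: List[str] = []
    current: List[str] = []
    has_noun = False
    # A sentinel entry flushes any phrase still open at the end of the input.
    for word, tag in words + [("", "END")]:
        if tag == "NOUN":
            current.append(word)
            has_noun = True
        elif tag == "ADJ" and not has_noun:
            current.append(word)
        else:
            if has_noun and len(current) >= 2:
                phrases.append(" ".join(current))
            # An adjective arriving after a noun starts a new candidate phrase.
            current = [word] if tag == "ADJ" else []
            has_noun = False
    return phrases

def train_key_phrase_model(tagged: List[Tuple[str, float]]) -> Dict[str, float]:
    """Average the human-assigned importance tags (0..1) per training key phrase."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for phrase, importance in tagged:
        sums[phrase] += importance
        counts[phrase] += 1
    return {phrase: sums[phrase] / counts[phrase] for phrase in sums}

def importance_indication(model: Dict[str, float], phrase: str,
                          word_confidence: float, default: float = 0.0) -> float:
    """Scale the learned importance by an assumed confidence feature."""
    return model.get(phrase, default) * word_confidence

# Made-up test utterance and training tags.
utterance = [("please", "VERB"), ("cancel", "VERB"), ("my", "PRON"),
             ("premium", "ADJ"), ("account", "NOUN"), ("today", "ADV")]
test_phrases = extract_key_phrases(utterance)
phrase_model = train_key_phrase_model(
    [("premium account", 1.0), ("premium account", 0.5), ("nice day", 0.0)])
scores = {p: importance_indication(phrase_model, p, word_confidence=0.8)
          for p in test_phrases}
```

Applying the same extraction function to training and test words mirrors the claim's requirement that both key phrase extraction steps follow the same linguistic rule.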

18. An apparatus for enhancing the analysis of at least one test word extracted from a test audio source, the test audio source captured within an environment and having an acoustic environment, the apparatus comprising:
an extraction engine for extracting at least one feature from the test audio source or from a training audio source;
a training engine for receiving an indication and generating a word training model between at least one training word and the at least one feature, and the indication; and
a classification engine for applying the word training model on the at least one test word and the at least one feature, thus obtaining a confidence score for the at least one test word.
Dependent claims: 19, 20, 21, 22, 23, 24, 25, 26, 27, 28

29. An apparatus for enhancing the analysis of at least one test word extracted from a test audio source, the test audio source captured within an environment and having an acoustic environment, the apparatus comprising:
a key phrase extraction component for extracting a training key phrase from at least one training word extracted from a training audio source, and a test key phrase from the at least one test word according to a linguistic rule;
an extraction engine for extracting at least one feature from the test audio source or from a training audio source;
a key phrase training component for receiving indications and generating a key phrase training model between the training key phrase and the at least one feature, and an indication; and
a classification engine for applying the key phrase training model on the test key phrase and the at least one feature, thus obtaining an importance score for the test key phrase.
Dependent claims: 30, 31, 32, 33, 34, 35, 36

37. A computer readable storage medium containing a set of instructions for a general purpose computer, the set of instructions comprising:
receiving at least one training word extracted from a training audio source captured within an environment and having an acoustic environment;
a first feature extraction step for extracting at least one first feature from each of the at least one training word, from the environment, or from the acoustic environment;
receiving an indication whether the at least one training word appears in the training audio source; and
a model generation step for generating a model using the at least one training word and the at least one first feature, and the indication;
receiving at least one test word extracted from a test audio source;
a second feature extraction step for extracting at least one second feature from the test audio source or from an environment or from an acoustic environment of the test audio source; and
a classification step for applying the word training model on the at least one test word and the at least one second feature, thus obtaining a confidence score for the at least one test word.

38. A computer readable storage medium containing a set of instructions for a general purpose computer, the set of instructions comprising:
receiving at least one training word extracted from a training audio source captured within an environment and having an acoustic environment;
a first key phrase extraction step for extracting a training key phrase from the at least one training word according to a linguistic rule;
a first feature extraction step for extracting at least one first feature from each of the at least one training word, from the environment, or from the acoustic environment;
receiving tagging information relating to a significance level or an importance level of the training key phrase;
a key phrase model generation step for generating a key phrase training model based on the training key phrase and the at least one first feature, and the tagging;
receiving at least one test word extracted from a test audio source captured within an environment and having an acoustic environment;
a second key phrase extraction step for extracting a test key phrase from the at least one test word according to the linguistic rule;
a second feature extraction step for extracting at least one second feature from the test key phrase, from the environment, or from the acoustic environment; and
applying the key phrase training model on the test key phrase and the at least one second feature, thus obtaining an importance indication for the test key phrase.
Specification