METHODS AND APPARATUS FOR ENHANCING SPEECH ANALYTICS

US 20090292541A1
Filed: 05/25/2008
Published: 11/26/2009
Est. Priority Date: 05/25/2008
Status: Active Grant

Abstract

Methods and apparatus for enhancing speech-to-text (STT) engines by providing indications of the correctness of the found words, based on sources additional to the internal indication provided by the STT engine. The enhanced indications draw on sources of data such as acoustic features, CTI features, phonetic search and others. The apparatus and methods also enable the detection of important or significant keywords found in audio files, thus enabling more efficient uses, such as further processing or transfer of interactions to relevant agents, escalation of issues, or the like. The methods and apparatus employ a training phase in which a word model and a key phrase model are generated for determining an enhanced correctness indication for a word and an enhanced importance indication for a key phrase, based on the additional features.

121 Citations

38 Claims

1. A method for enhancing the analysis of at least one test word extracted from a test audio source, the test audio source captured within an environment and having an acoustic environment, the method comprising:
- a first receiving step for receiving at least one training word extracted from a training audio source;
  
  a first feature extraction step for extracting at least one first feature from each of the at least one training word, from the environment, or from the acoustic environment;
  
  a second receiving step for receiving an indication whether the at least one training word appears in the training audio source; and
  
  a model generation step for generating a model using the at least one training word and the at least one first feature, and the indication;
  
  a third receiving step for receiving at least one test word extracted from the test audio source;
  
  a second feature extraction step for extracting at least one second feature from the test audio source, from the environment or from the acoustic environment; and
  
  a classification step for applying the word training model on the test word and the at least one second feature, thus obtaining a confidence score for the at least one test word.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
- - 2. The method of claim 1 further comprising a first text extraction step for extracting the at least one training word from the training audio source, or a second text extraction step for extracting the at least one test word from the test audio source.
  - 3. The method of claim 1 further comprising a natural language processing step for analyzing the at least one test word or the at least one training word.
  - 4. The method of claim 3 wherein the natural language processing step comprises part of speech analysis step for tagging the at least one test word or the at least one training word into a part of speech, or a stemming step for stemming the at least one test word or the at least one training word.
  - 5. The method of claim 1 wherein the at least one first feature relates to a second audio source.
  - 6. The method of claim 1 wherein the first feature extraction step or the second feature extraction step comprise extracting at least one item selected from the group consisting of:
    - an acoustic feature;
      
      phonetic data;
      
      computer telephony integration information;
      
      number of characters of the at least one test word or at least one training word;
      
      frequency of the at least one test word or at least one training word;
      
      accumulated frequency of the at least one test word or at least one training word in multiple audio sources;
      
      text length;
      
      word stem;
      
      phonemes that construct the at least one test word or at least one training word;
      
      adjacent words;
      
      speech to text certainty;
      
      relative position of the at least one test word in the test audio source;
      
      relative position of the at least one training word in the test audio source;
      
      speaker side in which the at least one test word or at least one training word is said;
      
      part of speech of the at least one test word or at least one training word;
      
      part of speech of adjacent words;
      
      emotional level of the at least one test word or at least one training word;
      
      overlap of the at least one test word or at least one training word with talkover;
      
      laughter or another emotion expression;
      
      conversational data, textual data, and linguistic features.
  - 7. The method of claim 1 wherein the indication comprises transcription of the training audio source or part thereof.
  - 8. The method of claim 1 wherein the indication comprises indication whether the training word was said within the training audio source or not.
  - 9. The method of claim 1 further comprising a phonetic search step for searching for the at least one test word within the test audio source.
  - 10. The method of claim 1 further comprising the steps of:
    - a first key phrase extraction step for extracting a training key phrase from the training data according to a linguistic rule;
      
      receiving tagging information relating to a significance level or an importance level of the training key phrase;
      
      a key phrase model generation step for generating a key phrase training model between the training key phrase and the at least one first feature, and the tagging;
      
      a second key phrase extraction step for extracting a test key phrase from the at least one test word according to the linguistic rule; and
      
      applying the key phrase training model on the test key phrase and the at least one second feature, thus obtaining an importance indication for the test key phrase.
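
The training and classification flow of claim 1 can be illustrated with a minimal sketch. It is not taken from the patent: the choice of logistic regression, the three features (speech-to-text certainty, number of characters, relative position, all drawn from the claim 6 list), and every name below (`Word`, `features`, `train`, `confidence`) are assumptions made for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass
class Word:
    text: str             # word as emitted by a speech-to-text engine
    stt_certainty: float  # engine's own certainty, assumed to lie in [0, 1]
    position: float       # relative position in the audio source, in [0, 1]


def features(word: Word) -> list[float]:
    """Feature extraction step: bias term plus three illustrative features."""
    return [1.0, word.stt_certainty, len(word.text) / 10.0, word.position]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(words, indications, epochs=500, lr=0.5):
    """Model generation step: fit logistic-regression weights by batch
    gradient descent. indications[i] is True when training word i was
    really said in the training audio source."""
    rows = [features(w) for w in words]
    weights = [0.0] * len(rows[0])
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, indications):
            err = sigmoid(sum(wi * xi for wi, xi in zip(weights, x))) - float(y)
            for j, xj in enumerate(x):
                grad[j] += err * xj
        weights = [wi - lr * g / len(rows) for wi, g in zip(weights, grad)]
    return weights


def confidence(weights, word: Word) -> float:
    """Classification step: confidence score in (0, 1) for a test word."""
    return sigmoid(sum(wi * xi for wi, xi in zip(weights, features(word))))
```

In this sketch the returned score plays the role of the claimed confidence score; any classifier that maps features and indications to a score could be substituted.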

11. A method for enhancing the analysis of at least one test word extracted from a test audio source, the method operating within an environment having an acoustic environment, the method comprising the steps of:
- a first receiving step for receiving at least one training word extracted from a training audio source;
  
  a first key phrase extraction step for extracting a training key phrase from the at least one training word according to a linguistic rule;
  
  a first feature extraction step for extracting at least one first feature from each of the at least one training word from the environment, or from the acoustic environment;
  
  a second receiving step for receiving tagging information relating to a significance level or an importance level of the training key phrase;
  
  a key phrase model generation step for generating a key phrase training model based on the training key phrase and the at least one first feature, and the tagging;
  
  a third receiving step for receiving at least one test word extracted from a test audio source;
  
  a second key phrase extraction step for extracting a test key phrase from the at least one test word according to the linguistic rule;
  
  a second feature extraction step for extracting at least one second feature from each of the at least one test key phrase, from the environment, or from the acoustic environment; and
  
  applying the key phrase training model on the test key phrase and the at least one second feature, thus obtaining an importance indication for the test key phrase.
- Dependent Claims (12, 13, 14, 15, 16, 17)
- - 12. The method of claim 11 further comprising a first text extraction step for extracting the at least one training word from the training audio source, or a second text extraction step for extracting the at least one test word from the test audio source.
  - 13. The method of claim 11 further comprising a natural language processing step for analyzing the at least one test word or the at least one training word.
  - 14. The method of claim 13 wherein the natural language processing step comprises part of speech analysis step for tagging the at least one test word or the at least one training word into a part of speech, or a stemming step for stemming the at least one test word or the at least one training word.
  - 15. The method of claim 11 wherein the at least one first feature relates to a second audio source.
  - 16. The method of claim 11 wherein the first feature extraction step or the second feature extraction step comprise extracting at least one item selected from the group consisting of:
    - number of tokens in the test key phrase or in the training key phrase;
      
      number of characters of a word in the test key phrase or in the training key phrase;
      
      test key phrase or training key phrase frequency within the test audio source or training audio source;
      
      total text length;
      
      word stems of words comprised in the test key phrase or in the training key phrase;
      
      phonemes comprised in a word in the test key phrase or in the training key phrase;
      
      adjacent words to the test key phrase or to the training key phrase;
      
      average speech-to-text certainty of words in the test key phrase or in the training key phrase;
      
      relative position of a first instance of the test key phrase or the training key phrase within the extracted text;
      
      speaker side;
      
      part of speech of a word of the test key phrase or the training key phrase;
      
      part of speech of adjacent words to a word of the test key phrase or the training key phrase;
      
      emotion degree within a word of the test key phrase or the training key phrase; and
      
      overlap with talkover or laughter indications.
  - 17. The method of claim 11 wherein the indication comprises indication whether the at least one training word was said within the training audio source or not.
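
As a rough illustration of key phrase extraction "according to a linguistic rule" and of a few of the claim 16 features, the sketch below matches a part-of-speech pattern over tagged tokens. The rule itself (an adjective or noun followed by a noun), the tag names, and the helpers `Token`, `extract_key_phrases` and `phrase_features` are hypothetical; the claims do not specify any particular rule or data layout.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    pos: str              # part-of-speech tag, e.g. "ADJ" or "NOUN" (assumed tag set)
    stt_certainty: float  # speech-to-text certainty of this word


# Hypothetical linguistic rule: (adjective or noun) followed by a noun.
RULE = [{"ADJ", "NOUN"}, {"NOUN"}]


def extract_key_phrases(tokens):
    """Return (start_index, phrase_text) for every window matching RULE."""
    n = len(RULE)
    matches = []
    for start in range(len(tokens) - n + 1):
        window = tokens[start:start + n]
        if all(tok.pos in allowed for tok, allowed in zip(window, RULE)):
            matches.append((start, " ".join(tok.text.lower() for tok in window)))
    return matches


def phrase_features(tokens, phrase):
    """Compute a few claim-16-style features for one key phrase."""
    n = len(RULE)
    starts = [s for s, text in extract_key_phrases(tokens) if text == phrase]
    if not starts:
        raise ValueError(f"phrase not found: {phrase!r}")
    certainties = [tok.stt_certainty for s in starts for tok in tokens[s:s + n]]
    return {
        "num_tokens": n,                                    # number of tokens in the phrase
        "frequency": len(starts),                           # occurrences in the extracted text
        "avg_stt_certainty": sum(certainties) / len(certainties),
        "first_relative_position": starts[0] / len(tokens),  # position of the first instance
    }
```

A feature dictionary of this kind could then be paired with the received tagging information to train the key phrase model.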

18. An apparatus for enhancing the analysis of at least one test word extracted from a test audio source, the test audio source captured within an environment and having an acoustic environment, the apparatus comprising:
- an extraction engine for extracting at least one feature from the test audio source or from a training audio source;
  
  a training engine for receiving an indication and generating a word training model between at least one training word and the at least one feature, and the indication; and
  
  a classification engine for applying the word training model on the at least one test word and the at least one feature, thus obtaining a confidence score for the at least one test word.
- Dependent Claims (19, 20, 21, 22, 23, 24, 25, 26, 27, 28)
- - 19. The apparatus of claim 18 further comprising a speech to text engine for extracting the at least one test word or the at least one training word from the test audio source or from a training audio source.
  - 20. The apparatus of claim 18 further comprising a natural language processing engine for analyzing the at least one test word or the at least one training word.
  - 21. The apparatus of claim 20 wherein the natural language processing engine comprises a part of speech analysis engine for tagging the at least one test word or the at least one training word into a part of speech, or a stemming engine for stemming the at least one test word or the at least one training word.
  - 22. The apparatus of claim 18 wherein the at least one feature relates to a second audio source.
  - 23. The apparatus of claim 18 wherein the extraction engine extracts at least one item selected from the group consisting of:
    - an acoustic feature;
      
      phonetic data;
      
      computer telephony integration information;
      
      number of characters of the at least one test word or at least one training word;
      
      frequency of the at least one test word or at least one training word;
      
      accumulated frequency of the at least one test word or at least one training word in multiple audio sources;
      
      text length;
      
      word stem;
      
      phonemes that construct the at least one test word or at least one training word;
      
      adjacent words;
      
      speech to text certainty;
      
      relative position of the at least one test word in the test audio source;
      
      relative position of the at least one training word in the test audio source;
      
      speaker side in which the at least one test word or at least one training word is said;
      
      part of speech of the at least one test word or at least one training word;
      
      part of speech of adjacent words;
      
      emotional level of the at least one test word or at least one training word;
      
      overlap of the at least one test word or at least one training word with talkover;
      
      laughter or another emotion expression;
      
      conversational data;
      
      textual data; and
      
      linguistic features.
  - 24. The apparatus of claim 18 wherein the indication comprises transcription of the audio source or part thereof.
  - 25. The apparatus of claim 18 wherein the indication comprises indication whether the at least one training word was said within the audio source or not.
  - 26. The apparatus of claim 18 further comprising a key phrase extraction component for extracting a training key phrase from the at least one training word and a test key phrase from the at least one test word according to a linguistic rule, wherein the training engine further receives key phrase indications and generates a key phrase training model between the training key phrase and the at least one feature, and the indication, and wherein the classification engine applies the key phrase training model on the test key phrase and the at least one feature, thus obtaining an importance indication for the test key phrase.
  - 27. The apparatus of claim 18 wherein the indication indicates whether the training word was said within the audio source.
  - 28. The apparatus of claim 18 further comprising a capturing or logging component for capturing the audio source and a storage component for storing the audio source or the at least one test word or the at least one training word or a test key phrase or a training key phrase or the test word model or the key phrase model.
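
One way to picture the three engines of claim 18 is as cooperating components. The sketch below is an assumption-laden illustration, not the patented design: the class names mirror the claim language, but the two features, the `WordModel` container, and the use of k-nearest-neighbours voting to produce the confidence score are all choices made here for brevity.

```python
from dataclasses import dataclass, field

Vector = tuple[float, ...]


class ExtractionEngine:
    """Extracts features for a word (here: STT certainty and capped length)."""

    def extract(self, word: str, stt_certainty: float) -> Vector:
        return (stt_certainty, min(len(word), 12) / 12)


@dataclass
class WordModel:
    # Each example pairs a feature vector with the received indication
    # (True when the training word was actually said).
    examples: list[tuple[Vector, bool]] = field(default_factory=list)


class TrainingEngine:
    """Receives indications and generates the word training model."""

    def train(self, vectors, indications) -> WordModel:
        return WordModel(list(zip(vectors, indications)))


class ClassificationEngine:
    """Applies the model to a test word's features to get a confidence score."""

    def __init__(self, k: int = 3):
        self.k = k

    def confidence(self, model: WordModel, vector: Vector) -> float:
        def sq_dist(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))

        nearest = sorted(model.examples, key=lambda ex: sq_dist(ex[0], vector))[: self.k]
        # Fraction of the k nearest training words that were really said.
        return sum(1 for _, said in nearest if said) / len(nearest)
```

Keeping the engines separate, as the claim does, means the same extraction engine can serve both the training and the test audio sources.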

29. An apparatus for enhancing the analysis of at least one test word extracted from a test audio source, the test audio source captured within an environment and having an acoustic environment, the apparatus comprising:
- a key phrase extraction component for extracting a training key phrase from at least one training word extracted from a training audio source, and a test key phrase from the at least one test word according to a linguistic rule;
  
  an extraction engine for extracting at least one feature from the test audio source or from a training audio source;
  
  a key phrase training component for receiving indications and generating a key phrase training model between the training key phrase and the at least one feature, and an indication; and
  
  a classification engine for applying the key phrase training model on the test key phrase and the at least one feature, thus obtaining an importance score for the test key phrase.
- Dependent Claims (30, 31, 32, 33, 34, 35, 36)
- - 30. The apparatus of claim 29 further comprising a speech to text engine for extracting the at least one test word or the at least one training word from the test audio source or from a training audio source.
  - 31. The apparatus of claim 29 further comprising a natural language processing engine for analyzing the at least one test word or the at least one training word or the test key phrase or the training key phrase.
  - 32. The apparatus of claim 31 wherein the natural language processing engine comprises a part of speech analysis engine for tagging the at least one test word or the at least one training word into a part of speech, or a stemming engine for stemming the test word or the training word.
  - 33. The apparatus of claim 29 wherein the at least one feature relates to a second audio source.
  - 34. The apparatus of claim 29 wherein the extraction engine extracts at least one item selected from the group consisting of:
    - number of tokens in the test key phrase or the training key phrase;
      
      number of characters of a word in the test key phrase or the training key phrase;
      
      word frequency within the test audio source or training audio source;
      
      text length;
      
      word stems of words comprised in the test key phrase or the training key phrase;
      
      phonemes comprised in a word in the test key phrase or the training key phrase;
      
      adjacent words to the test key phrase or the training key phrase;
      
      average speech-to-text certainty of word in the test key phrase or the training key phrase;
      
      relative position of a first instance of the test key phrase or the training key phrase within the extracted text;
      
      speaker side;
      
      part of speech of a word of the test key phrase or the training key phrase;
      
      part of speech of adjacent words to a word of the test key phrase or the training key phrase;
      
      emotion degree within a word of the test key phrase or the training key phrase; and
      
      overlap with talkover or laughter indications.
  - 35. The apparatus of claim 29 wherein the indication indicates to what extent the training key phrase is important or significant, and wherein the training engine further receives key phrase indications and generates a key phrase training model between the training key phrase and the at least one feature, and the indication, and wherein the classification engine applies the key phrase training model on the test key phrase and the at least one feature, thus obtaining an importance indication for the test key phrase.
  - 36. The apparatus of claim 29 further comprising a capturing or logging component for capturing the audio source and a storage component for storing the audio source or the at least one test word or the at least one training word or a test key phrase or a training key phrase or the key phrase model or the test word and key phrase model.
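
The key phrase training component and classification engine of claim 29 can be sketched with a nearest-centroid scorer. This is only one possible reading: the patent does not name a learning method, and the feature order (number of tokens, average speech-to-text certainty, relative position) as well as the functions `train_key_phrase_model` and `importance_score` are assumptions for illustration.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def train_key_phrase_model(vectors, tags):
    """Build one centroid per tag. tags[i] is True when training key
    phrase i was tagged as important; both tag values must occur."""
    important = [v for v, t in zip(vectors, tags) if t]
    other = [v for v, t in zip(vectors, tags) if not t]
    return {"important": centroid(important), "other": centroid(other)}


def importance_score(model, vector):
    """Score in [0, 1]; values above 0.5 lie closer to the 'important' centroid."""
    d_imp = math.dist(vector, model["important"])
    d_oth = math.dist(vector, model["other"])
    if d_imp + d_oth == 0:
        return 0.5  # degenerate case: both centroids coincide with the vector
    return d_oth / (d_imp + d_oth)
```

A graded significance level, as mentioned in claim 35, would call for a regression-style model instead of this two-class sketch.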

37. A computer readable storage medium containing a set of instructions for a general purpose computer, the set of instructions comprising:
- receiving at least one training word extracted from a training audio source captured within an environment and having acoustic environment;
  
  a first feature extraction step for extracting at least one first feature from each of the at least one training word, from the environment, or from the acoustic environment;
  
  receiving an indication whether the at least one training word appears in the training audio source; and
  
  a model generation step for generating a model using the at least one training word and the at least one first feature, and the indication;
  
  receiving at least one test word extracted from a test audio source;
  
  a second feature extraction step for extracting at least one second feature from the test audio source or from an environment or from an acoustic environment of the test audio source; and
  
  a classification step for applying the word training model on the at least one test word and the at least one second feature, thus obtaining a confidence score for the at least one test word.

38. A computer readable storage medium containing a set of instructions for a general purpose computer, the set of instructions comprising:
- receiving at least one training word extracted from a training audio source captured within an environment and having acoustic environment;
  
  a first key phrase extraction step for extracting a training key phrase from the at least one training word according to a linguistic rule;
  
  a first feature extraction step for extracting at least one first feature from each of the at least one training word, from the environment, or from the acoustic environment;
  
  receiving tagging information relating to a significance level or an importance level of the training key phrase;
  
  a key phrase model generation step for generating a key phrase training model based on the training key phrase and the at least one first feature, and the tagging;
  
  receiving at least one test word extracted from a test audio source captured within an environment and having acoustic environment;
  
  a second key phrase extraction step for extracting a test key phrase from the at least one test word according to the linguistic rule;
  
  a second feature extraction step for extracting at least one second feature from the test key phrase, from the environment, or from the acoustic environment; and
  
  applying the key phrase training model on the test key phrase and the at least one second feature, thus obtaining an importance indication for the test key phrase.

Current Assignee: Nice Ltd
Original Assignee: Nice Systems Limited (Nice Ltd)
Inventors: Pereg, Oren; Daya, Ezra; Wasserblat, Moshe; Lubowich, Yuval
Granted Patent: US 8,145,482 B2
US Class (Current): 704/251
CPC Class Codes:
G10L 15/063   Training
G10L 15/18   using natural language mode...
G10L 2015/088   Word spotting
G10L 2015/226   using non-speech characteri...
