Material selection for language model customization in speech recognition for speech analytics

US 10,311,859 B2
Filed: 08/25/2016
Issued: 06/04/2019
Est. Priority Date: 01/16/2016
Status: Active Grant

Abstract

A method for extracting, from non-speech text, training data for a language model for speech recognition includes: receiving, by a processor, non-speech text; selecting, by the processor, text from the non-speech text; converting, by the processor, the selected text to generate converted text comprising a plurality of phrases consistent with speech transcription text; training, by the processor, a language model using the converted text; and outputting, by the processor, the language model.
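The converting step named in the abstract can be illustrated with a minimal Python sketch. It assumes email-like input and covers only metadata removal, sentence splitting, spoken-form conversion of small integers and two symbols, and duplicate removal; spelling correction is left out. The header pattern and the helper names (`number_to_words`, `to_spoken_form`, `convert`) are hypothetical and are not taken from the patent.

```python
import re

# Hypothetical metadata pattern: email-style header lines.
_HEADER = re.compile(r"^(from|to|cc|subject|date|sent):.*$",
                     re.IGNORECASE | re.MULTILINE)
_ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
         "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
_TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
         "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999."""
    if n < 20:
        return _ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return _TENS[tens] + ("" if ones == 0 else " " + _ONES[ones])
    hundreds, rest = divmod(n, 100)
    tail = "" if rest == 0 else " " + number_to_words(rest)
    return _ONES[hundreds] + " hundred" + tail


def to_spoken_form(sentence: str) -> str:
    """Rewrite a written sentence the way a transcript would show it."""
    s = re.sub(r"\b\d{1,3}\b",
               lambda m: number_to_words(int(m.group())), sentence)
    s = s.replace("&", " and ").replace("%", " percent")
    s = re.sub(r"[^a-z' ]+", " ", s.lower())  # drop punctuation
    return " ".join(s.split())


def convert(text: str) -> list[str]:
    """Strip headers, split into sentences, normalize, and de-duplicate."""
    body = _HEADER.sub("", text)
    sentences = re.split(r"(?<=[.!?])\s+|\n+", body)
    seen, out = set(), []
    for sentence in sentences:
        spoken = to_spoken_form(sentence)
        if spoken and spoken not in seen:
            seen.add(spoken)
            out.append(spoken)
    return out
```

Called on a short email whose body repeats one sentence, `convert` returns each normalized sentence once, in order of first appearance.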

18 Claims

1. A method for customizing a language model for speech recognition in a context, the method comprising:
- receiving, by a processor, non-speech text from the context, the context comprising communications with an enterprise, the communications comprising voice interactions and non-speech communications;
  
  selecting, by the processor, text from the non-speech text;
  
  converting, by the processor, the selected non-speech text to generate converted non-speech text comprising a plurality of phrases consistent with speech transcription text;
  
  customizing, by the processor, a language model for the context using the converted non-speech text, the language model being customized to compute a probability that a given speech input phrase appears in voice interactions in the context of the communications with the enterprise; and
  
  outputting, by the processor, the language model.
  - 2. The method of claim 1, wherein the non-speech text comprises at least one from the group consisting of:
    - an email;
      
      a forum post;
      
      a transcript of a text chat interaction;
      
      or a text message.
  - 3. The method of claim 1, wherein the converting the selected non-speech text comprises:
    - removing metadata from the non-speech text;
      
      splitting the non-speech text into a plurality of sentences;
      
      converting one or more words of the sentences to spoken form;
      
      correcting one or more spelling errors in the sentences;
      
      identifying one or more duplicate sentences; and
      
      removing duplicate sentences.
  - 4. The method of claim 1, wherein the selecting the text comprises:
    - for each in-vocabulary word in a lexicon of in-vocabulary words, identifying one or more sentences containing the in-vocabulary word;
      
      counting the one or more sentences to identify a count of the in-vocabulary word in the non-speech text;
      
      comparing the count to a first threshold; and
      
      adding the identified one or more sentences containing the in-vocabulary word in response to determining that the count satisfies the first threshold;
      
      identifying one or more out-of-vocabulary words comprising words that are in the sentences and not in the lexicon;
      
      for each out-of-vocabulary word of the out-of-vocabulary words:
      
      identifying one or more sentences containing the out-of-vocabulary word;
      
      counting the one or more sentences to identify a count of the out-of-vocabulary word in the non-speech text;
      
      comparing the count to a second threshold;
      
      computing a first likelihood of encountering the out-of-vocabulary word in the sentence among all of the identified sentences;
      
      identifying one or more spelling suggestions for the out-of-vocabulary word;
      
      computing a plurality of second likelihoods, each of the second likelihoods corresponding to a second likelihood of encountering each of the spelling suggestions in the sentence;
      
      adding the identified sentences to an output set of selected text in response to determining that the count satisfies a threshold and that all of the second likelihoods are less than the first likelihood; and
      
      outputting the output set of selected text.
  - 5. The method of claim 4, wherein the computing the first likelihood comprises counting occurrences of the out-of-vocabulary word preceded by one or more history words in the non-speech text;
    - and wherein the computing one of the second likelihoods comprises counting occurrences of a corresponding spelling suggestion of the spelling suggestions preceded by the one or more history words in the non-speech text.
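The out-of-vocabulary branch of claim 4, with likelihoods counted as in claim 5, might look like the following minimal sketch. Several choices here are assumptions rather than details from the claims: a single history word, "satisfies the threshold" read as "at least `oov_min` sentences", and spelling suggestions drawn from the lexicon with `difflib.get_close_matches`. The in-vocabulary branch is omitted, and the function name `select_oov_sentences` is hypothetical.

```python
from collections import Counter
from difflib import get_close_matches


def select_oov_sentences(sentences, lexicon, oov_min=2, cutoff=0.8):
    """Keep sentences whose OOV word looks like a real term, not a typo."""
    tokenized = [s.split() for s in sentences]
    # Number of sentences containing each word.
    sentence_count = Counter(w for toks in tokenized for w in set(toks))
    # Counts of (history, word) pairs and of histories (one history word).
    pair_count, history_count = Counter(), Counter()
    for toks in tokenized:
        padded = ["<s>"] + toks
        for h, w in zip(padded, padded[1:]):
            pair_count[(h, w)] += 1
            history_count[h] += 1

    def likelihood(h, w):
        # Relative frequency of w after history h in the non-speech text.
        return pair_count[(h, w)] / history_count[h]

    candidates = sorted(lexicon)
    selected = []
    for sentence, toks in zip(sentences, tokenized):
        padded = ["<s>"] + toks
        for h, w in zip(padded, padded[1:]):
            if w in lexicon or sentence_count[w] < oov_min:
                continue
            first = likelihood(h, w)
            suggestions = get_close_matches(w, candidates, n=3, cutoff=cutoff)
            # Keep the sentence only if no suggestion is more likely here.
            if all(likelihood(h, s) < first for s in suggestions):
                selected.append(sentence)
                break
    return selected
```

Under these assumptions a frequent new term with no close in-lexicon spelling is kept, while a frequent misspelling is rejected because its corrected spelling is more likely after the same history word.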

6. A method for selecting, from non-speech text, training data for a language model for speech recognition, the method comprising:
- training, by a processor, a non-speech language model based on the non-speech text;
  
  for each unique sentence of the non-speech text:
  
  computing and normalizing, by the processor, an out-of-domain score of the unique sentence based on the non-speech language model;
  
  computing and normalizing, by the processor, an in-domain score of the unique sentence based on a speech transcription language model trained based on generic speech transcription training data;
  
  comparing, by the processor, the out-of-domain score to the in-domain score; and
  
  adding, by the processor, the unique sentence to an output set of selected text in response to determining that the in-domain score exceeds the out-of-domain score by a threshold; and
  
  outputting, by the processor, the output set of selected text.
  - 7. The method of claim 6, further comprising scaling a count of each unique sentence in the output set by P(s), where:
    - $P(s) = e^{\mathrm{IDScr}'}$,
      
      where $s$ is the unique sentence and where $\mathrm{IDScr}'$ is the in-domain score of the unique sentence.
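The score comparison of claim 6 and the count scaling of claim 7 can be sketched as follows. This is a minimal illustration: add-one-smoothed unigram models stand in for whatever language models an implementation would use, "normalizing" is assumed to mean dividing the log-probability by sentence length, and the names `UnigramLM` and `select_by_score` are hypothetical. Sentences are assumed to be non-empty.

```python
import math
from collections import Counter


class UnigramLM:
    """Add-one-smoothed unigram model (a stand-in for a real LM)."""

    def __init__(self, sentences):
        self.counts = Counter(w for s in sentences for w in s.split())
        self.total = sum(self.counts.values())
        self.vocab = len(self.counts) + 1  # one extra slot for unseen words

    def score(self, sentence):
        """Length-normalized log-probability of the sentence."""
        words = sentence.split()
        logp = sum(
            math.log((self.counts[w] + 1) / (self.total + self.vocab))
            for w in words
        )
        return logp / len(words)


def select_by_score(non_speech, speech_transcripts, threshold=0.0):
    """Return {sentence: scaled count} for sentences that look speech-like."""
    out_lm = UnigramLM(non_speech)         # out-of-domain model
    in_lm = UnigramLM(speech_transcripts)  # in-domain model
    selected = {}
    for sentence, count in Counter(non_speech).items():
        in_score = in_lm.score(sentence)
        out_score = out_lm.score(sentence)
        if in_score - out_score > threshold:
            # Scale the count by P(s) = e^{IDScr'}.
            selected[sentence] = count * math.exp(in_score)
    return selected
```

Because the normalized in-domain score is a negative log-probability average, the scaling factor lies between 0 and 1, so sentences that score better in-domain keep more of their original count.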

8. A method for selecting, from non-speech text, training data for a language model for speech recognition, the method comprising:
- initializing, by a processor, an output set of selected text based on a plurality of sentences sampled from the non-speech text;
  
  for each unique sentence of the non-speech text:
  
  computing, by the processor, a first divergence between an in-domain language model trained on generic speech transcript text and a language model trained on the output set;
  
  computing, by the processor, a second divergence between the in-domain language model and a language model trained on the output set combined with the unique sentence;
  
  comparing, by the processor, the first divergence and the second divergence; and
  
  adding, by the processor, the sentence to the output set in response to determining that the second divergence is less than the first divergence; and
  
  outputting, by the processor, the output set of selected text.
  - 9. The method of claim 8, wherein the computing the second divergence comprises calculating a cross-entropy of the in-domain language model and the language model trained on the output set.
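The greedy selection of claims 8 and 9 can be sketched with cross-entropy as the divergence. This is only an illustration under stated assumptions: unigram models with add-one smoothing replace real language models, the output set is seeded with the first `seed_size` unique sentences instead of a random sample so that the result is deterministic, and the function names are hypothetical.

```python
import math
from collections import Counter


def unigram_counts(sentences):
    """Word counts over a collection of sentences."""
    return Counter(w for s in sentences for w in s.split())


def cross_entropy(p_counts, q_counts, vocab):
    """H(p, q) with p unsmoothed and q add-one smoothed over vocab."""
    p_total = sum(p_counts.values())
    q_total = sum(q_counts.values())
    entropy = 0.0
    for w in vocab:
        p = p_counts[w] / p_total
        if p > 0:
            q = (q_counts[w] + 1) / (q_total + len(vocab))
            entropy -= p * math.log(q)
    return entropy


def greedy_select(non_speech, speech_transcripts, seed_size=1):
    """Add a sentence only if it moves the selected-set model closer."""
    in_counts = unigram_counts(speech_transcripts)
    unique = list(dict.fromkeys(non_speech))  # unique, order preserved
    vocab = set(in_counts) | set(unigram_counts(unique))
    selected = unique[:seed_size]
    selected_counts = unigram_counts(selected)
    for sentence in unique[seed_size:]:
        first = cross_entropy(in_counts, selected_counts, vocab)
        candidate = selected_counts + Counter(sentence.split())
        second = cross_entropy(in_counts, candidate, vocab)
        if second < first:
            selected.append(sentence)
            selected_counts = candidate
    return selected
```

With a fixed vocabulary, a sentence made of words that never occur in the transcripts only dilutes the selected-set model and raises the cross-entropy, so it is skipped.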

10. A system comprising:
- a processor;
  
  memory storing instructions that, when executed by the processor, cause the processor to:
  
  receive non-speech text from a context comprising communications with an enterprise, the communications comprising voice interactions and non-speech communications;
  
  select text from the non-speech text;
  
  convert the selected non-speech text to generate converted non-speech text comprising a plurality of phrases consistent with speech transcription text;
  
  customize a language model for the context using a converted non-speech text, the language model being customized to compute a probability that a given speech input phrase appears in voice interactions in the context of the communications with the enterprise; and
  
  output the language model.
  - 11. The system of claim 10, wherein the non-speech text comprises at least one from the group consisting of:
    - an email;
      
      a forum post;
      
      a transcript of a text chat interaction;
      
      or a text message.
  - 12. The system of claim 10, wherein the memory further stores instructions that, when executed by the processor, cause the processor to convert the selected non-speech text by:
    - removing metadata from the non-speech text;
      
      splitting the non-speech text into a plurality of sentences;
      
      converting one or more words of the sentences to spoken form;
      
      correcting one or more spelling errors in the sentences;
      
      identifying one or more duplicate sentences; and
      
      removing duplicate sentences.
  - 13. The system of claim 10, wherein the memory further stores instructions that, when executed by the processor, cause the processor to select the text by:
    - for each in-vocabulary word in a lexicon of in-vocabulary words, identifying one or more sentences containing the in-vocabulary word;
      
      counting the one or more sentences to identify a count of the in-vocabulary word in the non-speech text;
      
      comparing the count to a first threshold; and
      
      adding the identified one or more sentences containing the in-vocabulary word in response to determining that the count satisfies the first threshold;
      
      identifying one or more out-of-vocabulary words comprising words that are in the sentences and not in the lexicon;
      
      for each out-of-vocabulary word of the out-of-vocabulary words:
      
      identifying one or more sentences containing the out-of-vocabulary word;
      
      counting the one or more sentences to identify a count of the out-of-vocabulary word in the non-speech text;
      
      comparing the count to a second threshold;
      
      computing a first likelihood of encountering the out-of-vocabulary word in the sentence among all of the identified sentences;
      
      identifying one or more spelling suggestions for the out-of-vocabulary word;
      
      computing a plurality of second likelihoods, each of the second likelihoods corresponding to a second likelihood of encountering each of the spelling suggestions in the sentence;
      
      adding the identified sentences to an output set of selected text in response to determining that the count satisfies a threshold and that all of the second likelihoods are less than the first likelihood; and
      
      outputting the output set of selected text.
  - 14. The system of claim 13, wherein the computing the first likelihood comprises counting occurrences of the out-of-vocabulary word preceded by one or more history words in the non-speech text;
    - and wherein the computing one of the second likelihoods comprises counting occurrences of a corresponding spelling suggestion of the spelling suggestions preceded by the one or more history words in the non-speech text.
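Claim 10 recites a language model customized to compute the probability that a phrase appears in voice interactions. The claims do not fix a model type, so the following is only a minimal sketch that trains a small add-one-smoothed bigram model on already converted text; the class name `BigramLM` and its method are hypothetical.

```python
import math
from collections import Counter


class BigramLM:
    """Add-one-smoothed bigram model trained on converted text."""

    def __init__(self, sentences):
        self.bigrams = Counter()
        self.histories = Counter()
        vocab = {"</s>"}
        for sentence in sentences:
            tokens = ["<s>"] + sentence.split() + ["</s>"]
            vocab.update(tokens[1:])
            for h, w in zip(tokens, tokens[1:]):
                self.bigrams[(h, w)] += 1
                self.histories[h] += 1
        self.vocab_size = len(vocab) + 1  # one extra slot for unseen words

    def log_prob(self, phrase):
        """Log-probability of the phrase as a complete utterance."""
        tokens = ["<s>"] + phrase.split() + ["</s>"]
        return sum(
            math.log((self.bigrams[(h, w)] + 1)
                     / (self.histories[h] + self.vocab_size))
            for h, w in zip(tokens, tokens[1:])
        )
```

A phrase whose word order matches the training text receives a higher log-probability than the same words in an order the model has never seen.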

15. A system comprising:
- a processor; and
  
  memory storing instructions that, when executed by the processor, cause the processor to:
  
  train a non-speech language model based on the non-speech text;
  
  for each unique sentence of the non-speech text:
  
  compute and normalize an out-of-domain score of the unique sentence based on the non-speech language model;
  
  compute and normalize an in-domain score of the unique sentence based on a speech transcription language model trained based on generic speech transcription training data;
  
  compare the out-of-domain score to the in-domain score; and
  
  add the unique sentence to an output set of selected text in response to determining that the in-domain score exceeds the out-of-domain score by a threshold; and
  
  output the output set of selected text.
  - 16. The system of claim 15, wherein the memory further stores instructions that, when executed by the processor, cause the processor to scale a count of each unique sentence in the output set by P(s), where:
    - $P(s) = e^{\mathrm{IDScr}'}$,
      
      where $s$ is the unique sentence and where $\mathrm{IDScr}'$ is the in-domain score of the unique sentence.

17. A system comprising:
- a processor; and
  
  memory storing instructions that, when executed by the processor, cause the processor to:
  
  initialize an output set of selected text based on a plurality of sentences sampled from the non-speech text;
  
  for each unique sentence of the non-speech text:
  
  compute a first divergence between an in-domain language model trained on generic speech transcript text and a language model trained on the output set;
  
  compute a second divergence between the in-domain language model and a language model trained on the output set combined with the unique sentence;
  
  compare the first divergence and the second divergence; and
  
  add the sentence to the output set in response to determining that the second divergence is less than the first divergence; and
  
  output the output set of selected text.
  - 18. The system of claim 17, wherein the memory further stores instructions that, when executed by the processor, cause the processor to compute the second divergence by calculating a cross-entropy of the in-domain language model and the language model trained on the output set.

Current Assignee
Genesys Cloud Services Incorporated
Original Assignee
Genesys Telecommunications Laboratories Incorporated (Genesys Cloud Services Incorporated)
Inventors
Lev-Tov, Amir; Faizakof, Avraham; Tapuhi, Tamir; Konig, Yochai
Primary Examiner(s)
Riley, Marcus T

Application Number

US15/247,656
Publication Number

US 20170206891A1
Time in Patent Office

1,013 Days
Field of Search

None
US Class Current
CPC Class Codes

G06F 40/232   Orthographic correction, e....

G06N 20/00   Machine learning

G06N 3/006   based on simulated virtual ...

G10L 15/063   Training

G10L 15/183   using context dependencies,...

G10L 15/26   Speech to text systems G10L...

G10L 2015/0635   updating or merging of old ...

G10L 2015/0636   Threshold criteria for the ...

G10L 2015/088   Word spotting
