Material selection for language model customization in speech recognition for speech analytics
Abstract
A method for extracting, from non-speech text, training data for a language model for speech recognition includes: receiving, by a processor, non-speech text; selecting, by the processor, text from the non-speech text; converting, by the processor, the selected text to generate converted text comprising a plurality of phrases consistent with speech transcription text; training, by the processor, a language model using the converted text; and outputting, by the processor, the language model.
18 Claims
1. A method for customizing a language model for speech recognition in a context, the method comprising:
receiving, by a processor, non-speech text from the context, the context comprising communications with an enterprise, the communications comprising voice interactions and non-speech communications;
selecting, by the processor, text from the non-speech text;
converting, by the processor, the selected non-speech text to generate converted non-speech text comprising a plurality of phrases consistent with speech transcription text;
customizing, by the processor, a language model for the context using the converted non-speech text, the language model being customized to compute a probability that a given speech input phrase appears in voice interactions in the context of the communications with the enterprise; and
outputting, by the processor, the language model.
Dependent claims: 2, 3, 4, 5.
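As a rough illustration of the converting step in claim 1, the sketch below normalizes written text (for example, an email or chat message) into lowercase, punctuation-free phrases of the kind a speech transcript would contain. The claim does not specify any conversion rules; the function name `convert_to_spoken_form`, the digit table, and the digit-by-digit spelling are assumptions made only for this example.

```python
import re

# Hypothetical digit-to-word table; the claim does not specify conversion rules.
_DIGITS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def convert_to_spoken_form(text: str) -> list[str]:
    """Split written text into phrases and normalize each one so that it
    resembles speech transcription text (lowercase words, no punctuation)."""
    phrases = []
    # Treat sentence-ending punctuation and newlines as phrase boundaries.
    for sentence in re.split(r"[.!?\n]+", text):
        sentence = sentence.lower()
        # Spell out each ASCII digit individually (a deliberate simplification).
        sentence = re.sub(r"[0-9]", lambda m: f" {_DIGITS[m.group()]} ", sentence)
        # Replace everything except letters, apostrophes, and spaces.
        sentence = re.sub(r"[^a-z' ]+", " ", sentence)
        words = sentence.split()
        if words:
            phrases.append(" ".join(words))
    return phrases


converted = convert_to_spoken_form("Hello, my order #42 is late. Please call me!")
```

The resulting phrases could then be used as additional training text when customizing the language model; a production system would likely need richer rules for numbers, dates, abbreviations, and URLs.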
6. A method for selecting, from non-speech text, training data for a language model for speech recognition, the method comprising:
training, by a processor, a non-speech language model based on the non-speech text;
for each unique sentence of the non-speech text:
  computing and normalizing, by the processor, an out-of-domain score of the unique sentence based on the non-speech language model;
  computing and normalizing, by the processor, an in-domain score of the unique sentence based on a speech transcription language model trained based on generic speech transcription training data;
  comparing, by the processor, the out-of-domain score to the in-domain score; and
  adding, by the processor, the unique sentence to an output set of selected text in response to determining that the in-domain score exceeds the out-of-domain score by a threshold; and
outputting, by the processor, the output set of selected text.
Dependent claims: 7.
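The selection loop of claim 6 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes an add-one smoothed unigram model as a stand-in for whatever language model is actually used, and it assumes that "normalizing" means dividing a sentence's log-probability by its word count. The names `UnigramLM` and `select_sentences` are hypothetical.

```python
import math
from collections import Counter


class UnigramLM:
    """Add-one smoothed unigram model; a stand-in for any n-gram language model."""

    def __init__(self, sentences, vocab):
        self.counts = Counter(w for s in sentences for w in s.split())
        self.total = sum(self.counts.values())
        self.vocab_size = len(vocab) + 1  # +1 reserves mass for unseen words

    def score(self, sentence):
        """Length-normalized log-probability of the sentence (higher is better)."""
        words = sentence.split()
        logp = sum(
            math.log((self.counts[w] + 1) / (self.total + self.vocab_size))
            for w in words
        )
        return logp / max(len(words), 1)


def select_sentences(non_speech, generic_transcripts, threshold=0.0):
    """Keep each unique non-speech sentence whose in-domain score exceeds
    its out-of-domain score by more than `threshold`."""
    vocab = {w for s in list(non_speech) + list(generic_transcripts) for w in s.split()}
    out_lm = UnigramLM(non_speech, vocab)          # trained on the non-speech text
    in_lm = UnigramLM(generic_transcripts, vocab)  # trained on generic transcripts
    selected = []
    for sentence in dict.fromkeys(non_speech):     # unique sentences, order preserved
        if in_lm.score(sentence) - out_lm.score(sentence) > threshold:
            selected.append(sentence)
    return selected


non_speech = [
    "dear customer please find attached the invoice",
    "i want to check my order",
    "dear customer please find attached the invoice",
    "regards the billing team",
]
transcripts = [
    "hi i want to check my balance",
    "can you check my order please",
    "i want to cancel my order",
]
selected = select_sentences(non_speech, transcripts)
```

With this toy data, only the conversational sentence is kept, because it is the only one the transcript-trained model scores higher than the model trained on the non-speech text itself.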
8. A method for selecting, from non-speech text, training data for a language model for speech recognition, the method comprising:
initializing, by a processor, an output set of selected text based on a plurality of sentences sampled from the non-speech text;
for each unique sentence of the non-speech text:
  computing, by the processor, a first divergence between an in-domain language model trained on generic speech transcript text the unique sentence and a language model trained on the output set;
  computing, by the processor, a second divergence between the in-domain language model and a language model trained on the output set combined with the unique sentence;
  comparing, by the processor, the first divergence and the second divergence; and
  adding, by the processor, the sentence to the output set in response to determining that the second divergence is less than the first divergence; and
outputting, by the processor, the output set of selected text.
Dependent claims: 9.
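The greedy procedure of claim 8 can be sketched along the same lines. The claim does not name a particular divergence or model type; this example assumes Kullback-Leibler divergence between add-one smoothed unigram distributions, and it seeds the output set with the first `seed_size` unique sentences rather than a random sample. The names `unigram_dist`, `kl_divergence`, and `greedy_select` are hypothetical.

```python
import math
from collections import Counter


def unigram_dist(sentences, vocab):
    """Add-one smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values()) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def greedy_select(non_speech, generic_transcripts, seed_size=1):
    """Add a sentence to the output set only if doing so moves the output-set
    model closer to the in-domain (generic transcript) model."""
    unique = list(dict.fromkeys(non_speech))  # unique sentences, order preserved
    vocab = {w for s in unique + list(generic_transcripts) for w in s.split()}
    in_domain = unigram_dist(generic_transcripts, vocab)
    output = unique[:seed_size]  # assumption: seed with the first sentences
    for sentence in unique[seed_size:]:
        first = kl_divergence(in_domain, unigram_dist(output, vocab))
        second = kl_divergence(in_domain, unigram_dist(output + [sentence], vocab))
        if second < first:
            output.append(sentence)
    return output


non_speech = [
    "thank you for contacting support",
    "i want to check my order",
    "regards the billing team",
    "can you check my balance",
]
transcripts = [
    "hi i want to check my balance",
    "can you check my order please",
    "i want to cancel my order",
]
selected = greedy_select(non_speech, transcripts)
```

Retraining a model for every candidate sentence is expensive at scale; a practical system would presumably update counts incrementally, which this sketch does not attempt.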
10. A system comprising:
a processor; and
memory storing instructions that, when executed by the processor, cause the processor to:
  receive non-speech text from a context comprising communications with an enterprise, the communications comprising voice interactions and non-speech communications;
  select text from the non-speech text;
  convert the selected non-speech text to generate converted non-speech text comprising a plurality of phrases consistent with speech transcription text;
  customize a language model for the context using the converted non-speech text, the language model being customized to compute a probability that a given speech input phrase appears in voice interactions in the context of the communications with the enterprise; and
  output the language model.
Dependent claims: 11, 12, 13, 14.
15. A system comprising:
a processor; and
memory storing instructions that, when executed by the processor, cause the processor to:
  train a non-speech language model based on the non-speech text;
  for each unique sentence of the non-speech text:
    compute and normalize an out-of-domain score of the unique sentence based on the non-speech language model;
    compute and normalize an in-domain score of the unique sentence based on a speech transcription language model trained based on generic speech transcription training data;
    compare the out-of-domain score to the in-domain score; and
    add the unique sentence to an output set of selected text in response to determining that the in-domain score exceeds the out-of-domain score by a threshold; and
  output the output set of selected text.
Dependent claims: 16.
17. A system comprising:
a processor; and
memory storing instructions that, when executed by the processor, cause the processor to:
  initialize an output set of selected text based on a plurality of sentences sampled from the non-speech text;
  for each unique sentence of the non-speech text:
    compute a first divergence between an in-domain language model trained on generic speech transcript text the unique sentence and a language model trained on the output set;
    compute a second divergence between the in-domain language model and a language model trained on the output set combined with the unique sentence;
    compare the first divergence and the second divergence; and
    add the sentence to the output set in response to determining that the second divergence is less than the first divergence; and
  output the output set of selected text.
Dependent claims: 18.
Specification