Training speech recognition systems using word sequences
First Claim
1. A system comprising:
one or more non-transitory computer readable media configured to store one or more instructions;
one or more processors communicatively coupled to the one or more non-transitory computer readable media, the one or more processors configured to execute the one or more instructions to cause or direct the system to perform operations, the operations comprising:
obtain first audio data of a communication session between two or more devices;
obtain a text string that is a transcription of the first audio data;
select a sequence of words from the text string as a first word sequence;
compare the first word sequence to a plurality of word sequences, each of the plurality of word sequences associated with a corresponding one of a plurality of counters;
in response to the first word sequence corresponding to one of the plurality of word sequences based on the comparison, increment a counter of the plurality of counters associated with the one of the plurality of word sequences; and
train a language model of an automatic transcription system using the plurality of word sequences and the plurality of counters.
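The select, compare, and increment operations recited above can be illustrated with a minimal Python sketch. It assumes a fixed sequence length of three words, lowercase whitespace tokenization, and exact matching as the comparison; the claim does not specify any of these choices, and the names `select_word_sequences` and `update_counters` are hypothetical.

```python
from typing import Dict, List, Tuple

# A word sequence is modeled as a tuple of words (an assumption of this sketch).
WordSequence = Tuple[str, ...]


def select_word_sequences(text: str, n: int = 3) -> List[WordSequence]:
    """Return every contiguous run of n words found in the text string."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def update_counters(text: str, counters: Dict[WordSequence, int], n: int = 3) -> int:
    """Increment the counter of each stored sequence that occurs in the text.

    Sequences that are not already keys of `counters` are ignored.
    Returns the number of increments performed.
    """
    increments = 0
    for sequence in select_word_sequences(text, n):
        if sequence in counters:  # exact match stands in for "corresponding"
            counters[sequence] += 1
            increments += 1
    return increments


# The plurality of word sequences, each associated with its own counter.
counters: Dict[WordSequence, int] = {
    ("how", "are", "you"): 0,
    ("see", "you", "later"): 0,
}
hits = update_counters("Hi how are you see you later", counters)
```

In this sketch the table of word sequences is fixed in advance, so a transcription can only raise existing counters and never adds new entries.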
Abstract
A method may include obtaining first audio data of a communication session between a first device and a second device, obtaining a text string that is a transcription of the first audio data, and selecting a contiguous sequence of words from the text string as a first word sequence. The method may further include comparing the first word sequence to multiple word sequences obtained before the communication session and, in response to the first word sequence corresponding to one of the multiple word sequences, incrementing a counter of multiple counters associated with the one of the multiple word sequences. The method may also include deleting the text string and the first word sequence and, after deleting the text string and the first word sequence, training a language model of an automatic transcription system using the multiple word sequences and the multiple counters.
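The ordering described in the abstract, where counters are updated, the text is discarded, and training then uses only the stored sequences and counters, can be sketched as follows. This is an illustration under stated assumptions: two-word sequences, exact matching, and a relative-frequency estimate as the "training" step. The abstract does not say how the language model is trained, and the names `count_and_discard` and `train_from_counts` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple

WordSequence = Tuple[str, ...]


def count_and_discard(transcript: str, counters: Dict[WordSequence, int], n: int = 2) -> None:
    """Update counters from a transcript without keeping the transcript.

    Only `counters` is modified; the text and the selected sequences are
    local to this call and are neither stored nor returned.
    """
    words = transcript.lower().split()
    for i in range(len(words) - n + 1):
        sequence = tuple(words[i:i + n])
        if sequence in counters:
            counters[sequence] += 1


def train_from_counts(counters: Dict[WordSequence, int]) -> Dict[WordSequence, float]:
    """Estimate P(last word | preceding words) from the counters alone.

    Relative-frequency estimation is an assumption of this sketch.
    """
    prefix_totals: Dict[WordSequence, int] = defaultdict(int)
    for sequence, count in counters.items():
        prefix_totals[sequence[:-1]] += count
    model: Dict[WordSequence, float] = {}
    for sequence, count in counters.items():
        total = prefix_totals[sequence[:-1]]
        if total > 0:  # skip prefixes that were never observed
            model[sequence] = count / total
    return model


# Word sequences obtained before the communication session, counters at zero.
counters: Dict[WordSequence, int] = {
    ("thank", "you"): 0,
    ("thank", "goodness"): 0,
    ("you", "very"): 0,
}
count_and_discard("thank you very much thank you thank goodness", counters)
model = train_from_counts(counters)
```

Because `train_from_counts` takes only the counter table as input, the training step in this sketch has no access to the original transcription.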
304 Citations
20 Claims
1. A system comprising:
one or more non-transitory computer readable media configured to store one or more instructions;
one or more processors communicatively coupled to the one or more non-transitory computer readable media, the one or more processors configured to execute the one or more instructions to cause or direct the system to perform operations, the operations comprising:
obtain first audio data of a communication session between two or more devices;
obtain a text string that is a transcription of the first audio data;
select a sequence of words from the text string as a first word sequence;
compare the first word sequence to a plurality of word sequences, each of the plurality of word sequences associated with a corresponding one of a plurality of counters;
in response to the first word sequence corresponding to one of the plurality of word sequences based on the comparison, increment a counter of the plurality of counters associated with the one of the plurality of word sequences; and
train a language model of an automatic transcription system using the plurality of word sequences and the plurality of counters.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A method comprising:
obtaining first audio data of a communication session between two or more devices;
obtaining a text string that is a transcription of the first audio data;
selecting a sequence of words from the text string as a first word sequence;
comparing the first word sequence to a plurality of word sequences, each of the plurality of word sequences associated with a corresponding one of a plurality of counters;
in response to the first word sequence corresponding to one of the plurality of word sequences based on the comparison, incrementing a counter of the plurality of counters associated with the one of the plurality of word sequences; and
training a language model of an automatic transcription system using the plurality of word sequences and the plurality of counters.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18, 19, 20
Specification