Data shredding for speech recognition language model training under data retention restrictions
Abstract
Training speech recognizers, e.g., their language or acoustic models, using actual user data is useful, but retaining personally identifiable information may be restricted in certain environments due to regulations. Accordingly, a method or system is provided for enabling training of a language model which includes producing segments of text in a text corpus and counts corresponding to the segments of text, the text corpus being in a depersonalized state. The method further includes enabling a system to train a language model using the segments of text in the depersonalized state and the counts. Because the data is depersonalized, actual data may be used, enabling speech recognizers to keep up-to-date with user trends in speech and usage, among other benefits.
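The "segments of text and counts" described above can be illustrated with a small sketch. It assumes that a segment is a word n-gram and that the corpus is a list of already depersonalized utterances; the function name `shred`, the parameter `max_order`, and the sample utterances are hypothetical and do not come from the patent text.

```python
from collections import Counter


def shred(utterances, max_order=3):
    """Split each utterance into word n-grams (orders 1..max_order) and count them.

    Assumption: a "segment" is a word n-gram; the source only says "segments of text".
    """
    counts = Counter()
    for text in utterances:
        words = text.split()
        for n in range(1, max_order + 1):
            # Slide a window of n words across the utterance.
            for i in range(len(words) - n + 1):
                counts[tuple(words[i:i + n])] += 1
    return counts


# Hypothetical sample corpus, for illustration only.
corpus = ["call john smith", "call home", "call john at home"]
counts = shred(corpus, max_order=2)
```

Once only the segments and their counts are kept, the full utterances no longer need to be stored, which is the property the abstract relies on.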
22 Claims
1. A method for training a language model of an automatic speech recognition system, the method comprising:
producing segments of text in a text corpus and counts corresponding to the segments of text, the text corpus being in a depersonalized state, the producing including dynamically shredding the text corpus into the segments of text in the depersonalized state;
further depersonalizing the segments of text based on the corresponding counts, each count representing a number of occurrences of a respective segment of text in the text corpus; and
enabling an automatic speech recognition system to train a language model using the segments of text in the depersonalized state and the counts.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
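As a rough illustration of the last two steps of claim 1, the sketch below reads "further depersonalizing the segments of text based on the corresponding counts" as dropping segments that occur fewer than a threshold number of times, and reads "train a language model" as maximum-likelihood bigram estimation from the remaining counts. Both readings are assumptions, since the claim does not specify either mechanism; `filter_rare`, `bigram_probability`, `min_count`, and the sample counts are hypothetical.

```python
from collections import Counter


def filter_rare(counts, min_count=2):
    """Keep only segments seen at least min_count times.

    Assumption: rare segments are more likely to be identifying, so they are dropped.
    """
    return Counter({seg: c for seg, c in counts.items() if c >= min_count})


def bigram_probability(counts, prev, word):
    """Maximum-likelihood estimate of P(word | prev) from unigram and bigram counts."""
    denom = counts.get((prev,), 0)
    if denom == 0:
        return 0.0
    return counts.get((prev, word), 0) / denom


# Hypothetical segment counts, for illustration only.
counts = Counter({
    ("call",): 3,
    ("john",): 2,
    ("smith",): 1,
    ("call", "john"): 2,
    ("john", "smith"): 1,
    ("call", "home"): 1,
})
kept = filter_rare(counts, min_count=2)
p = bigram_probability(kept, "call", "john")
```

With these sample counts, the surname segment falls below the threshold and is removed, while the frequent segment "call john" survives and still contributes to the estimate.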
15. A system for training a language model of an automated speech recognition system, the system comprising:
at least one processor configured to implement:
a segmentation module configured to produce segments of text in a text corpus and counts corresponding to the segments of text, the text corpus being in a depersonalized state, the segments of text produced by dynamically shredding the text corpus into the segments of text in the depersonalized state;
a depersonalization module configured to further depersonalize the segments of text based on the corresponding counts, each count representing a number of occurrences of a respective segment of text in the text corpus; and
an enabling module configured to enable an automated speech recognition system to train a language model using the segments of text in the depersonalized state and the counts.
Dependent claims: 16, 17, 18, 19, 20, 21
22. A computer program product comprising a non-transitory computer-readable medium storing instructions for performing a method for training a language model of an automatic speech recognition system, the instructions, when loaded and executed by a processor, cause the processor to:
produce segments of text in a text corpus and counts corresponding to the segments of text, the text corpus being in a depersonalized state, the segments of text produced by dynamically shredding the text corpus into the segments of text in the depersonalized state;
further depersonalize the segments of text based on the corresponding counts, each count representing a number of occurrences of a respective segment of text in the text corpus; and
enable an automated speech recognition system to train a language model using the segments of text in the depersonalized state and the counts.
Specification