Data shredding for speech recognition acoustic model training under data retention restrictions
Abstract
Training speech recognizers, e.g., their language or acoustic models, using actual user data is useful, but retaining personally identifiable information may be restricted in certain environments due to regulations. Accordingly, a method or system is provided for enabling training of an acoustic model which includes dynamically shredding a speech corpus to produce text segments and depersonalized audio features corresponding to the text segments. The method further includes enabling a system to train an acoustic model using the text segments and the depersonalized audio features. Because the data is depersonalized, actual data may be used, enabling speech recognizers to keep up-to-date with user trends in speech and usage, among other benefits.
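The pipeline the abstract describes — split each message into strips pairing a text segment with depersonalized audio features, then randomize strip order so utterances cannot be trivially reassembled — can be sketched as follows. This is a minimal illustration, not the patent's implementation: per-segment mean subtraction stands in for the unspecified removal of speaker vocal characteristics, and all names are assumed.

```python
import random

def depersonalize(frames):
    """Crude stand-in for removing speaker vocal characteristics:
    subtract the per-segment mean from the feature frames."""
    mean = sum(frames) / len(frames)
    return [f - mean for f in frames]

def shred_and_mix(messages, seed=0):
    """Split messages into (text segment, depersonalized features)
    strips and return them in randomized order."""
    strips = [(text, depersonalize(frames))
              for segments in messages      # one message = aligned segments
              for text, frames in segments]
    random.Random(seed).shuffle(strips)     # break speaker-level ordering
    return strips
```

An acoustic-model trainer would then consume the mixed strips without ever seeing a complete, speaker-attributable message.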
Claims
1. A method of enabling training of an acoustic model, the method comprising:
dynamically shredding a speech corpus to produce text segments and depersonalized audio features corresponding to the text segments, the depersonalized audio features including filtered audio data remaining after speaker vocal characteristics and other audio characteristics have been removed, the speech corpus comprising a plurality of messages that each contain audio and corresponding text content, the shredding splitting each of the plurality of messages into strips, each strip comprising text segments and corresponding depersonalized audio features;

mixing up the strips of the text segments and corresponding depersonalized audio features to produce strips mixed up in randomized order; and

enabling a system to train an acoustic model using the strips mixed up in randomized order.

Dependent claims: 2-13.
14. A system for enabling training of an acoustic model, the system comprising:
a shredding module configured to shred a speech corpus dynamically to produce text segments and depersonalized audio features corresponding to the text segments, the depersonalized audio features including filtered audio data remaining after speaker vocal characteristics and other audio characteristics have been removed, the speech corpus comprising a plurality of messages that each contain audio and corresponding text content, the shredding splitting each of the plurality of messages into strips, each strip comprising text segments and corresponding depersonalized audio features;

the shredding module further configured to mix up the strips of the text segments and corresponding depersonalized audio features to produce strips mixed up in randomized order; and

an enabling module configured to enable a system to train an acoustic model using the strips mixed up in randomized order.

Dependent claims: 15-19.
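The module structure of claim 14 can be sketched in the same spirit. The class names, the mean-subtraction depersonalization, and the trainer callback are all assumptions for illustration, not the claimed system's actual design.

```python
import random

class ShreddingModule:
    """Shreds a speech corpus into depersonalized strips and mixes
    them up (hypothetical API mirroring claim 14)."""
    def __init__(self, seed=0):
        self._rng = random.Random(seed)

    def _depersonalize(self, frames):
        # Stand-in for removing speaker vocal characteristics.
        mean = sum(frames) / len(frames)
        return [f - mean for f in frames]

    def shred_and_mix(self, messages):
        strips = [(text, self._depersonalize(frames))
                  for segments in messages
                  for text, frames in segments]
        self._rng.shuffle(strips)
        return strips

class EnablingModule:
    """Hands the mixed strips to an acoustic-model trainer callback."""
    def enable_training(self, trainer, strips):
        return trainer(strips)
```

Splitting shredding and enabling into separate modules mirrors the claim's division of labor: the trainer only ever receives anonymized, reordered strips.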
20. A computer program product comprising a non-transitory computer-readable medium storing instructions for performing a method of enabling training of an acoustic model, wherein the instructions, when loaded and executed by a processor, cause the processor to:
dynamically shred a speech corpus to produce text segments and depersonalized audio features corresponding to the text segments, the depersonalized audio features including filtered audio data remaining after speaker vocal characteristics and other audio characteristics have been removed, the speech corpus comprising a plurality of messages that each contain audio and corresponding text content, the shredding splitting each of the plurality of messages into strips, each strip comprising text segments and corresponding depersonalized audio features;

mix up the strips of the text segments and corresponding depersonalized audio features to produce strips mixed up in randomized order; and

enable a system to train an acoustic model using the strips mixed up in randomized order.