SAMPLE CLUSTERING TO REDUCE MANUAL TRANSCRIPTIONS IN SPEECH RECOGNITION SYSTEM
2 Assignments
0 Petitions
Abstract
Techniques are described for grouping a plurality of samples that were automatically transcribed from a plurality of utterances. In one method, clusters are formed from the plurality of samples, each cluster including two or more of the samples. One or more samples are selected from a cluster, and manually-processed data samples for the selected samples are obtained. A weighting factor may be assigned to each data sample based, at least in part, on the number of samples in the cluster associated with that data sample.
194 Citations
26 Claims
1. A method of processing a plurality of training samples for an automatic speech recognition (ASR) application, the method comprising acts of:
forming at least one cluster from the plurality of training samples, the at least one cluster including a number of the plurality of training samples, wherein the number equals two or more;
selecting at least one training sample from the at least one cluster;
obtaining at least one manually-processed data sample resulting from manual processing of the selected at least one training sample in the at least one cluster; and
assigning, to the at least one manually-processed data sample, a weighting factor based, at least in part, on the number of training samples in the cluster associated with the selected at least one manually-processed data sample.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
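The acts of claim 1 can be illustrated with a minimal Python sketch. The claim does not fix a clustering criterion, so this sketch assumes, purely for illustration, that samples belong to the same cluster when their normalized automatic transcripts are identical. The names `TrainingSample`, `form_clusters`, `process_training_samples`, and the `transcribe_manually` callback (which stands in for the human transcriber) are hypothetical. The weighting factor is taken to be the cluster size, which is one possible reading of "based, at least in part, on the number of training samples in the cluster".

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class TrainingSample:
    audio_id: str         # hypothetical identifier for the recorded utterance
    auto_transcript: str  # text produced by the ASR system


@dataclass
class WeightedTranscription:
    audio_id: str           # utterance that was sent for manual transcription
    manual_transcript: str  # result of manual processing
    weight: int             # weighting factor: size of the source cluster


def normalize(text: str) -> str:
    """Assumed clustering key: lowercase and collapse whitespace."""
    return " ".join(text.lower().split())


def form_clusters(samples: List[TrainingSample]) -> List[List[TrainingSample]]:
    """Group samples with identical normalized transcripts and keep only
    clusters of two or more samples, as claim 1 requires."""
    groups: Dict[str, List[TrainingSample]] = defaultdict(list)
    for sample in samples:
        groups[normalize(sample.auto_transcript)].append(sample)
    return [group for group in groups.values() if len(group) >= 2]


def process_training_samples(
    samples: List[TrainingSample],
    transcribe_manually: Callable[[TrainingSample], str],
) -> List[WeightedTranscription]:
    results: List[WeightedTranscription] = []
    for cluster in form_clusters(samples):
        # Select one training sample; taking the first member is an assumption.
        selected = cluster[0]
        # Obtain the manually-processed data sample for the selected sample.
        manual_text = transcribe_manually(selected)
        # Weight the manual transcription by the number of samples in its cluster.
        results.append(
            WeightedTranscription(selected.audio_id, manual_text, len(cluster))
        )
    return results


if __name__ == "__main__":
    samples = [
        TrainingSample("a1", "Check my balance"),
        TrainingSample("a2", "check  my balance"),
        TrainingSample("a3", "CHECK MY BALANCE"),
        TrainingSample("a4", "pay my bill"),
        TrainingSample("a5", "Pay my bill"),
        TrainingSample("a6", "talk to an agent"),
    ]
    # Stand-in for a human transcriber (hypothetical): returns lowercased ASR text.
    for result in process_training_samples(samples, lambda s: s.auto_transcript.lower()):
        print(result)
```

Run as a script, this prints two entries: one for `a1` with weight 3 and one for `a4` with weight 2. The singleton `a6` is not part of any cluster of two or more and is simply left out of this sketch. One manual transcription thus stands in for three utterances, and a trainer that honors the weight can count it three times; how the weight is consumed during training is not specified by the claim.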
16. At least one non-transitory computer readable storage medium encoded with a plurality of instructions that, when executed by a computer, perform a method of processing a plurality of training samples for an automatic speech recognition (ASR) application, the method comprising acts of:
forming at least one cluster from the plurality of training samples, the at least one cluster including a number of the plurality of training samples, wherein the number equals two or more;
selecting at least one training sample from the at least one cluster;
obtaining at least one manually-processed data sample resulting from manual processing of the selected at least one training sample in the at least one cluster; and
assigning, to the at least one manually-processed data sample, a weighting factor based, at least in part, on the number of training samples in the cluster associated with the selected at least one manually-processed data sample.
Dependent claims: 17
18. A computer system, comprising:
at least one storage device configured to store a plurality of instructions; and
at least one processor programmed to execute the plurality of instructions to perform a method comprising acts of:
forming at least one cluster from the plurality of training samples, the at least one cluster including a number of the plurality of training samples, wherein the number equals two or more;
selecting at least one training sample from the at least one cluster;
obtaining at least one manually-processed data sample resulting from manual processing of the selected at least one training sample in the at least one cluster; and
assigning, to the at least one manually-processed data sample, a weighting factor based, at least in part, on the number of training samples in the cluster associated with the selected at least one manually-processed data sample.
Dependent claims: 19, 20
21. A method for updating a grammar using a plurality of data samples, the method comprising:
forming, with at least one processor, a cluster including at least two data samples of the plurality of data samples based, at least in part, on a similarity between the at least two data samples;
selecting at least one data sample from the cluster;
determining whether the at least one data sample is covered by the grammar; and
updating the grammar based, at least in part, on the at least one data sample, when it is determined that the at least one data sample is not covered by the grammar.
Dependent claims: 22, 23, 24, 25, 26
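The grammar-update method of claim 21 can be sketched the same way, under several simplifying assumptions that are not taken from the claim. The grammar is modeled as a set of accepted lowercase phrases (production ASR grammars are usually rule-based, for example SRGS or JSGF). Similarity is measured with the standard-library `difflib.SequenceMatcher` ratio, and clusters are built by a simple greedy leader-clustering pass against each cluster's first member with a hypothetical threshold of 0.8. "Updating the grammar" here means adding the uncovered phrase. All function names are illustrative.

```python
from difflib import SequenceMatcher
from typing import List, Set


def similarity(a: str, b: str) -> float:
    """Assumed similarity measure: case-insensitive SequenceMatcher ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def form_clusters(samples: List[str], threshold: float = 0.8) -> List[List[str]]:
    """Greedy leader clustering: join the first cluster whose leader is
    similar enough, otherwise start a new cluster. Only clusters with at
    least two data samples are returned, as claim 21 requires."""
    clusters: List[List[str]] = []
    for sample in samples:
        for cluster in clusters:
            if similarity(sample, cluster[0]) >= threshold:
                cluster.append(sample)
                break
        else:
            clusters.append([sample])
    return [cluster for cluster in clusters if len(cluster) >= 2]


def is_covered(grammar: Set[str], sample: str) -> bool:
    """Assumed coverage test: exact match against the phrase set."""
    return sample.lower() in grammar


def update_grammar(grammar: Set[str], samples: List[str]) -> Set[str]:
    """Return an updated copy of the grammar; the input set is not modified."""
    updated = set(grammar)
    for cluster in form_clusters(samples):
        selected = cluster[0]  # select one data sample (first member, an assumption)
        if not is_covered(updated, selected):
            updated.add(selected.lower())  # extend the grammar with the new phrase
    return updated


if __name__ == "__main__":
    grammar = {"check my balance"}
    samples = [
        "check my balance",
        "check my balances",
        "transfer funds",
        "transfer fund",
        "goodbye",
    ]
    print(sorted(update_grammar(grammar, samples)))
```

With these inputs the two near-duplicate pairs form clusters, "goodbye" remains a singleton and is skipped, and only "transfer funds" is added, because "check my balance" is already covered. Comparing against a cluster leader keeps the pass linear in the number of clusters per sample; a real system would likely use a stronger clustering method and a grammar parser for the coverage test.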
Specification