Extended recognition dictionary learning device and speech recognition system
First Claim
1. An extended recognition dictionary learning device comprising:
an utterance variation data calculating section configured to compare an acoustic model sequence obtained from a result of speech recognition for each of a plurality of speakers and a correct acoustic model sequence to calculate a correspondence between the models as utterance variation data;
an utterance variation data classifying section configured to classify the calculated utterance variation data into widely appearing utterance variations and unevenly appearing utterance variations, the widely appearing utterance variations appearing independently of speakers in the calculated utterance variation data, and the unevenly appearing utterance variations appearing dependently of speakers in the calculated utterance variation data; and
a recognition dictionary extending section configured to define a plurality of utterance variation sets by combining the classified utterance variations and to generate a plurality of extended recognition dictionaries corresponding to the plurality of utterance variation sets by extending a recognition dictionary for each utterance variation set according to the utterance variations included in each utterance variation set, wherein the plurality of utterance variation sets comprise:
a common utterance variation set that consists of only widely appearing utterance variations; and
utterance variation sets each of which is generated by combining widely appearing utterance variations and unevenly appearing utterance variations.
Abstract
Speech recognition of even a speaker who uses a speech recognition system is enabled by using an extended recognition dictionary suited to the speaker without requiring any previous learning using an utterance label corresponding to the speech of the speaker. An extended recognition dictionary learning device includes an utterance variation data calculating section for comparing an acoustic model sequence output from a speech recognition result and an input correct acoustic model sequence to calculate a correspondence between the models as utterance variation data; an utterance variation data classifying section for classifying the calculated utterance variation data into widely appearing utterance variations and unevenly appearing utterance variations; and a recognition dictionary extending section for defining a plurality of utterance variation sets by combining the classified utterance variations and thereby extending the recognition dictionary for each utterance variation set according to the utterance variations included in each utterance variation set. A speech recognition device uses the extended recognition dictionary for each utterance variation set to output a speech recognition result.
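The comparison step described in the abstract can be illustrated with a minimal sketch. It assumes that acoustic model sequences are lists of phoneme-like labels and that the "correspondence between the models" is a count of substituted label pairs. The function name `utterance_variations` and the use of Python's `difflib.SequenceMatcher` for alignment are illustrative assumptions, not the method disclosed in the patent.

```python
from collections import Counter
from difflib import SequenceMatcher


def utterance_variations(correct, recognized):
    """Count (correct_model, recognized_model) substitution pairs.

    Hypothetical helper: aligns the two label sequences and records only
    equal-length replacements; insertions and deletions are ignored here.
    """
    variations = Counter()
    matcher = SequenceMatcher(a=correct, b=recognized, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Keep only aligned substitutions of the same length.
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            for c, r in zip(correct[i1:i2], recognized[j1:j2]):
                variations[(c, r)] += 1
    return variations


# Example with made-up labels: one model ("ch") was recognized as "t".
correct = ["k", "o", "N", "n", "i", "ch", "i", "w", "a"]
recognized = ["k", "o", "N", "n", "i", "t", "i", "w", "a"]
print(utterance_variations(correct, recognized))
```

Running this per speaker would yield one collection of variation pairs per speaker, which is the kind of input the classifying section operates on.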
27 Citations
11 Claims
1. An extended recognition dictionary learning device comprising:
an utterance variation data calculating section configured to compare an acoustic model sequence obtained from a result of speech recognition for each of a plurality of speakers and a correct acoustic model sequence to calculate a correspondence between the models as utterance variation data;
an utterance variation data classifying section configured to classify the calculated utterance variation data into widely appearing utterance variations and unevenly appearing utterance variations, the widely appearing utterance variations appearing independently of speakers in the calculated utterance variation data, and the unevenly appearing utterance variations appearing dependently of speakers in the calculated utterance variation data; and
a recognition dictionary extending section configured to define a plurality of utterance variation sets by combining the classified utterance variations and to generate a plurality of extended recognition dictionaries corresponding to the plurality of utterance variation sets by extending a recognition dictionary for each utterance variation set according to the utterance variations included in each utterance variation set, wherein the plurality of utterance variation sets comprise:
a common utterance variation set that consists of only widely appearing utterance variations; and
utterance variation sets each of which is generated by combining widely appearing utterance variations and unevenly appearing utterance variations.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
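The classification into widely and unevenly appearing utterance variations can be sketched as follows. The claim only states that widely appearing variations appear independently of speakers and unevenly appearing ones appear dependently of speakers. The speaker-coverage threshold `wide_ratio` and the function name `classify_variations` are assumptions made for illustration and are not taken from the patent.

```python
from collections import Counter


def classify_variations(per_speaker, wide_ratio=0.8):
    """Split variations into (widely, unevenly) appearing sets.

    per_speaker maps a speaker id to the variations observed for that
    speaker. A variation is treated as widely appearing when it occurs
    for at least `wide_ratio` of all speakers (an assumed criterion).
    """
    n_speakers = len(per_speaker)
    speaker_count = Counter()
    for variations in per_speaker.values():
        # Count each variation at most once per speaker.
        speaker_count.update(set(variations))
    widely = {v for v, n in speaker_count.items()
              if n / n_speakers >= wide_ratio}
    unevenly = set(speaker_count) - widely
    return widely, unevenly


# Made-up data: ("ch", "t") occurs for every speaker, the rest for one each.
per_speaker = {
    "s1": {("ch", "t"), ("a", "o")},
    "s2": {("ch", "t")},
    "s3": {("ch", "t"), ("r", "l")},
}
widely, unevenly = classify_variations(per_speaker)
```

Other criteria, such as a statistical measure of how concentrated a variation is among speakers, could replace the simple ratio without changing the interface.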
10. An extended recognition dictionary learning method, comprising:
a step of comparing an acoustic model sequence obtained from a result of speech recognition for each of a plurality of speakers and a correct acoustic model sequence to calculate a correspondence between the models as utterance variation data;
a step of classifying the calculated utterance variation data into widely appearing utterance variations and unevenly appearing utterance variations, the widely appearing utterance variations appearing independently of speakers in the calculated utterance variation data, and the unevenly appearing utterance variations appearing dependently of speakers in the calculated utterance variation data; and
a step of defining a plurality of utterance variation sets by combining the classified utterance variations and generating a plurality of extended recognition dictionaries corresponding to the plurality of utterance variation sets by extending a recognition dictionary for each utterance variation set according to the utterance variations included in each utterance variation set, wherein the plurality of utterance variation sets comprise:
a common utterance variation set that consists of only widely appearing utterance variations; and
utterance variation sets each of which is generated by combining widely appearing utterance variations and unevenly appearing utterance variations.
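The last step of the method, defining utterance variation sets and extending a recognition dictionary for each set, can be sketched as below. The sketch assumes a dictionary that maps each word to a list of pronunciations (tuples of model labels), assumes the unevenly appearing variations have already been grouped, and adds only single-substitution variants. The names `build_variation_sets`, `extend_dictionary` and `group_a` are hypothetical.

```python
def build_variation_sets(widely, uneven_groups):
    """Return a common set (widely appearing variations only) plus one
    set per group that combines widely and unevenly appearing ones."""
    sets = {"common": set(widely)}
    for name, uneven in uneven_groups.items():
        sets[name] = set(widely) | set(uneven)
    return sets


def extend_dictionary(dictionary, variations):
    """Add pronunciation variants implied by the given variations.

    For each (correct, observed) pair, every position where `correct`
    occurs in a base pronunciation yields one variant with that single
    position replaced by `observed`.
    """
    extended = {}
    for word, pronunciations in dictionary.items():
        variants = list(pronunciations)
        for pron in pronunciations:
            for correct, observed in sorted(variations):
                for i, model in enumerate(pron):
                    if model == correct:
                        candidate = pron[:i] + (observed,) + pron[i + 1:]
                        if candidate not in variants:
                            variants.append(candidate)
        extended[word] = variants
    return extended


# Made-up data: one extended dictionary per utterance variation set.
base = {"hello": [("h", "e", "l", "o")]}
sets = build_variation_sets({("e", "a")}, {"group_a": {("l", "r")}})
dictionaries = {name: extend_dictionary(base, vs) for name, vs in sets.items()}
```

Here `dictionaries["common"]` contains only variants from widely appearing variations, while `dictionaries["group_a"]` also contains the variant from the unevenly appearing pair.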
11. A non-transitory storage medium having recorded thereon an extended recognition dictionary learning program which, when executed by a computer, causes the computer to execute:
a step of comparing an acoustic model sequence obtained from a result of speech recognition for each of a plurality of speakers and a correct acoustic model sequence to calculate a correspondence between the models as utterance variation data;
a step of classifying the calculated utterance variation data into widely appearing utterance variations and unevenly appearing utterance variations, the widely appearing utterance variations appearing independently of speakers in the calculated utterance variation data, and the unevenly appearing utterance variations appearing dependently of speakers in the calculated utterance variation data; and
a step of defining a plurality of utterance variation sets by combining the classified utterance variations and generating a plurality of extended recognition dictionaries corresponding to the plurality of utterance variation sets by extending a recognition dictionary for each utterance variation set according to the utterance variations included in each utterance variation set, wherein the plurality of utterance variation sets comprise:
a common utterance variation set that consists of only widely appearing utterance variations; and
utterance variation sets each of which is generated by combining widely appearing utterance variations and unevenly appearing utterance variations.
Specification