INFORMATION PROCESSING APPARATUS, METHOD AND RECORDING MEDIUM FOR GENERATING ACOUSTIC MODEL
Abstract
An information processing apparatus for speech recognition includes a first speech dataset storing speech data uttered by low recognition rate speakers; a second speech dataset storing speech data uttered by a plurality of speakers; a third speech dataset storing speech data to be mixed with the speech data of the second speech dataset; a similarity calculating part obtaining, for each piece of the speech data in the second speech dataset, a degree of similarity to a given average voice in the first speech dataset; a speech data selecting part recording the speech data, the degree of similarity of which is within a given selection range, as selected speech data in the third speech dataset; and an acoustic model generating part generating a first acoustic model using the speech data recorded in the second speech dataset and the third speech dataset.
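The selection step summarized in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. Each utterance is assumed to be reduced to a fixed-length feature vector (for example, averaged MFCCs). The "given average voice" is assumed to be the element-wise mean of the vectors in the first speech dataset. Cosine similarity stands in for the unspecified degree of similarity. The names `average_voice`, `cosine_similarity`, and `select_speech_data` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def average_voice(low_rate_speech: List[Vector]) -> List[float]:
    """Element-wise mean of the first dataset's vectors (assumed meaning of 'average voice')."""
    if not low_rate_speech:
        raise ValueError("first speech dataset is empty")
    n = len(low_rate_speech)
    dim = len(low_rate_speech[0])
    return [sum(v[i] for v in low_rate_speech) / n for i in range(dim)]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Assumed similarity measure; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_speech_data(
    first: List[Vector],
    second: Dict[str, Vector],
    selection_range: Tuple[float, float],
) -> Dict[str, Vector]:
    """Build the third dataset: utterances of `second` whose similarity lies in the range."""
    avg = average_voice(first)
    low, high = selection_range
    third: Dict[str, Vector] = {}
    for utt_id, features in second.items():
        score = cosine_similarity(features, avg)
        if low <= score <= high:  # inclusive band, not a single threshold
            third[utt_id] = features
    return third
```

With `first = [[1.0, 0.0], [0.8, 0.2]]`, the average voice is approximately `[0.9, 0.1]`. For `second = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}` and a selection range of `(0.7, 1.0)`, utterances `a` (similarity about 0.99) and `c` (about 0.78) are selected, while `b` (about 0.11) is not. A band with both a lower and an upper bound follows the wording "within a given selection range". The actual bounds are not stated in this section.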
20 Claims
1. An information apparatus that generates a first acoustic model for speech recognition, the information apparatus comprising:
a first speech dataset storing speech data uttered by low recognition rate speakers;
a second speech dataset storing speech data uttered by a plurality of speakers;
a third speech dataset storing speech data to be mixed with the speech data of the second speech dataset;
a similarity calculating part obtaining, for each piece of the speech data in the second speech dataset, a degree of similarity to a given average voice in the first speech dataset;
a speech data selecting part recording the speech data, the degree of similarity of which is within a given selection range, as selected speech data in the third speech dataset; and
an acoustic model generating part generating a first acoustic model using the speech data recorded in the second speech dataset and the third speech dataset.
Dependent claims: 5, 7, 9, 11, 12, 15, 16
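Read literally, claim 1 trains on the second dataset plus the third, and the third holds utterances selected from the second. Each selected utterance therefore enters training twice. One reading is that this up-weights speech resembling the low recognition rate speakers.

The Python sketch below illustrates only that pooling effect. A single diagonal-covariance Gaussian fitted by maximum likelihood is a deliberately toy stand-in for a real acoustic model (such as an HMM-based or neural model). The names `pool_training_data`, `fit_diagonal_gaussian`, and `log_likelihood` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def pool_training_data(second: Dict[str, Vector], third: Dict[str, Vector]) -> List[Vector]:
    # Keep every utterance from both datasets. An utterance copied from the second
    # dataset into the third appears twice and so carries double weight below.
    return list(second.values()) + list(third.values())


def fit_diagonal_gaussian(data: List[Vector]) -> Tuple[List[float], List[float]]:
    """Maximum-likelihood mean and per-dimension variance (toy 'acoustic model')."""
    if not data:
        raise ValueError("no training data")
    n = len(data)
    dim = len(data[0])
    mean = [sum(v[i] for v in data) / n for i in range(dim)]
    var = [sum((v[i] - mean[i]) ** 2 for v in data) / n for i in range(dim)]
    return mean, var


def log_likelihood(x: Vector, mean: List[float], var: List[float], floor: float = 1e-6) -> float:
    """Log density of x under the diagonal Gaussian."""
    total = 0.0
    for xi, mu, s2 in zip(x, mean, var):
        s2 = max(s2, floor)  # variance floor avoids division by zero
        total += -0.5 * (math.log(2 * math.pi * s2) + (xi - mu) ** 2 / s2)
    return total
```

Consider `second = {"a": [0.0], "b": [2.0]}` and `third = {"b": [2.0]}`. The pooled data are `[0.0], [2.0], [2.0]`, which gives a mean of about 1.33. Training on the second dataset alone would give 1.0. As a result, the pooled model scores `[2.0]` higher than `[0.0]`.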
2. An information apparatus that generates a first acoustic model for speech recognition, the information apparatus comprising:
a first speech dataset storing speech data uttered by low recognition rate speakers;
a second speech dataset storing speech data uttered by a plurality of speakers;
a fourth speech dataset storing speech data serving as a candidate to be mixed with speech data of the second speech dataset;
a third speech dataset storing speech data that is derived from the fourth speech dataset and mixed with speech data of the second speech dataset;
a similarity calculating part obtaining, for each piece of speech data in the fourth speech dataset, a degree of similarity to a given average voice in the first speech dataset;
a speech data selecting part recording the speech data, the degree of similarity of which is within a selection range, as selected speech data in the third speech dataset; and
an acoustic model generating part generating a first acoustic model using the speech data recorded in the second speech dataset and the third speech dataset.
Dependent claims: 3, 4, 6, 8, 10, 13, 14, 17, 18
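Claim 2 differs from claim 1 in where the candidates come from: they are scored from a separate fourth speech dataset rather than from the second dataset itself. Assuming the fourth dataset does not overlap the second, the selected utterances add new material to the training set instead of duplicating existing data.

The Python sketch below illustrates that flow under the following assumptions:

- Each utterance is a fixed-length feature vector.
- The average voice is the mean vector of the first dataset.
- Similarity is 1 / (1 + Euclidean distance). This is a swapped-in choice, since the claim does not define the measure.

The names `mean_vector`, `distance_similarity`, and `build_training_set` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def mean_vector(vectors: List[Vector]) -> List[float]:
    """Element-wise mean; used here as the assumed 'average voice'."""
    if not vectors:
        raise ValueError("first speech dataset is empty")
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def distance_similarity(a: Vector, b: Vector) -> float:
    # Maps Euclidean distance to (0, 1]; identical vectors score 1.0.
    return 1.0 / (1.0 + math.dist(a, b))


def build_training_set(
    first: List[Vector],
    second: Dict[str, Vector],
    fourth: Dict[str, Vector],
    selection_range: Tuple[float, float],
) -> Tuple[Dict[str, Vector], List[Vector]]:
    """Return (third dataset, pooled training vectors from the second and third datasets)."""
    avg = mean_vector(first)
    low, high = selection_range
    # Third dataset: candidates from the fourth dataset (not the second) inside the range.
    third = {
        uid: feats
        for uid, feats in fourth.items()
        if low <= distance_similarity(feats, avg) <= high
    }
    training = list(second.values()) + list(third.values())
    return third, training
```

Suppose `first = [[0.0, 0.0], [2.0, 0.0]]`, which gives an average voice of `[1.0, 0.0]`. The candidates `x = [1.0, 0.0]`, `z = [2.0, 0.0]`, and `y = [4.0, 4.0]` then score 1.0, 0.5, and about 0.17. A selection range of `(0.4, 1.0)` keeps `x` and `z`, and both are pooled with the second dataset to form the training set.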
19. A computer-readable recording medium storing a computer program for causing a computer including a processor and a memory to function as an information apparatus, the computer program causing the computer to execute operations of:
storing a first speech dataset storing speech data uttered by low recognition rate speakers, a second speech dataset storing speech data uttered by a plurality of speakers, and a third speech dataset storing speech data that is mixed with speech data of the second speech dataset, the first, second, and third speech datasets being stored in the memory by the processor;
obtaining, for each piece of speech data in the second speech dataset, a degree of similarity to a given average voice in the first speech dataset, by the processor;
recording the speech data, the degree of similarity of which is within a given selection range, as selected speech data in the third speech dataset, by the processor; and
generating an acoustic model by the processor using the speech data recorded in the second speech dataset and the third speech dataset.
20. A method for making a computer program cause a computer including a processor and a memory to function as an information apparatus, the method comprising:
storing a first speech dataset storing speech data uttered by low recognition rate speakers, a second speech dataset storing speech data uttered by a plurality of speakers, and a third speech dataset storing speech data that is mixed with speech data of the second speech dataset, the first, second and third speech datasets being stored in the memory by the processor;
obtaining, for each piece of speech data in the second speech dataset, a degree of similarity to a given average voice in the first speech dataset by the processor;
recording the speech data, the degree of similarity of which is within a given selection range, as selected speech data in the third speech dataset by the processor; and
generating an acoustic model by the processor using the speech data recorded in the second speech dataset and the third speech dataset by the processor.
Specification