Acoustic model training device, acoustic model training method, voice recognition device, and voice recognition method
Abstract
An acoustic model training device includes: a processor to execute a program; and a memory to store the program which, when executed by the processor, performs processes of: generating, based on feature vectors obtained by analyzing utterance data items of a plurality of speakers, a training data item of each speaker by subtracting, for each speaker, a mean vector of all the feature vectors of the speaker from each of the feature vectors of the speaker; generating a training data item of all the speakers by subtracting a mean vector of all the feature vectors of all the speakers from each of the feature vectors of all the speakers; and training an acoustic model using the training data item of each speaker and the training data item of all the speakers.
5 Claims
1. An acoustic model training device comprising:
a processor to execute a program; and
a memory to store the program which, when executed by the processor, performs processes of:
generating, based on feature vectors obtained by analyzing utterance data items of a plurality of speakers, a training data item of each speaker by subtracting, for each speaker, a mean vector of all the feature vectors of the speaker from each of the feature vectors of the speaker;
generating a training data item of all the speakers by subtracting a mean vector of all the feature vectors of all the speakers from each of the feature vectors of all the speakers; and
training an acoustic model using the training data item of each speaker and the training data item of all the speakers.
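The two generating steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the function names (`mean_vector`, `subtract_mean`, `build_training_data`), the speaker labels, and the toy two-dimensional feature vectors are all hypothetical, and the training step itself is omitted, with the two data items simply pooled into one list that a trainer could consume.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / count for d in range(dim)]


def subtract_mean(vectors: List[Vector], mean: Vector) -> List[Vector]:
    """Subtract the same mean vector from every feature vector."""
    return [[x - m for x, m in zip(v, mean)] for v in vectors]


def build_training_data(
    features_by_speaker: Dict[str, List[Vector]]
) -> Tuple[List[Vector], List[Vector], Vector]:
    # Training data item of each speaker: subtract that speaker's own mean.
    per_speaker: List[Vector] = []
    for vectors in features_by_speaker.values():
        per_speaker.extend(subtract_mean(vectors, mean_vector(vectors)))

    # Training data item of all the speakers: subtract the mean taken over
    # every feature vector of every speaker.
    all_vectors = [v for vs in features_by_speaker.values() for v in vs]
    global_mean = mean_vector(all_vectors)
    all_speakers = subtract_mean(all_vectors, global_mean)
    return per_speaker, all_speakers, global_mean


# Hypothetical toy features: two speakers, two 2-dimensional frames each.
features = {
    "spk_a": [[1.0, 2.0], [3.0, 4.0]],
    "spk_b": [[5.0, 6.0], [7.0, 8.0]],
}
per_speaker, all_speakers, global_mean = build_training_data(features)

# Assumption: the acoustic model is trained on both data items together.
training_set = per_speaker + all_speakers
```

In this toy case the per-speaker item removes each speaker's offset entirely, while the all-speakers item keeps the differences between speakers; the claim trains on both.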
2. A voice recognition device comprising:
a processor to execute a program; and
a memory to store the program which, when executed by the processor, performs processes of:
analyzing an input voice and outputting first feature vectors;
determining whether the voice is a first utterance, setting, based on second feature vectors obtained by analyzing utterance data items of a plurality of speakers, a mean vector of all the second feature vectors of all the speakers as a correction vector if the voice is the first utterance, setting a mean vector of the first feature vectors until a preceding utterance as the correction vector if the voice is not the first utterance, and outputting corrected vectors obtained by subtracting the correction vector from the first feature vectors; and
comparing the corrected vectors with an acoustic model trained using a training data item of each speaker generated by subtracting, for each speaker, a mean vector of all the second feature vectors of the speaker from the second feature vectors of the speaker, and a training data item of all the speakers generated by subtracting a mean vector of all the second feature vectors of all the speakers from the second feature vectors of all the speakers, and outputting a recognition result of the voice.
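The correction-vector selection in claim 2 can be sketched as follows. This is an illustration under stated assumptions, not the claimed device: the class name `CorrectionState` and its fields are hypothetical, and "a mean vector of the first feature vectors until a preceding utterance" is read here as the mean over all frames of every earlier utterance, which is one possible interpretation. Comparison against the acoustic model is left out.

```python
from typing import List

Vector = List[float]


class CorrectionState:
    """Tracks the input seen so far and picks the correction vector."""

    def __init__(self, global_mean: Vector) -> None:
        # Mean of all training (second) feature vectors of all speakers.
        self.global_mean = list(global_mean)
        # Running sum and frame count of the input (first) feature vectors.
        self.running_sum = [0.0] * len(global_mean)
        self.frame_count = 0

    def correct(self, utterance: List[Vector]) -> List[Vector]:
        if self.frame_count == 0:
            # First utterance: no history yet, so use the all-speaker mean.
            correction = self.global_mean
        else:
            # Later utterances: use the mean of the frames seen up to the
            # preceding utterance.
            correction = [s / self.frame_count for s in self.running_sum]

        corrected = [[x - c for x, c in zip(v, correction)] for v in utterance]

        # Add the uncorrected frames of this utterance to the history so
        # that they contribute to the next correction vector.
        for v in utterance:
            for d, x in enumerate(v):
                self.running_sum[d] += x
        self.frame_count += len(utterance)
        return corrected


# Hypothetical usage with a 2-dimensional all-speaker mean.
state = CorrectionState([4.0, 5.0])
first = state.correct([[5.0, 7.0], [7.0, 9.0]])   # corrected with [4.0, 5.0]
second = state.correct([[6.0, 9.0]])              # corrected with [6.0, 8.0]
```

The first call falls back to the training-set mean because no frames of the current input exist yet; the second call uses the mean of the two frames from the first utterance.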
4. An acoustic model training method of an acoustic model training device for training an acoustic model using feature vectors obtained by analyzing utterance data items of a plurality of speakers, the acoustic model training method comprising:
generating, based on the feature vectors, a training data item of each speaker by subtracting, for each speaker, a mean vector of all the feature vectors of the speaker from the feature vectors of the speaker;
generating a training data item of all the speakers by subtracting a mean vector of all the feature vectors of all the speakers from the feature vectors of all the speakers; and
training the acoustic model using the training data item of each speaker and the training data item of all the speakers.
5. A voice recognition method of a voice recognition device for performing voice recognition on an input voice, the voice recognition method comprising:
analyzing the input voice and outputting first feature vectors;
determining whether the voice is a first utterance, setting, based on second feature vectors obtained by analyzing utterance data items of a plurality of speakers, a mean vector of all the second feature vectors of all the speakers as a correction vector if the voice is the first utterance, setting a mean vector of the first feature vectors until a preceding utterance as the correction vector if the voice is not the first utterance, and outputting corrected vectors obtained by subtracting the correction vector from the first feature vectors; and
comparing the corrected vectors with an acoustic model trained using a training data item of each speaker generated by subtracting, for each speaker, a mean vector of all the second feature vectors of the speaker from the second feature vectors of the speaker, and a training data item of all the speakers generated by subtracting a mean vector of all the second feature vectors of all the speakers from the second feature vectors of all the speakers, and outputting a recognition result of the voice.
Specification