Acoustic model training device, acoustic model training method, voice recognition device, and voice recognition method

US 10,418,030 B2
Filed: 05/20/2016
Issued: 09/17/2019
Est. Priority Date: 05/20/2016
Status: Active Grant

First Claim

1. An acoustic model training device comprising:

a processor to execute a program; and

a memory to store the program which, when executed by the processor, performs processes of:

generating, based on feature vectors obtained by analyzing utterance data items of a plurality of speakers, a training data item of each speaker by subtracting, for each speaker, a mean vector of all the feature vectors of the speaker from each of the feature vectors of the speaker;

generating a training data item of all the speakers by subtracting a mean vector of all the feature vectors of all the speakers from each of the feature vectors of all the speakers; and

training an acoustic model using the training data item of each speaker and the training data item of all the speakers.

Abstract

An acoustic model training device includes: a processor to execute a program; and a memory to store the program which, when executed by the processor, performs processes of: generating, based on feature vectors obtained by analyzing utterance data items of a plurality of speakers, a training data item of each speaker by subtracting, for each speaker, a mean vector of all the feature vectors of the speaker from each of the feature vectors of the speaker; generating a training data item of all the speakers by subtracting a mean vector of all the feature vectors of all the speakers from each of the feature vectors of all the speakers; and training an acoustic model using the training data item of each speaker and the training data item of all the speakers.
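
The two normalizations described in the abstract can be written compactly. The notation is an assumption made for illustration, since the claims define no symbols: x_{s,t} is the t-th feature vector of speaker s, T_s is the number of feature vectors of that speaker, and S is the number of speakers.

```latex
\begin{align*}
\mu_s &= \frac{1}{T_s}\sum_{t=1}^{T_s} x_{s,t}, &
\tilde{x}_{s,t} &= x_{s,t} - \mu_s
  && \text{(training data item of each speaker)} \\
\mu &= \frac{1}{\sum_{s=1}^{S} T_s}\sum_{s=1}^{S}\sum_{t=1}^{T_s} x_{s,t}, &
\hat{x}_{s,t} &= x_{s,t} - \mu
  && \text{(training data item of all the speakers)}
\end{align*}
```

Under this reading, the acoustic model is trained on both the per-speaker-normalized vectors and the globally normalized vectors.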

5 Claims

1. An acoustic model training device comprising:
- a processor to execute a program; and
  
  a memory to store the program which, when executed by the processor, performs processes of:
  
  generating, based on feature vectors obtained by analyzing utterance data items of a plurality of speakers, a training data item of each speaker by subtracting, for each speaker, a mean vector of all the feature vectors of the speaker from each of the feature vectors of the speaker;
  
  generating a training data item of all the speakers by subtracting a mean vector of all the feature vectors of all the speakers from each of the feature vectors of all the speakers; and
  
  training an acoustic model using the training data item of each speaker and the training data item of all the speakers.
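
The data preparation recited in claim 1 can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function names (`mean_vector`, `subtract`, `build_training_data`), the dictionary layout keyed by speaker, and the toy feature values are all hypothetical, and the claim does not say how the two training data items are combined during training, so the sketch simply returns both.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def subtract(vector: Vector, offset: Vector) -> Vector:
    """Component-wise difference vector - offset."""
    return [v - o for v, o in zip(vector, offset)]


def build_training_data(
    features_by_speaker: Dict[str, List[Vector]],
) -> Tuple[Dict[str, List[Vector]], List[Vector], Vector]:
    """Return (per-speaker item, all-speakers item, global mean vector)."""
    # Training data item of each speaker: subtract that speaker's own mean.
    per_speaker: Dict[str, List[Vector]] = {}
    for speaker, vectors in features_by_speaker.items():
        speaker_mean = mean_vector(vectors)
        per_speaker[speaker] = [subtract(v, speaker_mean) for v in vectors]

    # Training data item of all the speakers: subtract the mean taken over
    # every feature vector of every speaker.
    all_vectors = [v for vectors in features_by_speaker.values() for v in vectors]
    global_mean = mean_vector(all_vectors)
    all_speakers = [subtract(v, global_mean) for v in all_vectors]

    return per_speaker, all_speakers, global_mean


# Toy two-dimensional features for two hypothetical speakers.
features = {
    "speaker_a": [[1.0, 2.0], [3.0, 4.0]],
    "speaker_b": [[5.0, 6.0], [7.0, 8.0]],
}
per_speaker, all_speakers, global_mean = build_training_data(features)
```

The global mean is returned alongside the two data items because the recognition claims reuse it as the correction vector for a first utterance.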

2. A voice recognition device comprising:
- a processor to execute a program; and
  
  a memory to store the program which, when executed by the processor, performs processes of:
  
  analyzing an input voice and outputting first feature vectors;
  
  determining whether the voice is a first utterance, setting, based on second feature vectors obtained by analyzing utterance data items of a plurality of speakers, a mean vector of all the second feature vectors of all the speakers as a correction vector if the voice is the first utterance, setting a mean vector of the first feature vectors until a preceding utterance as the correction vector if the voice is not the first utterance, and outputting corrected vectors obtained by subtracting the correction vector from the first feature vectors; and
  
  comparing the corrected vectors with an acoustic model trained using a training data item of each speaker generated by subtracting, for each speaker, a mean vector of all the second feature vectors of the speaker from the second feature vectors of the speaker, and a training data item of all the speakers generated by subtracting a mean vector of all the second feature vectors of all the speakers from the second feature vectors of all the speakers, and outputting a recognition result of the voice.

3. The voice recognition device of claim 2, wherein the program performs storing the correction vector, and if the voice is not the first utterance, weights and averages a mean vector of the first feature vectors until the preceding utterance that are temporarily stored and the correction vector used one utterance before, and setting the weighted average as the correction vector.
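
The correction-vector selection in claims 2 and 3 can be sketched as follows. This is an illustrative reading, not the patented implementation, and it rests on several assumptions: the class name `CorrectionState` and its attributes are hypothetical; "first utterance" is detected by an empty frame history; "the first feature vectors until a preceding utterance" is taken to mean every frame from all earlier utterances; and for claim 3, the hypothetical `weight` is applied to the history mean and `1 - weight` to the previously used correction vector, since the claim does not fix the weights.

```python
from typing import List, Optional

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


class CorrectionState:
    """Tracks what is needed to pick a correction vector per utterance."""

    def __init__(self, global_mean: Vector, weight: Optional[float] = None):
        self.global_mean = global_mean  # mean over all training speakers
        self.weight = weight            # None: claim 2 only; float: claim 3
        self.history: List[Vector] = []  # frames of all earlier utterances
        self.previous_correction: Optional[Vector] = None

    def correct(self, utterance: List[Vector]) -> List[Vector]:
        if not self.history:
            # First utterance: use the all-speakers mean from training.
            correction = list(self.global_mean)
        else:
            # Later utterance: mean of the frames seen up to the preceding one.
            correction = mean_vector(self.history)
            if self.weight is not None and self.previous_correction is not None:
                # Claim 3: weighted average with the last correction vector.
                correction = [
                    self.weight * m + (1.0 - self.weight) * p
                    for m, p in zip(correction, self.previous_correction)
                ]

        corrected = [[x - c for x, c in zip(frame, correction)] for frame in utterance]

        # Store frames and the correction vector for the next utterance.
        self.history.extend(utterance)
        self.previous_correction = correction
        return corrected


# Claim 2 behavior with a toy global mean.
state = CorrectionState(global_mean=[4.0, 5.0])
first = state.correct([[5.0, 7.0], [7.0, 9.0]])
second = state.correct([[6.0, 8.0]])

# Claim 3 behavior with an assumed weight of 0.5.
weighted_state = CorrectionState(global_mean=[4.0, 5.0], weight=0.5)
weighted_state.correct([[5.0, 7.0], [7.0, 9.0]])
weighted_second = weighted_state.correct([[6.0, 8.0]])
```

The corrected vectors would then be compared with the acoustic model to produce the recognition result; that comparison step is outside the scope of this sketch.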

4. An acoustic model training method of an acoustic model training device for training an acoustic model using feature vectors obtained by analyzing utterance data items of a plurality of speakers, the acoustic model training method comprising:
- generating, based on the feature vectors, a training data item of each speaker by subtracting, for each speaker, a mean vector of all the feature vectors of the speaker from the feature vectors of the speaker;
  
  generating a training data item of all the speakers by subtracting a mean vector of all the feature vectors of all the speakers from the feature vectors of all the speakers; and
  
  training the acoustic model using the training data item of each speaker and the training data item of all the speakers.

5. A voice recognition method of a voice recognition device for performing voice recognition on an input voice, the voice recognition method comprising:
- analyzing the input voice and outputting first feature vectors;
  
  determining whether the voice is a first utterance, setting, based on second feature vectors obtained by analyzing utterance data items of a plurality of speakers, a mean vector of all the second feature vectors of all the speakers as a correction vector if the voice is the first utterance, setting a mean vector of the first feature vectors until a preceding utterance as the correction vector if the voice is not the first utterance, and outputting corrected vectors obtained by subtracting the correction vector from the first feature vectors; and
  
  comparing the corrected vectors with an acoustic model trained using a training data item of each speaker generated by subtracting, for each speaker, a mean vector of all the second feature vectors of the speaker from the second feature vectors of the speaker, and a training data item of all the speakers generated by subtracting a mean vector of all the second feature vectors of all the speakers from the second feature vectors of all the speakers, and outputting a recognition result of the voice.

Current Assignee: Mitsubishi Electric Corporation
Original Assignee: Mitsubishi Electric Corporation
Inventors: Hanazawa, Toshiyuki
Primary Examiner(s): Leland, III, Edwin S
Application Number: US16/086,738
Publication Number: US 20190096392A1
Time in Patent Office: 1,215 Days
Field of Search: 704243
US Class Current
CPC Class Codes:
- G10L 15/063   Training
- G10L 15/07   to the speaker
- G10L 15/22   Procedures used during a sp...
