INFORMATION PROCESSING APPARATUS, METHOD AND RECORDING MEDIUM FOR GENERATING ACOUSTIC MODEL

US 20100169093A1
Filed: 12/22/2009
Published: 07/01/2010
Est. Priority Date: 12/26/2008
Status: Active Grant


1 Assignment


0 Petitions


Abstract

An information processing apparatus for speech recognition includes a first speech dataset storing speech data uttered by low recognition rate speakers; a second speech dataset storing speech data uttered by a plurality of speakers; a third speech dataset storing speech data to be mixed with the speech data of the second speech dataset; a similarity calculating part obtaining, for each piece of the speech data in the second speech dataset, a degree of similarity to a given average voice in the first speech dataset; a speech data selecting part recording the speech data, the degree of similarity of which is within a given selection range, as selected speech data in the third speech dataset; and an acoustic model generating part generating a first acoustic model using the speech data recorded in the second speech dataset and the third speech dataset.
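
The selection step described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the patentee's implementation. Each utterance is reduced to a hypothetical fixed-length feature vector, the "average voice" of the first dataset is assumed to be the element-wise mean of those vectors, and similarity is assumed to be negative Euclidean distance. The names `Utterance`, `average_voice`, `similarity`, and `select_similar` are illustrative only.

```python
import math
from dataclasses import dataclass


@dataclass
class Utterance:
    utt_id: str
    # Hypothetical fixed-length summary of one utterance (e.g. a mean
    # cepstral vector); the patent record does not specify the features.
    features: list[float]


def average_voice(dataset: list[Utterance]) -> list[float]:
    """Assumed 'average voice': element-wise mean of the feature vectors."""
    n = len(dataset)
    dim = len(dataset[0].features)
    return [sum(u.features[i] for u in dataset) / n for i in range(dim)]


def similarity(a: list[float], b: list[float]) -> float:
    """Assumed similarity measure: negative Euclidean distance (higher = closer)."""
    return -math.dist(a, b)


def select_similar(low_rate: list[Utterance], general: list[Utterance],
                   lo: float, hi: float) -> list[Utterance]:
    """Third dataset: utterances from the general (second) dataset whose
    similarity to the low-rate speakers' average voice lies in [lo, hi]."""
    avg = average_voice(low_rate)
    return [u for u in general if lo <= similarity(u.features, avg) <= hi]


# Usage: the training material is the second dataset plus the selected data.
first = [Utterance("a", [1.0, 1.0]), Utterance("b", [3.0, 3.0])]   # average voice (2, 2)
second = [Utterance("x", [2.0, 2.0]), Utterance("y", [10.0, 10.0])]
third = select_similar(first, second, lo=-1.0, hi=0.0)
training_set = second + third
print([u.utt_id for u in third])  # ['x']
```

Acoustic model training itself is out of scope here; `training_set` only shows that both the second dataset and the selected third dataset feed the model generating part.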

169 Citations

20 Claims

1. An information apparatus that generates a first acoustic model for speech recognition, the information apparatus comprising:
- a first speech dataset storing speech data uttered by low recognition rate speakers;
  
  a second speech dataset storing speech data uttered by a plurality of speakers;
  
  a third speech dataset storing speech data to be mixed with the speech data of the second speech dataset;
  
  a similarity calculating part obtaining, for each piece of the speech data in the second speech dataset, a degree of similarity to a given average voice in the first speech dataset;
  
  a speech data selecting part recording the speech data, the degree of similarity of which is within a given selection range, as selected speech data in the third speech dataset; and
  
  an acoustic model generating part generating a first acoustic model using the speech data recorded in the second speech dataset and the third speech dataset.
- Dependent claims (5, 7, 9, 11, 12, 15, 16):
  - 5. The information apparatus according to claim 1, the information apparatus further comprising:
    - a mixed speech dataset generating part generating a fifth speech dataset by mixing speech data of the second speech dataset with speech data of the third speech dataset in accordance with a mixture rate, wherein the acoustic model generating part generates the first acoustic model using the fifth speech dataset.
  - 7. The information apparatus according to claim 5, the information apparatus further comprising a mixture rate deciding part adjusting a value of the mixture rate in accordance with recognition rates obtained by performing, using the first acoustic model, speech recognition on respective pieces of speech data in a sixth speech dataset and in the first speech dataset for evaluation.
  - 9. The information apparatus according to claim 5, wherein, when the first acoustic model is not improved beyond a threshold level even if the value of the mixture rate is adjusted or when a given period of time has elapsed since the start of generation of the first acoustic model, the mixture rate deciding part specifies the current first acoustic model as a final acoustic model.
  - 11. The information apparatus according to claim 5, wherein, when the first acoustic model is not improved beyond a threshold level even if the value of the mixture rate is adjusted or when a given period of time has elapsed since the start of generation of the first acoustic model, the mixture rate deciding part specifies the current first acoustic model as a final acoustic model.
  - 12. The information apparatus according to claim 7, wherein, when the first acoustic model is not improved beyond a threshold level even if the value of the mixture rate is adjusted or when a given period of time has elapsed since the start of generation of the first acoustic model, the mixture rate deciding part specifies the current first acoustic model as a final acoustic model.
  - 15. The information apparatus according to claim 7, wherein the mixture rate deciding part calculates the value of the mixture rate in accordance with the number of pieces of data in the third speech dataset.
  - 16. The information apparatus according to claim 9, wherein the mixture rate deciding part calculates the value of the mixture rate in accordance with the number of pieces of data in the third speech dataset.
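
Dependent claims 5, 7, 9, 15 and 16 add a mixture rate and a stopping rule. The Python sketch below is one possible reading, with every name hypothetical: `mix_datasets` oversamples the third dataset to a size proportional to the rate, `initial_rate` ties the rate to the size of the third dataset, and `tune_mixture_rate` tries candidate rates until the score stops improving by more than a threshold or a time budget runs out. The `train` and `score` callables stand in for acoustic model training and recognition-rate evaluation, which the claims do not specify.

```python
import itertools
import time
from typing import Callable, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")
M = TypeVar("M")


def mix_datasets(second: Sequence[T], third: Sequence[T], rate: float) -> list:
    """Fifth dataset (assumed reading of 'mixture rate'): all of `second` plus
    round(rate * len(second)) items drawn cyclically from `third`, so a small
    selected set is oversampled rather than appended only once."""
    n_extra = round(rate * len(second)) if third else 0
    return list(second) + list(itertools.islice(itertools.cycle(third), n_extra))


def initial_rate(second: Sequence[T], third: Sequence[T], cap: float = 1.0) -> float:
    """Hypothetical heuristic in the spirit of claims 15 and 16: derive the
    rate from how many utterances ended up in the third dataset."""
    return min(cap, len(third) / len(second)) if second else 0.0


def tune_mixture_rate(second: Sequence[T], third: Sequence[T],
                      train: Callable[[list], M], score: Callable[[M], float],
                      rates: Sequence[float], min_gain: float = 0.001,
                      time_limit_s: float = 3600.0) -> Tuple[Optional[M], Optional[float]]:
    """Try candidate rates in order (claims 7 and 9, sketched). `score` could,
    for example, average the recognition rates on the sixth (evaluation)
    dataset and on the first dataset. Stops when the score no longer improves
    by more than `min_gain` or when the time budget is exhausted."""
    start = time.monotonic()
    best_model, best_rate, best_score = None, None, float("-inf")
    for rate in rates:
        model = train(mix_datasets(second, third, rate))
        s = score(model)
        if s - best_score <= min_gain:
            break  # not improved beyond the threshold: keep the previous model
        best_model, best_rate, best_score = model, rate, s
        if time.monotonic() - start > time_limit_s:
            break  # time budget spent: keep the current model
    return best_model, best_rate


# Usage with toy stand-ins: the "model" is just the training-set size.
second, third = [1, 2, 3, 4], ["a"]
scores = {4: 0.50, 6: 0.60, 8: 0.6005}  # hypothetical recognition rates
model, rate = tune_mixture_rate(second, third, train=len, score=scores.__getitem__,
                                rates=[0.0, 0.5, 1.0])
print(model, rate)  # 6 0.5
```

Keeping the previous model when the gain falls below `min_gain` is a design choice for this sketch; claim 9 only requires that the current model be fixed as final once no further improvement is obtained or the time limit is reached.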

2. An information apparatus that generates a first acoustic model for speech recognition, the information apparatus comprising:
- a first speech dataset storing speech data uttered by low recognition rate speakers;
  
  a second speech dataset storing speech data uttered by a plurality of speakers;
  
  a fourth speech dataset storing speech data serving as a candidate to be mixed with speech data of the second speech dataset;
  
  a third speech dataset storing speech data that is derived from the fourth speech dataset and mixed with speech data of the second speech dataset;
  
  a similarity calculating part obtaining, for each piece of speech data in the fourth speech dataset, a degree of similarity to a given average voice in the first speech dataset;
  
  a speech data selecting part recording the speech data, the degree of similarity of which is within a selection range, as selected speech data in the third speech dataset; and
  
  an acoustic model generating part generating a first acoustic model using the speech data recorded in the second speech dataset and the third speech dataset.
- Dependent claims (3, 4, 6, 8, 10, 13, 14, 17, 18):
  - 3. The information apparatus according to claim 2, wherein, for each piece of speech data in the fourth speech dataset, the degree of similarity calculating part obtains the degree of similarity as a first degree of similarity, and obtains a second degree of similarity to the given average voice in the second speech dataset, and wherein the speech data selecting part records the speech data, in which a difference between the first degree of similarity and the second degree of similarity is within a given selection range, as the selected speech data in the third speech dataset.
  - 4. The information apparatus according to claim 2, the information apparatus further comprising:
    - a model adapting part generating a third acoustic model of the low recognition rate speakers by performing a process for adapting a second acoustic model to the first speech dataset, the second acoustic model being generated using the second speech dataset, wherein the similarity calculating part further includes a speech recognition part performing speech recognition on respective pieces of speech data in the second speech dataset using the third acoustic model and the second acoustic model, thereby obtaining recognition scores of the respective pieces of speech data as the first degree of similarity and the second degree of similarity.
  - 6. The information apparatus according to claim 2, the information apparatus further comprising:
    - a mixed speech dataset generating part generating a fifth speech dataset by mixing speech data of the second speech dataset with speech data of the third speech dataset in accordance with a mixture rate, wherein the acoustic model generating part generates the first acoustic model using the fifth speech dataset.
  - 8. The information apparatus according to claim 6, the information apparatus further comprising a mixture rate deciding part adjusting a value of the mixture rate in accordance with recognition rates obtained by performing, using the first acoustic model, speech recognition on respective pieces of speech data in a sixth speech dataset and in the first speech dataset for evaluation.
  - 10. The information apparatus according to claim 6, wherein, when the first acoustic model is not improved beyond a threshold level even if the value of the mixture rate is adjusted or when a given period of time has elapsed since the start of generation of the first acoustic model, the mixture rate deciding part specifies the current first acoustic model as a final acoustic model.
  - 13. The information apparatus according to claim 3, the information apparatus further comprising:
    - a mixed speech dataset generating part generating a fifth speech dataset by mixing speech data of the second speech dataset with speech data of the third speech dataset in accordance with a mixture rate; and
      
      a mixture rate deciding part adjusting a value of the mixture rate in accordance with recognition rates obtained by performing, using the first acoustic model, speech recognition on respective pieces of speech data in a sixth speech dataset and in the first speech dataset for evaluation, wherein the mixture rate deciding part calculates the value of the mixture rate in accordance with the magnitude of an average value of a difference between the first degree of similarity and the second degree of similarity, and wherein the acoustic model generating part generates the first acoustic model using the fifth speech dataset.
  - 14. The information apparatus according to claim 4, the information apparatus further comprising:
    - a mixed speech dataset generating part generating a fifth speech dataset by mixing speech data of the second speech dataset with speech data of the third speech dataset in accordance with a mixture rate; and
      
      a mixture rate deciding part adjusting a value of the mixture rate in accordance with recognition rates obtained by performing, using the first acoustic model, speech recognition on respective pieces of speech data in a sixth speech dataset and in the first speech dataset for evaluation, wherein the mixture rate deciding part calculates the value of the mixture rate in accordance with the magnitude of an average value of a difference between the first degree of similarity and the second degree of similarity, and wherein the acoustic model generating part generates the first acoustic model using the fifth speech dataset.
  - 17. The information apparatus according to claim 13, wherein the mixture rate deciding part calculates the value of the mixture rate in accordance with the number of pieces of data in the third speech dataset.
  - 18. The information apparatus according to claim 4, the information apparatus further comprising:
    - a second acoustic model generated using the second speech dataset; and
      
      a model adapting part performing a process for adapting the second acoustic model to the first speech dataset for a frequency spectrum, and generating a third acoustic model of the low recognition rate speakers, wherein the similarity calculating part further includes a speech recognition part recognizing each piece of speech data in the fourth speech dataset using the third acoustic model, and obtains a resulting recognition score as the degree of similarity.
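
Claims 3, 4, 13 and 18 replace the single similarity with recognition scores from two models: a third acoustic model adapted to the low recognition rate speakers and the baseline second acoustic model. The Python sketch below assumes those scores are already available through two hypothetical callables and selects candidates by score difference. `rate_from_mean_difference` is an assumed linear mapping for claim 13, not one taken from the patent, and claim 18's variant would use `score_adapted` alone as the degree of similarity.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple, TypeVar

U = TypeVar("U")


def select_by_score_difference(candidates: Sequence[U],
                               score_adapted: Callable[[U], float],
                               score_baseline: Callable[[U], float],
                               lo: float, hi: float) -> Tuple[List[U], List[float]]:
    """Claims 3 and 4, sketched: the first degree of similarity is the score
    under the adapted (third) acoustic model, the second is the score under
    the baseline (second) acoustic model. Keep candidates whose difference
    lies in [lo, hi] and return the kept differences as well."""
    selected, diffs = [], []
    for utt in candidates:
        d = score_adapted(utt) - score_baseline(utt)
        if lo <= d <= hi:
            selected.append(utt)
            diffs.append(d)
    return selected, diffs


def rate_from_mean_difference(diffs: Sequence[float], scale: float = 0.25,
                              cap: float = 1.0) -> float:
    """Hypothetical mapping for claim 13: the larger the average difference,
    the larger the mixture rate, clipped to [0, cap]."""
    if not diffs:
        return 0.0
    return max(0.0, min(cap, scale * mean(diffs)))


# Usage with toy log-likelihood-style scores (hypothetical values).
adapted = {"u1": -10.0, "u2": -5.0, "u3": -20.0}
baseline = {"u1": -12.0, "u2": -15.0, "u3": -19.0}
third, diffs = select_by_score_difference(["u1", "u2", "u3"], adapted.__getitem__,
                                          baseline.__getitem__, lo=1.0, hi=5.0)
print(third, diffs)  # ['u1'] [2.0]
```

Using a bounded range rather than a single lower threshold lets the sketch exclude utterances whose score gain is implausibly large, which is one way to read "within a given selection range".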

19. A computer-readable recording medium storing a computer program for causing a computer including a processor and a memory to function as an information apparatus, the computer program causing the computer to execute operations of:
- storing a first speech dataset storing speech data uttered by low recognition rate speakers, a second speech dataset storing speech data uttered by a plurality of speakers, and a third speech dataset storing speech data that is mixed with speech data of the second speech dataset, the first, second, and third speech datasets being stored in the memory by the processor;
  
  obtaining, for each piece of speech data in the second speech dataset, a degree of similarity to a given average voice in the first speech dataset, by the processor;
  
  recording the speech data, the degree of similarity of which is within a given selection range, as selected speech data in the third speech dataset, by the processor; and
  
  generating an acoustic model by the processor using the speech data recorded in the second speech dataset and the third speech dataset.

20. A method for making a computer program cause a computer including a processor and a memory to function as an information apparatus, the method comprising:
- storing a first speech dataset storing speech data uttered by low recognition rate speakers, a second speech dataset storing speech data uttered by a plurality of speakers, and a third speech dataset storing speech data that is mixed with speech data of the second speech dataset, the first, second and third speech datasets being stored in the memory by the processor;
  
  obtaining, for each piece of speech data in the second speech dataset, a degree of similarity to a given average voice in the first speech dataset by the processor;
  
  recording the speech data, the degree of similarity of which is within a given selection range, as selected speech data in the third speech dataset by the processor; and
  
  generating an acoustic model by the processor using the speech data recorded in the second speech dataset and the third speech dataset by the processor.


Current Assignee: Fujitsu Limited
Original Assignee: Fujitsu Limited
Inventors: Washio, Nobuyuki
Granted Patent: US 8,290,773 B2
US Class Current: 704/243
CPC Class Codes:
G10L 15/063 Training
G10L 2015/0631 Creating reference template...
