Method and system for enhancing a speech database

US 8,510,112 B1
Filed: 08/31/2006
Issued: 08/13/2013
Est. Priority Date: 08/31/2006
Status: Active Grant

Abstract

A system, method, and computer-readable medium that enhance a speech database for speech synthesis are disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, modifying the identified segments in the primary speech database using selected mappings, enhancing the primary speech database by substituting the modified segments for the corresponding identified database segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.
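The five steps named in the abstract (label, identify, modify, substitute, store) can be pictured with a minimal Python sketch. Everything here is an illustrative assumption rather than the patented implementation: the `Segment` record, the `variant_labels` set, and the `mapping` dictionary (standing in for the "selected mappings" to a secondary database) are hypothetical names, and the modification is reduced to relabeling instead of any audio-level change.

```python
from dataclasses import dataclass, replace, asdict
import json


@dataclass(frozen=True)
class Segment:
    """Hypothetical labeled unit from the primary speech database."""
    utterance_id: str
    label: str    # label assigned during the labeling step
    start: float  # segment start, in seconds
    end: float    # segment end, in seconds


def identify_variants(segments, variant_labels):
    """Return indices of segments whose label is flagged as varying in pronunciation."""
    return [i for i, seg in enumerate(segments) if seg.label in variant_labels]


def enhance(segments, variant_labels, mapping):
    """Substitute modified segments for identified ones; other segments are kept as-is.

    `mapping` is an assumed stand-in for the selected mappings to a secondary
    database: it maps a primary label to its replacement label.
    """
    enhanced = list(segments)
    for i in identify_variants(segments, variant_labels):
        seg = segments[i]
        if seg.label in mapping:
            enhanced[i] = replace(seg, label=mapping[seg.label])
    return enhanced


def serialize(segments):
    """Serialize the enhanced database to JSON text, ready to be stored."""
    return json.dumps([asdict(seg) for seg in segments], indent=2)


primary = [
    Segment("utt1", "t", 0.00, 0.10),
    Segment("utt1", "ae", 0.10, 0.25),
]
enhanced = enhance(primary, variant_labels={"ae"}, mapping={"ae": "aa"})
stored = serialize(enhanced)
```

The input list is never mutated, so the original primary database remains available alongside the enhanced copy.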

20 Claims

1. A method comprising:
- labeling, via a processor, audio speech files in a primary speech database, to yield labeled audio speech files;
  
  identifying segments in the labeled audio speech files that have varying pronunciations within a language, to yield identified segments, wherein the identified segments comprise at least one of phones, half-phones, half-phonemes, demi-syllables, and polyphones;
  
  creating modified segments by modifying the identified segments in the primary speech database using selected mappings to an offline secondary speech database in the language of the primary speech database;
  
  enhancing the primary speech database by substituting the modified segments for the identified segments in the primary speech database, to yield an enhanced primary speech database; and
  
  storing the enhanced primary speech database for use in speech synthesis.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The method of claim 1, wherein the segments are identified as a result of at least one of dialect differences, geographic language differences, regional language differences, accent differences, national language differences, idiosyncratic speech differences, and database coverage differences.
  - 3. The method of claim 1, wherein the identified segments are one of syllables, diphones, triphones, and phonemes.
  - 4. The method of claim 1, further comprising:
    - identifying boundaries of the identified segments.
  - 5. The method of claim 1, wherein the enhanced primary speech database comprises the modified speech database segments and the identified segments from the primary speech database.
  - 6. The method of claim 1, further comprising:
    - converting the primary speech database to harmonic plus noise model parameters, the harmonic plus noise model parameters having a harmonic component and a noise component;
      
      modifying the noise component of the harmonic plus noise model parameters; and
      
      storing the modified harmonic plus noise model parameters in the enhanced primary speech database.
  - 7. The method of claim 6, wherein the noise components are represented by autoregression coefficients.
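Claims 6 and 7 describe a harmonic plus noise representation whose noise component is represented by autoregression coefficients. As a rough illustration only, the sketch below estimates autoregressive (AR) coefficients for a noise frame with the autocorrelation method and the Levinson-Durbin recursion, then swaps the noise component of a frame. The claims do not say how the coefficients are computed or modified, so the estimation method, the `HnmFrame` record, and the `with_noise` helper are all assumptions made for this example.

```python
import random
from dataclasses import dataclass, replace


def autocorrelation(x, max_lag):
    """Biased (unnormalized) autocorrelation r[0..max_lag] of a sample list."""
    n = len(x)
    return [sum(x[t] * x[t - k] for t in range(k, n)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations by the Levinson-Durbin recursion.

    Returns (a, err), where a[0] == 1 and a holds the coefficients of the
    prediction-error filter A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order,
    and err is the final prediction-error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this order
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


@dataclass(frozen=True)
class HnmFrame:
    """Hypothetical frame: harmonic amplitudes plus AR-modeled noise."""
    harmonic_amplitudes: tuple
    noise_ar: tuple    # AR coefficients a[1..p] of the noise component
    noise_gain: float  # prediction-error energy of the noise component


def with_noise(frame, ar, gain):
    """Return a copy of the frame with its noise component replaced."""
    return replace(frame, noise_ar=tuple(ar), noise_gain=gain)


# Synthetic noise frame: x[t] = 0.9 * x[t-1] + e[t], so A(z) is about 1 - 0.9 z^-1.
rng = random.Random(0)
x, prev = [], 0.0
for _ in range(5000):
    prev = 0.9 * prev + rng.gauss(0.0, 1.0)
    x.append(prev)

a, err = levinson_durbin(autocorrelation(x, 2), 2)
frame = HnmFrame(harmonic_amplitudes=(1.0, 0.5), noise_ar=(0.0, 0.0), noise_gain=1.0)
modified = with_noise(frame, a[1:], err)
```

Only the noise fields change in `modified`; the harmonic amplitudes are carried over untouched, which mirrors the claim's focus on modifying the noise component.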

8. A non-transitory computer-readable storage medium having stored instructions which, when executed by a computing device, cause the computing device to perform a method comprising:
- labeling audio speech files in a primary speech database, to yield labeled audio speech files;
  
  identifying segments in the labeled audio speech files that have varying pronunciations within a same language, to yield identified segments, wherein the identified segments comprise at least one of phones, half-phones, half-phonemes, demi-syllables, and polyphones;
  
  creating modified segments by modifying the identified segments in the primary speech database using selected mappings to an offline secondary speech database in the language of the primary speech database;
  
  enhancing the primary speech database by substituting the modified segments for the identified segments in the primary speech database, to yield an enhanced primary speech database; and
  
  storing the enhanced primary speech database for use in speech synthesis.
- Dependent claims (9, 10, 11, 12, 13, 14):
  - 9. The non-transitory computer-readable storage medium of claim 8, wherein the identified segments are identified as a result of at least one of dialect differences, geographic language differences, regional language differences, accent differences, national language differences, idiosyncratic speech differences, and database coverage differences.
  - 10. The non-transitory computer-readable storage medium of claim 8, wherein the identified segments are one of syllables, diphones, triphones, and phonemes.
  - 11. The non-transitory computer-readable storage medium of claim 8, the non-transitory computer-readable storage medium having additional instructions stored which result in the method further comprising:
    - identifying boundaries of the identified segments.
  - 12. The non-transitory computer-readable storage medium of claim 8, wherein the enhanced primary speech database comprises the modified segments and the identified segments from the primary speech database.
  - 13. The non-transitory computer-readable storage medium of claim 8, the non-transitory computer-readable storage medium having additional instructions stored which result in the method further comprising:
    - converting the primary speech database to harmonic plus noise model parameters, the harmonic plus noise model parameters having a harmonic component and a noise component;
      
      modifying the noise component of the harmonic plus noise model parameters; and
      
      storing the modified harmonic plus noise model parameters in the enhanced primary speech database.
  - 14. The non-transitory computer-readable storage medium of claim 13, wherein the noise components are represented by autoregression coefficients.

15. A system that enhances a speech database for speech synthesis, comprising:
- a processor;
  
  a primary speech database in a language; and
  
  a computer-readable medium to store instructions which, when executed by the processor, perform a method comprising:
  
  labeling audio speech files in the primary speech database, to yield labeled audio speech files;
  
  identifying segments in the labeled audio speech files that have varying pronunciations within the language, to yield identified segments, wherein the identified segments comprise at least one of phones, half-phones, half-phonemes, demi-syllables, and polyphones;
  
  creating modified segments by modifying the identified segments in the primary speech database using selected mappings to an offline secondary speech database in the language of the primary speech database, to yield modified segments;
  
  enhancing the primary speech database by substituting the modified segments for the identified segments in the primary speech database, to yield an enhanced primary speech database; and
  
  storing the enhanced primary speech database for use in speech synthesis.
- Dependent claims (16, 17, 18, 19, 20):
  - 16. The system of claim 15, wherein the segments are identified as a result of at least one of dialect differences, geographic language differences, regional language differences, accent differences, national language differences, idiosyncratic speech differences, and database coverage differences.
  - 17. The system of claim 15, wherein the identified segments are one of syllables, diphones, triphones, and phonemes.
  - 18. The system of claim 15, the computer-readable storage medium having additional instructions stored which result in the method further comprising identifying boundaries of the identified segments.
  - 19. The system of claim 15, wherein the enhanced primary speech database comprises the modified speech database segments and the corresponding identified segments from the primary speech database.
  - 20. The system of claim 15, the computer-readable storage medium having additional instructions stored which result in the method further comprising converting the primary speech database to harmonic plus noise model parameters, the harmonic plus noise model parameters having a harmonic component and a noise component, modifying the noise component of the harmonic plus noise model parameters, and storing the modified harmonic plus noise model parameters in the primary speech database.
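
The claims name several unit types (phones, half-phones, diphones, and others) and, in claims 4, 11, and 18, the identification of segment boundaries. As a hedged illustration of how such units relate to one another, the Python sketch below derives half-phone and diphone boundaries from a labeled phone sequence. It assumes the common conventions that a half-phone splits a phone at its temporal midpoint and that a diphone runs from the midpoint of one phone to the midpoint of the next; the claims do not prescribe these conventions, and the function names and tuple layout are hypothetical.

```python
def half_phones(phones):
    """Split each (label, start, end) phone at its midpoint into left and right halves."""
    units = []
    for label, start, end in phones:
        mid = (start + end) / 2
        units.append((label + "_L", start, mid))
        units.append((label + "_R", mid, end))
    return units


def diphones(phones):
    """Build diphones spanning from one phone's midpoint to the next phone's midpoint."""
    units = []
    for (l1, s1, e1), (l2, s2, e2) in zip(phones, phones[1:]):
        units.append((l1 + "-" + l2, (s1 + e1) / 2, (s2 + e2) / 2))
    return units


# Labeled phones with boundaries in seconds (illustrative values only).
phones = [("k", 0.0, 0.5), ("ae", 0.5, 1.0), ("t", 1.0, 2.0)]
halves = half_phones(phones)
pairs = diphones(phones)
```

A sequence of n phones yields 2n half-phones and n - 1 diphones under these conventions.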

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: AT&T Intellectual Property I LP (AT&T, Inc.)
Inventors: Conkie, Alistair; Syrdal, Ann
Primary Examiner(s): GUERRA-ERAZO, EDGAR X
Application Number: US11/469,129
Time in Patent Office: 2,539 Days
Field of Search: 704/258, 704/260, 704/267, 704/268, 704/266, 704/278, 704/261, 704/270, 704/270.1, 704/275
US Class Current: 704/258
CPC Class Codes:
G10L 13/06 Elementary speech units use...
G10L 2021/0135 Voice conversion or morphing
