Method and system for enhancing a speech database

US 8,510,113 B1
Filed: 08/31/2006
Issued: 08/13/2013
Est. Priority Date: 08/31/2006
Status: Active Grant

8 Assignments

0 Petitions

Abstract

A system, method, and computer-readable medium for enhancing a speech database for speech synthesis are disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, identifying replacement segments in a secondary speech database, enhancing the primary speech database by substituting the identified secondary speech database segments for the corresponding identified segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.
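
Illustrative sketch (not part of the patent text): the steps named in the abstract can be pictured as a lookup-and-substitute pass over two labeled segment lists. Everything here is an assumption made for illustration: the `Segment` fields, matching on `(label, context)`, the `needs_replacement` predicate, and JSON as the storage format. The record does not specify any of them.

```python
from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)
class Segment:
    label: str      # phonetic label (hypothetical labeling scheme)
    context: str    # neighbouring-phone context, used here as a match key
    source: str     # which database the recording came from
    audio_ref: str  # pointer to the audio, not the samples themselves


def enhance(primary, secondary, needs_replacement):
    """Return a copy of `primary` in which flagged segments are swapped for
    secondary segments that share the same label and context."""
    # Index the secondary database by (label, context) for direct lookup.
    candidates = {(s.label, s.context): s for s in secondary}
    enhanced = []
    for seg in primary:
        replacement = candidates.get((seg.label, seg.context))
        if needs_replacement(seg) and replacement is not None:
            enhanced.append(replacement)
        else:
            enhanced.append(seg)  # keep the original when nothing matches
    return enhanced


def store(db, path):
    """Persist the enhanced database as JSON for later synthesis runs."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump([asdict(s) for s in db], fh, indent=2)


primary = [
    Segment("t", "a_t_a", "primary", "p/001.wav"),
    Segment("a", "t_a_#", "primary", "p/002.wav"),
]
secondary = [Segment("t", "a_t_a", "secondary", "s/104.wav")]

# Flag every "t" in the primary database as not meeting the need.
enhanced = enhance(primary, secondary, lambda seg: seg.label == "t")
print([s.source for s in enhanced])  # ['secondary', 'primary']
```

A real unit-selection database would match on far richer features than a label and a context string; the sketch only shows the shape of the substitution step.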

85 Citations

20 Claims

1. A method comprising:
- identifying, as part of a text-to-speech process, a primary speech database associated with a single language;
- identifying primary speech segments in the primary speech database which do not meet a need of the text-to-speech process, wherein the primary speech segments comprise at least one of half-phones, half-phonemes, demi-syllables, and polyphones;
- identifying replacement speech segments which satisfy the need in a secondary speech database of the single language; and
- enhancing the primary speech database by substituting, in the primary database, the primary speech segments with the replacement speech segments.

Dependent claims (2, 3, 4, 5, 6, 7):
- 2. The method of claim 1, wherein the need is based on at least one of dialect differences, geographic language differences, regional language differences, accent differences, national language differences, idiosyncratic speech differences, and database coverage differences.
- 3. The method of claim 1, wherein the primary speech segments are one of syllables, diphones, triphones, and phonemes.
- 4. The method of claim 1, further comprising:
  - identifying boundaries of the primary speech segments.
- 5. The method of claim 1, wherein post-enhancement, the primary speech database comprises the replacement speech segments and the identified primary segments.
- 6. The method of claim 1, wherein the primary speech database comprises voice recordings in a first dialect, and the secondary speech database comprises voice recordings in a second dialect, wherein the first dialect and the second dialect differ by at least one of dialect differences, geographic language differences, regional language differences, accent differences, national language differences, idiosyncratic speech differences, and database coverage differences.
- 7. The method of claim 1, wherein the primary speech segments are identified based on at least one of obstruents and nasals.
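
Illustrative sketch (not part of the patent text): claim 7 selects segments by phone class (obstruents and nasals), and claim 5 describes a database that holds both the replacement segments and the identified primary segments after enhancement. The example below shows one way these two ideas could combine. The ARPAbet-style phone sets, the dictionary layout, and the `keep_originals` flag are assumptions; the claims do not define them.

```python
# Hypothetical ARPAbet-style phone classes; the claims do not enumerate them.
NASALS = {"m", "n", "ng"}
OBSTRUENTS = {"p", "b", "t", "d", "k", "g", "f", "v", "th", "dh",
              "s", "z", "sh", "zh", "hh", "ch", "jh"}


def is_target(label):
    """True for phones in the classes named by claim 7."""
    return label in OBSTRUENTS or label in NASALS


def merge(primary, secondary, keep_originals=True):
    """Both arguments map a phone label to a list of recording ids.

    With keep_originals=True the result holds the original and the
    replacement units for each targeted phone (one reading of claim 5);
    otherwise targeted originals that have a replacement are dropped.
    """
    merged = {}
    for label, units in primary.items():
        replacements = secondary.get(label, []) if is_target(label) else []
        if replacements and not keep_originals:
            merged[label] = list(replacements)
        else:
            merged[label] = list(units) + list(replacements)
    return merged


primary = {"t": ["p_t1"], "n": ["p_n1"], "aa": ["p_aa1"]}
secondary = {"t": ["s_t1", "s_t2"], "aa": ["s_aa1"]}

both = merge(primary, secondary)
swapped = merge(primary, secondary, keep_originals=False)
print(both["t"], swapped["t"], swapped["aa"])
# ['p_t1', 's_t1', 's_t2'] ['s_t1', 's_t2'] ['p_aa1']
```

The vowel `aa` is left untouched even though the secondary database offers a candidate, because it falls outside the targeted classes. The nasal `n` keeps its original unit because no replacement is available.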

8. A non-transitory computer-readable storage medium having stored therein instructions which, when executed by a processor, cause the processor to perform a method comprising:
- identifying, as part of a text-to-speech process, a primary speech database associated with a single language;
- identifying primary speech segments in the primary speech database which do not meet a need of the text-to-speech process, wherein the primary speech segments comprise at least one of half-phones, half-phonemes, demi-syllables, and polyphones;
- identifying replacement speech segments which satisfy the need in a secondary speech database of the single language; and
- enhancing the primary speech database by substituting, in the primary database, the primary speech segments with the replacement speech segments.

Dependent claims (9, 10, 11, 12, 13, 14):
- 9. The non-transitory computer-readable storage medium of claim 8, wherein the need is based on at least one of dialect differences, geographic language differences, regional language differences, accent differences, national language differences, idiosyncratic speech differences, and database coverage differences.
- 10. The non-transitory computer-readable storage medium of claim 8, wherein the primary speech segments are one of syllables, diphones, triphones, and phonemes.
- 11. The non-transitory computer-readable storage medium of claim 8, the non-transitory computer-readable storage medium storing additional instructions which result in the method further comprising:
  - identifying boundaries of the primary speech segments.
- 12. The non-transitory computer-readable storage medium of claim 8, wherein post-enhancement, the primary speech database comprises the replacement speech segments and the primary speech segments.
- 13. The non-transitory computer-readable storage medium of claim 8, wherein the primary speech database comprises voice recordings in a first dialect, and the secondary speech database comprises voice recordings in a second dialect, wherein the first dialect and the second dialect differ by at least one of dialect differences, geographic language differences, regional language differences, accent differences, national language differences, idiosyncratic speech differences, and database coverage differences.
- 14. The non-transitory computer-readable storage medium of claim 8, wherein the primary speech segments are identified based on at least one of obstruents and nasals.

15. A system comprising:
- a processor; and
- a computer-readable medium having stored therein instructions which, when executed by the processor, cause the processor to perform a method comprising:
  - identifying, as part of a text-to-speech process, a primary speech database associated with a single language;
  - identifying primary speech segments in the primary speech database which do not meet a need of the text-to-speech process, wherein the primary speech segments comprise at least one of half-phones, half-phonemes, demi-syllables, and polyphones;
  - identifying replacement speech segments which satisfy the need in a secondary speech database of the single language; and
  - enhancing the primary speech database by substituting, in the primary database, the primary speech segments with the replacement speech segments.

Dependent claims (16, 17, 18, 19, 20):
- 16. The system of claim 15, wherein the primary speech segments are identified based on at least one of dialect differences, geographic language differences, regional language differences, accent differences, national language differences, idiosyncratic speech differences, and database coverage differences.
- 17. The system of claim 15, wherein the primary speech segments are one of syllables, diphones, triphones, and phonemes.
- 18. The system of claim 15, the computer-readable medium storing additional instructions which result in the method further comprising identifying boundaries of the identified primary speech segments.
- 19. The system of claim 15, the computer-readable medium storing additional instructions which result in the method further comprising storing the primary speech database, post enhancement, for use in future unit selection concatenative speech synthesis.
- 20. The system of claim 15, wherein the primary speech database comprises voice recordings in a first dialect, and the secondary speech database comprises voice recordings in a second dialect, wherein the first dialect and the second dialect differ by at least one of dialect differences, geographic language differences, regional language differences, accent differences, national language differences, idiosyncratic speech differences, and database coverage differences.
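
Illustrative sketch (not part of the patent text): claims 4, 11, and 18 recite identifying segment boundaries, and the independent claims name half-phones among the segment types. The example below derives half-phone boundaries from a phone-level alignment. Splitting each phone at its temporal midpoint, the `(label, start_ms, end_ms)` tuple layout, and the `_L`/`_R` suffixes are simplifying assumptions; the claims do not say how boundaries are located.

```python
def half_phone_boundaries(alignment):
    """Split each (label, start_ms, end_ms) phone into two half-phones.

    The split point is the temporal midpoint, an assumption made for
    illustration only.
    """
    halves = []
    for label, start, end in alignment:
        mid = (start + end) / 2
        halves.append((label + "_L", start, mid))  # left half-phone
        halves.append((label + "_R", mid, end))    # right half-phone
    return halves


# Hypothetical alignment for one word, times in milliseconds.
alignment = [("k", 0, 80), ("ae", 80, 200), ("t", 200, 300)]
for unit in half_phone_boundaries(alignment):
    print(unit)
```

Each phone yields two units, so the three-phone alignment above produces six half-phone entries whose boundaries tile the original time span without gaps.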

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: AT&T Intellectual Property II LP (AT&T, Inc.)
Inventors: Conkie, Alistair; Syrdal, Ann K.
Primary Examiner(s): GUERRA-ERAZO, EDGAR X
Application Number: US11/469,134
Time in Patent Office: 2,539 Days
Field of Search: 704/258, 704/260, 704/267, 704/268, 704/266, 704/278, 704/261, 704/270, 704/270.1, 704/275
US Class Current: 704/258

CPC Class Codes
G10L 13/00   Speech synthesis; Text to s...
G10L 13/02   Methods for producing synth...
G10L 13/06   Elementary speech units use...
G10L 13/08   Text analysis or generation...
