Method and system for enhancing a speech database

US 9,218,803 B2
Filed: 03/04/2015
Issued: 12/22/2015
Est. Priority Date: 08/31/2006
Status: Expired due to Fees

Abstract

A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, identifying replacement segments in a secondary speech database, enhancing the primary speech database by substituting the identified secondary speech database segments for the corresponding identified segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.
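
The substitution step described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Segment` record, its fields, the `enhance` function, and the idea of passing in a set of flagged unit labels are all assumptions made for the example. The record above does not say how segments are stored or matched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One labeled unit cut from a recording (hypothetical record layout)."""
    unit: str       # unit label, e.g. a phoneme or diphone name
    source: str     # which database the audio came from
    audio_ref: str  # pointer to the audio samples


def enhance(primary, secondary, flagged_units):
    """Return a new segment list in which each primary segment whose unit
    label is in `flagged_units` is swapped for a secondary segment with
    the same label. Flagged segments with no match are kept unchanged."""
    # Index the secondary database by unit label (first occurrence wins).
    replacements = {}
    for seg in secondary:
        replacements.setdefault(seg.unit, seg)

    enhanced = []
    for seg in primary:
        if seg.unit in flagged_units and seg.unit in replacements:
            enhanced.append(replacements[seg.unit])
        else:
            enhanced.append(seg)
    return enhanced
```

The function builds a new list rather than editing `primary` in place, so the original database stays available for comparison. Storing the result would be the final step the abstract mentions.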

20 Claims

1. A method comprising:
- receiving, on a device having a processor, text from a user for conversion to speech via a text-to-speech process;
  
  identifying, via the processor, a primary speech segment in a primary speech database which does not meet a need of the text-to-speech process;
  
  identifying, via the processor, a replacement speech segment which satisfies the need in a secondary speech database; and
  
  adding replacement speech segment to the primary database such that the primary database meets the need of the text-to-speech process.
- Dependent claims (2, 3, 4, 5, 6, 7, 8):
  - 2. The method of claim 1, further comprising generating, via the processor, speech using the replacement speech segment from the primary speech database.
  - 3. The method of claim 1, wherein the primary speech database has been further modified by identifying boundaries of the primary speech segment.
  - 4. The method of claim 3, wherein phone boundaries of the primary speech segment are identified using a zero-crossing calculation.
  - 5. The method of claim 1, wherein the need is based on one of dialect differences, geographic language differences, regional language differences, accent differences, national language differences, idiosyncratic speech differences, and database coverage differences.
  - 6. The method of claim 1, wherein the primary speech segment comprises one of diphones, triphones, and phonemes.
  - 7. The method of claim 1, wherein the primary speech database comprises first voice recordings in a first dialect, and the secondary speech database comprises second voice recordings in a second dialect, wherein the first dialect and the second dialect differ by one of dialect differences, geographic language differences, regional language differences, accent differences, national language differences, idiosyncratic speech differences, and database coverage differences.
  - 8. The method of claim 1, wherein the primary speech segment is identified based on one of obstruents and nasals.
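
Claim 4 names a zero-crossing calculation for phone boundaries without giving details. One plausible reading, sketched below as an assumption, is to move a rough boundary to the nearest point where the waveform changes sign. Cutting at such points is a common way to avoid clicks when audio segments are joined. The function names and the plain list of samples are hypothetical.

```python
def zero_crossings(samples):
    """Indices i where the sign changes between samples[i - 1] and samples[i]."""
    return [
        i for i in range(1, len(samples))
        if (samples[i - 1] < 0) != (samples[i] < 0)
    ]


def snap_to_zero_crossing(samples, boundary):
    """Move a rough boundary index to the nearest zero crossing.

    Returns the boundary unchanged if the signal never changes sign.
    """
    crossings = zero_crossings(samples)
    if not crossings:
        return boundary
    return min(crossings, key=lambda i: abs(i - boundary))
```

A real labeling tool would likely limit the search to a small window around the rough boundary so that a phone label cannot drift far from its original position. That refinement is omitted here.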

9. A system comprising:
- a processor; and
  
  a computer-readable storage medium having instructions stored which, when executed by the processor, result in the processor performing operations comprising:
  
  receiving, on the system, text from a user for conversion to speech via a text-to-speech process;
  
  identifying a primary speech segment in a primary speech database which does not meet a need of the text-to-speech process;
  
  identifying a replacement speech segment which satisfies the need in a secondary speech database; and
  
  adding replacement speech segment to the primary database such that the primary database meets the need of the text-to-speech process.
- Dependent claims (10, 11, 12, 13, 14, 15, 16):
  - 10. The system of claim 9, the computer-readable storage medium having additional instructions stored which, when executed by the processor, result in operations comprising generating speech using the replacement speech segment from the primary speech database.
  - 11. The system of claim 9, wherein the primary speech database has been further modified by identifying boundaries of the primary speech segment.
  - 12. The system of claim 11, wherein phone boundaries of the primary speech segment are identified using a zero-crossing calculation.
  - 13. The system of claim 9, wherein the need is based on one of dialect differences, geographic language differences, regional language differences, accent differences, national language differences, idiosyncratic speech differences, and database coverage differences.
  - 14. The system of claim 9, wherein the primary speech segment comprises one of diphones, triphones, and phonemes.
  - 15. The system of claim 9, wherein the primary speech database comprises first voice recordings in a first dialect, and the secondary speech database comprises second voice recordings in a second dialect, wherein the first dialect and the second dialect differ by one of dialect differences, geographic language differences, regional language differences, accent differences, national language differences, idiosyncratic speech differences, and database coverage differences.
  - 16. The system of claim 9, wherein the primary speech segment is identified based on one of obstruents and nasals.
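
Claims 8 and 16 say the primary speech segment is identified based on obstruents or nasals. A minimal sketch of that selection, assuming the database is labeled with ARPAbet phone symbols (the record does not name a label set), might look like this. The constant and function names are hypothetical.

```python
# ARPAbet consonant classes. Vowels carry stress digits (e.g. "AE1");
# consonants do not, so a plain set lookup is enough here.
OBSTRUENTS = {
    "P", "B", "T", "D", "K", "G",                       # stops
    "CH", "JH",                                         # affricates
    "F", "V", "TH", "DH", "S", "Z", "SH", "ZH", "HH",   # fricatives
}
NASALS = {"M", "N", "NG"}


def candidate_indices(phones):
    """Return positions of phones that are obstruents or nasals."""
    return [i for i, p in enumerate(phones) if p in OBSTRUENTS or p in NASALS]
```

Only the returned positions would then be checked against the secondary database. Vowels, liquids, and glides are skipped.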

17. A computer-readable storage device having instructions stored which, when executed by the computing device, result in the computing device performing operations comprising:
- receiving, on the computing device, text from a user for conversion to speech via a text-to-speech process;
  
  identifying a primary speech segment in a primary speech database which does not meet a need of a text-to-speech process;
  
  identifying a replacement speech segment which satisfies the need in a secondary speech database; and
  
  adding replacement speech segment to the primary database such that the primary database meets the need of the text-to-speech process.
- Dependent claims (18, 19, 20):
  - 18. The computer-readable storage device of claim 17, having additional instructions stored which, when executed by the computing device, result in operations comprising generating, via the computing device, speech using the replacement speech segment from the primary speech database.
  - 19. The computer-readable storage device of claim 17, wherein the primary speech database has been further modified by identifying boundaries of the primary speech segment.
  - 20. The computer-readable storage device of claim 19, wherein phone boundaries of the primary speech segment are identified using a zero-crossing calculation.

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: AT&T Intellectual Property I LP (AT&T, Inc.)
Inventors: Conkie, Alistair D.; Syrdal, Ann K.
Primary Examiner(s): GUERRA-ERAZO, EDGAR X
Application Number: US14/638,038
Publication Number: US 20150179162A1
Time in Patent Office: 293 Days
Field of Search: 704/258, 704/260, 704/261, 704/267, 704/268, 704/266, 704/278, 704/270, 704/270.1, 704/275
US Class Current: 1/1
CPC Class Codes:
G10L 13/00   Speech synthesis; Text to s...
G10L 13/02   Methods for producing synth...
G10L 13/06   Elementary speech units use...
G10L 13/08   Text analysis or generation...
