Method, apparatus and computer program providing a multi-speaker database for concatenative text-to-speech synthesis

US 20060229876A1
Filed: 04/07/2005
Published: 10/12/2006
Est. Priority Date: 04/07/2005
Status: Active Grant

Abstract

A method, apparatus and a computer program product to generate an audible speech word that corresponds to text. The method includes providing a text word and, in response to the text word, processing pre-recorded speech segments that are derived from a plurality of speakers to selectively concatenate together speech segments based on at least one cost function to form audio data for generating an audible speech word that corresponds to the text word. A data structure is also provided for use in a concatenative text-to-speech system that includes a plurality of speech segments derived from a plurality of speakers, where each speech segment includes an associated attribute vector each of which is comprised of at least one attribute vector element that identifies the speaker from which the speech segment was derived.

19 Claims

1. A method to generate an audible speech word that corresponds to text, comprising:
- providing a text word; and
- in response to the text word, processing pre-recorded speech segments that are derived from a plurality of speakers to selectively concatenate together speech segments based on at least one cost function to form audio data for generating an audible speech word that corresponds to the text word.
- Dependent claims (2, 3, 4, 5, 6):
  - 2. A method as in claim 1, where each pre-recorded speech segment comprises an attribute vector, and each attribute vector comprises a vector element that identifies the speaker from which the speech segment was derived.
  - 3. A method as in claim 1, where each attribute vector further comprises another vector element that identifies a style of speech from which the speech segment was derived.
  - 4. A method as in claim 1, where said speech segments are pre-recorded by a process that comprises designating one speaker as a target speaker, examining an input speech segment to determine if it is similar to a corresponding speech segment of the target speaker and, if it is not, modifying at least one characteristic of the input speech segment so as to make it more similar to the corresponding speech segment of the target speaker.
  - 5. A method as in claim 4, where modifying comprises altering at least one of a temporal or a spectral characteristic of the input speech segment.
  - 6. A method as in claim 1, where a speech segment comprises at least one of a phoneme, a syllable, and a word.
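Claim 1 leaves the cost function open. As an illustration only, the following minimal Python sketch shows one way segments from several speakers could be selected and ordered by a dynamic-programming search over a target cost plus a join cost. The `Segment` fields, the pitch-based costs, and the speaker-switch penalty are all assumptions made for this example and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Segment:
    unit: str      # label of the unit this segment realizes (hypothetical)
    speaker: str   # speaker the segment was recorded from
    pitch: float   # illustrative acoustic feature in Hz (assumption)


def target_cost(seg: Segment, target_pitch: float) -> float:
    """Assumed target cost: distance of the segment pitch from a desired pitch."""
    return abs(seg.pitch - target_pitch) / 10.0


def join_cost(prev: Segment, cur: Segment, switch_penalty: float = 5.0) -> float:
    """Assumed join cost: pitch mismatch plus a penalty for changing speaker."""
    cost = abs(prev.pitch - cur.pitch) / 10.0
    if prev.speaker != cur.speaker:
        cost += switch_penalty
    return cost


def select_segments(units: List[str], database: List[Segment],
                    target_pitch: float) -> Tuple[List[Segment], float]:
    """Return the lowest-cost segment sequence for `units` and its total cost."""
    candidates = [[s for s in database if s.unit == u] for u in units]
    if not units or any(not c for c in candidates):
        raise ValueError("every unit needs at least one candidate segment")

    # best[i][j] = (cheapest cost of a path ending at candidate j, backpointer)
    best = [[(target_cost(s, target_pitch), None) for s in candidates[0]]]
    for i in range(1, len(units)):
        row = []
        for cur in candidates[i]:
            cost, k = min(
                (best[i - 1][k][0] + join_cost(prev, cur), k)
                for k, prev in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost(cur, target_pitch), k))
        best.append(row)

    # Backtrack from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(units) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    path.reverse()
    return path, total
```

With a penalty on speaker changes, the search still crosses speakers when another speaker's segment fits the target well enough to outweigh the penalty, which is the practical point of drawing on a multi-speaker database.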

7. A concatenative text-to-speech system comprising:
- a data processor coupled to a memory that stores a database of speech segments derived from a plurality of speakers, said data processor being responsive to an input text word to selectively concatenate together speech segments from said database based on at least one cost function to form audio data for generating an audible speech word that corresponds to the text word.
- Dependent claims (8, 9, 10, 11, 12):
  - 8. A concatenative text-to-speech system as in claim 7, where each pre-recorded speech segment comprises an attribute vector, and each attribute vector comprises a vector element that identifies the speaker from which the speech segment was derived.
  - 9. A concatenative text-to-speech system as in claim 7, where each attribute vector further comprises another vector element that identifies a style of speech from which the speech segment was derived.
  - 10. A concatenative text-to-speech system as in claim 7, where said speech segments are pre-recorded by a process that comprises designating one speaker as a target speaker, examining an input speech segment to determine if it is similar to a corresponding speech segment of the target speaker and, if it is not, modifying at least one characteristic of the input speech segment so as to make it more similar to the corresponding speech segment of the target speaker.
  - 11. A concatenative text-to-speech system as in claim 10, where said system modifies a speech segment by using at least one of a temporal or a spectral characteristic of the input speech segment.
  - 12. A concatenative text-to-speech system as in claim 7, where a speech segment comprises at least one of a phoneme, a syllable, and a word.

13. A data structure embodied in a computer readable medium for use in a concatenative text-to-speech system, comprising a plurality of speech segments derived from a plurality of speakers, where each speech segment includes an associated attribute vector each of which is comprised of at least one attribute vector element that identifies the speaker from which the speech segment was derived.
- Dependent claims (14, 15, 16, 17):
  - 14. A data structure as in claim 13, where each attribute vector is further comprised of a style element.
  - 15. A data structure as in claim 13, where at least some speech segments are derived from a speaker by sampling, digitizing and partitioning spoken words into word units.
  - 16. A data structure as in claim 15, where a word unit comprises at least one of a phoneme, a syllable, and a word.
  - 17. A data structure as in claim 15, where at least some speech segments are derived from a speaker by sampling, digitizing, processing the digitized speech samples, and partitioning the processed speech samples into word units.
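The data structure of claims 13 to 17 can be pictured with a short Python sketch: each segment carries an attribute vector holding a speaker identifier and an optional style element, and segments are cut from already digitized samples at given unit boundaries. The class names, field names, and the `partition` helper are hypothetical; the patent does not prescribe a layout, and the boundaries are assumed to come from some earlier alignment step.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class AttributeVector:
    speaker_id: str          # element identifying the source speaker
    style: str = "neutral"   # optional style element (assumed default)


@dataclass(frozen=True)
class SpeechSegment:
    unit: str                    # phoneme, syllable, or word label
    samples: Tuple[int, ...]     # digitized audio samples for this unit
    attributes: AttributeVector  # attribute vector attached to the segment


def partition(samples: Sequence[int],
              boundaries: List[Tuple[str, int, int]],
              speaker_id: str,
              style: str = "neutral") -> List[SpeechSegment]:
    """Cut digitized samples into word units at (label, start, end) boundaries."""
    attrs = AttributeVector(speaker_id, style)
    return [SpeechSegment(label, tuple(samples[start:end]), attrs)
            for label, start, end in boundaries]
```

Because every segment keeps its speaker identifier, a selection stage can filter on it or weigh it in a cost function without needing a separate database per speaker.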

18. A program storage device readable by machine and tangibly embodying a program of instructions executable by the machine to operate a concatenative text-to-speech apparatus, comprising operations of:
- responsive to a text word, processing pre-recorded speech segments that are derived from a plurality of speakers to selectively concatenate together speech segments based on at least one cost function; and
- forming audio data for generating an audible speech word that corresponds to the text word, where each pre-recorded speech segment comprises an attribute vector, and each attribute vector comprises a vector element that identifies the speaker from which the speech segment was derived.
- Dependent claims (19):
  - 19. A program storage device as in claim 18, where said speech segments are pre-recorded by operations that comprise designating one speaker as a target speaker, examining an input speech segment to determine if it is similar to a corresponding speech segment of the target speaker and, if it is not, modifying at least one temporal or spectral characteristic of the input speech segment so as to make it more similar to the corresponding speech segment of the target speaker.
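The target-speaker step recited in claims 4, 10 and 19 (examine a segment, and modify it only if it is not similar to the target speaker's corresponding segment) might be sketched as follows. This example is an assumption throughout: it uses duration as the only similarity measure and naive linear-interpolation resampling as the temporal modification. Such resampling also shifts pitch, so it stands in for, and is not equivalent to, a proper time-scale modification method.

```python
from typing import List, Sequence


def resample_linear(samples: Sequence[float], new_len: int) -> List[float]:
    """Stretch or compress `samples` to `new_len` points by linear interpolation."""
    if new_len < 2 or len(samples) < 2:
        raise ValueError("need at least two input and two output samples")
    step = (len(samples) - 1) / (new_len - 1)
    out = []
    for i in range(new_len):
        pos = i * step
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


def normalize_toward_target(segment: Sequence[float],
                            target: Sequence[float],
                            tolerance: float = 0.1) -> List[float]:
    """Leave the segment alone if its duration is within `tolerance` of the
    target speaker's segment; otherwise resample it to the target duration."""
    mismatch = abs(len(segment) - len(target)) / len(target)
    if mismatch <= tolerance:
        return list(segment)  # similar enough: no modification
    return resample_linear(segment, len(target))
```

The tolerance value and the choice of duration as the examined characteristic are placeholders; a spectral characteristic could be compared and altered in the same examine-then-modify pattern.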

Current Assignee: Cerence Inc., Cerence Operating Company (Cerence Inc.)
Original Assignee: International Business Machines Corporation
Inventors: Eide, Ellen M.; Aaron, Andrew S.; Shuang, Zhi Wei; Picheny, Michael A.; Hamza, Wael M.; Smith, Maria E.; Rutherfoord, Charles T.

Granted Patent: US 7,716,052 B2
US Class Current: 704/263
CPC Class Codes:
G10L 13/07 Concatenation rules
G10L 2021/0135 Voice conversion or morphing
