Autonomous system and method for creating readable scripts for concatenative text-to-speech synthesis (TTS) corpora

US 8,155,963 B2
Filed: 01/17/2006
Issued: 04/10/2012
Est. Priority Date: 01/17/2006
Status: Active Grant

8 Assignments

0 Petitions

Abstract

A method (and system) which autonomously generates a cohesive script from a text database for creating a speech corpus for concatenative text-to-speech, and more particularly, which generates cohesive scripts having fluency and natural prosody that can be used to generate compact text-to-speech recordings that cover a plurality of phonetic events.
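The abstract's aim of covering "a plurality of phonetic events" with compact recordings can be made concrete with a simple coverage measure. The sketch below assumes diphones as the phonetic events and computes what fraction of a target diphone set a script's words produce. The pronunciation dictionary, the ARPAbet-style symbols, and the function names are hypothetical, and the patent does not define this metric.

```python
from typing import Dict, Iterable, List, Set, Tuple

Diphone = Tuple[str, str]


def diphones(phones: List[str]) -> Set[Diphone]:
    """Return the set of adjacent phone pairs (diphones) in one pronunciation."""
    return set(zip(phones, phones[1:]))


def diphone_coverage(script_words: Iterable[str],
                     pron: Dict[str, List[str]],
                     targets: Set[Diphone]) -> float:
    """Fraction of target diphones produced by reading the script's words.

    Simplifications (assumptions): words missing from the pronunciation
    guide are skipped, and diphones spanning word boundaries are ignored.
    """
    found: Set[Diphone] = set()
    for word in script_words:
        found |= diphones(pron.get(word.lower(), []))
    return len(found & targets) / len(targets) if targets else 1.0


# Hypothetical toy pronunciation guide (ARPAbet-like symbols).
PRON = {
    "cat": ["K", "AE", "T"],
    "sat": ["S", "AE", "T"],
    "mat": ["M", "AE", "T"],
}
TARGETS = {("K", "AE"), ("S", "AE"), ("AE", "T"), ("M", "AE")}

print(diphone_coverage("The cat sat".split(), PRON, TARGETS))  # 0.75
```

Here "The" has no entry in the toy dictionary, and "cat" plus "sat" yield three of the four targets; adding "mat" would bring coverage to 1.0.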

21 Claims

1. A method of generating a script to be read by a speaker to produce a speech corpus for concatenative text-to-speech using a text database storing at least a dictionary of words and a pronunciation guide indicating pronunciations of at least some of the words in the dictionary, the method comprising:
- obtaining a list of phonemes to be uttered when the script is read by a speaker;
- automatically selecting a first plurality of words from the dictionary based on the pronunciation guide such that the plurality of words, when uttered by the speaker, produces at least the phonemes in the list of phonemes;
- obtaining at least one template defining structural properties of at least one grammar; and
- generating a cohesive script based, at least in part, on the at least one template and the first plurality of words, wherein the cohesive script comprises multiple sentences, and wherein at least two of the multiple sentences have conceptual coherence when considered together.

2. The method according to claim 1, wherein the list of phonemes includes at least one phoneme sequence comprising a plurality of phonemes in a prescribed order, and wherein automatically selecting a first plurality of words comprises selecting at least one word that, when uttered by the speaker, produces the at least one phoneme sequence.

3. The method according to claim 2, wherein the at least one phoneme sequence comprises a diphone, a triphone, a quadphone, a syllable, and/or a bisyllable.

4. The method according to claim 1, wherein the list of phonemes includes a plurality of phoneme sequences and wherein obtaining the list of phonemes includes obtaining the list of phonemes, at least in part, by analyzing the text database.

5. The method according to claim 4, wherein the plurality of phoneme sequences comprise a plurality of diphones, a plurality of triphones, a plurality of quadphones, a plurality of syllables, and/or a plurality of bisyllables.

6. The method according to claim 1, wherein the text database comprises a vocabulary list, an unstructured vocabulary list, an inventory of occurrences of at least one phonemic unit, and/or an inventory of occurrences of at least one phonemic sequence.

7. The method according to claim 1, wherein the at least one template comprises a character template, a concept template, a location template, a story line template, and/or a script template that each include structural properties that assist in forming the cohesive script.

8. The method according to claim 4, further comprising generating the speech corpus by having the speaker utter the cohesive script.

9. The method according to claim 4, further comprising controlling format mechanics of the cohesive script.

10. The method according to claim 9, wherein said format mechanics comprise a script size, a sentence structure, and/or a target sentence length of the cohesive script.

11. The method of claim 1, wherein all of the sentences in the coherent script have conceptual coherence.
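Claim 1 requires automatically selecting words whose pronunciations produce every phoneme on the target list, and claims 2 to 5 extend this to ordered phoneme sequences such as diphones and triphones. The claims do not say how the words are chosen. One common way to get a compact list is a greedy set-cover heuristic, and the sketch below assumes that approach, a toy ARPAbet-style pronunciation guide, and hypothetical function names (`units`, `greedy_select`). It is an illustration, not the patented implementation.

```python
from typing import Dict, List, Set, Tuple

Unit = Tuple[str, ...]  # an ordered phoneme sequence, e.g. a diphone ("K", "AE")


def units(phones: List[str], n: int) -> Set[Unit]:
    """Return all length-n phoneme sequences (n=2: diphones, n=3: triphones)."""
    return {tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)}


def greedy_select(pron: Dict[str, List[str]], targets: Set[Unit], n: int) -> List[str]:
    """Greedily choose words until no remaining target can be covered.

    Each round picks the word covering the most still-uncovered targets;
    ties go to the alphabetically first word so the result is deterministic.
    """
    candidates = {w: units(p, n) & targets for w, p in pron.items()}
    remaining = set(targets)
    chosen: List[str] = []
    while remaining and candidates:
        best = max(sorted(candidates), key=lambda w: len(candidates[w] & remaining))
        gain = candidates.pop(best) & remaining
        if not gain:
            break  # no dictionary word produces any remaining target
        chosen.append(best)
        remaining -= gain
    return chosen


# Hypothetical pronunciation guide and diphone targets.
PRON = {
    "cat": ["K", "AE", "T"],
    "sat": ["S", "AE", "T"],
    "mat": ["M", "AE", "T"],
    "tack": ["T", "AE", "K"],
}
TARGETS = {("K", "AE"), ("AE", "T"), ("S", "AE"), ("T", "AE"), ("AE", "K")}

print(greedy_select(PRON, TARGETS, n=2))  # ['cat', 'tack', 'sat']
```

Greedy set cover keeps the word list short but does not guarantee the smallest possible list. Targets that no dictionary word produces are left uncovered; a fuller system would need to report them or handle cross-word sequences.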

12. At least one non-transitory machine-readable storage medium encoded with machine-readable instructions that, when executed by at least one processor, perform a method of generating a script to be read by a speaker to produce a speech corpus for concatenative text-to-speech using a text database storing at least a dictionary of words and a pronunciation guide indicating a pronunciation of each of the words in the dictionary, the method comprising:
- obtaining a list of phonemes to be uttered when the script is read by a speaker;
- automatically selecting a first plurality of words from the dictionary based on the pronunciation guide such that the plurality of words, when uttered by the speaker, produces at least the phonemes in the list of phonemes;
- obtaining at least one template defining structural properties of at least one grammar; and
- generating a cohesive script based, at least in part, on the at least one template and the first plurality of words, wherein the cohesive script comprises multiple sentences, and wherein at least two of the multiple sentences have conceptual coherence when considered together.

13. The at least one non-transitory machine-readable storage medium of claim 12, wherein all of the sentences in the coherent script have conceptual coherence.

14. A system for generating a script to be read by a speaker to produce a speech corpus for concatenative text-to-speech, the system comprising:
- a text database storing at least a dictionary of words and a pronunciation guide indicating a pronunciation of each of the words in the dictionary; and
- at least one processor capable of accessing the text database, the at least one processor configured to implement:
  - an extracting unit to obtain a list of phonemes to be uttered when the script is read by a speaker;
  - a selecting unit to automatically select a first plurality of words from the dictionary based on the pronunciation guide such that the plurality of words, when uttered by the speaker, produces at least the phonemes in the list of phonemes; and
  - an autonomous language generating unit to obtain at least one template defining structural properties of at least one grammar, and to automatically generate a cohesive script based, at least in part, on the at least one template and the first plurality of words, wherein the cohesive script comprises multiple sentences, and wherein at least two of the multiple sentences have conceptual coherence when considered together.

15. The system according to claim 14, wherein the list of phonemes includes a plurality of phoneme sequences each comprising a plurality of phonemes in a prescribed order, and wherein the first plurality of words, when uttered by the speaker, produces the plurality of phoneme sequences, and wherein the plurality of phoneme sequences together comprise a plurality of diphones, a plurality of triphones, a plurality of quadphones, a plurality of syllables defined in terms of phones, and/or a plurality of bisyllables.

16. The system according to claim 14, wherein the at least one template comprises a character template, a concept template, a location template, a story line template, and/or a script template that each includes structural properties that assist in forming the cohesive script.

17. The system according to claim 14, wherein the at least one processor is configured to implement a control unit that controls format mechanics of the cohesive script.

18. The system according to claim 17, wherein said format mechanics comprise a script size, a sentence structure, and/or a target sentence length of the cohesive script generated by said autonomous language generating unit.

19. The system according to claim 14, further comprising a recording unit capable of recording the speaker uttering the cohesive script to generate the speech corpus.

20. The system according to claim 14, wherein the text database comprises a vocabulary list, an unstructured vocabulary list, an inventory of occurrences of at least one phonemic unit, and/or an inventory of occurrences of at least one phonemic sequence.

21. The system of claim 14, wherein all of the sentences in the coherent script have conceptual coherence.
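Claims 16 to 18 describe templates (character, location, story line) and a control unit that governs format mechanics such as script size and target sentence length. The toy sketch below shows one way those pieces could fit together. A hypothetical story-line template reuses the same character and place slots in every sentence as a crude stand-in for conceptual coherence, the slots are filled with chosen words, and a maximum word count per sentence and a maximum sentence count are enforced. The template, the slot names, and the skip-long-sentences policy are all assumptions, not the patent's method.

```python
from typing import Dict, List

# Hypothetical story-line template: every sentence reuses the same character
# and place slots, a crude way to keep the sentences on one topic.
STORY_TEMPLATE = [
    "{character} walked to the {place}.",
    "At the {place}, {character} saw a {thing}.",
    "{character} carried the {thing} home.",
]


def generate_script(template: List[str], fillers: Dict[str, str],
                    max_words: int, max_sentences: int) -> List[str]:
    """Fill template slots, then apply simple format mechanics.

    Sentences longer than max_words are skipped (a stand-in for a target
    sentence length), and at most max_sentences are kept (script size).
    """
    script: List[str] = []
    for pattern in template:
        if len(script) >= max_sentences:
            break
        sentence = pattern.format(**fillers)
        if len(sentence.split()) <= max_words:
            script.append(sentence)
    return script


words = {"character": "Tess", "place": "market", "thing": "cat"}
for line in generate_script(STORY_TEMPLATE, words, max_words=6, max_sentences=3):
    print(line)
```

With `max_words=6`, the seven-word second sentence is skipped, so the script keeps "Tess walked to the market." and "Tess carried the cat home." A real generator would more likely rephrase or pick another template than discard a sentence.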

Current Assignee
Cerence Operating Company (Cerence Inc.)
Original Assignee
Nuance Communications, Inc. (Microsoft Corporation)
Inventors
Ferrucci, David Angelo; Pitrelli, John Ferdinand; Aaron, Andrew Stephen
Primary Examiner(s)
Dorvil, Richemond
Assistant Examiner(s)
Adesanya, Olujimi A.

Application Number
US11/332,292
Publication Number
US 20070168193A1
Time in Patent Office
2,275 Days
Field of Search
704/260
US Class Current
704/260
CPC Class Codes
G10L 13/04 Details of speech synthesis...
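The "Time in Patent Office" figure matches the span between the filing date (01/17/2006) and the issue date (04/10/2012) listed at the top of this record. Python's standard `datetime` module can check the arithmetic, including the leap days in 2008 and 2012:

```python
from datetime import date

filed = date(2006, 1, 17)
issued = date(2012, 4, 10)

# Subtracting two dates yields a timedelta; .days is the whole-day span.
days_pending = (issued - filed).days
print(days_pending)  # 2275
```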
