Autonomous system and method for creating readable scripts for concatenative text-to-speech synthesis (TTS) corpora
Abstract
A method (and system) which autonomously generates a cohesive script from a text database for creating a speech corpus for concatenative text-to-speech, and more particularly, which generates cohesive scripts having fluency and natural prosody that can be used to generate compact text-to-speech recordings that cover a plurality of phonetic events.
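To make "cover a plurality of phonetic events" concrete, here is a minimal Python sketch, not the patented implementation, that reports which phonemes and diphones a script produces when read aloud. The toy pronunciation guide `GUIDE`, the ARPAbet-style symbols, the function names, and the choice of phonemes plus diphones as the "phonetic events" are all illustrative assumptions.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy pronunciation guide (word -> ARPAbet-style phonemes).
# Illustrative only; a real text database would hold a full lexicon.
GUIDE: Dict[str, List[str]] = {
    "the": ["DH", "AH"],
    "cat": ["K", "AE", "T"],
    "sat": ["S", "AE", "T"],
    "on": ["AA", "N"],
    "mat": ["M", "AE", "T"],
}


def phonetic_events(script: str, guide: Dict[str, List[str]]) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Return the phonemes and diphones produced when `script` is read aloud.

    Simplifying assumption: the script is treated as one continuous phoneme
    stream, so diphones across word and sentence boundaries are counted too.
    """
    stream: List[str] = []
    for token in script.lower().split():
        word = token.strip(".,;:!?")
        stream.extend(guide[word])  # raises KeyError for out-of-dictionary words
    return set(stream), set(zip(stream, stream[1:]))


def missing_phonemes(script: str, guide: Dict[str, List[str]], targets: Set[str]) -> Set[str]:
    """Return the target phonemes that reading `script` would not produce."""
    phonemes, _ = phonetic_events(script, guide)
    return targets - phonemes
```

For example, `missing_phonemes("The cat sat on the mat.", GUIDE, {"K", "AE", "T", "Z"})` returns `{"Z"}`, since no word in that script contains the phoneme Z.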
21 Claims
1. A method of generating a script to be read by a speaker to produce a speech corpus for concatenative text-to-speech using a text database storing at least a dictionary of words and a pronunciation guide indicating pronunciations of at least some of the words in the dictionary, the method comprising:
obtaining a list of phonemes to be uttered when the script is read by a speaker;
automatically selecting a first plurality of words from the dictionary based on the pronunciation guide such that the plurality of words, when uttered by the speaker, produces at least the phonemes in the list of phonemes;
obtaining at least one template defining structural properties of at least one grammar; and
generating a cohesive script based, at least in part, on the at least one template and the first plurality of words, wherein the cohesive script comprises multiple sentences, and wherein at least two of the multiple sentences have conceptual coherence when considered together.
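The word-selection step of claim 1 can be illustrated with a greedy set-cover heuristic. This is a commonly used technique swapped in for illustration, not necessarily the selection method the patent describes; `GUIDE` and `select_covering_words` are hypothetical names, and the dictionary is a toy stand-in for the text database.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy dictionary with pronunciations (word -> phonemes);
# a stand-in for the text database described in the claim.
GUIDE: Dict[str, List[str]] = {
    "cat": ["K", "AE", "T"],
    "dish": ["D", "IH", "SH"],
    "dog": ["D", "AO", "G"],
    "go": ["G", "OW"],
    "sat": ["S", "AE", "T"],
    "ship": ["SH", "IH", "P"],
}


def select_covering_words(targets: Set[str], guide: Dict[str, List[str]]) -> List[str]:
    """Greedily pick words until every target phoneme would be uttered.

    Each round takes the word covering the most still-uncovered targets;
    iterating in sorted order makes ties resolve alphabetically.
    """
    uncovered = set(targets)
    selected: List[str] = []
    while uncovered:
        best_word: Optional[str] = None
        best_gain = 0
        for word in sorted(guide):
            gain = len(uncovered & set(guide[word]))
            if gain > best_gain:
                best_word, best_gain = word, gain
        if best_word is None:
            # No dictionary word contains any remaining target phoneme.
            raise ValueError(f"uncoverable phonemes: {sorted(uncovered)}")
        selected.append(best_word)
        uncovered -= set(guide[best_word])
    return selected
```

With the toy dictionary, `select_covering_words({"K", "AE", "T", "SH", "IH", "D"}, GUIDE)` returns `["cat", "dish"]`. Greedy selection does not guarantee the smallest possible word list, but it is simple and, for set cover, stays within a logarithmic factor of the optimum.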
12. At least one non-transitory machine-readable storage medium encoded with machine-readable instructions that, when executed by at least one processor, perform a method of generating a script to be read by a speaker to produce a speech corpus for concatenative text-to-speech using a text database storing at least a dictionary of words and a pronunciation guide indicating a pronunciation of each of the words in the dictionary, the method comprising:
obtaining a list of phonemes to be uttered when the script is read by a speaker;
automatically selecting a first plurality of words from the dictionary based on the pronunciation guide such that the plurality of words, when uttered by the speaker, produces at least the phonemes in the list of phonemes;
obtaining at least one template defining structural properties of at least one grammar; and
generating a cohesive script based, at least in part, on the at least one template and the first plurality of words, wherein the cohesive script comprises multiple sentences, and wherein at least two of the multiple sentences have conceptual coherence when considered together.
14. A system for generating a script to be read by a speaker to produce a speech corpus for concatenative text-to-speech, the system comprising:
a text database storing at least a dictionary of words and a pronunciation guide indicating a pronunciation of each of the words in the dictionary;
at least one processor capable of accessing the text database, the at least one processor configured to implement:
an extracting unit to obtain a list of phonemes to be uttered when the script is read by a speaker;
a selecting unit to automatically select a first plurality of words from the dictionary based on the pronunciation guide such that the plurality of words, when uttered by the speaker, produces at least the phonemes in the list of phonemes; and
an autonomous language generating unit to obtain at least one template defining structural properties of at least one grammar, and to automatically generate a cohesive script based, at least in part, on the at least one template and the first plurality of words, wherein the cohesive script comprises multiple sentences, and wherein at least two of the multiple sentences have conceptual coherence when considered together.
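The autonomous language generating unit of claim 14 can be sketched as slot filling: each template is a sequence of fixed words and part-of-speech slots, and the words chosen by the selecting unit are placed into matching slots. The part-of-speech tags, the template syntax, the names `POS`, `TEMPLATES`, and `generate_script`, and the `{SUBJ}` slot (which repeats the first placed noun so that both sentences refer to the same entity) are hypothetical simplifications; the claim does not say how templates are represented or how conceptual coherence is achieved.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical part-of-speech tags for words already chosen by the
# selecting unit; the claim does not specify how words are tagged.
POS: Dict[str, str] = {
    "big": "ADJ", "cat": "NOUN", "dish": "NOUN", "ship": "NOUN",
    "sat": "VERB", "saw": "VERB",
}

# Hypothetical templates: tokens in braces are slots naming a part of
# speech. "{SUBJ}" repeats the first noun placed in the script, a crude
# stand-in for conceptual coherence between sentences.
TEMPLATES: List[List[str]] = [
    ["The", "{ADJ}", "{NOUN}", "{VERB}", "near", "a", "{NOUN}", "."],
    ["Then", "the", "{SUBJ}", "{VERB}", "the", "{NOUN}", "."],
]


def generate_script(selected: List[str], templates: List[List[str]],
                    pos: Dict[str, str]) -> Tuple[str, List[str]]:
    """Fill template slots with selected words; return (script, unused words)."""
    pools: Dict[str, List[str]] = {}
    for word in selected:
        pools.setdefault(pos[word], []).append(word)
    cursors = {tag: 0 for tag in pools}
    subject: Optional[str] = None
    used: Set[str] = set()
    sentences: List[str] = []
    for template in templates:
        tokens: List[str] = []
        for token in template:
            if token == "{SUBJ}":
                if subject is None:
                    raise ValueError("{SUBJ} used before any noun was placed")
                tokens.append(subject)
            elif token.startswith("{") and token.endswith("}"):
                tag = token[1:-1]
                if tag not in pools:
                    raise ValueError(f"no selected word has tag {tag}")
                pool = pools[tag]
                word = pool[cursors[tag] % len(pool)]  # reuse words cyclically
                cursors[tag] += 1
                used.add(word)
                if tag == "NOUN" and subject is None:
                    subject = word
                tokens.append(word)
            else:
                tokens.append(token)
        # Join tokens and attach the final period to the preceding word.
        sentences.append(" ".join(tokens).replace(" .", "."))
    unused = [w for w in selected if w not in used]
    return " ".join(sentences), unused
```

Calling `generate_script(["big", "cat", "sat", "dish", "saw", "ship"], TEMPLATES, POS)` yields `("The big cat sat near a dish. Then the cat saw the ship.", [])`. A non-empty second element signals selected words that found no slot, meaning their phonemes would be missing from the recording unless more templates are supplied.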
Specification