Corpus-based speech synthesis based on segment recombination
Abstract
A system and method generate synthesized speech by concatenating speech segments derived from a large, prosodically rich corpus of speech segments, including the use of an additional dictionary of speech segment identifier sequences.
74 Citations
30 Claims
1. A speech synthesis system for producing synthesized speech comprising:
a large speech segment database referencing speech segments and accessed by segment designators, each segment designator being associated with a sequence of one or more speech segments;
a segmental transcription database referencing segmental transcriptions associated with sequences of one or more segment designators and accessed by message designators, each message designator being associated with a fixed message;
a speech segment selector for selecting a sequence of speech segments referenced by the large speech segment database and representative of a sequence of segment designators corresponding to a segmental transcription generated responsive to a message designator input; and
a speech segment concatenator in communication with the large speech segment database for concatenating the sequence of speech segments selected by the speech segment selector to produce a speech signal output corresponding to the message designator input.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
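The data flow recited in claim 1 (message designator, then segmental transcription, then speech segments, then a concatenated signal) can be illustrated with a minimal sketch. Everything named here is hypothetical and chosen only for illustration: the dictionaries `SEGMENT_DB` and `TRANSCRIPTION_DB`, the designator strings, and the representation of a speech segment as a short list of samples. The claim does not prescribe any of these structures.

```python
from typing import Dict, List

Segment = List[float]  # assumption: a speech segment is a list of samples

# Hypothetical speech segment database:
# segment designator -> sequence of one or more speech segments.
SEGMENT_DB: Dict[str, List[Segment]] = {
    "seg_hello": [[0.1, 0.2], [0.3]],
    "seg_world": [[0.4, 0.5]],
}

# Hypothetical segmental transcription database:
# message designator -> sequence of segment designators for a fixed message.
TRANSCRIPTION_DB: Dict[str, List[str]] = {
    "MSG_GREETING": ["seg_hello", "seg_world"],
}


def select_segments(message_designator: str) -> List[Segment]:
    """Look up the transcription and fetch the segments it designates."""
    selected: List[Segment] = []
    for designator in TRANSCRIPTION_DB[message_designator]:
        selected.extend(SEGMENT_DB[designator])
    return selected


def concatenate(segments: List[Segment]) -> Segment:
    """Join the selected segments end to end (no smoothing in this sketch)."""
    signal: Segment = []
    for segment in segments:
        signal.extend(segment)
    return signal


def synthesize(message_designator: str) -> Segment:
    """Selector followed by concatenator for one message designator input."""
    return concatenate(select_segments(message_designator))
```

Calling `synthesize("MSG_GREETING")` returns the samples of the three stored segments in order. A real concatenator would typically smooth the joins; that step is omitted here.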

9. A speech synthesis system for producing synthesized speech from input text and from input message designators, the system comprising:
first and second large speech segment databases referencing speech segments and accessed by segment designators, each speech segment designator being associated with a sequence of one or more speech segments;
a segmental transcription database referencing segmental transcriptions associated with sequences of one or more segment designators of the first large speech segment database and accessed by message designators, each message designator being associated with a fixed message;
a text message database referencing text messages that correspond to orthographic representations of the segmental transcriptions referenced by the segmental transcription database;
a first speech segment selector for selecting a sequence of speech segments referenced by the first large speech segment database and representative of a sequence of segment designators corresponding to a segmental transcription generated responsive to a message designator input;
a text analyzer for converting an input text into a representative sequence of symbolic segment identifiers;
a second speech segment selector for selecting, based at least in part on prosodic and acoustic features, a sequence of speech segments from the second large speech segment database and representative of a sequence of symbolic identifiers generated responsive to a text input;
a message decoder for activating
i. the first speech segment selector if a text input corresponds to a text message referenced by the text message database, or
ii. the second speech segment selector if a text input does not correspond to a message from the text message database; and
a speech segment concatenator in communication with the first and second large speech segment databases for concatenating the sequence of speech segments designated by a segmental transcription from the segmental transcription database to produce a speech signal output.
Dependent claims: 10, 11, 12, 13, 14
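The routing performed by the message decoder of claim 9 can be sketched as a simple lookup with a fallback. The names `TEXT_MESSAGE_DB`, `first_selector`, `second_selector`, and `message_decoder` are hypothetical, as are the designator strings. Matching text after trimming and lower-casing is an assumption of this sketch; the claim does not say how correspondence with a stored text message is tested.

```python
from typing import Dict, List

# Hypothetical text message database: orthographic text -> message designator.
TEXT_MESSAGE_DB: Dict[str, str] = {
    "your call is important to us": "MSG_HOLD",
}

# Hypothetical segmental transcriptions for the fixed messages.
TRANSCRIPTIONS: Dict[str, List[str]] = {
    "MSG_HOLD": ["seg_17", "seg_42"],
}


def first_selector(message_designator: str) -> List[str]:
    """Fixed-message path: return the stored sequence of segment designators."""
    return TRANSCRIPTIONS[message_designator]


def second_selector(text: str) -> List[str]:
    """General path: stand-in for text analysis plus feature-based selection.

    Here each word simply becomes one symbolic identifier.
    """
    return [f"unit:{word}" for word in text.lower().split()]


def message_decoder(text: str) -> List[str]:
    """Activate the first selector for known messages, else the second."""
    designator = TEXT_MESSAGE_DB.get(text.strip().lower())  # assumed matching rule
    if designator is not None:
        return first_selector(designator)
    return second_selector(text)
```

A known message takes the stored-transcription path, while any other text falls through to the general selector.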

15. A system to create compound speech units from an input text comprising:
a speech segment database referencing speech waveform segments and accessed by segment designators, each segment designator being associated with a sequence of one or more speech segments;
a speech segment selector for selecting a sequence of speech segments referenced by the speech segment database and representative of an input text; and
a speech segment sequence validator for validating the selected sequence of speech segments; and
a linguistic feature vector extractor for extracting linguistic feature vectors from the validated sequence of speech segments; and
a segment descriptor generator for linking an extracted linguistic feature vector to a speech waveform segment from the speech segment database.
Dependent claims: 16, 17, 18

19. A speech synthesis system for producing synthesized speech from input text comprising:
a speech segment database referencing speech segments and accessed by segment designators, each segment designator being associated with a sequence of one or more speech segments;
a basic speech unit descriptor database including linguistic feature vectors descriptive of individual speech segments referenced by the speech segment database;
a compound speech unit database including linguistic feature vectors descriptive of speech segments referenced by the speech segment database, at least one speech segment from the speech segment database has two or more linguistic feature vectors as linguistic descriptors;
a speech segment selector for selecting, based on a reduced set of features and cost functions, a sequence of speech segments referenced by the speech segment database and representative of an input text; and
a speech segment concatenator, in communication with the speech segment database, for concatenating the selected sequence of speech segments to produce a speech signal output corresponding to the input text.
Dependent claims: 20, 21, 22

23. A speech synthesis system for producing synthesized speech from input text comprising:
a speech segment database referencing speech segments and accessed by segment designators, each segment designator being associated with a sequence of one or more speech segments;
a speech segment selector for selecting among candidate sequences of speech segments referenced by the speech segment database and representative of an input text, the selecting including use of a composition table containing pairs of segment designators to minimize adjacency feature mismatch effects; and
a speech segment concatenator, in communication with the speech segment database, for concatenating the selected sequence of speech segments to produce a speech signal output corresponding to the input text.
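One possible reading of the composition table in claim 23 is a set of designator pairs that are known to join well, so that a candidate sequence is penalized for every adjacent pair missing from the table. That reading is an assumption of this sketch, and the claim does not state how the table is consulted. The names `COMPOSITION_TABLE`, `mismatch_count`, and `select_sequence`, and all designators, are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

# Hypothetical composition table: pairs of segment designators assumed to
# concatenate without an adjacency feature mismatch.
COMPOSITION_TABLE: Set[Tuple[str, str]] = {
    ("a1", "b2"),
    ("b2", "c1"),
    ("a2", "b1"),
}


def mismatch_count(sequence: Sequence[str]) -> int:
    """Count adjacent designator pairs that are absent from the table."""
    pairs = zip(sequence, sequence[1:])
    return sum(1 for pair in pairs if pair not in COMPOSITION_TABLE)


def select_sequence(candidates: List[List[str]]) -> List[str]:
    """Pick the candidate with the fewest mismatching joins.

    Ties go to the earliest candidate, which is how min() behaves.
    """
    return min(candidates, key=mismatch_count)
```

With the candidates `["a1", "b1", "c1"]`, `["a1", "b2", "c1"]`, and `["a2", "b1", "c1"]`, the mismatch counts are 2, 0, and 1, so the second candidate is chosen.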

24. A speech synthesis system for producing synthesized speech from input text comprising:
a speech segment database referencing speech segments and accessed by segment designators, each segment designator being associated with a sequence of one or more speech segments;
a user dictionary of compound speech units referenced by the speech segment database and accessed by phoneme sequences;
a speech segment selector for selecting among candidate sequences of speech segments referenced by the speech segment database and representative of an input text, the selecting including use of compound speech units from the user dictionary; and
a speech segment concatenator, in communication with the speech segment database, for concatenating the selected sequence of speech segments to produce a speech signal output corresponding to the input text.
Dependent claims: 25

26. A speech synthesis system for producing synthesized speech from input text comprising:
a large speech segment database referencing speech segments and accessed by segment designators, each segment designator being associated with a sequence of one or more speech segments;
a carrier database containing carriers for a carrier and slot speech synthesis application, each carrier represented as a sequence of segment descriptors; and
a speech carrier selector for selecting the carrier from the carrier database;
a speech segment selector for selecting a sequence of speech segments referenced by the large speech segment database and representative of a slot argument in a carrier and slot speech synthesis message; and
a speech segment concatenator, in communication with the large speech segment database, for concatenating the selected sequence of speech segments with the carrier portion of a carrier and slot speech synthesis message to produce a speech signal output corresponding to the carrier and slot speech synthesis message.
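The carrier and slot arrangement of claim 26 can be illustrated as a stored descriptor sequence with a placeholder that is filled at synthesis time. This is only a sketch: the `SLOT` marker, the `CARRIER_DB` contents, the function names, and the word-per-segment slot selector are all hypothetical and are not taken from the patent.

```python
from typing import Dict, List

SLOT = "<slot>"  # hypothetical marker for the slot position inside a carrier

# Hypothetical carrier database: carrier name -> sequence of segment descriptors.
CARRIER_DB: Dict[str, List[str]] = {
    "FLIGHT_DEPARTS": ["seg_the", "seg_flight_to", SLOT, "seg_departs_now"],
}


def select_carrier(name: str) -> List[str]:
    """Carrier selector: fetch the stored descriptor sequence."""
    return CARRIER_DB[name]


def select_slot_segments(argument: str) -> List[str]:
    """Stand-in segment selector for the slot argument (one unit per word)."""
    return [f"seg_{word}" for word in argument.lower().split()]


def build_message(carrier_name: str, slot_argument: str) -> List[str]:
    """Splice the slot's segments into the carrier at the slot position."""
    message: List[str] = []
    for descriptor in select_carrier(carrier_name):
        if descriptor == SLOT:
            message.extend(select_slot_segments(slot_argument))
        else:
            message.append(descriptor)
    return message
```

For example, `build_message("FLIGHT_DEPARTS", "New York")` yields the carrier descriptors with `seg_new` and `seg_york` in the slot position. The resulting sequence would then be handed to the concatenator.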

27. A restricted domain speech synthesis system for producing synthesized speech from a restricted domain input comprising:
a speech segment database referencing speech segments and accessed by segment designators, each segment designator being associated with a sequence of one or more speech segments; and
a segment sequence database containing sequences of speech segment designators;
a speech segment selector for selecting a sequence of speech segments referenced by the large speech segment database from the segment sequence database; and
a speech segment concatenator, in communication with the large speech segment database and the segment sequence database, for concatenating the selected sequence of speech segments to produce a speech signal output corresponding to the restricted domain input.
Dependent claims: 28

29. A speech synthesis system for producing synthesized speech from input text comprising:
a large speech segment database referencing speech segments and accessed by segment designators, each segment designator being associated with a sequence of one or more speech segments;
a speech segment selector for selecting a sequence of speech segments referenced by the large speech segment database and representative of an input text; and
a speech segment concatenator, in communication with the large speech segment database, for concatenating the selected sequence of speech segments to produce a speech signal output corresponding to the input text;
wherein compound speech units are used to increase the match between a grapheme-to-phoneme conversion of the input text and the segment designators.

30. A speech synthesis system for producing synthesized speech from input text comprising:
a large speech segment database referencing speech segments and accessed by segment designators, each segment designator being associated with a sequence of one or more speech segments, where coding of the speech segments approximates the variation of the prosody parameters over time by piece-wise linear functions that are stored as breakpoint-slope pairs;
a speech segment selector for selecting a sequence of speech segments referenced by the large speech segment database and representative of an input text; and
a speech segment concatenator, in communication with the large speech segment database, for concatenating the selected sequence of speech segments to produce a speech signal output corresponding to the input text.
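The breakpoint-slope coding mentioned in claim 30 can be made concrete with a small decoder. The sketch assumes that each pair holds a breakpoint time and the slope that applies from that breakpoint until the next one, and that a starting value is stored alongside the pairs. Neither detail is stated in the claim. The function name `evaluate_contour`, the units (milliseconds and Hz), and the sample numbers are hypothetical.

```python
from typing import List, Tuple


def evaluate_contour(
    start_value: float,
    pairs: List[Tuple[float, float]],
    t: float,
) -> float:
    """Evaluate a piecewise linear prosody contour at time t.

    pairs holds (breakpoint_time, slope) tuples sorted by time. Each slope
    is assumed to apply from its breakpoint until the next breakpoint; the
    last slope continues indefinitely.
    """
    value = start_value
    for i, (breakpoint, slope) in enumerate(pairs):
        if t <= breakpoint:
            break  # t lies before this piece, nothing more to accumulate
        # This piece ends at the next breakpoint, or at t for the last piece.
        piece_end = pairs[i + 1][0] if i + 1 < len(pairs) else t
        value += slope * (min(t, piece_end) - breakpoint)
    return value


# Hypothetical pitch contour: starts at 100 Hz, rises 0.5 Hz/ms for 200 ms,
# then falls 0.25 Hz/ms.
PITCH_PAIRS: List[Tuple[float, float]] = [(0.0, 0.5), (200.0, -0.25)]
```

Under these assumptions, the contour reaches 200 Hz at the second breakpoint (`evaluate_contour(100.0, PITCH_PAIRS, 200.0)`) and has fallen back to 150 Hz at 400 ms. Storing two pairs and a start value in place of every sampled pitch value is what makes the coding compact.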
Specification