Corpus-based speech synthesis based on segment recombination
Abstract
A system and method generate synthesized speech by concatenating speech segments derived from a large, prosodically rich corpus of speech segments, including the use of an additional dictionary of speech segment identifier sequences.
74 Claims
-
1. A speech synthesis system for producing synthesized speech comprising:
-
a large speech segment database referencing speech segments and accessed by segment designators, each segment designator being associated with a sequence of one or more speech segments;
a segmental transcription database referencing segmental transcriptions associated with sequences of one or more segment designators and accessed by message designators, each message designator being associated with a fixed message;
a speech segment selector for selecting a sequence of speech segments referenced by the large speech segment database and representative of a sequence of segment designators corresponding to a segmental transcription generated responsive to a message designator input; and
a speech segment concatenator in communication with the large speech segment database for concatenating the sequence of speech segments selected by the speech segment selector to produce a speech signal output corresponding to the message designator input.
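The data flow recited in claim 1 can be sketched in a few lines. This is only an illustration under stated assumptions: the dictionary names, the segment designators such as "h-e", and the toy sample values are all hypothetical, and real segments would be waveform or parametric data rather than short lists.

```python
# Hypothetical toy data: every name and value here is an assumption for illustration.
SEGMENT_DB = {            # segment designator -> waveform samples
    "h-e": [0.1, 0.2],
    "e-l": [0.3],
    "l-o": [0.4, 0.5],
}
TRANSCRIPTION_DB = {      # message designator -> sequence of segment designators
    "MSG_HELLO": ["h-e", "e-l", "l-o"],
}


def select_segments(message_designator):
    """Segment selector: resolve a fixed message to its stored segments."""
    designators = TRANSCRIPTION_DB[message_designator]
    return [SEGMENT_DB[d] for d in designators]


def concatenate(segments):
    """Segment concatenator: join the selected segments into one signal."""
    signal = []
    for segment in segments:
        signal.extend(segment)
    return signal


def synthesize(message_designator):
    """Message designator in, speech signal out."""
    return concatenate(select_segments(message_designator))
```

Because the segmental transcription is stored per fixed message, no search over candidate segments is needed at synthesis time in this sketch; the lookup alone determines the output.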
-
-
9. A speech synthesis system for producing synthesized speech from input text and from input message designators, the system comprising:
-
first and second large speech segment databases referencing speech segments and accessed by segment designators, each speech segment designator being associated with a sequence of one or more speech segments;
a segmental transcription database referencing segmental transcriptions associated with sequences of one or more segment designators of the first large speech segment database and accessed by message designators, each message designator being associated with a fixed message;
a text message database referencing text messages that correspond to orthographic representations of the segmental transcriptions referenced by the segmental transcription database;
a first speech segment selector for selecting a sequence of speech segments referenced by the first large speech segment database and representative of a sequence of segment designators corresponding to a segmental transcription generated responsive to a message designator input;
a text analyzer for converting an input text into a representative sequence of symbolic segment identifiers;
a second speech segment selector for selecting, based at least in part on prosodic and acoustic features, a sequence of speech segments from the second large speech segment database and representative of a sequence of symbolic identifiers generated responsive to a text input;
a message decoder for activating the first speech segment selector if a text input corresponds to a text message referenced by the text message database, or the second speech segment selector if a text input does not correspond to a message from the text message database; and
a speech segment concatenator in communication with the first and second large speech segment databases for concatenating the sequence of speech segments designated by a segmental transcription from the segmental transcription database to produce a speech signal output.
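The routing performed by the message decoder of claim 9 might look roughly like the following. The database contents, the return labels, and the `normalize` step are assumptions made for illustration; the claim itself only requires that a text input matching a stored text message activates the first selector and any other input activates the second.

```python
# Hypothetical text message database: orthographic text -> message designator.
TEXT_MESSAGE_DB = {
    "please hold the line": "MSG_HOLD",
}


def normalize(text):
    """Collapse case and whitespace so lookups tolerate trivial variation (an assumption)."""
    return " ".join(text.lower().split())


def decode_message(text):
    """Message decoder: choose which selector should handle the input."""
    key = normalize(text)
    if key in TEXT_MESSAGE_DB:
        # Known fixed message: use the stored segmental transcription.
        return ("first_selector", TEXT_MESSAGE_DB[key])
    # Unknown text: fall back to text analysis and feature-based selection.
    return ("second_selector", key)
```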
-
-
15. A system to create compound speech units from an input text comprising:
-
a speech segment database referencing speech waveform segments and accessed by segment designators, each segment designator being associated with a sequence of one or more speech segments;
a speech segment selector for selecting a sequence of speech segments referenced by the speech segment database and representative of an input text; and
a speech segment sequence validator for validating the selected sequence of speech segments; and
a linguistic feature vector extractor for extracting linguistic feature vectors from the validated sequence of speech segments; and
a segment descriptor generator for linking an extracted linguistic feature vector to a speech waveform segment from the speech segment database.
-
-
19. A speech synthesis system for producing synthesized speech from input text comprising:
-
a speech segment database referencing speech segments and accessed by segment designators, each segment designator being associated with a sequence of one or more speech segments;
a basic speech unit descriptor database including linguistic feature vectors descriptive of individual speech segments referenced by the speech segment database;
a compound speech unit database including linguistic feature vectors descriptive of speech segments referenced by the speech segment database, wherein at least one speech segment from the speech segment database has two or more linguistic feature vectors as linguistic descriptors;
a speech segment selector for selecting, based on a reduced set of features and cost functions, a sequence of speech segments referenced by the speech segment database and representative of an input text; and
a speech segment concatenator, in communication with the speech segment database, for concatenating the selected sequence of speech segments to produce a speech signal output corresponding to the input text.
-
-
23. A method for training a corpus-based speech synthesizer comprising:
-
feeding at least one text corpus to the corpus-based speech synthesizer to produce synthesized speech; and
validating speech synthesis data based on at least one of listening experiments and automatic perceptual distance measures; and
augmenting a compound speech unit database with compound speech units derived from the validated speech synthesis data.
-
-
24. A method for minimizing the size of a speech segment database comprising:
-
determining acoustically redundant speech segments in the speech segment database; and
removing acoustically redundant speech segments that have the same linguistic feature vector, replacing the acoustically redundant speech segments from a speech segment database and their descriptors by compound speech unit representations and their descriptors.
-
-
26. A speech synthesis system for producing more than one alternative of synthesized speech from input text comprising:
-
a large speech segment database referencing speech segments and accessed by segment designators, each segment designator being associated with a sequence of one or more speech segments; and
a set of two or more speech segment selectors selecting two or more sequences of speech segments referenced by the large speech segment database and representative of an input text; and
a speech segment concatenator, in communication with the large speech segment database, for concatenating one of the selected sequences of speech segments to produce a speech signal output corresponding to the input text.
-
-
35. A speech synthesis system for producing synthesized speech from input text comprising:
-
a large speech segment database referencing speech segments and accessed by segment designators, each segment designator being associated with a sequence of one or more speech segments;
a speech segment selector for selecting a sequence of speech segments referenced by the large speech segment database and representative of an input text, the selecting being based at least in part on introduction of stochastic variation on at least one of an individual cost function and a masking function associated to a cost; and
a speech segment concatenator, in communication with the large speech segment database, for concatenating the selected sequence of speech segments to produce a speech signal output corresponding to the input text.
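One way to picture the stochastic variation of claim 35 is to perturb each candidate's cost before picking the minimum. The sketch below is an assumption-laden illustration: the function name `select_with_noise`, the multiplicative uniform noise, and the `noise_scale` parameter are hypothetical choices, since the claim does not specify a noise distribution or where in the search it is applied.

```python
import random


def select_with_noise(candidates, cost, noise_scale, rng):
    """Pick the candidate with the lowest randomly perturbed cost.

    cost        -- function mapping a candidate to its deterministic cost
    noise_scale -- relative size of the perturbation (0.0 disables it)
    rng         -- a random.Random instance, passed in for reproducibility
    """
    def noisy_cost(candidate):
        # Multiplicative uniform noise is an illustrative assumption.
        return cost(candidate) * (1.0 + rng.uniform(-noise_scale, noise_scale))

    return min(candidates, key=noisy_cost)


# Usage: two candidates with nearly equal cost.
rng = random.Random(0)
costs = {"a": 1.0, "b": 1.01}
first_pick = select_with_noise(["a", "b"], costs.get, 0.1, rng)
```

With `noise_scale` set to 0.0 the selection is deterministic; with a nonzero value, near-ties resolve differently from run to run, which is what lets a single selector produce alternative segment sequences for the same input.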
-
-
39. A self-tuning speech segment selector for producing speech segment sequences from input text comprising:
-
a large speech segment database referencing speech segments and accessed by segment designators, each segment designator being associated with a sequence of one or more speech segments;
a speech segment selector for selecting a sequence of speech segments referenced by the large speech segment database and representative of an input text, the selecting being based at least in part on iterative searching, where at each iteration step at least one of unit selector weights and cost functions are adjusted.
-
-
40. A speech synthesis system for producing synthesized speech from input text comprising:
-
a large speech segment database referencing speech segments and accessed by segment designators, each segment designator being associated with a sequence of one or more speech segments;
a speech segment selector for selecting a sequence of speech segments referenced by the large speech segment database and representative of an input text, the selecting being based at least in part on iterative searching, where at each iteration step at least one of unit selector weights and cost functions are adjusted; and
a speech segment concatenator, in communication with the large speech segment database, for concatenating the selected sequence of speech segments to produce a speech signal output corresponding to the input text.
-
-
43. A speech synthesis system for producing synthesized speech from input text comprising:
-
a speech segment database referencing speech segments and accessed by segment designators, each segment designator being associated with a sequence of one or more speech segments;
a speech segment selector for selecting among candidate sequences of speech segments referenced by the speech segment database and representative of an input text, the selecting being based on evaluating by a cost obtained through dynamic time warping of the spectral representation of the candidate sequences with the spectral representation of one or more recorded speech signals; and
a speech segment concatenator, in communication with the speech segment database, for concatenating the selected sequence of speech segments to produce a speech signal output corresponding to the input text.
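The dynamic time warping cost of claim 43 can be illustrated with the textbook recurrence. The sketch assumes that a spectral representation is a list of frames, each frame a list of floats, and it uses Euclidean frame distance with the standard three-way step pattern; the claim does not fix any of these choices, and the function names are hypothetical.

```python
def frame_distance(x, y):
    """Euclidean distance between two spectral frames of equal length."""
    return sum((p - q) ** 2 for p, q in zip(x, y)) ** 0.5


def dtw_cost(candidate, reference):
    """Accumulated DTW cost between two sequences of spectral frames."""
    inf = float("inf")
    n, m = len(candidate), len(reference)
    # acc[i][j] = best accumulated cost aligning the first i and j frames.
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = frame_distance(candidate[i - 1], reference[j - 1])
            acc[i][j] = local + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


def pick_best(candidates, reference):
    """Select the candidate sequence closest to the recorded reference."""
    return min(candidates, key=lambda c: dtw_cost(c, reference))
```

A candidate that matches the recording except for timing (for example, one frame held longer) receives zero cost, which is the property that makes DTW suitable for comparing sequences of different durations.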
-
-
44. A speech synthesis system for producing synthesized speech from input text comprising:
-
a speech segment database referencing speech segments and accessed by segment designators, each segment designator being associated with a sequence of one or more speech segments;
a speech segment selector for selecting among candidate sequences of speech segments referenced by the speech segment database and representative of an input text, the selecting including use of a composition table containing pairs of segment designators to minimize adjacency feature mismatch effects; and
a speech segment concatenator, in communication with the speech segment database, for concatenating the selected sequence of speech segments to produce a speech signal output corresponding to the input text.
-
-
45. A speech synthesis system for producing synthesized speech from input text comprising:
-
a speech segment database referencing speech segments and accessed by segment designators, each segment designator being associated with a sequence of one or more speech segments;
a user dictionary of compound speech units referenced by the speech segment database and accessed by phoneme sequences;
a speech segment selector for selecting among candidate sequences of speech segments referenced by the speech segment database and representative of an input text, the selecting including use of compound speech units from the user dictionary; and
a speech segment concatenator, in communication with the speech segment database, for concatenating the selected sequence of speech segments to produce a speech signal output corresponding to the input text.
-
-
47. A speech synthesis system for producing synthesized speech from input text comprising:
-
a large speech segment database referencing speech segments and accessed by segment designators, each segment designator being associated with a sequence of one or more speech segments;
a carrier database containing carriers for a carrier and slot speech synthesis application, each carrier represented as a sequence of segment descriptors; and
a speech carrier selector for selecting the carrier from the carrier database;
a speech segment selector for selecting a sequence of speech segments referenced by the large speech segment database and representative of a slot argument in a carrier and slot speech synthesis message; and
a speech segment concatenator, in communication with the large speech segment database, for concatenating the selected sequence of speech segments with the carrier portion of a carrier and slot speech synthesis message to produce a speech signal output corresponding to the carrier and slot speech synthesis message.
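A carrier and slot message, as in claim 47, combines a pre-stored carrier with segments chosen for the slot argument. The sketch below is hypothetical throughout: the carrier name, the designators, and the sample values are invented, and a plain lookup table stands in for the speech segment selector that would normally choose the slot segments.

```python
# Hypothetical toy databases; all names and values are assumptions.
SEGMENT_DB = {"gate_a": [1, 2], "gate_b": [3], "five": [7, 7], "six": [8]}
SLOT = None  # marker for the slot position inside a carrier
CARRIER_DB = {"GATE_ANNOUNCE": ["gate_a", SLOT, "gate_b"]}
SLOT_TRANSCRIPTIONS = {"5": ["five"], "6": ["six"]}  # stand-in for the selector


def synthesize(carrier_id, slot_argument):
    """Fill the carrier's slot and concatenate the resulting segments."""
    designators = []
    for designator in CARRIER_DB[carrier_id]:
        if designator is SLOT:
            designators.extend(SLOT_TRANSCRIPTIONS[slot_argument])
        else:
            designators.append(designator)
    signal = []
    for designator in designators:
        signal.extend(SEGMENT_DB[designator])
    return signal
```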
-
-
48. A restricted domain speech synthesis system for producing synthesized speech from a restricted domain input comprising:
-
a speech segment database referencing speech segments and accessed by segment designators, each segment designator being associated with a sequence of one or more speech segments; and
a segment sequence database containing sequences of speech segment designators;
a speech segment selector for selecting a sequence of speech segments referenced by the large speech segment database from the segment sequence database; and
a speech segment concatenator, in communication with the large speech segment database and the segment sequence database, for concatenating the selected sequence of speech segments to produce a speech signal output corresponding to the restricted domain input.
-
-
50. A segment database construction system for corpus-based speech synthesis comprising:
-
a speech segment database referencing speech segments and accessed by segment designators, each segment designator being associated with a sequence of one or more speech segments;
a set of two or more speech segment selectors selecting two or more sequences of speech segments referenced by the large speech segment database and representative of an input text;
a speech segment concatenator, in communication with the speech segment database, for concatenating one of the selected sequences of speech segments to produce a speech signal output corresponding to the input text; and
an automatic segment sequence validator that automatically selects between the outputs of the different speech segment selectors.
-
-
52. A segment database construction system for corpus-based speech synthesis comprising:
-
a speech segment database referencing speech segments and accessed by segment designators, each segment designator being associated with a sequence of one or more speech segments;
a speech segment selector using introduction of stochastic variation on at least one of an individual cost function and a masking function to select a sequence of speech segments; and
a speech segment concatenator, in communication with the speech segment database, for concatenating one of the selected sequence of speech segments to produce a speech signal output corresponding to the input text.
-
-
53. A segment database construction system for corpus-based speech synthesis comprising:
-
a speech segment database referencing speech segments and accessed by segment designators, each segment designator being associated with a sequence of one or more speech segments;
a speech segment selector for generating an N-best list of speech segment sequences;
a speech segment concatenator, in communication with the speech segment database, for concatenating one of the selected sequences of speech segments to produce a speech signal output corresponding to a synthesis input; and
an automatic speech segment sequence validator that automatically selects a speech segment sequence from the N-best list.
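The two-stage arrangement of claim 53, an N-best list followed by automatic validation, can be sketched as below. The cost tables and function names are hypothetical; in particular, `perceptual_distance` stands in for whatever automatic measure the validator applies, which the claim does not specify.

```python
import heapq


def n_best(candidates, selector_cost, n):
    """Selector stage: keep the n candidate sequences with the lowest cost."""
    return heapq.nsmallest(n, candidates, key=selector_cost)


def validate(n_best_list, perceptual_distance):
    """Validator stage: pick the entry that scores best on a second measure."""
    return min(n_best_list, key=perceptual_distance)


# Usage with made-up scores for four candidate sequences.
selector_costs = {"s1": 1.0, "s2": 2.0, "s3": 3.0, "s4": 4.0}
perceptual = {"s1": 0.9, "s2": 0.2, "s3": 0.5, "s4": 0.0}
shortlist = n_best(list(selector_costs), selector_costs.get, 3)
chosen = validate(shortlist, perceptual.get)
```

Note that "s4" has the best perceptual score but never reaches the validator, because the selector's own cost excluded it from the shortlist; the size of N therefore bounds how much the validator can correct the selector.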
-
-
58. A speech synthesis system for producing synthesized speech from input text comprising:
-
a large speech segment database referencing speech segments and accessed by segment designators, each segment designator being associated with a sequence of one or more speech segments;
a speech segment selector for selecting a sequence of speech segments referenced by the large speech segment database and representative of an input text; and
a speech segment concatenator, in communication with the large speech segment database, for concatenating the selected sequence of speech segments to produce a speech signal output corresponding to the input text;
wherein compound speech units are used to increase the match between a grapheme-to-phoneme conversion of the input text and the segment designators.
-
-
59. A method for speech synthesis comprising:
-
using speech synthesis to create a sequence of segment designators referencing speech segments in a database that are representative of an input text;
validating the sequence of segment designators for synthesis quality; and
storing the sequence of validated segment designators for use by an application in synthesizing speech corresponding to the input text.
-
-
61. A speech synthesis system for producing synthesized speech from input text comprising:
-
a large speech segment database referencing speech segments and accessed by segment designators, each segment designator being associated with a sequence of one or more speech segments;
a speech segment selector for selecting a sequence of speech segments referenced by the large speech segment database and representative of an input text; and
a speech segment concatenator, in communication with the large speech segment database, for concatenating the selected sequence of speech segments to produce a speech signal output corresponding to the input text;
wherein the database includes at least one spectral segment that is linked to a plurality of stored trajectories for at least one of pitch, energy, and rate so as to generate from the spectral segment more than one speech segment during synthesis.
-
-
67. A speech synthesis system for producing synthesized speech from input text comprising:
-
a large speech segment database referencing speech segments and accessed by segment designators, each segment designator being associated with a sequence of one or more speech segments, where at least one speech segment includes spectral parameters which are represented differentially with respect to at least one other speech segment having a full spectral representation;
a speech segment selector for selecting a sequence of speech segments referenced by the large speech segment database and representative of an input text; and
a speech segment concatenator, in communication with the large speech segment database, for concatenating the selected sequence of speech segments to produce a speech signal output corresponding to the input text.
-
-
68. A speech synthesis system for producing synthesized speech from input text comprising:
-
a large speech segment database referencing speech segments and accessed by segment designators, each segment designator being associated with a sequence of one or more speech segments, where spectral representation of each speech segment uses variable frame rate compression;
a speech segment selector for selecting a sequence of speech segments referenced by the large speech segment database and representative of an input text; and
a speech segment concatenator, in communication with the large speech segment database, for concatenating the selected sequence of speech segments to produce a speech signal output corresponding to the input text.
-
-
69. A speech synthesis system for producing synthesized speech from input text comprising:
-
a large speech segment database referencing speech segments and accessed by segment designators, each segment designator being associated with a sequence of one or more speech segments, where coding of the speech segments approximates the variation of the prosody parameters over time by piece-wise linear functions that are stored as breakpoint-slope pairs;
a speech segment selector for selecting a sequence of speech segments referenced by the large speech segment database and representative of an input text; and
a speech segment concatenator, in communication with the large speech segment database, for concatenating the selected sequence of speech segments to produce a speech signal output corresponding to the input text.
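The breakpoint-slope coding of claim 69 can be illustrated as follows. The exact storage layout is an assumption: here each pair is (breakpoint time, slope from that time onward), and a separate start value anchors the contour, which the claim does not mention. The function names and the example pitch values are hypothetical.

```python
def encode(knots):
    """Convert (time, value) knots into a start value and breakpoint-slope pairs."""
    start_value = knots[0][1]
    pairs = [
        (t0, (v1 - v0) / (t1 - t0))
        for (t0, v0), (t1, v1) in zip(knots, knots[1:])
    ]
    return start_value, pairs


def decode(start_value, pairs, t):
    """Evaluate the piecewise-linear contour at time t.

    Beyond the last breakpoint the final slope is extrapolated.
    """
    value = start_value
    for k, (t_k, slope) in enumerate(pairs):
        if t <= t_k:
            break
        t_next = pairs[k + 1][0] if k + 1 < len(pairs) else float("inf")
        value += slope * (min(t, t_next) - t_k)
    return value


# Usage: a pitch contour rising from 100 to 110, then falling to 105.
start, pairs = encode([(0.0, 100.0), (1.0, 110.0), (2.0, 105.0)])
```

Storing slopes rather than every sampled value means a contour with few direction changes needs only a handful of numbers, regardless of segment duration.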
-
-
70. A method for speech synthesis comprising:
-
exciting a time sequence of digital filters with a synthetic pulse, the synthetic pulse being applied at every pitch period in voiced speech;
calculating the time-domain pulse response of at least one of the filters;
weighting the time-domain pulse response by a monotonically decaying function; and
truncating the pulse response length to a predetermined length.
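The last three steps of claim 70 can be sketched for a single filter. Several details are assumptions: the filter is taken to be a recursive (IIR) filter given by coefficient lists `b` and `a` with `a[0] == 1`, the decaying weight is an exponential, and the function names are hypothetical. The claim only requires a monotonically decaying function and a predetermined length.

```python
import math


def impulse_response(b, a, length):
    """First `length` samples of the impulse response of an IIR filter.

    Implements y[n] = sum_k b[k] x[n-k] - sum_{k>=1} a[k] y[n-k], with a[0] == 1.
    """
    h = []
    for n in range(length):
        acc = b[n] if n < len(b) else 0.0
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * h[n - k]
        h.append(acc)
    return h


def weight_and_truncate(h, decay, length):
    """Apply an exponentially decaying weight, then keep `length` samples."""
    kept = min(length, len(h))
    return [h[n] * math.exp(-decay * n) for n in range(kept)]


# Usage: one-pole filter with pole at 0.5.
h = impulse_response([1.0], [1.0, -0.5], 4)
short = weight_and_truncate(h, 0.1, 3)
```

Weighting before truncation tapers the tail of the response toward zero, so cutting it at the predetermined length leaves a smaller discontinuity than truncating the raw response would.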
-
-
74. A speech synthesis system for producing synthesized speech from input text comprising:
-
a large speech segment database referencing speech segments and accessed by segment designators, each segment designator being associated with a sequence of one or more speech segments;
a speech segment selector for selecting a sequence of speech segments referenced by the large speech segment database and representative of an input text; and
a speech segment concatenator, in communication with the large speech segment database, for concatenating the selected sequence of speech segments to produce a speech signal output corresponding to the input text;
wherein voice characteristics of the speech signal output can be changed by applying different spectral warping functions on the spectrum of the selected speech segments depending on their segment designators or on segment designator classes to which they belong.
-
Specification