SYSTEM AND METHOD FOR HYBRID SPEECH SYNTHESIS
Abstract
A speech synthesis system receives symbolic input describing an utterance to be synthesized. In one embodiment, different portions of the utterance are constructed from different sources, one of which is a speech corpus recorded from a human speaker whose voice is to be modeled. The other sources may include other human speech corpora or speech produced using Rule-Based Speech Synthesis (RBSS). At least some portions of the utterance may be constructed by modifying prototype speech units to produce adapted speech units that are contextually appropriate for the utterance. The system concatenates the adapted speech units with the other speech units to produce a speech waveform. In another embodiment, a speech unit of a speech corpus recorded from a human speaker lacks transitions at one or both of its edges. A transition is synthesized using RBSS and concatenated with the speech unit in producing a speech waveform for the utterance.
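The first embodiment described above can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not the patented implementation: the names `SpeechUnit`, `adapt`, and `synthesize` are hypothetical, waveforms are plain lists of floats, the "adaptation" is a simple amplitude scaling standing in for whatever contextual modification a real system would apply, and every unit taken from the target corpus is adapted (the text only requires adapting selected ones).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Waveform = List[float]  # toy stand-in for audio samples


@dataclass
class SpeechUnit:
    label: str         # symbolic label, e.g. a phone name
    samples: Waveform  # toy audio data
    source: str        # "target", "adapted-target", or "other"


def adapt(unit: SpeechUnit, gain: float) -> SpeechUnit:
    """Hypothetical adaptation: scale amplitude by `gain`.

    A real system would modify the prototype unit so that it fits its
    context; amplitude scaling is only a placeholder for that step.
    """
    return SpeechUnit(unit.label, [s * gain for s in unit.samples], "adapted-target")


def synthesize(
    symbols: List[str],
    target_corpus: Dict[str, SpeechUnit],
    other_source: Dict[str, SpeechUnit],
    gain: float = 0.5,
) -> Tuple[Waveform, List[SpeechUnit]]:
    """Build an utterance from adapted target-corpus units and other units."""
    units: List[SpeechUnit] = []
    for sym in symbols:
        if sym in target_corpus:
            # Portion constructed from a prototype unit of the target voice corpus.
            units.append(adapt(target_corpus[sym], gain))
        elif sym in other_source:
            # Portion obtained from a source other than the target voice corpus.
            units.append(other_source[sym])
        else:
            raise KeyError(f"no speech unit available for symbol {sym!r}")
    # Concatenate all units, in order, into one waveform.
    waveform = [s for u in units for s in u.samples]
    return waveform, units
```

For example, with a target corpus holding a unit for "h" and another source holding a unit for "i", `synthesize(["h", "i"], target, other)` returns the adapted "h" samples followed by the unmodified "i" samples.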
58 Claims
1. A method for synthesizing a target voice, the method comprising:
receiving symbolic input descriptive of an utterance to be synthesized;
selecting one or more portions of the utterance to be constructed from prototype speech units of a target voice corpus, the target voice corpus including speech units recorded from a human speaker, the target voice corpus configured to provide characteristics of the target voice;
applying adaptations to selected ones of the prototype speech units of the target voice corpus, to produce adapted units that are contextually appropriate for the utterance;
obtaining at least some speech units from a source other than the target voice corpus; and
concatenating at least the adapted speech units from the target voice corpus and the speech units from the source other than the target voice corpus to produce a speech waveform for the utterance.
Dependent claims: 2-17.
18. A method for speech synthesis, the method comprising:
receiving symbolic input descriptive of an utterance to be synthesized;
selecting one or more portions of the utterance to be constructed from prototype speech units of a speech corpus, the speech corpus including speech units recorded from a human speaker;
applying Phone-and-Transition (P&T) adaptations to selected ones of the prototype speech units of the speech corpus, to produce adapted speech units that are contextually appropriate for the utterance; and
concatenating at least the adapted speech units from the speech corpus to produce a speech waveform for the utterance.
Dependent claim: 19.
20. A system for synthesizing a target voice, comprising:
a front end module configured to receive symbolic input descriptive of an utterance to be synthesized;
a back end module configured to select one or more portions of the utterance to be constructed from prototype speech units of a target voice corpus, the target voice corpus including speech units recorded from a human speaker, the target voice corpus configured to provide characteristics of the target voice;
a unit engine of the back end module configured to apply adaptations to selected ones of the prototype speech units of the target voice corpus, to produce adapted speech units that are contextually appropriate for the utterance; and
a concatenation engine of the back end module configured to concatenate at least the adapted speech units from the target voice corpus and speech units from a source other than the target voice corpus, to produce a speech waveform for the utterance.
Dependent claims: 21-36.
37. A system for speech synthesis comprising:
a front end module configured to receive symbolic input descriptive of an utterance to be synthesized;
a back end module configured to select one or more portions of the utterance to be constructed from prototype speech units of a speech corpus, the speech corpus including speech units recorded from a human speaker;
a unit engine of the back end module configured to apply Phone-and-Transition (P&T) adaptations to selected ones of the prototype speech units of the speech corpus, to produce adapted speech units that are contextually appropriate for the utterance; and
a concatenation engine of the back end module configured to concatenate at least the adapted speech units from the speech corpus to produce a speech waveform for the utterance.
Dependent claim: 38.
39. A method for speech synthesis comprising:
receiving symbolic input descriptive of an utterance to be synthesized;
selecting a portion of the utterance to be constructed from a speech unit of a speech corpus, the speech unit recorded from a human speaker, the speech unit lacking transitions at one or both of the speech unit's edges;
synthesizing a transition for use at an edge of the speech unit using Rule-Based Speech Synthesis (RBSS) rules; and
concatenating the speech unit with the synthesized transition in producing a speech waveform for the utterance.
Dependent claims: 40-48.
49. A system for speech synthesis comprising:
a front end module configured to receive symbolic input descriptive of an utterance to be synthesized;
a back end module configured to select a portion of the utterance to be constructed from a speech unit of a speech corpus, the speech unit recorded from a human speaker, the speech unit lacking transitions at one or both of the speech unit's edges;
a synthesis module configured to synthesize a transition for use at an edge of the speech unit by use of Rule-Based Speech Synthesis (RBSS) rules; and
a concatenation engine of the back end module configured to concatenate the speech unit with the synthesized transition in production of a speech waveform for the utterance.
Dependent claims: 50-58.
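The transition-synthesis idea in claims 39 and 49 (a recorded unit that lacks transitions at its edges, with the missing transition generated by rule and then concatenated) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the unit is modeled as a track of per-frame parameter values (for instance a formant frequency in Hz), the function names are hypothetical, and linear interpolation stands in for the actual RBSS rules, which this section does not spell out.

```python
from typing import List, Optional


def synthesize_transition(start: float, end: float, n_frames: int) -> List[float]:
    """Hypothetical stand-in rule: linearly interpolate from `start` to `end`.

    Returns `n_frames` intermediate values, excluding both endpoints.
    """
    step = (end - start) / (n_frames + 1)
    return [start + step * (i + 1) for i in range(n_frames)]


def attach_transitions(
    unit: List[float],
    left_context: Optional[float] = None,
    right_context: Optional[float] = None,
    n_frames: int = 3,
) -> List[float]:
    """Concatenate rule-generated transitions onto a unit that lacks them.

    `unit` is a steady-state parameter track recorded without edge
    transitions. `left_context` and `right_context` are the parameter
    values of the neighboring sounds; pass None to leave an edge as-is.
    """
    if not unit:
        raise ValueError("unit must contain at least one frame")
    out: List[float] = []
    if left_context is not None:
        # Transition from the preceding sound into the unit's first frame.
        out.extend(synthesize_transition(left_context, unit[0], n_frames))
    out.extend(unit)
    if right_context is not None:
        # Transition from the unit's last frame toward the following sound.
        out.extend(synthesize_transition(unit[-1], right_context, n_frames))
    return out
```

Calling `attach_transitions([1000.0, 1000.0], left_context=600.0)` prepends three interpolated frames (700.0, 800.0, 900.0) to the recorded unit and leaves the right edge untouched.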
Specification