Multi-unit approach to text-to-speech synthesis
First Claim
1. A method, including:
matching phrase units of a received input string to audio segments from a plurality of audio segments including using properties of or between phrase units to locate matching audio segments from a plurality of selections;
parsing unmatched phrase units into word units;
matching the word units to audio segments using properties of or between words to locate matching audio segments from a plurality of selections; and
synthesizing the input string, including combining the audio segments associated with the phrase and word units.
2 Assignments
0 Petitions
Abstract
Methods, apparatus, systems, and computer program products are provided for synthesizing speech. One method includes matching a first level of units of a received input string to audio segments from a plurality of audio segments, including using properties of or between first level units to locate matching audio segments from a plurality of selections; parsing unmatched first level units into second level units; matching the second level units to audio segments using properties of or between the units to locate matching audio segments from a plurality of selections; and synthesizing the input string, including combining the audio segments associated with the first and second units.
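The two-level flow summarized in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the patent: the `Segment` class, the `prev_word` property (standing in for a "property of or between units"), the `LIBRARY` contents, and the file-name strings that stand in for real audio data.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One recorded audio segment plus one contextual property (hypothetical)."""
    audio: str      # placeholder label standing in for real audio samples
    prev_word: str  # assumed property: the word that preceded this unit when recorded


# Hypothetical library: each unit maps to a plurality of candidate segments.
LIBRARY = {
    "good morning": [Segment("gm_a.wav", "<start>"), Segment("gm_b.wav", "and")],
    "to": [Segment("to_a.wav", "welcome")],
    "boston": [Segment("boston_a.wav", "in"), Segment("boston_b.wav", "to")],
}


def pick(unit, prev_word):
    """Prefer the candidate recorded in the same context; else the first; else None."""
    candidates = LIBRARY.get(unit.lower(), [])
    for seg in candidates:
        if seg.prev_word == prev_word:
            return seg
    return candidates[0] if candidates else None


def synthesize(phrases):
    """Match phrase units first; parse unmatched phrases into word units and match those."""
    output, prev = [], "<start>"
    for phrase in phrases:
        seg = pick(phrase, prev)
        if seg is not None:
            # First-level (phrase) match.
            output.append(seg.audio)
            prev = phrase.split()[-1].lower()
            continue
        # No phrase match: fall back to second-level (word) units.
        for word in phrase.split():
            wseg = pick(word, prev)
            output.append(wseg.audio if wseg else f"<missing:{word}>")
            prev = word.lower()
    return output


result = synthesize(["Good morning", "welcome to Boston"])
```

In this sketch "combining" is reduced to collecting segment labels in order; an actual synthesizer would concatenate and smooth the audio itself.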
218 Citations
33 Claims
1. A method, including:
matching phrase units of a received input string to audio segments from a plurality of audio segments including using properties of or between phrase units to locate matching audio segments from a plurality of selections;
parsing unmatched phrase units into word units;
matching the word units to audio segments using properties of or between words to locate matching audio segments from a plurality of selections; and
synthesizing the input string, including combining the audio segments associated with the phrase and word units.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15

16. A method, including:
receiving a stream of textual input;
matching portions of the input textual stream to audio segments derived from one or more voice samples at multiple levels; and
synthesizing matching audio segments into speech output.
Dependent claims: 17

18. A computer program product including instructions tangibly stored on a computer-readable medium, the product including instructions for causing a computing device to:
match phrase units of an input string to audio segments from a plurality of audio segments;
parse unmatched phrase units into word units;
match the word units to audio segments; and
synthesize the input string, including combining the audio segments associated with the phrase and word units.

19. A system, including:
an input capture routine to receive an input string that includes phrase units;
a unit matching engine, in communication with the input capture routine, to match the phrase units to audio segments from a plurality of audio segments including using properties of or between audio segments for matching phrase units;
a parsing engine, in communication with the unit matching engine, to parse unmatched phrase units into word units, the unit matching engine configured to match the word units to audio segments including using properties of or between the audio segments for matching word units;
a synthesis block, in communication with the unit matching engine, to synthesize the input string, including combining the audio segments associated with the phrase and word units; and
a storage unit to store audio segments and properties of or between the audio segments.

20. A method including
providing a library of audio segments and associated metadata defining properties of or between a given segment and another segment, the library including one or more levels of units in accordance with a hierarchy;
matching, at a first level of the hierarchy, units of a received input string to audio segments, the received input string having one or more units at a first level;
parsing unmatched units to units at a second level in the hierarchy;
matching one or more units at the second level of the hierarchy to audio segments; and
synthesizing the input string including combining the audio segments associated with the first and second levels.
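One way to picture the library of claim 20 is as a store keyed by hierarchy level, where each entry carries metadata used to choose among several candidate segments. The sketch below is only an illustration under stated assumptions: the level names, the `Entry` class, the `pitch_hz` metadata field, the `PARSERS` table, and the closest-pitch selection rule are all hypothetical and are not taken from the claims.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """A stored audio segment reference with its metadata (hypothetical layout)."""
    segment_id: str
    metadata: dict = field(default_factory=dict)


HIERARCHY = ["phrase", "word"]    # first level, second level (assumed names)
PARSERS = {"phrase": str.split}   # assumed rule for parsing a unit into next-level units

LIBRARY = {
    "phrase": {"thank you": [Entry("p1", {"pitch_hz": 180.0})]},
    "word": {
        "for": [Entry("w1", {"pitch_hz": 175.0})],
        "calling": [Entry("w2", {"pitch_hz": 120.0}), Entry("w3", {"pitch_hz": 178.0})],
    },
}


def match(units, level=0, prev_pitch=None):
    """Match units at one level; parse unmatched units and retry one level down.

    Returns (list of chosen segment ids, pitch of the last chosen segment).
    """
    name = HIERARCHY[level]
    chosen = []
    for unit in units:
        candidates = LIBRARY[name].get(unit, [])
        if candidates:
            # Assumed selection rule: smallest pitch jump from the previous segment.
            best = min(
                candidates,
                key=lambda e: 0.0 if prev_pitch is None
                else abs(e.metadata["pitch_hz"] - prev_pitch),
            )
            chosen.append(best.segment_id)
            prev_pitch = best.metadata["pitch_hz"]
        elif level + 1 < len(HIERARCHY):
            sub_ids, prev_pitch = match(PARSERS[name](unit), level + 1, prev_pitch)
            chosen.extend(sub_ids)
        else:
            chosen.append(None)  # nothing stored for this unit at the final level
    return chosen, prev_pitch


ids, last_pitch = match(["thank you", "for calling"])
```

Adding a further level to this sketch would only require another `HIERARCHY` entry, a parser for the level above it, and a matching section in `LIBRARY`.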

21. A method including
receiving audio segments;
parsing the audio segments into units of a first level in a hierarchy of levels;
defining properties of or between units;
storing the units and the properties;
parsing the units into sub-units;
defining properties of or between the sub-units; and
storing the sub-units and properties.
Dependent claims: 22, 23, 24, 25, 26, 27, 28, 29, 30, 31

32. A method including
receiving audio segments;
parsing the audio segments into units of a first level in a hierarchy of levels;
defining properties of or between units;
storing the units and the properties;
parsing the units into units of a next level in the hierarchy of levels;
defining properties of or between units in the next level;
storing the units and properties; and
continuing to parse units at a given level into units at a next level in the hierarchy until a final parsing is performed;
at each level, defining properties of or between units and storing the units and the properties; and
at a final level in the hierarchy storing units.
Dependent claims: 33
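The level-by-level parsing recited in claims 21 and 32 can be sketched as a loop over a list of splitters. This is a rough illustration under several assumptions: a text transcript stands in for the received audio (real audio would need alignment to locate unit boundaries), the three level names and their splitting rules are hypothetical, and the only properties recorded are each unit's neighbors within its parent.

```python
# Hypothetical splitters: each turns one unit's text into units of that level.
SPLITTERS = [
    ("sentence", lambda text: [s.strip() for s in text.split(".") if s.strip()]),
    ("phrase", lambda text: [p.strip() for p in text.split(",") if p.strip()]),
    ("word", lambda text: text.split()),
]


def build_library(transcript):
    """Parse level by level, storing each unit with properties relating it to its neighbors."""
    library = {name: [] for name, _ in SPLITTERS}
    units = [transcript]
    for name, split in SPLITTERS:
        next_units = []
        for parent in units:
            children = split(parent)
            for i, text in enumerate(children):
                library[name].append({
                    "text": text,
                    # Assumed "properties between units": neighbors inside the same parent.
                    "prev": children[i - 1] if i > 0 else None,
                    "next": children[i + 1] if i + 1 < len(children) else None,
                })
            next_units.extend(children)
        units = next_units  # continue parsing at the next level of the hierarchy
    return library


lib = build_library("Hello there, welcome back. Goodbye.")
```

The loop ends when the last splitter has run, which corresponds to the final parsing in the claim language; the units stored at that point are the smallest ones in this sketch.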
Specification