Distributed speech unit inventory for TTS systems
Abstract
In a text-to-speech (TTS) system, a database including sample speech units for unit selection may be configured for use by a local device. The local unit database may be created from a more comprehensive unit database. The local unit database may include units which provide sufficient TTS results for frequently input text. Speech synthesis may then be performed by concatenating locally available units with units from a remote device including the comprehensive unit database. Aspects of the speech synthesis may be performed by the remote device and/or the local device.
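As a rough illustration of how a local unit database might be derived from the comprehensive one, the sketch below keeps the units that occur most often in a sample of frequently requested unit sequences. The abstract does not state a selection criterion, so the frequency count, the function name `build_local_inventory`, the `max_units` budget, and the toy unit identifiers are all assumptions made for this example.

```python
from collections import Counter


def build_local_inventory(full_inventory, frequent_unit_sequences, max_units):
    """Keep the max_units units used most often by the sample sequences.

    full_inventory maps a unit id to its audio bytes (hypothetical layout).
    frequent_unit_sequences is an iterable of unit-id lists, one per
    frequently input text (assumed to be precomputed elsewhere).
    """
    counts = Counter(
        unit_id
        for sequence in frequent_unit_sequences
        for unit_id in sequence
        if unit_id in full_inventory
    )
    chosen = [unit_id for unit_id, _ in counts.most_common(max_units)]
    return {unit_id: full_inventory[unit_id] for unit_id in chosen}


# Toy data: four units in the comprehensive database, three sample inputs.
full = {"h-e": b"\x01", "e-l": b"\x02", "l-o": b"\x03", "o-w": b"\x04"}
usage = [["h-e", "e-l", "l-o"], ["e-l", "l-o"], ["l-o", "o-w"]]
local = build_local_inventory(full, usage, max_units=2)
```

With this toy data the local inventory holds "l-o" and "e-l", the two units that appear most often; any other unit would have to come from the remote device at synthesis time.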
25 Claims
1. A computing device for performing text-to-speech (TTS) processing, comprising:
at least one processor;
a memory device including instructions operable to be executed by the at least one processor to perform a set of actions, configuring the at least one processor:
to access a local database of speech units to be used in unit selection speech synthesis, wherein the local database is comprised from a larger database of speech units;
to receive text data for TTS processing;
to determine desired speech units to synthesize the received text data;
to identify first desired speech units in the local database;
to determine the second desired speech units are not in the local database;
to determine that the second desired speech units are in the larger database located at a remote device;
to receive the second desired speech units;
to concatenate audio segments corresponding to the first desired speech units in the local database and audio segments corresponding to the second desired speech units; and
to output audio data comprising speech corresponding to the received text data.
Dependent claims: 2, 3, 4, 5.
6. A method comprising:
receiving text data for text-to-speech processing;
determining first desired speech units and second desired speech units from the received text data;
determining that a local database does not include the first desired speech units;
receiving first audio segments corresponding to the first desired speech units from a remote database;
receiving second audio segments corresponding to the second desired speech units from the local database; and
creating audio corresponding to the received text data using the first audio segments and the second audio segments.
Dependent claims: 7, 8, 9, 10, 11, 12, 13.
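The local-device flow recited in claims 1 and 6 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: it assumes the desired unit ids have already been determined from the text, models each database as a dictionary from unit id to audio bytes, and stands in for real concatenation (which would normally involve join costs and smoothing) with a plain byte join. The names `synthesize` and `fetch_remote` are hypothetical.

```python
def synthesize(unit_ids, local_db, fetch_remote):
    """Build audio for unit_ids from local units plus remotely fetched ones.

    fetch_remote is a caller-supplied function (hypothetical) that takes a
    list of unit ids and returns a dict of unit id -> audio bytes.
    """
    # Units the local database lacks, de-duplicated in first-seen order.
    missing = [u for u in dict.fromkeys(unit_ids) if u not in local_db]
    # Ask the remote device only when something is actually missing.
    remote_segments = fetch_remote(missing) if missing else {}
    # Concatenate segments in the original order, whatever their source.
    segments = [
        local_db[u] if u in local_db else remote_segments[u]
        for u in unit_ids
    ]
    return b"".join(segments)


# Toy setup: the remote database is complete, the local one lacks "b".
remote_db = {"a": b"A", "b": b"B", "c": b"C"}
local_db = {"a": b"A", "c": b"C"}
requests = []


def fetch_remote(ids):
    requests.append(list(ids))  # record what was requested
    return {u: remote_db[u] for u in ids}


audio = synthesize(["a", "b", "c", "b"], local_db, fetch_remote)
```

In this toy run only "b" is requested from the remote side, and it is requested once even though it is used twice.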
14. A computing device, comprising:
at least one processor;
a memory device including instructions operable to be executed by the at least one processor to perform a set of actions, configuring the at least one processor:
to receive text data for text-to-speech processing;
to determine first desired speech units and second desired speech units from the received text data;
to determine that a local database does not include the first desired speech units;
to identify the first desired speech units in a remote database for use in synthesizing the received text data;
to identify the second desired speech units in the local database for use in synthesizing the received text data;
to send first audio segments corresponding to the first desired speech units to a local device comprising the local database; and
to send instructions to the local device to concatenate the first audio segments with second audio segments corresponding to the second desired speech units stored at the local device.
Dependent claims: 15, 16, 17, 18, 19.
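Claim 14 describes the same split from the remote device's side: it decides which units the local device lacks, sends audio for those, and sends instructions for concatenating them with locally stored units. The sketch below is one hypothetical way to express that. The claim does not say how the remote device learns the contents of the local database, so the `local_manifest` set is an assumption, as are the function name `plan_synthesis` and the `("local" | "sent", unit_id)` instruction format.

```python
def plan_synthesis(unit_ids, local_manifest, remote_db):
    """Return (segments_to_send, instructions) for a local device.

    local_manifest is an assumed set of unit ids known to be stored on the
    local device. remote_db maps unit id -> audio bytes.
    """
    segments = {}      # audio the remote device must send
    instructions = []  # ordered concatenation plan for the local device
    for u in unit_ids:
        if u in local_manifest:
            instructions.append(("local", u))  # use the locally stored unit
        else:
            segments[u] = remote_db[u]
            instructions.append(("sent", u))   # use the unit sent with this plan
    return segments, instructions


remote_db = {"a": b"A", "b": b"B", "c": b"C"}
segments, instructions = plan_synthesis(["a", "b", "c"], {"a", "c"}, remote_db)
```

Here only the audio for "b" is sent, and the instruction list tells the local device to interleave it between its own "a" and "c".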
20. A non-transitory computer-readable storage medium storing processor-executable instructions for controlling a computing device, comprising:
program code to receive text data for text-to-speech processing;
program code to determine first desired speech units and second desired speech units from the received text data;
program code to determine that a local database does not include the first desired speech units;
program code to identify the first desired speech units in a remote database for use in synthesizing the received text data;
program code to identify the second desired speech units in the local database for use in synthesizing the received text data;
program code to send first audio segments corresponding to the first desired speech units to a local device comprising the local database; and
program code to send instructions to the local device to concatenate the first audio segments with second audio segments corresponding to the second desired speech units stored at the local device.
Dependent claims: 21, 22, 23, 24, 25.
Specification