Reduced latency text-to-speech system
Abstract
In delivering text-to-speech (TTS) results to a user, the time between the user request and delivery of initial TTS results is reduced using one or more of various techniques. Caching of TTS results may be reconfigured to cache unit indices rather than full speech synthesis results. More powerful computing resources may be dedicated to early TTS processing. A user may be notified of TTS results prior to complete processing of a TTS request. Early TTS processing may be performed by a local device and then passed to a remote device.
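
The caching idea in the abstract, storing unit indices rather than full synthesis results, can be illustrated with a minimal Python sketch. Everything named here (`UNIT_AUDIO`, `UnitIndexCache`, `audio_from_indices`, the index values and byte strings) is a hypothetical stand-in chosen for illustration and is not taken from the patent. The sketch only assumes that each speech unit in a voice database can be addressed by an integer index.

```python
from typing import Dict, List, Optional

# Hypothetical unit database: unit index -> raw audio bytes for that speech unit.
UNIT_AUDIO: Dict[int, bytes] = {
    101: b"\x01\x02",
    102: b"\x03\x04",
    103: b"\x05\x06",
}


class UnitIndexCache:
    """Caches the unit indices chosen for a text, not the synthesized audio."""

    def __init__(self) -> None:
        self._entries: Dict[str, List[int]] = {}

    def store(self, text: str, unit_indices: List[int]) -> None:
        # Keep a copy so later changes to the caller's list do not alter the cache.
        self._entries[text] = list(unit_indices)

    def lookup(self, text: str) -> Optional[List[int]]:
        return self._entries.get(text)


def audio_from_indices(unit_indices: List[int]) -> bytes:
    """Rebuild audio by concatenating the stored audio of each unit."""
    return b"".join(UNIT_AUDIO[i] for i in unit_indices)


cache = UnitIndexCache()
cache.store("good morning", [101, 102, 103])
cached = cache.lookup("good morning")
audio = audio_from_indices(cached) if cached is not None else b""
```

In this sketch a cache entry holds only a short list of integers, and the audio is rebuilt from the unit database on each hit. A real system would presumably also smooth the joins between units, which is omitted here.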
21 Claims
1. A method for reducing a time for delivery of initial results of text-to-speech (TTS) processing, comprising:
receiving a TTS request including text for TTS processing, the text comprising a first portion of the text and a second portion of the text and wherein the first portion corresponds to a beginning of the text;
matching the first portion of text to text of a previously stored text sample;
retrieving speech unit identifiers associated with the previously stored text sample;
identifying speech units associated with the speech unit identifiers;
retrieving audio corresponding to the speech units;
generating first audio data by synthesizing first speech corresponding to the first portion of the text using the retrieved audio;
providing the first audio data;
generating second audio data by synthesizing second speech corresponding to the second portion of the text using a unit selection TTS technique; and
providing the second audio data.
Dependent claims: 2, 3
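
The order of steps in claim 1 can be sketched as a small Python generator that provides audio for the stored beginning of the text before the remainder is synthesized. This is an illustrative sketch under stated assumptions, not the patented implementation: `STORED_SAMPLES`, `UNIT_AUDIO`, `split_on_stored_prefix`, `unit_selection_tts`, and `synthesize` are all hypothetical names, the stored data is invented, and the unit selection step is a placeholder that returns a tagged byte string instead of real audio. Prefix matching is done by plain string comparison and ignores word boundaries.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical stores; contents are illustrative only.
STORED_SAMPLES: Dict[str, List[int]] = {
    "the weather today is": [7, 8, 9, 10],
    "the weather": [7, 8],
}
UNIT_AUDIO: Dict[int, bytes] = {7: b"A", 8: b"B", 9: b"C", 10: b"D"}


def split_on_stored_prefix(text: str) -> Tuple[str, str]:
    """Split text into (first portion, second portion).

    The first portion is the longest stored sample that begins the text,
    or an empty string if no stored sample matches.
    """
    best = ""
    for sample in STORED_SAMPLES:
        if text.startswith(sample) and len(sample) > len(best):
            best = sample
    return best, text[len(best):].strip()


def unit_selection_tts(text: str) -> bytes:
    """Placeholder for a full unit selection synthesizer (not implemented)."""
    return f"<synth:{text}>".encode()


def synthesize(text: str) -> Iterator[bytes]:
    """Yield first audio from stored unit identifiers, then the remainder."""
    first, second = split_on_stored_prefix(text)
    if first:
        unit_ids = STORED_SAMPLES[first]                  # retrieve identifiers
        yield b"".join(UNIT_AUDIO[u] for u in unit_ids)   # provide first audio
    if second:
        yield unit_selection_tts(second)                  # provide second audio


chunks = list(synthesize("the weather today is sunny and warm"))
```

Because `synthesize` is a generator, a caller can start playing the first chunk while the second portion is still being processed, which is the latency benefit the claim is directed to.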
4. A method comprising:
receiving a text-to-speech (TTS) request including text for TTS processing;
identifying a stored text sample corresponding to a first portion of the text;
retrieving speech unit identifiers associated with the stored text sample;
identifying speech units associated with the speech unit identifiers;
retrieving audio corresponding to the speech units;
synthesizing a first portion of speech based at least in part on the retrieved audio; and
synthesizing a second portion of speech based at least in part on the text.
Dependent claims: 5, 6, 7, 8, 9, 10, 11, 12
13. A system comprising:
at least one processor;
a memory device including instructions operable to be executed by the at least one processor to perform a set of actions, configuring the at least one processor:
to receive a text-to-speech (TTS) request including text for TTS processing;
to identify a stored text sample corresponding to a first portion of the text;
to retrieve speech unit identifiers associated with the stored text sample;
to identify speech units associated with the speech unit identifiers;
to retrieve audio corresponding to the speech units;
to synthesize a first portion of speech based at least in part on the retrieved audio; and
to synthesize a second portion of speech based at least in part on the text.
Dependent claims: 14, 15, 16, 17, 18, 19, 20, 21
Specification