SYSTEM AND METHOD FOR LOW-LATENCY WEB-BASED TEXT-TO-SPEECH WITHOUT PLUGINS
Abstract
Disclosed herein are systems, methods, and non-transitory computer-readable storage media for reducing latency in web-browsing TTS systems without the use of a plug-in or Flash® module. A system configured according to the disclosed methods allows the browser to send prosodically meaningful sections of text to a web server. A TTS server then converts intonational phrases of the text into audio and responds to the browser with the audio file. The system saves the audio file in a cache, with the file indexed by a unique identifier. As the system continues converting text into speech, when identical text appears the system uses the cached audio corresponding to the identical text without the need for re-synthesis via the TTS server.
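The caching behavior described in the abstract can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: the unique identifier is taken here to be a SHA-256 hash of the exact phrase text, the cache is an in-memory dictionary, and `synthesize` is a hypothetical stand-in for the call to the TTS server. The names `PhraseAudioCache`, `key_for`, and `get_audio` are illustrative only.

```python
import hashlib
from typing import Callable, Dict


class PhraseAudioCache:
    """Stores synthesized audio keyed by an identifier derived from the text."""

    def __init__(self, synthesize: Callable[[str], bytes]) -> None:
        # Hypothetical stand-in for the request to the TTS server.
        self._synthesize = synthesize
        self._store: Dict[str, bytes] = {}
        self.synth_calls = 0  # counts how often re-synthesis was needed

    @staticmethod
    def key_for(text: str) -> str:
        # Assumption: the identifier is a hash of the exact text, so only
        # identical text maps to the same cached audio.
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def get_audio(self, text: str) -> bytes:
        key = self.key_for(text)
        if key not in self._store:
            # Cache miss: synthesize once and remember the result.
            self.synth_calls += 1
            self._store[key] = self._synthesize(text)
        # Cache hit (or freshly stored audio): no further synthesis.
        return self._store[key]
```

Under these assumptions, requesting the same phrase twice triggers a single synthesis call, and the second request is served from the cache.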
20 Claims
1. A method comprising:
receiving, from a client, text associated with a request for text-to-speech synthesis;
identifying a plurality of intonational phrases in the text;
generating a file containing text-to-speech data for a first intonational phrase of the plurality of intonational phrases, wherein the first intonational phrase is indexed by a unique identifier;
transmitting the file to the client in response to the request; and
generating files containing additional text-to-speech data for remaining intonational phrases of the plurality of intonational phrases, wherein each of the files is indexed by the unique identifier plus a respective offset.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13.

14. A system comprising:
a processor;
a network interface;
a memory having stored therein instructions for controlling the processor to perform steps comprising:
receiving text from a user;
transmitting the text to a server as part of a request for text-to-speech synthesis;
receiving, from the server, a file containing a first intonational phrase indexed by a unique identifier;
playing the first intonational phrase; and
fetching an additional file for playback, wherein the additional file contains an additional intonational phrase indexed by the unique identifier plus an offset.
Dependent claims: 15, 16, 17, 18.

19. A non-transitory computer-readable storage medium having stored therein instructions which, when executed by a computing device, cause the computing device to perform steps comprising:
receiving, from a client, text associated with a request for text-to-speech synthesis;
identifying a plurality of intonational phrases in the text;
generating a file containing text-to-speech data for a first intonational phrase of the plurality of intonational phrases, wherein the first intonational phrase is indexed by a unique identifier;
transmitting the file to the client in response to the request; and
generating files containing additional text-to-speech data for remaining intonational phrases of the plurality of intonational phrases, wherein each of the files is indexed by the unique identifier plus a respective offset.
Dependent claims: 20.
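The indexing scheme recited in the independent claims (a first file indexed by a unique identifier, further files indexed by that identifier plus an offset) might be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: punctuation is used as a rough proxy for intonational phrase boundaries, the "identifier plus offset" key is rendered as a string of the form `id+n`, the remaining files are generated synchronously, and `synthesize`, `store`, `file_key`, `handle_request`, and `client_playback_order` are hypothetical names.

```python
import re
import uuid
from typing import Callable, Dict, List, Optional, Tuple


def split_intonational_phrases(text: str) -> List[str]:
    # Assumption: punctuation followed by whitespace approximates an
    # intonational phrase boundary. A real system would use prosodic analysis.
    parts = re.split(r"(?<=[.,;:!?])\s+", text.strip())
    return [p for p in parts if p]


def file_key(base_id: str, offset: int) -> str:
    # Hypothetical key format: the bare identifier for the first phrase,
    # and "identifier+offset" for each later phrase.
    return base_id if offset == 0 else f"{base_id}+{offset}"


def handle_request(
    text: str,
    synthesize: Callable[[str], bytes],
    store: Dict[str, bytes],
    base_id: Optional[str] = None,
) -> Tuple[str, bytes, int]:
    """Server side: return (identifier, first file, number of phrases)."""
    phrases = split_intonational_phrases(text)
    if not phrases:
        raise ValueError("no text to synthesize")
    base_id = base_id or uuid.uuid4().hex
    # The first phrase is synthesized first so it can be returned immediately.
    first = synthesize(phrases[0])
    store[file_key(base_id, 0)] = first
    # Remaining phrases are stored under identifier plus offset.
    for offset, phrase in enumerate(phrases[1:], start=1):
        store[file_key(base_id, offset)] = synthesize(phrase)
    return base_id, first, len(phrases)


def client_playback_order(
    base_id: str, count: int, fetch: Callable[[str], bytes]
) -> List[bytes]:
    """Client side: fetch files in playback order, offsets 0..count-1."""
    return [fetch(file_key(base_id, offset)) for offset in range(count)]
```

In this sketch the client needs only the identifier and the phrase count to request each subsequent file, which is what lets playback of the first phrase begin before the rest of the text has been synthesized in a real, asynchronous deployment.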
Specification