SYSTEM AND METHOD FOR LOW-LATENCY WEB-BASED TEXT-TO-SPEECH WITHOUT PLUGINS

US 20130144624A1
Filed: 12/01/2011
Published: 06/06/2013
Est. Priority Date: 12/01/2011
Status: Active Grant

Abstract

Disclosed herein are systems, methods, and non-transitory computer-readable storage media for reducing latency in web-browsing TTS systems without the use of a plug-in or Flash® module. A system configured according to the disclosed methods allows the browser to send prosodically meaningful sections of text to a web server. A TTS server then converts intonational phrases of the text into audio and responds to the browser with the audio file. The system saves the audio file in a cache, with the file indexed by a unique identifier. As the system continues converting text into speech, when identical text appears the system uses the cached audio corresponding to the identical text without the need for re-synthesis via the TTS server.
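
The caching behavior described in the abstract can be illustrated with a minimal sketch. The abstract states only that cached audio is indexed by a unique identifier and reused when identical text appears; using a SHA-256 hash of the text as that identifier is an assumption made here for illustration, and `PhraseAudioCache` and `fake_tts` are hypothetical names, not part of the disclosure.

```python
import hashlib
from typing import Callable, Dict


class PhraseAudioCache:
    """Cache synthesized audio so identical text is not re-synthesized."""

    def __init__(self, synthesize: Callable[[str], bytes]) -> None:
        self._synthesize = synthesize      # stand-in for the call to the TTS server
        self._store: Dict[str, bytes] = {}
        self.synth_calls = 0               # counts real synthesis requests

    @staticmethod
    def key_for(text: str) -> str:
        # Assumption: a content hash of the text serves as the unique identifier.
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def get_audio(self, text: str) -> bytes:
        key = self.key_for(text)
        if key not in self._store:
            # Cache miss: synthesize once and remember the result.
            self.synth_calls += 1
            self._store[key] = self._synthesize(text)
        return self._store[key]


def fake_tts(text: str) -> bytes:
    """Placeholder synthesizer: returns bytes derived from the text, not real audio."""
    return b"AUDIO:" + text.encode("utf-8")
```

Calling `get_audio` twice with the same phrase triggers only one synthesis call; the second request is served from the cache.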

20 Claims

1. A method comprising:
- receiving, from a client, text associated with a request for text-to-speech synthesis;
- identifying a plurality of intonational phrases in the text;
- generating a file containing text-to-speech data for a first intonational phrase of the plurality of intonational phrases, wherein the first intonational phrase is indexed by a unique identifier;
- transmitting the file to the client in response to the request; and
- generating files containing additional text-to-speech data for remaining intonational phrases of the plurality of intonational phrases, wherein each of the files is indexed by the unique identifier plus a respective offset.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13):
  - 2. The method of claim 1, wherein an intonational phrase is a phrase in which intonation within the phrase only depends on text inside the phrase.
  - 3. The method of claim 1, wherein the file is indexed by a unique identifier.
  - 4. The method of claim 1, wherein the file contains notification information.
  - 5. The method of claim 1, wherein the unique identifier comprises a text identifier and an offset index.
  - 6. The method of claim 1, wherein the additional file contains additional notification information.
  - 7. The method of claim 1, wherein generating the additional file occurs while the web browser plays the text-to-speech data in the file.
  - 8. The method of claim 1, wherein the file and the second file are stored in a cache.
  - 9. The method of claim 1, further comprising transmitting one of the files to the web browser in response to an additional request.
  - 10. The method of claim 1, wherein the notification information comprises synchronization data.
  - 11. The method of claim 1, wherein boundaries between intonational phrases comprise silence.
  - 12. The method of claim 1, further comprising:
    - receiving text-to-speech settings from the client; and
    - generating the file and the files based on the text-to-speech settings.
  - 13. The method of claim 1, further comprising:
    - generating parallel versions of the file and the files using different text-to-speech voices.
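
The server-side steps of claim 1 can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: splitting on punctuation is a crude stand-in for intonational phrase detection, the `<id>-<offset>.audio` naming scheme (with offset 0 for the first phrase) is hypothetical, and `synthesize` and `store` are placeholders for a real TTS engine and file store.

```python
import re
import uuid
from typing import Callable, Dict, Iterator, List, Tuple


def split_into_phrases(text: str) -> List[str]:
    """Crude stand-in for phrase detection: split on whitespace after punctuation."""
    parts = re.split(r"(?<=[.,;:!?])\s+", text.strip())
    return [p for p in parts if p]


def file_name(base_id: str, offset: int) -> str:
    # Assumption: each file is named by the unique identifier plus an offset.
    return f"{base_id}-{offset}.audio"


def synthesize_request(
    text: str,
    synthesize: Callable[[str], bytes],
    store: Dict[str, bytes],
) -> Tuple[str, str, Iterator[str]]:
    """Synthesize the first phrase now; return a lazy generator for the rest."""
    phrases = split_into_phrases(text)
    if not phrases:
        raise ValueError("no text to synthesize")
    base_id = uuid.uuid4().hex
    first = file_name(base_id, 0)
    store[first] = synthesize(phrases[0])  # ready to transmit immediately

    def generate_remaining() -> Iterator[str]:
        # Remaining phrases are synthesized only as the generator is consumed,
        # for example while the client is already playing the first file.
        for offset, phrase in enumerate(phrases[1:], start=1):
            name = file_name(base_id, offset)
            store[name] = synthesize(phrase)
            yield name

    return base_id, first, generate_remaining()
```

Returning the first file before the remaining phrases are synthesized is what keeps perceived latency low in this sketch: the client can begin playback while later files are still being generated.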

14. A system comprising:
- a processor;
- a network interface;
- a memory having stored therein instructions for controlling the processor to perform steps comprising;
- receiving text from a user;
- transmitting the text to a server as part of a request for text-to-speech synthesis;
- receiving, from the server, a file containing a first intonational phrase indexed by a unique identifier;
- playing the first intonational phrase; and
- fetching an additional file for playback, wherein the additional file contains an additional intonational phrase indexed by the unique identifier plus an offset.
- Dependent claims (15, 16, 17, 18):
  - 15. The system of claim 14, wherein the instructions are associated with a web browser.
  - 16. The system of claim 15, wherein playing the first intonational phrase does not rely on a browser plugin.
  - 17. The system of claim 14, fetching the additional file is based on client-side scripting.
  - 18. The system of claim 14, further comprising:
    - receiving user input navigating to a different position within the text;
    - identifying a new offset for the different position; and
    - fetching a corresponding file from the server for playback based on the unique identifier and the new offset.
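
The client-side behavior of claims 14 and 18 (play a phrase, fetch the next file by identifier plus offset, and jump to a new offset when the user navigates) might look roughly like the sketch below. The claims do not specify a file naming scheme or how a text position maps to an offset; the `<id>-<offset>.audio` names, the `phrase_starts` list of character positions, and the `fetch` and `play` callables are all assumptions for illustration. A real client per claim 15 would run as script in a web browser; Python is used here only to show the logic.

```python
import bisect
from typing import Callable, List, Optional


def offset_for_position(phrase_starts: List[int], char_pos: int) -> int:
    """Map a character position in the text to the offset of its phrase.

    phrase_starts holds the (sorted) starting character index of each phrase.
    """
    return max(bisect.bisect_right(phrase_starts, char_pos) - 1, 0)


class PhrasePlayer:
    """Plays per-phrase audio files addressed by a base identifier plus offset."""

    def __init__(
        self,
        base_id: str,
        fetch: Callable[[str], Optional[bytes]],  # returns None if no such file
        play: Callable[[bytes], None],
    ) -> None:
        self.base_id = base_id
        self.offset = 0
        self._fetch = fetch
        self._play = play

    def _name(self, offset: int) -> str:
        # Hypothetical naming scheme: identifier plus offset.
        return f"{self.base_id}-{offset}.audio"

    def play_next(self) -> bool:
        """Fetch and play the file at the current offset; False when none is left."""
        audio = self._fetch(self._name(self.offset))
        if audio is None:
            return False
        self._play(audio)
        self.offset += 1
        return True

    def seek(self, new_offset: int) -> None:
        """Jump to a different phrase, as when the user navigates within the text."""
        if new_offset < 0:
            raise ValueError("offset must be non-negative")
        self.offset = new_offset
```

Because each file is addressable on its own, seeking requires no streaming protocol: the client computes the new offset and requests the corresponding file.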

19. A non-transitory computer-readable storage medium having stored therein instructions which, when executed by a computing device, cause the computing device to perform steps comprising:
- receiving, from a client, text associated with a request for text-to-speech synthesis;
- identifying a plurality of intonational phrases in the text;
- generating a file containing text-to-speech data for a first intonational phrase of the plurality of intonational phrases, wherein the first intonational phrase is indexed by a unique identifier;
- transmitting the file to the client in response to the request; and
- generating files containing additional text-to-speech data for remaining intonational phrases of the plurality of intonational phrases, wherein each of the files is indexed by the unique identifier plus a respective offset.
- Dependent claims (20):
  - 20. The non-transitory computer-readable storage medium of claim 19, the instructions further causing the computing device to perform steps comprising:
    - generating parallel versions of the file and the files using different text-to-speech voices.

Current Assignee: Cerence Operating Company (Cerence Inc.)
Original Assignee: AT&T Intellectual Property I LP (AT&T, Inc.)
Inventors: Alistair D. Conkie, Taniya Mishra, Mark Charles Beutnagel

Granted Patent: US 9,240,180 B2
US Class Current: 704/260
CPC Class Codes:
G10L 13/04 Details of speech synthesis...
G10L 13/10 Prosody rules derived from ...
