Cost efficient distributed text-to-speech processing

US 9,311,912 B1
Filed: 07/22/2013
Issued: 04/12/2016
Est. Priority Date: 07/22/2013
Status: Active Grant

1 Assignment

0 Petitions

Abstract

Text-to-speech (TTS) processing systems may be divided among remote TTS servers which are accessible through a network connection to local user devices. The costs for performing processing on these servers may vary according to time. To improve efficiency of TTS processing, certain requests may be scheduled during low-cost server times. A user may indicate a preference for such low-cost delivery. A user may also indicate a preference for quick turnaround time, permitting scheduling of TTS processing during higher-cost server times. A TTS processing system may also consider quality of TTS results when scheduling server processing time for a particular TTS request and may allocate more server time when higher-quality results are desired.
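The trade-off described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the function name `pick_window`, the hourly price list, and the idea of modeling quality as an integer multiplier on server time are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Window:
    start_hour: int  # index into the hourly price list
    hours: int       # contiguous server hours reserved
    cost: float      # total price of the reserved hours


def pick_window(hourly_cost, base_hours, quality_factor, prefer):
    """Choose one contiguous block of server hours for a TTS request.

    hourly_cost: hypothetical price per server-hour for each upcoming hour.
    base_hours: hours needed at baseline quality.
    quality_factor: assumed integer multiplier; higher quality gets more time.
    prefer: "cost" for the cheapest block, "speed" for the earliest block.
    """
    hours = base_hours * quality_factor
    if hours <= 0 or hours > len(hourly_cost):
        raise ValueError("request does not fit in the pricing horizon")
    windows = [
        Window(start, hours, sum(hourly_cost[start:start + hours]))
        for start in range(len(hourly_cost) - hours + 1)
    ]
    if prefer == "speed":
        return windows[0]  # quick turnaround, whatever it costs
    if prefer == "cost":
        # Cheapest block; ties go to the earlier start.
        return min(windows, key=lambda w: (w.cost, w.start_hour))
    raise ValueError(f"unknown preference: {prefer}")


prices = [5.0, 5.0, 2.0, 1.0, 1.0, 4.0]
cheap = pick_window(prices, base_hours=1, quality_factor=2, prefer="cost")
fast = pick_window(prices, base_hours=1, quality_factor=2, prefer="speed")
```

With these made-up prices, the cost-preferring request is deferred to the cheap hours in the middle of the list, while the speed-preferring request starts immediately at a higher price.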

213 Citations

20 Claims

1. A method for performing text-to-speech (TTS) processing, comprising:
- receiving, at a server, a TTS request for TTS processing of text data into speech, wherein the TTS request is sent by a local device remote from the server and includes text data originating from the local device;
  
  receiving a user preference for TTS processing performance factors, the TTS processing performance factors including at least one of a cost of TTS processing, a quality of TTS processing or a length of time until delivery of TTS results;
  
  determining a plurality of processing options for completion of the TTS request based at least in part on the user preference, wherein the plurality of processing options vary over at least one of cost, quality and delivery time;
  
  providing the plurality of processing options to the local device;
  
  receiving a user selection of a processing option from the plurality of processing options;
  
  scheduling TTS resources for processing the TTS request based at least in part on the user selection;
  
  synthesizing the text data into speech based at least in part on the TTS resources; and
  
  providing audio data to the local device, the audio data including the synthesized speech.
- Dependent claims (2, 3, 4):
  - 2. The method of claim 1, wherein the plurality of processing options are based upon a minimum cost to perform TTS processing within one or more delivery times of speech resulting from the TTS processing.
  - 3. The method of claim 1, further comprising dividing the TTS request into sections for parallel processing.
  - 4. The method of claim 1, wherein the user preference for TTS processing performance factors comprises a maximum cost for completion of the TTS request within a certain time period.
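One way to picture the "plurality of processing options" of claim 1, together with the minimum-cost and maximum-cost language of claims 2 and 4, is the hedged Python sketch below. The names `Option` and `build_options`, the price list, and the assumption that work may be spread over any non-contiguous hours before a deadline are all illustrative and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    deliver_by_hour: int  # delivery time, in hours from now
    quality: str          # label for the quality level
    cost: float           # minimum cost found for this combination


def build_options(hourly_cost, hours_by_quality, deadlines, max_cost=None):
    """List options that vary over cost, quality and delivery time.

    For each (deadline, quality) pair, the cost is the sum of the cheapest
    server hours available before the deadline (assumes work can be spread
    over non-contiguous hours). Options above max_cost are dropped.
    """
    options = []
    for deadline in deadlines:
        available = sorted(hourly_cost[:deadline])
        for quality, hours in hours_by_quality.items():
            if hours > len(available):
                continue  # not enough server hours before this deadline
            cost = sum(available[:hours])
            if max_cost is None or cost <= max_cost:
                options.append(Option(deadline, quality, cost))
    return options


prices = [5.0, 5.0, 2.0, 1.0, 1.0, 4.0]
offered = build_options(
    prices,
    hours_by_quality={"standard": 1, "high": 2},
    deadlines=[2, 6],
    max_cost=6.0,
)
# The local device would display `offered`; the user's pick then drives scheduling.
chosen = offered[-1]
```

Here the high-quality option with a two-hour deadline is left out because it exceeds the assumed cost cap, so the user sees three options to choose from.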

5. A system comprising:
- at least one processor;
  
  a memory device including instructions operable to be executed by the at least one processor to perform a set of actions, configuring the at least one processor:
  
  to receive a TTS request for TTS processing of text data into speech, wherein the TTS request is sent by a local device remote from the system and includes text data originating from the local device;
  
  to estimate delivery conditions for completion of the TTS request, wherein the delivery conditions include an estimated cost;
  
  to receive a user preference for TTS processing based on the estimated delivery conditions;
  
  to schedule TTS resources for processing the TTS request based on the user preference; and
  
  to synthesize the text data into speech based at least in part on the TTS resources.
- Dependent claims (6, 7, 8, 9, 10, 11, 12):
  - 6. The system of claim 5, wherein the user preference comprises at least one of cost of TTS processing, quality of TTS processing or length of time until delivery of TTS results.
  - 7. The system of claim 5, wherein the delivery conditions are estimated based upon a minimum cost to perform TTS processing within one or more delivery times of speech resulting from the TTS processing.
  - 8. The system of claim 5, wherein the at least one processor is further configured to divide the TTS request into sections for parallel processing.
  - 9. The system of claim 8, wherein the sections comprise one or more of a logical sentence, sentence or paragraph.
  - 10. The system of claim 8, wherein the at least one processor is further configured to schedule a plurality of TTS processing devices to process at least two sections at different times based at least in part on a cost for TTS processing time by a TTS processing device.
  - 11. The system of claim 5, wherein the delivery conditions are estimated based on at least one of a cost of TTS processing, a quality of speech resulting from the TTS processing, a delivery time of speech resulting from the TTS processing, and a delivery location for speech resulting from the TTS processing.
  - 12. The system of claim 5, wherein the user preference further comprises a maximum price for completion of the TTS request within a certain time period.
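The section splitting and cost-based scheduling of claims 8 to 10 can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: `split_sections` uses a naive punctuation rule rather than any notion of a "logical sentence", and the slot tuples, device names, and one-section-per-slot rule in `schedule_sections` are hypothetical.

```python
import re


def split_sections(text, unit="sentence"):
    """Split request text into paragraphs or (naively) sentences."""
    if unit == "paragraph":
        parts = re.split(r"\n\s*\n", text)
    elif unit == "sentence":
        # Break after ., ! or ? followed by whitespace; a real system
        # would need to handle abbreviations, numbers, and so on.
        parts = re.split(r"(?<=[.!?])\s+", text)
    else:
        raise ValueError(f"unknown unit: {unit}")
    return [p.strip() for p in parts if p.strip()]


def schedule_sections(sections, slots):
    """Assign each section to one of the cheapest free slots.

    slots: list of (cost, hour, device) tuples, one section per slot.
    Returns (section, hour, device) tuples in reading order.
    """
    if len(sections) > len(slots):
        raise ValueError("not enough processing slots for all sections")
    cheapest = sorted(slots)[:len(sections)]
    # Hand sections out in reading order, earliest chosen slot first.
    cheapest.sort(key=lambda slot: (slot[1], slot[2]))
    return [
        (section, hour, device)
        for section, (_cost, hour, device) in zip(sections, cheapest)
    ]


text = "Hello there. How are you? Fine!"
slots = [
    (3.0, 0, "tts-a"),  # expensive, available now
    (1.0, 2, "tts-a"),  # cheap, two hours from now
    (1.0, 2, "tts-b"),
    (2.0, 1, "tts-b"),
]
plan = schedule_sections(split_sections(text), slots)
```

In this example the expensive immediate slot is skipped, two sections run in parallel on different devices at hour 2, and one section runs earlier at hour 1, so at least two sections are processed at different times.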

13. A non-transitory computer-readable storage medium storing processor-executable instructions for controlling a computing device, comprising:
- program code to receive a TTS request for TTS processing of text data into speech, wherein the TTS request is sent by a local device remote from the computing device and includes text data originating from the local device;
  
  program code to estimate delivery conditions for completion of the TTS request, wherein the delivery conditions include an estimated cost;
  
  program code to receive a user preference for TTS processing based on the estimated delivery conditions;
  
  program code to schedule TTS resources for processing the TTS request based on the user preference; and
  
  program code to synthesize the text data into speech based at least in part on the TTS resources.
- Dependent claims (14, 15, 16, 17, 18, 19, 20):
  - 14. The non-transitory computer-readable storage medium of claim 13, wherein the user preference comprises at least one of cost of TTS processing, quality of TTS processing or length of time until delivery of TTS results.
  - 15. The non-transitory computer-readable storage medium of claim 13, wherein the delivery conditions are estimated based upon a minimum cost to perform TTS processing within one or more delivery times of speech resulting from the TTS processing.
  - 16. The non-transitory computer-readable storage medium of claim 13, further comprising program code to divide the TTS request into sections for parallel processing.
  - 17. The non-transitory computer-readable storage medium of claim 16, wherein the sections comprise one or more of a logical sentence, sentence or paragraph.
  - 18. The non-transitory computer-readable storage medium of claim 16, further comprising program code to schedule a plurality of TTS processing devices to process at least two sections at different times based at least in part on a cost for TTS processing time by a TTS processing device.
  - 19. The non-transitory computer-readable storage medium of claim 13, wherein the delivery conditions are estimated based on at least one of a cost of TTS processing, a quality resulting from the TTS processing, a delivery time of speech resulting from the TTS processing, and delivery location for speech resulting from the TTS processing.
  - 20. The non-transitory computer-readable storage medium of claim 13, wherein the user preference further comprises a maximum price for completion of the TTS request within a certain time period.

Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors: Swietlinski, Krzysztof Franciszek; Kaszczuk, Michal Tadeusz
Primary Examiner(s): Neway, Samuel G
Application Number: US13/947,354
Time in Patent Office: 995 Days
Field of Search
US Class Current: 1/1
CPC Class Codes:
G10L 13/00 Speech synthesis; Text to s...
G10L 13/04 Details of speech synthesis...
