SYSTEM AND METHOD FOR CLOUD-BASED TEXT-TO-SPEECH WEB SERVICES
Abstract
Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating speech. One variation of the method is from a server side, and another variation of the method is from a client side. The server side method, as implemented by a network-based automatic speech processing system, includes first receiving, from a network client independent of knowledge of internal operations of the system, a request to generate a text-to-speech voice. The request can include speech samples, transcriptions of the speech samples, and metadata describing the speech samples. The system extracts sound units from the speech samples based on the transcriptions and generates an interactive demonstration of the text-to-speech voice based on the sound units, the transcriptions, and the metadata, wherein the interactive demonstration hides a back end processing implementation from the network client. The system provides access to the interactive demonstration to the network client.
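The server-side flow described in the abstract (receive a request, extract sound units, build a demonstration, return access to it) can be sketched roughly as follows. This is a minimal illustration only, not the patented implementation: the names `VoiceRequest`, `Demo`, `extract_sound_units`, and `handle_request` are hypothetical, and splitting transcriptions into words is a toy stand-in for real sound-unit extraction, which the source does not detail here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class VoiceRequest:
    # Hypothetical request shape: the three parts named in the abstract.
    speech_samples: List[bytes]
    transcriptions: List[str]
    metadata: Dict[str, str]


@dataclass
class Demo:
    # Opaque handle given to the client; it exposes no back-end details.
    demo_id: str
    unit_count: int


def extract_sound_units(
    samples: List[bytes], transcriptions: List[str]
) -> List[Tuple[str, int]]:
    """Toy stand-in: label each transcription word with its sample index.

    A real system would align audio to phonetic units; that step is
    assumed away in this sketch.
    """
    if len(samples) != len(transcriptions):
        raise ValueError("each speech sample needs exactly one transcription")
    units = []
    for index, text in enumerate(transcriptions):
        for word in text.lower().split():
            units.append((word, index))
    return units


def handle_request(request: VoiceRequest) -> Demo:
    """Receive a request, extract units, and return access to a demo."""
    units = extract_sound_units(request.speech_samples, request.transcriptions)
    # The 'speaker' metadata key is an assumption made for illustration.
    speaker = request.metadata.get("speaker", "unknown")
    return Demo(demo_id=f"demo-{speaker}", unit_count=len(units))
```

The client only ever sees the `Demo` handle, which mirrors the idea that the demonstration hides the back-end processing from the network client.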
20 Claims
1. A method of generating speech, the method comprising:
receiving, at a network-based automatic speech processing system, a request, from a network client independent of knowledge of internal operations of the network-based automatic speech processing system, to generate a text-to-speech voice, the request comprising speech samples, transcriptions of the speech samples, and metadata describing the speech samples;
extracting sound units from the speech samples based on the transcriptions;
generating a demonstration of the text-to-speech voice based on the sound units, the transcriptions, and the metadata, wherein the demonstration hides a back end processing implementation from the network client; and
providing access to the demonstration to the network client.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11.
12. A system for requesting a text-to-speech voice, the system comprising:
a processor;
a first module configured to control the processor to transmit to a network-based automatic speech processing system a request to generate the text-to-speech voice, the request comprising speech samples, transcriptions of the speech samples, and metadata describing the speech samples;
a second module configured to control the processor to receive a notification from the network-based automatic speech processing system that the text-to-speech voice is generated; and
a third module configured to control the processor to test, via a network, the text-to-speech voice independent of knowledge of internal operations of the network-based automatic speech processing system.
Dependent claims: 13, 14, 15.
16. A non-transitory computer-readable storage medium storing instructions which, when executed by a computing device, cause the computing device to generate speech, the instructions comprising:
receiving, at a network-based automatic speech processing system, a request, from a network client independent of knowledge of internal operations of the network-based automatic speech processing system, to generate a text-to-speech voice, the request comprising speech samples, transcriptions of the speech samples, and metadata describing the speech samples;
extracting sound units from the speech samples based on the transcriptions;
generating an interactive demonstration of the text-to-speech voice based on the sound units, the transcriptions, and the metadata, wherein the interactive demonstration hides a back end processing implementation from the network client; and
providing access to the interactive demonstration to the network client.
Dependent claims: 17, 18, 19, 20.
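The client-side arrangement of claim 12 (transmit a request, receive a notification that the voice is generated, then test the voice) might look roughly like the sketch below. Everything here is an assumption made for illustration: `FakeSpeechService` is an in-memory stand-in for the network-based system, the notification is modeled as a simple readiness check, and the "audio" returned is placeholder bytes rather than real synthesized speech.

```python
from typing import Dict, List


class FakeSpeechService:
    """Hypothetical in-memory stand-in for the network-based system."""

    def __init__(self) -> None:
        self._ready: Dict[str, bool] = {}

    def submit(
        self,
        samples: List[bytes],
        transcriptions: List[str],
        metadata: Dict[str, str],
    ) -> str:
        voice_id = f"voice-{len(self._ready) + 1}"
        # Pretend the voice is built instantly; a real build takes time.
        self._ready[voice_id] = True
        return voice_id

    def is_ready(self, voice_id: str) -> bool:
        return self._ready.get(voice_id, False)

    def synthesize(self, voice_id: str, text: str) -> bytes:
        if not self.is_ready(voice_id):
            raise RuntimeError("voice has not been generated yet")
        # Placeholder for audio data; no real synthesis happens here.
        return text.encode("utf-8")


def request_and_test_voice(
    service: FakeSpeechService,
    samples: List[bytes],
    transcriptions: List[str],
    metadata: Dict[str, str],
    test_text: str,
) -> bytes:
    # First module: transmit the request with samples, transcriptions, metadata.
    voice_id = service.submit(samples, transcriptions, metadata)
    # Second module: notification that the voice is generated (a readiness
    # check here; the claim does not say how notification is delivered).
    if not service.is_ready(voice_id):
        raise TimeoutError("no notification that the voice was generated")
    # Third module: test the voice through the service's public interface only.
    return service.synthesize(voice_id, test_text)
```

The client touches only `submit`, `is_ready`, and `synthesize`, so it needs no knowledge of how the service builds the voice internally.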
Specification