System and method for cloud-based text-to-speech web services

US 9,412,359 B2
Filed: 04/13/2015
Issued: 08/09/2016
Est. Priority Date: 11/30/2010
Status: Active Grant

8 Assignments

0 Petitions

Abstract

Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating speech. One variation of the method is from a server side, and another variation of the method is from a client side. The server side method, as implemented by a network-based automatic speech processing system, includes first receiving, from a network client independent of knowledge of internal operations of the system, a request to generate a text-to-speech voice. The request can include speech samples, transcriptions of the speech samples, and metadata describing the speech samples. The system extracts sound units from the speech samples based on the transcriptions and generates an interactive demonstration of the text-to-speech voice based on the sound units, the transcriptions, and the metadata, wherein the interactive demonstration hides a back end processing implementation from the network client. The system provides access to the interactive demonstration to the network client.
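
The server-side flow in the abstract (receive a request, extract sound units, build a demonstration, give the client access) can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the patent: `VoiceRequest`, `VoiceBuildService`, the demo identifier format, and the use of whole words as sound units. Audio is split evenly across the words of each transcription purely for illustration; a real system would presumably align audio to phonetic units.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VoiceRequest:
    """Hypothetical request payload: samples, transcriptions, metadata."""
    samples: Dict[str, List[float]]      # sample id -> placeholder audio values
    transcriptions: Dict[str, str]       # sample id -> transcription text
    metadata: Dict[str, str] = field(default_factory=dict)


class VoiceBuildService:
    """Toy stand-in for the network-based speech processing system."""

    def __init__(self) -> None:
        # Back-end state; the client only ever sees a demo id.
        self._demos: Dict[str, Dict[str, List[float]]] = {}

    def _extract_units(self, request: VoiceRequest) -> Dict[str, List[float]]:
        # Assumption: one "sound unit" per word, audio split evenly.
        units: Dict[str, List[float]] = {}
        for sample_id, audio in request.samples.items():
            words = request.transcriptions[sample_id].lower().split()
            if not words:
                continue
            step = len(audio) // len(words)
            for i, word in enumerate(words):
                # Keep the first unit seen for each word.
                units.setdefault(word, audio[i * step:(i + 1) * step])
        return units

    def create_demo(self, request: VoiceRequest) -> str:
        """Build a demonstration voice and return an opaque handle to it."""
        demo_id = f"demo-{len(self._demos) + 1}"
        self._demos[demo_id] = self._extract_units(request)
        return demo_id

    def synthesize(self, demo_id: str, text: str) -> List[float]:
        """Concatenate stored units for each word of the input text."""
        units = self._demos[demo_id]
        output: List[float] = []
        for word in text.lower().split():
            if word not in units:
                raise KeyError(f"no sound unit for {word!r}")
            output.extend(units[word])
        return output


service = VoiceBuildService()
request = VoiceRequest(
    samples={"s1": [0.1, 0.2, 0.3, 0.4]},
    transcriptions={"s1": "hello world"},
)
demo_id = service.create_demo(request)
print(service.synthesize(demo_id, "world hello"))  # [0.3, 0.4, 0.1, 0.2]
```

The client in this sketch holds only `demo_id` and calls `synthesize`, which mirrors the idea that the demonstration hides the back-end implementation.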

20 Claims

1. A method comprising:
- receiving, at a network-based automatic speech processing system and from a network client not having access to information of internal operations of the network-based automatic speech processing system, a request to generate a text-to-speech voice, the request comprising a transcription;
  
  extracting sound units from speech samples based on the transcription;
  
  generating a demonstration of the text-to-speech voice based only on the sound units and the transcriptions, wherein the text-to-speech voice is language agnostic; and
  
  providing access to the demonstration to the network client.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13):
  - 2. The method of claim 1, the request further comprising the speech samples and metadata describing the speech samples.
  - 3. The method of claim 2, wherein the transcription is of the speech samples.
  - 4. The method of claim 1, further comprising:
    - receiving an additional request from the network client for the text-to-speech voice; and
      
      providing the text-to-speech voice to the network client.
  - 5. The method of claim 1, wherein the request is received via a web interface.
  - 6. The method of claim 1, wherein the speech samples are required to meet a minimum quality threshold.
  - 7. The method of claim 1, wherein the network-based speech processing system comprises a language analysis module, a database, and an acoustic synthesis module.
  - 8. The method of claim 1, wherein the text-to-speech voice is language agnostic.
  - 9. The method of claim 1, further comprising:
    - analyzing the speech samples;
      
      determining a coverage hole in the speech samples for a particular purpose; and
      
      suggesting, to the network client, a type of additional speech sample intended to address the coverage hole.
  - 10. The method of claim 9, wherein the analyzing, the determining, and the suggesting is done iteratively until a threshold coverage for the particular purpose is reached.
  - 11. The method of claim 1, further comprising generating a log associated with the demonstration.
  - 12. The method of claim 11, further comprising transmitting the log to the network client.
  - 13. The method of claim 1, further comprising modifying one of the sound units and the demonstration based on an intervention from a human expert.
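
Claims 9 and 10 describe analyzing the samples, finding a coverage hole, suggesting an additional sample, and repeating until a threshold coverage is reached. The following is a rough Python sketch of that loop under stated assumptions: coverage is measured over whole words rather than phonetic units, and `request_sample` is a hypothetical callback standing in for the network client supplying a new transcription in response to a suggestion. None of these names come from the patent.

```python
from typing import Callable, Iterable, List, Set


def units_in(transcriptions: Iterable[str]) -> Set[str]:
    """Assumption: each lowercase word counts as one unit."""
    return {word for text in transcriptions for word in text.lower().split()}


def coverage(transcriptions: Iterable[str], required: Set[str]) -> float:
    """Fraction of the required units present in the transcriptions."""
    if not required:
        return 1.0
    return len(required & units_in(transcriptions)) / len(required)


def find_holes(transcriptions: Iterable[str], required: Set[str]) -> List[str]:
    """Required units that no transcription covers yet, in sorted order."""
    return sorted(required - units_in(transcriptions))


def collect_until_covered(
    transcriptions: Iterable[str],
    required: Set[str],
    request_sample: Callable[[str], str],
    threshold: float = 1.0,
    max_rounds: int = 10,
) -> List[str]:
    """Iterate analyze -> determine hole -> suggest until coverage suffices."""
    collected = list(transcriptions)
    for _ in range(max_rounds):
        if coverage(collected, required) >= threshold:
            break
        holes = find_holes(collected, required)
        if not holes:
            break
        # Suggest a sample addressing the first hole; the callback returns
        # the transcription of whatever the client recorded in response.
        collected.append(request_sample(holes[0]))
    return collected


required = {"turn", "left", "right"}
result = collect_until_covered(
    ["turn left"], required, lambda unit: f"please say {unit}"
)
print(result)  # ['turn left', 'please say right']
```

The `max_rounds` bound is a design choice of this sketch so the loop terminates even if the client never supplies a sample that fills the hole.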

14. A system comprising:
- a processor; and
  
  a computer-readable storage medium having instructions stored which, when executed by the processor, cause the processor to perform operations comprising;
  
  receiving, at a network-based automatic speech processing system and from a network client not having access to information of internal operations of the network-based automatic speech processing system, a request to generate a text-to-speech voice, the request comprising a transcription;
  
  extracting sound units from speech samples based on the transcription;
  
  generating a demonstration of the text-to-speech voice based only on the sound units and the transcriptions, wherein the text-to-speech voice is language agnostic; and
  
  providing access to the demonstration to the network client.
- Dependent claims (15, 16, 17, 18, 19):
  - 15. The system of claim 14, the request further comprising the speech samples and metadata describing the speech samples.
  - 16. The system of claim 15, wherein the transcription is of the speech samples.
  - 17. The system of claim 14, the computer-readable storage medium having additional instructions stored which, when executed by the processor, cause the processor to perform operations comprising:
    - receiving an additional request from the network client for the text-to-speech voice; and
      
      providing the text-to-speech voice to the network client.
  - 18. The system of claim 14, wherein the request is received via a web interface.
  - 19. The system of claim 14, wherein the speech samples are required to meet a minimum quality threshold.

20. A computer-readable storage device having instructions stored which, when executed by a computing device, cause the computing device to perform operations comprising:
- receiving, at a network-based automatic speech processing system and from a network client not having access to information of internal operations of the network-based automatic speech processing system, a request to generate a text-to-speech voice, the request comprising a transcription;
  
  extracting sound units from speech samples based on the transcription;
  
  generating a demonstration of the text-to-speech voice based only on the sound units and the transcriptions, wherein the text-to-speech voice is language agnostic; and
  
  providing access to the demonstration to the network client.

Current Assignee: Cerence Operating Company (Cerence Inc.)
Original Assignee: AT&T Intellectual Property I LP (AT&T, Inc.)
Inventors: Beutnagel, Mark Charles; Conkie, Alistair D.; Kim, Yeon-Jun; Schroeter, Horst Juergen
Primary Examiner(s): AZAD, ABUL K
Application Number: US14/684,893
Publication Number: US 20150221298A1
Time in Patent Office: 484 Days
Field of Search: 704/260, 704/270.1
US Class Current: 1/1
CPC Class Codes:
G10L 13/00 Speech synthesis; Text to s...
G10L 13/04 Details of speech synthesis...
