Voice persona service for embedding text-to-speech features into software programs

US 7,689,421 B2
Filed: 06/27/2007
Issued: 03/30/2010
Est. Priority Date: 06/27/2007
Status: Active Grant

Abstract

Described is a voice persona service by which users convert text into speech waveforms, based on user-provided parameters and voice data from a service data store. The service may be remotely accessed, such as via the Internet. The user may provide text tagged with parameters, with the text sent to a text-to-speech engine along with base or custom voice data, and the resulting waveform morphed based on the tags. The user may also provide speech. Once created, a voice persona corresponding to the speech waveform may be persisted, exchanged, made public, shared and so forth. In one example, the voice persona service receives user input and parameters, and retrieves a base or custom voice that may be edited by the user via a morphing algorithm. The service outputs a waveform, such as a .wav file for embedding in a software program, and persists the voice persona corresponding to that waveform.
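
The abstract mentions that the service outputs a waveform "such as a .wav file". As a minimal sketch of that last step only, the following encodes a list of samples as WAV data using Python's standard library. The function name, the 16 kHz sample rate, and the mono 16-bit format are assumptions chosen for illustration; the patent does not specify them.

```python
import io
import struct
import wave
from typing import List


def samples_to_wav_bytes(samples: List[float], sample_rate: int = 16000) -> bytes:
    """Encode mono float samples in [-1.0, 1.0] as 16-bit PCM WAV data.

    Hypothetical helper: the sample rate and bit depth are assumptions.
    """
    # Clip out-of-range values so the integer conversion cannot overflow.
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    pcm = struct.pack("<%dh" % len(clipped), *(int(s * 32767) for s in clipped))

    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)      # mono
        wav_file.setsampwidth(2)      # 2 bytes = 16-bit samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return buffer.getvalue()


wav_data = samples_to_wav_bytes([0.0, 0.5, -0.5])
print(wav_data[:4])  # b'RIFF'
```

The returned bytes could be written to disk or sent over the network as a download.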

308 Citations

18 Claims

1. In a computing environment, a system comprising, a service that includes a user interface accessible to clients via a network, a text-to-speech engine, and a data store of user-defined voice personas, a user-defined voice persona specifying one of a plurality of base voices and a plurality of voice morphing parameters associated with the base voice, the service configured to receive definitions of the voice personas from users and store the user-defined voice personas in the store of voice personas, where the users use the user interface to input new voice morphing parameters to modify the morphing parameters of the voice personas, the service configured to obtain via the network a user-provided text-to-speech input script comprised of portions of text comprised of respective voice persona identifiers, each voice persona identifier identifying one of the user-defined voice personas including a voice persona having the voice morphing parameters modified by the new voice morphing parameters inputted through the user interface, and the service converting the text-to-speech input script to a speech waveform via a text-to-speech engine based on the identified user-defined voice personas in the data store of voice personas, where portions of text in the text-to-speech script are converted to speech portions of the speech waveform using the user-defined voice personas identified by the voice persona identifiers, respectively.
- Dependent Claims (2, 3, 4, 5, 6)
  - 2. The system of claim 1 further comprising a voice morphing engine that modifies the speech portions based on the morphing parameters of the identified voice personas.
  - 3. The system of claim 1 wherein the service allows users to share user-defined voice personas with other users via the network.
  - 4. The system of claim 1 wherein the voice persona identifiers comprise tags embedded in the user input text-to-speech script.
  - 5. The system of claim 4 wherein at least one tag comprises an XML-based tag that describes a characteristic of the identified voice persona.
  - 6. The system of claim 1 wherein the service receives user-provided binary audio speech data, and the service creates and stores a personal base voice from the user-provided binary audio speech data, the personal base voice being available to be specified as a base voice for a user-defined voice persona.
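
Claims 1, 4 and 5 describe an input script whose text portions carry voice persona identifiers, optionally as XML-based tags. A minimal sketch of parsing such a script might look like the following. The `<script>` and `<say>` element names and the `persona` attribute are hypothetical; the claims do not define a tag vocabulary.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def parse_script(script_xml: str) -> List[Tuple[str, str]]:
    """Split a tagged script into (persona_id, text) portions, in document order.

    Assumes a hypothetical format: <script><say persona="id">text</say>...</script>
    """
    root = ET.fromstring(script_xml)
    portions = []
    for node in root.findall("say"):
        persona_id = node.get("persona")
        if persona_id is None:
            raise ValueError("each <say> element needs a persona attribute")
        portions.append((persona_id, (node.text or "").strip()))
    return portions


script = """
<script>
  <say persona="narrator">Once upon a time.</say>
  <say persona="robot">Beep boop.</say>
</script>
"""
print(parse_script(script))
# [('narrator', 'Once upon a time.'), ('robot', 'Beep boop.')]
```

Each resulting pair names the persona to look up in the data store and the text to convert with it.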

7. A computer-readable storage medium having computer-executable instructions, which when executed perform steps, comprising:
- storing a plurality of voice personas in a data store, each voice persona comprising a base voice and voice morphing parameters, the voice personas accessible to clients from a voice persona service via a network;
  
  receiving at the voice persona service, via the network, user input identifying one of the stored voice personas and the user input comprising voice morphing parameters;
  
  retrieving the base voice and the voice morphing parameters of the voice persona identified by the user input;
  
  modifying the retrieved voice morphing parameters of the voice persona based on the received voice morphing parameters inputted by the user;
  
  saving the modified voice persona in the data store as a new voice persona; and
  
  receiving text from a user via the network at the voice persona service, retrieving the new voice persona and outputting a waveform corresponding to the voice persona by performing text-to-speech conversion and speech morphing using the modified morphing parameters.
- Dependent Claims (8, 9, 10, 11, 12)
  - 8. The computer-readable storage medium of claim 7 having further computer-executable instructions comprising, receiving the morphing parameters in an editing operation that modifies the morphing parameters in the voice persona identified by the user input.
  - 9. The computer-readable storage medium of claim 7 having further computer-executable instructions comprising, at the service, playing the waveform.
  - 10. The computer-readable storage medium of claim 7 wherein outputting the waveform comprises downloading an audio file to a user.
  - 11. The computer-readable storage medium of claim 7 wherein the text comprises tagged text which includes the text and a tag accompanying the text, and parsing the tagged text to send the text to a text-to-speech engine to generate the waveform and to apply a morphing algorithm to the waveform based on the tag.
  - 12. The computer-readable storage medium of claim 7 wherein the user input comprises speech and text corresponding to the speech, and wherein saving the parameter data in a voice persona comprises saving the text in a name card and saving the speech and text in association with a script.
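
The retrieve, modify, and save-as-new steps of claim 7 can be illustrated with a small in-memory store. This is only a sketch under stated assumptions: the class names, the `derive` method, and the parameter names `pitch_shift` and `rate` are hypothetical, and a real service would persist records rather than hold them in a dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class VoicePersona:
    """A persona record: an identifier, a base voice, and morphing parameters."""
    persona_id: str
    base_voice: str
    morph_params: Dict[str, float] = field(default_factory=dict)


class PersonaStore:
    """Hypothetical in-memory stand-in for the voice persona data store."""

    def __init__(self) -> None:
        self._personas: Dict[str, VoicePersona] = {}

    def save(self, persona: VoicePersona) -> None:
        self._personas[persona.persona_id] = persona

    def get(self, persona_id: str) -> VoicePersona:
        # Raises KeyError if the persona does not exist.
        return self._personas[persona_id]

    def derive(self, source_id: str, new_id: str,
               overrides: Dict[str, float]) -> VoicePersona:
        """Retrieve a persona, apply user-supplied parameters, save as a new persona."""
        source = self.get(source_id)
        merged = {**source.morph_params, **overrides}
        new_persona = VoicePersona(new_id, source.base_voice, merged)
        self.save(new_persona)
        return new_persona


store = PersonaStore()
store.save(VoicePersona("narrator", "female_1", {"pitch_shift": 0.0, "rate": 1.0}))
deep = store.derive("narrator", "narrator_deep", {"pitch_shift": -3.0})
print(deep.morph_params)  # {'pitch_shift': -3.0, 'rate': 1.0}
```

The original persona is left unchanged, which matches the claim's wording that the modified persona is saved as a new one.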

13. A computer-implemented method for a network service allowing users to create and use voice personas in a text-to-speech system, the method comprising:
- maintaining a database of voice persona records, each voice persona record specifying an identifier of a voice persona, a base voice of the voice persona, and a plurality of voice morphing parameters of the voice persona;
  
  receiving from clients, via a network, specifications for voice persona records, the specifications comprising voice morphing parameters inputted by users, and in response modifying or creating voice persona records in the database that have the voice morphing parameters by modifying the voice persona records with the voice morphing parameters inputted by the users;
  
  receiving from clients, via the network, text-to-speech scripts, a text-to-speech script comprising portions of text and identifiers identifying voice personas that have the voice morphing parameters received from the clients, and in response;
  
  using the identifiers to retrieve corresponding voice persona records identified by the identifiers, for each retrieved voice persona record, given such a retrieved voice persona record, performing text-to-speech conversion on a corresponding portion of text in the text-to-speech script using the base voice specified by the given voice and morphing the base voice according to the voice morphing parameters specified by the given voice persona record, the conversions of the portions together producing an audio speech data unit comprised of portions of audio speech data of the text portions in voice according to the respective voice persona records.
- Dependent Claims (14, 15, 16, 17, 18)
  - 14. A method according to claim 13, further comprising providing a user interface including one or more interfaces by which a user interacts with the network service to generate a waveform from voice data persisted via a data access mechanism and from a text-to-speech engine, and to modify the waveform with at least one morphing algorithm.
  - 15. A method according to claim 14,
    - wherein the user interface includes a voice persona creation interface, a voice persona management interface, or a voice persona employment interface, or any combination of a voice persona creation interface, a voice persona management interface, or a voice persona employment interface;
    - wherein the network service includes a voice persona parser, a voice persona creation mechanism or a voice persona implementation mechanism, or any combination of a voice persona parser, a voice persona creation mechanism, or a voice persona implementation mechanism; and
    - wherein the data access mechanism includes a base voice persona data store and a voice persona collection data store.
  - 16. A method according to claim 13, further comprising persisting a voice persona corresponding to the waveform, and sharing the voice persona.
  - 17. A method according to claim 13, wherein the text-to-speech conversion uses a hidden Markov model-based system, and wherein the morphing is performed using a sinusoidal model based morphing algorithm, a source-filter model based morphing algorithm, or a phonetic transition morphing algorithm.
  - 18. A computer-implemented method according to claim 13, wherein the text-to-speech conversion comprises automatically selecting a text-to-speech engine from among a plurality of text-to-speech engines.
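
The per-portion conversion loop of claim 13 can be sketched as follows: look up the record for each identifier, synthesize the text with the record's base voice, morph the result with the record's parameters, and concatenate the portions. Everything here is an assumption for illustration. In particular, `fake_tts` and `fake_morph` are placeholders that emit and scale a tone; they are not a real text-to-speech engine or any of the morphing algorithms named in claim 17, and the `gain` parameter is hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

Samples = List[float]


def fake_tts(text: str, base_voice: str) -> Samples:
    """Placeholder engine: emits a tone of 80 samples per character, not speech."""
    freq = {"female_1": 220.0, "male_1": 110.0}.get(base_voice, 165.0)
    count = 80 * len(text)
    return [math.sin(2 * math.pi * freq * i / 8000) for i in range(count)]


def fake_morph(samples: Samples, params: Dict[str, float]) -> Samples:
    """Placeholder morph: applies only a hypothetical 'gain' parameter."""
    gain = params.get("gain", 1.0)
    return [s * gain for s in samples]


def render(portions: List[Tuple[str, str]],
           records: Dict[str, dict],
           tts: Callable[[str, str], Samples] = fake_tts,
           morph: Callable[[Samples, Dict[str, float]], Samples] = fake_morph) -> Samples:
    """Convert each (persona_id, text) portion and join the results in order."""
    output: Samples = []
    for persona_id, text in portions:
        record = records[persona_id]  # KeyError if the persona is unknown
        speech = tts(text, record["base_voice"])
        output.extend(morph(speech, record["morph_params"]))
    return output


records = {
    "narrator": {"base_voice": "female_1", "morph_params": {"gain": 0.5}},
    "robot": {"base_voice": "male_1", "morph_params": {}},
}
audio = render([("narrator", "Hello."), ("robot", "Beep.")], records)
print(len(audio))  # 880
```

Passing the engine and morphing function as arguments leaves room for selecting among several engines, as claim 18 describes, without changing the loop.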

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Li, Yusheng; Soong, Frank Kao-ping; Zou, Xin; Chu, Min
Primary Examiner(s): Vo, Huyen X.
Application Number: US11/823,169
Publication Number: US 20090006096A1
Time in Patent Office: 1,007 Days
Field of Search: 704/258, 704/260, 704/268, 704/206, 704/275, 704/261, 704/266, 704/267, 704/243, 704/244, 704/270, 704/270.1
US Class Current: 704/260
CPC Class Codes:
G10L 13/033 Voice editing, e.g. manipul...
G10L 13/08 Text analysis or generation...
