SMALL FOOTPRINT TEXT-TO-SPEECH ENGINE

US 20110071835A1
Filed: 09/22/2009
Published: 03/24/2011
Est. Priority Date: 09/22/2009
Status: Abandoned Application

Abstract

Embodiments of a small footprint text-to-speech engine are disclosed. In operation, the small footprint text-to-speech engine generates a set of feature parameters for an input text. The set of feature parameters includes static feature parameters and delta feature parameters. The small footprint text-to-speech engine then derives a saw-tooth stochastic trajectory that represents the speech characteristics of the input text based on the static feature parameters and the delta feature parameters. Finally, the small footprint text-to-speech engine produces a smoothed trajectory from the saw-tooth stochastic trajectory, and generates synthesized speech based on the smoothed trajectory.
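The abstract does not give the equations used to combine static and delta parameters. A common formulation in HMM-based synthesis is maximum-likelihood parameter generation, and the minimal Python sketch below assumes that formulation for a single one-dimensional feature stream. The central-difference delta window, the per-frame variances, and all function names are assumptions made for illustration, not details taken from the application, and the sketch does not try to reproduce the "saw-tooth" character of the claimed trajectory.

```python
def delta_window_rows(num_frames):
    """Rows of the window matrix W as sparse dicts {column: weight}.

    The first num_frames rows copy the static value of each frame; the next
    num_frames rows compute an assumed delta 0.5 * (c[t+1] - c[t-1]),
    clamping indices at the sequence boundaries.
    """
    rows = [{t: 1.0} for t in range(num_frames)]
    for t in range(num_frames):
        prev_t, next_t = max(t - 1, 0), min(t + 1, num_frames - 1)
        row = {}
        row[next_t] = row.get(next_t, 0.0) + 0.5
        row[prev_t] = row.get(prev_t, 0.0) - 0.5
        rows.append(row)
    return rows


def normal_equations(static_mean, static_var, delta_mean, delta_var):
    """Build A = W^T P W and b = W^T P mu, where P holds inverse variances."""
    num_frames = len(static_mean)
    rows = delta_window_rows(num_frames)
    means = list(static_mean) + list(delta_mean)
    precisions = [1.0 / v for v in list(static_var) + list(delta_var)]
    A = [[0.0] * num_frames for _ in range(num_frames)]
    b = [0.0] * num_frames
    for row, mean, precision in zip(rows, means, precisions):
        for i, w_i in row.items():
            b[i] += w_i * precision * mean
            for j, w_j in row.items():
                A[i][j] += w_i * precision * w_j
    return A, b


def solve_dense(A, b):
    """Gaussian elimination without pivoting (adequate here because A is
    symmetric positive definite)."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for k in range(n):
        for i in range(k + 1, n):
            factor = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= factor * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        residual = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = residual / M[i][i]
    return x


def generate_trajectory(static_mean, static_var, delta_mean, delta_var):
    """Trajectory that best agrees with both static and delta statistics."""
    A, b = normal_equations(static_mean, static_var, delta_mean, delta_var)
    return solve_dense(A, b)
```

With a step in the static means, zero delta means, and small delta variances, the result is pulled toward a flatter curve: `generate_trajectory([0, 0, 1, 1], [1] * 4, [0] * 4, [0.1] * 4)` yields values between 0 and 1 on both sides of the step. The dense solver is chosen only for brevity; the matrix is banded, which is what makes a Cholesky-style factorization attractive on small devices.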

20 Claims

1. A computer readable medium storing computer-executable instructions that, when executed, cause one or more processors to perform acts comprising:
- generating a set of feature parameters for an input text, the set of feature parameters including static feature parameters and delta feature parameters;
  
  deriving a saw-tooth stochastic trajectory that represents the speech characteristics of the input text based on the static feature parameters and the delta feature parameters;
  
  producing a smoothed trajectory from the saw-tooth stochastic trajectory; and
  
  generating synthesized speech based on the smoothed trajectory.
- Dependent claims (2, 3, 4, 5, 6, 7, 8):
  - 2. The computer readable medium of claim 1, further storing an instruction that, when executed, causes the one or more processors to perform an act comprising outputting the synthesized speech to at least one of an acoustic speaker or a data storage.
  - 3. The computer readable medium of claim 1, wherein the generating includes using trained stream-dependent Hidden Markov Models (HMMs) to generate the set of feature parameters.
  - 4. The computer readable medium of claim 1, wherein the deriving includes inputting the static feature parameters and the delta feature parameters into equations that are solved via Cholesky decomposition.
  - 5. The computer readable medium of claim 1, wherein the deriving includes using at least a square root version of Cholesky decomposition or a no-square root version of Cholesky decomposition to derive the saw-tooth stochastic trajectory.
  - 6. The computer readable medium of claim 1, wherein the deriving includes using at least a no-square root version of Cholesky decomposition that includes a one-division optimization to derive the saw-tooth stochastic trajectory.
  - 7. The computer readable medium of claim 1, wherein the producing includes using an average window algorithm or an envelope generation algorithm to smooth the saw-tooth stochastic trajectory.
  - 8. The computer-readable medium of claim 1, wherein the smoothed trajectory encompasses speech patterns, line spectral pair (LSP) coefficients, fundamental frequency, and a gain, and wherein the producing includes producing the synthesized speech based on the speech patterns, the line spectral pair (LSP) coefficients, the fundamental frequency, and the gain.
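Claims 4 to 6 refer to square-root and no-square-root versions of Cholesky decomposition. The no-square-root form is commonly written as A = L D L^T, with L unit lower triangular and D diagonal. The Python sketch below illustrates that form for a small symmetric positive-definite system. The claims do not define the "one-division optimization"; the sketch assumes one possible reading, in which the reciprocal of each pivot is computed once and every remaining division is replaced by a multiplication. The function names are hypothetical.

```python
def ldl_factor(A):
    """Factor a symmetric positive-definite matrix as L * diag(d) * L^T.

    No square roots are taken. Returns (L, d, inv_d), where inv_d[j] is the
    reciprocal of pivot d[j], computed with a single division per column
    (an assumed interpretation of "one-division").
    """
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    d = [0.0] * n
    inv_d = [0.0] * n
    for j in range(n):
        L[j][j] = 1.0
        d[j] = A[j][j] - sum(L[j][k] * L[j][k] * d[k] for k in range(j))
        inv_d[j] = 1.0 / d[j]  # the only division for column j
        for i in range(j + 1, n):
            s = A[i][j] - sum(L[i][k] * L[j][k] * d[k] for k in range(j))
            L[i][j] = s * inv_d[j]  # multiply instead of divide
    return L, d, inv_d


def ldl_solve(A, b):
    """Solve A x = b using the factorization above."""
    L, _, inv_d = ldl_factor(A)
    n = len(b)
    # Forward substitution: L z = b (L has a unit diagonal).
    z = [0.0] * n
    for i in range(n):
        z[i] = b[i] - sum(L[i][k] * z[k] for k in range(i))
    # Diagonal scaling: y = D^{-1} z, reusing the stored reciprocals.
    y = [z[i] * inv_d[i] for i in range(n)]
    # Back substitution: L^T x = y.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))
    return x
```

For example, `ldl_solve([[4, 2, 0], [2, 5, 2], [0, 2, 5]], [8, 18, 19])` recovers a solution close to `[1, 2, 3]`. Avoiding square roots and most divisions matters mainly on processors where those operations are slow or unavailable in hardware, which fits the small-footprint theme of the title.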

9. A computer implemented method, comprising:
- under control of one or more computing systems configured with executable instructions, generating a set of feature parameters for an input text using trained stream-dependent Hidden Markov Models (HMMs), the set of feature parameters including static feature parameters and delta feature parameters;
  
  deriving a saw-tooth stochastic trajectory that represents the speech characteristics of the input text based on the static feature parameters and the delta feature parameters.
- Dependent claims (10, 11, 12, 13, 14, 15):
  - 10. The computer implemented method of claim 9, further comprising producing a smoothed trajectory from the saw-tooth stochastic trajectory.
  - 11. The computer implemented method of claim 9, wherein deriving includes inputting the static feature parameters and the delta feature parameters into equations that are solved via Cholesky decomposition.
  - 12. The computer implemented method of claim 9, wherein the deriving includes using a no-square root version of Cholesky decomposition to eliminate square root calculations during the derivation of the saw-tooth stochastic trajectory.
  - 13. The computer implemented method of claim 9, wherein the deriving includes using a no-square root version of Cholesky decomposition and a one-division optimization to eliminate square root and division calculations during the derivation of the saw-tooth stochastic trajectory.
  - 14. The computer implemented method of claim 9, wherein the smoothed trajectory encompasses speech patterns, line spectral pair (LSP) coefficients, a fundamental frequency, and a gain, and wherein the producing includes producing the synthesized speech based on the speech patterns, the line spectral pair (LSP) coefficients, the fundamental frequency, and the gain.
  - 15. The computer implemented method of claim 10, wherein the producing includes using an average window algorithm or an envelope generation algorithm to smooth the saw-tooth stochastic trajectory.
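Claims 7, 15, and 20 name an average window algorithm as one way to smooth the saw-tooth trajectory, without giving its parameters. A minimal Python sketch, assuming a centered moving average whose window shrinks at the sequence edges, is shown below. The `half_width` parameter and the function name are hypothetical.

```python
def average_window_smooth(trajectory, half_width=1):
    """Replace each frame with the mean of its neighbors.

    The window covers frames t - half_width through t + half_width and is
    truncated at the start and end of the sequence.
    """
    n = len(trajectory)
    smoothed = []
    for t in range(n):
        lo = max(0, t - half_width)
        hi = min(n, t + half_width + 1)
        window = trajectory[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed
```

Applied to an alternating sequence such as `[0, 2, 0, 2, 0, 2]`, the output keeps the same length but has a narrower peak-to-peak range than the input. A wider window removes more of the oscillation at the cost of blurring genuine changes in the trajectory.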

16. A system, comprising:
- one or more processors;
  
  a memory that includes a plurality of computer-executable components, the plurality of computer-executable components comprising:
  
  a parameter generator to generate a set of feature parameters for an input text, the set of feature parameters including static feature parameters and delta feature parameters, and to derive a saw-tooth stochastic trajectory based on the static feature parameters and the delta feature parameters; and
  
  an audio smoother to produce a smoothed trajectory from the saw-tooth stochastic trajectory.
- Dependent claims (17, 18, 19, 20):
  - 17. The system of claim 16, further comprising a linear predictive coding (LPC) synthesizer to generate synthesized speech based on the smoothed trajectory.
  - 18. The system of claim 16, wherein the parameter generator is to use at least a square root version of Cholesky decomposition or a no-square root version of the Cholesky decomposition to derive the saw-tooth stochastic trajectory.
  - 19. The system of claim 16, wherein the parameter generator is to use at least a no-square root version of Cholesky decomposition that includes a one-division optimization to derive the saw-tooth stochastic trajectory.
  - 20. The system of claim 19, wherein the audio smoother is to use an average window algorithm or an envelope generation algorithm to smooth the saw-tooth stochastic trajectory.

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Yan, Zhi-Jie; Chen, Yi-Ning; Soong, Frank Kao-Ping
Application Number: US12/564,326
Publication Number: US 20110071835A1
US Class Current: 704/260
CPC Class Codes:
G10L 13/047 Architecture of speech synt...
G10L 13/08 Text analysis or generation...
