Pre-saved data compression for TTS concatenation cost

US 8,798,998 B2
Filed: 04/05/2010
Issued: 08/05/2014
Est. Priority Date: 04/05/2010
Status: Active Grant

Abstract

Pre-saved concatenation cost data is compressed through speech segment grouping. Speech segments are assigned to a predefined number of groups based on their concatenation cost values with other speech segments. A representative segment is selected for each group. The concatenation cost between two segments in different groups may then be approximated by that between the representative segments of their respective groups, thereby reducing an amount of concatenation cost data to be pre-saved.
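
The grouping idea in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: it assumes each segment's feature vector is simply its row of the full cost matrix, uses a small k-medoids loop as a stand-in for the grouping step, and all function names and the 4×4 cost values are hypothetical.

```python
from math import dist  # Euclidean distance between two sequences (Python 3.8+)

def assign(vectors, medoids):
    """Label each segment with the index of its nearest representative."""
    return [min(range(len(medoids)), key=lambda g: dist(v, vectors[medoids[g]]))
            for v in vectors]

def cluster(vectors, k, max_iters=20):
    """Small k-medoids loop (an assumed stand-in for the grouping step)."""
    medoids = list(range(k))  # naive deterministic start: the first k segments
    for _ in range(max_iters):
        labels = assign(vectors, medoids)
        updated = []
        for g in range(k):
            members = [i for i, lab in enumerate(labels) if lab == g]
            if not members:  # keep the old representative for an empty group
                updated.append(medoids[g])
                continue
            # Representative = member with the smallest total distance to its group.
            updated.append(min(
                members,
                key=lambda c: sum(dist(vectors[c], vectors[m]) for m in members)))
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(vectors, medoids)

def compress(cost, k):
    """Return (compressed k x k matrix, group label per segment)."""
    # Assumption: row i of `cost` serves as the feature vector of segment i.
    medoids, labels = cluster(cost, k)
    compressed = [[cost[a][b] for b in medoids] for a in medoids]
    return compressed, labels

def approx_cost(compressed, labels, i, j):
    """Approximate the cost of segment i followed by segment j."""
    return compressed[labels[i]][labels[j]]

# Hypothetical full cost matrix for four segments.
cost = [
    [1.0, 1.2, 5.0, 5.2],
    [1.1, 1.0, 5.1, 5.0],
    [4.0, 4.1, 0.5, 0.6],
    [4.2, 4.0, 0.4, 0.5],
]
compressed, labels = compress(cost, 2)
print(labels)       # group of each segment
print(compressed)   # 2 x 2 matrix that replaces the 4 x 4 one
print(approx_cost(compressed, labels, 1, 3), "vs exact", cost[1][3])
```

With this toy data, segments 0 and 1 land in one group and segments 2 and 3 in the other, so only four cost values need to be stored instead of sixteen.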

18 Claims

1. A computing device for performing concatenative speech synthesis by a processing unit of the computing device, the computing device comprising:
- a memory;
  
  a processor coupled to the memory, the processor executing a text to speech (TTS) application in conjunction with instructions stored in the memory, wherein the TTS application is configured to:
  
  determine, based on a matrix of concatenation costs, feature vectors for speech segments, wherein some of the speech segments occur at asynchronous time intervals;
  
  apply distance weighting to one of:
  
  the speech segments and at least two consecutive speech segments, wherein the distance weighting is based on feature vectors associated with the speech segments or is based on feature vectors associated with the at least two consecutive speech segments;
  
  cluster the speech segments into a predefined number of groups such that an average distance between speech segments within each group is minimized;
  
  select a representative speech segment for each group; and
  
  generate a compressed concatenation cost matrix based on the representative speech segments.
- - 2. The computing device of claim 1, wherein the TTS application is further configured to:
    - pre-save the compressed concatenation cost matrix for real time computations in synthesizing speech.
  - 3. The computing device of claim 1, wherein the distance weighting is applied employing one of:
    - a Euclidean distance function and a city block distance function.
  - 4. The computing device of claim 1, wherein the compressed concatenation cost matrix is constructed along a preceding speech segment and a following speech segment, wherein the preceding speech segment and the following speech segment are the at least two consecutive speech segments.
  - 5. The computing device of claim 4, wherein a concatenation cost between the at least two consecutive speech segments is different from another concatenation cost between at least two similar consecutive speech segments with an order of the speech segments reversed.
  - 6. The computing device of claim 1, wherein the representative speech segment for each group is selected such that an average distance between the representative speech segment and other speech segments within a similar group is minimized.
  - 7. The computing device of claim 1, wherein a number of the groups is determined based on at least one from a set of:
    - a total number of speech segments, distances between the speech segments, and a desired reduction in concatenation cost data.
  - 8. The computing device of claim 1, wherein the representative speech segment for each group is selected based on one of a median concatenation cost and a mean concatenation cost of each group.
  - 9. The computing device of claim 1, wherein the speech segments include one of:
    - individual phones, diphones, half-phones, and syllables.
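
As an illustration of claims 3 and 6, the sketch below defines the two named distance functions and picks the group member whose average distance to the other members is smallest. The function names and the sample vectors are hypothetical, and per-feature weighting is omitted for brevity.

```python
def euclidean(a, b):
    """Euclidean (L2) distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def city_block(a, b):
    """City block (L1, Manhattan) distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def pick_representative(group, distance):
    """Index of the member with the smallest average distance to the others."""
    def avg_distance(i):
        others = [v for j, v in enumerate(group) if j != i]
        if not others:  # a single-member group represents itself
            return 0.0
        return sum(distance(group[i], v) for v in others) / len(others)
    return min(range(len(group)), key=avg_distance)

# Hypothetical feature vectors for one group of three segments.
group = [[0.0, 0.0], [1.0, 0.0], [4.0, 0.0]]
print(euclidean([0.0, 0.0], [3.0, 4.0]))       # 5.0
print(city_block([0.0, 0.0], [3.0, 4.0]))      # 7.0
print(pick_representative(group, euclidean))   # 1 (the middle vector)
```

Here the middle vector wins under either distance function because its average distance to the other two members (2.0) is lower than that of either end point.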

10. A computing device for generating speech employing compressed concatenation cost data, the computing device comprising:
- a memory;
  
  a processor coupled to the memory, the processor executing a text to speech (TTS) application in conjunction with instructions stored in the memory, wherein the TTS application is configured to:
  
  determine feature vectors for speech segments, wherein the feature vectors comprise concatenation cost values, and wherein the concatenation cost values are costs of concatenating the speech segments with at least two consecutive speech segments;
  
  apply distance weighting to one of:
  
  the speech segments and the at least two consecutive speech segments, wherein the distance weighting is based on feature vectors associated with the speech segments or is based on feature vectors associated with the at least two consecutive speech segments;
  
  cluster the speech segments into a predefined number of groups such that an average distance between speech segments within each group is minimized;
  
  select a representative speech segment for each group such that an average distance between the representative speech segment and other speech segments within a similar group is minimized;
  
  generate a compressed concatenation cost matrix based on the representative speech segments; and
  
  pre-save the compressed concatenation cost matrix for real time computations in synthesizing speech.
- - 11. The computing device of claim 10, wherein the distance weighting is applied such that a sensitivity to compression errors is reduced.
  - 12. The computing device of claim 10, wherein the representative speech segment for each group is further selected based on center re-estimation.
  - 13. The computing device of claim 10, wherein a speech segment data store is configured to receive the speech segments from at least one of:
    - a user input and a set of prerecorded speech patterns.
  - 14. The computing device of claim 10, wherein an analysis engine is configured to:
    - perform at least one from a set of:
      
      text analysis, prosody analysis, and phonetic analysis; and
      
      provide input to a speech synthesis engine for segment selection based on a plurality of performed analyses.

15. A computer-readable memory device with instructions stored thereon for generating speech employing compressed concatenation cost data, the instructions comprising:
- determining, based on a matrix of concatenation costs, feature vectors for speech segments, wherein the matrix of concatenation costs is constructed along a preceding speech segment and a following speech segment for each segment;
  
  applying distance weighting to one of:
  
  the speech segments and at least two consecutive speech segments, wherein the distance weighting is based on feature vectors associated with the speech segments or is based on feature vectors associated with the at least two consecutive speech segments;
  
  clustering the speech segments into M preceding segment and N following segment groups such that an average distance between speech segments within each group is minimized;
  
  selecting a representative speech segment for each group;
  
  generating a compressed concatenation cost matrix such that a concatenation cost between the speech segments and the at least two consecutive speech segments is approximated by a concatenation cost between a representative segment associated with the speech segments and another representative speech segment associated with the at least two consecutive speech segments; and
  
  pre-saving the compressed concatenation cost matrix for real time computations in synthesizing speech.
- - 16. The computer-readable memory device of claim 15, wherein the distance weighting is applied employing distance function:
  - 17. The computer-readable memory device of claim 15, wherein the instructions further comprise:
    - determining M and N based on at least one from a set of:
      
      a total number of speech segments, distances between the speech segments, and a desired reduction in concatenation cost data.
  - 18. The computer-readable memory device of claim 15, wherein a size of pre-saved concatenation data is reduced by [n²/(M×N)], where n is a total number of the speech segments.
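
To make the size reduction in claim 18 concrete, the sketch below computes the ratio n²/(M×N) and shows how a lookup into an M×N compressed matrix might work with separate preceding and following group labels, as in claim 15. The sizes, labels, and function names are hypothetical, and the ratio counts only matrix entries, not the two per-segment label tables a lookup would also need.

```python
def reduction_factor(n_segments, m_groups, n_groups):
    """Ratio of full-matrix entries (n^2) to compressed entries (M * N)."""
    return (n_segments ** 2) / (m_groups * n_groups)

def lookup_cost(compressed, pre_group, fol_group, i, j):
    """Approximate cost of segment i (preceding) followed by segment j (following)."""
    return compressed[pre_group[i]][fol_group[j]]

# Hypothetical sizes: 1000 segments, 50 preceding groups, 40 following groups.
print(reduction_factor(1000, 50, 40))  # 1_000_000 / 2_000 = 500.0

# Hypothetical compressed matrix with M = 2 rows and N = 3 columns.
compressed = [[1.0, 5.0, 3.0],
              [4.0, 0.5, 2.0]]
pre_group = [0, 0, 1, 1]  # preceding-side group of each of four segments
fol_group = [0, 1, 2, 2]  # following-side group of each of four segments
print(lookup_cost(compressed, pre_group, fol_group, 2, 3))  # 2.0
```

Keeping separate row and column groupings reflects that the cost of joining two segments generally depends on their order, so a segment may behave differently as a preceding unit than as a following one.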

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Song, Huicheng; Zhang, Guoliang; Weng, Zhiwei
Primary Examiner(s): YEN, ERIC L
Application Number: US12/754,045
Publication Number: US 20110246200A1
Time in Patent Office: 1,583 Days
Field of Search: 704/258, 704/260
US Class Current: 704/258
CPC Class Codes: G10L 13/07 Concatenation rules
