Global boundary-centric feature extraction and associated discontinuity metrics

US 7,930,172 B2
Filed: 12/08/2009
Issued: 04/19/2011
Est. Priority Date: 10/23/2003
Status: Expired due to Fees

Abstract

Portions from time-domain speech segments are extracted. Feature vectors that represent the portions in a vector space are created. The feature vectors incorporate phase information of the portions. A distance between the feature vectors in the vector space is determined. In one aspect, the feature vectors are created by constructing a matrix W from the portions and decomposing the matrix W. In one aspect, decomposing the matrix W comprises extracting global boundary-centric features from the portions. In one aspect, the portions include at least one pitch period. In another aspect, the portions include centered pitch periods.
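
The matrix construction and decomposition described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the names `build_matrix` and `feature_vectors`, the symmetric placement of the zero padding, the `rank` truncation, and the scaling of the left singular vectors by the singular values are all hypothetical choices that this record does not specify. The sketch relies only on what the claims state, namely that pitch periods may be zero padded to N samples and that the decomposition is a singular value analysis.

```python
import numpy as np

def build_matrix(pitch_periods, n_samples):
    """Stack time-domain pitch periods as rows of W, zero padded to n_samples."""
    rows = []
    for period in pitch_periods:
        period = np.asarray(period, dtype=float)
        if len(period) > n_samples:
            raise ValueError("pitch period is longer than n_samples")
        pad = n_samples - len(period)
        left = pad // 2
        # Assumption: pad on both sides so the period stays centered in its row.
        rows.append(np.pad(period, (left, pad - left)))
    return np.vstack(rows)

def feature_vectors(w, rank):
    """Decompose W = U S V^T and return one rank-dimensional vector per row of W."""
    u, s, _ = np.linalg.svd(w, full_matrices=False)
    # Assumption: each pitch period is represented by its row of U scaled by S.
    return u[:, :rank] * s[:rank]

# Toy data (made-up samples); the second period is the first with its sign flipped.
periods = [[0.0, 1.0, 0.0, -1.0], [0.0, -1.0, 0.0, 1.0], [1.0, 0.5, -0.5]]
W = build_matrix(periods, n_samples=6)
vectors = feature_vectors(W, rank=2)
```

Because the rows of W are raw time-domain samples, the sign-flipped period receives the opposite feature vector, whereas a magnitude spectrum would treat the two periods as identical. That is one way to see how a representation of this kind can preserve phase information.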

34 Citations

31 Claims

1. A machine-implemented method comprising:
- extracting portions from time-domain speech segments, wherein the portions include one or more pitch periods of at least one phoneme, wherein the portions are time domain portions;
  
  creating feature vectors that represent the portions in a vector space, the feature vectors preserving phase information of the time domain portions, wherein the creating feature vectors comprises constructing a mathematical representation of the time domain portions in the vector space;
  
  determining at least one distance between the feature vectors in the vector space, the at least one distance representing a discontinuity between the portions; and
  
  storing information representing the discontinuity in a discontinuity table that is configured to be used in a speech synthesis process.
- - 2. The machine-implemented method of claim 1, wherein the creating feature vectors comprises:
    - constructing a matrix W from the portions; and
      
      decomposing the matrix W.
  - 3. The machine-implemented method of claim 2, wherein decomposing the matrix W comprises performing a pitch synchronous singular value analysis on the pitch periods of the time-domain segments.
  - 4. The machine-implemented method of claim 1, wherein the creating the feature vectors comprises extracting global boundary-centric features from the portions.
  - 5. The machine-implemented method of claim 1, wherein the speech segments each include a segment boundary within a phoneme.
  - 6. The machine-implemented method of claim 1, wherein at least one of the pitch periods is zero padded to N samples.
  - 7. The machine-implemented method of claim 1 wherein the at least one distance between the feature vectors is determined by a metric comprising a cosine of an angle between the feature vectors.
  - 8. The machine-implemented method of claim 1, wherein a difference between two segments in the table, S₁ and S₂, is associated with the discontinuity between S₁ and S₂.
  - 9. The machine-implemented method of claim 8, wherein the difference d(S₁,S₂) between two segments in the voice table, S₁ and S₂, is calculated as
    $d(S_1,S_2) = d_0(p_1,q_1) = 1 - C(\bar{u}_{p_1}, \bar{u}_{q_1})$
    where d₀ is the distance between pitch periods p₁ and q₁, p₁ is the last pitch period of S₁, q₁ is the first pitch period of S₂, $\bar{u}_{p_1}$ is a feature vector associated with pitch period p₁, and $\bar{u}_{q_1}$ is a feature vector associated with pitch period q₁.
  - 10. The machine-implemented method of claim 8, wherein the difference d(S₁,S₂) between two segments in the table, S₁and S₂, is calculated as
  - 11. The machine-implemented method of claim 1, further comprising associating the distance between the feature vectors with speech segments in the table.
  - 12. The machine-implemented method of claim 1, further comprising:
    - selecting speech segments from the voice table based on the distance between the feature vectors.
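
The distance of claims 7 through 9 and the table lookup of claims 11 and 12 can be illustrated with a short Python sketch. It assumes that C is the cosine of the angle between two feature vectors, which is how claim 7 describes the metric. The function names, the dictionary layout of the table, and the segment identifiers are hypothetical and are not taken from the patent.

```python
import math

def closeness(u, v):
    """Cosine of the angle between two feature vectors (assumed meaning of C)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("zero-length feature vector")
    return dot / norm

def discontinuity(u_p1, u_q1):
    """d0(p1, q1) = 1 - C(u_p1, u_q1), following the formula in claim 9."""
    return 1.0 - closeness(u_p1, u_q1)

def build_table(last_period_features, first_period_features):
    """Map each ordered pair (S1, S2) of distinct segments to d(S1, S2).

    last_period_features[S] is the feature vector of the last pitch period of S;
    first_period_features[S] is the feature vector of its first pitch period.
    """
    table = {}
    for s1, u_p1 in last_period_features.items():
        for s2, u_q1 in first_period_features.items():
            if s1 != s2:
                table[(s1, s2)] = discontinuity(u_p1, u_q1)
    return table

def best_successor(table, s1, candidates):
    """Pick the candidate segment that joins s1 with the smallest discontinuity."""
    return min(candidates, key=lambda s2: table[(s1, s2)])

# Made-up two-dimensional feature vectors for three hypothetical segments.
last = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
first = {"a": [0.0, 1.0], "b": [1.0, 0.0], "c": [-1.0, 0.0]}
table = build_table(last, first)
choice = best_successor(table, "a", ["b", "c"])
```

With this definition the distance ranges from 0 (feature vectors pointing the same way) to 2 (opposite directions), so a smaller table entry indicates a smoother join.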

13. A Non-Transitory machine-readable medium having instructions to cause a machine to perform operations comprising:
- extracting portions from time-domain speech segments, wherein the portions include one or more pitch periods of at least one phoneme, wherein the portions are time domain portions;
  
  creating feature vectors that represent the portions in a vector space, the feature vectors preserving phase information of the time domain portions, wherein the creating feature vectors comprises constructing a mathematical representation of the time domain portions in the vector space;
  
  determining at least one distance between the feature vectors in the vector space, the at least one distance representing a discontinuity between the portions; and
  
  storing information representing the discontinuity in a discontinuity table that is configured to be used in a speech synthesis process.
- - 14. The Non-Transitory machine-readable medium of claim 13, wherein the creating feature vectors comprises:
    - constructing a matrix W from the portions; and
      
      decomposing the matrix W.
  - 15. The Non-Transitory machine-readable medium of claim 14, wherein decomposing the matrix W comprises performing a pitch synchronous singular value analysis on the pitch periods of the time-domain segments.
  - 16. The Non-Transitory machine-readable medium of claim 13, wherein the creating the feature vectors comprises extracting global boundary-centric features from the portions.
  - 17. The Non-Transitory machine-readable medium of claim 13, wherein the speech segments each include a segment boundary within a phoneme.
  - 18. The Non-Transitory machine-readable medium of claim 13, wherein at least one of the pitch periods is zero padded to N samples.
  - 19. The Non-Transitory machine-readable medium of claim 13 wherein the at least one distance between the feature vectors is determined by a metric comprising a cosine of an angle between the feature vectors.
  - 20. The Non-Transitory machine-readable medium of claim 13, wherein a difference between two segments in the table, S₁ and S₂, is associated with the discontinuity between S₁ and S₂.
  - 21. The Non-Transitory machine-readable medium of claim 20, wherein the difference d(S₁,S₂) between two segments in the voice table, S₁ and S₂, is calculated as
    $d(S_1,S_2) = d_0(p_1,q_1) = 1 - C(\bar{u}_{p_1}, \bar{u}_{q_1})$
    where d₀ is the distance between pitch periods p₁ and q₁, p₁ is the last pitch period of S₁, q₁ is the first pitch period of S₂, $\bar{u}_{p_1}$ is a feature vector associated with pitch period p₁, and $\bar{u}_{q_1}$ is a feature vector associated with pitch period q₁.
  - 22. The Non-Transitory machine-readable medium of claim 20, wherein the difference d(S₁,S₂) between two segments in the table, S₁and S₂, is calculated as
  - 23. The Non-Transitory machine-readable medium of claim 13, further having instructions to cause the machine to perform operations comprising associating the distance between the feature vectors with speech segments in the table.
  - 24. The Non-Transitory machine-readable medium of claim 13, further having instructions to cause the machine to perform operations comprising selecting speech segments from the voice table based on the distance between the feature vectors.

25. An apparatus comprising:
- a memory; and
  
  a processor coupled to the memory, wherein the processor is configured to extract portions from time-domain speech segments, wherein the portions include one or more pitch periods of at least one phoneme, wherein the portions are time domain portions;
  
  the processor configured to create feature vectors that represent the portions in a vector space, the feature vectors preserving phase information of the time domain portions, wherein the processor is further configured to construct a mathematical representation of the time domain portions to create the feature vectors in the vector space;
  
  the processor configured to determine at least one distance between the feature vectors in the vector space, the at least one distance representing a discontinuity between the portions; and
  
  the processor configured to store information representing the discontinuity in a discontinuity table that is configured to be used in a speech synthesis process.
- - 26. The apparatus of claim 25, wherein the processor is further configured to:
    - construct a matrix W from the portions; and
      
      decompose the matrix W.
  - 27. The apparatus of claim 26, wherein decomposing the matrix W comprises performing a pitch synchronous singular value analysis on the pitch periods of the time-domain segments.
  - 28. The apparatus of claim 25, wherein the processor is further configured to extract global boundary-centric features from the portions.
  - 29. The apparatus of claim 25, wherein the speech segments each include a segment boundary within a phoneme.
  - 30. The apparatus of claim 25, wherein the processor is further configured to associate the distance between the feature vectors with speech segments in the table.

31. An apparatus comprising:
- means for extracting portions from time-domain speech segments, wherein the portions include one or more pitch periods of at least one phoneme, wherein the portions are time domain portions;
  
  means for creating feature vectors that represent the portions in a vector space, the feature vectors preserving phase information of the time domain portions, wherein the creating feature vectors comprises constructing a mathematical representation of the time domain portions in the vector space;
  
  means for determining at least one distance between the feature vectors in the vector space, the at least one distance representing a discontinuity between the portions; and
  
  means for storing information representing the discontinuity in a discontinuity table that is configured to be used in a speech synthesis process.
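
The claims state that the discontinuity table is configured to be used in a speech synthesis process but do not say how. One plausible use, sketched here purely as an assumption, is a dynamic-programming search that picks one candidate segment per position so that the summed discontinuity across all joins is minimal. The function name `select_segments`, the candidate lists, and the cost values are hypothetical. A real unit-selection synthesizer would typically combine such join costs with other costs that this sketch omits.

```python
def select_segments(candidates, table):
    """Choose one segment per position, minimizing total join discontinuity.

    candidates: non-empty list of lists of segment ids, one list per position.
    table: dict mapping (s1, s2) to the discontinuity of joining s1 to s2.
    Returns (total_cost, chosen_path).
    """
    # best[s] holds the cheapest chain found so far that ends in segment s.
    best = {s: (0.0, [s]) for s in candidates[0]}
    for position in candidates[1:]:
        step = {}
        for s2 in position:
            cost, path = min(
                ((c + table[(s1, s2)], p) for s1, (c, p) in best.items()),
                key=lambda item: item[0],
            )
            step[s2] = (cost, path + [s2])
        best = step
    return min(best.values(), key=lambda item: item[0])

# Made-up candidates and discontinuities for a three-position utterance.
candidates = [["a1", "a2"], ["b1", "b2"], ["c1"]]
table = {
    ("a1", "b1"): 0.5, ("a1", "b2"): 0.1,
    ("a2", "b1"): 0.2, ("a2", "b2"): 0.9,
    ("b1", "c1"): 0.1, ("b2", "c1"): 0.8,
}
total, path = select_segments(candidates, table)
```

Note that the locally cheapest first join (a1 to b2) is not part of the best overall chain, which is why the search keeps the cheapest chain for every candidate at each position rather than choosing greedily.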

Current Assignee: Apple Inc.
Original Assignee: Apple Inc.
Inventors: Bellegarda, Jerome R.
Primary Examiner(s): Rider; Justin W
Application Number: US12/633,712
Publication Number: US 20100145691A1
Time in Patent Office: 497 Days
Field of Search: 704/211, 704/216, 704/245
US Class Current: 704/211
CPC Class Codes:
G10L 15/187 Phonemic context, e.g. pron...
G10L 25/90 Pitch determination of spee...
