Data-driven global boundary optimization

US 7,409,347 B1
Filed: 10/23/2003
Issued: 08/05/2008
Est. Priority Date: 10/23/2003
Status: Expired due to Fees

Abstract

Portions from segment boundary regions of a plurality of speech segments are extracted. Each segment boundary region is based on a corresponding initial unit boundary. Feature vectors that represent the portions in a vector space are created. For each of a plurality of potential unit boundaries within each segment boundary region, an average discontinuity based on distances between the feature vectors is determined. For each segment, the potential unit boundary associated with a minimum average discontinuity is selected as a new unit boundary.
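
The selection step described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the array layout `pairwise[m, p, n]` (the discontinuity observed when segment `m`, cut at candidate boundary `p`, is joined to segment `n`) and the function name `select_boundaries` are assumptions made for the example.

```python
import numpy as np

def select_boundaries(pairwise: np.ndarray) -> np.ndarray:
    """Pick, for each segment, the candidate boundary with minimum average discontinuity.

    pairwise has shape (M, P, M); this layout is a hypothetical convention.
    """
    M, P, M2 = pairwise.shape
    if M != M2 or M < 2:
        raise ValueError("expected shape (M, P, M) with M >= 2")
    # Exclude a segment's join with itself from the average.
    others = ~np.eye(M, dtype=bool)                      # shape (M, M)
    masked = np.where(others[:, None, :], pairwise, 0.0)
    avg = masked.sum(axis=2) / (M - 1)                   # shape (M, P)
    # Index of the candidate boundary with the smallest average, per segment.
    return avg.argmin(axis=1)
```

The returned indices refer to positions within each segment's list of potential unit boundaries; how those candidates are enumerated is left open here.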

24 Claims

1. A machine-implemented method comprising:
- extracting portions from segment boundary regions of a plurality of speech segments, each segment boundary region based on a corresponding initial unit boundary;
  
  creating feature vectors that represent the portions in a vector space;
  
  for each of a plurality of potential unit boundaries within each segment boundary region, determining an average discontinuity based on distances between the feature vectors; and
  
  for each segment, selecting the potential unit boundary associated with a minimum average discontinuity as a new unit boundary;
  
  wherein the portions include centered pitch periods, the centered pitch periods derived from pitch periods of the segments, wherein the feature vectors incorporate phase information of the portions, wherein creating feature vectors comprises:

  constructing a matrix W from the portions; and

  decomposing the matrix W, and wherein the matrix W is a $(2(K-1)+1)M \times N$ matrix represented by $W = U \Sigma V^T$, where $K-1$ is the number of centered pitch periods near the potential unit boundary extracted from each segment, $N$ is the maximum number of samples among the centered pitch periods, $M$ is the number of segments, $U$ is the $(2(K-1)+1)M \times R$ left singular matrix with row vectors $u_i$ ($1 \le i \le (2(K-1)+1)M$), $\Sigma$ is the $R \times R$ diagonal matrix of singular values $s_1 \ge s_2 \ge \ldots \ge s_R > 0$, $V$ is the $N \times R$ right singular matrix with row vectors $v_j$ ($1 \le j \le N$), $R \ll (2(K-1)+1)M$, and $^T$ denotes matrix transposition, wherein decomposing the matrix W comprises performing a singular value decomposition of W.
- Dependent claims (2, 3, 4, 5, 6):
  - 2. The machine-implemented method of claim 1, wherein the centered pitch periods are symmetrically zero padded to N samples.
  - 3. The machine-implemented method of claim 1, wherein a feature vector $\bar{u}_i$ is calculated as $\bar{u}_i = u_i \Sigma$, where $u_i$ is a row vector associated with a centered pitch period $i$, and $\Sigma$ is the singular diagonal matrix.
  - 4. The machine-implemented method of claim 3, wherein the distance between two feature vectors is determined by a metric comprising a closeness measure, $C$, between two feature vectors, $\bar{u}_k$ and $\bar{u}_l$, wherein $C$ is calculated as $C(\bar{u}_k, \bar{u}_l) = \cos(u_k \Sigma, u_l \Sigma) = \frac{u_k \Sigma^2 u_l^T}{\|u_k \Sigma\| \, \|u_l \Sigma\|}$ for any $1 \le k, l \le (2(K-1)+1)M$.
  - 5. The machine-implemented method of claim 4, wherein a discontinuity $d(S_1, S_2)$ between two candidate units, $S_1$ and $S_2$, is calculated as $d(S_1, S_2) = C(u_{\pi_{-1}}, u_{\delta_0}) + C(u_{\delta_0}, u_{\sigma_1}) - C(u_{\pi_{-1}}, u_{\pi_0}) - C(u_{\sigma_0}, u_{\sigma_1})$, where $u_{\pi_{-1}}$ is a feature vector associated with a centered pitch period $\pi_{-1}$, $u_{\delta_0}$ is a feature vector associated with a centered pitch period $\delta_0$, $u_{\sigma_1}$ is a feature vector associated with a centered pitch period $\sigma_1$, $u_{\pi_0}$ is a feature vector associated with a centered pitch period $\pi_0$, and $u_{\sigma_0}$ is a feature vector associated with a centered pitch period $\sigma_0$.
  - 6. The machine-implemented method of claim 5, wherein the same closeness measure, C, is used for optimizing unit boundaries and for unit selection.
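
The feature vectors, closeness measure, and discontinuity recited in claims 3 to 5 can be sketched with NumPy as below. This is an illustrative sketch under assumptions: the function names, the caller-supplied truncation rank `R`, and the use of plain row indices to identify the pitch periods $\pi_{-1}$, $\pi_0$, $\delta_0$, $\sigma_0$, $\sigma_1$ are all hypothetical and not taken from the patent.

```python
import numpy as np

def feature_vectors(W: np.ndarray, R: int) -> np.ndarray:
    """Return the matrix whose row i is u_i * Sigma, keeping the R largest singular values."""
    U, s, _ = np.linalg.svd(W, full_matrices=False)
    # Scaling column j of U by s[j] is the same as the product U @ diag(s).
    return U[:, :R] * s[:R]

def closeness(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def discontinuity(F, pi_m1, pi_0, delta_0, sigma_0, sigma_1) -> float:
    """d(S1, S2) as written in claim 5; arguments after F are row indices into F."""
    return (closeness(F[pi_m1], F[delta_0])
            + closeness(F[delta_0], F[sigma_1])
            - closeness(F[pi_m1], F[pi_0])
            - closeness(F[sigma_0], F[sigma_1]))
```

Because `closeness` is an ordinary cosine, the same function could in principle serve both boundary optimization and unit selection, which is the point claim 6 makes.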

7. A non-volatile computer-readable storage medium having computer-executable instructions that when executed by a computer cause the computer to perform a computer-implemented method comprising:
- extracting a portion from segment boundary regions of a plurality of speech segments, each segment boundary region based on a corresponding initial unit boundary;
  
  creating feature vectors that represent the portions in a vector space;
  
  for each of a plurality of potential unit boundaries within each segment boundary region, determining an average discontinuity based on distances between the feature vectors; and
  
  for each segment, selecting the potential unit boundary associated with a minimum average discontinuity as a new unit boundary;
  
  wherein the portions include centered pitch periods, the centered pitch periods derived from pitch periods of the segments, wherein the feature vectors incorporate phase information of the portions, wherein creating feature vectors comprises:

  constructing a matrix W from the portions; and

  decomposing the matrix W, and wherein the matrix W is a $(2(K-1)+1)M \times N$ matrix represented by $W = U \Sigma V^T$, where $K-1$ is the number of centered pitch periods near the potential unit boundary extracted from each segment, $N$ is the maximum number of samples among the centered pitch periods, $M$ is the number of segments, $U$ is the $(2(K-1)+1)M \times R$ left singular matrix with row vectors $u_i$ ($1 \le i \le (2(K-1)+1)M$), $\Sigma$ is the $R \times R$ diagonal matrix of singular values $s_1 \ge s_2 \ge \ldots \ge s_R > 0$, $V$ is the $N \times R$ right singular matrix with row vectors $v_j$ ($1 \le j \le N$), $R \ll (2(K-1)+1)M$, and $^T$ denotes matrix transposition, wherein decomposing the matrix W comprises performing a singular value decomposition of W.
- Dependent claims (8, 9, 10, 11, 12):
  - 8. The non-volatile computer-readable storage medium of claim 7, wherein the centered pitch periods are symmetrically zero padded to N samples.
  - 9. The non-volatile computer-readable storage medium of claim 7, wherein a feature vector $\bar{u}_i$ is calculated as $\bar{u}_i = u_i \Sigma$, where $u_i$ is a row vector associated with a centered pitch period $i$, and $\Sigma$ is the singular diagonal matrix.
  - 10. The non-volatile computer-readable storage medium of claim 9, wherein the distance between two feature vectors is determined by a metric comprising a closeness measure, $C$, between two feature vectors, $\bar{u}_k$ and $\bar{u}_l$, wherein $C$ is calculated as $C(\bar{u}_k, \bar{u}_l) = \cos(u_k \Sigma, u_l \Sigma) = \frac{u_k \Sigma^2 u_l^T}{\|u_k \Sigma\| \, \|u_l \Sigma\|}$ for any $1 \le k, l \le (2(K-1)+1)M$.
  - 11. The non-volatile computer-readable storage medium of claim 10, wherein a discontinuity $d(S_1, S_2)$ between two candidate units, $S_1$ and $S_2$, is calculated as $d(S_1, S_2) = C(u_{\pi_{-1}}, u_{\delta_0}) + C(u_{\delta_0}, u_{\sigma_1}) - C(u_{\pi_{-1}}, u_{\pi_0}) - C(u_{\sigma_0}, u_{\sigma_1})$, where $u_{\pi_{-1}}$ is a feature vector associated with a centered pitch period $\pi_{-1}$, $u_{\delta_0}$ is a feature vector associated with a centered pitch period $\delta_0$, $u_{\sigma_1}$ is a feature vector associated with a centered pitch period $\sigma_1$, $u_{\pi_0}$ is a feature vector associated with a centered pitch period $\pi_0$, and $u_{\sigma_0}$ is a feature vector associated with a centered pitch period $\sigma_0$.
  - 12. The non-volatile computer-readable storage medium of claim 11, wherein the same closeness measure, C, is used for optimizing unit boundaries and for unit selection.
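
The symmetric zero padding of claim 8 and the construction of the matrix W can be sketched as follows. The helper names `pad_symmetric` and `build_W` are hypothetical, and placing the extra zero on the right when the padding amount is odd is an assumption; the claims do not say how an odd remainder is split.

```python
import numpy as np

def pad_symmetric(period, N: int) -> np.ndarray:
    """Zero pad a centered pitch period on both sides up to N samples."""
    period = np.asarray(period, dtype=float)
    extra = N - period.size
    if extra < 0:
        raise ValueError("period is longer than N")
    left = extra // 2
    # Assumption: an odd remainder goes to the right-hand side.
    return np.pad(period, (left, extra - left))

def build_W(periods) -> np.ndarray:
    """Stack padded pitch periods as rows; N is the longest period's length."""
    N = max(len(p) for p in periods)
    return np.vstack([pad_symmetric(p, N) for p in periods])
```

Each row of the result corresponds to one centered pitch period, so with $2(K-1)+1$ periods gathered per segment over $M$ segments the matrix has the $(2(K-1)+1)M \times N$ shape recited in claim 7.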

13. An apparatus comprising:
- means for extracting portions from segment boundary regions of a plurality of speech segments, each segment boundary region based on a corresponding initial unit boundary;
  
  means for creating feature vectors that represent the portions in a vector space;
  
  for each of a plurality of potential unit boundaries within each segment boundary region, means for determining an average discontinuity based on distances between the feature vectors; and
  
  for each segment, means for selecting the potential unit boundary associated with a minimum average discontinuity as a new unit boundary, wherein the portions include centered pitch periods, the centered pitch periods derived from pitch periods of the segments, wherein the feature vectors incorporate phase information of the portions, wherein creating feature vectors comprises:

  means for constructing a matrix W from the portions; and

  means for decomposing the matrix W, and wherein the matrix W is a $(2(K-1)+1)M \times N$ matrix represented by $W = U \Sigma V^T$, where $K-1$ is the number of centered pitch periods near the potential unit boundary extracted from each segment, $N$ is the maximum number of samples among the centered pitch periods, $M$ is the number of segments, $U$ is the $(2(K-1)+1)M \times R$ left singular matrix with row vectors $u_i$ ($1 \le i \le (2(K-1)+1)M$), $\Sigma$ is the $R \times R$ diagonal matrix of singular values $s_1 \ge s_2 \ge \ldots \ge s_R > 0$, $V$ is the $N \times R$ right singular matrix with row vectors $v_j$ ($1 \le j \le N$), $R \ll (2(K-1)+1)M$, and $^T$ denotes matrix transposition, wherein decomposing the matrix W comprises performing a singular value decomposition of W.
- Dependent claims (14, 15, 16, 17, 18):
  - 14. The apparatus of claim 13, wherein the centered pitch periods are symmetrically zero padded to N samples.
  - 15. The apparatus of claim 13, wherein a feature vector $\bar{u}_i$ is calculated as $\bar{u}_i = u_i \Sigma$, wherein $u_i$ is a row vector associated with a centered pitch period $i$, and $\Sigma$ is the singular diagonal matrix.
  - 16. The apparatus of claim 15, wherein the distance between two feature vectors is determined by a metric comprising a closeness measure, $C$, between two feature vectors, $\bar{u}_k$ and $\bar{u}_l$, wherein $C$ is calculated as $C(\bar{u}_k, \bar{u}_l) = \cos(u_k \Sigma, u_l \Sigma) = \frac{u_k \Sigma^2 u_l^T}{\|u_k \Sigma\| \, \|u_l \Sigma\|}$ for any $1 \le k, l \le (2(K-1)+1)M$.
  - 17. The apparatus of claim 16, wherein a discontinuity $d(S_1, S_2)$ between two candidate units, $S_1$ and $S_2$, is calculated as $d(S_1, S_2) = C(u_{\pi_{-1}}, u_{\delta_0}) + C(u_{\delta_0}, u_{\sigma_1}) - C(u_{\pi_{-1}}, u_{\pi_0}) - C(u_{\sigma_0}, u_{\sigma_1})$, where $u_{\pi_{-1}}$ is a feature vector associated with a centered pitch period $\pi_{-1}$, $u_{\delta_0}$ is a feature vector associated with a centered pitch period $\delta_0$, $u_{\sigma_1}$ is a feature vector associated with a centered pitch period $\sigma_1$, $u_{\pi_0}$ is a feature vector associated with a centered pitch period $\pi_0$, and $u_{\sigma_0}$ is a feature vector associated with a centered pitch period $\sigma_0$.
  - 18. The apparatus of claim 17, wherein the same closeness measure, C, is used for optimizing unit boundaries and for unit selection.
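
The two forms of the closeness measure in claim 16 agree because $\Sigma$ is diagonal and therefore symmetric. A short worked derivation, using only the symbols defined in claim 13 and taking $\|\cdot\|$ to be the Euclidean norm (an assumption about the notation), is:

$$
\cos(u_k \Sigma, u_l \Sigma)
= \frac{(u_k \Sigma)(u_l \Sigma)^T}{\|u_k \Sigma\| \, \|u_l \Sigma\|}
= \frac{u_k \Sigma \Sigma^T u_l^T}{\|u_k \Sigma\| \, \|u_l \Sigma\|}
= \frac{u_k \Sigma^2 u_l^T}{\|u_k \Sigma\| \, \|u_l \Sigma\|},
\qquad
\|u_k \Sigma\| = \sqrt{u_k \Sigma^2 u_k^T}.
$$

Setting $k = l$ gives $C = 1$, so a pitch period is maximally close to itself, and $C$ always lies between $-1$ and $1$.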

19. A system comprising:
- a processing unit coupled to a memory through a bus; and
  
  a memory unit storing a process executed by the processing unit to cause the processing unit to:
  
  extract portions from segment boundary regions of a plurality of speech segments, each segment boundary region based on a corresponding initial unit boundary;
  
  create feature vectors that represent the portions in a vector space;
  
  for each of a plurality of potential unit boundaries within each segment boundary region, determine an average discontinuity based on distances between the feature vectors; and
  
  for each segment, select the potential unit boundary associated with a minimum average discontinuity as a new unit boundary, wherein the portions include centered pitch periods, the centered pitch periods derived from pitch periods of the segments, wherein the feature vectors incorporate phase information of the portions, wherein the process further causes the processing unit, when creating feature vectors, to:

  construct a matrix W from the portions; and

  decompose the matrix W, and wherein the matrix W is a $(2(K-1)+1)M \times N$ matrix represented by $W = U \Sigma V^T$, where $K-1$ is the number of centered pitch periods near the potential unit boundary extracted from each segment, $N$ is the maximum number of samples among the centered pitch periods, $M$ is the number of segments, $U$ is the $(2(K-1)+1)M \times R$ left singular matrix with row vectors $u_i$ ($1 \le i \le (2(K-1)+1)M$), $\Sigma$ is the $R \times R$ diagonal matrix of singular values $s_1 \ge s_2 \ge \ldots \ge s_R > 0$, $V$ is the $N \times R$ right singular matrix with row vectors $v_j$ ($1 \le j \le N$), $R \ll (2(K-1)+1)M$, and $^T$ denotes matrix transposition, wherein decomposing the matrix W comprises performing a singular value decomposition of W.
- Dependent claims (20, 21, 22, 23, 24):
  - 20. The system of claim 19, wherein the centered pitch periods are symmetrically zero padded to N samples.
  - 21. The system of claim 19, wherein a feature vector $\bar{u}_i$ is calculated as $\bar{u}_i = u_i \Sigma$, where $u_i$ is a row vector associated with a centered pitch period $i$, and $\Sigma$ is the singular diagonal matrix.
  - 22. The system of claim 21, wherein the distance between two feature vectors is determined by a metric comprising a closeness measure, $C$, between two feature vectors, $\bar{u}_k$ and $\bar{u}_l$, wherein $C$ is calculated as $C(\bar{u}_k, \bar{u}_l) = \cos(u_k \Sigma, u_l \Sigma) = \frac{u_k \Sigma^2 u_l^T}{\|u_k \Sigma\| \, \|u_l \Sigma\|}$ for any $1 \le k, l \le (2(K-1)+1)M$.
  - 23. The system of claim 22, wherein a discontinuity $d(S_1, S_2)$ between two candidate units, $S_1$ and $S_2$, is calculated as $d(S_1, S_2) = C(u_{\pi_{-1}}, u_{\delta_0}) + C(u_{\delta_0}, u_{\sigma_1}) - C(u_{\pi_{-1}}, u_{\pi_0}) - C(u_{\sigma_0}, u_{\sigma_1})$, where $u_{\pi_{-1}}$ is a feature vector associated with a centered pitch period $\pi_{-1}$, $u_{\delta_0}$ is a feature vector associated with a centered pitch period $\delta_0$, $u_{\sigma_1}$ is a feature vector associated with a centered pitch period $\sigma_1$, $u_{\pi_0}$ is a feature vector associated with a centered pitch period $\pi_0$, and $u_{\sigma_0}$ is a feature vector associated with a centered pitch period $\sigma_0$.
  - 24. The system of claim 23, wherein the same closeness measure, C, is used for optimizing unit boundaries and for unit selection.

Current Assignee: Apple Inc.
Original Assignee: Apple Inc.
Inventors: Bellegarda, Jerome R.
Primary Examiner(s): Edouard; Patrick N.
Assistant Examiner(s): Wozniak; James S.
Application Number: US10/692,994
Time in Patent Office: 1,748 Days
Field of Search: 704/258, 704/260, 704/265, 704/267
US Class Current: 704/267
CPC Class Codes: G10L 13/06 Elementary speech units use...
