Data-driven global boundary optimization
2 Assignments
0 Petitions
Abstract
Portions from segment boundary regions of a plurality of speech segments are extracted. Each segment boundary region is based on a corresponding initial unit boundary. Feature vectors that represent the portions in a vector space are created. For each of a plurality of potential unit boundaries within each segment boundary region, an average discontinuity based on distances between the feature vectors is determined. For each segment, the potential unit boundary associated with a minimum average discontinuity is selected as a new unit boundary.
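The selection step described in the abstract can be sketched in Python with NumPy. This is a minimal, single-pass illustration under stated assumptions, not the patented implementation: it assumes Euclidean distance between feature vectors, assumes each candidate boundary of a segment is compared against the initial boundary of every other segment, and uses the hypothetical names `average_discontinuity` and `select_boundaries`.

```python
import numpy as np


def average_discontinuity(features, initial_idx):
    """Mean distance from each candidate boundary to the other segments' boundaries.

    features: array of shape (M, C, D): M segments, C potential boundaries per
    segment boundary region, one D-dimensional feature vector per boundary.
    initial_idx: index of the initial unit boundary within each region
    (assumed to be the same for every segment).
    Returns an (M, C) array of average discontinuities.
    """
    m = features.shape[0]
    anchors = features[:, initial_idx, :]                    # (M, D)
    # dist[a, c, b] = ||features[a, c] - anchors[b]|| (Euclidean, an assumption)
    diff = features[:, :, None, :] - anchors[None, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)                     # (M, C, M)
    others = ~np.eye(m, dtype=bool)[:, None, :]              # drop b == a terms
    return (dist * others).sum(axis=-1) / (m - 1)


def select_boundaries(features, initial_idx):
    """For each segment, the candidate index with minimum average discontinuity."""
    return np.argmin(average_discontinuity(features, initial_idx), axis=1)


if __name__ == "__main__":
    # Three segments, three candidates each, toy 1-D feature vectors.
    f = np.array([
        [[0.0], [5.0], [9.0]],
        [[6.0], [2.0], [0.0]],
        [[7.0], [2.0], [8.0]],
    ])
    print(average_discontinuity(f, initial_idx=1))
    print(select_boundaries(f, initial_idx=1))  # -> [0 1 1]
```

The per-segment argmin is the final step the abstract describes; the anchoring scheme (what each candidate is compared against) is the part of this sketch most likely to differ from the actual method.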
241 Citations
24 Claims
1. A machine-implemented method comprising:
extracting portions from segment boundary regions of a plurality of speech segments, each segment boundary region based on a corresponding initial unit boundary; creating feature vectors that represent the portions in a vector space; for each of a plurality of potential unit boundaries within each segment boundary region, determining an average discontinuity based on distances between the feature vectors; and for each segment, selecting the potential unit boundary associated with a minimum average discontinuity as a new unit boundary; wherein the portions include centered pitch periods, the centered pitch periods derived from pitch periods of the segments, wherein the feature vectors incorporate phase information of the portions, wherein creating feature vectors comprises: constructing a matrix $W$ from the portions; and decomposing the matrix $W$, and wherein the matrix $W$ is a $(2(K-1)+1)M \times N$ matrix represented by $W = U \Sigma V^T$, where $K-1$ is the number of centered pitch periods near the potential unit boundary extracted from each segment, $N$ is the maximum number of samples among the centered pitch periods, $M$ is the number of segments, $U$ is the $(2(K-1)+1)M \times R$ left singular matrix with row vectors $u_i$ ($1 \le i \le (2(K-1)+1)M$), $\Sigma$ is the $R \times R$ diagonal matrix of singular values $s_1 \ge s_2 \ge \cdots \ge s_R > 0$, $V$ is the $N \times R$ right singular matrix with row vectors $v_j$ ($1 \le j \le N$), $R \ll (2(K-1)+1)M$, and $T$ denotes matrix transposition, wherein decomposing the matrix $W$ comprises performing a singular value decomposition of $W$.
(Dependent claims: 2, 3, 4, 5, 6.)
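Claim 1 fixes the shape of $W$ and requires a singular value decomposition, but does not say exactly how rows are formed or which vectors are read off the decomposition. The Python/NumPy sketch below fills those gaps with labeled assumptions: each centered pitch period becomes one row of $W$, zero-padded to $N$ samples so the raw waveform (and with it the phase) is kept; the feature vector for row $i$ is taken to be $u_i \Sigma$, a choice borrowed from latent semantic mapping; and `cosine_distance` is a hypothetical distance. The demo gives each segment $2(K-1)+1$ periods, reading the claim as $K-1$ periods on either side of the boundary plus the one at it; that reading is also an assumption. The names `build_w`, `svd_features`, and `cosine_distance`, and the values $M = 4$, $K = 3$, $R = 3$, are illustrative only.

```python
import numpy as np


def build_w(periods_per_segment, n_samples):
    """Stack every segment's centered pitch periods as rows of W.

    Assumption: each period is zero-padded on the right to N samples, so the
    raw time-domain samples (and hence phase information) are preserved.
    """
    rows = [p for segment in periods_per_segment for p in segment]
    w = np.zeros((len(rows), n_samples))
    for i, period in enumerate(rows):
        w[i, : len(period)] = period
    return w


def svd_features(w, rank):
    """Rank-R SVD W ~ U S V^T; returns the scaled rows u_i S (assumed feature choice)."""
    u, s, _vt = np.linalg.svd(w, full_matrices=False)  # s is sorted descending
    return u[:, :rank] * s[:rank]  # multiply column r of U by s_r


def cosine_distance(a, b):
    """Hypothetical distance: 1 minus the cosine similarity of two feature vectors."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m_segments, k = 4, 3                        # hypothetical M and K
    per_segment = 2 * (k - 1) + 1               # rows contributed by each segment
    periods = [
        [rng.standard_normal(rng.integers(40, 61)) for _ in range(per_segment)]
        for _ in range(m_segments)
    ]
    n = max(len(p) for seg in periods for p in seg)  # N = longest period
    w = build_w(periods, n)                     # shape (20, N)
    feats = svd_features(w, rank=3)             # R = 3, well below 20 rows
    print(w.shape, feats.shape)
    print(cosine_distance(feats[0], feats[per_segment]))
```

Scaling by $\Sigma$ means that, at full rank, each feature vector has the same Euclidean norm as its row of $W$, because $V$ has orthonormal columns; truncating to $R \ll (2(K-1)+1)M$ keeps only the dominant directions.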
7. A non-volatile computer-readable storage medium having computer-executable instructions that when executed by a computer cause the computer to perform a computer-implemented method comprising:
extracting portions from segment boundary regions of a plurality of speech segments, each segment boundary region based on a corresponding initial unit boundary; creating feature vectors that represent the portions in a vector space; for each of a plurality of potential unit boundaries within each segment boundary region, determining an average discontinuity based on distances between the feature vectors; and for each segment, selecting the potential unit boundary associated with a minimum average discontinuity as a new unit boundary; wherein the portions include centered pitch periods, the centered pitch periods derived from pitch periods of the segments, wherein the feature vectors incorporate phase information of the portions, wherein creating feature vectors comprises: constructing a matrix $W$ from the portions; and decomposing the matrix $W$, and wherein the matrix $W$ is a $(2(K-1)+1)M \times N$ matrix represented by $W = U \Sigma V^T$, where $K-1$ is the number of centered pitch periods near the potential unit boundary extracted from each segment, $N$ is the maximum number of samples among the centered pitch periods, $M$ is the number of segments, $U$ is the $(2(K-1)+1)M \times R$ left singular matrix with row vectors $u_i$ ($1 \le i \le (2(K-1)+1)M$), $\Sigma$ is the $R \times R$ diagonal matrix of singular values $s_1 \ge s_2 \ge \cdots \ge s_R > 0$, $V$ is the $N \times R$ right singular matrix with row vectors $v_j$ ($1 \le j \le N$), $R \ll (2(K-1)+1)M$, and $T$ denotes matrix transposition, wherein decomposing the matrix $W$ comprises performing a singular value decomposition of $W$.
(Dependent claims: 8, 9, 10, 11, 12.)
13. An apparatus comprising:
means for extracting portions from segment boundary regions of a plurality of speech segments, each segment boundary region based on a corresponding initial unit boundary; means for creating feature vectors that represent the portions in a vector space; for each of a plurality of potential unit boundaries within each segment boundary region, means for determining an average discontinuity based on distances between the feature vectors; and for each segment, means for selecting the potential unit boundary associated with a minimum average discontinuity as a new unit boundary, wherein the portions include centered pitch periods, the centered pitch periods derived from pitch periods of the segments, wherein the feature vectors incorporate phase information of the portions, wherein creating feature vectors comprises: means for constructing a matrix $W$ from the portions; and means for decomposing the matrix $W$, and wherein the matrix $W$ is a $(2(K-1)+1)M \times N$ matrix represented by $W = U \Sigma V^T$, where $K-1$ is the number of centered pitch periods near the potential unit boundary extracted from each segment, $N$ is the maximum number of samples among the centered pitch periods, $M$ is the number of segments, $U$ is the $(2(K-1)+1)M \times R$ left singular matrix with row vectors $u_i$ ($1 \le i \le (2(K-1)+1)M$), $\Sigma$ is the $R \times R$ diagonal matrix of singular values $s_1 \ge s_2 \ge \cdots \ge s_R > 0$, $V$ is the $N \times R$ right singular matrix with row vectors $v_j$ ($1 \le j \le N$), $R \ll (2(K-1)+1)M$, and $T$ denotes matrix transposition, wherein decomposing the matrix $W$ comprises performing a singular value decomposition of $W$.
(Dependent claims: 14, 15, 16, 17, 18.)
19. A system comprising:
a processing unit coupled to a memory through a bus; and a memory unit storing a process executed by the processing unit to cause the processing unit to: extract portions from segment boundary regions of a plurality of speech segments, each segment boundary region based on a corresponding initial unit boundary; create feature vectors that represent the portions in a vector space; for each of a plurality of potential unit boundaries within each segment boundary region, determine an average discontinuity based on distances between the feature vectors; and for each segment, select the potential unit boundary associated with a minimum average discontinuity as a new unit boundary, wherein the portions include centered pitch periods, the centered pitch periods derived from pitch periods of the segments, wherein the feature vectors incorporate phase information of the portions, wherein the process further causes the processing unit, when creating feature vectors, to: construct a matrix $W$ from the portions; and decompose the matrix $W$, and wherein the matrix $W$ is a $(2(K-1)+1)M \times N$ matrix represented by $W = U \Sigma V^T$, where $K-1$ is the number of centered pitch periods near the potential unit boundary extracted from each segment, $N$ is the maximum number of samples among the centered pitch periods, $M$ is the number of segments, $U$ is the $(2(K-1)+1)M \times R$ left singular matrix with row vectors $u_i$ ($1 \le i \le (2(K-1)+1)M$), $\Sigma$ is the $R \times R$ diagonal matrix of singular values $s_1 \ge s_2 \ge \cdots \ge s_R > 0$, $V$ is the $N \times R$ right singular matrix with row vectors $v_j$ ($1 \le j \le N$), $R \ll (2(K-1)+1)M$, and $T$ denotes matrix transposition, wherein decomposing the matrix $W$ comprises performing a singular value decomposition of $W$.
(Dependent claims: 20, 21, 22, 23, 24.)
Specification