Global boundary-centric feature extraction and associated discontinuity metrics
First Claim
1. A machine-implemented method comprising:
i. extracting, via a microprocessor, portions from speech segments, the portions surrounding a segment boundary within a phoneme;
ii. identifying time samples from the portions;
iii. constructing a matrix W containing first data corresponding to the time samples from the portions surrounding the segment boundary within the phoneme and second data corresponding to the portions;
iv. deriving feature vectors that represent the portions in a vector space by decomposing the matrix W containing the first data corresponding to the time samples from the portions surrounding the segment boundary within the phoneme and the second data corresponding to the portions; and
v. determining a distance between the feature vectors in the vector space.
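Steps iii to v can be illustrated with a minimal sketch. The claim does not name a decomposition or a distance measure, so the singular value decomposition, the cosine-based distance, and the names `boundary_features` and `cosine_distance` below are assumptions made for illustration only. The matrix W is assumed to hold one row per portion and one column per time sample.

```python
import numpy as np

def boundary_features(portions, rank=None):
    """Return one feature vector per portion (hypothetical helper).

    portions: array of shape (M, N), one row per portion and one
    column per time sample (assumed layout of the matrix W).
    """
    W = np.asarray(portions, dtype=float)              # step iii: build W
    # Step iv: decompose W. SVD is an assumed choice of decomposition.
    U, s, _ = np.linalg.svd(W, full_matrices=False)
    if rank is not None:                               # optional truncation
        U, s = U[:, :rank], s[:rank]
    return U * s                                       # row i represents portion i

def cosine_distance(u, v):
    """Step v: an assumed distance, 1 minus the cosine of the angle."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    if denom == 0.0:
        return 1.0                                     # convention for zero vectors
    return 1.0 - float(np.dot(u, v) / denom)

# Toy portions: rows 0 and 1 are identical, row 2 is orthogonal to both.
toy = np.array([[1.0, 0.0, 0.0, 0.0],
                [1.0, 0.0, 0.0, 0.0],
                [0.0, 1.0, 0.0, 0.0]])
feats = boundary_features(toy)
d_same = cosine_distance(feats[0], feats[1])
d_diff = cosine_distance(feats[0], feats[2])
```

Because the rows of `U * s` preserve the inner products of the rows of W when no truncation is applied, identical portions land at distance near 0 and orthogonal portions at distance near 1 in this sketch.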
2 Assignments
0 Petitions
Abstract
Portions from time-domain speech segments are extracted. Feature vectors that represent the portions in a vector space are created. The feature vectors incorporate phase information of the portions. A distance between the feature vectors in the vector space is determined. In one aspect, the feature vectors are created by constructing a matrix W from the portions and decomposing the matrix W. In one aspect, decomposing the matrix W comprises extracting global boundary-centric features from the portions. In one aspect, the portions include at least one pitch period. In another aspect, the portions include centered pitch periods.
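As an illustration of the "centered pitch periods" aspect, the sketch below cuts one fixed-length window around each pitch mark and zero-pads at the signal edges, so that every portion has the same number of time samples and can become one row of a matrix. This is only one possible reading: the abstract does not say how centering is done, and the function name `centered_pitch_periods`, the `pitch_marks` input, and the fixed window length `n_samples` are all hypothetical.

```python
import numpy as np

def centered_pitch_periods(signal, pitch_marks, n_samples):
    """Cut one window per pitch mark, centered on the mark (assumed scheme).

    signal: 1-D array of time-domain samples.
    pitch_marks: sample indices of the pitch marks (hypothetical input).
    n_samples: common window length; edges are zero-padded.
    Returns an array of shape (len(pitch_marks), n_samples).
    """
    signal = np.asarray(signal, dtype=float)
    half = n_samples // 2
    rows = []
    for mark in pitch_marks:
        row = np.zeros(n_samples)
        start = mark - half                 # window start in signal coordinates
        stop = start + n_samples            # window end (exclusive)
        lo, hi = max(start, 0), min(stop, len(signal))
        if lo < hi:                         # copy the part that overlaps the signal
            row[lo - start:hi - start] = signal[lo:hi]
        rows.append(row)
    return np.array(rows).reshape(len(rows), n_samples)

# Toy signal 0..9 with three assumed pitch marks.
portions = centered_pitch_periods(np.arange(10.0), [0, 5, 9], n_samples=4)
```

Keeping the time-domain samples themselves, rather than a magnitude spectrum, is what lets the resulting rows retain the phase information the abstract refers to.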
279 Citations
100 Claims
1. A machine-implemented method comprising:
i. extracting, via a microprocessor, portions from speech segments, the portions surrounding a segment boundary within a phoneme;
ii. identifying time samples from the portions;
iii. constructing a matrix W containing first data corresponding to the time samples from the portions surrounding the segment boundary within the phoneme and second data corresponding to the portions;
iv. deriving feature vectors that represent the portions in a vector space by decomposing the matrix W containing the first data corresponding to the time samples from the portions surrounding the segment boundary within the phoneme and the second data corresponding to the portions; and
v. determining a distance between the feature vectors in the vector space.
Dependent claims: 2-23.
24. A machine-readable medium storing instructions to cause a machine to perform a machine-implemented method comprising:
i. extracting portions from speech segments that surround a segment boundary within a phoneme;
ii. identifying time samples from the portions;
iii. constructing a matrix W containing first data corresponding to the time samples from the portions surrounding the segment boundary within the phoneme and second data corresponding to the portions;
iv. deriving feature vectors that represent the portions in a vector space by decomposing the matrix W containing the first data corresponding to the time samples from the portions surrounding the segment boundary within the phoneme and the second data corresponding to the portions; and
v. determining a distance between the feature vectors in the vector space.
Dependent claims: 25-46.
47. An apparatus comprising:
means for extracting portions from speech segments, the portions surrounding a segment boundary within a phoneme;
means for identifying time samples from the portions;
means for constructing a matrix W containing first data corresponding to the time samples from the portions surrounding the segment boundary within the phoneme and second data corresponding to the portions; and
means for deriving feature vectors that represent the portions in a vector space by decomposing the matrix W containing the first data corresponding to the time samples from the portions surrounding the segment boundary within the phoneme and the second data corresponding to the portions; and
means for determining a distance between the feature vectors in the vector space.
Dependent claims: 48-69.
70. A system comprising:
a processing unit coupled to a memory through a bus; and
wherein the processing unit is configured, for a process, to
extract portions from speech segments, the portions surrounding a segment boundary within a phoneme,
identify time samples from the portions;
construct a matrix W containing first data corresponding to the time samples from the portions surrounding the segment boundary within the phoneme and second data corresponding to the portions, and
derive feature vectors that represent the portions in a vector space by decomposing the matrix W containing the first data corresponding to the time samples from the portions surrounding the segment boundary within the phoneme and the second data corresponding to the portions, and
determine a distance between the feature vectors in the vector space.
Dependent claims: 71-92.
93. A machine-implemented method comprising:
i. gathering, via a microprocessor, time-domain samples from recorded speech segments, wherein the time-domain samples include time samples of pitch periods surrounding a segment boundary within a phoneme;
ii. constructing a matrix containing first data corresponding to the time samples of the pitch periods surrounding the segment boundary within the phoneme and second data corresponding to the pitch periods and deriving feature vectors that represent the time samples in a vector space by decomposing the matrix containing the first data corresponding to the time samples of the pitch periods surrounding the segment boundary within the phoneme and the second data corresponding to the pitch periods; and
iii. determining a discontinuity between the segments, the discontinuity based on a distance between the features.
Dependent claims: 94.
95. A machine-readable medium storing instructions to cause a machine to perform a machine-implemented method comprising:
i. gathering time-domain samples from recorded speech segments, wherein the time-domain samples include time samples of pitch periods surrounding a segment boundary within a phoneme;
ii. constructing a matrix containing first data corresponding to the time samples of the pitch periods surrounding the segment boundary within the phoneme and second data corresponding to the pitch periods and deriving feature vectors that represent the time samples in a vector space by decomposing the matrix containing the first data corresponding to the time samples of the pitch periods surrounding the segment boundary within the phoneme and the second data corresponding to the pitch periods; and
iii. determining a discontinuity between the segments, the discontinuity based on a distance between the features.
Dependent claims: 96.
97. An apparatus comprising:
means for gathering time-domain samples from recorded speech segments, wherein the time-domain samples include time samples of pitch periods surrounding a segment boundary within a phoneme;
means for constructing a matrix containing first data corresponding to the time samples of the pitch periods surrounding the segment boundary within the phoneme and second data corresponding to the pitch periods and deriving feature vectors that represent the time samples in a vector space by decomposing the matrix containing the first data corresponding to the time domain samples of the pitch periods surrounding the segment boundary within the phoneme and the second data corresponding to the pitch periods;
means for determining a discontinuity between the segments, the discontinuity based on a distance between the features.
Dependent claims: 98.
99. A system comprising:
a processing unit coupled to a memory through a bus; and
a process executed from the memory by the processing unit to cause the processing unit to
gather time-domain samples from recorded speech segments, wherein the time-domain samples include time samples of pitch periods surrounding a segment boundary within a phoneme,
constructing a matrix containing first data corresponding to the time-domain samples of the pitch periods surrounding the segment boundary within the phoneme and second data corresponding to the pitch periods and deriving feature vectors that represent the time samples in a vector space by decomposing the matrix containing the first data corresponding to the time domain samples of the pitch periods surrounding the segment boundary within the phoneme and the second data corresponding to the pitch periods; and
determine a discontinuity between the segments, the discontinuity based on a distance between the features.
Dependent claims: 100.
Specification