Global boundary-centric feature extraction and associated discontinuity metrics
First Claim
1. A machine-implemented method comprising:
extracting portions from time-domain speech segments, wherein the portions include one or more pitch periods of at least one phoneme, wherein the portions are time domain portions;
creating feature vectors that represent the portions in a vector space, the feature vectors preserving phase information of the time domain portions, wherein the creating feature vectors comprises constructing a mathematical representation of the time domain portions in the vector space;
determining at least one distance between the feature vectors in the vector space, the at least one distance representing a discontinuity between the portions; and
storing information representing the discontinuity in a discontinuity table that is configured to be used in a speech synthesis process.
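The last two steps of the claim (a distance between feature vectors, stored in a discontinuity table) can be sketched as follows. This is a minimal illustration, not the patented implementation: the Euclidean metric, the dictionary layout, and the names `build_discontinuity_table`, `lookup`, and the portion identifiers are all assumptions made for the example, and the feature vectors are made-up values.

```python
import math
from itertools import combinations

def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def build_discontinuity_table(features, distance=euclidean):
    """Map each unordered pair of portion ids to a discontinuity score.

    `features` is a dict {portion_id: feature_vector}. The table layout
    is hypothetical; the claim only requires that the information be
    stored in a table usable by a speech synthesis process.
    """
    table = {}
    for (id_a, vec_a), (id_b, vec_b) in combinations(sorted(features.items()), 2):
        table[(id_a, id_b)] = distance(vec_a, vec_b)
    return table

def lookup(table, id_a, id_b):
    """Fetch the stored discontinuity for a pair, in either order."""
    return table[tuple(sorted((id_a, id_b)))]

# Made-up feature vectors for three boundary portions.
features = {
    "seg1_end": [3.0, 4.0],
    "seg2_start": [0.0, 0.0],
    "seg3_start": [3.0, 4.0],
}
table = build_discontinuity_table(features)
```

A synthesizer choosing which segment to concatenate after `seg1_end` could consult the table and prefer the candidate with the smaller stored value, here `seg3_start`.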
Abstract
Portions from time-domain speech segments are extracted. Feature vectors that represent the portions in a vector space are created. The feature vectors incorporate phase information of the portions. A distance between the feature vectors in the vector space is determined. In one aspect, the feature vectors are created by constructing a matrix W from the portions and decomposing the matrix W. In one aspect, decomposing the matrix W comprises extracting global boundary-centric features from the portions. In one aspect, the portions include at least one pitch period. In another aspect, the portions include centered pitch periods.
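The pipeline in the abstract can be sketched in a few lines of NumPy. The abstract says only that the matrix W is decomposed; this sketch assumes a singular value decomposition, assumes each row of W is one centered, zero-padded pitch period, and assumes a cosine-based distance. The function names and the synthetic sine-wave "pitch periods" are hypothetical and serve only to show that features built from time-domain samples keep phase information.

```python
import numpy as np

def build_matrix(pitch_periods, width):
    """Stack pitch periods as rows of W, centered and zero-padded to `width`."""
    W = np.zeros((len(pitch_periods), width))
    for i, p in enumerate(pitch_periods):
        p = np.asarray(p, dtype=float)
        start = (width - len(p)) // 2
        W[i, start:start + len(p)] = p
    return W

def feature_vectors(W, rank):
    """Decompose W (assumed here: SVD) and keep `rank` dimensions per row."""
    U, s, _ = np.linalg.svd(W, full_matrices=False)
    return U[:, :rank] * s[:rank]

def cosine_distance(u, v):
    """1 - cos(angle) between two nonzero vectors; ranges from 0 to 2."""
    return 1.0 - float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Synthetic one-period waveforms standing in for real pitch periods.
t = np.arange(32)
sine = np.sin(2 * np.pi * t / 32)
cosine = np.cos(2 * np.pi * t / 32)
W = build_matrix([sine, sine, -sine, cosine], width=32)
F = feature_vectors(W, rank=2)

same = cosine_distance(F[0], F[1])      # identical waveforms
inverted = cosine_distance(F[0], F[2])  # same magnitude spectrum, opposite phase
shifted = cosine_distance(F[0], F[3])   # quarter-period phase shift
```

All four waveforms share the same magnitude spectrum, so a magnitude-only feature would not separate them. Because the rows of W are time-domain samples, the decomposed features still distinguish the inverted and shifted waveforms from the original.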
34 Citations
31 Claims
1. A machine-implemented method comprising:
extracting portions from time-domain speech segments, wherein the portions include one or more pitch periods of at least one phoneme, wherein the portions are time domain portions;
creating feature vectors that represent the portions in a vector space, the feature vectors preserving phase information of the time domain portions, wherein the creating feature vectors comprises constructing a mathematical representation of the time domain portions in the vector space;
determining at least one distance between the feature vectors in the vector space, the at least one distance representing a discontinuity between the portions; and
storing information representing the discontinuity in a discontinuity table that is configured to be used in a speech synthesis process.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A non-transitory machine-readable medium having instructions to cause a machine to perform operations comprising:
extracting portions from time-domain speech segments, wherein the portions include one or more pitch periods of at least one phoneme, wherein the portions are time domain portions;
creating feature vectors that represent the portions in a vector space, the feature vectors preserving phase information of the time domain portions, wherein the creating feature vectors comprises constructing a mathematical representation of the time domain portions in the vector space;
determining at least one distance between the feature vectors in the vector space, the at least one distance representing a discontinuity between the portions; and
storing information representing the discontinuity in a discontinuity table that is configured to be used in a speech synthesis process.
Dependent claims: 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24
25. An apparatus comprising:
a memory; and
a processor coupled to the memory, wherein the processor is configured to extract portions from time-domain speech segments, wherein the portions include one or more pitch periods of at least one phoneme, wherein the portions are time domain portions;
the processor configured to create feature vectors that represent the portions in a vector space, the feature vectors preserving phase information of the time domain portions, wherein the processor is further configured to construct a mathematical representation of the time domain portions to create the feature vectors in the vector space;
the processor configured to determine at least one distance between the feature vectors in the vector space, the at least one distance representing a discontinuity between the portions; and
the processor configured to store information representing the discontinuity in a discontinuity table that is configured to be used in a speech synthesis process.
Dependent claims: 26, 27, 28, 29, 30
31. An apparatus comprising:
means for extracting portions from time-domain speech segments, wherein the portions include one or more pitch periods of at least one phoneme, wherein the portions are time domain portions;
means for creating feature vectors that represent the portions in a vector space, the feature vectors preserving phase information of the time domain portions, wherein the creating feature vectors comprises constructing a mathematical representation of the time domain portions in the vector space;
means for determining at least one distance between the feature vectors in the vector space, the at least one distance representing a discontinuity between the portions; and
means for storing information representing the discontinuity in a discontinuity table that is configured to be used in a speech synthesis process.
Specification