Refining of segmental boundaries in speech waveforms using contextual-dependent models
Abstract
A method and apparatus are provided for refining segmental boundaries in speech waveforms. Contextual acoustic feature similarities are used as a basis for clustering adjacent phoneme speech units, where each pair of adjacent phoneme speech units includes a segmental boundary. A refining model is trained for each cluster and used to refine boundaries of contextual phoneme speech units forming the clusters.
34 Claims
1. A method of ascertaining phoneme speech unit boundaries of adjacent speech units in speech data, the method comprising:
receiving training data of speech waveforms with known boundary locations of phoneme speech units contained therein;
processing the speech waveforms to obtain multi-frame acoustic feature pseudo-triphone representations of a plurality of pseudo-triphones in the speech data, each pseudo-triphone comprising a boundary location, a first phoneme speech unit preceding the boundary location and a second phoneme speech unit following the boundary location;
clustering the multi-frame acoustic feature pseudo-triphone representations as a function of acoustic similarity in a plurality of clusters;
training a refining model for each cluster;
receiving a second set of data of speech waveforms with initial boundary locations of adjacent phoneme speech units contained therein;
identifying pseudo-triphones in the second set of data and corresponding refining models for each of the pseudo-triphones; and
using the refining model for each corresponding pseudo-triphone for the second set of data to locate a new boundary location different than the initial boundary and provide output indicating the new boundary locations.
15. A computer-readable storage medium having computer-executable instructions for processing speech data, the computer-readable medium comprising:
an acoustic feature generator adapted to receive training data of speech waveforms with known boundary locations of phoneme speech units contained therein and generate multi-frame acoustic feature pseudo-triphone representations of a plurality of pseudo-triphones in the training data, each pseudo-triphone comprising a boundary location, a first phoneme speech unit preceding the boundary location and a second phoneme speech unit following the boundary location;
a clustering module adapted to receive the multi-frame acoustic feature pseudo-triphone representations of the plurality of pseudo-triphones and cluster the representations based on acoustic similarity; and
a refining module generator adapted to operate on each cluster of representations and generate a statistical model therefor indicative of the location of the boundary for each cluster.
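For illustration only, the flow recited in the independent claims (multi-frame features around each boundary, clustering by acoustic similarity, one refining model per cluster, then a search for a better boundary) might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names, the k-means clustering, the use of a per-cluster mean vector as the "refining model", and the toy one-dimensional features are all hypothetical stand-ins chosen to keep the example short.

```python
from collections import defaultdict

def supervector(frames, boundary, k=1):
    """Multi-frame representation: concatenate k frames on each side of the boundary."""
    return [x for frame in frames[boundary - k: boundary + k] for x in frame]

def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    return min(range(len(centroids)), key=lambda c: sqdist(point, centroids[c]))

def kmeans(points, n_clusters, iters=10):
    """Plain k-means with deterministic initialisation (assumed clustering method)."""
    centroids = [list(p) for p in points[:n_clusters]]
    for _ in range(iters):
        groups = defaultdict(list)
        for p in points:
            groups[nearest(p, centroids)].append(p)
        centroids = [mean(groups[c]) if groups[c] else centroids[c]
                     for c in range(len(centroids))]
    return centroids

def train(utterances, n_clusters=2, k=1):
    """utterances: list of (frames, [(boundary_index, left_phone, right_phone), ...])."""
    by_type = defaultdict(list)
    for frames, boundaries in utterances:
        for b, left, right in boundaries:
            by_type[(left, right)].append(supervector(frames, b, k))
    types = sorted(by_type)
    type_means = [mean(by_type[t]) for t in types]
    centroids = kmeans(type_means, n_clusters)
    # Map each pseudo-triphone type to the acoustically closest cluster.
    cluster_of = {t: nearest(m, centroids) for t, m in zip(types, type_means)}
    tokens = defaultdict(list)
    for t in types:
        tokens[cluster_of[t]].extend(by_type[t])
    # Assumed refining model: the mean supervector of each cluster.
    models = {c: mean(v) for c, v in tokens.items()}
    return cluster_of, models

def refine(frames, initial, left, right, cluster_of, models, k=1, search=2):
    """Scan candidate boundaries near the initial one; keep the best model match."""
    model = models[cluster_of[(left, right)]]
    lo = max(k, initial - search)
    hi = min(len(frames) - k, initial + search)
    return min(range(lo, hi + 1),
               key=lambda b: sqdist(supervector(frames, b, k), model))

# Toy data: 1-D "features" where phone a = 0.0, b = 1.0, s = 5.0 (made up).
training = [
    ([[0.0]] * 3 + [[1.0]] * 3, [(3, "a", "b")]),
    ([[1.0]] * 3 + [[5.0]] * 3, [(3, "b", "s")]),
]
cluster_of, models = train(training)
test_frames = [[0.0]] * 4 + [[1.0]] * 3   # true a/b boundary is at frame 4
new_boundary = refine(test_frames, 3, "a", "b", cluster_of, models)
print(new_boundary)
```

In this sketch, a mean vector scored by squared distance behaves like a Gaussian with identity covariance, so picking the closest candidate corresponds to picking the most likely one under that assumed model. A practical system would use real acoustic features (for example, cepstral coefficients) and a richer statistical model per cluster.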
Specification