Blind clustering of data with application to speech processing systems
Abstract
The present invention relates to a method for segmenting speech into subword speech segments. Optimal boundary locations for each estimate of a number of segments are determined within an estimated range of the number of segments. In addition, an optimality criterion is found for each estimate of the number of segments within the range. Using the optimality criteria, the optimal number of subwords is determined. From the location of the boundaries and the optimal number of segments, data can be clustered or speech can be segmented. The method can be used in data processing systems, speaker verification, medium-size vocabulary speech recognition systems, language identification systems and coarse subword-level speech segmentation processes.
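The procedure summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not define the optimality criterion or how the range of K is estimated, so the sketch assumes a range supplied by the caller and a penalized-distortion criterion (within-segment squared error plus `penalty * K`), with smaller values treated as better. The names `best_boundaries`, `segment` and `penalty` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Frame = Sequence[float]


def _prefix_sums(frames: Sequence[Frame]) -> Tuple[List[List[float]], List[float]]:
    """Running sums of the feature vectors and of their squared norms."""
    vec = [[0.0] * len(frames[0])]
    sq = [0.0]
    for f in frames:
        vec.append([a + b for a, b in zip(vec[-1], f)])
        sq.append(sq[-1] + sum(x * x for x in f))
    return vec, sq


def best_boundaries(frames: Sequence[Frame], k: int) -> Tuple[float, List[int]]:
    """Dynamic programming: split frames into k contiguous segments with
    minimum total squared distance to each segment mean.
    Returns (distortion, interior boundary indices)."""
    n = len(frames)
    vec, sq = _prefix_sums(frames)

    def cost(i: int, j: int) -> float:
        # Squared error of frames[i:j] around their mean, from prefix sums.
        s = sum((b - a) ** 2 for a, b in zip(vec[i], vec[j]))
        return max(0.0, (sq[j] - sq[i]) - s / (j - i))

    inf = float("inf")
    d = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    d[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = d[m - 1][i] + cost(i, j)
                if c < d[m][j]:
                    d[m][j], back[m][j] = c, i
    # Walk the back-pointers from the end to recover the segment starts.
    bounds, j = [], n
    for m in range(k, 0, -1):
        j = back[m][j]
        bounds.append(j)
    return d[k][n], sorted(bounds)[1:]  # drop the leading 0


def segment(frames: Sequence[Frame], k_min: int, k_max: int,
            penalty: float) -> Tuple[int, List[int], Dict[int, float]]:
    """Evaluate every K in the range and keep the K with the smallest Q_K.
    Q_K = distortion + penalty * K is an assumed criterion, not the patent's."""
    results = {}
    for k in range(k_min, k_max + 1):
        distortion, bounds = best_boundaries(frames, k)
        results[k] = (distortion + penalty * k, bounds)
    k0 = min(results, key=lambda k: results[k][0])
    return k0, results[k0][1], {k: q for k, (q, _) in results.items()}


# Toy one-dimensional "features" with three obvious plateaus.
frames = [[0.0]] * 4 + [[5.0]] * 4 + [[10.0]] * 4
k0, bounds, q = segment(frames, k_min=1, k_max=4, penalty=1.0)
print(k0, bounds)  # 3 [4, 8]
```

The additive penalty depends on the scale of the features, so it would need tuning for real spectral data. It is used here only because it makes the choice of K0 easy to follow by hand.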
26 Claims
1. A method for segmenting speech without knowledge of linguistic information into a plurality of segments comprising the steps of:
estimating a range of a number of said segments in said speech;
dynamically determining locations of boundaries for each estimate of a number of said segments K within said range of said number of said segments;
determining an optimality criterion Qk for each of said estimate of said number of segments K from said location of said boundaries;
determining an optimal number of segments K0 in said speech from said optimality criterion Qk;
segmenting said speech into said optimal number of segments K0; and
storing said optimal number of segments K0.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A system for segmenting speech without knowledge of linguistic information into a plurality of segments comprising:
means for estimating a range of a number of said segments in said speech;
means for dynamically determining locations of boundaries for each estimate of a number of said segments K within said range of said number of said segments;
means for determining an optimality criterion Qk for each of said estimate of said number of segments K from said location of said boundaries;
means for determining an optimal number of segments K0 in said speech from said optimality criterion Qk;
means for segmenting said speech into said optimal number of segments K0;
means for storing said optimal number of segments K0; and
means for modeling said speech based on said optimal number of segments K0.
Dependent claims: 10, 11, 12, 13, 14, 15
16. A system for speaker verification without knowledge of linguistic information for speech spoken by said speaker comprising:
means for extracting at least one spectral feature vector from first speech;
means for segmenting said extracted feature vector by estimating a range of a number of said segments in said extracted feature vector;
means for dynamically determining locations of boundaries for each estimate of numbers of said segments K within said range of said number of said segments;
means for determining an optimality criterion Qk for each of said estimate of said number of segments K from said location of said boundaries for determining an optimal number of segments K0 in said first speech from said optimal criterion Qk;
means for segmenting said first speech into said optimal number of segments;
means for storing said boundaries and said optimal number of segments as segmentation parameters;
means for determining a first subword model from said segmentation parameters of said first speech;
means for determining a second subword model from said optimal number of segments;
means for storing said first subword model and said second subword model;
means for extracting at least one second feature vector from a second speech sample;
means for segmenting said second feature vector into said optimal number of segments using said stored segmentation parameters;
means for recognizing the segmented second speech sample from said stored first subword model and said second subword model to produce recognized output; and
means for determining from said recognized output whether to accept or reject said speaker.
Dependent claims: 17, 18, 19, 20, 21
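The enrollment and verification flow recited in claim 16 can be sketched roughly as follows. This is an illustration under stated assumptions, not the claimed system. A single set of per-segment mean vectors stands in for the two subword models, which the claim does not specify. K0 is passed in rather than estimated. The second sample is re-segmented into K0 segments by the same minimum-distortion dynamic programming. The names `split_fixed_k`, `enroll`, `verify` and `threshold` are hypothetical.

```python
from typing import Dict, List, Sequence

Frame = Sequence[float]


def split_fixed_k(frames: Sequence[Frame], k: int) -> List[int]:
    """Segment edges [0, b1, ..., n] of the minimum-distortion split of
    frames into k contiguous segments (simple, unoptimized DP)."""
    n = len(frames)

    def cost(i: int, j: int) -> float:
        # Squared error of frames[i:j] around their mean vector.
        seg = frames[i:j]
        mean = [sum(col) / len(seg) for col in zip(*seg)]
        return sum((x - m) ** 2 for f in seg for x, m in zip(f, mean))

    inf = float("inf")
    d = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    d[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = d[m - 1][i] + cost(i, j)
                if c < d[m][j]:
                    d[m][j], back[m][j] = c, i
    edges, j = [n], n
    for m in range(k, 0, -1):
        j = back[m][j]
        edges.append(j)
    return edges[::-1]


def segment_means(frames: Sequence[Frame], k: int) -> List[List[float]]:
    """Mean feature vector of each of the k segments."""
    e = split_fixed_k(frames, k)
    return [[sum(col) / (b - a) for col in zip(*frames[a:b])]
            for a, b in zip(e, e[1:])]


def enroll(frames: Sequence[Frame], k0: int) -> Dict[str, object]:
    """Store K0 and one mean vector per segment (assumed stand-in model)."""
    return {"k0": k0, "templates": segment_means(frames, k0)}


def verify(model: Dict[str, object], frames: Sequence[Frame],
           threshold: float) -> bool:
    """Accept when the average squared distance between the test segment
    means and the stored templates is at most threshold."""
    means = segment_means(frames, model["k0"])
    dist = sum((x - t) ** 2
               for m, tpl in zip(means, model["templates"])
               for x, t in zip(m, tpl))
    return dist / model["k0"] <= threshold


# Toy one-dimensional features: enroll on two plateaus, then test two samples.
enrolled = enroll([[0.0]] * 3 + [[4.0]] * 3, k0=2)
accepted = verify(enrolled, [[0.1]] * 2 + [[3.9]] * 4, threshold=0.5)
rejected = verify(enrolled, [[2.0]] * 3 + [[9.0]] * 3, threshold=0.5)
print(accepted, rejected)  # True False
```

Because the test sample is re-segmented rather than cut at the enrollment boundaries, the two samples may differ in timing, as they do in the accepted example.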
22. A system for speech recognition of user defined vocabulary words comprising:
means for estimating a range of a number of said subwords in a first vocabulary word;
means for dynamically determining locations of boundaries for each estimate of a number of said subwords within said range of said number of said subwords;
means for determining an optimality criterion Qk for each of said estimate of said number of subwords from said location of said boundaries;
means for determining an optimal number of subwords in said vocabulary word from said optimality criterion Qk;
means for modeling said subwords with a classifier to determine a plurality of word models for said vocabulary word;
means for storing said word models; and
means for recognizing a second vocabulary word from said stored word models.
Dependent claim: 23
24. A system for recognizing a language comprising:
means for estimating a range of a number of said subwords in first speech of said language;
means for dynamically determining locations of boundaries for each estimate of a number of said subwords within said range of said number of said subwords;
means for determining an optimal criterion Qk for each of said estimate of said number of subwords from said location of said boundaries;
means for determining an optimal number of subwords in said first speech from said optimality criterion Qk;
means for modeling said subwords with a classifier to determine a language model;
means for storing said language model; and
means for recognizing a language of a second speech sample from said stored language model.
Dependent claim: 25
26. A system for phonetic transcription comprising:
means for estimating a range of a number of said subwords in a first speech sample;
means for dynamically determining locations of boundaries for each estimate of a number of said subwords within said range of said number of said subwords;
means for determining an optimal criterion Qk for each of said estimate of said number of subwords from said location of said boundaries;
means for determining an optimal number of subwords in said speech from said optimality criterion Qk; and
means for storing said boundary locations, wherein said boundary locations are used in subsequent phonetic transcription.
Specification