Segmentation approach for speech recognition systems
Abstract
Phonetic units are identified in a body of utterance data according to a novel segmentation approach. A body of received utterance data is processed and a set of candidate phonetic unit boundaries is determined that defines a set of candidate phonetic units. The set of candidate phonetic unit boundaries is determined based upon changes in Cepstral coefficient values, changes in utterance energy, changes in phonetic classification, broad category analysis (retroflex, back vowels, front vowels) and sonorant onset detection. The set of candidate phonetic unit boundaries is filtered by priority and proximity to other candidate phonetic units and by silence regions. The set of candidate phonetic units is filtered using no-cross region analysis to generate a set of filtered candidate phonetic units. No-cross region analysis generally involves discarding candidate phonetic units that completely span an energy up, energy down, dip or broad category type no-cross region. Finally, a set of phonetic units is selected from the set of filtered candidate phonetic units based upon the probabilities of candidate boundaries defining the ends of the unit and within the unit.
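The no-cross filtering step can be illustrated with a minimal sketch. It assumes details the abstract does not state: boundaries and regions are integer frame indices, every ordered pair of candidate boundaries forms a candidate unit, and a unit "spans" a region when it completely covers it. The names `Unit`, `candidate_units`, `spans`, and `filter_units` are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumption: times are integer frame indices; the source does not fix a unit.
Region = Tuple[int, int]


@dataclass(frozen=True)
class Unit:
    start: int  # frame index of the unit's left boundary
    end: int    # frame index of the unit's right boundary


def candidate_units(boundaries: List[int]) -> List[Unit]:
    """Assumption: every ordered pair of boundaries defines a candidate unit."""
    b = sorted(boundaries)
    return [Unit(b[i], b[j]) for i in range(len(b)) for j in range(i + 1, len(b))]


def spans(unit: Unit, region: Region) -> bool:
    """True if the unit completely covers the no-cross region."""
    r_start, r_end = region
    return unit.start <= r_start and r_end <= unit.end


def filter_units(units: List[Unit], no_cross: List[Region]) -> List[Unit]:
    """Keep only the candidate units that span no no-cross region."""
    return [u for u in units if not any(spans(u, r) for r in no_cross)]


# Hypothetical data: five candidate boundaries and one no-cross region.
boundaries = [0, 10, 15, 20, 30]
no_cross = [(12, 18)]

kept = filter_units(candidate_units(boundaries), no_cross)
for unit in kept:
    print(unit)
```

In this example the region (12, 18) is likely to contain a boundary, so a unit such as (10, 20) that covers the whole region is discarded, while (10, 15) and (15, 20), which end or begin inside it, survive.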
45 Claims
1. A method for automatically determining a set of phonetic units from a body of utterance data, the method comprising the computer-implemented steps of:
receiving the body of utterance data;
determining a first set of candidate phonetic units from the body of utterance data;
determining a set of no-cross regions from the body of utterance data wherein the no-cross regions correspond to a time span of utterance data having a high probability of containing a boundary between phonetic units;
filtering the first set of candidate phonetic units to generate a subset of candidate phonetic units therefrom wherein the filtering analyzes the candidate phonetic units to determine if the candidate spans a no-cross region for the utterance data such that the subset omits candidate phonetic units which spanned a no-cross region.
Dependent claims: 2-15
16. A computer-readable medium carrying one or more sequences or one or more instructions for automatically determining a set of phonetic units from a body of utterance data, the one or more sequences or one or more instructions including instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of:
receiving the body of utterance data;
determining a first set of candidate phonetic units from the body of utterance data;
determining a set of no-cross regions from the body of utterance data wherein the no-cross regions correspond to a time span of utterance data having a high probability of containing a boundary between phonetic units;
filtering the first set of candidate phonetic units to generate a subset of candidate phonetic units therefrom wherein the filtering analyzes the candidate phonetic units to determine if the candidate spans a no-cross region for the utterance data such that the subset omits candidate phonetic units which spanned a no-cross region.
Dependent claims: 17-30
31. A speech recognition system for automatically determining a set of phonetic units from a body of utterance data, the speech recognition system comprising:
one or more processors; and
a memory communicatively coupled to the one or more processors, wherein the memory includes one or more sequences or one or more instructions which, when executed by the one or more processors, cause the one or more processors to perform the steps of:
receiving the body of utterance data;
determining a first set of candidate phonetic units from the body of utterance data;
determining a set of no-cross regions from the body of utterance data wherein the no-cross regions correspond to a time span of utterance data having a high probability of containing a boundary between phonetic units;
filtering the first set of candidate phonetic units to generate a subset of candidate phonetic units therefrom wherein the filtering analyzes the candidate phonetic units to determine if the candidate spans a no-cross region for the utterance data such that the subset omits candidate phonetic units which spanned a no-cross region.
Dependent claims: 32-45
Specification