Orthogonalized dictionary speech recognition apparatus and method thereof
Abstract
Speech pattern data from a plurality of speakers are stored in advance in a pattern storage section. For the first of these speakers, a plurality of speech pattern data are averaged to obtain averaged pattern data. The data obtained by blurring (smoothing) and by differentiating the averaged pattern data along the time axis are stored in an orthogonalized dictionary as the basic dictionary data of the first and second axes, respectively. Blurred and differentiated data are obtained in the same way for the second and subsequent speakers and are selectively stored in the orthogonalized dictionary as additional dictionary data defining new axes. Speech of any of the plurality of speakers is then recognized by computing the similarity between the input speech and the orthogonalized dictionary formed in this manner.
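The abstract does not specify the smoothing or differentiation filters. The Python sketch below shows one way the two basic axes could be formed for the first speaker. It assumes that patterns are indexed `pattern[channel][frame]`, a hypothetical `[1, 2, 1] / 4` smoothing kernel, a central-difference derivative along the time axis, and an explicit Gram-Schmidt step so that the second axis is exactly orthogonal to the first. All function names are illustrative.

```python
import math


def average_patterns(patterns):
    """Element-wise mean of equally sized patterns, each indexed [channel][frame]."""
    n = len(patterns)
    channels, frames = len(patterns[0]), len(patterns[0][0])
    return [[sum(p[c][t] for p in patterns) / n for t in range(frames)]
            for c in range(channels)]


def blur_time(pattern):
    """Smooth every channel along time with an assumed [1, 2, 1] / 4 kernel (edges clamped)."""
    out = []
    for row in pattern:
        last = len(row) - 1
        out.append([(row[max(t - 1, 0)] + 2 * row[t] + row[min(t + 1, last)]) / 4
                    for t in range(len(row))])
    return out


def diff_time(pattern):
    """Differentiate every channel along time with a central difference (edges clamped)."""
    out = []
    for row in pattern:
        last = len(row) - 1
        out.append([(row[min(t + 1, last)] - row[max(t - 1, 0)]) / 2
                    for t in range(len(row))])
    return out


def flatten(pattern):
    """Concatenate the channels into a single vector."""
    return [x for row in pattern for x in row]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def basic_axes(patterns):
    """First speaker: axis 1 = blurred average; axis 2 = differentiated average
    with its component along axis 1 removed (Gram-Schmidt). Both are unit length."""
    avg = average_patterns(patterns)
    axis1 = normalize(flatten(blur_time(avg)))
    d = flatten(diff_time(avg))
    proj = dot(d, axis1)
    axis2 = normalize([x - proj * y for x, y in zip(d, axis1)])
    return [axis1, axis2]
```

Blurring makes the first axis tolerant of small timing shifts. The differentiated pattern captures how the spectrum changes over time, which is the kind of variation the second axis is meant to absorb.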
24 Claims
1. A speech recognition system, comprising:
acoustic analyzing means for converting input speech into an electrical signal and obtaining speech pattern data upon acoustic analysis of said electrical signal;
means for detecting a speech interval of the electrical signal;
means for generating sampling pattern data by extracting a predetermined number of samples from speech pattern data included in the detected speech interval;
means for prestoring sampling pattern data of a plurality of speakers for categories of speech to be recognized, said sampling pattern data including learning pattern data;
means for forming orthogonalized dictionary data for each speaker on the basis of the sampling pattern data, said forming means forming averaged pattern data of a plurality of sampling pattern data obtained from each speaker;
means for forming dictionary data of a first axis by smoothing the averaged pattern data in a time base direction;
means for forming dictionary data of a second axis orthogonal to the first axis by differentiating the averaged pattern data in the time base direction;
an orthogonalized dictionary for storing the dictionary data of the first and second axes as orthogonal dictionary data;
means for forming additional orthogonal dictionary data representing feature variations in speech of each speaker and orthogonal to the orthogonal dictionary data stored in said orthogonalized dictionary in accordance with sampling pattern data of each of a second and subsequent of said plurality of speakers on the basis of the orthogonal dictionary data obtained with respect to a first of said plurality of speakers;
means for selectively storing the additional orthogonal dictionary data in said orthogonalized dictionary;
means for computing a similarity value between the orthogonal dictionary data stored in said orthogonalized dictionary and the sampling pattern data formed by said sampling pattern data generating means; and
means for recognizing input speech on the basis of the similarity value.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
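Claim 1 leaves open both the speech-interval detector and the rule for extracting a predetermined number of samples. The Python below is a minimal sketch only. It assumes a simple frame-energy threshold for interval detection and picks equally spaced frames from the detected interval. `detect_interval`, `sample_pattern` and the threshold value are hypothetical.

```python
def detect_interval(frames, threshold):
    """Hypothetical energy-threshold detector.

    frames[i] holds the channel values of analysis frame i. Returns (start, end)
    spanning the first to the last frame whose energy exceeds threshold
    (end exclusive), or None if no frame does.
    """
    active = [i for i, f in enumerate(frames) if sum(x * x for x in f) > threshold]
    if not active:
        return None
    return active[0], active[-1] + 1


def sample_pattern(frames, start, end, num_samples):
    """Pick num_samples equally spaced frames from [start, end) and return
    the sampling pattern indexed [channel][sample]."""
    if num_samples < 1 or end <= start:
        raise ValueError("need num_samples >= 1 and a non-empty interval")
    if num_samples == 1:
        idx = [start]
    else:
        span = end - 1 - start
        # Integer arithmetic keeps the first and last frames of the interval.
        idx = [start + (k * span) // (num_samples - 1) for k in range(num_samples)]
    channels = len(frames[0])
    return [[frames[i][c] for i in idx] for c in range(channels)]
```

Equal spacing is the simplest way to map utterances of different lengths onto a fixed-size pattern, so that every pattern has the same dimension as the dictionary axes. The patent's actual resampling rule may differ.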
11. A speech recognition apparatus for computing a similarity between an input speech pattern obtained by analyzing input speech and an orthogonalized dictionary formed in advance on the basis of learning patterns acquired from a plurality of speakers, and for recognizing the input speech based on the computed similarity, the apparatus comprising:
means for obtaining an averaged pattern of a plurality of learning patterns obtained from each speaker, and for obtaining a blurred pattern and a differentiated pattern from the averaged pattern; and
means for determining an orthogonal axis on which the orthogonalized dictionary is based from the blurred and differentiated patterns obtained from the learning patterns of a first of said plurality of speakers, determining a new axis orthogonal to the axes of the dictionary already stored from the blurred and differentiated patterns obtained from the learning patterns of a second and subsequent of said plurality of speakers, and determining whether the dictionary data of the new axis are to be stored, thereby forming the orthogonalized dictionary.
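The new-axis step of claim 11 can be read as Gram-Schmidt orthogonalization followed by an acceptance test. The sketch below assumes that the stored axes are orthonormal flattened vectors and that the candidates are the flattened blurred and differentiated patterns of a later speaker's averaged pattern. It also assumes that a candidate is stored only when the relative norm of its residual exceeds a hypothetical threshold `min_residual`, because the claim does not state the actual criterion. `add_speaker_axes` is an illustrative name.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def add_speaker_axes(axes, candidates, min_residual=0.2):
    """Orthogonalize each candidate against the stored axes (modified Gram-Schmidt)
    and append it as a new unit axis only if enough of it is left over.

    axes:         list of orthonormal vectors, extended in place
    candidates:   flattened blurred/differentiated patterns of one later speaker
    min_residual: hypothetical threshold on |residual| / |candidate|
    Returns the number of axes added.
    """
    added = 0
    for v in candidates:
        norm_v = math.sqrt(dot(v, v))
        if norm_v == 0.0:
            continue
        r = list(v)
        for a in axes:  # includes axes added earlier in this call
            p = dot(r, a)  # project the running residual onto each stored axis
            r = [x - p * y for x, y in zip(r, a)]
        norm_r = math.sqrt(dot(r, r))
        # Selective storage: skip candidates the dictionary already represents.
        if norm_r > 0.0 and norm_r / norm_v > min_residual:
            axes.append([x / norm_r for x in r])
            added += 1
    return added
```

A candidate that lies almost entirely in the span of the existing axes adds little discriminating power, so rejecting it keeps the dictionary small. Because the loop runs over the growing `axes` list, the blurred and differentiated candidates of the same speaker also end up orthogonal to each other.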
12. A speech recognition system for a plurality of speakers, comprising:
means for converting input speech from a plurality of speakers into an electrical signal;
means for performing acoustic analysis of the electrical signal;
means for obtaining sampling pattern data from said electrical signal upon which the acoustic analysis has been performed;
means for obtaining first averaged pattern data from a plurality of sampling pattern data of a first of the plurality of speakers, and forming dictionary data of first and second axes from the first averaged pattern data;
orthogonalized dictionary means for storing the dictionary data of the first and second axes;
means for obtaining second averaged pattern data from a plurality of sampling pattern data of at least one of a second and subsequent of said plurality of speakers;
means for obtaining additional dictionary data having an axis different from the first and second axes on the basis of the second averaged pattern data;
means for storing the additional dictionary data in said orthogonalized dictionary means; and
means for recognizing the input speech by using the dictionary data stored in said orthogonalized dictionary means.
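The claims do not define the similarity measure used for recognition. One well-known choice for orthogonalized dictionaries is the multiple similarity: the sum of squared projections of the input onto a category's orthonormal axes, divided by the squared norm of the input. The sketch below assumes that measure and a dictionary that maps each category label to its list of axes. Both assumptions, and the names `similarity` and `recognize`, are for illustration only.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def similarity(x, axes):
    """Assumed multiple-similarity measure: sum of squared projections of x onto
    orthonormal axes, normalized by |x|^2, so the result lies in [0, 1]."""
    nx = dot(x, x)
    if nx == 0.0:
        return 0.0
    return sum(dot(x, a) ** 2 for a in axes) / nx


def recognize(x, dictionary):
    """dictionary maps a category label to its orthonormal axes (flattened patterns).
    Returns the best-scoring label and the scores of all categories."""
    scores = {label: similarity(x, axes) for label, axes in dictionary.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

Because the measure is normalized by the input's squared norm, it does not depend on overall input level. Adding an axis to a category never lowers that category's score, since each axis contributes a non-negative term. This is why the additional axes for the second and subsequent speakers can widen coverage.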
13. A speech recognition method, comprising the steps of:
converting input speech into an electrical signal and obtaining speech pattern data upon acoustic analysis of said electrical signal;
detecting a speech interval of the electrical signal;
generating sampling pattern data by extracting a predetermined number of samples from speech pattern data included in the detected speech interval, said sampling pattern data including learning pattern data;
prestoring sampling pattern data of a plurality of speakers for categories of speech to be recognized;
forming orthogonalized dictionary data for each speaker on the basis of the sampling pattern data by forming averaged pattern data of a plurality of sampling pattern data obtained from each speaker;
forming dictionary data of a first axis by smoothing the averaged pattern data in a time base direction;
forming dictionary data of a second axis orthogonal to the first axis by differentiating the averaged pattern data in the time base direction;
storing the dictionary data of the first and second axes in an orthogonalized dictionary;
forming additional dictionary data representing feature variations in speech of each speaker and being orthogonal to the dictionary data stored in said orthogonalized dictionary in accordance with sampling pattern data of each of a second and subsequent of said plurality of speakers on the basis of the dictionary data obtained with respect to a first of said plurality of speakers;
selectively storing the additional dictionary data in said orthogonalized dictionary;
computing a similarity value between the dictionary data stored in said orthogonalized dictionary and the sampling pattern data formed in said generating step; and
recognizing input speech on the basis of the similarity value.
Dependent claims: 14, 15, 16, 17, 18, 19, 20, 21, 22
23. A speech recognition method wherein a similarity is computed between an input speech pattern obtained by analyzing input speech and an orthogonalized dictionary formed in advance on the basis of learning patterns acquired from a plurality of speakers, and the input speech is recognized based on the computed similarity, the method comprising the steps of:
obtaining an averaged pattern of a plurality of learning patterns obtained from each speaker, and obtaining a blurred pattern and a differentiated pattern from the averaged pattern; and
determining an orthogonal axis on which the orthogonalized dictionary is based from the blurred and differentiated patterns obtained from the learning patterns of a first of said plurality of speakers, determining a new axis orthogonal to the axes of the dictionary already stored from the blurred and differentiated patterns obtained from the learning patterns of a second and subsequent of said plurality of speakers, and determining whether the dictionary data of the new axis are to be stored, thereby forming the orthogonalized dictionary.
24. A speech recognition method for a plurality of speakers, comprising the steps of:
converting input speech from a plurality of speakers into an electrical signal;
performing acoustic analysis of the electrical signal;
obtaining sampling pattern data from the electrical signal upon which the acoustic analysis has been performed;
obtaining first averaged pattern data from a plurality of sampling pattern data of a first of the plurality of speakers, and forming dictionary data of first and second axes from the first averaged pattern data;
storing the dictionary data of the first and second axes in an orthogonalized dictionary;
obtaining second averaged pattern data from a plurality of sampling pattern data of at least a second of said plurality of speakers;
obtaining additional dictionary data having an axis different from the first and second axes on the basis of the second averaged pattern data;
storing the additional dictionary data in said orthogonalized dictionary; and
recognizing the input speech by using the dictionary data stored in said orthogonalized dictionary.
Specification