Mobile phone having speaker dependent voice recognition method and apparatus
Abstract
An apparatus and method for performing improved speech recognition in a communication terminal, e.g., a mobile phone with a hands-free voice dialing function. In a speech recognition mode, a user's input speech such as a desired called party name, number or a phone command, is converted to feature data and compared to individual pre-stored feature data sets corresponding to pre-recorded speech obtained during a registration process. Difference values representing the respective differences between the current user's input speech and the respective data sets are computed. A first closest (most similar) and second closest feature data set correspond to the first smallest and second smallest difference values so obtained. A closeness threshold is computed as the sum of a small, predetermined threshold and a differential value between the first and second difference values. If the first difference value is less than the computed closeness threshold, then the input speech is determined to match the first feature data set, whereby a positive speech recognition result is obtained. When a match occurs, an automatic dialing operation may be carried out in one application.
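The decision rule described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the function name `recognize`, the parameter names, and the choice to compute the differential as the second smallest difference minus the smallest are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def recognize(differences: Sequence[float], base_threshold: float) -> Optional[int]:
    """Return the index of the matching feature data set, or None if rejected.

    `differences[i]` is the difference value between the input speech and the
    i-th registered feature data set (smaller means more similar).
    """
    if len(differences) < 2:
        raise ValueError("at least two registered feature data sets are needed")

    # Rank the registered sets from most similar to least similar.
    order = sorted(range(len(differences)), key=lambda i: differences[i])
    best, runner_up = order[0], order[1]
    first_diff, second_diff = differences[best], differences[runner_up]

    # Assumption: the "differential value" is second_diff - first_diff.
    closeness_threshold = base_threshold + (second_diff - first_diff)

    # A match is declared only when the best difference is under the threshold.
    return best if first_diff < closeness_threshold else None
```

Under these assumptions, an input whose best candidate is well separated from the runner-up gets a looser threshold, while two nearly tied candidates leave only the small predetermined threshold and the input is more likely to be rejected.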
72 Citations
21 Claims
1. An apparatus for performing speech recognition in a communication terminal with a voice dialing function, comprising:
a memory having a first region for registration of feature data with respect to an input voice, a second region for storing a number of trials upon every recognition with respect to the feature data, a third region for storing an accumulative mean value with respect to a series of threshold values obtained from a corresponding number of trials, stored in the second region to and through the preceding number of trials, and a fourth region for storing a specified threshold value;
a vocoder for generating packet data according to an input voice;
a voice recognition means for analyzing the packet data currently provided from the vocoder to thereby generate corresponding feature data, comparing the generated feature data with feature data of reference voices pre-registered in the memory to thereby search any similar data, and if it is searched the similar data, then outputting an index of the searched feature data and a difference value between the generated feature data and the registered feature data; and
a controller for comparing the difference value outputted from the voice recognition means with a predetermined threshold value, so that if the difference value is less than the threshold value, then the feature data corresponding to the index are read out from the memory and delivered to the vocoder, calculating an accumulative mean value of threshold values for every trial of recognition with respect to the feature data to and through the present time, the accumulative mean value being stored in the third region of the memory, and by reflecting the accumulative mean value into the threshold value, updating the threshold value stored in the fourth region of the memory. - View Dependent Claims (2)
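As a rough illustration of the four memory regions recited in claim 1, the following Python sketch models them as fields of one record and keeps the accumulative mean up to date incrementally. The class name `RecognitionMemory`, the field names, and the method `record_trial` are hypothetical. The claim does not say how the mean is "reflected" into the stored threshold, so that step is deliberately left out, and the trial count is kept as a single counter rather than one per feature data entry.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RecognitionMemory:
    # First region: registered feature data, keyed by index (layout assumed).
    features: Dict[int, List[float]] = field(default_factory=dict)
    # Second region: number of recognition trials so far.
    trials: int = 0
    # Third region: accumulative mean of the per-trial threshold values.
    mean_threshold: float = 0.0
    # Fourth region: the threshold currently in force.
    threshold: float = 0.0

    def record_trial(self, trial_threshold: float) -> float:
        """Fold one trial's threshold into the accumulative mean and return it."""
        self.trials += 1
        # Incremental mean: avoids storing every past threshold value.
        self.mean_threshold += (trial_threshold - self.mean_threshold) / self.trials
        return self.mean_threshold
```

Storing only the count and the running mean matches the claim's second and third regions and keeps the memory footprint constant regardless of how many trials have occurred.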
3. A method for performing speech recognition in a communication terminal equipment, comprising the steps of:
(a) entering a speech recognition mode;
(b) upon receipt of a voice input in the speech recognition mode, processing the voice input and transmitting the processed voice input to a speech recognition circuit;
(c) receiving from the voice recognition means previously stored first data, being most similar to the processed voice input, and previously stored second data, being second most similar thereto, together with first and second difference values corresponding to the first and second data respectively, and then calculating a new threshold value based on a differential value between the first and second difference values; and
(d) comparing the new threshold value with the first difference value, and if the first difference value is less than the new threshold value, generating an audible output of speech corresponding to said first data. - View Dependent Claims (4, 5, 6)
7. A speech recognition method in a communication terminal, comprising the steps of:
(a) entering a speech recognition mode;
(b) upon receipt of a voice input in the speech recognition mode, processing the voice input and transmitting the processed voice input to a speech recognition circuit;
(c) receiving from the speech recognition circuit previously stored first data that are most similar to the processed voice input, and previously stored second data that are second most similar thereto, together with first and second difference values corresponding to the first and second data respectively;
(d) comparing a predetermined threshold value with the first difference value, and if the first difference value is less than the predetermined threshold value, then generating an audible output of speech corresponding to the first data; and
(e) calculating an accumulative mean value for preceding threshold values obtained from every last recognition with respect to the voice data, in order to compensate for an error resulting from selection of feature data of the corresponding voice data subsequently to the above step (d), and reflecting the calculated accumulative mean value into the present threshold value to thereby set a new threshold value, thereafter returning to the above step (b) of transmitting. - View Dependent Claims (8, 9, 10, 11)
12. A method for performing speech recognition in a communication terminal, comprising the steps of:
(a) entering a speech recognition mode;
(b) upon receipt of a voice input in the speech recognition mode, processing the voice input in the form of packet data and then transmitting the processed voice input to a speech recognition circuit;
(c) receiving from the speech recognition circuit a first set of data being most similar to a pre-registered voice feature, and a second set of data being second most similar thereto, together with first and second difference values corresponding to the first and second set of data respectively;
(d) comparing a predetermined threshold value with the first difference value, so that if the first difference value is less than the predetermined threshold value, then an audible tone responsive to a corresponding voice data is reproduced in a speaker; and
(e) calculating an accumulative mean value for preceding threshold values obtained in every past recognition with respect to all the recorded voice data, in order to compensate for an error resulting from a diversity of users subsequently to reproduction in the above step (d), and adding the calculated accumulative mean value multiplied by a weighted value to the present threshold value to thereby set up a new threshold value, then returning to the above step (b). - View Dependent Claims (13, 14, 15, 16)
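Step (e) of claim 12 can be read as a simple weighted update, sketched below in Python. The function name `next_threshold` and its parameters are hypothetical, and the claim gives neither the weight nor the way past threshold values are collected, so both are passed in by the caller here.

```python
from typing import Sequence


def next_threshold(present: float, past_thresholds: Sequence[float], weight: float) -> float:
    """Return present + weight * mean(past_thresholds), per one reading of step (e)."""
    if not past_thresholds:
        # Assumption: with no recognition history the threshold is left unchanged.
        return present
    accumulative_mean = sum(past_thresholds) / len(past_thresholds)
    return present + weight * accumulative_mean
```

For example, with a present threshold of 10.0, past thresholds of 4.0 and 6.0, and a weight of 0.5, this reading yields 10.0 + 0.5 × 5.0 = 12.5.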
17. A method for performing speech recognition in a communication terminal, comprising the steps of:
(a) entering a speech recognition mode;
(b) upon receipt of a voice input in the speech recognition mode, processing the voice input in the form of packet data and then transmitting the processed voice input to a speech recognition circuit;
(c) receiving from the speech recognition circuit a first set of data being most similar to a pre-registered voice feature, and a second set of data being secondly similar thereto, together with first and second difference values corresponding to the first and second set of data respectively;
(d) comparing a predetermined threshold value with the first difference value, so that if the first difference value is less than the predetermined threshold value, then an audible tone responsive to a corresponding voice data is reproduced in a speaker; and
(e) determining whether a response from a user is detected, and calculating a rate of recognition error upon absence of said response, and if the calculated rate of recognition error is less than a predetermined reference value, then returning to the step (b) for re-recognition of the voice input. - View Dependent Claims (18, 19, 20, 21)
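The retry decision in step (e) of claim 17 might look like the Python sketch below. The claim does not define how the rate of recognition error is calculated, so the ratio used here (prompts that drew no user response divided by total recognition attempts) is purely an assumption, as are the function and parameter names.

```python
def should_retry(user_responded: bool, unanswered: int, attempts: int,
                 reference_rate: float) -> bool:
    """Decide whether to return to step (b) for re-recognition."""
    if user_responded:
        # A response was detected, so no error-rate check is needed.
        return False
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    # Assumed definition: share of attempts that drew no user response.
    error_rate = unanswered / attempts
    # Retry only while the error rate stays under the reference value.
    return error_rate < reference_rate
```

With a reference value of 0.5, one unanswered prompt out of four attempts (rate 0.25) leads to a retry, while three out of four (rate 0.75) does not.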
Specification