Tone based speech recognition
Abstract
A method and apparatus for speech recognition involves classifying (38) a digitized speech segment according to whether the speech segment comprises voiced or unvoiced speech and utilizing that classification to generate tonal feature vectors (41) of the speech segment when the speech is voiced. The tonal feature vectors are then combined (42) with other non-tonal feature vectors (40) to provide speech feature vectors. The speech feature vectors are compared (35) with previously stored models of speech feature vectors (37) for different segments of speech to determine which previously stored model is a most likely match for the segment to be recognized.
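The first step in the abstract, deciding whether a segment is voiced or unvoiced, can be illustrated with a minimal sketch. The text above does not say how the classification (38) is performed, so the short-time energy and zero-crossing-rate heuristic below is an assumption chosen for illustration; the function names, thresholds, and the 8 kHz sampling rate are likewise hypothetical.

```python
import math

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)

def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)

def is_voiced(frame, energy_floor=0.01, zcr_ceiling=0.25):
    """Heuristic: voiced speech is energetic and crosses zero rarely.

    Both thresholds are illustrative assumptions, not values from the patent.
    """
    return (short_time_energy(frame) > energy_floor
            and zero_crossing_rate(frame) < zcr_ceiling)

# Synthetic frames at an assumed 8 kHz sampling rate (30 ms each).
SAMPLE_RATE = 8000
# A 150 Hz sine stands in for a voiced, periodic segment.
voiced_frame = [0.5 * math.sin(2 * math.pi * 150 * n / SAMPLE_RATE)
                for n in range(240)]
# Low-level samples that alternate in sign stand in for noise-like,
# unvoiced speech.
unvoiced_frame = [0.05 if n % 2 == 0 else -0.05 for n in range(240)]

voiced_flag = is_voiced(voiced_frame)
unvoiced_flag = is_voiced(unvoiced_frame)
```

In a pipeline of the kind the abstract describes, a flag like this would decide whether tonal feature values are generated for the segment at all.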
17 Claims
1. A system for speech recognition comprising:
an input terminal for receiving a segment of speech;
a speech classifier having an input coupled to the input terminal and an output to provide an indication of whether the speech segment comprises voiced or unvoiced speech;
a speech feature detector having a first input coupled to the input terminal, a second input coupled to the output of the speech classifier, and an output to provide a speech feature vector having a plurality of feature values indicating features of the speech segment, the speech feature vector including at least a tonal feature value indicating a tonal feature of the speech segment when the speech segment comprises voiced speech; and
a speech recogniser having an input coupled to the output of the speech feature detector and an output to provide an indication of which of a predetermined plurality of speech models is a good match to the speech segment.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
a pitch extractor having an input coupled to the first input of the tonal speech transformation circuit and an output; and
a tone generator having a first input coupled to the output of the pitch extractor and an output coupled to the output of the tonal speech transformation circuit to provide the transformed tonal signal indicative of the tone of the speech segment.
9. A system for speech recognition according to claim 8, wherein the tone generator has a second input coupled to the second input of the tonal speech transformation circuit.
10. A method of speech recognition comprising the steps of:
receiving a segment of speech;
classifying the speech segment according to whether the speech segment comprises voiced or unvoiced speech;
detecting a plurality of speech features of the segment of speech;
generating a speech feature vector having a plurality of feature values indicating the detected plurality of features of the speech segment, wherein the speech feature vector includes at least a tonal feature value indicating a tonal feature of the speech segment when the speech segment comprises voiced speech; and
utilising the speech vector to determine which of a predetermined plurality of speech models is a good match to the speech segment.
Dependent claims: 11, 12, 13, 14, 15, 16, 17
training the predetermined plurality of speech models using the speech feature vector; and
storing the predetermined plurality of speech models after the predetermined plurality of speech models have been trained.
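As a rough illustration of training, storing, and later matching speech models, the sketch below represents each model as the mean of its training feature vectors and picks the nearest one by Euclidean distance. This is a deliberately simple stand-in: the claims do not say what form the speech models take, and the labels, vector layout, and JSON storage format are all assumptions made for this example.

```python
import json
import math

def train_models(labelled_vectors):
    """Build one mean-vector model per label from (label, vector) pairs."""
    sums, counts = {}, {}
    for label, vec in labelled_vectors:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc]
            for label, acc in sums.items()}

def store_models(models):
    """Serialise the trained models (JSON is an assumed storage format)."""
    return json.dumps(models, sort_keys=True)

def load_models(text):
    """Restore models previously produced by store_models."""
    return json.loads(text)

def best_match(models, vec):
    """Return the label of the model closest to the feature vector."""
    return min(models, key=lambda label: math.dist(models[label], vec))

# Hypothetical training data: two non-tonal values followed by one
# tonal value per vector.
training = [
    ("word_a", [1.0, 0.0, 0.9]),
    ("word_a", [1.0, 0.2, 1.1]),
    ("word_b", [1.0, 0.1, -1.0]),
]
stored = store_models(train_models(training))
recognised = best_match(load_models(stored), [1.0, 0.1, 0.8])
```

In this toy data the two labels differ only in the final, tonal value, which is the situation where including a tonal feature in the vector matters most.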
14. A method of speech recognition according to claim 10, wherein the step of detecting a plurality of speech features comprises the steps of:
generating at least one non-tonal feature value for the speech segment;
generating at least one tonal feature value for the speech segment when the speech classifier determines that the speech segment comprises voiced speech; and
combining the at least one non-tonal feature value and the at least one tonal feature value to provide the speech feature vector.
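The combining step of claim 14 can be sketched as a simple concatenation. The claim requires a tonal feature value only when the segment is voiced; how the vector is formed for unvoiced segments is not stated here, so filling the tonal slots with 0.0 to keep the vector length fixed is an assumption, as are the function and parameter names.

```python
def combine_features(non_tonal, tonal, voiced, tonal_dim=1):
    """Concatenate non-tonal and tonal values into one feature vector.

    For unvoiced segments the tonal slots are filled with 0.0 so that
    every vector has the same length (an assumption for this sketch).
    """
    if voiced:
        if len(tonal) != tonal_dim:
            raise ValueError("expected %d tonal value(s)" % tonal_dim)
        tonal_part = list(tonal)
    else:
        tonal_part = [0.0] * tonal_dim
    return list(non_tonal) + tonal_part

voiced_vector = combine_features([1.0, 2.0], [0.7], voiced=True)
unvoiced_vector = combine_features([1.0, 2.0], [], voiced=False)
```

Keeping the length fixed lets the same stored models be compared against voiced and unvoiced segments alike.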
15. A method of speech recognition according to claim 14, wherein the step of detecting at least one non-tonal feature value comprises the steps of:
transforming the speech segment using at least a first transformation to provide a transformed non-tonal signal; and
generating the at least one non-tonal feature value from the transformed non-tonal signal.
16. A method of speech recognition according to claim 14, wherein the step of detecting at least one tonal feature value comprises the steps of:
transforming the speech segment using at least a second transformation to provide a transformed tonal signal; and
generating the at least one tonal feature value from the transformed tonal signal.
17. A method of speech recognition according to claim 16, wherein the step of transforming the speech segment comprises the steps of:
extracting pitch information from the speech segment; and
generating the transformed tonal signal from the extracted pitch information.
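The two steps of claim 17 can be sketched as follows. The claim does not name a pitch extraction method or define the transformed tonal signal, so both choices here are assumptions: pitch is estimated from the peak of the frame's autocorrelation, and the tonal signal is the pitch track expressed in semitones relative to its own mean. All names and parameter values are hypothetical.

```python
import math

def estimate_pitch_hz(frame, sample_rate, min_hz=60.0, max_hz=400.0):
    """Estimate pitch from the autocorrelation peak within [min_hz, max_hz].

    Returns None when no lag in the search range correlates positively.
    The autocorrelation is not normalised, which slightly favours short lags.
    """
    min_lag = int(sample_rate / max_hz)
    max_lag = min(int(sample_rate / min_hz), len(frame) - 1)
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        score = sum(frame[n] * frame[n + lag]
                    for n in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag is not None else None

def tonal_signal(pitch_track_hz):
    """Express a pitch track in semitones relative to its mean pitch.

    Frames without a pitch estimate (None) map to 0.0, an assumption
    made so the output has one value per input frame.
    """
    voiced = [p for p in pitch_track_hz if p is not None]
    if not voiced:
        return [0.0] * len(pitch_track_hz)
    reference = sum(voiced) / len(voiced)
    return [12.0 * math.log2(p / reference) if p is not None else 0.0
            for p in pitch_track_hz]

# A synthetic 200 Hz tone sampled at an assumed 8 kHz.
SAMPLE_RATE = 8000
frame = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(400)]
pitch = estimate_pitch_hz(frame, SAMPLE_RATE)
contour = tonal_signal([100.0, None, 200.0])
```

Expressing pitch relative to a reference rather than in absolute hertz makes the contour less dependent on the speaker's overall pitch range, which is one plausible reason to transform the pitch before deriving tonal feature values.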
Specification