Speech recognition system including manner discrimination
Abstract
A speech recognition system is trained to be sensitive not only to the actual spoken text, but also to the manner in which the text is spoken, for example, whether something is said confidently, or hesitatingly. In the preferred embodiment, this is achieved by using a Hidden Markov Model (HMM) as the recognition engine, and training the HMM to recognise different styles of input. This approach finds particular application in the telephony voice processing environment, where short caller responses need to be recognised, and the system can then react in a fashion appropriate to the tone or manner in which the caller has spoken.
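The recognition step described in the abstract can be illustrated with a minimal sketch. It assumes one small discrete HMM per (text, manner) pair and scores a quantised observation sequence with the forward algorithm, choosing the best-scoring model. The symbol alphabet (0 = "y-like" frame, 1 = "n-like" frame, 2 = pause frame), the two-state topology, and all probabilities are hypothetical values chosen for illustration; they are not taken from the patent.

```python
import math

def safe_log(p):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-values."""
    peak = max(values)
    if peak == -math.inf:
        return -math.inf
    return peak + math.log(sum(math.exp(v - peak) for v in values))

def forward_log_likelihood(hmm, observations):
    """Return log P(observations | hmm) using the forward algorithm."""
    start, trans, emit = hmm["start"], hmm["trans"], hmm["emit"]
    n = len(start)
    alpha = [safe_log(start[s]) + safe_log(emit[s][observations[0]])
             for s in range(n)]
    for obs in observations[1:]:
        alpha = [
            log_sum_exp([alpha[p] + safe_log(trans[p][s]) for p in range(n)])
            + safe_log(emit[s][obs])
            for s in range(n)
        ]
    return log_sum_exp(alpha)

def recognise(models, observations):
    """Pick the (text, manner) label whose model best explains the input."""
    return max(models,
               key=lambda label: forward_log_likelihood(models[label], observations))

# Hypothetical toy models: left-to-right, two states, three symbols.
LEFT_TO_RIGHT = [[0.5, 0.5], [0.0, 1.0]]
MODELS = {
    ("yes", "confident"): {"start": [1.0, 0.0], "trans": LEFT_TO_RIGHT,
                           "emit": [[0.9, 0.05, 0.05], [0.9, 0.05, 0.05]]},
    ("yes", "hesitant"):  {"start": [1.0, 0.0], "trans": LEFT_TO_RIGHT,
                           "emit": [[0.1, 0.05, 0.85], [0.6, 0.05, 0.35]]},
    ("no", "confident"):  {"start": [1.0, 0.0], "trans": LEFT_TO_RIGHT,
                           "emit": [[0.05, 0.9, 0.05], [0.05, 0.9, 0.05]]},
}
```

With these toy values, a sequence with no pause frames such as `[0, 0, 0, 0]` is best explained by the ("yes", "confident") model, while a pause-heavy sequence such as `[2, 0, 2, 2, 0]` is best explained by ("yes", "hesitant"). A real system would use continuous acoustic features and trained parameters.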
24 Claims
1. A method of performing speech recognition comprising the steps of:
receiving acoustic spoken input;
processing said acoustic input by performing speech recognition to determine (i) a text equivalent; and
(ii) a manner in which said spoken input was rendered; and
performing a further operation, dependent on the manner in which said spoken input was rendered.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11.

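The final step of claim 1, a further operation that depends on the manner of the spoken input, might look like the following sketch. The action names ("proceed", "confirm", "cancel", "reprompt") and the lookup-table design are assumptions made for illustration; the claim does not specify which operations are performed.

```python
# Hypothetical mapping from (text, manner) to a follow-up action.
ACTIONS = {
    ("yes", "confident"): "proceed",
    ("yes", "hesitant"): "confirm",   # e.g. ask the caller to confirm
    ("no", "confident"): "cancel",
    ("no", "hesitant"): "confirm",
}

def further_operation(text, manner):
    """Choose the next step from the recognised text and manner.

    Unknown combinations fall back to re-prompting the caller.
    """
    return ACTIONS.get((text, manner), "reprompt")
```

The same recognised text ("yes") leads to different behaviour depending on whether it was spoken confidently or hesitantly, which is the distinction the claim draws.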
12. A speech recognition system comprising:
means for receiving an acoustic spoken input;
means for processing said acoustic input by performing speech recognition to determine (i) a text equivalent; and
(ii) a manner in which said spoken input was rendered; and
means for performing a further operation, dependent on the manner in which said spoken input was rendered.
Dependent claims: 13, 14, 15, 16, 17.

18. A voice processing system, comprising:
a speech recognition system, comprising:
means for receiving an acoustic spoken input;
means for processing said acoustic input by performing speech recognition to determine (i) a text equivalent; and
(ii) a manner in which said spoken input was rendered; and
means for performing a further operation, dependent on the manner in which said spoken input was rendered;
wherein said voice processing system is connected to a telephone network, and said spoken input is received over the telephone network.
Dependent claims: 19.

20. A method of training a speech recognition system including a Hidden Markov Model (HMM) comprising the steps of:
collecting samples of acoustic spoken input data of a particular text;
marking for each sample the manner in which the text was spoken; and
training the HMM to discriminate acoustic spoken input data according to the manner in which it is spoken.

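The three training steps of claim 20 (collect samples of one text, mark each with its manner, train to discriminate) can be sketched as below. As a deliberate simplification, each manner is modelled by a one-state discrete HMM, whose maximum-likelihood emission probabilities are just smoothed symbol frequencies; a practical system would more likely train multi-state models with Baum-Welch re-estimation. The symbol alphabet (0 = speech frame, 1 = pause frame) and the sample data are hypothetical.

```python
import math
from collections import defaultdict

def train_manner_models(samples, num_symbols):
    """Estimate one emission distribution per manner label.

    samples: list of (observations, manner) pairs for a single text.
    Counts start at 1 (add-one smoothing) so no symbol has zero probability.
    """
    counts = defaultdict(lambda: [1] * num_symbols)
    for observations, manner in samples:
        for symbol in observations:
            counts[manner][symbol] += 1
    return {manner: [c / sum(symbol_counts) for c in symbol_counts]
            for manner, symbol_counts in counts.items()}

def classify_manner(models, observations):
    """Return the manner whose model gives the highest log-likelihood."""
    def log_likelihood(emission):
        return sum(math.log(emission[s]) for s in observations)
    return max(models, key=lambda manner: log_likelihood(models[manner]))

# Hypothetical marked samples of the text "yes".
SAMPLES = [
    ([0, 0, 0, 0], "confident"),
    ([0, 0, 0, 1], "confident"),
    ([1, 0, 1, 1], "hesitant"),
    ([1, 1, 0, 1], "hesitant"),
]
MANNER_MODELS = train_manner_models(SAMPLES, num_symbols=2)
```

On this toy data the confident model assigns probability 0.8 to speech frames and the hesitant model assigns 0.7 to pause frames, so new input is separated by how pause-heavy it is.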
21. A method of performing speech recognition comprising the steps of:
receiving acoustic spoken input;
processing said acoustic input by performing speech recognition, in accordance with at least a portion of the acoustic spoken input and two or more acoustic models, to determine:
(i) a text equivalent; and
(ii) an emotional manner in which said spoken input was rendered, wherein the acoustic characteristic of each model is representative of substantially the same text equivalent; and
performing a further operation, dependent on the manner in which said spoken input was rendered.

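Claim 21 relies on two or more acoustic models that represent substantially the same text but differ in manner. One way to turn per-model scores into a text, a manner, and a rough confidence in that manner is sketched below. It assumes each model has already produced a log-likelihood for the input and that all manners are equally likely a priori; both the scores and the equal-prior assumption are illustrative and do not come from the patent.

```python
import math

def decode(scores):
    """Derive text, manner, and a posterior over manners from model scores.

    scores: dict mapping (text, manner) -> log-likelihood of the input.
    The posterior is normalised over the models sharing the winning text,
    assuming equal prior probability for each manner.
    """
    text, manner = max(scores, key=scores.get)
    same_text = {m: s for (t, m), s in scores.items() if t == text}
    peak = max(same_text.values())
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in same_text.values()))
    posterior = {m: math.exp(s - log_norm) for m, s in same_text.items()}
    return text, manner, posterior

# Hypothetical log-likelihoods from three acoustic models.
EXAMPLE_SCORES = {
    ("yes", "confident"): -10.0,
    ("yes", "hesitant"): -12.0,
    ("no", "confident"): -30.0,
}
```

A posterior close to an even split between manners could be treated as "manner uncertain", leaving the further operation to fall back on a neutral response.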
22. A speech recognition system comprising:
means for receiving acoustic spoken input;
means for processing said acoustic input by performing speech recognition, in accordance with at least a portion of the acoustic spoken input and two or more acoustic models, to determine:
(i) a text equivalent; and
(ii) an emotional manner in which said spoken input was rendered, wherein the acoustic characteristic of each model is representative of substantially the same text equivalent; and
means for performing a further operation, dependent on the manner in which said spoken input was rendered.

23. A voice processing system, comprising:
a speech recognition system, comprising:
means for receiving an acoustic spoken input;
means for processing said acoustic input by performing speech recognition, in accordance with at least a portion of the acoustic spoken input and two or more acoustic models, to determine:
(i) a text equivalent; and
(ii) an emotional manner in which said spoken input was rendered, wherein the acoustic characteristic of each model is representative of substantially the same text equivalent; and
means for performing a further operation, dependent on the manner in which said spoken input was rendered;
wherein said voice processing system is connected to a telephone network, and said spoken input is received over the telephone network.

24. A method of training a speech recognition system including a Hidden Markov Model (HMM) comprising the steps of:
collecting samples of acoustic spoken input data of a particular text;
marking for each sample the emotional manner in which the text was spoken; and
training the HMM to discriminate acoustic spoken input data according to the manner in which it is spoken such that the speech recognition system is capable of outputting a text equivalent of the acoustic spoken input data.

Specification