Speech recognition system including manner discrimination

US 6,671,668 B2
Filed: 12/20/2002
Issued: 12/30/2003
Est. Priority Date: 03/19/1999
Status: Expired due to Term

Abstract

A speech recognition system is trained to be sensitive not only to the actual spoken text, but also to the manner in which the text is spoken, for example, whether something is said confidently, or hesitatingly. In the preferred embodiment, this is achieved by using a Hidden Markov Model (HMM) as the recognition engine, and training the HMM to recognise different styles of input. This approach finds particular application in the telephony voice processing environment, where short caller responses need to be recognised, and the system can then react in a fashion appropriate to the tone or manner in which the caller has spoken.
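
One way to picture the approach in the abstract is to keep a separate HMM for each pairing of text and manner, score the input against every model, and report both labels of the best-scoring model. The sketch below is an illustration under stated assumptions, not the patented implementation: the `MannerModel` class, the two frame symbols, and every probability value are hypothetical, and the input is assumed to be a non-empty sequence of already quantised frames.

```python
import math
from dataclasses import dataclass


@dataclass
class MannerModel:
    """A discrete HMM for one (text, manner) pair. All values are illustrative."""
    text: str
    manner: str
    start: list  # start[s]: probability of starting in state s
    trans: list  # trans[p][s]: probability of moving from state p to state s
    emit: list   # emit[s][o]: probability that state s emits symbol o

    def log_likelihood(self, obs):
        """Scaled forward algorithm: returns log P(obs | model)."""
        n = len(self.start)
        alpha = [self.start[s] * self.emit[s][obs[0]] for s in range(n)]
        log_prob = 0.0
        for t in range(len(obs)):
            if t > 0:
                # Propagate the forward variables one frame ahead.
                alpha = [
                    sum(alpha[p] * self.trans[p][s] for p in range(n))
                    * self.emit[s][obs[t]]
                    for s in range(n)
                ]
            scale = sum(alpha)
            if scale == 0.0:
                return float("-inf")
            # Rescale to avoid underflow and accumulate the log of the scale.
            log_prob += math.log(scale)
            alpha = [a / scale for a in alpha]
        return log_prob


def recognise(models, obs):
    """Return (text, manner) of the model that best explains the frames."""
    best = max(models, key=lambda m: m.log_likelihood(obs))
    return best.text, best.manner


# Hypothetical frame symbols: 0 = speech frame, 1 = pause or filler frame.
MODELS = [
    MannerModel("yes", "confident", [1.0, 0.0],
                [[0.8, 0.2], [0.0, 1.0]], [[0.9, 0.1], [0.95, 0.05]]),
    MannerModel("yes", "hesitant", [1.0, 0.0],
                [[0.8, 0.2], [0.0, 1.0]], [[0.3, 0.7], [0.8, 0.2]]),
]

print(recognise(MODELS, [0, 0, 0, 0, 0]))  # ('yes', 'confident')
print(recognise(MODELS, [1, 1, 0, 0, 0]))  # ('yes', 'hesitant')
```

Both models share the same text equivalent, so the choice between them carries only the manner; a fuller vocabulary would add further models for other words in the same way.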

24 Claims

1. A method of performing speech recognition comprising the steps of:
- receiving acoustic spoken input;
- processing said acoustic input by performing speech recognition to determine (i) a text equivalent; and (ii) a manner in which said spoken input was rendered; and
- performing a further operation, dependent on the manner in which said spoken input was rendered.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11):
  - 2. The method of claim 1, wherein there is a predetermined set of available manners, and said processing step determines which manner from said predetermined set of available manners best corresponds to the manner in which said spoken input was rendered.
  - 3. The method of claim 2, wherein said processing step is performed using a Hidden Markov Model (HMM) which has been trained on said predetermined set of available manners.
  - 4. The method of claim 3 wherein said spoken input is received over a telephone connection.
  - 5. The method of claim 4, wherein said spoken input is received as part of a voice processing operation, and said step of performing a further operation, dependent on the manner in which said spoken input was rendered, comprises moving to a different part of a voice processing menu hierarchy, dependent on the manner in which said spoken input was rendered.
  - 6. The method of claim 5, wherein said spoken input comprises a single word.
  - 7. The method of claim 6, wherein said processing step further comprises determining a confidence level associated with the recognition of the text equivalent.
  - 8. The method of claim 1 wherein said spoken input is received over a telephone connection.
  - 9. The method of claim 1, wherein said spoken input comprises a single word.
  - 10. The method of claim 9, wherein said processing step further comprises determining a confidence level associated with the recognition of the text equivalent.
  - 11. The method of claim 1, wherein said processing step further comprises determining a confidence level associated with the recognition of the text equivalent.
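
As an illustration of the "further operation" in claims 5 and 7, the sketch below routes a caller to a different node of a menu hierarchy depending on the recognised text and manner, and re-prompts when the confidence level is low. The node names, the routing table, the `RecognitionResult` type, and the 0.6 threshold are all hypothetical; the claims do not specify any of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecognitionResult:
    text: str          # recognised text equivalent, e.g. "yes"
    manner: str        # one of a predetermined set of manners
    confidence: float  # confidence in the text equivalent, 0.0 to 1.0


# Hypothetical menu hierarchy: (current node, text, manner) -> next node.
ROUTES = {
    ("confirm_order", "yes", "confident"): "place_order",
    ("confirm_order", "yes", "hesitant"): "review_order",
    ("confirm_order", "no", "confident"): "main_menu",
    ("confirm_order", "no", "hesitant"): "offer_help",
}


def next_menu(current, result, min_confidence=0.6):
    """Pick the next menu node; stay on the current node to re-prompt."""
    if result.confidence < min_confidence:
        return current
    return ROUTES.get((current, result.text, result.manner), current)


print(next_menu("confirm_order", RecognitionResult("yes", "confident", 0.9)))
print(next_menu("confirm_order", RecognitionResult("yes", "hesitant", 0.9)))
print(next_menu("confirm_order", RecognitionResult("yes", "hesitant", 0.3)))
```

The same word, "yes", leads to different nodes here purely because of the manner, which is the behaviour the dependent claims describe; unknown combinations fall back to the current node.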

12. A speech recognition system comprising:
- means for receiving an acoustic spoken input;
- means for processing said acoustic input by performing speech recognition to determine (i) a text equivalent; and (ii) a manner in which said spoken input was rendered; and
- means for performing a further operation, dependent on the manner in which said spoken input was rendered.
- Dependent claims (13, 14, 15, 16, 17):
  - 13. The system of claim 12, wherein there is a predetermined set of available manners, and it is determined which manner from said predetermined set of available manners best corresponds to the manner in which said spoken input was rendered.
  - 14. The system of claim 13, wherein said processing means includes a Hidden Markov Model (HMM) which has been trained on said predetermined set of available manners.
  - 15. The system of claim 14, wherein said spoken input comprises a single word.
  - 16. The system of claim 15, wherein said processing means further determines a confidence level associated with the recognition of the text equivalent.
  - 17. The system of claim 12, wherein said processing means further determines a confidence level associated with the recognition of the text equivalent.

18. A voice processing system, comprising:
- a speech recognition system, comprising:
  - means for receiving an acoustic spoken input;
  - means for processing said acoustic input by performing speech recognition to determine (i) a text equivalent; and (ii) a manner in which said spoken input was rendered; and
  - means for performing a further operation, dependent on the manner in which said spoken input was rendered;
- wherein said voice processing system is connected to a telephone network, and said spoken input is received over the telephone network.
- Dependent claims (19):
  - 19. The voice processing system of claim 18, wherein said performing means comprises a voice processing application running on the voice processing system which moves to a different part of a voice processing menu hierarchy, dependent on the manner in which said spoken input was rendered.

20. A method of training a speech recognition system including a Hidden Markov Model (HMM) comprising the steps of:
- collecting samples of acoustic spoken input data of a particular text;
- marking for each sample the manner in which the text was spoken; and
- training the HMM to discriminate acoustic spoken input data according to the manner in which it is spoken.
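
The three training steps can be sketched as a small pipeline: gather samples of one text, attach a manner label to each, and fit one model per label. The code below is a deliberately simplified stand-in, not the claimed HMM training: it fits a single-state symbol-frequency model per manner with add-one smoothing, where a real system would more likely use Baum-Welch re-estimation of full HMMs. The function names, the two frame symbols, and the sample data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_manner_models(samples, num_symbols):
    """Fit one smoothed symbol distribution per manner label.

    samples: iterable of (manner_label, frame_symbols) for the same text.
    """
    counts = defaultdict(Counter)
    for manner, frames in samples:
        counts[manner].update(frames)
    models = {}
    for manner, c in counts.items():
        total = sum(c.values()) + num_symbols  # add-one smoothing
        models[manner] = [(c[s] + 1) / total for s in range(num_symbols)]
    return models


def classify_manner(models, frames):
    """Return the manner whose model gives the frames the highest log score."""
    def score(probs):
        return sum(math.log(probs[s]) for s in frames)
    return max(models, key=lambda manner: score(models[manner]))


# Hypothetical labelled samples of the word "yes":
# symbol 0 = speech frame, symbol 1 = pause or filler frame.
SAMPLES = [
    ("confident", [0, 0, 0, 0]),
    ("confident", [0, 0, 1, 0]),
    ("hesitant", [1, 1, 0, 0]),
    ("hesitant", [1, 0, 1, 1]),
]

models = train_manner_models(SAMPLES, num_symbols=2)
print(classify_manner(models, [0, 0, 0]))  # confident
print(classify_manner(models, [1, 1, 0]))  # hesitant
```

Because every sample shares the same text, the only thing the fitted models can learn to separate is the manner, which mirrors the intent of the marking step.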

21. A method of performing speech recognition comprising the steps of:
- receiving acoustic spoken input;
- processing said acoustic input by performing speech recognition, in accordance with at least a portion of the acoustic spoken input and two or more acoustic models, to determine: (i) a text equivalent; and (ii) an emotional manner in which said spoken input was rendered, wherein the acoustic characteristic of each model is representative of substantially the same text equivalent; and
- performing a further operation, dependent on the manner in which said spoken input was rendered.

22. A speech recognition system comprising:
- means for receiving acoustic spoken input;
- means for processing said acoustic input by performing speech recognition, in accordance with at least a portion of the acoustic spoken input and two or more acoustic models, to determine: (i) a text equivalent; and (ii) an emotional manner in which said spoken input was rendered, wherein the acoustic characteristic of each model is representative of substantially the same text equivalent; and
- means for performing a further operation, dependent on the manner in which said spoken input was rendered.

23. A voice processing system, comprising:
- a speech recognition system, comprising:
  - means for receiving an acoustic spoken input;
  - means for processing said acoustic input by performing speech recognition, in accordance with at least a portion of the acoustic spoken input and two or more acoustic models, to determine: (i) a text equivalent; and (ii) an emotional manner in which said spoken input was rendered, wherein the acoustic characteristic of each model is representative of substantially the same text equivalent; and
  - means for performing a further operation, dependent on the manner in which said spoken input was rendered;
- wherein said voice processing system is connected to a telephone network, and said spoken input is received over the telephone network.

24. A method of training a speech recognition system including a Hidden Markov Model (HMM) comprising the steps of:
- collecting samples of acoustic spoken input data of a particular text;
- marking for each sample the emotional manner in which the text was spoken; and
- training the HMM to discriminate acoustic spoken input data according to the manner in which it is spoken such that the speech recognition system is capable of outputting a text equivalent of the acoustic spoken input data.

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: International Business Machines Corporation
Inventors: Harris, Robert
Primary Examiner(s): Dorvil, Richemond
Assistant Examiner(s): Storm, Donald L.
Application Number: US10/326,748
Publication Number: US 20030088409A1
Time in Patent Office: 375 Days
Field of Search: 704/235, 704/251, 704/256, 704/270, 704/275, 704/246, 379/88.01
US Class Current: 704/246
CPC Class Codes: G10L 17/26 Recognition of special voic...
