Adaptation of a speech recognition system across multiple remote sessions with a speaker

US 6,766,295 B1
Filed: 05/10/1999
Issued: 07/20/2004
Est. Priority Date: 05/10/1999
Status: Expired due to Term

Abstract

A technique for adaptation of a speech recognizing system across multiple remote communication sessions with a speaker. The speaker can be a telephone caller. An acoustic model is utilized for recognizing the speaker's speech. Upon initiation of a first remote session with the speaker, the acoustic model is speaker-independent. During the first session, the speaker is uniquely identified and speech samples are obtained from the speaker. In the preferred embodiment, the samples are obtained without requiring the speaker to engage in a training session. The acoustic model is then modified based upon the samples thereby forming a modified model. The model can be modified during the session or after the session is terminated. Upon termination of the session, the modified model is then stored in association with an identification of the speaker. During a subsequent remote session, the speaker is identified and, then, the modified acoustic model is utilized to recognize the speaker's speech. Additional speech samples are obtained during the subsequent session and, then, utilized to further modify the acoustic model. In this manner, an acoustic model utilized for recognizing the speech of a particular speaker is cumulatively modified according to speech samples obtained during multiple sessions with the speaker. As a result, the accuracy of the speech recognizing system improves for the speaker even when the speaker only engages in relatively short remote sessions.
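
The session flow in the abstract can be illustrated with a minimal sketch. It is not taken from the patent: the abstract does not name an adaptation algorithm, so the sketch assumes the "acoustic model" is a single mean feature vector and that adaptation is a simple MAP-style interpolation between the speaker-independent mean and the speaker's accumulated samples. The names `SpeakerStats`, `AdaptiveRecognizer`, `run_session`, and the prior weight `tau` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SpeakerStats:
    """Cumulative statistics for one speaker (hypothetical representation)."""
    count: int = 0
    total: list = field(default_factory=list)

    def add(self, frames):
        # Accumulate per-dimension sums over all feature frames seen so far.
        for frame in frames:
            if not self.total:
                self.total = [0.0] * len(frame)
            self.total = [t + x for t, x in zip(self.total, frame)]
            self.count += 1


class AdaptiveRecognizer:
    def __init__(self, si_mean, tau=2.0):
        self.si_mean = list(si_mean)  # speaker-independent starting model
        self.tau = tau                # assumed prior weight, not from the patent
        self.store = {}               # speaker identification -> SpeakerStats

    def model_for(self, speaker_id):
        # Unknown speakers get the speaker-independent model.
        stats = self.store.get(speaker_id)
        if stats is None or stats.count == 0:
            return list(self.si_mean)
        # MAP-style mean: pull the prior mean toward the speaker's samples.
        return [
            (self.tau * m + t) / (self.tau + stats.count)
            for m, t in zip(self.si_mean, stats.total)
        ]

    def run_session(self, speaker_id, frames):
        # The model in effect at the start of the session is used for recognition.
        model = self.model_for(speaker_id)
        # Samples from this session are folded into the stored statistics,
        # so the next session with the same speaker starts from a better model.
        stats = self.store.setdefault(speaker_id, SpeakerStats())
        stats.add(frames)
        return model
```

In this sketch the first call to `run_session` for a caller returns the speaker-independent mean, and each later call returns a mean that reflects every sample collected in earlier sessions, which mirrors the cumulative behavior the abstract describes for short, repeated calls.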

272 Citations

55 Claims

1. A method of adapting a speech recognition system, wherein the method comprises steps of:
- a. obtaining an identification of a speaker;
  
  b. obtaining a sample of a speaker's speech during a first remote session;
  
  c. recognizing the speaker's speech utilizing the speech recognition system during the first remote session;
  
  d. modifying the speech recognition system by incorporating the sample into the speech recognition system thereby forming a speaker-specific modified speech recognition system;
  
  e. storing a representation of the speaker-specific modified speech recognition system in association with the identification of the speaker; and
  
  f. using the representation of the speaker-specific modified speech recognition system to recognize speech during a subsequent remote session with the speaker.
- Dependent claims (2-15):
  - 2. The method according to claim 1 further comprising a step of cumulatively modifying the speech recognition system according to speech samples obtained during one or more remote sessions with the speaker.
  - 3. The method according to claim 1 wherein the speaker is a telephone caller.
  - 4. The method according to claim 1 wherein the step of modifying the speech recognition system comprises a step of modifying an acoustic model thereby forming a speaker-specific modified acoustic model and wherein the step of storing a representation of the speaker-specific modified speech recognition system comprises a step of storing a representation of the modified acoustic model.
  - 5. The method according to claim 4 wherein the representation of the speaker-specific modified acoustic model is a set of statistics which can be utilized to modify a pre-existing acoustic model.
  - 6. The method according to claim 4 wherein the representation of the speaker-specific modified acoustic model is a set of statistics which can be utilized to modify incoming acoustic speech.
  - 7. The method according to claim 1 further comprising a step of utilizing the speaker-specific modified speech recognition system during the first remote session with the speaker.
  - 8. The method according to claim 1 wherein the speech recognition system is speaker-independent prior to the first remote session.
  - 9. The method according to claim 1 wherein the step of modifying the speech recognition system is performed during the first remote session.
  - 10. The method according to claim 1 wherein the step of modifying the speech recognition system is performed after termination of the first remote session.
  - 11. The method according to claim 1 further comprising a step of authenticating the speaker's identification by the speaker's speech.
  - 12. The method according to claim 2 wherein the speech recognition system is speaker-independent prior to the first remote session.
  - 13. The method according to claim 2 wherein the step of modifying the speech recognition system is performed during the first remote session.
  - 14. The method according to claim 2 wherein the step of modifying the speech recognition system is performed after termination of the first remote session.
  - 15. The method according to claim 2 further comprising a step of authenticating the speaker's identification by the speaker's speech.
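
Claims 5 and 6 distinguish two uses of the stored statistics: modifying a pre-existing acoustic model, or modifying the incoming acoustic speech. The sketch below is one illustrative reading, not the patent's method. It assumes the stored statistic is a per-dimension bias between the speaker's average features and a model mean; the helper names `estimate_bias`, `adapt_model`, `normalize_frame`, and `sq_dist` are hypothetical.

```python
def estimate_bias(frames, model_mean):
    """Stored statistic (assumed): speaker's mean feature minus the model mean."""
    n = len(frames)
    dim = len(model_mean)
    speaker_mean = [sum(f[i] for f in frames) / n for i in range(dim)]
    return [s - m for s, m in zip(speaker_mean, model_mean)]


def adapt_model(model_mean, bias):
    """Claim 5 reading: shift the pre-existing model toward the speaker."""
    return [m + b for m, b in zip(model_mean, bias)]


def normalize_frame(frame, bias):
    """Claim 6 reading: shift incoming speech features toward the model."""
    return [x - b for x, b in zip(frame, bias)]


def sq_dist(a, b):
    """Squared Euclidean distance, standing in for an acoustic match score."""
    return sum((x - y) ** 2 for x, y in zip(a, b))
```

For a pure bias of this kind the two options give the same match score: comparing a raw frame against the shifted model is equivalent to comparing the normalized frame against the original model. Only the small bias vector needs to be stored per speaker in either case.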

16. A method of adapting a speech recognition system, wherein the method comprises steps of:
- a. obtaining an identification of a cluster of speakers;
  
  b. obtaining a sample of a speaker's speech during a first remote session;
  
  c. recognizing the speaker's speech utilizing the speech recognition system during the first remote session;
  
  d. modifying the speech recognition system by incorporating the sample into the speech recognition system thereby forming a cluster-specific modified speech recognition system;
  
  e. storing a representation of the cluster-specific modified speech recognition system in association with the identification of a cluster of speakers wherein the speaker is a member of the cluster; and
  
  f. using the representation of the cluster-specific modified speech recognition system to recognize speech during a subsequent remote session with a member of the cluster of speakers.
- Dependent claims (17-29):
  - 17. The method according to claim 16 further comprising a step of cumulatively modifying the speech recognizing system according to speech samples obtained during one or more remote sessions with one or more members of the cluster of speakers.
  - 18. The method according to claim 16 wherein the speaker is a telephone caller.
  - 19. The method according to claim 16 wherein the step of modifying the speech recognition system comprises a step of modifying an acoustic model thereby forming a cluster-specific modified acoustic model and wherein the step of storing a representation of the cluster-specific modified speech recognition system comprises a step of storing a representation of the cluster-specific modified acoustic model.
  - 20. The method according to claim 19 wherein the representation of the cluster-specific modified acoustic model is a set of statistics which can be utilized to modify a pre-existing acoustic model.
  - 21. The method according to claim 19 wherein the representation of the cluster-specific modified acoustic model is a set of statistics which can be utilized to modify incoming acoustic speech.
  - 22. The method according to claim 16 further comprising a step of utilizing the cluster-specific modified speech recognition system during the first remote session with the speaker.
  - 23. The method according to claim 16 wherein the speech recognition system is speaker-independent prior to the first remote session.
  - 24. The method according to claim 16 wherein the step of modifying the speech recognition system is performed during the first remote session.
  - 25. The method according to claim 16 wherein the step of modifying the speech recognition system is performed after termination of the first remote session.
  - 26. The method according to claim 17 wherein the speech recognition system is speaker-independent prior to the first remote session.
  - 27. The method according to claim 17 wherein the step of modifying the speech recognition system is performed during the first remote session.
  - 28. The method according to claim 17 wherein the step of modifying the speech recognition system is performed after termination of the first remote session.
  - 29. The method according to claim 17 further comprising a step of authenticating the speaker's identification by the speaker's speech.
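
The cluster variant in claims 16 to 29 keys the stored representation by a cluster of speakers rather than by an individual. A minimal sketch of that keying follows, under the assumptions that cluster membership is already known and that the stored statistic is a scalar running mean. The class `ClusterStore` and the `cluster_of` mapping are hypothetical and do not appear in the claims.

```python
from collections import defaultdict


class ClusterStore:
    def __init__(self, cluster_of):
        self.cluster_of = cluster_of      # hypothetical map: speaker id -> cluster id
        self.sums = defaultdict(float)    # cluster id -> sum of sample values
        self.counts = defaultdict(int)    # cluster id -> number of samples

    def add_samples(self, speaker_id, values):
        # Samples from any member are pooled under the member's cluster.
        cluster = self.cluster_of[speaker_id]
        self.sums[cluster] += sum(values)
        self.counts[cluster] += len(values)

    def cluster_mean(self, speaker_id):
        # Any member of the cluster sees the pooled statistic.
        cluster = self.cluster_of[speaker_id]
        n = self.counts[cluster]
        return self.sums[cluster] / n if n else None
```

Because the statistics are pooled, a cluster member who has never called before can still be recognized with a model adapted by earlier sessions with other members of the same cluster.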

30. A method of adapting a speech recognition system, wherein the method comprises steps of:
- a. obtaining an identification of each of a plurality of speakers during a corresponding first remote session with each speaker;
  
  b. obtaining a sample of speech made by each of the plurality of speakers during a corresponding first remote session with each speaker;
  
  c. recognizing speech made by each speaker during the corresponding first remote session utilizing the speech recognition system configured to be speaker-independent;
  
  d. modifying the speech recognition system by individually incorporating the sample from each speaker into the speech recognition system thereby forming a speaker-specific modified speech recognition system corresponding to each speaker;
  
  e. storing a representation of the speaker-specific modified speech recognition system corresponding to each speaker in association with the identification of the corresponding speaker; and
  
  f. using the representation of the speaker-specific modified speech recognition system corresponding to a speaker to recognize speech during a subsequent remote session with the speaker.
- Dependent claims (31-44):
  - 31. The method according to claim 30 further comprising a step of cumulatively modifying the speech recognition system for each speaker according to speech samples obtained during one or more remote sessions with the corresponding speaker.
  - 32. The method according to claim 30 wherein each of the plurality of speakers is a telephone caller.
  - 33. The method according to claim 30 wherein the step of modifying the speech recognition system comprises a step of modifying an acoustic model thereby forming a speaker-specific modified acoustic model corresponding to each speaker and wherein the step of storing a representation of the modified speech recognition system comprises a step of storing a representation of the modified acoustic model corresponding to each speaker.
  - 34. The method according to claim 33 wherein the representation of the speaker-specific modified acoustic model corresponding to each speaker is a set of statistics which can be utilized to modify a pre-existing acoustic model.
  - 35. The method according to claim 33 wherein the representation of the speaker-specific modified acoustic model corresponding to each speaker is a set of statistics which can be utilized to modify incoming acoustic speech.
  - 36. The method according to claim 30 further comprising a step of utilizing the speaker-specific modified speech recognition system corresponding to each speaker during the first remote session with the corresponding speaker.
  - 37. The method according to claim 30 wherein the step of modifying the speech recognition system for each speaker is performed during the first remote session with the corresponding speaker.
  - 38. The method according to claim 30 wherein the step of modifying the speech recognition system for each speaker is performed after termination of the first remote session with the corresponding speaker.
  - 39. The method according to claim 30 further comprising a step of authenticating each speaker's identification by the speaker's speech.
  - 40. The method according to claim 31 wherein the step of modifying the speech recognition system for each speaker is performed during the first remote session with the corresponding speaker.
  - 41. The method according to claim 31 wherein the step of modifying the speech recognition system for each speaker is performed after termination of the first remote session with the corresponding speaker.
  - 42. The method according to claim 31 further comprising a step of authenticating each speaker's identification by the speaker's speech.
  - 43. The method according to claim 30 further comprising a step of deleting the representation of the speaker-specific modified speech recognition system corresponding to a speaker.
  - 44. The method according to claim 43 wherein the step of deleting the representation of the speaker-specific modified speech recognition system corresponding to a speaker is performed when a predetermined period of time has elapsed since the corresponding speaker last engaged in a remote session.
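
Claims 43 and 44 describe deleting a stored representation once a predetermined period has passed since the speaker's last session. One way this could look is sketched below, assuming timestamps in seconds and two plain dictionaries keyed by speaker identification. The function `purge_stale` and its parameters are hypothetical.

```python
def purge_stale(last_seen, models, now, max_idle_seconds):
    """Delete stored models for speakers idle longer than max_idle_seconds.

    last_seen: speaker id -> timestamp of the speaker's most recent session
    models:    speaker id -> stored speaker-specific representation
    Returns the list of speaker ids whose representations were deleted.
    """
    stale = [sid for sid, t in last_seen.items() if now - t > max_idle_seconds]
    for sid in stale:
        models.pop(sid, None)  # tolerate a missing model entry
        del last_seen[sid]
    return stale
```

A purged speaker who calls again would simply start over from the speaker-independent system, as in a first remote session.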

45. A speech recognition system comprising:
- a. an interface coupled to receive a remote session from a speaker; and
  
  b. a processing system coupled to the interface to obtain an identification of the speaker and to recognize the speaker's speech wherein the processing system is cumulatively modified by incorporating speech samples obtained during a plurality of remote sessions with the speaker into the speech recognition system, thereby forming a speaker-specific modified processing system associated with the identification of the speaker.
- Dependent claims (46-50):
  - 46. The speech recognition system according to claim 45 wherein the speaker is a telephone caller.
  - 47. The speech recognition system according to claim 45 wherein the processing system is modified by modifying an acoustic model, thereby forming a speaker-specific acoustic model.
  - 48. The speech recognition system according to claim 47 wherein the processing system includes a memory for storing the speaker-specific acoustic model in association with the identification of the telephone caller.
  - 49. The speech recognition system according to claim 48 wherein the memory stores a plurality of speaker-specific acoustic models, one for each of a plurality of telephone callers and wherein each speaker-specific acoustic model is stored in association with the identification of the corresponding telephone caller.
  - 50. The speech recognition system according to claim 49 wherein the selected ones of the plurality of speaker-specific acoustic models are deleted when a predetermined period of time has elapsed since the corresponding speaker last engaged in a remote session with the voice recognizer.

51. A method of adapting an acoustic model utilized for speech recognition, wherein the method comprises steps of:
- a. obtaining an identification of a speaker;
  
  b. obtaining a speech utterance from the speaker during a remote session;
  
  c. recognizing the speaker's speech utilizing an acoustic model during the remote session;
  
  d. making a determination relative to the speech utterance; and
  
  e. only when indicated by the determination, performing steps of;
  
  i. modifying the acoustic model by incorporating the speech utterance into the acoustic model thereby forming a speaker-specific modified acoustic model; and
  
  ii. storing a representation of the speaker-specific modified acoustic model in association with the identification of the speaker.
- Dependent claims (52-55):
  - 52. The method according to claim 51 wherein the step of making the determination assigns a confidence level to the speech utterance.
  - 53. The method according to claim 51 wherein the step of making the determination assigns a confidence level to each of a plurality of portions of the speech utterance.
  - 54. The method according to claim 51 wherein the step of making a determination determines a level of resources available for storing the representation of the speaker-specific modified acoustic model.
  - 55. The method according to claim 51 wherein the step of making a determination determines a level of processing resources available for performing the step of modifying the acoustic model.
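
The gating step in claim 51, with the determinations named in claims 52 to 54, can be sketched as simple predicates. The thresholds and the names `select_for_adaptation` and `should_adapt` are assumptions for illustration; the claims do not specify how a confidence level is computed or what level is sufficient.

```python
def select_for_adaptation(portions, min_confidence=0.8):
    """Keep only the utterance portions whose confidence meets the threshold.

    portions: list of (portion, confidence) pairs, as in the per-portion
    confidence of claim 53. The 0.8 default is an arbitrary assumption.
    """
    return [portion for portion, confidence in portions if confidence >= min_confidence]


def should_adapt(confidence, free_storage_bytes, model_size_bytes, min_confidence=0.8):
    """Adapt and store only when the utterance is trusted and storage allows.

    Combines an utterance-level confidence check (claim 52) with a storage
    resource check (claim 54). A processing-load check (claim 55) could be
    added in the same way.
    """
    return confidence >= min_confidence and free_storage_bytes >= model_size_bytes
```

Gating on confidence keeps misrecognized speech from being folded into the speaker-specific model, since adapting on wrongly labeled samples would tend to reinforce the error in later sessions.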

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Kannan, Ashvin; Murveit, Hy
Primary Examiner(s): Dorvil, Richemond
Assistant Examiner(s): Azad, Abul K.
Application Number: US09/309,211
Time in Patent Office: 1,898 Days
Field of Search: 704/243, 704/244, 704/245, 704/246, 379/88.01, 379/88.02
US Class Current: 704/243
CPC Class Codes: G10L 15/30 (Distributed recognition, e....); G10L 2015/0638 (Interactive procedures)
