Testing a grammar used in speech recognition for reliability in a plurality of operating environments having different background noise

US 9,396,721 B2
Filed: 11/04/2011
Issued: 07/19/2016
Est. Priority Date: 04/24/2008
Status: Active Grant

8 Assignments

0 Petitions

Abstract

Methods, systems, and products for testing a grammar used in speech recognition for reliability in a plurality of operating environments having different background noise that include: receiving recorded background noise for each of the plurality of operating environments; generating a test speech utterance for recognition by a speech recognition engine using a grammar; mixing the test speech utterance with each recorded background noise, resulting in a plurality of mixed test speech utterances, each mixed test speech utterance having different background noise; performing, for each of the mixed test speech utterances, speech recognition using the grammar and the mixed test speech utterance, resulting in speech recognition results for each of the mixed test speech utterances; and evaluating, for each recorded background noise, speech recognition reliability of the grammar in dependence upon the speech recognition results for the mixed test speech utterance having that recorded background noise.
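
The test procedure in the abstract can be sketched in a few lines. This is a minimal illustration, not the patented implementation: the `recognize` callable stands in for a real speech recognition engine, the sample-wise additive `mix` function is an assumed way to combine audio, and "reliability" is assumed here to be the fraction of test utterances recognized correctly. All names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Samples = List[float]


def mix(utterance: Sequence[float], noise: Sequence[float],
        noise_gain: float = 1.0) -> Samples:
    """Add noise to the utterance sample by sample (assumed mixing model).

    The noise recording is looped if it is shorter than the utterance.
    """
    if not noise:
        return list(utterance)
    return [s + noise_gain * noise[i % len(noise)]
            for i, s in enumerate(utterance)]


def evaluate_grammar(
    grammar: str,
    utterances: Dict[str, Samples],
    noises: Dict[str, Samples],
    recognize: Callable[[str, Samples], str],
) -> Dict[str, float]:
    """Return, for each recorded background noise, a reliability score.

    utterances maps the expected phrase to its test audio.
    noises maps an operating-environment name to its recorded noise.
    recognize(grammar, audio) is a hypothetical engine hook that returns
    the recognized phrase.
    The score is the fraction of mixed utterances recognized correctly,
    which is an assumption made for this sketch.
    """
    if not utterances:
        raise ValueError("at least one test utterance is required")
    reliability: Dict[str, float] = {}
    for environment, noise in noises.items():
        correct = sum(
            1
            for expected, audio in utterances.items()
            if recognize(grammar, mix(audio, noise)) == expected
        )
        reliability[environment] = correct / len(utterances)
    return reliability
```

Calling `evaluate_grammar` once per candidate grammar would yield a per-environment score table that a later selection step could consult.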

136 Citations

24 Claims

1. A system comprising at least one processor configured to:
- analyze digital data representing sounds captured by at least one microphone from an operating environment to compute background noise information associated with the operating environment, wherein:
  - the at least one processor is configured to match the sounds captured from the operating environment to a background noise from a plurality of background noises, and
  - the background noise information comprises an identification of the background noise matching the sounds captured from the operating environment;
- select, based at least in part on the background noise information associated with the operating environment, a voice dialog from a plurality of voice dialogs, wherein:
  - the at least one processor is configured to select, based at least in part on the background noise matching the sounds captured from the operating environment, one or more grammars for use in carrying out the voice dialog with a user; and
- perform automatic speech recognition, using the one or more grammars, on user speech captured from the operating environment.
- Dependent claims (2, 3, 4, 5, 6, 7, 8):
  - 2. The system of claim 1, wherein the at least one processor is further configured to:
    - determine whether a portion of the sounds captured from the operating environment includes any foreground noise; and
    - if it is determined that the portion of the sounds captured from the operating environment does not include any foreground noise, use the portion of the sounds captured from the operating environment to compute the background noise information.
  - 3. The system of claim 1, wherein the at least one processor is configured to match the sounds captured from the operating environment to the background noise at least in part by:
    - for each background noise of the plurality of background noises, using a Hidden Markov Model of the background noise to compute a probability that the background noise matches the sounds captured from the operating environment; and
    - identifying, as the background noise matching the sounds captured from the operating environment, a background noise having a highest probability of matching the sounds captured from the operating environment.
  - 4. The system of claim 1, wherein the at least one processor is configured to select at least one grammar of the one or more grammars from a plurality of grammars.
  - 5. The system of claim 4, wherein the at least one processor is configured to select the at least one grammar from the plurality of grammars at least in part by:
    - determining a measure of reliability for each grammar of the plurality of grammars, the measure of reliability being indicative of how reliable the grammar is when used with the background noise matching the sounds captured from the operating environment; and
    - identifying, as the at least one grammar, a grammar having a highest measure of reliability.
  - 6. The system of claim 1, wherein the at least one processor is further configured to capture the sounds captured from the operating environment.
  - 7. The system of claim 6, wherein the at least one processor is configured to capture the sounds captured from the operating environment when a user is not interacting with the system.
  - 8. The system of claim 6, wherein the at least one processor is configured to capture the sounds captured from the operating environment immediately after a user uses voice to interact with the system.
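
The matching step of claim 3 and the grammar selection step of claim 5 can be illustrated together. The sketch below is hedged: it assumes the captured sounds have already been quantized into discrete symbols, uses a small discrete-observation Hidden Markov Model scored with the scaled forward algorithm, and treats the model log-likelihood as a stand-in for the claimed "probability that the background noise matches". The class `DiscreteHMM`, the functions `match_background_noise` and `select_grammar`, and the reliability table layout are all hypothetical names chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class DiscreteHMM:
    start: List[float]        # start[i] = P(first state is i)
    trans: List[List[float]]  # trans[i][j] = P(next state is j | state i)
    emit: List[List[float]]   # emit[i][k] = P(symbol k | state i)

    def log_likelihood(self, symbols: Sequence[int]) -> float:
        """Return log P(symbols | model) via the scaled forward algorithm."""
        if not symbols:
            return 0.0
        n = len(self.start)
        alpha = [self.start[i] * self.emit[i][symbols[0]] for i in range(n)]
        log_prob = 0.0
        for t in range(len(symbols)):
            if t > 0:
                # Propagate one step, then weight by the emission probability.
                alpha = [
                    sum(alpha[i] * self.trans[i][j] for i in range(n))
                    * self.emit[j][symbols[t]]
                    for j in range(n)
                ]
            scale = sum(alpha)
            if scale == 0.0:
                return float("-inf")  # sequence impossible under this model
            log_prob += math.log(scale)
            alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        return log_prob


def match_background_noise(symbols: Sequence[int],
                           models: Dict[str, DiscreteHMM]) -> str:
    """Identify the background noise whose model scores the sounds highest."""
    return max(models, key=lambda name: models[name].log_likelihood(symbols))


def select_grammar(noise_id: str,
                   reliability: Dict[str, Dict[str, float]]) -> str:
    """Pick the grammar with the highest reliability for the matched noise.

    reliability[grammar][noise_id] is an assumed, precomputed measure;
    a grammar with no entry for the noise is scored 0.0.
    """
    return max(reliability,
               key=lambda grammar: reliability[grammar].get(noise_id, 0.0))
```

Working in log space keeps long observation sequences from underflowing, and because the logarithm is monotonic, the model with the highest log-likelihood is also the one with the highest likelihood.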

9. A method comprising acts of:
- analyzing digital data representing sounds captured by at least one microphone from an operating environment to compute background noise information associated with the operating environment, wherein:
  - the act of analyzing comprises matching the sounds captured from the operating environment to a background noise from a plurality of background noises, and
  - the background noise information comprises an identification of the background noise matching the sounds captured from the operating environment;
- selecting, based at least in part on the background noise information associated with the operating environment, a voice dialog from a plurality of voice dialogs, wherein:
  - the act of selecting the voice dialog comprises selecting, based at least in part on the background noise matching the sounds captured from the operating environment, one or more grammars for use in carrying out the voice dialog with a user; and
- performing automatic speech recognition, using the one or more grammars, on user speech captured from the operating environment.
- Dependent claims (10, 11, 12, 13, 14, 15, 16):
  - 10. The method of claim 9, further comprising:
    - determining whether a portion of the sounds captured from the operating environment includes any foreground noise; and
    - if it is determined that the portion of the sounds captured from the operating environment does not include any foreground noise, using the portion of the sounds captured from the operating environment to compute the background noise information.
  - 11. The method of claim 9, wherein matching the sounds captured from the operating environment to the background noise comprises:
    - for each background noise of the plurality of background noises, using a Hidden Markov Model of the background noise to compute a probability that the background noise matches the sounds captured from the operating environment; and
    - identifying, as the background noise matching the sounds captured from the operating environment, a background noise having a highest probability of matching the sounds captured from the operating environment.
  - 12. The method of claim 9, wherein at least one grammar of the one or more grammars is selected from a plurality of grammars.
  - 13. The method of claim 12, wherein selecting the at least one grammar from the plurality of grammars comprises:
    - determining a measure of reliability for each grammar of the plurality of grammars, the measure of reliability being indicative of how reliable the grammar is when used with the background noise matching the sounds captured from the operating environment; and
    - identifying, as the at least one grammar, a grammar having a highest measure of reliability.
  - 14. The method of claim 9, further comprising an act of capturing the sounds captured from the operating environment.
  - 15. The method of claim 14, wherein the sounds are captured from the operating environment when a user is not interacting with a system in the operating environment.
  - 16. The method of claim 14, wherein the sounds are captured from the operating environment immediately after a user uses voice to interact with a system in the operating environment.

17. At least one non-transitory computer-readable medium encoded with a plurality of instructions that, when executed, perform a method comprising acts of:
- analyzing digital data representing sounds captured by at least one microphone from an operating environment to compute background noise information associated with the operating environment, wherein:
  - the act of analyzing comprises matching the sounds captured from the operating environment to a background noise from a plurality of background noises, and
  - the background noise information comprises an identification of the background noise matching the sounds captured from the operating environment;
- selecting, based at least in part on the background noise information associated with the operating environment, a voice dialog from a plurality of voice dialogs, wherein:
  - the act of selecting the voice dialog comprises selecting, based at least in part on the background noise matching the sounds captured from the operating environment, one or more grammars for use in carrying out the voice dialog with a user; and
- performing automatic speech recognition, using the one or more grammars, on user speech captured from the operating environment.
- Dependent claims (18, 19, 20, 21, 22, 23, 24):
  - 18. The at least one non-transitory computer-readable medium of claim 17, further comprising:
    - determining whether a portion of the sounds captured from the operating environment includes any foreground noise; and
    - if it is determined that the portion of the sounds captured from the operating environment does not include any foreground noise, using the portion of the sounds captured from the operating environment to compute the background noise information.
  - 19. The at least one non-transitory computer-readable medium of claim 17, wherein matching the sounds captured from the operating environment to the background noise comprises:
    - for each background noise of the plurality of background noises, using a Hidden Markov Model of the background noise to compute a probability that the background noise matches the sounds captured from the operating environment; and
    - identifying, as the background noise matching the sounds captured from the operating environment, a background noise having a highest probability of matching the sounds captured from the operating environment.
  - 20. The at least one non-transitory computer-readable medium of claim 17, wherein at least one grammar of the one or more grammars is selected from a plurality of grammars.
  - 21. The at least one non-transitory computer-readable medium of claim 20, wherein selecting the at least one grammar from the plurality of grammars comprises:
    - determining a measure of reliability for each grammar of the plurality of grammars, the measure of reliability being indicative of how reliable the grammar is when used with the background noise matching the sounds captured from the operating environment; and
    - identifying, as the at least one grammar, a grammar having a highest measure of reliability.
  - 22. The at least one non-transitory computer-readable medium of claim 17, wherein the method further comprises an act of capturing the sounds captured from the operating environment.
  - 23. The at least one non-transitory computer-readable medium of claim 22, wherein the sounds are captured from the operating environment when a user is not interacting with a system in the operating environment.
  - 24. The at least one non-transitory computer-readable medium of claim 22, wherein the sounds are captured from the operating environment immediately after a user uses voice to interact with a system in the operating environment.

Current Assignee: Cerence Operating Company (Cerence Inc.)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Agapi, Ciprian; Bodin, William K.; Cross, Charles W. Jr.; Mirt, Michael H.
Primary Examiner(s): Han, Qi
Application Number: US13/289,233
Publication Number: US 20120053934A1
Time in Patent Office: 1,719 Days
Field of Search: 704/233, 704/236, 704/231, 704/243, 704/255
US Class Current: 1/1
CPC Class Codes: G10L 15/01 Assessment or evaluation of...
