Automated testing of voice recognition software

US 7,562,019 B2
Filed: 01/07/2005
Issued: 07/14/2009
Est. Priority Date: 01/08/2004
Status: Active Grant

8 Assignments

0 Petitions

Abstract

A method and a system for testing a voice enabled application on a target device, the method including conducting one or more interactions with the target device, at least some of the interactions including presenting an acoustic utterance in an acoustic environment to the target device, receiving an output of the target device in response to the acoustic utterance, and comparing the output to an output expected from the acoustic utterance.
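
The interaction loop summarized in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `Interaction`, `normalize`, `run_tests`, and `fake_device` are hypothetical names, the whitespace- and case-insensitive comparison is an assumed rule, and the stand-in device simply maps utterance names to canned outputs so the example runs without audio hardware.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Interaction:
    """One interaction: an utterance to present and the output expected from it."""
    utterance: str  # hypothetical identifier of a recorded utterance
    expected: str   # output the application is expected to produce


def normalize(text: str) -> str:
    """Assumed comparison rule: ignore case and extra whitespace."""
    return " ".join(text.lower().split())


def run_tests(present: Callable[[str], str],
              cases: List[Interaction]) -> List[Tuple[Interaction, str, bool]]:
    """Present each utterance, receive the device output, compare it to the expected output."""
    results = []
    for case in cases:
        output = present(case.utterance)  # present utterance and receive output
        passed = normalize(output) == normalize(case.expected)
        results.append((case, output, passed))
    return results


# Stand-in for a real target device (assumption: no hardware attached).
FAKE_RESPONSES = {
    "call_home.wav": "Calling  Home",
    "open_mail.wav": "Opening calendar",
}


def fake_device(utterance: str) -> str:
    """Return a canned output for a known utterance, or an empty string."""
    return FAKE_RESPONSES.get(utterance, "")


if __name__ == "__main__":
    cases = [Interaction("call_home.wav", "calling home"),
             Interaction("open_mail.wav", "opening mail")]
    for case, output, passed in run_tests(fake_device, cases):
        print(f"{case.utterance}: {'PASS' if passed else 'FAIL'} (got {output!r})")
```

In a real rig, `present` would play the utterance to the device and capture its response; only the comparison step is device-independent.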

26 Citations

45 Claims

1. A method for testing a voice enabled application on a target device, the method comprising conducting one or more interactions with the target device, at least some of the interactions comprising:
- selecting one of a plurality of input modes for sending input to the target device;
  
  presenting an acoustic utterance in an acoustic environment to the target device, including presenting a noise signal to the target device, using the selected input mode;
  
  determining one of a plurality of response modes for responding to an output of the target device;
  
  receiving an output of the target device in response to the acoustic utterance and the noise signal according to the determined response mode; and
  
  comparing the output to an output expected from the acoustic utterance;
  
  wherein the selected input mode and the determined response mode depend on input/output capabilities of the target device;
  
  wherein presenting the acoustic utterance further comprises generating the acoustic utterance using an acoustic speaker;
  
  wherein the speaker comprises an artificial human mouth;
  
  wherein the acoustic environment is produced using an acoustic noise source that generates the noise signal, the noise signal representing one or more environmental noises of a natural environment.
  - 2. The method of claim 1, wherein the acoustic utterance and the expected output are further determined by simulating a dialog by a user of the application with the target device.
  - 3. The method of claim 2, wherein the dialog has a plurality of allowable interactions.
  - 4. The method of claim 1, wherein at least some of the interactions comprise providing a keyboard input to the target device.
  - 5. The method of claim 4, wherein the keyboard input comprises an electrical communications signal to the target device.
  - 6. The method of claim 4, wherein the keyboard input comprises a mechanical-pneumatic force exerted on buttons of the target device.
  - 7. The method of claim 1, wherein receiving the output further comprises receiving an electric communications signal from the target device.
  - 8. The method of claim 1, wherein receiving the output further comprises receiving an image of alphanumeric characters from the target device.
  - 9. The method of claim 8 further comprising processing and recognizing the image of alphanumeric characters from the target device.
  - 10. The method of claim 9 further comprising processing and recognizing the acoustic output from the target device.
  - 11. The method of claim 1, wherein receiving the output further comprises receiving an acoustic output from the target device.
  - 12. The method of claim 1 further comprising recording the acoustic utterance.
  - 13. The method of claim 12, wherein recording the acoustic utterance further comprises recording the acoustic utterance in one of:
    - a car, an airport, an office, a shaking room, and a quiet room.
  - 14. The method of claim 12, wherein recording the acoustic utterance further comprises recording the acoustic utterance of a person using a microphone.
  - 15. The method of claim 14 wherein the microphone comprises one of a group consisting of one of:
    - an in-device microphone, a headset boom microphone, a headset dangling microphone, and a car kit microphone.
  - 16. The method of claim 14 wherein the person utters the acoustic utterance while the microphone is shaking.
  - 17. The method of claim 14 wherein the person utters the acoustic utterance while the person varies his or her position relative to the microphone.
  - 18. The method of claim 1 wherein the target device is a cell phone or a personal digital assistant.
  - 19. The method of claim 1 wherein the target device is a desktop computer.
  - 20. The method of claim 1 further comprising:
    - testing and verifying algorithms for the voice enabled application offline of the target device;
      
      testing the application on a simulator of the target device; and
      
      distributing target devices to real users and observing application behavior and user experience.
  - 21. The method of claim 1, wherein the noise signal comprises an acoustic noise signal that emulates one of multiple different types of noise.
  - 22. The method of claim 1, wherein a first input mode comprises an electrical signal over a connection from a computer, a second input mode comprises an acoustic signal from a speaker, a first response mode comprises receiving the output as an electrical signal over a connection from the device, and a second response mode comprises recognizing the output from a screen or speaker of the device.
  - 23. The method of claim 1, wherein a first input mode comprises a recorded utterance from a first audio file and a noise signal from a second audio file, and a second input mode comprises an acoustic utterance from a first speaker and a noise signal from a second speaker.
  - 24. The method of claim 1, wherein the acoustic environment comprises a quiet room, and the noise signal is derived from an audio file.
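
Claims 1 and 22 tie the choice of input mode and response mode to the input/output capabilities of the target device. One way to picture that selection is sketched below. The two modes in each enum paraphrase claim 22; `DeviceCapabilities`, its field names, and the policy of preferring the electrical path whenever the device supports it are assumptions made for illustration, since the claims do not state a selection policy.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class InputMode(Enum):
    ELECTRICAL = "electrical signal over a connection from a computer"
    ACOUSTIC = "acoustic signal from a speaker"


class ResponseMode(Enum):
    ELECTRICAL = "electrical signal over a connection from the device"
    RECOGNIZED = "output recognized from a screen or speaker of the device"


@dataclass(frozen=True)
class DeviceCapabilities:
    """Hypothetical description of a target device's I/O capabilities."""
    accepts_input_over_connection: bool
    reports_output_over_connection: bool


def select_modes(caps: DeviceCapabilities) -> Tuple[InputMode, ResponseMode]:
    """Assumed policy: use the electrical path when available, else fall back."""
    input_mode = (InputMode.ELECTRICAL if caps.accepts_input_over_connection
                  else InputMode.ACOUSTIC)
    response_mode = (ResponseMode.ELECTRICAL if caps.reports_output_over_connection
                     else ResponseMode.RECOGNIZED)
    return input_mode, response_mode


if __name__ == "__main__":
    # A device with no data connection at all falls back to speaker and screen.
    print(select_modes(DeviceCapabilities(False, False)))
```

Keeping the two choices independent allows mixed setups, such as playing utterances through a speaker while reading results over a data connection.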

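Claims 23 and 24 describe an input mode in which a recorded utterance comes from one audio file and the noise signal from another. A minimal sketch of combining the two is given here, assuming both files have already been decoded into floating-point samples in the range [-1.0, 1.0] at the same sample rate. The function name `mix_with_noise`, the `noise_gain` parameter, and the looping and clipping behavior are illustrative assumptions; the claims do not specify how the signals are combined.

```python
from typing import List, Sequence


def mix_with_noise(utterance: Sequence[float],
                   noise: Sequence[float],
                   noise_gain: float = 0.5) -> List[float]:
    """Add scaled noise to an utterance, looping the noise if it is shorter."""
    if not noise:
        return list(utterance)  # nothing to mix in
    mixed = []
    for i, sample in enumerate(utterance):
        value = sample + noise_gain * noise[i % len(noise)]
        mixed.append(max(-1.0, min(1.0, value)))  # clip to the valid sample range
    return mixed


if __name__ == "__main__":
    speech = [0.0, 0.5, 0.9]
    car_noise = [0.2, -0.2]
    print(mix_with_noise(speech, car_noise, noise_gain=1.0))
```

Varying `noise_gain` across runs is one simple way to test the same utterance under quieter and louder conditions.
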
25. A system for testing a voice enabled application on a target device, the system comprising:
- the target device;
  
  a speaker configured to send sound to the target device;
  
  a noise source configured to send a noise signal to the target device;
  
  wherein the noise source comprises an acoustic noise source configured to generate the noise signal to produce the acoustic environment, the acoustic noise source replicating one or more noises of a real environment;
  
  a computer configured to conduct one or more interactions with the target device, including selecting one of a plurality of input modes for sending input to the target device and determining one of a plurality of response modes for responding to an output of the target device, at least some of the interactions comprising;
  
  sending commands to the target device using the selected input mode and receiving communications from the target device using the determined response mode;
  
  presenting an acoustic utterance in an acoustic environment to the target device;
  
  receiving an output of the target device in response to the acoustic utterance;
  
  comparing the output to an output expected from the acoustic utterance;
  
  wherein the selected input mode and the determined response mode depend on input/output capabilities of the target device;
  
  wherein presenting the acoustic utterance further comprises generating the acoustic utterance using the speaker;
  
  wherein the speaker is an artificial human mouth.
  - 26. The system of claim 25, wherein the acoustic utterance and the expected output are further determined by simulating a dialog with a user of the application on the target device.
  - 27. The system of claim 26, wherein the dialog has a plurality of allowable interactions.
  - 28. The system of claim 25 wherein the target device is a cell phone or a personal digital assistant, wherein the acoustic environment is produced using an acoustic noise source that generates the noise signal, the noise signal representing one or more environmental noises of a natural environment.
  - 29. The system of claim 25 wherein the target device is a desktop computer.
  - 30. The system of claim 25, wherein sending commands comprises sending a keyboard input to the target device.
  - 31. The system of claim 30, wherein the keyboard input is an electrical communications signal to the target device.
  - 32. The system of claim 30, wherein the keyboard input is a mechanical-pneumatic force exerted on buttons of the target device.
  - 33. The system of claim 25, wherein receiving the output further comprises receiving an electric communications signal from the target device.
  - 34. The system of claim 25, wherein receiving the output further comprises receiving an image of alphanumeric characters from the target device.
  - 35. The system of claim 34, wherein at least some of the interactions further comprise processing and recognizing the image of alphanumeric characters from the target device.
  - 36. The system of claim 34, wherein at least some of the interactions further comprise processing and recognizing the acoustic output from the target device.
  - 37. The system of claim 25, wherein receiving the output further comprises receiving an acoustic output from the target device.
  - 38. The system of claim 25, wherein presenting the acoustic utterance further comprises recording the acoustic utterance.
  - 39. The system of claim 38, wherein recording the acoustic utterance further comprises recording the acoustic utterance in one of:
    - a car, an airport, an office, a shaking room, and a quiet room.
  - 40. The system of claim 38, wherein recording the acoustic utterance further comprises recording the acoustic utterance of a person using a microphone.
  - 41. The method of claim 40 wherein the microphone is one of:
    - an in-device microphone, a headset boom microphone, a headset dangling microphone, and a car kit microphone.
  - 42. The system of claim 40 wherein the person utters the acoustic utterance while the microphone is shaking.
  - 43. The system of claim 40 wherein the person utters the acoustic utterance while the person varies his or her position relative to the microphone.
  - 44. The system of claim 25, wherein the noise source comprises an acoustic noise source configured to generate an acoustic noise signal that emulates one of multiple different types of noise.
  - 45. The system of claim 25, wherein a first input mode comprises an electrical signal over a connection from a computer, a second input mode comprises an acoustic signal from a speaker, a first response mode comprises receiving the output as an electrical signal over a connection from the device, and a second response mode comprises recognizing the output from a screen or speaker of the device.
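
Claims 2, 3, 26, and 27 derive the utterances and expected outputs by simulating a dialog that has a plurality of allowable interactions. One possible reading is a small state machine walked at random, sketched below. The `DIALOG` table, its states, utterances, and outputs, and the function `simulate_dialog` are all hypothetical; the claims do not describe how the dialog is modeled.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical dialog model: state -> allowable interactions,
# each given as (utterance, expected output, next state).
DIALOG: Dict[str, List[Tuple[str, str, str]]] = {
    "idle": [("call home", "Calling home", "in_call"),
             ("open messages", "Messages", "messages")],
    "in_call": [("hang up", "Call ended", "idle")],
    "messages": [("go back", "Main menu", "idle"),
                 ("read first message", "Reading message 1", "messages")],
}


def simulate_dialog(steps: int, seed: int = 0,
                    start: str = "idle") -> List[Tuple[str, str]]:
    """Random walk over the dialog model, returning (utterance, expected output) pairs."""
    rng = random.Random(seed)  # seeded so a failing run can be reproduced
    state = start
    script = []
    for _ in range(steps):
        utterance, expected, state = rng.choice(DIALOG[state])
        script.append((utterance, expected))
    return script


if __name__ == "__main__":
    for utterance, expected in simulate_dialog(5, seed=1):
        print(f"say {utterance!r} -> expect {expected!r}")
```

Each generated pair can then be fed to a test loop that presents the utterance and compares the device output with the expected output.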

Current Assignee: Cerence Operating Company (Cerence Inc.)
Original Assignee: Voice Signal Technologies Incorporated (Microsoft Corporation)
Inventors: Barton, William; Ploumis, John; Ely, Douglas J.; Cohen, Jordan
Primary Examiner(s): Hudspeth; David R
Assistant Examiner(s): Albertalli; Brian L
Application Number: US11/031,955
Publication Number: US 20050197836A1
Time in Patent Office: 1,649 Days
Field of Search: None
US Class Current: 704/270
CPC Class Codes: G10L 15/01 Assessment or evaluation of...
