FREE TEXT VOICE TRAINING

US 20110301940A1
Filed: 01/07/2011
Published: 12/08/2011
Est. Priority Date: 01/08/2010
Status: Active Grant

Abstract

A system and method provide acoustic training of a voice or speech recognition engine and/or voice or speech recognition software application. Instead of requiring a user to read from a prepared or predetermined script, the system and method described herein enable acoustic training using any free text spoken phrases provided by the user directly, or by a previously recorded speech, presentation, or the like, performed by the user.

20 Claims

1. A method for acoustically training a speech recognition engine of a speech recognition software application, the method comprising:
- receiving audio data representing a user's voice speaking at least one phrase, the at least one phrase being unknown to the speech recognition engine in both spoken audio and text forms;
  
  the speech recognition engine, using a process performed by a processor, translating the at least one phrase into text form for display to the user; and
  
  receiving a reviewed version of the text form and the speech recognition software application, using a process performed by a processor, converting the reviewed version of the text form into a context free grammar based on text indicated as validated text.
- Dependent claims (2, 3, 4, 5, 6, 7)
  - 2. The method of claim 1, further comprising the speech recognition software application recording each instance of validated text.
  - 3. The method of claim 1, further comprising the speech recognition software application recording each instance of validated text, accumulating instances of validated text up to a first predetermined number of instances of validated text or duration of audio signal, and once the first predetermined number of instances of validated text or duration of audio signal has been achieved, the speech recognition software application performing calibration of the speech recognition engine.
  - 4. The method of claim 3, wherein calibration of the speech recognition engine comprises the speech engine selecting initial properties of an acoustic match to a voice model.
  - 5. The method of claim 1, further comprising the speech recognition software application recording each instance of validated text, accumulating instances of validated text up to a second predetermined number of instances of validated text or duration of audio signal, and once the second predetermined number of instances of validated text or duration of audio signal has been achieved, the speech recognition software application performing refining calibration of the speech recognition engine.
  - 6. The method of claim 1, wherein the audio data comprises a previously recorded audio recording of the user's voice speaking.
  - 7. The method of claim 1, wherein the audio data comprises a real-time data representation of the user's voice speaking.
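
The accumulation and calibration triggers in claims 2 through 5 can be illustrated with a minimal sketch. The class name, field names, and threshold values below are hypothetical; the claims only require a first and a second "predetermined number of instances of validated text or duration of audio signal" and do not say how calibration itself is carried out.

```python
from dataclasses import dataclass, field


@dataclass
class ValidatedTextLog:
    """Accumulates validated text and reports which calibration stage is due.

    All names and default thresholds are assumptions for illustration only.
    """
    first_count: int = 5           # assumed threshold for initial calibration
    second_count: int = 20         # assumed threshold for refining calibration
    first_seconds: float = 30.0    # assumed audio-duration alternative
    second_seconds: float = 120.0  # assumed audio-duration alternative
    instances: list = field(default_factory=list)
    audio_seconds: float = 0.0
    calibrated: bool = False
    refined: bool = False

    def record(self, text: str, duration: float) -> list:
        """Record one validated instance and return the stages it triggers."""
        self.instances.append(text)
        self.audio_seconds += duration
        triggered = []
        if not self.calibrated and self._reached(self.first_count, self.first_seconds):
            self.calibrated = True
            triggered.append("calibrate")
        if not self.refined and self._reached(self.second_count, self.second_seconds):
            self.refined = True
            triggered.append("refine")
        return triggered

    def _reached(self, count: int, seconds: float) -> bool:
        # Either the instance count or the audio duration can satisfy a threshold.
        return len(self.instances) >= count or self.audio_seconds >= seconds
```

Each stage fires at most once, and the two stages are tracked independently because claim 5 depends on claim 1 rather than on claim 3.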

8. A computer-readable storage medium, which is not a signal, with an executable program stored thereon, wherein the executable program instructs a processor to perform a method, the method comprising:
- receive audio data at a speech recognition engine, the audio data representing a user's voice speaking at least one phrase, the at least one phrase being unknown to the speech recognition engine in both spoken audio and text forms;
  
  translate the at least one phrase into text form for display to the user; and
  
  receive a reviewed version of the text form and convert the reviewed version of the text form into a context free grammar based on text indicated as validated text.
- Dependent claims (9, 10, 11, 12, 13)
  - 9. The computer-readable storage medium of claim 8, wherein the method further comprises the speech recognition software application recording each instance of validated text.
  - 10. The computer-readable storage medium of claim 8, wherein the method further comprises the speech recognition software application recording each instance of validated text, accumulating instances of validated text up to a first predetermined number of instances of validated text or duration of audio signal, and once the first predetermined number of instances of validated text or duration of audio signal has been achieved, the speech recognition software application performing calibration of the speech recognition engine.
  - 11. The computer-readable storage medium of claim 10, wherein calibration of the speech recognition engine comprises the speech engine selecting initial properties of an acoustic match to a voice model.
  - 12. The computer-readable storage medium of claim 8, wherein the method further comprises the speech recognition software application recording each instance of validated text, accumulating instances of validated text up to a second predetermined number of instances of validated text or duration of audio signal, and once the second predetermined number of instances of validated text or duration of audio signal has been achieved, the speech recognition software application performing refining calibration of the speech recognition engine.
  - 13. The computer-readable storage medium of claim 8, wherein the audio data comprises a previously recorded audio recording of the user's voice speaking or a real-time data representation of the user's voice speaking.
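
The final step of claim 8, converting reviewed text into a context free grammar based on validated text, can be sketched as follows. The claims do not specify the grammar's shape or the format of the reviewed text, so the `(text, validated)` pair format, the `<phrase>` nonterminal, and both function names are assumptions, and the grammar is deliberately trivial.

```python
def build_grammar(reviewed):
    """Build a toy context-free grammar from reviewed transcript segments.

    `reviewed` is a list of (text, validated) pairs; this shape is an
    assumption. Only validated text contributes rules.
    """
    rules = {"<phrase>": []}
    for text, validated in reviewed:
        words = text.lower().split()
        # Skip unvalidated, empty, or already-seen phrases.
        if validated and words and words not in rules["<phrase>"]:
            rules["<phrase>"].append(words)
    return rules


def to_bnf(rules):
    """Render the rules in a BNF-like notation, one nonterminal per line."""
    lines = []
    for head, alternatives in rules.items():
        body = " | ".join(" ".join(f'"{w}"' for w in alt) for alt in alternatives)
        lines.append(f"{head} ::= {body}")
    return "\n".join(lines)
```

For example, if the user validates "Open the file" and "save" but rejects a misrecognized segment, `to_bnf` yields a single rule with two alternatives, one per validated phrase.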

14. A speech recognition system that can be acoustically trained with free text audio, the system comprising:
- a speech recognition software application operating on a computing device having a processor, the speech recognition software application comprising:
  
  a speech recognition engine;
  
  a comparison module configured to receive an indication of validated text and associate the validated text with at least one word from the free text audio; and
  
  a plurality of voice models;
  
  wherein upon receipt of a plurality of instances in which validated text is associated with the at least one word from the free text audio, the speech recognition software application selects a subset of voice models of the plurality of voice models in such a way that the subset of voice models shares a plurality of characteristics with the free text audio associated with the validated text.
- Dependent claims (15, 16, 17, 18, 19, 20)
  - 15. The system of claim 14, wherein the speech recognition software application is configured to record each instance of validated text.
  - 16. The system of claim 14, wherein the speech recognition software application is configured to record each instance of validated text, accumulating instances of validated text up to a first predetermined number of instances of validated text or duration of audio signal, and further wherein the speech recognition software application is configured to perform calibration of the speech recognition engine once the first predetermined number of instances of validated text or duration of audio signal has been achieved.
  - 17. The system of claim 16, wherein calibration of the speech recognition engine comprises the speech engine selecting initial properties of an acoustic match to a voice model.
  - 18. The system of claim 14, wherein the speech recognition software application is configured to record each instance of validated text, accumulate instances of validated text up to a second predetermined number of instances of validated text or duration of audio signal, and further wherein the speech recognition software application is configured to perform refining calibration of the speech recognition engine once the second predetermined number of instances of validated text or duration of audio signal has been achieved.
  - 19. The system of claim 14, wherein the free text audio comprises a previously recorded audio recording of the user's voice speaking.
  - 20. The system of claim 14, wherein the audio data comprises a real-time data representation of the user's voice speaking.
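
The voice model selection in claim 14 can be illustrated with a small sketch. The claim says only that the selected subset "shares a plurality of characteristics" with the validated free text audio, so the trait labels, the set-overlap rule, and the `min_shared` parameter are assumptions rather than the claimed implementation.

```python
def select_voice_models(models, audio_traits, min_shared=2):
    """Return names of models sharing at least `min_shared` traits with the audio.

    `models` maps a model name to a set of trait labels. The labels and the
    overlap rule are hypothetical, chosen only to illustrate the selection.
    """
    return sorted(
        name
        for name, traits in models.items()
        if len(traits & audio_traits) >= min_shared
    )
```

The default of two shared traits reflects one reading of "a plurality"; a real system would likely compare acoustic features rather than discrete labels.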

Current Assignee
Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee
Nuance Communications, Inc. (Microsoft Corporation)
Inventors
Hon-Anderson, Eric; Stuller, Robert W.

Granted Patent

US 9,218,807 B2
US Class Current

704/9
CPC Class Codes

G10L 15/063   Training

G10L 15/193   Formal grammars, e.g. finit...

G10L 2015/0638   Interactive procedures
