Automatic speech recognition learning using user corrections

US 8,019,602 B2
Filed: 01/20/2004
Issued: 09/13/2011
Est. Priority Date: 01/20/2004
Status: Expired due to Fees

Abstract

An automatic speech recognition system recognizes user changes to dictated text and infers whether such changes result from the user changing his/her mind, or whether such changes are a result of a recognition error. If a recognition error is detected, the system uses the type of user correction to modify itself to reduce the chance that such recognition error will occur again. Accordingly, the system and methods provide for significant speech recognition learning with little or no additional user interaction.

6 Claims

1. A computer-implemented method of learning with an automatic speech recognition system, the method comprising:
- detecting a change to a word included in a collection of dictated text, the change producing a changed version of the word;
  
  utilizing a computer processor that is a component of a computing device to automatically infer whether the change is a correction or editing;
  
  if the change is inferred to be a correction, selectively learning from the nature of the correction without additional user interaction;
  
  wherein selectively learning from the nature of the correction includes:
  
  making a first determination as to whether a user's pronunciation, during an utterance that gave rise to the dictated text, deviated from existing pronunciations known by the system, the utterance including an utterance of said word as well as a related context word, and wherein making the first determination comprises:
  
  doing a forced alignment of a wave that corresponds to the utterance of said word and the related context word;
  
  analyzing the forced alignment so as to identify a portion of the wave that is the user's pronunciation of said word;
  
  generating a confidence score based at least upon a distance of the user's pronunciation of said word to each of a plurality of possible pronunciations;
  
  wherein the confidence score is calculated using the function 1/[d/f/log(len1+len2)], where d is the distance between the user's pronunciation of said word to one of said possible pronunciations, f is a frequency that the user's pronunciation of said word is pronounced, and len1 and len2 are values representing the length of phonemes;
  
  making a second determination as to whether said word is included in the existing lexicon known by the system; and
  
  if the second determination indicates that said word does exist in the existing lexicon, and if the first determination indicates the user's pronunciation of said word is in said existing pronunciations known by the system, then selectively changing a parameter associated within the system with the user's pronunciation of said word.
- 2. The method of claim 1, wherein generating the confidence score further comprises the confidence score based at least in part upon comparison of an acoustic model score of the user's pronunciation of said word with acoustic model scores of the plurality of possible pronunciations.
- 3. The method of claim 1, further comprising comparing the confidence score to a threshold.
- 4. The method of claim 1, wherein generating the confidence score based at least upon the distance further comprises generating based at least upon the distance as calculated using a phone confusion matrix.
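
The confidence score recited in claim 1, together with the threshold comparison of claim 3 and the confusion-matrix distance of claim 4, can be illustrated with a minimal Python sketch. Several points are assumptions made only for illustration and are not stated in the claims: the nested division in 1/[d/f/log(len1+len2)] is read left to right (which simplifies to f*log(len1+len2)/d), the logarithm is natural, len1 and len2 are taken to be the lengths of the two phoneme sequences being compared, the distance is a weighted edit distance whose substitution costs come from a confusion table, and a score at or above the threshold counts as passing. The function names, phone symbols, and cost values are hypothetical.

```python
import math


def phone_distance(phones_a, phones_b, confusion, default_sub=1.0, indel=1.0):
    """Weighted edit distance between two phoneme sequences.

    `confusion` maps a (phone_a, phone_b) pair to a substitution cost; pairs
    that are missing fall back to `default_sub`. All costs are illustrative.
    """
    rows, cols = len(phones_a) + 1, len(phones_b) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * indel  # delete every phone of phones_a so far
    for j in range(1, cols):
        dist[0][j] = j * indel  # insert every phone of phones_b so far
    for i in range(1, rows):
        for j in range(1, cols):
            a, b = phones_a[i - 1], phones_b[j - 1]
            sub = 0.0 if a == b else confusion.get((a, b), default_sub)
            dist[i][j] = min(
                dist[i - 1][j] + indel,    # deletion
                dist[i][j - 1] + indel,    # insertion
                dist[i - 1][j - 1] + sub,  # match or substitution
            )
    return dist[rows - 1][cols - 1]


def confidence_score(d, f, len1, len2):
    """Evaluate 1/[d/f/log(len1+len2)], reading the divisions left to right."""
    if d <= 0 or f <= 0 or len1 + len2 < 2:
        # d == 0 or log(1) == 0 would divide by zero; the claim does not say
        # how such cases are handled, so this sketch simply rejects them.
        raise ValueError("score is undefined for these inputs")
    return 1.0 / (d / f / math.log(len1 + len2))


def passes_threshold(score, threshold):
    """Assumed direction: a higher score means more confidence."""
    return score >= threshold


# Hypothetical example: one vowel differs between the two pronunciations.
heard = ["t", "ah", "m", "ey", "t", "ow"]
known = ["t", "ah", "m", "aa", "t", "ow"]
d = phone_distance(heard, known, {("ey", "aa"): 0.4})
score = confidence_score(d, f=3, len1=len(heard), len2=len(known))
```

Under this reading, the score grows with the frequency f and with the combined phoneme length, and shrinks as the distance d grows, so a frequently observed pronunciation that is close to a known one scores highest.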

5. A computer-implemented method of learning with an automatic speech recognition system, the method comprising:
- detecting a change to a word included in a collection of dictated text, the change producing a changed version of the word;
  
  utilizing a computer processor that is a component of a computing device to automatically infer whether the change is a correction or editing;
  
  wherein inferring whether the change is a correction or editing includes comparing a speech recognition engine score of the dictated text and of the changed text;
  
  if the change is inferred to be a correction, selectively learning from the nature of the correction without additional user interaction;
  
  wherein selectively learning from the nature of the correction includes:
  
  making a first determination as to whether a user's pronunciation, during an utterance that gave rise to the dictated text, deviated from existing pronunciations known by the system, and wherein making the first determination comprises generating a confidence score based at least upon a distance of the user's pronunciation of said word to each of a plurality of possible pronunciations;
  
  wherein the confidence score is calculated using the function 1/[d/f/log(len1+len2)], where d is the distance between the user's pronunciation of said word to one of said possible pronunciations, f is a frequency that the user's pronunciation of said word is pronounced, and len1 and len2 are values representing the length of phonemes;
  
  making a second determination as to whether said word is included in the existing lexicon known by the system; and
  
  if the second determination indicates that said word does exist in the existing lexicon, and if the first determination indicates the user's pronunciation of said word is in said existing pronunciations known by the system, then selectively changing a parameter associated within the system with the user's pronunciation of said word.
- 6. The method of claim 5, further comprising comparing the confidence score to a threshold.
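
Claim 5 adds that the correction-versus-editing inference includes comparing a speech recognition engine score of the dictated text with that of the changed text, and both independent claims end with the same two-part check on the lexicon and the known pronunciations. The sketch below shows one way this control flow could look. It is only an illustration: the claims do not state how the two scores are compared, so the rule used here (a higher score means a better match to the audio, and a change counts as a correction when the changed text scores no worse than the dictated text minus a margin) is an assumption, as are the function names, the margin, and the returned labels.

```python
def infer_change_kind(dictated_score, changed_score, margin=0.0):
    """Classify a user change as a 'correction' or as 'editing'.

    Assumption: both scores rate how well each text matches the same audio,
    higher is better. If the changed text fits the audio about as well as
    (or better than) what was recognized, the user was probably fixing a
    recognition error rather than changing their mind.
    """
    if changed_score >= dictated_score - margin:
        return "correction"
    return "editing"


def select_learning_action(word_in_lexicon, pronunciation_known):
    """Final branch of claims 1 and 5.

    The claims only spell out the case where the word is already in the
    lexicon and the user's pronunciation is among the known pronunciations:
    a parameter associated with that pronunciation is selectively changed.
    Other combinations are not covered by these claims, so the sketch
    reports them as unspecified instead of guessing.
    """
    if word_in_lexicon and pronunciation_known:
        return "adjust_parameter"
    return "unspecified"


def handle_change(dictated_score, changed_score, word_in_lexicon,
                  pronunciation_known, margin=0.0):
    """Learn only when the change is inferred to be a correction."""
    kind = infer_change_kind(dictated_score, changed_score, margin)
    if kind != "correction":
        return "no_learning"
    return select_learning_action(word_in_lexicon, pronunciation_known)
```

For example, with hypothetical log-likelihood style scores, `handle_change(-120.0, -95.0, True, True)` would take the correction path and reach the parameter adjustment, while a changed text that scores far below the dictated text would be treated as editing and trigger no learning.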

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Yu, Dong; Hwang, Mei-Yuh; Mau, Peter; Acero, Alejandro
Primary Examiner(s): Wozniak, James S
Assistant Examiner(s): Shah, Paras D
Application Number: US10/761,451
Publication Number: US 20050159949A1
Time in Patent Office: 2,793 Days
Field of Search: 704/251, 704/254, 704/243, 704/244, 704/270, 704/256, 704/231, 704/235
US Class Current: 704/231
CPC Class Codes:
G10L 15/063   Training
G10L 15/065   Adaptation
G10L 2015/0631   Creating reference template...
