Automatic reading tutoring with parallel polarized language modeling

US 8,433,576 B2
Filed: 01/19/2007
Issued: 04/30/2013
Est. Priority Date: 01/19/2007
Status: Active Grant

Abstract

A novel system for automatic reading tutoring provides effective error detection and reduced false alarms combined with low processing time burdens and response times short enough to maintain a natural, engaging flow of interaction. According to one illustrative embodiment, an automatic reading tutoring method includes displaying a text output and receiving an acoustic input. The acoustic input is modeled with a domain-specific target language model specific to the text output, and with a general-domain garbage language model, both of which may be efficiently constructed as context-free grammars. The domain-specific target language model may be built dynamically or “on-the-fly” based on the currently displayed text (e.g. the story to be read by the user), while the general-domain garbage language model is shared among all different text outputs. User-perceptible tutoring feedback is provided based on the target language model and the garbage language model.
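
The "on-the-fly" construction described in the abstract can be illustrated with a minimal sketch. It assumes a bigram model with add-one smoothing built from whitespace-tokenized text; the function name `build_target_bigram_lm`, the smoothing choice, and the contents of `GARBAGE_WORDS` are illustrative assumptions and are not taken from the patent.

```python
import math
from collections import Counter

def build_target_bigram_lm(text):
    """Return a log-probability function for bigrams seen in the displayed text.

    Add-one smoothing is an illustrative assumption, not the patent's method.
    """
    words = ["<s>"] + text.lower().split() + ["</s>"]
    vocab = set(words)
    context_counts = Counter(words[:-1])          # how often each word is a context
    bigram_counts = Counter(zip(words, words[1:]))

    def log_prob(prev, word):
        numerator = bigram_counts[(prev, word)] + 1
        denominator = context_counts[prev] + len(vocab)
        return math.log(numerator / denominator)

    return log_prob

# Hypothetical garbage vocabulary, shared across every displayed text.
GARBAGE_WORDS = frozenset({"the", "a", "and", "um", "uh"})

# The target model is rebuilt whenever a new text is displayed.
story = "the cat sat on the mat"
target_lm = build_target_bigram_lm(story)
```

In this sketch only the target model depends on the story; the garbage vocabulary is defined once and reused, which mirrors the split the abstract describes.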

20 Claims

1. A computer-implemented method comprising:
- providing a user option to tune a weighting parameter, and responding to tuning of the weighting parameter by raising or lowering criteria for a general-domain garbage language model to adjust a miscue detection rate and a rate of false alarms;
  
  displaying a text output having target words;
  
  dynamically generating a domain-specific target language model for the text output, the target language model being specific to the text output having the target words and including a language score for the target words of the text output;
  
  receiving an acoustic input;
  
  modeling, using a processor of a computer, the acoustic input with the dynamically generated domain-specific target language model, comprising calculating an acoustic score for the target words with reference to the acoustic input;
  
  further modeling the acoustic input with the general-domain garbage language model to identify an element of the acoustic input as a miscue that does not correspond properly to the target words of the text output; and
  
  providing user-perceptible feedback.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11):
  - 2. The method of claim 1, wherein the target language model is constructed with N-grams of a user-selectable order.
  - 3. The method of claim 2, wherein the N-grams are selected from one or more of unigrams, bigrams, and trigrams.
  - 4. The method of claim 1, wherein the user-perceptible feedback comprises an audible correction to the miscue.
  - 5. The method of claim 1, wherein the user-perceptible feedback comprises displaying a translation into a different language of a portion of the text output for which the acoustic input includes the miscue.
  - 6. The method of claim 1, wherein the miscue is identified with one or more miscue categories and the user-perceptible feedback is based in part on one of the miscue categories with which the miscue in the acoustic input is identified, wherein the miscue categories comprise one or more of:
    - word repetition, breath, partial word, pause, hesitation or elongation, wrong word, mispronunciation, background noise, interjection or insertion, non-speech sound, and hyperarticulation.
  - 7. The method of claim 1, further comprising applying a weighting parameter to the criteria for the garbage language model, and applying one minus the weighting parameter to the criteria for the target language model.
  - 8. The method of claim 1, wherein the garbage language model is assembled based on a restricted selection of relatively common words from a dictation grammar.
  - 9. The method of claim 1, wherein the target language model and the garbage language model are constructed as context-free grammars.
  - 10. The method of claim 1, wherein the garbage language model comprises an N-gram filler that is constructed as a context-free grammar.
  - 11. The method of claim 1, wherein providing user-perceptible feedback comprises at least one of:
    - displaying a phonetic representation of a portion of the text output corresponding to the miscue, the phonetic representation being displayed in addition to the portion of the text output corresponding to the miscue; and
      
      displaying a score representing how much of the acoustic input is free of miscues.
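
As a rough illustration of the feedback described in claims 6 and 11, the sketch below reports each miscue by category and computes a score for how much of the reading was free of miscues. The `WordResult` type, the per-word scoring (fraction of target words without a miscue), and the output format are assumptions made for this example; the claims do not specify how the score is computed.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class WordResult:
    target: str                    # word from the displayed text
    miscue: Optional[str] = None   # miscue category, or None if read correctly

def miscue_free_score(results: List[WordResult]) -> float:
    """Fraction of target words with no detected miscue (assumed definition)."""
    if not results:
        return 1.0
    return sum(r.miscue is None for r in results) / len(results)

def feedback(results: List[WordResult]) -> List[str]:
    """One line per miscue, followed by the overall score."""
    lines = [f"{r.target}: {r.miscue}" for r in results if r.miscue is not None]
    lines.append(f"score: {miscue_free_score(results):.0%}")
    return lines

results = [
    WordResult("the"),
    WordResult("cat", "mispronunciation"),
    WordResult("sat"),
    WordResult("down", "pause"),
]
```

The category strings here are drawn from the list in claim 6; a real system could map each category to a different kind of feedback, such as an audible correction or a phonetic display.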

12. A computer-implemented method comprising:
- receiving a user input indicative of a user-selected order of N-grams;
  
  accessing a text sample;
  
  identifying a first portion of the text sample, the first portion comprising at least a first phrase of the text sample;
  
  displaying the first portion of the text sample to the user and dynamically assembling a domain-specific target language model for the first portion of the text sample while the first portion of the text sample is being displayed to the user, wherein assembling the domain-specific target language model comprises:
  
  selecting an order of N-grams based on the user input; and
  
  constructing the domain-specific target language model based on the selected order of N-grams;
  
  receiving an acoustic input from the user;
  
  modeling the acoustic input, using a processor of a computer, with the domain-specific target language model and a general-domain garbage language model to identify elements of the acoustic input as miscues that do not correspond properly to the first portion of the text output; and
  
  providing user-perceptible feedback based on modeling the acoustic input with the target language model and modeling the acoustic input with the garbage language model.
- Dependent claims (13, 14, 15, 16, 17):
  - 13. The method of claim 12, further comprising:
    - iteratively displaying additional portions of the text sample to the user, and dynamically assembling additional domain-specific target language models respectively based on each of the additional portions of the text sample while they are being displayed.
  - 14. The method of claim 13, and further comprising:
    - for each of the iteratively displayed additional portions of the text sample,
      
      receiving an additional acoustic input from the user for the respective additional portion of the text sample,
      
      modeling the additional acoustic input with the additional domain-specific target language model dynamically assembled based on the respective additional portion of the text sample, and
      
      providing user-perceptible feedback for the additional acoustic input based on the additional domain-specific target language model and the garbage language model.
  - 15. The method of claim 13, wherein providing user-perceptible feedback comprises at least one of:
    - displaying a phonetic representation of a portion of the text output corresponding to a miscue, the phonetic representation being displayed separate from the portion of the text output corresponding to the miscue; and
      
      displaying a score representing how much of the acoustic input is free of miscues.
  - 16. The method of claim 13, wherein the user-perceptible feedback comprises displaying a translation into a different language of a portion of the text output for which the acoustic input includes the miscue.
  - 17. The method of claim 13, wherein a miscue is identified with one or more miscue categories and the user-perceptible feedback is based in part on one of the miscue categories with which the miscue in the acoustic input is identified, wherein the miscue categories comprise one or more of:
    - word repetition, breath, partial word, pause, hesitation or elongation, wrong word, mispronunciation, background noise, interjection or insertion, non-speech sound, and hyperarticulation.
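
The per-portion assembly in claims 12 and 13 can be sketched as a generator that yields one set of N-gram counts for each displayed portion, using an order chosen by the user. Splitting the text sample on periods, the `<s>`/`</s>` padding symbols, and the names `ngram_counts` and `portion_models` are assumptions for illustration only; restricting the order to 1 through 3 follows the unigram, bigram, and trigram options named in claim 3.

```python
from collections import Counter

def ngram_counts(words, order):
    """Count N-grams of the given order, padding with sentence markers."""
    padded = ["<s>"] * (order - 1) + words + ["</s>"]
    return Counter(
        tuple(padded[i:i + order]) for i in range(len(padded) - order + 1)
    )

def portion_models(text_sample, order):
    """Yield (portion, counts) pairs, one model per displayed portion."""
    if order not in (1, 2, 3):
        raise ValueError("order must be 1 (unigram), 2 (bigram), or 3 (trigram)")
    for portion in text_sample.split("."):   # assumed portion boundary
        words = portion.lower().split()
        if words:
            yield portion.strip(), ngram_counts(words, order)

# Each model is built only when its portion comes up for display.
models = list(portion_models("The cat sat. The dog ran.", order=2))
```

Because `portion_models` is a generator, a caller that displays one portion at a time only pays for the model of the portion currently on screen.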

18. A computer-implemented method comprising:
- retrieving an indication of a text having target words from a data store;
  
  calculating a language model score for the target words using a domain-specific target model that is specific to the text having the target words;
  
  receiving an acoustic signal via a user input;
  
  calculating an acoustic score for the target words with reference to the acoustic signal using the domain-specific target model;
  
  obtaining a general-domain garbage model indicative of a set of garbage words including common words in a general domain;
  
  obtaining an acoustic score and a language model score for the set of garbage words from the general-domain garbage model;
  
  evaluating whether the acoustic signal comprises a mispronunciation with reference to the target words of the text based on a weighted comparison of the acoustic score and the language model score for the target words with the acoustic score and the language model score for the set of garbage words;
  
  providing user-perceptible feedback based on the evaluation; and
  
  providing a user option to tune a weighting parameter, and responding to tuning of the weighting parameter by raising or lowering criteria for the garbage model to adjust a miscue detection rate and a rate of false alarms.
- Dependent claims (19, 20):
  - 19. The computer-implemented method of claim 18, and further comprising retrieving a plurality of textual units each having a plurality of target words.
  - 20. The computer-implemented method of claim 19, and further comprising:
    - iteratively displaying the plurality of textual units to a user, and dynamically assembling additional domain-specific target language models respectively based on each of the additional portions of the text sample while they are being displayed.
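
The weighted comparison in claim 18 can be pictured with a small decision function. This is a sketch under stated assumptions: all scores are treated as log-likelihoods, the weighting parameter is applied as a prior-like factor (`weight` on the garbage path, one minus `weight` on the target path), and a miscue is flagged when the weighted garbage path scores higher. The claims do not give the exact formula, so the function `is_miscue` and its arithmetic are hypothetical.

```python
import math

def is_miscue(target_acoustic, target_lm, garbage_acoustic, garbage_lm, weight):
    """Flag a miscue when the weighted garbage path outscores the target path.

    All scores are assumed to be log-likelihoods (higher is better).
    """
    if not 0.0 < weight < 1.0:
        raise ValueError("weight must be strictly between 0 and 1")
    target_total = math.log(1.0 - weight) + target_acoustic + target_lm
    garbage_total = math.log(weight) + garbage_acoustic + garbage_lm
    return garbage_total > target_total

# Same evidence, different tuning of the weighting parameter.
scores = dict(target_acoustic=-10.0, target_lm=-1.0,
              garbage_acoustic=-9.5, garbage_lm=-2.0)
balanced = is_miscue(**scores, weight=0.5)
strict = is_miscue(**scores, weight=0.8)
```

Under these assumptions, raising `weight` favors the garbage model, so more input is flagged as miscues and false alarms rise with it; lowering `weight` has the opposite effect. That is the trade-off the user-tunable parameter in the claim is meant to expose.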

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Li, Xiaolong; Ju, Yun-Cheng; Deng, Li; Acero, Alejandro
Primary Examiner(s): He, Jialong
Application Number: US11/655,702
Publication Number: US 20080177545A1
Time in Patent Office: 2,293 Days
Field of Search: 704/270, 704/275, 704/251
US Class Current: 704/270
CPC Class Codes:
G06F 40/211   Syntactic parsing, e.g. bas...
G09B 17/003   electrically operated appar...
G10L 15/197   Probabilistic grammars, e.g...
G10L 2015/221   Announcement of recognition...
