SYSTEM AND METHOD FOR IMPROVING SPEECH RECOGNITION ACCURACY USING TEXTUAL CONTEXT

US 20110099013A1
Filed: 10/23/2009
Published: 04/28/2011
Est. Priority Date: 10/23/2009
Status: Active Grant

Abstract

Disclosed herein are systems, methods, and computer-readable storage media for improving speech recognition accuracy using textual context. The method includes retrieving a recorded utterance, capturing text from a device display associated with the spoken dialog and viewed by one party to the recorded utterance, and identifying words in the captured text that are relevant to the recorded utterance. The method further includes adding the identified words to a dynamic language model, and recognizing the recorded utterance using the dynamic language model. The recorded utterance can be a spoken dialog. A time stamp can be assigned to each identified word. The method can include adding identified words to and/or removing identified words from the dynamic language model based on their respective time stamps. A screen scraper can capture text from the device display associated with the recorded utterance. The device display can contain customer service data.
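The flow summarized in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the dynamic language model is reduced to a toy unigram model, "recognizing" is reduced to picking the best of several candidate transcriptions, and the names `identify_words`, `DynamicLanguageModel`, `recognize`, the stop-word list, and the `boost` parameter are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use something richer.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "for", "in", "on"}


def identify_words(captured_text):
    """Return lowercase content words found in text captured from the display."""
    tokens = re.findall(r"[a-z0-9']+", captured_text.lower())
    return [t for t in tokens if t not in STOPWORDS]


class DynamicLanguageModel:
    """Toy unigram model whose word counts can be boosted at run time."""

    def __init__(self, base_counts):
        self.counts = Counter(base_counts)

    def add_words(self, words, boost=5):
        # Assumption: "adding" a word means raising its count by a fixed boost.
        for word in words:
            self.counts[word] += boost

    def score(self, hypothesis):
        """Sum of add-one-smoothed log-probabilities of the hypothesis words."""
        total = sum(self.counts.values())
        vocab = len(self.counts) + 1
        return sum(
            math.log((self.counts[w] + 1) / (total + vocab))
            for w in hypothesis.split()
        )


def recognize(hypotheses, lm):
    """Stand-in for recognition: pick the candidate the model scores highest."""
    return max(hypotheses, key=lm.score)


base = {"check": 5, "my": 10, "account": 5, "savings": 1, "shavings": 1}
lm = DynamicLanguageModel(base)
lm.add_words(identify_words("Account type: Savings. Customer: Dan Smith"))
best = recognize(["check my shavings account", "check my savings account"], lm)
```

In this sketch the word "savings" appears in the captured screen text, so its boosted count lets the model prefer the second candidate over the acoustically similar "shavings".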

20 Claims

1. A method for improving speech recognition accuracy using textual context, the method causing a computing device to perform steps comprising:
- retrieving a recorded utterance;
  
  retrieving text captured from a device display associated with the spoken dialog and viewed by one party to the recorded utterance;
  
  identifying words in the captured text that are relevant to the recorded utterance;
  
  adding the identified words to a dynamic language model; and
  
  recognizing the recorded utterance using the dynamic language model.
- - 2. The method of claim 1, wherein the recorded utterance is a spoken dialog.
  - 3. The method of claim 1, wherein each identified word is assigned a time stamp.
  - 4. The method of claim 3, the method further causing a computing device to add one of the identified words to the dynamic language model based on its respective assigned time stamp.
  - 5. The method of claim 3, the method further causing a computing device to remove one of the identified words from the dynamic language model based on its respective assigned time stamp.
  - 6. The method of claim 1, wherein a screen scraper captures text from the device display associated with the recorded utterance.
  - 7. The method of claim 2, wherein the device display contains customer service data.
  - 8. The method of claim 1, wherein the captured text comprises at least one of a name, a location, a phone number, an account type, or a product name.
  - 9. The method of claim 1, the method further causing a computing device to perform steps comprising:
    - determining an utterance category based on the captured text; and
      
      adding utterance category specific words to the dynamic language model.
  - 10. The method of claim 2, the method further causing a computing device to perform steps comprising:
    - identifying a user in the dialog; and
      
      saving the dynamic language model as a personalized dynamic language model associated with the identified user.
  - 11. The method of claim 10, the method further causing a computing device to perform steps comprising:
    - retrieving a second spoken dialog including the identified user;
      
      loading the personalized dynamic language model associated with the identified user; and
      
      recognizing the second spoken dialog using the personalized dynamic language model.
  - 12. The method of claim 1, wherein adding the identified words to a dynamic language model further comprises rescoring an existing language model.
  - 13. The method of claim 1, wherein identifying words in the captured text that are relevant to the recorded utterance further comprises:
    - extracting from the captured text references to external data;
      
      retrieving the external data;
      
      identifying data of interest in the parsed data; and
      
      adding the identified data of interest to the dynamic language model.
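Claims 3 through 5 describe adding and removing identified words based on their time stamps. One way this could work is sketched below, under assumptions that are not in the claims: time stamps are seconds into the dialog, a word stays relevant for a fixed window after it appears on screen, and the class name `TimedVocabulary` and its `window` parameter are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TimedVocabulary:
    """Tracks identified words and when each one appeared on the display."""

    window: float = 60.0  # hypothetical: seconds a word remains relevant
    seen_at: dict = field(default_factory=dict)

    def observe(self, word, timestamp):
        """Assign a time stamp to an identified word."""
        self.seen_at[word] = timestamp

    def active_words(self, now):
        """Words that should be in the dynamic language model at time `now`."""
        return {
            w for w, t in self.seen_at.items()
            if t <= now <= t + self.window
        }

    def prune(self, now):
        """Drop words whose window has passed and return them."""
        expired = [w for w, t in self.seen_at.items() if now > t + self.window]
        for word in expired:
            del self.seen_at[word]
        return expired


vocab = TimedVocabulary(window=30.0)
vocab.observe("savings", 10.0)
vocab.observe("mortgage", 50.0)
early = vocab.active_words(20.0)
late = vocab.active_words(55.0)
removed = vocab.prune(55.0)
```

A fixed window is only one possible policy; the claims require that the decision be based on the time stamp and do not specify how.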

14. A system for improving speech recognition accuracy using textual context, the system comprising:
- a processor;
  
  a module configured to control the processor to retrieve a recorded utterance;
  
  a module configured to control the processor to capture text from a device display associated with the spoken dialog and viewed by one party to the recorded utterance;
  
  a module configured to control the processor to identify words in the captured text that are relevant to the recorded utterance;
  
  a module configured to control the processor to add the identified words to a dynamic language model; and
  
  a module configured to control the processor to recognize the recorded utterance using the dynamic language model.
- - 15. The system of claim 14, wherein the recorded utterance is a spoken dialog.
  - 16. The system of claim 14, the system further comprising a module configured to control the processor to assign a time stamp to each identified word.
  - 17. The system of claim 16, the system further comprising a module configured to control the processor to add one of the identified words to the dynamic language model based on its respective assigned time stamp.

18. A computer-readable storage medium storing instructions for improving speech recognition accuracy using textual context which, when executed by a computing device, cause the computing device to perform steps comprising:
- retrieving a recorded utterance;
  
  capturing text from a device display associated with the spoken dialog and viewed by one party to the recorded utterance;
  
  identifying words in the captured text that are relevant to the recorded utterance;
  
  adding the identified words to a dynamic language model; and
  
  recognizing the recorded utterance using the dynamic language model.
- - 19. The computer-readable storage medium of claim 18, wherein the recorded utterance is a spoken dialog.
  - 20. The computer-readable storage medium of claim 18, wherein identifying words in the captured text that are relevant to the recorded utterance further comprises:
    - extracting from the captured text references to external data;
      
      retrieving the external data;
      
      identifying data of interest in the parsed data; and
      
      adding the identified data of interest to the dynamic language model.
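The external-data steps recited in claims 13 and 20 can be sketched as follows. This is an illustration under stated assumptions only: references to external data are taken to be URLs, retrieval is delegated to a caller-supplied `fetch` function (so no network access is implied), and "data of interest" is approximated by capitalized words and phone numbers in one common US format. All function names and patterns here are hypothetical.

```python
import re

# Hypothetical patterns chosen for illustration.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")
PHONE_RE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")
PROPER_RE = re.compile(r"\b[A-Z][a-z]+\b")


def extract_references(captured_text):
    """Find references to external data (assumed here to be URLs)."""
    return URL_RE.findall(captured_text)


def data_of_interest(document):
    """Pick out phone numbers and capitalized words from retrieved data."""
    return set(PHONE_RE.findall(document)) | set(PROPER_RE.findall(document))


def words_from_external_data(captured_text, fetch):
    """Follow each reference with `fetch` and collect the data of interest."""
    found = set()
    for url in extract_references(captured_text):
        document = fetch(url)  # caller decides how retrieval happens
        if document:
            found |= data_of_interest(document)
    return found


# A dictionary stands in for the external source in this example.
pages = {"http://example.com/plan": "plan: Platinum, hotline 555-123-4567"}
extra = words_from_external_data(
    "See http://example.com/plan for details", pages.get
)
```

The returned set would then be added to the dynamic language model in the same way as words identified directly in the captured text.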

Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
AT&T Intellectual Property I LP (AT&T, Inc.)
Inventors
Melamed, Dan; Johnston, Michael; Bangalore, Srinivas

Granted Patent

US 8,571,866 B2
US Class Current

704/255
CPC Class Codes

G06F 3/162   Interface to dedicated audi...

G10L 15/05   Word boundary detection

G10L 15/07   to the speaker

G10L 15/18   using natural language mode...

G10L 15/183   using context dependencies,...

G10L 15/19   Grammatical context, e.g. d...

G10L 15/30   Distributed recognition, e....

G10L 17/04   Training, enrolment or mode...

G10L 2015/228   of application context

G10L 25/51   for comparison or discrimin...
