SPEECH RECOGNITION REPAIR USING CONTEXTUAL INFORMATION

US 20130080177A1
Filed: 09/28/2011
Published: 03/28/2013
Est. Priority Date: 09/28/2011
Status: Active Grant

Abstract

A speech control system that can recognize a spoken command and associated words (such as “call mom at home”) and can cause a selected application (such as a telephone dialer) to execute the command to cause a data processing system, such as a smartphone, to perform an operation based on the command (such as look up mom's phone number at home and dial it to establish a telephone call). The speech control system can use a set of interpreters to repair recognized text from a speech recognition system, and results from the set can be merged into a final repaired transcription which is provided to the selected application.
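
The interpreter idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `FieldInterpreter` class, the `Repair` record, the similarity cutoff of 0.7, and the sample contact names are all assumptions made for illustration, and fuzzy string matching from the standard `difflib` module stands in for whatever matching a real interpreter would use.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher, get_close_matches


@dataclass
class Repair:
    index: int    # position of the token to replace
    text: str     # proposed replacement taken from the database
    score: float  # similarity between the token and the database item


class FieldInterpreter:
    """Hypothetical interpreter that repairs tokens against one database field."""

    def __init__(self, items, cutoff=0.7):
        # Map lowercase keys to the stored spelling (assumed in-memory "database").
        self.items = {item.lower(): item for item in items}
        self.cutoff = cutoff

    def interpret(self, tokens):
        repairs = []
        for index, token in enumerate(tokens):
            word = token.lower()
            if word in self.items:
                continue  # exact match, nothing to repair
            close = get_close_matches(word, list(self.items), n=1, cutoff=self.cutoff)
            if close:
                score = SequenceMatcher(None, word, close[0]).ratio()
                repairs.append(Repair(index, self.items[close[0]], score))
        return repairs


def repair_transcription(text, interpreters):
    # Tokenize the recognizer output, collect repairs, and apply them in place.
    tokens = text.split()
    repaired = list(tokens)
    for interpreter in interpreters:
        for repair in interpreter.interpret(tokens):
            repaired[repair.index] = repair.text
    return " ".join(repaired)


contacts = FieldInterpreter(["Jon", "Mom"])  # made-up contact names
print(repair_transcription("call john at home", [contacts]))
```

In this sketch the misrecognized token "john" is replaced by the stored contact name "Jon", while tokens that are not close to any database item are left untouched.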

24 Claims

1. A machine implemented method comprising:
- receiving a speech input from a user of a data processing system;
  
  determining a context, of the data processing system, when the speech input was received;
  
  recognizing text in the speech input through a speech recognition system that includes an acoustic model and a language model, the recognizing of text producing a first text output;
  
  storing the first text output as a parsed data structure having a plurality of tokens each of which represents a word in the first text output;
  
  processing each of the tokens with a set of interpreters, each interpreter in the set being designed to search one or more databases to search for matches between one or more items in the databases and each of the tokens, each of the interpreters determining from any matches and from the context whether it can repair a token in the first text output, wherein each interpreter is designed to repair an error of a specific type in the first text output;
  
  merging selected results from the set of interpreters to produce a final interpreted speech transcription which represents a repaired version of the first text output;
  
  providing the final interpreted speech transcription to a selected application, in a set of applications, based on a command in the final interpreted speech transcription, the selected application to execute the command in the final interpreted speech transcription.
- - 2. The method as in claim 1 wherein the context includes a history of prior user inputs and wherein the one or more databases comprises a contacts database which stores at least one of names, addresses and phone numbers.
  - 3. The method as in claim 2 wherein the context includes a conversation history and wherein the one or more databases comprises a media database which stores at least one of song, titles, and artists and wherein an interpreter in the set of interpreters uses at least two consecutive words when evaluating a possible match.
  - 4. The method as in claim 1 wherein a first interpreter, in the set of interpreters, uses a first algorithm to determine whether to repair a word and wherein a second interpreter, in the set of interpreters, uses a second algorithm to determine whether to repair a word, the first algorithm being different than the second algorithm.
  - 5. The method as in claim 1 wherein a first interpreter, in the set of interpreters, uses a first algorithm to search the one or more databases and a second interpreter, in the set of interpreters, uses a second algorithm to search the one or more databases, and wherein the first algorithm and the second algorithm are different.
  - 6. The method as in claim 1 wherein the interpreters in the set of interpreters do not attempt to repair the command.
  - 7. The method as in claim 1 wherein the merging merges only non-overlapping results from the set of interpreters, and overlapping results from the set of interpreters are ranked in a ranked set and one result in the ranked set is selected and merged into the final interpreted speech transcription.
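
The merging step recited in claims 1 and 7 can be sketched as follows. This is one possible reading, not the method disclosed in the specification: the `Result` tuple, its numeric `score`, and the rule of keeping the highest-scoring result in each group of overlapping spans are assumptions. A chain of partially overlapping results is treated as one group here, which is a simplification.

```python
from typing import NamedTuple


class Result(NamedTuple):
    start: int    # index of the first token covered
    end: int      # index one past the last token covered
    text: str     # replacement text proposed by an interpreter
    score: float  # hypothetical ranking score


def merge_results(tokens, results):
    # Group results whose token spans overlap.
    groups = []
    for result in sorted(results, key=lambda r: (r.start, r.end)):
        if groups and result.start < groups[-1]["end"]:
            groups[-1]["members"].append(result)
            groups[-1]["end"] = max(groups[-1]["end"], result.end)
        else:
            groups.append({"end": result.end, "members": [result]})

    # Rank each group by score and keep one result per group.
    chosen = [max(group["members"], key=lambda r: r.score) for group in groups]

    # Splice the chosen replacements into the original token sequence.
    merged, position = [], 0
    for result in chosen:
        merged.extend(tokens[position:result.start])
        merged.append(result.text)
        position = result.end
    merged.extend(tokens[position:])
    return " ".join(merged)


tokens = "play hey jude by the beetles".split()
results = [
    Result(1, 3, "Hey Jude", 0.9),     # from a hypothetical song interpreter
    Result(2, 3, "Judy", 0.6),         # from a hypothetical contacts interpreter
    Result(4, 6, "The Beatles", 0.8),  # from a hypothetical artist interpreter
]
print(merge_results(tokens, results))
```

The two results covering "jude" overlap, so only the higher-scoring one is merged; the artist repair does not overlap anything and is merged directly.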

8. A machine readable non-transitory storage medium storing executable program instructions which when executed cause a data processing system to perform a method comprising:
- receiving a speech input from a user of a data processing system;
  
  determining a context, of the data processing system, when the speech input was received;
  
  recognizing text in the speech input through a speech recognition system that includes an acoustic model and a language model, the recognizing of text producing a first text output;
  
  storing the first text output as a parsed data structure having a plurality of tokens each of which represents a word in the first text output;
  
  processing each of the tokens with a set of interpreters, each interpreter in the set being designed to search one or more databases to search for matches between one or more items in the databases and at least one of the tokens, each of the interpreters determining from any matches and from the context whether it can repair a token in the first text output, wherein each interpreter is designed to repair an error of a specific type in the first text output;
  
  merging selected results from the set of interpreters to produce a final interpreted speech transcription which represents a repaired version of the first text output;
  
  providing the final interpreted speech transcription to a selected application, in a set of applications, based on a command in the final interpreted speech transcription, the selected application to execute the command in the final interpreted speech transcription.
- - 9. The medium as in claim 8 wherein the context includes a history of prior user inputs and wherein the one or more databases comprises a contacts database which stores at least one of names, addresses and phone numbers.
  - 10. The medium as in claim 9 wherein the context includes a conversation history and wherein the one or more databases comprises a media database which stores at least one of song, titles, and artists and wherein an interpreter in the set of interpreters uses at least two consecutive words when evaluating a possible match.
  - 11. The medium as in claim 8 wherein a first interpreter, in the set of interpreters, uses a first algorithm to determine whether to repair a word and wherein a second interpreter, in the set of interpreters, uses a second algorithm to determine whether to repair a word, the first algorithm being different than the second algorithm.
  - 12. The medium as in claim 8 wherein a first interpreter, in the set of interpreters, uses a first algorithm to search the one or more databases and a second interpreter, in the set of interpreters, uses a second algorithm to search the one or more databases, and wherein the first algorithm and the second algorithm are different.
  - 13. The medium as in claim 8 wherein the interpreters in the set of interpreters do not attempt to repair the command.
  - 14. The medium as in claim 8 wherein the merging merges only non-overlapping results from the set of interpreters, and overlapping results from the set of interpreters are ranked in a ranked set and one result in the ranked set is selected and merged into the final interpreted speech transcription.

15. A machine readable non-transitory storage medium storing executable program instructions which when executed cause a data processing system to perform a method comprising:
- receiving a speech input from a user of a data processing system;
  
  recognizing text in the speech input through a speech recognition system that includes an acoustic model and an optional language model, the recognizing of text producing a first text output;
  
  storing the first text output as a parsed data structure having a plurality of words in the first text output;
  
  processing at least one of the words with a set of interpreters, each interpreter in the set being designed to search one or more databases to search for matches between one or more items in the databases and the at least one of the words, each of the interpreters determining from any matches whether it can repair a word in the first text output, wherein each interpreter is designed to repair an error of a specific field in the one or more databases;
  
  merging repaired results from the set of interpreters to produce a final interpreted speech transcription which represents a repaired version of the first text output;
  
  providing the final interpreted speech transcription to a selected application, in a set of applications, based on a command in the final interpreted speech transcription, the selected application to execute the command in the final interpreted speech transcription.
- - 16. The medium as in claim 15, wherein the method further comprises:
    - determining a context, of the data processing system, when the speech input was received, wherein the context includes a history of prior user inputs and wherein the one or more databases comprises a contacts database which stores at least one of names, addresses and phone numbers;
      
      and wherein different interpreters, in the set of interpreters, use different algorithms to determine whether to repair a word in the first text output, and wherein each interpreter determines, through a score, whether it can repair a word in the first text output.

17. A data processing system comprising:
- a speech recognizer which recognizes text in a speech input and produces a first text output;
  
  a context determining system which determines a context of the data processing system when the speech input is received;
  
  a microphone coupled to the speech recognizer to provide the speech input to the speech recognizer;
  
  a speech repair system coupled to the speech recognizer and coupled to the context determining system, the speech repair system including a set of interpreters, each of which is configured to repair an error of a certain type in recognized text, the certain type being determined by one or more fields in one or more databases which are searched by the set of interpreters.
- - 18. The data processing system of claim 17 wherein the context includes a history of user inputs and wherein the set of interpreters use the context in a process of determining whether to repair one or more words in the first text output and wherein the speech recognizer includes an acoustic model and a language model.

19. The data processing system of claim 19 wherein the set of interpreters search the one or more databases to compare words in the first text output with one or more items in the one or more databases when determining whether to repair one or more words in the first text output.

20. A machine readable non-transitory storage medium storing executable program instructions which when executed cause a data processing system to perform a method comprising:
- executing a speech assistant application which is a first application in a set of applications;
  
  receiving a digitized speech input and recognizing text in the speech input through a speech recognition system which provides a first text output;
  
  determining a command from the first text output;
  
  selecting an application in the set of applications based on the command, wherein the selected application is different than the speech assistant application, the selected application being configured to execute the command with text from or derived from the first text output.
- - 21. The medium as in claim 20 wherein the method further comprises:
    - repairing text in the first text output through a set of interpreters each of which is configured to repair an error of a specific type, based on one or more fields of one or more databases, in the first text output;
      
      merging results from the set of interpreters to produce a final interpreted transcription to the selected application.
  - 22. The medium as in claim 21 wherein the method further comprises:
    - determining a context of the data processing system when the digitized speech input is received, and wherein the set of interpreters use the context when determining whether to repair one or more words in the first text output.
  - 23. The medium as in claim 22 wherein a grammar parser determines the command from the first text output.
  - 24. The medium as in claim 22 wherein the set of applications comprises at least two of:
    - (a) a telephone dialer that uses the final interpreted transcription to dial a telephone number;
      
      (b) a media player for playing songs or other content;
      
      (c) a text messaging application;
      
      (d) an email application;
      
      (e) a calendar application;
      
      (f) a local search application;
      
      (g) a video conferencing application;
      
      or (h) a person or object locating application.
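
The application selection recited in claims 20 and 24 can be pictured as a lookup from a command word to an application. The sketch below is a simplified assumption: it treats the first word of the transcription as the command, whereas claim 23 attributes that step to a grammar parser, and the `APPLICATIONS` table and handler functions are hypothetical stand-ins for real applications.

```python
def dial(arguments):
    # Stand-in for a telephone dialer application.
    return f"dialer: calling {arguments}"


def play(arguments):
    # Stand-in for a media player application.
    return f"media player: playing {arguments}"


# Hypothetical registry mapping command words to applications.
APPLICATIONS = {"call": dial, "play": play}


def dispatch(transcription):
    # Assume the command is the first word; the rest is passed to the application.
    command, _, arguments = transcription.partition(" ")
    handler = APPLICATIONS.get(command.lower())
    if handler is None:
        raise ValueError(f"no application registered for command {command!r}")
    return handler(arguments)


print(dispatch("call Mom at home"))
```

Keeping the registry separate from the handlers means another application, such as a text messaging handler, could be added without changing `dispatch`.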

Current Assignee: Apple Inc.
Original Assignee: Apple Inc.
Inventors: Chen, Lik Harry
Granted Patent: US 8,762,156 B2
US Class Current: 704/275
CPC Class Codes:
G10L 15/183   using context dependencies,...
G10L 15/22   Procedures used during a sp...
G10L 15/26   Speech to text systems G10L...
