SPEECH RECOGNITION REPAIR USING CONTEXTUAL INFORMATION
Abstract
A speech control system that can recognize a spoken command and associated words (such as “call mom at home”) and can cause a selected application (such as a telephone dialer) to execute the command to cause a data processing system, such as a smartphone, to perform an operation based on the command (such as look up mom's phone number at home and dial it to establish a telephone call). The speech control system can use a set of interpreters to repair recognized text from a speech recognition system, and results from the set can be merged into a final repaired transcription which is provided to the selected application.
24 Claims
1. A machine implemented method comprising:
receiving a speech input from a user of a data processing system;
determining a context, of the data processing system, when the speech input was received;
recognizing text in the speech input through a speech recognition system that includes an acoustic model and a language model, the recognizing of text producing a first text output;
storing the first text output as a parsed data structure having a plurality of tokens each of which represents a word in the first text output;
processing each of the tokens with a set of interpreters, each interpreter in the set being designed to search one or more databases to search for matches between one or more items in the databases and each of the tokens, each of the interpreters determining from any matches and from the context whether it can repair a token in the first text output, wherein each interpreter is designed to repair an error of a specific type in the first text output;
merging selected results from the set of interpreters to produce a final interpreted speech transcription which represents a repaired version of the first text output;
providing the final interpreted speech transcription to a selected application, in a set of applications, based on a command in the final interpreted speech transcription, the selected application to execute the command in the final interpreted speech transcription.
Dependent claims: 2, 3, 4, 5, 6, 7
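The tokenize, interpret, and merge steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Token`, `Repair`, and `VocabularyInterpreter` names, the `foreground_app` context key, the confidence scores, and the use of `difflib` fuzzy matching in place of a database search are all assumptions made for illustration.

```python
from dataclasses import dataclass
from difflib import get_close_matches
from typing import Optional


@dataclass
class Token:
    index: int  # position of the word in the first text output
    word: str


@dataclass
class Repair:
    index: int        # token being repaired
    replacement: str  # proposed corrected word
    score: float      # illustrative confidence, not from the patent


def parse(first_text_output: str) -> list[Token]:
    """Store the recognized text as one token per word."""
    return [Token(i, w) for i, w in enumerate(first_text_output.split())]


class VocabularyInterpreter:
    """Hypothetical interpreter that repairs one type of error."""

    def __init__(self, error_type: str, vocabulary: list[str], favored_app: str):
        self.error_type = error_type
        self.vocabulary = vocabulary  # stands in for a searched database
        self.favored_app = favored_app

    def interpret(self, token: Token, context: dict) -> Optional[Repair]:
        word = token.word.lower()
        if word in self.vocabulary:
            return None  # exact match, nothing to repair
        matches = get_close_matches(word, self.vocabulary, n=1, cutoff=0.7)
        if not matches:
            return None  # this interpreter cannot repair the token
        # Assumed rule: a matching context raises confidence in the repair.
        in_context = context.get("foreground_app") == self.favored_app
        return Repair(token.index, matches[0], 0.9 if in_context else 0.6)


def merge(tokens: list[Token], repairs: list[Repair], threshold: float = 0.5) -> str:
    """Keep the best-scoring repair per token and rebuild the transcription."""
    best: dict[int, Repair] = {}
    for r in repairs:
        if r.score >= threshold and (r.index not in best or r.score > best[r.index].score):
            best[r.index] = r
    return " ".join(best[t.index].replacement if t.index in best else t.word for t in tokens)


def repair_transcription(first_text_output: str, interpreters, context: dict) -> str:
    tokens = parse(first_text_output)
    repairs = []
    for token in tokens:  # every token is offered to every interpreter
        for interpreter in interpreters:
            repair = interpreter.interpret(token, context)
            if repair is not None:
                repairs.append(repair)
    return merge(tokens, repairs)


interpreters = [
    VocabularyInterpreter("command", ["call", "play", "text"], favored_app="phone"),
    VocabularyInterpreter("contact_name", ["john", "mom"], favored_app="phone"),
]
result = repair_transcription("cal jon at home", interpreters, {"foreground_app": "phone"})
print(result)  # call john at home
```

Each interpreter returns either a proposed repair or nothing, so the merge step only has to choose among candidates per token position; the 0.5 threshold is an arbitrary value for this sketch.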
8. A machine readable non-transitory storage medium storing executable program instructions which when executed cause a data processing system to perform a method comprising:
receiving a speech input from a user of a data processing system;
determining a context, of the data processing system, when the speech input was received;
recognizing text in the speech input through a speech recognition system that includes an acoustic model and a language model, the recognizing of text producing a first text output;
storing the first text output as a parsed data structure having a plurality of tokens each of which represents a word in the first text output;
processing each of the tokens with a set of interpreters, each interpreter in the set being designed to search one or more databases to search for matches between one or more items in the databases and at least one of the tokens, each of the interpreters determining from any matches and from the context whether it can repair a token in the first text output, wherein each interpreter is designed to repair an error of a specific type in the first text output;
merging selected results from the set of interpreters to produce a final interpreted speech transcription which represents a repaired version of the first text output;
providing the final interpreted speech transcription to a selected application, in a set of applications, based on a command in the final interpreted speech transcription, the selected application to execute the command in the final interpreted speech transcription.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A machine readable non-transitory storage medium storing executable program instructions which when executed cause a data processing system to perform a method comprising:
receiving a speech input from a user of a data processing system;
recognizing text in the speech input through a speech recognition system that includes an acoustic model and an optional language model, the recognizing of text producing a first text output;
storing the first text output as a parsed data structure having a plurality of words in the first text output;
processing at least one of the words with a set of interpreters, each interpreter in the set being designed to search one or more databases to search for matches between one or more items in the databases and the at least one of the words, each of the interpreters determining from any matches whether it can repair a word in the first text output, wherein each interpreter is designed to repair an error of a specific field in the one or more databases;
merging repaired results from the set of interpreters to produce a final interpreted speech transcription which represents a repaired version of the first text output;
providing the final interpreted speech transcription to a selected application, in a set of applications, based on a command in the final interpreted speech transcription, the selected application to execute the command in the final interpreted speech transcription.
Dependent claims: 16
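Claim 15 ties each interpreter to a specific field in the searched databases. A rough sketch of that arrangement follows, assuming a hypothetical in-memory SQLite `contacts` table with `first_name` and `last_name` columns and an illustrative fuzzy-match cutoff of 0.7; none of these details come from the patent text.

```python
import sqlite3
from difflib import get_close_matches
from typing import Optional

ALLOWED_FIELDS = ("first_name", "last_name")  # hypothetical schema


def make_contacts_db() -> sqlite3.Connection:
    """Build a small in-memory contacts database for the example."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE contacts (first_name TEXT, last_name TEXT)")
    conn.executemany(
        "INSERT INTO contacts VALUES (?, ?)",
        [("John", "Appleseed"), ("Maria", "Sanchez")],
    )
    return conn


class FieldInterpreter:
    """Hypothetical interpreter that only repairs words against one field."""

    def __init__(self, conn: sqlite3.Connection, field: str):
        # Column names cannot be bound as SQL parameters, so validate them.
        if field not in ALLOWED_FIELDS:
            raise ValueError(f"unknown field: {field}")
        self.conn = conn
        self.field = field

    def repair(self, word: str) -> Optional[str]:
        rows = self.conn.execute(
            f"SELECT DISTINCT {self.field} FROM contacts"
        ).fetchall()
        items = {row[0].lower(): row[0] for row in rows}
        if word.lower() in items:
            return None  # already matches an item in this field
        matches = get_close_matches(word.lower(), list(items), n=1, cutoff=0.7)
        return items[matches[0]] if matches else None


def repair_words(words: list[str], interpreters: list[FieldInterpreter]) -> list[str]:
    """Merge step: take the first available repair per word, else keep the word."""
    repaired = []
    for word in words:
        fixes = [fix for fix in (i.repair(word) for i in interpreters) if fix]
        repaired.append(fixes[0] if fixes else word)
    return repaired


conn = make_contacts_db()
field_interpreters = [FieldInterpreter(conn, f) for f in ALLOWED_FIELDS]
repaired = repair_words("call jon applesead".split(), field_interpreters)
print(" ".join(repaired))  # call John Appleseed
```

Binding one interpreter to one column keeps each repair decision narrow: a first-name interpreter never proposes a surname, so conflicting repairs for the same word become less likely.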
17. A data processing system comprising:
a speech recognizer which recognizes text in a speech input and produces a first text output;
a context determining system which determines a context of the data processing system when the speech input is received;
a microphone coupled to the speech recognizer to provide the speech input to the speech recognizer;
a speech repair system coupled to the speech recognizer and coupled to the context determining system, the speech repair system including a set of interpreters, each of which is configured to repair an error of a certain type in recognized text, the certain type being determined by one or more fields in one or more databases which are searched by the set of interpreters.
Dependent claims: 18
19. The data processing system of claim 19 wherein the set of interpreters search the one or more databases to compare words in the first text output with one or more items in the one or more databases when determining whether to repair one or more words in the first text output.
20. A machine readable non-transitory storage medium storing executable program instructions which when executed cause a data processing system to perform a method comprising:
executing a speech assistant application which is a first application in a set of applications;
receiving a digitized speech input and recognizing text in the speech input through a speech recognition system which provides a first text output;
determining a command from the first text output;
selecting an application in the set of applications based on the command, wherein the selected application is different than the speech assistant application, the selected application being configured to execute the command with text from or derived from the first text output.
Dependent claims: 21, 22, 23, 24
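The command-based application selection in claim 20 can be pictured as a small dispatch table. The sketch below is only one possible reading: the `APPLICATIONS` registry, the handler functions, and the rule that the first word of the text is the command are hypothetical simplifications.

```python
from typing import Callable


def phone_dialer(text: str) -> str:
    """Stand-in for a telephone dialer application."""
    return f"dialing {text}"


def music_player(text: str) -> str:
    """Stand-in for a media player application."""
    return f"playing {text}"


# Hypothetical registry mapping a command word to the application that runs it.
APPLICATIONS: dict[str, Callable[[str], str]] = {
    "call": phone_dialer,
    "play": music_player,
}


def determine_command(first_text_output: str) -> tuple[str, str]:
    """Assumed rule: the first word is the command, the rest is its argument."""
    words = first_text_output.split()
    if not words:
        raise ValueError("empty transcription")
    return words[0].lower(), " ".join(words[1:])


def speech_assistant(first_text_output: str) -> str:
    """The assistant selects a different application and hands it the text."""
    command, argument = determine_command(first_text_output)
    application = APPLICATIONS.get(command)
    if application is None:
        return f"no application registered for '{command}'"
    return application(argument)


outcome = speech_assistant("call mom at home")
print(outcome)  # dialing mom at home
```

The assistant itself executes nothing beyond routing; the selected application receives text derived from the first text output and carries out the command.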
Specification