Speech recognition using an operating system hooking component for context-aware recognition models
First Claim
1. A computer-implemented method performed by at least one computer processor executing computer program instructions tangibly stored on at least one non-transitory computer-readable medium, the method comprising using the at least one computer processor to perform operations of:
receiving, by an automatic speech recognition system executed by the at least one computer processor, a first plurality of inputs into a user interface element of a target application while the target application is in a first state;
training, by the automatic speech recognition system, a first language model based on the first plurality of inputs, comprising:
identifying, by an operating system hooking component included in the automatic speech recognition system executed by the at least one computer processor, a state of the user interface element by intercepting messages between the user interface element and the computer processor's operating system while the user interface element is displayed in a foreground of a graphical user interface; and
associating, by the automatic speech recognition system, the first language model with the user interface element;
determining, by the automatic speech recognition system, that the target application is in the first state, wherein determining further comprises determining that the user interface element is in the identified state;
applying, by the automatic speech recognition system, the first language model to a first speech input in response to determining that the target application is in the first state;
receiving, by the automatic speech recognition system, a second plurality of inputs into the target application while the target application is in a second state that differs from the first state;
training, by the automatic speech recognition system, a second language model based on the second plurality of inputs;
determining, by the automatic speech recognition system, that the target application is in the second state; and
applying, by the automatic speech recognition system, the second language model to second speech input in response to determining that the target application is in the second state.
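The hooking step at the heart of claim 1 can be sketched in Python. This is a minimal simulation, not the patent's implementation: the `Message` structure, the message kinds (`SHOW`, `SET_FOCUS`), and the `HookingComponent` class are all illustrative assumptions; a real hooking component would use an OS facility such as a Windows message hook rather than an in-process callback.

```python
# Illustrative sketch: an "OS hooking component" that intercepts
# messages between the operating system and a UI element, and records
# the element's state while that element is in the foreground.
# Message names and structures are hypothetical, not a real OS API.

from dataclasses import dataclass

@dataclass
class Message:
    target: str       # user interface element the OS is addressing
    kind: str         # e.g. "SHOW", "SET_FOCUS", "SET_TEXT"
    payload: object = None

class HookingComponent:
    def __init__(self):
        self.element_state = {}   # element -> last observed state
        self.foreground = None    # element currently in the foreground

    def intercept(self, msg: Message) -> Message:
        """Called on every OS-to-element message before delivery."""
        if msg.kind == "SHOW":
            self.foreground = msg.target
        if msg.target == self.foreground:
            # Record the element's state while it is displayed
            # in the foreground of the graphical user interface.
            self.element_state[msg.target] = msg.kind
        return msg  # pass the message through unmodified

hook = HookingComponent()
for m in [Message("search_box", "SHOW"),
          Message("search_box", "SET_FOCUS")]:
    hook.intercept(m)

print(hook.element_state["search_box"])  # SET_FOCUS
```

The recorded element state is what the claimed method later compares against ("determining that the user interface element is in the identified state") before selecting a language model.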
Abstract
Inputs provided into user interface elements of an application are observed. Records are made of the inputs and the state(s) the application was in while the inputs were provided. For each state, a corresponding language model is trained based on the input(s) provided to the application while the application was in that state. When the application is next observed to be in a previously-observed state, a language model associated with the application's current state is applied to recognize speech input provided by a user and thereby to generate speech recognition output that is provided to the application. An application's state at a particular time may include the user interface element(s) that are displayed and/or in focus at that time, and is determined by an operating system hooking component embedded in the automatic speech recognition system.
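The per-state training and application described in the abstract can be sketched as follows. The unigram counts, the add-one smoothing, and the state labels (`name_field`, `dosage_field`) are simplifying assumptions introduced for illustration; the patent does not specify a particular language-model form.

```python
# Sketch of per-state language models: inputs observed while the
# application is in a given state train that state's model; at
# recognition time, the model for the current state re-scores
# candidate transcripts of the speech input.

from collections import Counter

class StateLanguageModels:
    def __init__(self):
        self.counts = {}   # application state -> Counter of observed words

    def observe(self, state, text):
        """Record typed input seen while the application was in `state`."""
        self.counts.setdefault(state, Counter()).update(text.split())

    def score(self, state, hypothesis):
        """Score a candidate transcript under the state's unigram model."""
        c = self.counts.get(state, Counter())
        total = sum(c.values()) or 1
        # Add-one smoothing so unseen words do not zero the score.
        return sum((c[w] + 1) / (total + len(c) + 1)
                   for w in hypothesis.split())

lms = StateLanguageModels()
# Hypothetical states: a patient-name field and a dosage field.
lms.observe("name_field", "john smith")
lms.observe("dosage_field", "ten milligrams")

# Ambiguous audio: choose the hypothesis preferred by the model
# associated with the application's current state.
hyps = ["john smith", "ten milligrams"]
best = max(hyps, key=lambda h: lms.score("name_field", h))
print(best)  # john smith
```

Keeping one model per state is the design choice the claims turn on: the same audio can resolve differently depending on which user interface element is in the foreground when the speech arrives.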
62 Claims
1. A computer-implemented method performed by at least one computer processor executing computer program instructions tangibly stored on at least one non-transitory computer-readable medium, the method comprising using the at least one computer processor to perform operations of:
receiving, by an automatic speech recognition system executed by the at least one computer processor, a first plurality of inputs into a user interface element of a target application while the target application is in a first state;
training, by the automatic speech recognition system, a first language model based on the first plurality of inputs, comprising:
identifying, by an operating system hooking component included in the automatic speech recognition system executed by the at least one computer processor, a state of the user interface element by intercepting messages between the user interface element and the computer processor's operating system while the user interface element is displayed in a foreground of a graphical user interface; and
associating, by the automatic speech recognition system, the first language model with the user interface element;
determining, by the automatic speech recognition system, that the target application is in the first state, wherein determining further comprises determining that the user interface element is in the identified state;
applying, by the automatic speech recognition system, the first language model to a first speech input in response to determining that the target application is in the first state;
receiving, by the automatic speech recognition system, a second plurality of inputs into the target application while the target application is in a second state that differs from the first state;
training, by the automatic speech recognition system, a second language model based on the second plurality of inputs;
determining, by the automatic speech recognition system, that the target application is in the second state; and
applying, by the automatic speech recognition system, the second language model to second speech input in response to determining that the target application is in the second state.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30)
31. An automated speech recognition system comprising:
means for receiving a first plurality of inputs into a user interface element of a target application while the target application is in a first state;
means for training a first language model based on the first plurality of inputs, comprising:
means for receiving, from an operating system hooking component included in the speech recognition system, an identification of a state of the user interface element obtained by intercepting messages between the user interface element and the speech recognition system's associated operating system while the user interface element is displayed in a foreground of a graphical user interface;
means for associating the first language model with the user interface element;
means for determining that the target application is in the first state, wherein determining further comprises determining that the user interface element is in the identified state;
means for applying the first language model to a first speech input in response to determining that the target application is in the first state;
means for receiving a second plurality of inputs into the target application while the target application is in a second state that differs from the first state;
means for training a second language model based on the second plurality of inputs;
means for determining that the target application is in the second state; and
means for applying the second language model to second speech input in response to determining that the target application is in the second state.
- View Dependent Claims (32)
33. A non-transitory computer-readable medium storing computer program instructions executable by at least one computer processor to perform a method, the method comprising:
receiving, by an automatic speech recognition system executed by the at least one computer processor, a first plurality of inputs into a target application while the target application is in a first state;
training a first language model based on the first plurality of inputs;
identifying, by an operating system hooking component included in the speech recognition system, a user interface element displayed in a foreground of a graphical user interface and its state by intercepting messages between the user interface element and the computer processor's operating system;
associating the first language model with the identified user interface element;
determining that the target application is in the first state, wherein determining further comprises determining that the user interface element is in the identified state;
applying the first language model to a first speech input in response to determining that the target application is in the first state;
receiving a second plurality of inputs into the target application while the target application is in a second state that differs from the first state;
training a second language model based on the second plurality of inputs;
determining that the target application is in the second state; and
applying the second language model to second speech input in response to determining that the target application is in the second state.
- View Dependent Claims (34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62)
Specification