Speech recognition using an operating system hooking component for context-aware recognition models
First Claim
1. A computer-implemented method performed by at least one computer processor executing computer program instructions tangibly stored on at least one non-transitory computer-readable medium, the method comprising using the at least one computer processor to perform operations of:
receiving, by an automatic speech recognition system executed by the at least one computer processor, a first plurality of inputs into a user interface element of a target application while the target application is in a first state;
training, by the automatic speech recognition system, a first language model based on the first plurality of inputs, comprising:
identifying, by an operating system hooking component included in the automatic speech recognition system executed by the at least one computer processor, a state of the user interface element by intercepting messages between the user interface element and the computer processor's operating system while the user interface element is displayed in a foreground of a graphical user interface; and
associating, by the automatic speech recognition system, the first language model with the user interface element;
determining, by the automatic speech recognition system, that the target application is in the first state, wherein determining further comprises determining that the user interface element is in the identified state;
applying, by the automatic speech recognition system, the first language model to a first speech input in response to determining that the target application is in the first state;
receiving, by the automatic speech recognition system, a second plurality of inputs into the target application while the target application is in a second state that differs from the first state;
training, by the automatic speech recognition system, a second language model based on the second plurality of inputs;
determining, by the automatic speech recognition system, that the target application is in the second state; and
applying, by the automatic speech recognition system, the second language model to second speech input in response to determining that the target application is in the second state.
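The hooking step at the heart of claim 1 can be sketched in Python. This is a minimal simulation, not the patent's implementation: the `Message` structure, the message kinds (`SHOW`, `SET_FOCUS`), and the `HookingComponent` class are all illustrative assumptions; a real hooking component would use an OS facility such as a Windows message hook rather than an in-process callback.

```python
# Illustrative sketch: an "OS hooking component" that intercepts
# messages between the operating system and a UI element, and records
# the element's state while that element is in the foreground.
# Message names and structures are hypothetical, not a real OS API.

from dataclasses import dataclass

@dataclass
class Message:
    target: str       # user interface element the OS is addressing
    kind: str         # e.g. "SHOW", "SET_FOCUS", "SET_TEXT"
    payload: object = None

class HookingComponent:
    def __init__(self):
        self.element_state = {}   # element -> last observed state
        self.foreground = None    # element currently in the foreground

    def intercept(self, msg: Message) -> Message:
        """Called on every OS-to-element message before delivery."""
        if msg.kind == "SHOW":
            self.foreground = msg.target
        if msg.target == self.foreground:
            # Record the element's state while it is displayed
            # in the foreground of the graphical user interface.
            self.element_state[msg.target] = msg.kind
        return msg  # pass the message through unmodified

hook = HookingComponent()
for m in [Message("search_box", "SHOW"),
          Message("search_box", "SET_FOCUS")]:
    hook.intercept(m)

print(hook.element_state["search_box"])  # SET_FOCUS
```

The recorded element state is what the claimed method later compares against ("determining that the user interface element is in the identified state") before selecting a language model.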
Abstract
Inputs provided into user interface elements of an application are observed. Records are made of the inputs and the state(s) the application was in while the inputs were provided. For each state, a corresponding language model is trained based on the input(s) provided to the application while the application was in that state. When the application is next observed to be in a previously-observed state, a language model associated with the application's current state is applied to recognize speech input provided by a user and thereby to generate speech recognition output that is provided to the application. An application's state at a particular time may include the user interface element(s) that are displayed and/or in focus at that time, and is determined by an operating system hooking component embedded in the automatic speech recognition system.
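The per-state training and application described in the abstract can be sketched as follows. The unigram counts, the add-one smoothing, and the state labels (`name_field`, `dosage_field`) are simplifying assumptions introduced for illustration; the patent does not specify a particular language-model form.

```python
# Sketch of per-state language models: inputs observed while the
# application is in a given state train that state's model; at
# recognition time, the model for the current state re-scores
# candidate transcripts of the speech input.

from collections import Counter

class StateLanguageModels:
    def __init__(self):
        self.counts = {}   # application state -> Counter of observed words

    def observe(self, state, text):
        """Record typed input seen while the application was in `state`."""
        self.counts.setdefault(state, Counter()).update(text.split())

    def score(self, state, hypothesis):
        """Score a candidate transcript under the state's unigram model."""
        c = self.counts.get(state, Counter())
        total = sum(c.values()) or 1
        # Add-one smoothing so unseen words do not zero the score.
        return sum((c[w] + 1) / (total + len(c) + 1)
                   for w in hypothesis.split())

lms = StateLanguageModels()
# Hypothetical states: a patient-name field and a dosage field.
lms.observe("name_field", "john smith")
lms.observe("dosage_field", "ten milligrams")

# Ambiguous audio: choose the hypothesis preferred by the model
# associated with the application's current state.
hyps = ["john smith", "ten milligrams"]
best = max(hyps, key=lambda h: lms.score("name_field", h))
print(best)  # john smith
```

Keeping one model per state is the design choice the claims turn on: the same audio can resolve differently depending on which user interface element is in the foreground when the speech arrives.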
62 Claims
1. A computer-implemented method performed by at least one computer processor executing computer program instructions tangibly stored on at least one non-transitory computer-readable medium, the method comprising using the at least one computer processor to perform operations of:
receiving, by an automatic speech recognition system executed by the at least one computer processor, a first plurality of inputs into a user interface element of a target application while the target application is in a first state;
training, by the automatic speech recognition system, a first language model based on the first plurality of inputs, comprising:
identifying, by an operating system hooking component included in the automatic speech recognition system executed by the at least one computer processor, a state of the user interface element by intercepting messages between the user interface element and the computer processor's operating system while the user interface element is displayed in a foreground of a graphical user interface; and
associating, by the automatic speech recognition system, the first language model with the user interface element;
determining, by the automatic speech recognition system, that the target application is in the first state, wherein determining further comprises determining that the user interface element is in the identified state;
applying, by the automatic speech recognition system, the first language model to a first speech input in response to determining that the target application is in the first state;
receiving, by the automatic speech recognition system, a second plurality of inputs into the target application while the target application is in a second state that differs from the first state;
training, by the automatic speech recognition system, a second language model based on the second plurality of inputs;
determining, by the automatic speech recognition system, that the target application is in the second state; and
applying, by the automatic speech recognition system, the second language model to second speech input in response to determining that the target application is in the second state.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30)
31. An automated speech recognition system comprising:
means for receiving a first plurality of inputs into a user interface element of a target application while the target application is in a first state;
means for training a first language model based on the first plurality of inputs, comprising:
means for receiving, from an operating system hooking component included in the speech recognition system, an identification of a state of the user interface element obtained by intercepting messages between the user interface element and the speech recognition system's associated operating system while the user interface element is displayed in a foreground of a graphical user interface;
means for associating the first language model with the user interface element;
means for determining that the target application is in the first state, wherein determining further comprises determining that the user interface element is in the identified state;
means for applying the first language model to a first speech input in response to determining that the target application is in the first state;
means for receiving a second plurality of inputs into the target application while the target application is in a second state that differs from the first state;
means for training a second language model based on the second plurality of inputs;
means for determining that the target application is in the second state; and
means for applying the second language model to second speech input in response to determining that the target application is in the second state.
- View Dependent Claims (32)
33. A non-transitory computer-readable medium storing computer program instructions executable by at least one computer processor to perform a method, the method comprising:
receiving, by an automatic speech recognition system executed by the at least one computer processor, a first plurality of inputs into a target application while the target application is in a first state;
training a first language model based on the first plurality of inputs;
identifying, by an operating system hooking component included in the speech recognition system, a user interface element displayed in a foreground of a graphical user interface and its state by intercepting messages between the user interface element and the computer processor's operating system;
associating the first language model with the identified user interface element;
determining that the target application is in the first state, wherein determining further comprises determining that the user interface element is in the identified state;
applying the first language model to a first speech input in response to determining that the target application is in the first state;
receiving a second plurality of inputs into the target application while the target application is in a second state that differs from the first state;
training a second language model based on the second plurality of inputs;
determining that the target application is in the second state; and
applying the second language model to second speech input in response to determining that the target application is in the second state.
- View Dependent Claims (34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62)
Specification