Speech Recognition Using Loosely Coupled Components

US 20120316871A1
Filed: 06/08/2012
Published: 12/13/2012
Est. Priority Date: 06/13/2011
Status: Active Grant


10 Assignments

0 Petitions

Abstract

An automatic speech recognition system includes an audio capture component, a speech recognition processing component, and a result processing component which are distributed among two or more logical devices and/or two or more physical devices. In particular, the audio capture component may be located on a different logical device and/or physical device from the result processing component. For example, the audio capture component may be on a computer connected to a microphone into which a user speaks, while the result processing component may be on a terminal server which receives speech recognition results from a speech recognition processing server.
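The key point of the abstract is that the three components only exchange messages, so each one can run on a different machine. The sketch below illustrates this under stated assumptions and is not the patented implementation. The class names are hypothetical, two in-process `queue.Queue` objects stand in for the network connections between devices, and the "recognizer" is a placeholder that simply decodes the bytes it receives as text.

```python
from dataclasses import dataclass
from queue import Queue
from typing import Callable, List


@dataclass
class CapturedAudio:
    user_id: str
    samples: bytes  # raw audio; the encoding is left unspecified in this sketch


@dataclass
class RecognitionResult:
    user_id: str
    text: str


class AudioCaptureComponent:
    """Runs on the device the microphone is attached to (e.g. a desktop PC)."""

    def __init__(self, outbox: Queue):
        self.outbox = outbox  # stands in for a network link to the recognizer

    def capture(self, user_id: str, samples: bytes) -> None:
        self.outbox.put(CapturedAudio(user_id, samples))


class SpeechRecognitionComponent:
    """Runs on a recognition server; `recognize` is a placeholder, not a real ASR engine."""

    def __init__(self, inbox: Queue, outbox: Queue, recognize: Callable[[bytes], str]):
        self.inbox, self.outbox, self.recognize = inbox, outbox, recognize

    def process_one(self) -> None:
        audio = self.inbox.get()
        self.outbox.put(RecognitionResult(audio.user_id, self.recognize(audio.samples)))


class ResultProcessingComponent:
    """Runs on yet another machine, e.g. a terminal server hosting the application."""

    def __init__(self, inbox: Queue):
        self.inbox = inbox
        self.document: List[str] = []

    def process_one(self) -> str:
        result = self.inbox.get()
        self.document.append(result.text)  # "result output": text added to a document
        return result.text


audio_link, result_link = Queue(), Queue()
capture = AudioCaptureComponent(audio_link)
recognizer = SpeechRecognitionComponent(audio_link, result_link, recognize=lambda s: s.decode())
processor = ResultProcessingComponent(result_link)

capture.capture("dr_smith", b"patient is stable")
recognizer.process_one()
print(processor.process_one())  # patient is stable
```

Each component only knows the queue it reads from and the queue it writes to. In principle, any of them could therefore move to another host by replacing the queues with network channels.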

10 Citations

104 Claims

1. A system comprising:
- a first device including an audio capture component, the audio capture component comprising means for capturing an audio signal representing speech of a user to produce a captured audio signal;
  
  a speech recognition processing component comprising means for performing automatic speech recognition on the captured audio signal to produce speech recognition results;
  
  a second device including a result processing component;
  
  a context sharing component comprising:
  
  means for determining that the result processing component is associated with a current context of the user; and
  
  wherein the result processing component comprises means for processing the speech recognition results to produce result output.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29)
- - 2. The system of claim 1:
    - wherein the system further comprises means for providing the speech recognition results to the result processing component in response to the determination that the result processing component is associated with the current context of the user.
  - 3. The system of claim 2, wherein the context sharing component comprises the means for providing the speech recognition results to the result processing component.
  - 4. The system of claim 2, wherein the speech recognition processing component comprises the means for providing the speech recognition results to the result processing component.
  - 5. The system of claim 2, wherein the means for determining comprises means for determining, at run-time, that the result processing component is associated with the user.
  - 6. The system of claim 2, wherein the means for providing the speech recognition results to the result processing component comprises means for providing the speech recognition results to the result processing component in real-time.
  - 7. The system of claim 1, wherein the system further comprises an audio capture device coupled to the first device, and wherein the audio capture device comprises means for capturing the speech of the user, means for producing the audio signal representing the speech of the user, and means for providing the audio signal to the audio capture component;
    - and wherein the first device does not include the speech recognition processing component.
  - 8. The system of claim 7, wherein the first device further comprises means for transmitting the captured audio signal to the speech recognition processing component over a network connection.
  - 9. The system of claim 1, wherein the second device further includes a terminal session manager.
  - 10. The system of claim 9, wherein the second device further includes the speech recognition processing component.
  - 11. The system of claim 9, wherein the first device further comprises a terminal services client, wherein the terminal services client comprises means for establishing a terminal services connection with the terminal session manager.
  - 12. The system of claim 9, further comprising a third device, wherein the third device includes the speech recognition processing component, and wherein the third device does not include a terminal session manager.
  - 13. The system of claim 1, wherein the second device further includes the speech recognition processing component.
  - 14. The system of claim 1, further comprising a third device, wherein the third device includes the speech recognition processing component.
  - 15. The system of claim 1, wherein the first device comprises a logical device.
  - 16. The system of claim 1, wherein the first device comprises a physical device.
  - 17. The system of claim 1, wherein the second device comprises a logical device.
  - 18. The system of claim 1, wherein the second device comprises a physical device.
  - 19. The system of claim 1:
    - wherein the first device further comprises the speech recognition processing component;
      
      wherein the system further comprises a third device;
      
      wherein the second device further includes means for providing the result output to the third device; and
      
      wherein the third device comprises means for providing output representing the result output to the user.
  - 20. The system of claim 19:
    - wherein the third device comprises a terminal services client;
      
      wherein the means for providing the result output to the third device comprises a terminal session manager in the second device; and
      
      wherein the terminal services client comprises the means for providing output representing the result output to the user.
  - 21. The system of claim 20, further comprising:
    - an audio capture device comprising means for capturing the speech of the user, means for producing the audio signal representing the speech of the user, and means for transmitting the audio signal to the audio capture component over a network connection.
  - 22. The system of claim 21, wherein the audio capture device is not connected to the third device.
  - 23. The system of claim 20, wherein the second device further includes the speech recognition processing component.
  - 24. The system of claim 20, further comprising a third device, wherein the third device includes the speech recognition processing component.
  - 25. The system of claim 1, wherein the result processing component further comprises:
    - means for providing the result output to an application;
      
      means for obtaining data representing a state of the application; and
      
      means for providing the data representing the state of the application to the speech recognition processing component.
  - 26. The system of claim 25, wherein the speech recognition processing component further comprises:
    - means for receiving the data representing the state of the application; and
      
      means for changing a speech recognition context of the speech recognition processing component based on the state of the application.
  - 27. The system of claim 26, wherein the means for changing the speech recognition context comprises means for changing a language model of the speech recognition processing component.
  - 28. The system of claim 26, wherein the means for changing the speech recognition context comprises means for changing an acoustic model of the speech recognition processing component.
  - 29. The system of claim 1, wherein the means for performing automatic speech recognition comprises means for performing automatic speech recognition on the captured audio signal to produce the speech recognition results in real-time.
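Claims 1 to 6 (and, with two result processors, claims 59 and 60) require a context sharing component that determines at run time which result processing component belongs to the user's current context, and then forwards the results there. The claims do not say how that determination is made. The following sketch assumes a simple per-user registry that is updated whenever the user switches applications. `ContextSharingComponent`, `set_context`, and `provide` are hypothetical names, and dropping results when no association exists is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

ResultHandler = Callable[[str], None]


@dataclass
class ContextSharingComponent:
    # Maps a user ID to the result processing component associated with that
    # user's current context (e.g. the terminal session the user is working in).
    current: Dict[str, ResultHandler] = field(default_factory=dict)

    def set_context(self, user_id: str, handler: ResultHandler) -> None:
        """Record, at run time, which result processor the user is working in."""
        self.current[user_id] = handler

    def provide(self, user_id: str, text: str) -> bool:
        """Forward results only if a processor is associated with the user's context."""
        handler = self.current.get(user_id)
        if handler is None:
            return False  # no association: the results are not delivered
        handler(text)
        return True


editor_a: List[str] = []
editor_b: List[str] = []
ctx = ContextSharingComponent()

ctx.set_context("dr_smith", editor_a.append)
ctx.provide("dr_smith", "first dictation")
ctx.set_context("dr_smith", editor_b.append)  # the user switched applications
ctx.provide("dr_smith", "second dictation")
print(editor_a, editor_b)  # ['first dictation'] ['second dictation']
```

The second call to `set_context` mirrors claim 60: after the user's context changes, later results reach a different result processing component without any change to the recognizer.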

30. A method, for use with a system, the method performed by at least one processor executing computer program instructions stored on a non-transitory computer-readable medium:
- wherein the system comprises:
  
  a first device including an audio capture component;
  
  a speech recognition processing component; and
  
  a second device including a result processing component;
  
  wherein the method comprises:
  
  (A) using the audio capture component to capture an audio signal representing speech of a user to produce a captured audio signal;
  
  (B) using the speech recognition processing component to perform automatic speech recognition on the captured audio signal to produce speech recognition results;
  
  (C) determining that the result processing component is associated with a current context of the user;
  
  (D) in response to the determination that the result processing component is associated with the current context of the user, providing the speech recognition results to the result processing component; and
  
  (E) using the result processing component to process the speech recognition results to produce result output.
- Dependent Claims (31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58)
- - 31. The method of claim 30, further comprising:
    - (F) providing the speech recognition results to the result processing component in response to the determination that the result processing component is associated with the current context of the user.
  - 32. The method of claim 30, wherein the system further comprises a context sharing component, and wherein the context sharing component performs (C) and (D).
  - 33. The method of claim 30, wherein the speech recognition processing component performs (D).
  - 34. The method of claim 30, wherein (C) comprises determining at run-time that the result processing component is associated with the user.
  - 35. The method of claim 30, wherein (D) comprises providing the speech recognition results to the result processing component in real-time.
  - 36. The method of claim 30:
    - wherein the system further comprises:
      
      an audio capture device coupled to the first device; and
      
      wherein the method further comprises using the audio capture device to:
      
      (F) capture the speech of the user;
      
      (G) produce the audio signal representing the speech of the user;
      
      (H) provide the audio signal to the audio capture component; and
      
      wherein the first device does not include the speech recognition processing component.
  - 37. The method of claim 36, further comprising:
    - (I) using the first device to transmit the captured audio signal to the speech recognition processing component over a network connection.
  - 38. The method of claim 30, wherein the second device further includes a terminal session manager.
  - 39. The method of claim 38, wherein the second device further includes the speech recognition processing component.
  - 40. The method of claim 38, wherein the first device further comprises a terminal services client, wherein the method further comprises:
    - (F) using the terminal services client to establish a terminal services connection with the terminal session manager.
  - 41. The method of claim 38, further comprising a third device, wherein the third device includes the speech recognition processing component, and wherein the third device does not include a terminal session manager.
  - 42. The method of claim 30, wherein the second device further includes the speech recognition processing component.
  - 43. The method of claim 30, further comprising a third device, wherein the third device includes the speech recognition processing component.
  - 44. The method of claim 30, wherein the first device comprises a logical device.
  - 45. The method of claim 30, wherein the first device comprises a physical device.
  - 46. The method of claim 30, wherein the second device comprises a logical device.
  - 47. The method of claim 30, wherein the second device comprises a physical device.
  - 48. The method of claim 30:
    - wherein the first device further comprises the speech recognition processing component;
      
      wherein the system further comprises a third device;
      
      wherein the second device further includes means for providing the result output to the third device; and
      
      wherein the method further comprises:
      
      (F) using the means for providing the result output to provide the result output to the third device; and
      
      (G) using the third device to provide output representing the result output to the user.
  - 49. The method of claim 48:
    - wherein the third device comprises a terminal services client;
      
      wherein the means for providing the result output to the third device comprises a terminal session manager in the second device; and
      
      wherein the terminal services client comprises the means for providing output representing the result output to the user.
  - 50. The method of claim 49:
    - wherein the system further comprises an audio capture device; and
      
      wherein the method further comprises using the audio capture device to:
      
      (H) capture the speech of the user;
      
      (I) produce the audio signal representing the speech of the user; and
      
      (J) transmit the audio signal to the audio capture component over a network connection.
  - 51. The method of claim 50, wherein the audio capture device is not connected to the third device.
  - 52. The method of claim 49, wherein the second device further includes the speech recognition processing component.
  - 53. The method of claim 49, wherein the system further comprises a third device, and wherein the third device includes the speech recognition processing component.
  - 54. The method of claim 30, further comprising using the result processing component to:
    - (F) provide the result output to an application;
      
      (G) obtain data representing a state of the application; and
      
      (H) provide the data representing the state of the application to the speech recognition processing component.
  - 55. The method of claim 30, further comprising using the speech recognition processing component to:
    - (F) receive the data representing the state of the application; and
      
      (G) change a speech recognition context of the speech recognition processing component based on the state of the application.
  - 56. The method of claim 55, wherein (E) comprises changing a language model of the speech recognition processing component.
  - 57. The method of claim 55, wherein (E) comprises changing an acoustic model of the speech recognition processing component.
  - 58. The method of claim 30, wherein (B) comprises performing automatic speech recognition on the captured audio signal to produce the speech recognition results in real-time.
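Claims 25 to 28 (and method claims 54 to 57) describe a feedback loop. The result processing component reports the application's state to the recognizer, which then changes its speech recognition context, for example by switching language models. The toy sketch below models that loop under a strong simplifying assumption: a "language model" is just a list of allowed phrases per application state. Real recognizers use statistical models, and the `difflib` matching here only mimics the effect of a constrained vocabulary. All class, method, and state names are hypothetical.

```python
import difflib
from typing import Dict, List


class SpeechRecognitionComponent:
    """Toy recognizer whose 'language model' is a list of allowed phrases per state."""

    def __init__(self, language_models: Dict[str, List[str]], default_state: str):
        self.language_models = language_models
        self.active = language_models[default_state]

    def on_application_state(self, state: str) -> None:
        # Change the recognition context to match the reported application state;
        # an unknown state leaves the current language model in place.
        self.active = self.language_models.get(state, self.active)

    def recognize(self, raw_hypothesis: str) -> str:
        # Stand-in for decoding: snap a raw hypothesis to the closest phrase the
        # active language model allows, or keep it unchanged if nothing is close.
        matches = difflib.get_close_matches(raw_hypothesis, self.active, n=1, cutoff=0.5)
        return matches[0] if matches else raw_hypothesis


class ResultProcessingComponent:
    """Sits next to the target application and reports its state back."""

    def __init__(self, recognizer: SpeechRecognitionComponent):
        self.recognizer = recognizer
        self.focused_field = "free_text"

    def focus(self, field_name: str) -> None:
        # Obtain the application's state (here: the focused form field) and
        # provide it to the speech recognition processing component.
        self.focused_field = field_name
        self.recognizer.on_application_state(field_name)


models = {
    "free_text": ["patient is stable", "no acute distress"],
    "medication": ["metoprolol", "metformin", "lisinopril"],
}
asr = SpeechRecognitionComponent(models, default_state="free_text")
app = ResultProcessingComponent(asr)
app.focus("medication")
print(asr.recognize("metforman"))  # metformin
```

Swapping an acoustic model (claims 28 and 57) would follow the same pattern. Only the object selected in `on_application_state` would differ.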

59. A system comprising:
- an audio capture component, the audio capture component comprising means for capturing a first audio signal representing first speech of a user to produce a first captured audio signal;
  
  a speech recognition processing component comprising means for performing automatic speech recognition on the first captured audio signal to produce first speech recognition results;
  
  a first result processing component, the first result processing component comprising first means for processing the first speech recognition results to produce first result output;
  
  a second result processing component, the second result processing component comprising second means for processing the first speech recognition results to produce second result output;
  
  a context sharing component comprising means for identifying a first one of the first and second result processing components as being associated with a first context of the user at a first time; and
  
  speech recognition result provision means for providing the first speech recognition results to the identified first one of the first and second result processing components.
- Dependent Claims (60)
- - 60. The system of claim 59, wherein:
    - the audio capture component further comprises means for capturing a second audio signal representing second speech of the user to produce a second captured audio signal;
      
      the speech recognition processing component further comprises means for performing automatic speech recognition on the second captured audio signal to produce second speech recognition results;
      
      the context sharing component further comprises means for identifying a second one of the first and second result processing components as being associated with a second context of the user at a second time, wherein the second one of the first and second result processing components differs from the first one of the first and second result processing components; and
      
      wherein the speech recognition result provision means further comprises means for providing the second speech recognition results to the identified second one of the first and second result processing components.

61. A computer-implemented method for use with a system:
- wherein the system comprises:
  
  an audio capture component;
  
  a speech recognition processing component;
  
  a first result processing component;
  
  a second result processing component;
  
  a context sharing component; and
  
  speech recognition result provision means;
  
  wherein the method comprises:
  
  (A) using the audio capture component to capture a first audio signal representing first speech of a user to produce a first captured audio signal;
  
  (B) using the speech recognition processing component to perform automatic speech recognition on the first captured audio signal to produce first speech recognition results;
  
  (C) using the first result processing component to process the first speech recognition results to produce first result output;
  
  (D) using the second result processing component to process the first speech recognition results to produce second result output;
  
  (E) using the context sharing component to identify a first one of the first and second result processing components as being associated with a first context of the user at a first time;
  
  (F) using the speech recognition result provision means to provide the first speech recognition results to the identified first one of the first and second result processing components.
- Dependent Claims (62)
- - 62. The method of claim 61, further comprising:
    - (G) using the audio capture component to capture a second audio signal representing second speech of the user to produce a second captured audio signal;
      
      (H) using the speech recognition processing component to perform automatic speech recognition on the second captured audio signal to produce second speech recognition results;
      
      (I) using the context sharing component to identify a second one of the first and second result processing components as being associated with a second context of the user at a second time, wherein the second one of the first and second result processing components differs from the first one of the first and second result processing components; and
      
      (J) using the speech recognition result provision means to provide the second speech recognition results to the identified second one of the first and second result processing components.

63. A system comprising:
- a first audio capture component comprising first means for capturing a first audio signal representing speech of a user to produce a first captured audio signal;
  
  a first speech recognition processing component comprising first means for performing automatic speech recognition on the first captured audio signal to produce first speech recognition results;
  
  a first result processing component comprising first means for processing the first speech recognition results to produce first result output;
  
  a context sharing component comprising means for dynamically coupling at least two of the first audio capture component, the first speech recognition processing component, and the first result processing component to each other.
- Dependent Claims (64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81)
- - 64. The system of claim 63, further comprising a first device, wherein the first device comprises the first audio capture component, and a second device, wherein the second device includes the first result processing component, wherein the first device is distinct from the second device.
  - 65. The system of claim 63, wherein the context sharing component comprises means for dynamically coupling the first audio capture component to the first speech recognition processing component.
  - 66. The system of claim 65:
    - further comprising a second audio capture component comprising second means for capturing a second audio signal representing speech of a user to produce a second captured audio signal; and
      
      wherein the context sharing component further comprises means for dynamically coupling the second audio capture component to the first speech recognition processing component.
  - 67. The system of claim 65, further comprising:
    - means for providing the first captured audio signal to the first speech recognition processing component after dynamically coupling the first audio capture component to the first speech recognition processing component.
  - 68. The system of claim 67, wherein the context sharing component comprises the means for providing the first captured audio signal.
  - 69. The system of claim 67, wherein the first audio capture component comprises the means for providing the first captured audio signal.
  - 70. The system of claim 67, wherein the first speech recognition processing component comprises the means for providing the first captured audio signal.
  - 71. The system of claim 67, wherein the means for providing comprises means for providing the first captured audio signal to the first speech recognition processing component in real-time.
  - 72. The system of claim 63, wherein the context sharing component comprises means for dynamically coupling the first audio capture component to the first result processing component.
  - 73. The system of claim 63, wherein the context sharing component comprises means for dynamically coupling the first speech recognition processing component to the first result processing component.
  - 74. The system of claim 73:
    - further comprising a second speech recognition processing component comprising second means for performing automatic speech recognition on the first captured audio signal to produce second speech recognition results; and
      
      wherein the context sharing component further comprises means for dynamically coupling the first audio capture component to the second speech recognition processing component.
  - 75. The system of claim 73, further comprising:
    - means for providing the first speech recognition results to the first result processing component after dynamically coupling the first speech recognition processing component to the first result processing component.
  - 76. The system of claim 75, wherein the context sharing component comprises the means for providing the first speech recognition results.
  - 77. The system of claim 75, wherein the first speech recognition processing component comprises the means for providing the first speech recognition results.
  - 78. The system of claim 75, wherein the first result processing component comprises the means for providing the first speech recognition results.
  - 79. The system of claim 75, wherein the means for providing comprises means for providing the first speech recognition results to the first result processing component in real-time.
  - 80. The system of claim 63, wherein the context sharing component comprises means for dynamically coupling the first audio capture component to the first speech recognition processing component and for dynamically coupling the first speech recognition processing component to the first result processing component.
  - 81. The system of claim 63, wherein the means for dynamically coupling comprises means for dynamically coupling at least two of the first audio capture component, the first speech recognition processing component, and the first result processing component to each other at run-time.
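Claims 63 to 81 generalize the idea. The context sharing component dynamically couples any two of the audio capture, speech recognition, and result processing components at run time, for example by attaching a second audio capture component to an existing recognizer (claim 66). The sketch below shows one possible design. It assumes that components are addressed by name and that a coupling is simply an entry in a routing table. This broker design and all names in it are assumptions and are not taken from the patent.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List


class ContextSharingComponent:
    """Hypothetical broker that couples components by name at run time."""

    def __init__(self) -> None:
        self.components: Dict[str, Callable[[object], None]] = {}
        self.links: DefaultDict[str, List[str]] = defaultdict(list)

    def register(self, name: str, receive: Callable[[object], None]) -> None:
        self.components[name] = receive

    def couple(self, source: str, sink: str) -> None:
        """Dynamically couple a source to a registered sink (no restart needed)."""
        if sink not in self.components:
            raise KeyError(f"unknown component: {sink}")
        if sink not in self.links[source]:
            self.links[source].append(sink)

    def decouple(self, source: str, sink: str) -> None:
        if sink in self.links[source]:
            self.links[source].remove(sink)

    def send(self, source: str, payload: object) -> int:
        """Deliver a payload to every sink currently coupled to the source."""
        sinks = self.links.get(source, [])
        for sink in sinks:
            self.components[sink](payload)
        return len(sinks)


received: List[object] = []
broker = ContextSharingComponent()
broker.register("recognizer", received.append)
# Two audio capture components (e.g. a desk microphone and a phone) share one recognizer.
broker.couple("desk_mic", "recognizer")
broker.couple("phone", "recognizer")
broker.send("desk_mic", b"audio-1")
broker.send("phone", b"audio-2")
print(received)  # [b'audio-1', b'audio-2']
```

The recognizer's output can be coupled to a result processing component (claim 73) in the same way: register the processor under its own name and couple it to the source name the recognizer sends on.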

82. A computer-implemented method for use with a system:
- wherein the system comprises:
  
  a first audio capture component;
  
  a first speech recognition processing component;
  
  a first result processing component; and
  
  a context sharing component;
  
  wherein the method comprises:
  
  (A) using the first audio capture component to capture a first audio signal representing speech of a user to produce a first captured audio signal;
  
  (B) using the first speech recognition processing component to perform automatic speech recognition on the first captured audio signal to produce first speech recognition results;
  
  (C) using the first result processing component to process the first speech recognition results to produce first result output;
  
  (D) using the context sharing component to dynamically couple at least two of the first audio capture component, the first speech recognition processing component, and the first result processing component to each other.
- Dependent Claims (83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100)
- - 83. The method of claim 82:
    - wherein the system further comprises:
      
      a first device, wherein the first device comprises the first audio capture component; and
      
      a second device, wherein the second device includes the first result processing component;
      
      wherein the first device is distinct from the second device.
  - 84. The method of claim 82, wherein (D) comprises dynamically coupling the first audio capture component to the first speech recognition processing component.
  - 85. The method of claim 82:
    - wherein the system further comprises a second audio capture component;
      
      wherein the method further comprises:
      
      (E) using the second audio capture component to capture a second audio signal representing speech of a user to produce a second captured audio signal; and
      
      wherein (D) comprises dynamically coupling the second audio capture component to the first speech recognition processing component.
  - 86. The method of claim 84, further comprising:
    - (E) providing the first captured audio signal to the first speech recognition processing component after dynamically coupling the first audio capture component to the first speech recognition processing component.
  - 87. The method of claim 86, wherein (E) is performed by the context sharing component.
  - 88. The method of claim 86, wherein (E) is performed by the first audio capture component.
  - 89. The method of claim 86, wherein (E) is performed by the first speech recognition processing component.
  - 90. The method of claim 86, wherein (E) comprises providing the first captured audio signal to the first speech recognition processing component in real-time.
  - 91. The method of claim 82, wherein (D) comprises dynamically coupling the first audio capture component to the first result processing component.
  - 92. The method of claim 82, wherein (D) comprises dynamically coupling the first speech recognition processing component to the first result processing component.
  - 93. The method of claim 92:
    - wherein the system further comprises a second speech recognition processing component; and
      
      wherein the method further comprises:
      
      (E) performing automatic speech recognition on the first captured audio signal to produce second speech recognition results; and
      
      wherein (D) comprises dynamically coupling the first audio capture component to the second speech recognition processing component.
  - 94. The method of claim 92, further comprising:
    - (F) providing the first speech recognition results to the first result processing component after dynamically coupling the first speech recognition processing component to the first result processing component.
  - 95. The method of claim 94, wherein (F) is performed by the context sharing component.
  - 96. The method of claim 94, wherein (F) is performed by the first speech recognition processing component.
  - 97. The method of claim 94, wherein (F) is performed by the first result processing component.
  - 98. The method of claim 94, wherein (F) comprises providing the first speech recognition results to the first result processing component in real-time.
  - 99. The method of claim 82, wherein (D) comprises dynamically coupling the first audio capture component to the first speech recognition processing component and dynamically coupling the first speech recognition processing component to the first result processing component.
  - 100. The method of claim 82, wherein (D) comprises dynamically coupling at least two of the first audio capture component, the first speech recognition processing component, and the first result processing component to each other at run-time.

101. A system comprising:
- a first machine comprising:
  
  a target application; and
  
  a result processing component comprising:
  
  means for processing first speech recognition results to produce result output;
  
  means for providing the result output to the target application;
  
  an audio capture device, wherein the first machine does not include the audio capture device; and
  
  a context sharing component comprising means for logically coupling the result processing component to the audio capture device.
- Dependent Claims (102)
- - 102. The system of claim 101, wherein the audio capture device comprises a telephone.

103. A computer-implemented method for use with a system:
- the system comprising:
  
  a first machine comprising:
  
  a target application; and
  
  a result processing component comprising;
  
  an audio capture device, wherein the first machine does not include the audio capture device; and
  
  a context sharing component;
  
  wherein the method comprises:
  
  (A) using the result processing component to process first speech recognition results to produce result output;
  
  (B) using the result processing component to provide the result output to the target application; and
  
  (C) using the context sharing component to logically couple the result processing component to the audio capture device.
- Dependent Claims (104)
- - 104. The method of claim 103, wherein the audio capture device comprises a telephone.

Current Assignee
3M Health Information Systems (3M Company)
Original Assignee
MModal IP LLC (3M Company)
Inventors
Koll, Detlef, Finke, Michael

Granted Patent

US 9,082,408 B2
US Class Current

704/231
CPC Class Codes

G10L 15/22   Procedures used during a sp...

G10L 15/30   Distributed recognition, e....

G10L 2015/228   of application context
