Configurable speech recognition system using multiple recognizers
Abstract
Techniques for combining the results of multiple recognizers in a distributed speech recognition architecture. Speech data input to a client device is encoded and processed both locally and remotely by different recognizers configured to be proficient at different speech recognition tasks. The client/server architecture is configurable to enable network providers to specify a policy directed to a trade-off between reducing recognition latency perceived by a user and usage of network resources. The results of the local and remote speech recognition engines are combined based, at least in part, on logic stored by one or more components of the client/server architecture.
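The abstract does not spell out the provider policy or the combination logic. The Python sketch below shows one hypothetical way to express them. `NetworkPolicy`, `Result`, `should_send_remote` and `combine` are illustrative names, and both policy values are assumptions. Keeping the higher-confidence hypothesis also assumes that local and remote confidence scores are comparable, which real engines do not guarantee.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class NetworkPolicy(Enum):
    # Hypothetical policy values; the abstract only says a provider can trade
    # perceived latency against network-resource usage.
    PARALLEL = "parallel"    # always query the server too (lower latency, more traffic)
    ON_DEMAND = "on_demand"  # query the server only when the local result is weak


@dataclass
class Result:
    text: str
    confidence: float  # assumed to be normalized to [0, 1] for both engines
    source: str        # "local" or "remote"


def should_send_remote(policy: NetworkPolicy, local: Optional[Result],
                       threshold: float) -> bool:
    """Decide whether to spend network resources on the remote recognizer."""
    if policy is NetworkPolicy.PARALLEL:
        return True
    # ON_DEMAND: contact the server only if there is no confident local result.
    return local is None or local.confidence < threshold


def combine(local: Optional[Result], remote: Optional[Result]) -> Optional[Result]:
    """Hypothetical arbitration rule: keep the higher-confidence hypothesis."""
    candidates = [r for r in (local, remote) if r is not None]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r.confidence)
```

Under `ON_DEMAND`, a confident local result keeps the request off the network. `PARALLEL` spends the bandwidth on every utterance so the user never waits for the local pass before the server is contacted.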
18 Claims
1. A method of performing speech recognition in a distributed system comprising an electronic device including an embedded speech recognizer and a network device including a remote speech recognizer remote from the electronic device, the method comprising:
receiving, by the electronic device, input audio comprising speech;
determining that at least a portion of the input audio matches a recognition grammar associated with the embedded speech recognizer;
generating a search tree associated with the recognition grammar, wherein the search tree includes a plurality of nodes, wherein each of the nodes is associated with a type of item in the recognition grammar;
determining whether the recognition grammar includes at least one generic speech node, wherein determining whether the recognition grammar includes at least one generic speech node comprises determining whether the search tree includes only nodes associated with types of items that can be recognized by the embedded speech recognizer with an accuracy above a threshold;
determining that recognition by the remote speech recognizer is desired in response to determining that the recognition grammar includes at least one generic speech node indicating that the speech in the input audio may include free-form dictation; and
sending at least a portion of the input audio to the network device in response to determining that recognition by the remote speech recognizer is desired.
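A minimal sketch of the routing test in claim 1, assuming the grammar has already been compiled into a tree of typed nodes. The item types, the per-type accuracy table and the 0.9 threshold are hypothetical. The claim requires only two things: a search tree containing any node type the embedded recognizer cannot handle above the threshold is treated as a grammar with a generic speech node, and audio is then sent to the network device.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, Dict, Iterator, List


class ItemType(Enum):
    # Hypothetical grammar item types; the claim does not enumerate them.
    KEYWORD = auto()         # fixed command words such as "call"
    CONTACT_NAME = auto()    # entries from an on-device contact list
    DIGITS = auto()          # spoken phone numbers
    GENERIC_SPEECH = auto()  # free-form dictation, e.g. a message body


# Assumed per-type accuracy of the embedded recognizer (illustrative numbers only).
EMBEDDED_ACCURACY: Dict[ItemType, float] = {
    ItemType.KEYWORD: 0.98,
    ItemType.CONTACT_NAME: 0.93,
    ItemType.DIGITS: 0.95,
    ItemType.GENERIC_SPEECH: 0.70,
}


@dataclass
class Node:
    """One node of the search tree built from the recognition grammar."""
    item_type: ItemType
    children: List["Node"] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    """Depth-first traversal over every node in the tree."""
    yield node
    for child in node.children:
        yield from walk(child)


def has_generic_speech_node(root: Node, threshold: float) -> bool:
    """Treat the grammar as containing generic speech unless every node's
    item type is recognized on-device with accuracy above the threshold."""
    return not all(EMBEDDED_ACCURACY[n.item_type] > threshold for n in walk(root))


def route_audio(audio: bytes, root: Node, send: Callable[[bytes], None],
                threshold: float = 0.9) -> bool:
    """Send the audio to the remote recognizer if the grammar may contain
    free-form dictation; return True when it was sent."""
    if has_generic_speech_node(root, threshold):
        send(audio)  # hypothetical transport to the network device
        return True
    return False  # the embedded recognizer handles the utterance locally
```

With these assumed numbers, a call grammar built from a `KEYWORD` node and a `CONTACT_NAME` node stays on the device. A text-message grammar that adds a `GENERIC_SPEECH` body node is routed to the server.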
9. A method of performing speech recognition in a distributed system comprising an electronic device including an embedded speech recognizer and a network device including a remote speech recognizer remote from the electronic device, the method comprising:
receiving, by the electronic device, input audio comprising speech;
determining that at least a portion of the input audio matches a recognition grammar associated with the embedded speech recognizer, wherein determining that at least a portion of the input audio matches a recognition grammar comprises determining that the at least a portion of the input audio matches a voice command for sending a text message, and wherein the recognition grammar is a text message grammar for the voice command that includes at least one generic speech node corresponding to text inserted into a body of the text message; and
sending at least a portion of the input audio to the network device in response to determining that the at least a portion of the input audio matches the voice command for sending a text message.
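Claim 9 triggers the same hand-off when the utterance matches a specific voice command rather than after inspecting a search tree. The sketch below assumes the embedded recognizer reports word-level end times for the command part of the utterance. The command phrasings, the lowercase contact set and the `send` callback are all hypothetical. It forwards only the audio that follows the contact name, which is one reading of "at least a portion"; sending the whole utterance would also fit the claim.

```python
from typing import Callable, Optional, Sequence, Set, Tuple

# (recognized word, end time in seconds) as reported by the embedded recognizer.
Word = Tuple[str, float]

# Hypothetical phrasings covered by the command part of the text message grammar.
COMMAND_PREFIXES: Tuple[Tuple[str, ...], ...] = (
    ("send", "a", "text", "message", "to"),
    ("send", "a", "text", "to"),
    ("text",),
)


def match_sms_command(words: Sequence[Word], contacts: Set[str]) -> Optional[float]:
    """If the words match '<command> <contact>', return the time at which the
    message body (the generic speech node) starts; otherwise return None.
    Contact names in `contacts` are assumed to be lowercase."""
    tokens = [w.lower() for w, _ in words]
    for prefix in COMMAND_PREFIXES:
        n = len(prefix)
        if tuple(tokens[:n]) == prefix and len(tokens) > n and tokens[n] in contacts:
            return words[n][1]  # end time of the contact-name word
    return None


def handle_utterance(samples: Sequence[float], sample_rate: int,
                     words: Sequence[Word], contacts: Set[str],
                     send: Callable[[Sequence[float]], None]) -> bool:
    """Forward the message-body audio to the network device when the
    utterance matches the text-message voice command."""
    body_start = match_sms_command(words, contacts)
    if body_start is None:
        return False  # not a text-message command; no network request
    start = int(body_start * sample_rate)
    send(samples[start:])  # hypothetical transport; only the body is sent here
    return True
```

For "send a text to Mom ..." with the contact name ending at 1.25 s in 16 kHz audio, everything from sample 20000 onward is forwarded, while "call mom" produces no network request.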
10. An electronic device for use in a distributed speech recognition system comprising the electronic device and a network device remote from the electronic device, the electronic device comprising:
an embedded speech recognizer configured to receive input audio comprising speech; and
at least one processor programmed to:
determine that at least a portion of the input audio matches a recognition grammar associated with the embedded speech recognizer;
generate a search tree associated with the recognition grammar, wherein the search tree includes a plurality of nodes, wherein each of the nodes is associated with a type of item in the recognition grammar;
determine whether the recognition grammar includes at least one generic speech node, wherein determining whether the recognition grammar includes at least one generic speech node comprises determining whether the search tree includes only nodes associated with types of items that can be recognized by the embedded speech recognizer with an accuracy above a threshold;
determine that recognition by the remote speech recognizer is desired in response to determining that the recognition grammar includes at least one generic speech node indicating that the speech in the input audio may include free-form dictation; and
send at least a portion of the input audio to the network device in response to determining that recognition by the remote speech recognizer is desired.
18. An electronic device for use in a distributed speech recognition system comprising the electronic device and a network device remote from the electronic device, the electronic device comprising:
an embedded speech recognizer configured to receive input audio comprising speech; and
at least one processor programmed to:
determine that at least a portion of the input audio matches a recognition grammar associated with the embedded speech recognizer, wherein determining that at least a portion of the input audio matches a recognition grammar comprises determining that at least a portion of the input audio matches a voice command for sending a text message, and wherein the recognition grammar is a text message grammar for the voice command that includes at least one generic speech node corresponding to text inserted into a body of the text message; and
send at least a portion of the input audio to the network device in response to determining that at least a portion of the input audio matches the voice command for sending a text message.
Specification