Configurable speech recognition system using multiple recognizers
Abstract
Techniques for combining the results of multiple recognizers in a distributed speech recognition architecture. Speech data input to a client device is encoded and processed both locally and remotely by different recognizers configured to be proficient at different speech recognition tasks. The client/server architecture is configurable to enable network providers to specify a policy directed to a trade-off between reducing recognition latency perceived by a user and usage of network resources. The results of the local and remote speech recognition engines are combined based, at least in part, on logic stored by one or more components of the client/server architecture.
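The abstract does not say what the combining logic or the provider policy looks like. As a minimal sketch only, the Python below assumes a policy made of two hypothetical settings (`max_wait_ms` and `local_accept`) and a hypothetical `combine` function that chooses between a local and a remote result. None of these names or rules come from the patent; they illustrate one way a latency versus network-usage trade-off might be expressed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    text: str
    confidence: float  # assumed to lie in [0, 1]
    latency_ms: int    # time at which this result became available


@dataclass
class Policy:
    # Both fields are hypothetical knobs a network provider might set.
    max_wait_ms: int     # longest the client waits for the remote result
    local_accept: float  # local confidence at which the remote result is ignored


def combine(local: Result, remote: Optional[Result], policy: Policy) -> Result:
    """Pick one result; an assumed rule, not the patent's stated logic."""
    # A confident local result is used immediately, keeping latency low.
    if local.confidence >= policy.local_accept:
        return local
    # A missing or late remote result falls back to the local one.
    if remote is None or remote.latency_ms > policy.max_wait_ms:
        return local
    # Otherwise the more confident of the two results wins.
    return remote if remote.confidence > local.confidence else local
```

Under these assumptions, raising `local_accept` or `max_wait_ms` leans on the remote recognizer more often, and lowering them favors low latency and reduced network use.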
22 Claims
1. A method of performing speech recognition in a distributed system comprising an electronic device including an embedded speech recognizer and a network device including a remote speech recognizer remote from the electronic device, the method comprising:
receiving, by the electronic device, input audio comprising speech;
accessing, prior to sending at least a portion of the input audio to the network device, personal information stored on the electronic device, wherein the personal information is associated with the electronic device and/or a user of the electronic device;
processing, by the embedded speech recognizer prior to sending at least a portion of the input audio to the network device, at least a portion of the input audio to produce recognized speech and a confidence value for the recognized speech, wherein the processing is based, at least in part, on the accessed personal information stored on the electronic device;
determining based, at least in part, on the confidence value, whether to send at least a portion of the input audio to the network device for speech recognition by the remote speech recognizer, wherein the remote speech recognizer does not have access to the personal information stored on the electronic device; and
sending, in response to determining to send at least a portion of the input audio to the network device, the at least a portion of the input audio and at least some of the accessed personal information stored on the electronic device to the network device.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A non-transitory computer-readable storage medium encoded with a plurality of instructions that, when executed by at least one processor on an electronic device in a distributed speech recognition system comprising the electronic device having an embedded speech recognizer and a network device having a remote speech recognizer remote from the electronic device, perform a method comprising:
receiving, by the electronic device, input audio comprising speech;
accessing, prior to sending at least a portion of the input audio to the network device, personal information stored on the electronic device, wherein the personal information is associated with the electronic device and/or a user of the electronic device;
processing, by the embedded speech recognizer prior to sending at least a portion of the input audio to the network device, at least a portion of the input audio to produce recognized speech and a confidence value for the recognized speech, wherein the processing is based, at least in part, on the accessed personal information stored on the electronic device;
determining based, at least in part, on the confidence value, whether to send at least a portion of the input audio to the network device for speech recognition by the remote speech recognizer, wherein the remote speech recognizer does not have access to the personal information stored on the electronic device; and
sending, in response to determining to send at least a portion of the input audio to the network device, the at least a portion of the input audio and at least some of the accessed personal information stored on the electronic device to the network device.
Dependent claims: 14, 15, 16, 17
18. An electronic device for use in a distributed speech recognition system comprising the electronic device and a network device remote from the electronic device, the electronic device comprising:
at least one storage device configured to store personal information associated with the electronic device and/or a user of the electronic device; and
an embedded speech recognizer configured to:
  receive input audio comprising speech;
  access, prior to sending at least a portion of the input audio to the network device, personal information stored on the electronic device, wherein the personal information is associated with the electronic device and/or a user of the electronic device;
  process, prior to sending at least a portion of the input audio to the network device, at least a portion of the input audio to produce recognized speech and a confidence value for the recognized speech, wherein the processing is based, at least in part, on the accessed personal information stored on the electronic device; and
at least one processor programmed to:
  determine based, at least in part, on the confidence value, whether to send at least a portion of the input audio to the network device for speech recognition by the remote speech recognizer, wherein the remote speech recognizer does not have access to the personal information stored on the electronic device; and
  send, in response to determining to send at least a portion of the input audio to the network device, the at least a portion of the input audio and at least some of the accessed personal information stored on the electronic device to the network device.
Dependent claims: 19, 20, 21, 22
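The independent claims (1, 13 and 18) recite the same sequence of steps: recognize locally using on-device personal information, then decide from the confidence value whether to forward audio and some of that information to the network device. The Python below is a rough sketch of that flow and not the patented implementation. The fixed `threshold`, the `shareable_keys` filter, and all function and class names are hypothetical. The claims require only that the decision be based "at least in part" on the confidence value and that "at least some" of the accessed information be sent.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class LocalResult:
    text: str
    confidence: float  # assumed to lie in [0, 1]


@dataclass
class RemoteRequest:
    audio: bytes
    personal_info: dict


def recognize(
    audio: bytes,
    personal_info: dict,
    embedded: Callable[[bytes, dict], LocalResult],
    send_to_network: Callable[[RemoteRequest], str],
    threshold: float = 0.8,                 # hypothetical cut-off
    shareable_keys: tuple = ("contacts",),  # hypothetical subset to forward
) -> str:
    # The embedded recognizer runs first, using personal info read on-device.
    local = embedded(audio, personal_info)
    # Decide from the confidence value whether the remote recognizer is needed.
    if local.confidence >= threshold:
        return local.text
    # Forward the audio together with some of the accessed personal info.
    subset = {k: personal_info[k] for k in shareable_keys if k in personal_info}
    return send_to_network(RemoteRequest(audio=audio, personal_info=subset))


# Stand-ins for real recognizers, for illustration only.
def fake_embedded(audio: bytes, info: dict) -> LocalResult:
    name = audio.decode()
    known = name in info.get("contacts", [])
    return LocalResult(text=f"call {name}", confidence=0.95 if known else 0.3)


sent = []


def fake_network(req: RemoteRequest) -> str:
    sent.append(req)
    return "remote transcript"
```

With `info = {"contacts": ["alice"], "notes": ["x"]}`, calling `recognize(b"alice", info, fake_embedded, fake_network)` stays on the device, while `recognize(b"bob", info, fake_embedded, fake_network)` falls below the threshold and forwards the audio with only the `contacts` entry.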
Specification