Multiple-source speech dialog input
First Claim
1. A system comprising:
a handheld remote control having a first microphone and a talk button, the handheld remote control being configured to produce a first audio signal using the first microphone in response to a first actuation of the talk button, wherein the first audio signal represents first user speech associated with a user;
a stationary base device having a second microphone, the stationary base device being configured to produce a second audio signal using the second microphone in response to an utterance of a keyword, wherein the second audio signal represents second user speech associated with the user that follows the utterance of the keyword;
wherein the stationary base device is further configured to receive the first audio signal from the handheld remote control;
a speech service configured to receive the first audio signal and the second audio signal from the stationary base device and to engage in a speech dialog with the user to determine an intent of the user, wherein engaging in the speech dialog comprises engaging in a first dialog turn and a second dialog turn with the user;
the speech service being further configured to cause an action to be performed in fulfillment of the intent of the user;
wherein engaging in the first dialog turn comprises:
analyzing the first audio signal to recognize the first user speech;
determining a first meaning of the first user speech;
generating a first speech response to the first user speech; and
directing the stationary base device to play the first speech response; and
wherein engaging in the second dialog turn comprises:
analyzing the second audio signal to recognize the second user speech;
determining a second meaning of the second user speech;
generating a second speech response to the second user speech; and
directing the stationary base device to play the second speech response.
Abstract
A speech system may be configured to operate in conjunction with a stationary base device and a handheld remote device to receive voice commands from a user. A user may direct speech either to the base device or to the handheld device. In order to direct speech to the base device, the user first speaks a keyword. In order to direct speech to the handheld device, the user presses a talk control on the handheld device. A dialog may be conducted with the user in multiple turns, where each turn comprises user speech and a speech response by the speech system. The user speech in any given dialog turn may be provided from the base device and/or the handheld device.
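The turn structure described in the abstract can be illustrated with a minimal sketch. It assumes a dialog is simply an ordered list of turns, each tagged with the device that supplied the user speech. The names `Source`, `Turn`, `Dialog`, and `take_turn`, and the sample utterances, are hypothetical and do not come from the patent; recognition, meaning determination, and response generation are replaced by a placeholder string.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Source(Enum):
    """Where the user speech for a turn came from (hypothetical labels)."""
    BASE = "base"          # speech that follows a spoken keyword
    HANDHELD = "handheld"  # speech captured after the talk control is pressed


@dataclass
class Turn:
    source: Source
    user_speech: str
    response: str


@dataclass
class Dialog:
    turns: List[Turn] = field(default_factory=list)

    def take_turn(self, source: Source, user_speech: str) -> str:
        # Placeholder for the speech service: a real system would recognize
        # the speech, determine its meaning, and generate a spoken response.
        response = f"(response to: {user_speech})"
        self.turns.append(Turn(source, user_speech, response))
        return response


# One dialog, two turns, each arriving from a different device.
dialog = Dialog()
dialog.take_turn(Source.HANDHELD, "play some music")
dialog.take_turn(Source.BASE, "something quieter")
```

The point of the sketch is only that both turns belong to the same dialog even though the audio came from different devices.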
22 Claims
4. A method comprising:
engaging in a speech dialog with a user to determine an intent of the user, wherein engaging in the speech dialog comprises engaging in a first dialog turn, a second dialog turn, and a third dialog turn with the user;
wherein engaging in the first dialog turn comprises:
receiving a first audio signal produced by a first device, the first audio signal representing first user speech; and
determining, based at least in part on providing the first audio signal to a speech service, a first meaning of the first user speech;
wherein engaging in the second dialog turn comprises:
receiving a second audio signal that is produced using a microphone of a second device, the second audio signal representing second user speech; and
determining, based at least in part on providing the second audio signal to the speech service, a second meaning of the second user speech; and
wherein engaging in the third dialog turn comprises:
receiving, based at least in part on an actuation of a talk control of the first device within a predefined time period after engaging in the second dialog turn, a third audio signal from the first device, the third audio signal representing third user speech; and
determining a third meaning of the third user speech.
Dependent claims: 5, 6, 7, 8, 16
9. A method comprising:
engaging in a speech dialog to determine an intent of a user, wherein engaging in the speech dialog comprises engaging in a first dialog turn and a second dialog turn;
wherein engaging in the first dialog turn comprises:
determining an actuation of a talk control associated with a first device;
receiving a first audio signal produced by the first device, wherein the first audio signal represents first user speech; and
determining a first meaning of the first user speech;
wherein engaging in the second dialog turn comprises:
receiving, after engaging in the first dialog turn, a second audio signal representing second user speech, the second audio signal produced using a microphone of a second device; and
determining a second meaning based at least in part on:
receiving a third audio signal produced by the first device based at least in part on an actuation of the talk control of the first device within a predefined time period after the first dialog turn, the third audio signal representing third user speech; or
receiving the second audio signal based at least in part on the predefined time period elapsing without actuation of the talk control.
Dependent claims: 10, 11, 12, 13, 14, 15
17. A method comprising:
engaging in a speech dialog with a user to determine an intent of the user, wherein engaging in the speech dialog comprises engaging in a first dialog turn, a second dialog turn, and a third dialog turn with the user;
wherein engaging in the first dialog turn comprises:
receiving a first audio signal produced by a first device, the first audio signal representing first user speech; and
determining, based at least in part on providing the first audio signal to a speech service, a first meaning of the first user speech;
wherein engaging in the second dialog turn comprises:
receiving a second audio signal produced using a microphone of a second device, the second audio signal representing second user speech; and
determining, based at least in part on providing the second audio signal to the speech service, a second meaning of the second user speech; and
wherein engaging in the third dialog turn comprises:
receiving, based at least in part on a predefined time period elapsing after engaging in the second dialog turn without actuation of a talk control of the first device, a third audio signal from the second device, the third audio signal representing third user speech; and
determining a third meaning of the third user speech.
Dependent claims: 18, 19, 20, 21, 22
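The timing condition recited in claims 4, 9, and 17 can be sketched as a single decision: after a dialog turn ends, audio for the next turn is taken from the first device if its talk control is actuated within the predefined time period, and from the second device if that period elapses without actuation. The function name `select_next_source`, the use of seconds as the unit, and the inclusive treatment of the period boundaries are assumptions made for illustration; the claims do not specify them.

```python
from typing import Optional


def select_next_source(turn_end: float,
                       talk_press: Optional[float],
                       period: float) -> str:
    """Pick the device that supplies audio for the next dialog turn.

    turn_end   -- time (seconds) at which the previous dialog turn ended
    talk_press -- time of the talk-control actuation, or None if never pressed
    period     -- the predefined time period (hypothetical value, in seconds)
    """
    # Assumption: both ends of the period are treated as inclusive.
    pressed_in_period = (
        talk_press is not None
        and turn_end <= talk_press <= turn_end + period
    )
    if pressed_in_period:
        return "first_device"   # talk control actuated within the period
    return "second_device"      # period elapsed without actuation


# Talk control pressed 2 s after the turn ended, with a 5 s period.
print(select_next_source(10.0, 12.0, 5.0))
# No press at all: the second device's microphone is used instead.
print(select_next_source(10.0, None, 5.0))
```

A press that arrives only after the period has elapsed is treated here the same as no press, which is one possible reading of the claim language rather than the only one.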
Specification