Multiple-source speech dialog input

US 9,792,901 B1
Filed: 12/11/2014
Issued: 10/17/2017
Est. Priority Date: 12/11/2014
Status: Active Grant

1 Assignment

0 Petitions

Abstract

A speech system may be configured to operate in conjunction with a stationary base device and a handheld remote device to receive voice commands from a user. A user may direct speech either to the base device or to the handheld device. In order to direct speech to the base device, the user first speaks a keyword. In order to direct speech to the handheld device, the user presses a talk control on the handheld device. A dialog may be conducted with the user in multiple turns, where each turn comprises user speech and a speech response by the speech system. The user speech in any given dialog turn may be provided from the base device and/or the handheld device.
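
The abstract's routing rule can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `Source`, `Utterance`, `directed_at_system`, and `Dialog` are hypothetical, recognized text stands in for audio and speech recognition, and the response string is a placeholder. The sketch shows two things. First, speech enters the dialog only when it is gated by the talk control (remote) or the keyword (base). Second, turns from either source join the same dialog.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Source(Enum):
    REMOTE = "remote"  # handheld remote: capture starts on a talk-button press
    BASE = "base"      # stationary base device: capture starts after the keyword


@dataclass
class Utterance:
    source: Source
    text: str                    # stand-in for recognized speech (ASR is out of scope)
    talk_pressed: bool = False   # remote only: was the talk control actuated?
    keyword_heard: bool = False  # base only: did the keyword precede the speech?


def directed_at_system(u: Utterance) -> bool:
    """Remote speech needs the talk control; base speech needs the keyword."""
    if u.source is Source.REMOTE:
        return u.talk_pressed
    return u.keyword_heard


@dataclass
class Dialog:
    turns: List[Utterance] = field(default_factory=list)

    def handle(self, u: Utterance) -> Optional[str]:
        """Add one dialog turn and return a (hypothetical) speech response."""
        if not directed_at_system(u):
            return None  # speech not addressed to the system is ignored
        self.turns.append(u)
        # Whichever device captured the speech, the response is played on the base device.
        return f"turn {len(self.turns)} ({u.source.value}): heard '{u.text}'"


dialog = Dialog()
print(dialog.handle(Utterance(Source.REMOTE, "play some music", talk_pressed=True)))
print(dialog.handle(Utterance(Source.BASE, "just chatting")))  # no keyword: None
print(dialog.handle(Utterance(Source.BASE, "by the Beatles", keyword_heard=True)))
```

Keeping the source on each turn, rather than in separate per-device dialogs, is what lets a single dialog mix remote and base-device turns.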

43 Citations

22 Claims

1. A system comprising:
- a handheld remote control having a first microphone and a talk button, the handheld remote control being configured to produce a first audio signal using the first microphone in response to a first actuation of the talk button, wherein the first audio signal represents first user speech associated with a user;
  
  a stationary base device having a second microphone, the stationary base device being configured to produce a second audio signal using the second microphone in response to an utterance of a keyword, wherein the second audio signal represents second user speech associated with the user that follows the utterance of the keyword;
  
  wherein the stationary base device is further configured to receive the first audio signal from the handheld remote control;
  
  a speech service configured to receive the first audio signal and the second audio signal from the stationary base device and to engage in a speech dialog with the user to determine an intent of the user, wherein engaging in the speech dialog comprises engaging in a first dialog turn and a second dialog turn with the user;
  
  the speech service being further configured to cause an action to be performed in fulfillment of the intent of the user;
  
  wherein engaging in the first dialog turn comprises:
  
  analyzing the first audio signal to recognize the first user speech;
  
  determining a first meaning of the first user speech;
  
  generating a first speech response to the first user speech; and
  
  directing the stationary base device to play the first speech response; and
  
  wherein engaging in the second dialog turn comprises:
  
  analyzing the second audio signal to recognize the second user speech;
  
  determining a second meaning of the second user speech;
  
  generating a second speech response to the second user speech; and
  
  directing the stationary base device to play the second speech response.
- Dependent Claims (2, 3)
- - 2. The system of claim 1, wherein:
    - the stationary base device is further configured to produce a third audio signal using the second microphone after engaging in the second dialog turn without further utterance of the keyword, the third audio signal representing third user speech;
      
      engaging in the speech dialog further comprises engaging in a third dialog turn with the user, wherein engaging in the third dialog turn comprises:
      
      analyzing the third audio signal to recognize the third user speech;
      
      determining a third meaning of the third user speech;
      
      generating a third speech response to the third user speech; and
      
      directing the stationary base device to play the third speech response.
  - 3. The system of claim 1, wherein:
    - the stationary base device is further configured to provide a third audio signal to the speech service after engaging in the second dialog turn;
      
      the stationary base device receives the third audio signal from the handheld remote control in response to a second actuation of the talk button within a predefined time period after engaging in the second dialog turn;
      
      the stationary base device produces the third audio signal using the second microphone of the stationary base device in response to the predefined time period elapsing without actuation of the talk button;
      
      engaging in the speech dialog further comprises engaging in a third dialog turn with the user, wherein engaging in the third dialog turn comprises:
      
      analyzing the third audio signal to recognize third user speech;
      
      determining a third meaning of the third user speech;
      
      generating a third speech response to the third user speech; and
      
      directing the stationary base device to play the third speech response.
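
Dependent claim 3 describes a timeout-based choice of audio source for the third dialog turn. The following is a minimal sketch with three assumptions. Python's `threading.Event` models the talk-button actuation. The window is set to 0.5 seconds, whereas the claim says only "a predefined time period". The function name `third_turn_source` is hypothetical.

```python
import threading

# Hypothetical value: the claim recites only "a predefined time period".
FOLLOW_UP_WINDOW_S = 0.5


def third_turn_source(talk_pressed: threading.Event,
                      window_s: float = FOLLOW_UP_WINDOW_S) -> str:
    """Pick the device that supplies the third audio signal (claim 3).

    Event.wait() blocks until the event is set or the timeout expires, and
    returns True only if the event was set.
    """
    if talk_pressed.wait(timeout=window_s):
        return "remote"  # talk button actuated within the window
    return "base"        # window elapsed: base device captures with its own microphone


# Simulated talk-button press 50 ms after the second dialog turn.
pressed = threading.Event()
threading.Timer(0.05, pressed.set).start()
print(third_turn_source(pressed))            # remote

# No press at all: the window elapses and the base device takes over.
print(third_turn_source(threading.Event()))  # base
```

`Event.wait(timeout=...)` returns `True` as soon as the event is set and `False` once the timeout expires. These two return values map directly onto the two branches of the claim.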

4. A method comprising:
- engaging in a speech dialog with a user to determine an intent of the user, wherein engaging in the speech dialog comprises engaging in a first dialog turn, a second dialog turn, and a third dialog turn with the user;
  
  wherein engaging in the first dialog turn comprises:
  
  receiving a first audio signal produced by a first device, the first audio signal representing first user speech; and
  
  determining, based at least in part on providing the first audio signal to a speech service, a first meaning of the first user speech;
  
  wherein engaging in the second dialog turn comprises:
  
  receiving a second audio signal that is produced using a microphone of a second device, the second audio signal representing second user speech; and
  
  determining, based at least in part on providing the second audio signal to the speech service, a second meaning of the second user speech; and
  
  wherein engaging in the third dialog turn comprises:
  
  receiving, based at least in part on an actuation of a talk control of the first device within a predefined time period after engaging in the second dialog turn, a third audio signal from the first device, the third audio signal representing third user speech; and
  
  determining a third meaning of the third user speech.
- Dependent Claims (5, 6, 7, 8, 16)
- - 5. The method of claim 4, further comprising:
    - generating a response to at least one of the first user speech, the second user speech, or the third user speech, wherein the response queries the user regarding the intent of the user.
  - 6. The method of claim 4, further comprising:
    - producing, based at least in part on actuation of the talk control, the first audio signal; and
      
      producing, based at least in part on an utterance of a trigger expression, the second audio signal.
  - 7. The method of claim 4, further comprising:
    - analyzing the first audio signal to recognize the first user speech; and
      
      analyzing the second audio signal to recognize the second user speech.
  - 8. The method of claim 4, wherein receiving the first audio signal comprises receiving the first audio signal from the second device.
  - 16. The method of claim 4, wherein, based at least in part on the first dialog turn and the second dialog turn, the speech service is configured to cause an action to be performed to fulfill the intent of the user.
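
Claims 4, 5, and 16 describe a multi-turn dialog that accumulates meaning until the user's intent is clear, asking the user questions along the way. The sketch below illustrates that loop with simple slot filling. Several details are hypothetical assumptions, because the claims leave them unspecified: the intent names, the `REQUIRED_SLOTS` schema, the `DialogState` class, and the dictionary form of a "meaning".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical intent schema; the patent does not define intents or slots.
REQUIRED_SLOTS: Dict[str, List[str]] = {"PlayMusic": ["artist"], "SetTimer": ["duration"]}


@dataclass
class DialogState:
    intent: Optional[str] = None
    slots: Dict[str, str] = field(default_factory=dict)

    def update(self, meaning: Dict[str, str]) -> str:
        """Merge one turn's meaning, then either query the user or act."""
        self.intent = meaning.get("intent", self.intent)
        self.slots.update({k: v for k, v in meaning.items() if k != "intent"})
        if self.intent is None:
            return "What would you like to do?"
        missing = [s for s in REQUIRED_SLOTS.get(self.intent, []) if s not in self.slots]
        if missing:
            return f"Which {missing[0]}?"  # a response that queries the user's intent
        return f"ACTION {self.intent} {self.slots}"  # intent resolved: perform the action


# Meanings as a speech service might return them. Turn 1 could come from the
# remote and turn 2 from the base device, because the state ignores the source.
state = DialogState()
print(state.update({"intent": "PlayMusic"}))     # Which artist?
print(state.update({"artist": "the Beatles"}))   # ACTION PlayMusic {'artist': 'the Beatles'}
```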

9. A method comprising:
- engaging in a speech dialog to determine an intent of a user, wherein engaging in the speech dialog comprises engaging in a first dialog turn and a second dialog turn;
  
  wherein engaging in the first dialog turn comprises:
  
  determining an actuation of a talk control associated with a first device;
  
  receiving a first audio signal produced by the first device, wherein the first audio signal represents first user speech; and
  
  determining a first meaning of the first user speech;
  
  wherein engaging in the second dialog turn comprises:
  
  receiving, after engaging in the first dialog turn, a second audio signal representing second user speech, the second audio signal produced using a microphone of a second device; and
  
  determining a second meaning based at least in part on:
  
  receiving a third audio signal produced by the first device based at least in part on an actuation of the talk control of the first device within a predefined time period after the first dialog turn, the third audio signal representing third user speech;
  
  or receiving the second audio signal based at least in part on the predefined time period elapsing without actuation of the talk control.
- Dependent Claims (10, 11, 12, 13, 14, 15)
- - 10. The method of claim 9, further comprising:
    - generating an output to at least one of the first user speech, the second user speech, or the third user speech, the output corresponding to a query regarding the intent of the user.
  - 11. The method of claim 9, further comprising:
    - buffering the second audio signal for at least the predefined time period;
      
      determining that the predefined time period has elapsed without actuation of the talk control; and
      
      providing the second audio signal to a speech service.
  - 12. The method of claim 9, further comprising:
    - providing the second audio signal to a speech service during the predefined time period.
  - 13. The method of claim 9, wherein:
    - determining the first meaning comprises providing the first audio signal to a speech service; and
      
      determining the second meaning comprises providing at least one of the second audio signal or the third audio signal to the speech service.
  - 14. The method of claim 9, further comprising:
    - producing the second audio signal without user utterance of a keyword.
  - 15. The method of claim 9, wherein receiving the first audio signal comprises receiving the first audio signal from the second device.
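
Claims 11 and 12 describe two ways for the base device to handle its own audio during the predefined time period. Under claim 11 it buffers the audio until the period elapses; under claim 12 it provides the audio to the speech service during the period. The following is a minimal sketch of the buffering variant, using simulated timestamps in seconds. `FollowUpBuffer`, its methods, and the 3-second window are hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple

# Hypothetical value for the claimed "predefined time period".
WINDOW_S = 3.0


class FollowUpBuffer:
    """Buffers base-device audio after a dialog turn until the window resolves (claim 11)."""

    def __init__(self, turn_end: float, window_s: float = WINDOW_S):
        self.turn_end = turn_end            # time the first dialog turn ended
        self.window_s = window_s
        self.frames: Deque[bytes] = deque()
        self.talk_pressed = False

    def on_frame(self, frame: bytes) -> None:
        self.frames.append(frame)           # hold audio instead of streaming it

    def on_talk_control(self) -> None:
        self.talk_pressed = True

    def resolve(self, now: float) -> Tuple[str, List[bytes]]:
        """Return (decision, audio to provide to the speech service)."""
        if self.talk_pressed:
            self.frames.clear()             # the remote's audio becomes the turn input
            return "use_remote", []
        if now - self.turn_end < self.window_s:
            return "waiting", []            # window still open
        return "send_buffer", list(self.frames)  # window elapsed without actuation


buf = FollowUpBuffer(turn_end=10.0)
buf.on_frame(b"\x01\x02")
buf.on_frame(b"\x03")
print(buf.resolve(now=11.0))  # ('waiting', [])
print(buf.resolve(now=13.5))  # ('send_buffer', [b'\x01\x02', b'\x03'])
```

The two variants trade off latency against wasted work. Buffering never sends audio that will be discarded, but it delays the turn by up to the full window. Streaming during the window, as in claim 12, would avoid that delay, at the cost of speech-service work that is wasted whenever the talk control is pressed.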

17. A method comprising:
- engaging in a speech dialog with a user to determine an intent of the user, wherein engaging in the speech dialog comprises engaging in a first dialog turn, a second dialog turn, and a third dialog turn with the user;
  
  wherein engaging in the first dialog turn comprises:
  
  receiving a first audio signal produced by a first device, the first audio signal representing first user speech; and
  
  determining, based at least in part on providing the first audio signal to a speech service, a first meaning of the first user speech;
  
  wherein engaging in the second dialog turn comprises:
  
  receiving a second audio signal produced using a microphone of a second device, the second audio signal representing second user speech;
  
  determining, based at least in part on providing the second audio signal to the speech service, a second meaning of the second user speech; and
  
  wherein engaging in the third dialog turn comprises:
  
  receiving, based at least in part on a predefined time period elapsing after engaging in the second dialog turn without actuation of a talk control of the first device, a third audio signal from the second device, the third audio signal representing third user speech; and
  
  determining a third meaning of the third user speech.
- Dependent Claims (18, 19, 20, 21, 22)
- - 18. The method of claim 17, further comprising:
    - generating a response to at least one of the first user speech, the second user speech, or the third user speech, wherein the response queries the user regarding the intent of the user.
  - 19. The method of claim 17, further comprising:
    - producing, based at least in part on an actuation of the talk control, the first audio signal; and
      
      producing, based at least in part on an utterance of a trigger expression, the second audio signal.
  - 20. The method of claim 17, further comprising:
    - analyzing the first audio signal to recognize the first user speech; and
      
      analyzing the second audio signal to recognize the second user speech.
  - 21. The method of claim 17, wherein receiving the first audio signal comprises receiving the first audio signal from the second device.
  - 22. The method of claim 19, wherein engaging in the third dialog turn further comprises:
    - producing the third audio signal after engaging in the second dialog turn without further utterance of the trigger expression.
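
Claims 2, 14, and 22 allow the base device to capture a later turn without a fresh keyword or trigger expression. One way to model this is a small state machine. The states, the method names, and the `dialog_open` flag below are hypothetical assumptions, not taken from the patent.

```python
from enum import Enum, auto
from typing import List


class State(Enum):
    IDLE = auto()       # speech is ignored until the trigger expression
    LISTENING = auto()  # trigger heard: the next speech is a dialog turn
    AWAITING = auto()   # turn captured: waiting for the speech response to play
    FOLLOW_UP = auto()  # response played: mic open without a new trigger expression


class BaseDevice:
    """Hypothetical base-device capture logic for claims 2 and 22."""

    def __init__(self) -> None:
        self.state = State.IDLE
        self.captured: List[str] = []

    def on_trigger(self) -> None:
        if self.state is State.IDLE:
            self.state = State.LISTENING

    def on_speech(self, text: str) -> bool:
        """Return True if the speech is captured as a dialog turn."""
        if self.state in (State.LISTENING, State.FOLLOW_UP):
            self.captured.append(text)
            self.state = State.AWAITING
            return True
        return False

    def on_response_played(self, dialog_open: bool) -> None:
        if self.state is State.AWAITING:
            # Stay open for a follow-up turn only while the dialog is unfinished.
            self.state = State.FOLLOW_UP if dialog_open else State.IDLE

    def on_follow_up_timeout(self) -> None:
        if self.state is State.FOLLOW_UP:
            self.state = State.IDLE


dev = BaseDevice()
print(dev.on_speech("hello"))              # False: no trigger expression yet
dev.on_trigger()
print(dev.on_speech("play music"))         # True
dev.on_response_played(dialog_open=True)   # e.g. "Which artist?" was played
print(dev.on_speech("the Beatles"))        # True, without repeating the trigger
dev.on_response_played(dialog_open=False)
print(dev.state, dev.captured)
```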

Current Assignee
Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee
Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors
Saleem, Shirin; Piercy, Aimee Therese; Typrin, Marcello; Somashekar, Shamitha; Piersol, Kurt Wesley
Primary Examiner(s)
Neway, Samuel G

Application Number
US14/567,416

Time in Patent Office
1,041 Days
CPC Class Codes

B60R 16/0373   Voice control in general G10L

G06F 3/167   Audio in a user interface, ...

G10L 15/22   Procedures used during a sp...

G10L 2015/223   Execution procedure of a sp...
