Speech-responsive portable speaker

US 9,633,661 B1
Filed: 02/02/2015
Issued: 04/25/2017
Est. Priority Date: 02/02/2015
Status: Active Grant

Abstract

A portable music device may operate in response to user speech. In situations in which the music device is operating primarily from battery power, a push-to-talk (PTT) button may be used to indicate when the user is directing speech to the device. When the music device is receiving external power, the music device may continuously monitor a microphone signal to detect a user utterance of a wakeword, which may be used to indicate that subsequent speech is directed to the device. When operating from battery power, the device may send audio to a network-based support service for speech recognition and natural language understanding. When operating from external power, the speech recognition and/or natural language understanding may be performed by the music device itself.
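
The mode split described in the abstract can be sketched as a small policy function. This is an illustrative sketch only, not the patented implementation: the names `PowerState`, `ModeConfig`, and `select_mode` are hypothetical, and the sketch assumes the on-device option for external power, which the abstract describes as something that "may" be done.

```python
from dataclasses import dataclass
from enum import Enum


class PowerState(Enum):
    """Hypothetical power states; the abstract only distinguishes these two cases."""
    BATTERY = "battery"      # operating primarily from battery power
    EXTERNAL = "external"    # receiving external power


@dataclass(frozen=True)
class ModeConfig:
    activation: str   # how the user indicates that speech is directed to the device
    recognition: str  # where speech recognition / language understanding runs


def select_mode(power: PowerState) -> ModeConfig:
    """Map a power state to the behaviour described in the abstract."""
    if power is PowerState.BATTERY:
        # Battery: push-to-talk button, audio sent to a network-based service.
        return ModeConfig(activation="push_to_talk", recognition="network_service")
    # External power: continuous wakeword monitoring; recognition assumed on-device.
    return ModeConfig(activation="wakeword", recognition="on_device")
```

For example, `select_mode(PowerState.BATTERY).activation` evaluates to `"push_to_talk"`. Keeping the policy in one pure function makes the power-dependent behaviour easy to inspect separately from any audio handling.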

20 Claims

1. A portable music device comprising:
- a microphone;
- a speaker;
- a talk button;
- a wireless communications interface configured to communicate with a speech support service server over a wide-area network;
- the portable music device being configured to operate in a first mode when the portable music device is not receiving external power;
- the portable music device being configured to operate in a second mode when the portable music device is receiving external power;
- wherein operating in the first mode comprises:
  - detecting actuation of the talk button;
  - receiving first speech input, the first speech input including information about a first song to be played;
  - generating first audio data using the microphone, the first audio data corresponding to the first speech input;
  - sending the first audio data to the speech support service server;
  - receiving second audio data from the speech support service server, wherein the second audio data corresponds to the first song; and
  - playing the first song using the speaker;
- wherein operating in the second mode comprises:
  - receiving second speech input, the second speech input corresponding to a trigger expression;
  - receiving third speech input, the third speech input including information about a second song to be played;
  - identifying, by the portable music device, the second song using the third speech input; and
  - playing the second song using the speaker.
- - 2. The device of claim 1, further comprising a speech recognition component, wherein identifying the second song comprises recognizing words in the third speech input using the speech recognition component.

3. A portable device comprising:
- a microphone;
- a talk actuator;
- a power detector configured to detect a first power state and a second power state of the portable device;
- the portable device being configured to operate in a first mode when in the first power state and a second mode when in the second power state;
- wherein operating in the first mode comprises:
  - detecting actuation of the talk actuator;
  - generating, based at least in part on the actuation of the talk actuator, first audio data corresponding to first speech input;
  - sending the first audio data to a speech support service server that is external to the portable device;
  - receiving second audio data from the speech support service server, wherein the second audio data is based at least in part on the first audio data; and
  - outputting audible content corresponding to the second audio data; and
- wherein operating in the second mode comprises:
  - receiving second speech input;
  - generating third audio data corresponding to the second speech input; and
  - analyzing the third audio data.
- - 4. The portable device of claim 3, wherein operating in the second mode further comprises:
    - detecting, based at least in part on analyzing the third audio data, utterance of a trigger expression;
    - receiving third speech input;
    - generating fourth audio data using the microphone, the fourth audio data corresponding to the third speech input; and
    - causing the fourth audio data to be analyzed to recognize words in the third user speech.
  - 5. The portable device of claim 4, wherein causing the fourth audio data to be analyzed comprises sending the fourth audio data to the speech support service server.
  - 6. The portable device of claim 4, further comprising a speech recognition component, wherein causing the fourth audio data to be analyzed comprises recognizing words in the third speech input using the speech recognition component.
  - 7. The portable device of claim 3, wherein:
    - the first power state indicates that the portable device is not receiving power from an external power source; and
    - the second power state indicates that the portable device is receiving power from an external power source.
  - 8. The portable device of claim 3, further comprising:
    - a wireless network interface configured to communicate over a wide-area network with a music service server to receive fourth audio data containing music; and
    - a speaker configured to play the music.
  - 9. The portable device of claim 3, wherein:
    - operating in the first mode further comprises receiving an indication from the speech support service server of a first action to perform in response to the first speech input; and
    - operating in the second mode further comprises determining, by the portable device, a second action to perform in response to the second speech input.
  - 10. The portable device of claim 3, wherein sending the first audio data to the speech support service server comprises sending the first audio data over a wide-area network to the speech support service server, wherein the speech support service server is configured to perform automatic speech recognition and natural language understanding.
  - 11. The portable device of claim 3, further comprising a wireless network interface configured to communicate with speech support service server.
  - 12. The portable device of claim 3, wherein operating the portable device in the second mode further comprises:
    - detecting an utterance of a trigger expression in the second speech input; and
    - generating fourth audio data corresponding to third speech input based at least in part on detecting the utterance of the trigger expression.

13. A method, comprising:
- operating a device in a first mode; and
- operating the device in a second mode;
- wherein operating in the first mode comprises:
  - detecting actuation of a physical talk actuator;
  - generating, based at least in part on the actuation of the physical talk actuator, first audio data corresponding to first speech input; and
  - sending the first audio data to a network-accessible speech support service server, wherein the network-accessible speech support service server is configured to analyze the first audio data to recognize words of the first speech input; and
- wherein operating in the second mode comprises:
  - receiving second speech input;
  - generating second audio data corresponding to the second speech input; and
  - analyzing the second audio data.
- - 14. The method of claim 13, wherein operating in the second mode further comprises:
    - detecting, based at least in part on analyzing the second audio data, an utterance of a trigger expression;
    - receiving third speech input;
    - generating third audio data corresponding to the third speech input; and
    - causing the third audio data to be analyzed to recognize speech in the third speech input.
  - 15. The method of claim 14, wherein causing the third audio data to be analyzed comprises sending the third audio data to the network-accessible speech support service server.
  - 16. The method of claim 14, wherein causing the third audio data to be analyzed comprises performing speech recognition.
  - 17. The method of claim 13, wherein:
    - operating in the first mode further comprises receiving an indication from the network-accessible speech support service server of a first action to perform in response to the first speech input; and
    - operating in the second mode further comprises determining a second action to perform in response to the second speech input.
  - 18. The method of claim 13, further comprising:
    - detecting a first power state;
    - operating in the first mode when in the first power state;
    - detecting a second power state; and
    - operating in the second mode when in the second power state.
  - 19. The method of claim 18, wherein:
    - detecting the first power state comprises determining that power is not being received from an external source; and
    - detecting the second power state comprises determining that power is being received from an external source.
  - 20. The method of claim 13, wherein operating the device in the second mode further comprises:
    - detecting an utterance of a trigger expression in the second speech input; and
    - generating third audio data corresponding to third speech input based at least in part on detecting the utterance of the trigger expression.
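
The trigger-gated second mode recited in claims 14 through 16 and 20 can be illustrated with a short loop. This is a rough sketch under stated assumptions, not the claimed implementation: plain strings stand in for audio data, and the function `run_second_mode` and its parameters `is_trigger` and `analyze` are hypothetical names introduced here for illustration.

```python
from typing import Callable, Iterable, List


def run_second_mode(
    utterances: Iterable[str],
    is_trigger: Callable[[str], bool],
    analyze: Callable[[str], str],
) -> List[str]:
    """Analyze only the utterance that follows each detected trigger expression.

    Strings stand in for audio data (an assumption made for this sketch).
    """
    results: List[str] = []
    armed = False  # True once a trigger expression has just been detected
    for utterance in utterances:
        if armed:
            # Speech following the trigger is treated as directed to the device.
            results.append(analyze(utterance))
            armed = False
        elif is_trigger(utterance):
            armed = True
        # Anything else is ignored: it was not preceded by a trigger expression.
    return results
```

Passing a local recognizer as `analyze` corresponds loosely to the on-device variant (claim 16), while passing a function that forwards the data to a remote service corresponds to the server variant (claim 15); the gating loop is the same in both cases.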

Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors: Yum, Steve Hoonsuck; Hagler, Chris Stewart; Typrin, Marcello
Primary Examiner(s): Pullias, Jesse
Application Number: US14/611,853
Time in Patent Office: 813 Days
Field of Search: 704/275

CPC Class Codes

G10L 15/02   Feature extraction for spee...

G10L 15/22   Procedures used during a sp...

G10L 15/30   Distributed recognition, e....

G10L 17/22   Interactive procedures; Man...

G10L 2015/223   Execution procedure of a sp...

H04R 2203/12   Beamforming aspects for ste...

H04R 2227/005   Audio distribution systems ...