Contextualization of voice inputs

US 10,593,331 B2
Filed: 11/15/2018
Issued: 03/17/2020
Est. Priority Date: 07/15/2016
Status: Active Grant

Abstract

Disclosed herein are example techniques to provide contextual information corresponding to a voice command. An example implementation may involve receiving voice data indicating a voice command, receiving contextual information indicating a characteristic of the voice command, and determining a device operation corresponding to the voice command. Determining the device operation corresponding to the voice command may include identifying, among multiple zones of a media playback system, a zone that corresponds to the characteristic of the voice command, and determining that the voice command corresponds to one or more particular devices that are associated with the identified zone. The example implementation may further involve causing the one or more particular devices to perform the device operation.
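
The zone-identification step summarized in the abstract can be illustrated with a minimal Python sketch. It assumes the characteristic is a per-zone sound pressure level in decibels (as recited in the claims) and that the zone reporting the highest level is the one the command corresponds to. The `Zone` type, the function names, and the zone and device names are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Zone:
    """Hypothetical zone record: a name plus the devices assigned to it."""
    name: str
    devices: List[str] = field(default_factory=list)


def identify_zone(spl_by_zone: Dict[str, float]) -> str:
    """Return the zone whose reported characteristic (assumed SPL in dB) is highest."""
    if not spl_by_zone:
        raise ValueError("no contextual information received")
    return max(spl_by_zone, key=spl_by_zone.get)


def devices_for_command(zones: Dict[str, Zone], spl_by_zone: Dict[str, float]) -> List[str]:
    """Map the voice command to the devices associated with the identified zone."""
    return zones[identify_zone(spl_by_zone)].devices


zones = {
    "Kitchen": Zone("Kitchen", ["kitchen-1"]),
    "Den": Zone("Den", ["den-1", "den-2"]),
}
targets = devices_for_command(zones, {"Kitchen": 52.0, "Den": 61.5})
```

Other characteristics could stand in for the level; the sketch only shows the mapping from characteristic to zone to devices.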

20 Claims

1. Tangible, non-transitory, computer-readable media having instructions encoded therein, wherein the instructions, when executed by one or more processors, cause a first playback device of a first zone of a media playback system to perform a method comprising:
- recording, via a microphone array of the first playback device, first audio data indicating a voice command, wherein a second playback device of a second zone of the media playback system records, via a microphone array of the second playback device, second audio data indicating the voice command;
  
  identifying, based on the recorded first audio data, a first characteristic of the voice command, the first characteristic comprising a sound pressure level of the voice command as detected by the microphone array of the first playback device;
  
  receiving, via a network interface of the first playback device from the second playback device, contextual information, wherein the contextual information comprises a second characteristic of the voice command, the second characteristic comprising a sound pressure level of the voice command as detected by the microphone array of the second playback device;
  
  based on the contextual information, determining that the sound pressure level of the voice command as detected by the microphone array of the first playback device is greater than the sound pressure level of the voice command as detected by the microphone array of the second playback device;
  
  in response to the determining, querying, via the network interface of the first playback device, one or more servers of a voice assistant service with the voice command;
  
  receiving, via the network interface in response to the query, a playback command corresponding to the voice command; and
  
  playing back audio content according to the playback command via one or more amplifiers configured to drive one or more speakers.
  - 2. The tangible, non-transitory, computer-readable media of claim 1, wherein the contextual information further comprises an indication that the voice command was detected in at least one zone of the media playback system different than the first zone.
  - 3. The tangible, non-transitory, computer-readable media of claim 1, wherein the contextual information further comprises the recorded second audio data.
  - 4. The tangible, non-transitory, computer-readable media of claim 1, wherein the determining comprises:
    - determining that the sound pressure level of the voice command as detected by the microphone array of the first playback device has a higher magnitude than the sound pressure level of the voice command as detected by the microphone array of the second playback device.
  - 5. The tangible, non-transitory, computer-readable media of claim 1, wherein the method further comprises:
    - detecting, within the recorded first audio data, a wake-word preceding the voice command; and
      
      identifying a portion of the recorded first audio data following the wake-word as the voice command.
  - 6. The tangible, non-transitory, computer-readable media of claim 1, wherein the playback command comprises a command to play back particular audio content in the first zone and the second zone, and wherein the method further comprises:
    - instructing the second playback device of the second zone to play back the particular audio content according to the playback command in synchrony with playback of the particular audio content by the first playback device of the first zone.
  - 7. The tangible, non-transitory, computer-readable media of claim 1, wherein the first zone is configured into a zone group with the second zone, and wherein the method further comprises:
    - playing back the audio content according to the playback command in synchrony with the second playback device.
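
The comparison recited in claim 1 can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the samples are assumed to be calibrated in pascals, the level is computed with the standard definition 20·log10(p_rms / 20 µPa), and `query_assistant` and `play` are hypothetical callables standing in for the voice assistant service and the amplifier path.

```python
import math
from typing import Callable, Sequence

P_REF = 20e-6  # standard reference pressure for SPL in air, in pascals


def sound_pressure_level(samples_pa: Sequence[float]) -> float:
    """Return the SPL in dB of samples assumed to be calibrated in pascals."""
    if not samples_pa:
        raise ValueError("no audio samples recorded")
    rms = math.sqrt(sum(s * s for s in samples_pa) / len(samples_pa))
    if rms == 0.0:
        return float("-inf")  # silence has no finite level
    return 20.0 * math.log10(rms / P_REF)


def handle_voice_command(
    local_samples: Sequence[float],
    remote_spl_db: float,
    query_assistant: Callable[[Sequence[float]], str],
    play: Callable[[str], None],
) -> bool:
    """Query the assistant and play only if the local level is strictly greater."""
    local_spl_db = sound_pressure_level(local_samples)
    if local_spl_db > remote_spl_db:  # a tie does not satisfy "greater than"
        play(query_assistant(local_samples))
        return True
    return False


played = []
handled = handle_voice_command([1.0, -1.0], 60.0, lambda _s: "play jazz", played.append)
```

The second device would run the same comparison with the roles reversed, so in this sketch at most one of the two devices sends the query.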

8. A first playback device of a first zone of a media playback system, the first playback device comprising:
- a microphone array;
  
  a network interface;
  
  one or more processors; and
  
  computer-readable media having instructions encoded therein, wherein the instructions, when executed by the one or more processors, cause the first playback device to perform functions comprising:
  
  recording, via the microphone array, first audio data indicating a voice command, wherein a second playback device of a second zone of the media playback system records, via a microphone array of the second playback device, second audio data indicating the voice command;
  
  identifying, based on the recorded first audio data, a first characteristic of the voice command, the first characteristic comprising a sound pressure level of the voice command as detected by the microphone array of the first playback device;
  
  receiving, via the network interface, from the second playback device, contextual information, wherein the contextual information comprises a second characteristic of the voice command, the second characteristic comprising a sound pressure level of the voice command as detected by the microphone array of the second playback device;
  
  based on the contextual information, determining that the sound pressure level of the voice command as detected by the microphone array of the first playback device is greater than the sound pressure level of the voice command as detected by the microphone array of the second playback device;
  
  in response to the determining, querying one or more servers of a voice assistant service with the voice command;
  
  receiving, via the network interface in response to the query, a playback command corresponding to the voice command; and
  
  playing back audio content according to the playback command via one or more amplifiers configured to drive one or more speakers.
  - 9. The first playback device of claim 8, wherein the contextual information further comprises an indication that the voice command was detected in at least one zone of the media playback system different than the first zone.
  - 10. The first playback device of claim 8, wherein the contextual information further comprises the recorded second audio data.
  - 11. The first playback device of claim 8, wherein the determining comprises:
    - determining that the sound pressure level of the voice command as detected by the microphone array of the first playback device has a higher magnitude than the sound pressure level of the voice command as detected by the microphone array of the second playback device.
  - 12. The first playback device of claim 8, wherein the functions further comprise:
    - detecting, within the recorded first audio data, a wake-word preceding the voice command; and
      
      identifying a portion of the recorded first audio data following the wake-word as the voice command.
  - 13. The first playback device of claim 8, wherein the playback command comprises a command to play back particular audio content in the first zone and the second zone, and wherein the functions further comprise:
    - instructing the second playback device of the second zone to play back the particular audio content according to the playback command in synchrony with playback of the particular audio content by the first playback device of the first zone.
  - 14. The first playback device of claim 8, wherein the first zone is configured into a zone group with the second zone, and wherein the functions further comprise:
    - playing back the audio content according to the playback command in synchrony with the second playback device.
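
The wake-word step of claims 5, 12, and 19 can be sketched as a simple slice of the recording. The claims do not say how the wake-word is detected, so the sketch assumes a hypothetical detector callable that returns the sample indices where the wake-word starts and ends, or `None` if it is absent.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical detector signature: returns (start, end) sample indices of the
# wake-word within the recording, or None when no wake-word is found.
WakeWordDetector = Callable[[Sequence[float]], Optional[Tuple[int, int]]]


def extract_voice_command(
    samples: Sequence[float],
    detect_wake_word: WakeWordDetector,
) -> Optional[Sequence[float]]:
    """Return the portion of the recording that follows the wake-word."""
    span = detect_wake_word(samples)
    if span is None:
        return None  # no wake-word, so nothing is treated as a command
    _start, end = span
    return samples[end:]


recording = [0.0, 0.1, 0.2, 0.3, 0.4]
command = extract_voice_command(recording, lambda _s: (1, 3))
```

Here the stub detector reports a wake-word occupying samples 1 and 2, so everything from sample 3 onward is treated as the voice command.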

15. A method comprising:
- recording, via a microphone array of a first playback device of a first zone of a media playback system, first audio data indicating a voice command, wherein a second playback device of a second zone of the media playback system records, via a microphone array of the second playback device, second audio data indicating the voice command;
  
  identifying, based on the recorded first audio data, a first characteristic of the voice command, the first characteristic comprising a sound pressure level of the voice command as detected by the microphone array of the first playback device;
  
  receiving, via a network interface of the first playback device from the second playback device, contextual information, wherein the contextual information comprises a second characteristic of the voice command, the second characteristic comprising a sound pressure level of the voice command as detected by the microphone array of the second playback device;
  
  based on the contextual information, determining that the sound pressure level of the voice command as detected by the microphone array of the first playback device is greater than the sound pressure level of the voice command as detected by the microphone array of the second playback device;
  
  in response to the determining, querying one or more servers of a voice assistant service with the voice command;
  
  receiving, via the network interface in response to the query, a playback command corresponding to the voice command; and
  
  playing back audio content according to the playback command via one or more amplifiers configured to drive one or more speakers.
  - 16. The method of claim 15, wherein the contextual information further comprises an indication that the voice command was detected in at least one zone of the media playback system different than the first zone.
  - 17. The method of claim 15, wherein the contextual information further comprises the recorded second audio data.
  - 18. The method of claim 15, wherein the determining comprises:
    - determining that the sound pressure level of the voice command as detected by the microphone array of the first playback device has a higher magnitude than the sound pressure level of the voice command as detected by the microphone array of the second playback device.
  - 19. The method of claim 15, wherein the method further comprises:
    - detecting, within the recorded first audio data, a wake-word preceding the voice command; and
      
      identifying a portion of the recorded first audio data following the wake-word as the voice command.
  - 20. The method of claim 15, wherein the playback command comprises a command to play back particular audio content in the first zone and the second zone, and wherein the method further comprises:
    - instructing the second playback device of the second zone to play back the particular audio content according to the playback command in synchrony with playback of the particular audio content by the first playback device of the first zone.
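
The claims recite synchronous playback across zones without specifying a mechanism. One simple possibility, shown here purely as an assumption, is for the first playback device to send every grouped device the same start time on a shared clock. The `PlayInstruction` type, the `lead_time` parameter, and the example URI are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class PlayInstruction:
    """Hypothetical message sent to each device in the group."""
    content_uri: str
    start_at: float  # timestamp in seconds on a clock assumed shared by all devices


def build_group_instructions(
    content_uri: str,
    devices: Iterable[str],
    now: float,
    lead_time: float = 0.5,
) -> Dict[str, PlayInstruction]:
    """Give every device the same content and the same future start time."""
    start_at = now + lead_time  # lead time leaves room for network delivery
    return {device: PlayInstruction(content_uri, start_at) for device in devices}


instructions = build_group_instructions(
    "x-example:track-1", ["first-zone-device", "second-zone-device"], now=100.0
)
```

A real system would also need to keep the device clocks aligned; that part is outside the scope of this sketch.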

Current Assignee: Sonos, Inc.
Original Assignee: Sonos, Inc.
Inventors: Lang, Jonathan P.; Kadri, Romi; Butts, Christopher
Primary Examiner(s): Guerra-Erazo, Edgar X
Application Number: US16/192,126
Publication Number: US 20190088261A1
Time in Patent Office: 488 Days

CPC Class Codes

G01S 5/18   using ultrasonic, sonic, or...

G06F 3/165   Management of the audio str...

G06F 3/167   Audio in a user interface, ...

G10L 15/22   Procedures used during a sp...

G10L 15/30   Distributed recognition, e....

G10L 17/22   Interactive procedures; Man...

G10L 2015/223   Execution procedure of a sp...

G10L 2015/226   using non-speech characteri...

G10L 2015/228   of application context

H05B 47/165   following a pre-assigned pr...
