Home graph

US 10,818,290 B2
Filed: 12/11/2018
Issued: 10/27/2020
Est. Priority Date: 12/11/2017
Status: Active Grant

2 Assignments

0 Petitions

Abstract

Example techniques involve a control hierarchy for a “smart” home having smart appliances and related devices, such as wireless illumination devices, home-automation devices (e.g., thermostats, door locks, etc.), and audio playback devices, among others. An example home includes various rooms in which smart devices might be located. Under the example control hierarchy described herein and referred to as “home graph,” a name of a room (e.g., “Kitchen”) may represent a smart device (or smart devices) within that room. In other words, from the perspective of a user, the smart devices within a room are that room. This hierarchy permits a user to refer to a smart device within a given room by way of the name of the room when controlling smart devices within the home using a voice user interface (VUI) or graphical user interface (GUI).
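
For illustration only, the room-as-device idea in the abstract can be sketched as a small tree in which asking for a room by name yields every smart device beneath it. This is a minimal sketch, not the patented implementation: the `Node` class, the `kind` labels, the `devices_under` helper, and all device identifiers are hypothetical names chosen for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One named node of a hypothetical home graph."""
    name: str
    kind: str  # "Home", "Room", or "Set" (labels assumed for this sketch)
    devices: list[str] = field(default_factory=list)   # device ids held directly
    children: list[Node] = field(default_factory=list)


def devices_under(node: Node) -> list[str]:
    """Return the node's own devices followed by those of all descendants."""
    found = list(node.devices)
    for child in node.children:
        found.extend(devices_under(child))
    return found


# Example home: two rooms, each holding one or more Sets of devices.
home = Node("Home", "Home", children=[
    Node("Kitchen", "Room", children=[
        Node("Kitchen Speaker", "Set", devices=["speaker-1"]),
        Node("Kitchen Lights", "Set", devices=["lamp-1", "lamp-2"]),
    ]),
    Node("Den", "Room", children=[
        Node("Den Speaker", "Set", devices=["speaker-2"]),
    ]),
])

kitchen = home.children[0]
kitchen_devices = devices_under(kitchen)  # every device "in" the Kitchen
```

Under these assumptions, referring to "Kitchen" reaches the speaker and both lamps without the user naming any of them individually.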

439 Citations

20 Claims

1. A system comprising one or more servers of a voice assistant service, wherein the one or more servers are configured to communicate with a network microphone device (NMD) of a media playback system comprising multiple devices connected via a local area network, wherein the NMD is configured to perform operations comprising:
- recording, via a microphone array, audio into a buffer;
- monitoring the recorded audio for wake-words; and
- when a wake-word is detected in the recorded audio, sending, via a network interface to the voice assistant service, data representing an audio recording from the buffer of the NMD, the audio recording comprising a voice input following the detected wake-word within the buffer; and

wherein the one or more servers are configured to perform operations comprising:
- storing a data structure comprising nodes in a hierarchy representing the media playback system, wherein the data structure comprises (i) a root node representing the media playback system as a Home of the hierarchy, (ii) one or more first nodes in a first level, the first nodes representing respective devices of the media playback system as Sets of the hierarchy, and (iii) one or more second nodes in a second level as parents to one or more respective child first nodes to represent Sets in respective Rooms of the hierarchy, wherein the nodes in the hierarchy are assigned respective names;
- receiving, via a network interface of the one or more servers, data representing the audio recording;
- processing the audio recording to determine one or more voice commands within the voice input, wherein processing the audio recording comprises:
  - determining, based on the data structure representing the media playback system, that one or more first voice commands within the voice input represent respective target variables indicating one or more particular nodes of the data structure, each target variable referencing a name of a respective node of the data structure; and
  - determining that one or more second voice commands within the voice input correspond to one or more playback commands; and
- causing, via the network interface of the one or more servers, one or more particular playback devices to play back audio content according to the one or more playback commands, wherein the one or more particular playback devices include (a) all playback devices represented by the one or more particular nodes of the data structure and (b) all playback devices represented by child nodes of the one or more particular nodes of the data structure.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
  - 2. The system of claim 1, wherein determining that the one or more first voice commands within the voice input represent respective target variables comprises determining that the one or more first voice commands within the voice input represent a target variable referencing a name of a particular second node representing a particular Room, the particular Room including a first Set consisting of a first playback device and a second Set consisting of a second playback device, and wherein causing the one or more particular playback devices to play back audio content according to the one or more playback commands comprises causing the first playback device and the second playback device to play back the audio content in synchrony.
  - 3. The system of claim 1, wherein determining that the one or more first voice commands within the voice input represent respective target variables comprises determining that the one or more first voice commands within the voice input represent a target variable referencing a name of a particular first node representing a particular Set, the particular Set consisting of a first playback device and a second playback device in a bonded zone, and wherein causing the one or more particular playback devices to play back audio content according to the one or more playback commands comprises causing the first playback device and the second playback device to play back respective channels of the audio content in synchrony.
  - 4. The system of claim 1, wherein the data structure further comprises one or more third nodes in a third level as parents to one or more respective child second nodes to represent Rooms in respective Areas of the hierarchy, wherein determining that the one or more first voice commands within the voice input represent respective target variables referencing one or more respective names corresponding to one or more particular nodes of the data structure comprises determining that the one or more first voice commands within the voice input represent a target variable referencing a name of a particular third node representing an Area including a first Room and a second Room, the first Room including a first Set that consists of a first playback device and the second Room including a second Set that consists of a second playback device, and wherein causing the one or more particular playback devices to play back audio content according to the one or more playback commands comprises causing the first playback device and the second playback device to play back the audio content in synchrony.
  - 5. The system of claim 4, wherein causing the one or more particular playback devices to play back audio content according to the one or more playback commands comprises causing the first playback device and the second playback device to form a synchrony group.
  - 6. The system of claim 1, wherein determining that the one or more first voice commands within the voice input represent respective target variables comprises determining that the one or more first voice commands within the voice input represent a target variable referencing a name of the root node, and wherein causing the one or more particular playback devices to play back audio content according to the one or more playback commands comprises causing all playback devices represented by nodes within the data structure to play back the audio content in synchrony.
  - 7. The system of claim 1, wherein determining that one or more first voice commands within the voice input represent respective target variables comprises:
    - searching the nodes of the data structure for nodes having assigned names that match target variables represented by the one or more first voice commands; and
    - determining that the one or more first voice commands match names assigned to the one or more particular nodes.
  - 8. The system of claim 1, wherein the data structure defines a tree, and wherein determining that one or more first voice commands within the voice input represent respective target variables comprises:
    - traversing the tree to search for nodes having assigned names that match target variables represented by the one or more first voice commands; and
    - determining that the one or more first voice commands match names assigned to the one or more particular nodes.
  - 9. The system of claim 8, wherein traversing the tree to search for nodes comprises traversing the tree in level order beginning with the root node.
  - 10. The system of claim 1, wherein the audio recording is a first audio recording and the voice input is a first voice input, and wherein the one or more servers are configured to perform operations further comprising:
    - receiving, via a network interface of the one or more servers, data representing a second audio recording comprising a second voice input following another detected wake-word within the buffer;
    - processing the second audio recording to determine one or more voice commands within the second voice input, wherein processing the audio recording comprises:
      - determining, based on the data structure representing the media playback system, that one or more third voice commands within the second voice input represent respective target variables, each target variable referencing a name of a respective node of the data structure; and
      - determining that one or more fourth voice commands within the voice input correspond to one or more commands to toggle on one or more smart illumination devices; and
    - causing, via the network interface of the one or more servers, one or more particular one or more smart illumination devices to play back audio content according to the one or more playback commands, wherein the one or more particular playback devices include (a) all smart illumination devices represented by the one or more particular nodes of the data structure and (b) all smart illumination devices represented by child nodes of the one or more particular nodes of the data structure.
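
The name search recited in claims 7 to 9 can be illustrated with a short level-order (breadth-first) traversal that starts at the root node and collects nodes whose assigned names match a target. This is a sketch under stated assumptions: the `Node` class, the `level_order` and `find_by_name` helpers, the case-insensitive comparison, and the example names are all hypothetical and are not taken from the patent.

```python
from collections import deque


class Node:
    """Hypothetical named node; children are lower levels of the hierarchy."""

    def __init__(self, name, devices=(), children=()):
        self.name = name
        self.devices = list(devices)
        self.children = list(children)


def level_order(root):
    """Yield nodes level by level, beginning with the root node."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        yield node
        queue.extend(node.children)  # children wait behind the current level


def find_by_name(root, target):
    """Return every node whose name matches the target (case-insensitive, assumed)."""
    wanted = target.strip().lower()
    return [node for node in level_order(root) if node.name.lower() == wanted]


home = Node("Home", children=[
    Node("Kitchen", children=[Node("Kitchen Speaker", devices=["speaker-1"])]),
    Node("Living Room", children=[Node("TV", devices=["speaker-2", "speaker-3"])]),
])

visit_order = [node.name for node in level_order(home)]
matches = find_by_name(home, "living room")
```

Because the queue is first-in, first-out, the root is examined first, then every Room, then every Set, so a name assigned higher in the hierarchy is found before a lower one.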

11. A method to be performed by a system comprising one or more servers of a voice assistant service, wherein the one or more servers are configured to communicate with a network microphone device (NMD) of a media playback system comprising multiple devices connected via a local area network, wherein the NMD is configured to perform operations comprising:
- recording, via a microphone array, audio into a buffer;
- monitoring the recorded audio for wake-words; and
- when a wake-word is detected in the recorded audio, sending, via a network interface to the voice assistant service, data representing an audio recording from the buffer of the NMD, the audio recording comprising a voice input following the detected wake-word within the buffer; and

wherein the method comprises:
- the one or more servers storing a data structure comprising nodes in a hierarchy representing the media playback system, wherein the data structure comprises (i) a root node representing the media playback system as a Home of the hierarchy, (ii) one or more first nodes in a first level, the first nodes representing respective devices of the media playback system as Sets of the hierarchy, and (iii) one or more second nodes in a second level as parents to one or more respective child first nodes to represent Sets in respective Rooms of the hierarchy, wherein the nodes in the hierarchy are assigned respective names;
- the one or more servers receiving, via a network interface of the one or more servers, data representing the audio recording;
- the one or more servers processing the audio recording to determine one or more voice commands within the voice input, wherein processing the audio recording comprises:
  - determining, based on the data structure representing the media playback system, that one or more first voice commands within the voice input represent respective target variables indicating one or more particular nodes of the data structure, each target variable referencing a name of a respective node of the data structure; and
  - determining that one or more second voice commands within the voice input correspond to one or more playback commands; and
- the one or more servers causing, via the network interface of the one or more servers, one or more particular playback devices to play back audio content according to the one or more playback commands, wherein the one or more particular playback devices include (a) all playback devices represented by the one or more particular nodes of the data structure and (b) all playback devices represented by child nodes of the one or more particular nodes of the data structure.
- Dependent Claims (12, 13, 14, 15, 16, 17, 18, 19)
  - 12. The method of claim 11, wherein determining that the one or more first voice commands within the voice input represent respective target variables comprises determining that the one or more first voice commands within the voice input represent a target variable referencing a name of a particular second node representing a particular Room, the particular Room including a first Set consisting of a first playback device and a second Set consisting of a second playback device, and wherein causing the one or more particular playback devices to play back audio content according to the one or more playback commands comprises causing the first playback device and the second playback device to play back the audio content in synchrony.
  - 13. The method of claim 11, wherein determining that the one or more first voice commands within the voice input represent respective target variables comprises determining that the one or more first voice commands within the voice input represent a target variable referencing a name of a particular first node representing a particular Set, the particular Set consisting of a first playback device and a second playback device in a bonded zone, and wherein causing the one or more particular playback devices to play back audio content according to the one or more playback commands comprises causing the first playback device and the second playback device to play back respective channels of the audio content in synchrony.
  - 14. The method of claim 11, wherein the data structure further comprises one or more third nodes in a third level as parents to one or more respective child second nodes to represent Rooms in respective Areas of the hierarchy, wherein determining that the one or more first voice commands within the voice input represent respective target variables referencing one or more respective names corresponding to one or more particular nodes of the data structure comprises determining that the one or more first voice commands within the voice input represent a target variable referencing a name of a particular third node representing an Area including a first Room and a second Room, the first Room including a first Set that consists of a first playback device and the second Room including a second Set that consists of a second playback device, and wherein causing the one or more particular playback devices to play back audio content according to the one or more playback commands comprises causing the first playback device and the second playback device to play back the audio content in synchrony.
  - 15. The method of claim 14, wherein causing the one or more particular playback devices to play back audio content according to the one or more playback commands comprises causing the first playback device and the second playback device to form a synchrony group.
  - 16. The method of claim 11, wherein determining that the one or more first voice commands within the voice input represent respective target variables comprises determining that the one or more first voice commands within the voice input represent a target variable referencing a name of the root node, and wherein causing the one or more particular playback devices to play back audio content according to the one or more playback commands comprises causing all playback devices represented by nodes within the data structure to play back the audio content in synchrony.
  - 17. The method of claim 11, wherein determining that one or more first voice commands within the voice input represent respective target variables comprises:
    - searching the nodes of the data structure for nodes having assigned names that match target variables represented by the one or more first voice commands; and
    - determining that the one or more first voice commands match names assigned to the one or more particular nodes.
  - 18. The method of claim 11, wherein the data structure defines a tree, and wherein determining that one or more first voice commands within the voice input represent respective target variables comprises:
    - traversing the tree to search for nodes having assigned names that match target variables represented by the one or more first voice commands; and
    - determining that the one or more first voice commands match names assigned to the one or more particular nodes.
  - 19. The method of claim 18, wherein traversing the tree to search for nodes comprises traversing the tree in level order beginning with the root node.
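
The processing step of claim 11 separates a voice input into target variables (names of nodes) and playback commands, then plays back on every device reachable from the named nodes. The sketch below illustrates that split on a text transcript. It assumes speech has already been converted to text, and the command vocabulary, the `TARGETS` table (node name mapped to the devices of that node and its children), and the `interpret` function are hypothetical stand-ins rather than anything the patent specifies.

```python
import re

# Assumed playback vocabulary for this sketch.
PLAYBACK_COMMANDS = ("play", "pause", "stop", "skip")

# Assumed lookup: node name -> devices of that node plus its child nodes.
TARGETS = {
    "home": ["speaker-1", "speaker-2", "speaker-3"],
    "kitchen": ["speaker-1"],
    "living room": ["speaker-2", "speaker-3"],
}


def interpret(transcript, targets=TARGETS):
    """Return (playback command, matched node names, devices to control)."""
    text = transcript.lower()
    tokens = text.split()
    # First known playback command that appears as a whole word, if any.
    command = next((c for c in PLAYBACK_COMMANDS if c in tokens), None)
    # Node names that appear as whole words or phrases in the transcript.
    named = [name for name in targets
             if re.search(rf"\b{re.escape(name)}\b", text)]
    # Union of devices across all named nodes, without duplicates.
    devices = []
    for name in named:
        for device in targets[name]:
            if device not in devices:
                devices.append(device)
    return command, named, devices


result = interpret("Play music in the Kitchen and Living Room")
```

Here two target variables are recognized, so the resulting device list covers both Rooms; a real system would also need to handle names that are not found or are ambiguous.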

20. A method to be performed by a system comprising one or more servers of a voice assistant service and a network microphone device (NMD) of a media playback system comprising multiple devices connected via a local area network, wherein the method comprises:
- the NMD recording, via a microphone array, audio into a buffer;
- the NMD monitoring the recorded audio for wake-words; and
- when a wake-word is detected in the recorded audio, the NMD sending, via a network interface to the voice assistant service, data representing an audio recording from the buffer of the NMD, the audio recording comprising a voice input following the detected wake-word within the buffer; and
- the one or more servers storing a data structure comprising nodes in a hierarchy representing the media playback system, wherein the data structure comprises (i) a root node representing the media playback system as a Home of the hierarchy, (ii) one or more first nodes in a first level, the first nodes representing respective devices of the media playback system as Sets of the hierarchy, and (iii) one or more second nodes in a second level as parents to one or more respective child first nodes to represent Sets in respective Rooms of the hierarchy, wherein the nodes in the hierarchy are assigned respective names;
- the one or more servers receiving, via a network interface of the one or more servers, data representing the audio recording;
- the one or more servers processing the audio recording to determine one or more voice commands within the voice input, wherein processing the audio recording comprises:
  - determining, based on the data structure representing the media playback system, that one or more first voice commands within the voice input represent respective target variables indicating one or more particular nodes of the data structure, each target variable referencing a name of a respective node of the data structure; and
  - determining that one or more second voice commands within the voice input correspond to one or more playback commands; and
- the one or more servers causing, via the network interface of the one or more servers, one or more particular playback devices to play back audio content according to the one or more playback commands, wherein the one or more particular playback devices include (a) all playback devices represented by the one or more particular nodes of the data structure and (b) all playback devices represented by child nodes of the one or more particular nodes of the data structure.
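
The NMD-side steps of claim 20 (record into a buffer, monitor for a wake-word, send the voice input that follows it) can be illustrated with a bounded buffer. This is a toy sketch, not the claimed device: text tokens stand in for audio frames, wake-word detection is a plain equality check, the voice input is assumed to be a fixed number of frames long, and the `Nmd` class and its parameters are hypothetical.

```python
from collections import deque


class Nmd:
    """Toy network microphone device working on tokens instead of audio."""

    def __init__(self, wake_word, send, buffer_frames=16, capture_frames=4):
        if capture_frames > buffer_frames:
            raise ValueError("capture window must fit inside the buffer")
        self.wake_word = wake_word
        self.send = send                           # callback standing in for the network
        self.buffer = deque(maxlen=buffer_frames)  # oldest frames drop off automatically
        self.capture_frames = capture_frames
        self._remaining = 0                        # frames still expected after a wake-word

    def on_frame(self, frame):
        """Record one frame, then either continue a capture or look for the wake-word."""
        self.buffer.append(frame)
        if self._remaining > 0:
            self._remaining -= 1
            if self._remaining == 0:
                # Send only the frames that followed the wake-word.
                self.send(list(self.buffer)[-self.capture_frames:])
        elif frame == self.wake_word:
            self._remaining = self.capture_frames


sent = []
nmd = Nmd("hey", sent.append, capture_frames=3)
for token in "noise hey play jazz kitchen noise".split():
    nmd.on_frame(token)
```

A fixed capture length keeps the sketch short; an actual device would more plausibly decide where the voice input ends by detecting the end of speech.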

Current Assignee: Sonos, Inc.
Original Assignee: Sonos, Inc.
Inventors: Lambourne, Robert; Wilberding, Dayn; Torgerson, Jeffrey
Primary Examiner(s): Abebe, Daniel
Application Number: US16/216,357
Publication Number: US 20190287522A1
Time in Patent Office: 686 Days

CPC Class Codes

G06F 3/167   Audio in a user interface, ...

G10L 15/08   Speech classification or se...

G10L 15/22   Procedures used during a sp...

G10L 2015/088   Word spotting

G10L 2015/223   Execution procedure of a sp...

H04R 1/406   microphones

H04R 2227/005   Audio distribution systems ...

H04R 2430/01   Aspects of volume control, ...

H04R 27/00   Public address systems circ...

H04R 3/005   for combining the signals o...

H04R 3/12   for distributing signals to...

H04R 5/04   Circuit arrangements, e.g. ...

H04S 2400/13   Aspects of volume control, ...

H04S 7/302   Electronic adaptation of st...

H04S 7/40   Visual indication of stereo...
