Voice detection by multiple devices

US 10,297,256 B2
Filed: 12/10/2018
Issued: 05/21/2019
Est. Priority Date: 07/15/2016
Status: Active Grant

Abstract

Disclosed herein are example techniques for voice detection by multiple NMDs. An example implementation may involve one or more servers receiving, via a network interface, data representing multiple audio recordings of a voice input spoken by a given user, each audio recording recorded by a respective NMD of the multiple NMDs, wherein the voice input comprises a detected wake-word. Based on respective sound pressure levels of the multiple audio recordings of the voice input, the servers (i) select a particular NMD of the multiple NMDs and (ii) forego selection of other NMDs of the multiple NMDs. The servers send, via the network interface to the particular NMD, data representing a playback command that corresponds to a voice command in the voice input represented in the multiple audio recordings, wherein the data representing the playback command causes the particular NMD to play back audio content according to the playback command.
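
The selection step described in the abstract can be illustrated with a minimal sketch. It assumes each recording arrives as normalized PCM samples and uses the RMS level in dB relative to full scale as a stand-in for sound pressure level (true SPL would need calibrated microphones). The names `Recording`, `level_db`, and `select_nmd` are hypothetical and do not come from the patent.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Recording:
    nmd_id: str                # hypothetical identifier for the recording NMD
    samples: Sequence[float]   # assumed: PCM samples normalized to [-1.0, 1.0]


def level_db(samples: Sequence[float]) -> float:
    """RMS level in dB relative to full scale (a proxy for SPL, not calibrated SPL)."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def select_nmd(recordings: List[Recording]) -> Tuple[str, List[str]]:
    """Pick the NMD with the loudest recording; return it plus the NMDs not selected."""
    if not recordings:
        raise ValueError("at least one recording is required")
    best = max(recordings, key=lambda r: level_db(r.samples))
    foregone = [r.nmd_id for r in recordings if r.nmd_id != best.nmd_id]
    return best.nmd_id, foregone
```

For example, given a louder recording from a hypothetical "kitchen" NMD and a quieter one from "living", `select_nmd` returns the kitchen device and lists the living-room device as foregone. Ties resolve to the first recording in the list, which is a choice of this sketch rather than anything the source specifies.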

21 Claims

1. A system comprising one or more servers of a voice assistant service, wherein the one or more servers are configured to communicate with multiple network microphone devices, wherein the multiple networked microphone devices (NMDs) are communicatively coupled to one another via a local area network, and wherein:
- each NMD is configured to perform operations comprising:
  - recording, via a respective microphone array, audio into a buffer of the respective NMD;
  - monitoring the recorded audio for wake-words; and
  - when a wake-word is detected in the recorded audio, sending, via a respective network interface to a voice assistant service, data representing an audio recording from the buffer of the respective NMD, the audio recording representing a portion of the recorded audio including the detected wake-word as recorded by the respective NMD; and
- the one or more servers are configured to perform operations comprising:
  - receiving, via a network interface of the one or more servers, data representing multiple audio recordings of a voice input spoken by a given user, each audio recording recorded by a respective NMD of the multiple NMDs, wherein the voice input comprises the detected wake-word;
  - based on respective sound pressure levels of the multiple audio recordings of the voice input, (i) selecting a particular NMD of the multiple NMDs and (ii) foregoing selection of other NMDs of the multiple NMDs; and
  - after the selecting, sending, via the network interface to the particular NMD, data representing a playback command that corresponds to a voice command following the wake-word in the voice input represented in the multiple audio recordings, wherein the data representing the playback command causes the particular NMD to play back audio content according to the playback command via one or more amplifiers configured to drive one or more speakers.
- Dependent claims (2, 3, 4, 5, 6, 7, 8):
  - 2. The system of claim 1, wherein the one or more servers are configured to perform operations further comprising:
    - based on the respective sound pressure levels of the multiple audio recordings of the voice input, sending, via the network interface of the one or more servers to the other NMDs of the multiple NMDs, one or more respective instructions to stop sending the data representing their respective audio recordings of the multiple audio recordings.
  - 3. The system of claim 1, wherein the one or more servers are configured to perform operations further comprising:
    - processing a particular audio recording of the multiple audio recordings to determine the playback command that corresponds to the voice command within the voice input, the particular audio recording recorded by the particular NMD, wherein other audio recordings of the multiple audio recordings are not processed.
  - 4. The system of claim 1, wherein selecting the particular NMD of the multiple NMDs comprises determining that a sound pressure level of a particular audio recording of the multiple audio recordings has a higher sound pressure level than other audio recordings of the multiple audio recordings, the particular audio recording recorded by the particular NMD.
  - 5. The system of claim 1, wherein the multiple networked microphone devices comprise a first NMD and a second NMD, and wherein receiving the data representing multiple audio recordings of the wake-word spoken by the given user comprises:
    - receiving, via the network interface of the one or more servers, a first data stream representing a first audio recording of the voice input spoken by a given user, the first audio recording stored in a buffer of the first NMD; and
    - receiving, via the network interface of the one or more servers, a second data stream representing a second audio recording of the voice input spoken by a given user, the second audio recording stored in a buffer of the second NMD.
  - 6. The system of claim 1, wherein receiving the data representing multiple audio recordings of the voice input spoken by the given user comprises receiving, via the network interface from the multiple NMDs, respective queries to the voice assistant service with the voice input.
  - 7. The system of claim 1, wherein a first zone of a media playback system includes the particular NMD, and wherein the first zone is configured into a zone group with a second zone that includes one or more additional playback devices, and wherein playing back the audio content according to the playback command comprises playing back the audio content in synchrony with one or more additional playback devices of the second zone.
  - 8. The system of claim 1, wherein a first zone of a media playback system includes the particular NMD and one or more additional NMDs in a bonded zone configuration in which the particular NMD and the one or more additional NMDs play respective channels of the audio content, and wherein playing back the audio content according to the playback command comprises playing back a first channel of the audio content in synchrony with the one or more additional NMDs playing back respective second channels of the audio content.
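
The NMD-side operations recited in claim 1 (record into a buffer, monitor for wake-words, send the buffered portion on detection) can be sketched as a fixed-size ring buffer. This is only an illustration under stated assumptions: `NmdRecorder`, the `detect_wake_word` callable, and the `send` callable are hypothetical placeholders, and clearing the buffer after sending is a simplification of this sketch to avoid re-triggering on the same audio.

```python
from collections import deque
from typing import Callable, Deque, List


class NmdRecorder:
    """Hypothetical NMD-side recorder: ring buffer plus wake-word monitoring."""

    def __init__(
        self,
        capacity_frames: int,
        detect_wake_word: Callable[[List[bytes]], bool],  # assumed detector
        send: Callable[[List[bytes]], None],              # assumed uplink to the service
    ) -> None:
        # deque with maxlen drops the oldest frame once capacity is reached.
        self.buffer: Deque[bytes] = deque(maxlen=capacity_frames)
        self.detect_wake_word = detect_wake_word
        self.send = send

    def on_frame(self, frame: bytes) -> bool:
        """Record one audio frame; send the buffered audio if a wake-word is found."""
        self.buffer.append(frame)
        snapshot = list(self.buffer)
        if not self.detect_wake_word(snapshot):
            return False
        # The snapshot includes the frames leading up to and containing the wake-word.
        self.send(snapshot)
        self.buffer.clear()
        return True
```

In use, a device would call `on_frame` for every frame delivered by its microphone array; the frame size and buffer capacity are left open here because the claims do not fix them.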

9. A method to be performed by one or more servers of a voice assistant service, the method comprising:
- receiving, via a network interface of the one or more servers, data representing multiple audio recordings of a voice input spoken by a given user, each audio recording recorded by a respective NMD of multiple networked microphone devices (NMDs) connected via a local area network, wherein the voice input comprises a wake-word detected by the multiple NMDs;
- based on respective sound pressure levels of the multiple audio recordings of the voice input, (i) selecting a particular NMD of the multiple NMDs and (ii) foregoing selection of other NMDs of the multiple NMDs; and
- after the selecting, sending, via the network interface to the particular NMD, data representing a playback command that corresponds to a voice command following the wake-word in the voice input represented in the multiple audio recordings, wherein the data representing the playback command causes the particular NMD to play back audio content according to the playback command via one or more amplifiers configured to drive one or more speakers.
- Dependent claims (10, 11, 12, 13):
  - 10. The method of claim 9, further comprising:
    - based on the respective sound pressure levels of the multiple audio recordings of the voice input, sending, via the network interface of the one or more servers to the other NMDs of the multiple NMDs, one or more respective instructions to stop sending the data representing their respective audio recordings of the multiple audio recordings.
  - 11. The method of claim 9, further comprising:
    - processing a particular audio recording of the multiple audio recordings to determine the playback command that corresponds to the voice command within the voice input, the particular audio recording recorded by the particular NMD, wherein other audio recordings of the multiple audio recordings are not processed.
  - 12. The method of claim 9, wherein selecting the particular NMD of the multiple NMDs comprises determining that a sound pressure level of a particular audio recording of the multiple audio recordings has a higher sound pressure level than other audio recordings of the multiple audio recordings, the particular audio recording recorded by the particular NMD.
  - 13. The method of claim 9, wherein the multiple networked microphone devices comprise a first NMD and a second NMD, and wherein receiving the data representing multiple audio recordings of the wake-word spoken by the given user comprises:
    - receiving, via the network interface of the one or more servers, a first data stream representing a first audio recording of the voice input spoken by a given user, the first audio recording stored in a buffer of the first NMD; and
    - receiving, via the network interface of the one or more servers, a second data stream representing a second audio recording of the voice input spoken by a given user, the second audio recording stored in a buffer of the second NMD.
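
The server-side method of claims 9 through 11 can be sketched as a single arbitration function: pick the NMD whose recording has the highest level, instruct the other NMDs to stop sending, process only the selected recording, and send the resulting playback command. The message shapes (`"stop_sending"`, `"playback_command"`), the `arbitrate` name, and the `parse_command` callable are assumptions made for illustration; the patent does not define a wire format or a speech-processing API.

```python
from typing import Callable, Dict, List, Tuple

Message = Tuple[str, dict]  # (destination NMD id, hypothetical message payload)


def arbitrate(
    levels_db: Dict[str, float],           # NMD id -> measured level of its recording
    recordings: Dict[str, bytes],          # NMD id -> recorded audio
    parse_command: Callable[[bytes], str]  # assumed speech-to-command step
) -> List[Message]:
    """Return the messages the servers would send after selecting one NMD."""
    if not levels_db:
        raise ValueError("no recordings received")
    # Highest level wins; on a tie, max() keeps the first NMD encountered.
    selected = max(levels_db, key=lambda nmd_id: levels_db[nmd_id])

    messages: List[Message] = []
    for nmd_id in levels_db:
        if nmd_id != selected:
            # Claim 10: tell the non-selected NMDs to stop sending audio.
            messages.append((nmd_id, {"type": "stop_sending"}))

    # Claim 11: only the selected NMD's recording is processed.
    command = parse_command(recordings[selected])
    messages.append((selected, {"type": "playback_command", "command": command}))
    return messages
```

Passing the command parser in as a callable keeps the sketch independent of any particular voice assistant service.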

14. A method to be performed by a system comprising one or more servers of a voice assistant service, wherein the one or more servers are configured to communicate with multiple network microphone devices, wherein the multiple networked microphone devices (NMDs) are communicatively coupled to one another via a local area network, and wherein:
- each NMD is configured to perform operations comprising:
  - recording, via a respective microphone array, audio into a buffer of the respective NMD;
  - monitoring the recorded audio for wake-words; and
  - when a wake-word is detected in the recorded audio, sending, via a respective network interface to a voice assistant service, data representing an audio recording from the buffer of the respective NMD, the audio recording representing a portion of the recorded audio including the detected wake-word as recorded by the respective NMD; and
- the method comprises:
  - receiving, via a network interface of the one or more servers, data representing multiple audio recordings of a voice input spoken by a given user, each audio recording recorded by a respective NMD of the multiple NMDs, wherein the voice input comprises the detected wake-word;
  - based on respective sound pressure levels of the multiple audio recordings of the voice input, (i) selecting a particular NMD of the multiple NMDs and (ii) foregoing selection of other NMDs of the multiple NMDs; and
  - after the selecting, sending, via the network interface to the particular NMD, data representing a playback command that corresponds to a voice command following the wake-word in the voice input represented in the multiple audio recordings, wherein the data representing the playback command causes the particular NMD to play back audio content according to the playback command via one or more amplifiers configured to drive one or more speakers.
- Dependent claims (15, 16, 17, 18, 19, 20, 21):
  - 15. The method of claim 14, further comprising:
    - based on the respective sound pressure levels of the multiple audio recordings of the voice input, sending, via the network interface of the one or more servers to the other NMDs of the multiple NMDs, one or more respective instructions to stop sending the data representing their respective audio recordings of the multiple audio recordings.
  - 16. The method of claim 14, further comprising:
    - processing a particular audio recording of the multiple audio recordings to determine the playback command that corresponds to the voice command within the voice input, the particular audio recording recorded by the particular NMD, wherein other audio recordings of the multiple audio recordings are not processed.
  - 17. The method of claim 14, wherein selecting the particular NMD of the multiple NMDs comprises determining that a sound pressure level of a particular audio recording of the multiple audio recordings has a higher sound pressure level than other audio recordings of the multiple audio recordings, the particular audio recording recorded by the particular NMD.
  - 18. The method of claim 14, wherein the multiple networked microphone devices comprise a first NMD and a second NMD, and wherein receiving the data representing multiple audio recordings of the wake-word spoken by the given user comprises:
    - receiving, via the network interface of the one or more servers, a first data stream representing a first audio recording of the voice input spoken by a given user, the first audio recording stored in a buffer of the first NMD; and
    - receiving, via the network interface of the one or more servers, a second data stream representing a second audio recording of the voice input spoken by a given user, the second audio recording stored in a buffer of the second NMD.
  - 19. The method of claim 14, wherein receiving the data representing multiple audio recordings of the voice input spoken by the given user comprises receiving, via the network interface from the multiple NMDs, respective queries to the voice assistant service with the voice input.
  - 20. The method of claim 14, wherein a first zone of a media playback system includes the particular NMD, and wherein the first zone is configured into a zone group with a second zone that includes one or more additional playback devices, and wherein playing back the audio content according to the playback command comprises playing back the audio content in synchrony with one or more additional playback devices of the second zone.
  - 21. The method of claim 14, wherein a first zone of a media playback system includes the particular NMD and one or more additional NMDs in a bonded zone configuration in which the particular NMD and the one or more additional NMDs play respective channels of the audio content, and wherein playing back the audio content according to the playback command comprises playing back a first channel of the audio content in synchrony with the one or more additional NMDs playing back respective second channels of the audio content.
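
The zone-group and bonded-zone claims can be illustrated with a small lookup that resolves which devices play, and on which channel, once a particular NMD has been selected. The data model below (a mapping of zone names to device/channel assignments plus a list of zone groups) and the function name `playback_targets` are assumptions made for this sketch; the claims do not prescribe any representation.

```python
from typing import Dict, List, Tuple

# Assumed model: zone name -> {device id: channel}, e.g. "left", "right", or "full".
Zones = Dict[str, Dict[str, str]]


def playback_targets(
    selected_nmd: str,
    zones: Zones,
    zone_groups: List[List[str]],  # each inner list names zones grouped together
) -> List[Tuple[str, str]]:
    """Return (device id, channel) pairs that play in synchrony with the selected NMD."""
    home = next((name for name, devs in zones.items() if selected_nmd in devs), None)
    if home is None:
        raise ValueError(f"{selected_nmd!r} is not assigned to any zone")
    # If the home zone is not grouped, it plays on its own.
    group = next((g for g in zone_groups if home in g), [home])
    # Bonded devices keep their own channel; grouped zones contribute all their devices.
    return [(dev, ch) for zone in group for dev, ch in zones[zone].items()]
```

With a bonded stereo pair in one zone grouped with a second zone, the result lists both bonded devices with their respective channels followed by the devices of the grouped zone.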

Current Assignee: Sonos, Inc.
Original Assignee: Sonos, Inc.
Inventors: Reilly, Jonathon; Burlingame, Gregory; Butts, Christopher; Kadri, Romi; Lang, Jonathan P.
Primary Examiner(s): Guerra-Erazo, Edgar X
Application Number: US16/214,666
Publication Number: US 20190108839A1
Time in Patent Office: 162 Days

CPC Class Codes
G06F 3/167   Audio in a user interface, ...
G10L 15/02   Feature extraction for spee...
G10L 15/20   Speech recognition techniqu...
G10L 15/22   Procedures used during a sp...
G10L 15/34   Adaptation of a single reco...
G10L 2015/223   Execution procedure of a sp...
