Voice interaction architecture with intelligent background noise cancellation

US 9,947,333 B1
Filed: 02/10/2012
Issued: 04/17/2018
Est. Priority Date: 02/10/2012
Status: Expired due to Fees

4 Assignments

0 Petitions

Abstract

A voice interaction architecture has a hands-free, electronic voice controlled assistant that permits users to verbally request information from cloud services. The voice controlled assistant may be positioned in a room to receive voice commands from the user. The voice controlled assistant may also pick up background sources of speech, music, or other noise, such as from a television or stereo system, which may adversely impact the user's intended vocal input to the assistant. The assistant transmits the aggregated audio data (user command and background noise) over a network to the cloud services, which implement noise cancellation functionality to remove the background noise while isolating and preserving the user's command. Once isolated, the cloud services can process and interpret the user input to perform some function, and return the response over the network to the voice controlled assistant for audible output to the user.

147 Citations

29 Claims

1. A system comprising:
- a voice controlled assistant having a microphone to receive voice input and background noise;
  
  the voice controlled assistant further having a network interface to transmit aggregated audio data representing the voice input and the background noise over a network;
  
  a command response system remote from the voice controlled assistant and communicatively coupled to the voice controlled assistant to receive the aggregated audio data from the voice controlled assistant via the network, the command response system configured to:
  
  identify a source of the background noise at least by:
  
  identifying first audio content from the background noise;
  
  sending a request to a remote server for second audio content that is associated with the first audio content; and
  
  receiving the second audio content from the remote server;
  
  remove, using the second audio content, at least a part of the background noise from the aggregated audio data;
  
  identify the voice input;
  
  produce an audio response for the voice controlled assistant, the audio response representative of a speech;
  
  send the audio response over the network to the voice controlled assistant; and
  
  the voice controlled assistant being configured to receive the audio response and to audibly emit the audio response representative of the speech through a speaker.
- Dependent claims (2, 3, 4, 5, 6, 7, 8):
  - 2. The system of claim 1, wherein the background noise includes content from a television.
  - 3. The system of claim 1, wherein the command response system comprises:
    - one or more processors;
      
      memory accessible by the one or more processors;
      
      one or more computer-executable instructions stored in the memory and executable on the one or more processors to at least partially remove the background noise using an adaptive noise cancellation algorithm.
  - 4. The system of claim 1, wherein the command response system comprises:
    - one or more processors;
      
      memory accessible by the one or more processors; and
      
      a noise source identifier stored in the memory and executable on the one or more processors to identify a source of the background noise.
  - 5. The system of claim 1, wherein the operation performed by the command response system comprises one or more of:
    - forming a search query to include information from the voice input;
      
      performing a look-up for a response associated with the voice input;
      
      initiating a transaction using the voice input;
      
      conducting online commerce;
      
      or requesting delivery of entertainment content.
  - 6. The system of claim 1, wherein the command response system comprises a natural language processing engine to interpret the voice input prior to performing the operation.
  - 7. The system of claim 1, wherein the command response system is implemented as a network accessible platform that is accessible by the voice controlled assistant over the network.
  - 8. The system of claim 1, wherein the identifying the source of the background noise further comprises determining that the first audio content from the background noise corresponds to stored audio associated with a previously identified source of a previous background noise, the stored audio being stored at the remote server.
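
Claim 3 refers to an adaptive noise cancellation algorithm without naming one. As an illustration only, the sketch below uses a normalized least-mean-squares (NLMS) filter, a common textbook choice that is swapped in here and not taken from the patent. It assumes the second audio content retrieved from the remote server is available as a reference signal that is roughly time-aligned with the captured audio. All names, the tap count, and the step size are hypothetical.

```python
import math
import random

def nlms_cancel(mixed, reference, taps=8, mu=0.2, eps=1e-8):
    """Subtract an adaptively filtered copy of `reference` from `mixed` (NLMS)."""
    weights = [0.0] * taps
    history = [0.0] * taps              # recent reference samples, newest first
    cleaned = []
    for d, x in zip(mixed, reference):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate            # residual, ideally the voice input
        norm = sum(h * h for h in history) + eps
        step = mu * error / norm
        weights = [w + step * h for w, h in zip(weights, history)]
        cleaned.append(error)
    return cleaned

def power(signal):
    return sum(s * s for s in signal) / len(signal)

random.seed(0)
n = 4000
# Assumed stand-in for the second audio content (e.g. the television audio).
reference = [random.uniform(-1.0, 1.0) for _ in range(n)]
# Assumed room path: the microphone hears an attenuated, slightly delayed copy.
noise = [0.6 * reference[i] + 0.3 * (reference[i - 2] if i >= 2 else 0.0)
         for i in range(n)]
voice = [0.2 * math.sin(2 * math.pi * 0.01 * i) for i in range(n)]
mixed = [v + b for v, b in zip(voice, noise)]   # aggregated audio data

cleaned = nlms_cancel(mixed, reference)

# Compare leftover background power before and after, once the filter has settled.
tail = slice(n // 2, n)
residual_before = power([m - v for m, v in zip(mixed[tail], voice[tail])])
residual_after = power([c - v for c, v in zip(cleaned[tail], voice[tail])])
print(residual_after < residual_before)
```

The filter only needs the reference signal and the mixed signal, which matches the claim's split between retrieving the second audio content and then using it to remove part of the background noise.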

9. A system comprising:
- a network accessible infrastructure of one or more processors and memory accessible by the one or more processors, the network accessible infrastructure residing at a data center location and being configured to receive over a network aggregated audio data from a first device that is at a user-based location distant and separate from the data center location;
  
  one or more computer-executable instructions stored in the memory and executable on the one or more processors to:
  
  receive the aggregated audio data from the first device, the aggregated audio data representing a voice command from a user and background noise from an environment surrounding the user, the background noise comprising audio data representing speech produced from a second device that is at the user-based location;
  
  identify content in the background noise contained in the aggregated audio data by accessing content preferences previously associated with a profile of for the user and compare a portion of audio associated with the content preferences to the background noise;
  
  at least partially remove the background noise from the aggregated audio data using the content; and
  
  process the voice command extracted from the aggregated audio data after the background noise has been at least partially removed; and
  
  a response encoder to generate a response for the first device.
- Dependent claims (10, 11, 12, 13, 14, 15, 16, 17):
  - 10. The system of claim 9, wherein the background noise includes additional content from the second device.
  - 11. The system of claim 9, wherein the one or more computer-executable instructions are further executable on the one or more processors to maintain the content preferences for the user, the content preferences comprising at least one of television viewing patterns of the user, most frequently viewed television programs, most frequently played music, or most frequently played video games.
  - 12. The system of claim 9, wherein the one or more computer-executable instructions are further executable on the one or more processors to analyze the background noise from the aggregated audio data and discern a signature of the background noise to be used to identify the content of the background noise.
  - 13. The system of claim 9, wherein the one or more computer-executable instructions are further executable on the one or more processors to retrieve the content.
  - 14. The system of claim 9, wherein the one or more computer-executable instructions are further executable by the one or more processors to apply an adaptive noise cancellation algorithm to at least partially remove the background noise from the aggregated audio data.
  - 15. The system of claim 9, wherein the one or more computer-executable instructions are further executable by the one or more processors to convert the voice command from audio to text data.
  - 16. The system of claim 9, wherein the one or more computer-executable instructions are further executable by the one or more processors to:
    - form a search query to include information from the voice command;
      
      perform a look-up for a response associated with the voice command;
      
      initiate a transaction using the voice command;
      
      conduct online commerce;
      
      or request delivery of entertainment content.
  - 17. The system of claim 9, wherein the response encoder is stored in the memory.
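
Claim 9 identifies content by comparing audio associated with a user's content preferences against the background noise. One way this comparison might look is sketched below, assuming the candidate clips drawn from the profile are already time-aligned with the captured background. The profile layout, the program titles, and the 0.8 threshold are hypothetical, and cosine similarity is one possible comparison measure, not one the claims specify.

```python
import math

def normalized_correlation(a, b):
    """Cosine similarity between two sample sequences (truncated to the shorter)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x, _ in zip(a, b))) * \
           math.sqrt(sum(y * y for _, y in zip(a, b)))
    return dot / norm if norm else 0.0

def identify_content(background, candidates, threshold=0.8):
    """Return the best-matching candidate title, or None if none clears the threshold."""
    best_title, best_score = None, threshold
    for title, clip in candidates.items():
        score = normalized_correlation(background, clip)
        if score > best_score:
            best_title, best_score = title, score
    return best_title

def tone(freq, n=200):
    return [math.sin(2 * math.pi * freq * i) for i in range(n)]

# Hypothetical clips for programs listed in the user's content preferences.
preference_clips = {
    "evening_news": tone(0.05),
    "cooking_show": tone(0.13),
}

# Background as heard by the microphone: a quieter copy of one program plus other noise.
other = tone(0.31)
background = [0.5 * s + 0.05 * o for s, o in zip(preference_clips["evening_news"], other)]

match = identify_content(background, preference_clips)
no_match = identify_content(other, preference_clips)
print(match, no_match)
```

Restricting the search to programs in the profile keeps the candidate set small, which is the practical benefit of consulting content preferences before searching more widely.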

18. One or more non-transitory computer readable media storing instructions that, when executed on one or more processors, performs acts comprising:
- receiving aggregated audio data from a first device, the aggregated audio data containing an audio command from a user and background noise having content emitted from a second device, the background noise comprising audio data representing speech produced from the second device;
  
  analyzing content preferences associated with a user account of the user with the content emitted from the second device, the content preference including at least one of television viewing habits of the user or frequently viewed television programs associated with the user;
  
  identifying the content emitted from the second device based at least in part on the content preferences;
  
  at least partially removing the content emitted from the second device from the aggregated audio data to capture the audio command;
  
  processing the audio command to generate a response representative of speech; and
  
  sending the response back to the first device.
- Dependent claims (19, 20, 21, 22, 23):
  - 19. The one or more non-transitory computer readable media of claim 18, wherein transmitting the response comprises transmitting a response that is to be emitted in audible form to the user.
  - 20. The one or more non-transitory computer readable media of claim 18, wherein identifying the content from the second device further comprises searching an electronic programming guide for a source of the content.
  - 21. The one or more non-transitory computer readable media of claim 18, wherein identifying the content from the second device further comprises deriving a signature from the content and using the signature to identify the content.
  - 22. The one or more non-transitory computer readable media of claim 18, wherein at least partially removing the content from the aggregated audio data comprises applying an adaptive noise cancellation algorithm.
  - 23. The one or more non-transitory computer readable media of claim 18, wherein processing the audio command comprises at least one of:
    - forming a search query to include information from the audio command;
      
      performing a look-up for a response associated with the audio command;
      
      initiating a transaction using the audio command;
      
      conducting online commerce;
      
      or requesting delivery of entertainment content.
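
Claim 21 derives a signature from the content and uses it for identification, but the claims do not say how the signature is computed. The sketch below is a deliberately simple stand-in: it marks whether energy rises from one frame to the next and compares signatures by Hamming distance. The frame size, the index of known programs, and all names are hypothetical. Production fingerprinting schemes are considerably more robust.

```python
def energy_signature(samples, frame=50):
    """One bit per frame boundary: 1 if frame energy rises, else 0."""
    energies = [sum(s * s for s in samples[i:i + frame])
                for i in range(0, len(samples) - frame + 1, frame)]
    return [1 if later > earlier else 0
            for earlier, later in zip(energies, energies[1:])]

def hamming(a, b):
    """Number of positions where two equal-length signatures differ."""
    return sum(x != y for x, y in zip(a, b))

def make_clip(envelope, frame=50):
    """Synthetic clip: an alternating carrier scaled by one level per frame."""
    return [level * (1.0 if i % 2 == 0 else -1.0)
            for level in envelope for i in range(frame)]

# Hypothetical index of signatures for known programs.
index = {
    "program_a": energy_signature(make_clip([1, 3, 2, 5, 4, 6, 1, 2])),
    "program_b": energy_signature(make_clip([6, 5, 4, 3, 2, 1, 2, 3])),
}

# Captured background: program_a at lower volume. Scaling does not change the bits.
captured = [0.3 * s for s in make_clip([1, 3, 2, 5, 4, 6, 1, 2])]
captured_signature = energy_signature(captured)

match = min(index, key=lambda name: hamming(index[name], captured_signature))
print(match, captured_signature)
```

Because the bits depend only on whether energy rises or falls, the signature is unchanged when the same content is heard at a different volume, which is the property a lookup key for background audio needs.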

24. A method comprising:
- capturing, by a client device at a first location, aggregated audio data representing an audio command from a user and ambient background noise;
  
  transmitting the aggregated audio data from the first location to a second location;
  
  identifying, at the second location by a computing system, content contributing to the ambient background noise represented in the aggregated audio data at least by:
  
  identifying first audio content from the ambient background noise;
  
  sending a request to a remote server for second audio content that is associated with the first audio content; and
  
  receiving the second audio content from the remote server;
  
  at least partially removing, by the computing system, the ambient background noise from the aggregated audio data using the second audio content;
  
  processing, by the computing system, the audio command to generate a response representative of speech;
  
  sending the response from the second location back to the first location; and
  
  emitting the response in audible form to the user.
- Dependent claims (25, 26, 27, 28, 29):
  - 25. The method of claim 24, wherein identifying the content further comprises deriving a signature from the content and using the signature to identify the content.
  - 26. The method of claim 24, wherein identifying the content further comprises searching remote systems at a third location to determine a match to the content.
  - 27. The method of claim 24, wherein removing the background noise comprises applying an adaptive noise cancellation algorithm.
  - 28. The method of claim 24, wherein processing the audio command comprises at least one of:
    - forming a search query to include information from the audio command;
      
      performing a look-up for a response associated with the audio command;
      
      initiating a transaction using the audio command;
      
      conducting online commerce;
      
      or requesting delivery of entertainment content.
  - 29. The method of claim 24, wherein the content comprises television programming, and identifying the content further comprises searching an electronic programming guide for a source of the content and retrieving the content from one of the source or another location.
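
The middle steps of claim 24 (request the second audio content, receive it, and use it to remove background noise) can be sketched as follows. This is a minimal illustration under stated assumptions: the "remote server" is an in-memory dictionary, the content identifier is already known, and removal is done by finding the best time offset with cross-correlation and subtracting a least-squares-scaled copy. None of these specifics come from the patent, and every name is hypothetical.

```python
import math
import random

def best_lag(mixed, reference, max_lag):
    """Offset (in samples) at which `reference` best lines up inside `mixed`."""
    def score(lag):
        return sum(mixed[i + lag] * r
                   for i, r in enumerate(reference) if i + lag < len(mixed))
    return max(range(max_lag + 1), key=score)

def subtract_reference(mixed, reference, max_lag=20):
    """Align `reference` to `mixed`, estimate its gain, and subtract it."""
    lag = best_lag(mixed, reference, max_lag)
    pairs = [(mixed[i + lag], r)
             for i, r in enumerate(reference) if i + lag < len(mixed)]
    gain = sum(m * r for m, r in pairs) / sum(r * r for _, r in pairs)
    cleaned = list(mixed)
    for i, r in enumerate(reference):
        if i + lag < len(cleaned):
            cleaned[i + lag] -= gain * r
    return cleaned, lag, gain

random.seed(1)
program_audio = [random.uniform(-1.0, 1.0) for _ in range(500)]
CONTENT_SERVER = {"program_a": program_audio}   # hypothetical remote store

def fetch_second_audio_content(content_id):
    """Stand-in for the request/receive round trip to the remote server."""
    return CONTENT_SERVER[content_id]

# Aggregated audio: a quiet command plus the program, delayed by 7 samples.
voice = [0.1 * math.sin(2 * math.pi * 0.02 * i) for i in range(520)]
mixed = list(voice)
for i, r in enumerate(program_audio):
    mixed[i + 7] += 0.5 * r

second_audio = fetch_second_audio_content("program_a")
cleaned, lag, gain = subtract_reference(mixed, second_audio)
worst_residual = max(abs(c - v) for c, v in zip(cleaned, voice))
print(lag, round(gain, 2))
```

A single gain and offset is the simplest possible model of the room. Where the background arrives with echoes or changing volume, an adaptive filter of the kind named in claim 27 would be the more realistic choice.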

Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors: David, Tony
Primary Examiner(s): Washburn, Daniel
Assistant Examiner(s): Nguyen, Timothy
Application Number: US13/371,294
Time in Patent Office: 2,258 Days
Field of Search: 704226, 704275

CPC Class Codes:

G10L 21/0208   Noise filtering

G10L 25/06   the extracted parameters be...

G10L 25/18   the extracted parameters be...

G10L 25/51   for comparison or discrimin...
