Voice interaction architecture with intelligent background noise cancellation
First Claim
1. A system comprising:
a voice controlled assistant having a microphone to receive voice input and background noise;
the voice controlled assistant further having a network interface to transmit aggregated audio data representing the voice input and the background noise over a network;
a command response system remote from the voice controlled assistant and communicatively coupled to the voice controlled assistant to receive the aggregated audio data from the voice controlled assistant via the network, the command response system configured to:
identify a source of the background noise at least by:
identifying first audio content from the background noise;
sending a request to a remote server for second audio content that is associated with the first audio content; and
receiving the second audio content from the remote server;
remove, using the second audio content, at least a part of the background noise from the aggregated audio data;
identify the voice input;
produce an audio response for the voice controlled assistant, the audio response representative of a speech;
send the audio response over the network to the voice controlled assistant; and
the voice controlled assistant being configured to receive the audio response and to audibly emit the audio response representative of the speech through a speaker.
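The removal step recited above can be pictured as reference-based subtraction. The following is a minimal sketch under stated assumptions, not the patented implementation: audio is modeled as a plain list of float samples, the remote server is modeled by an in-memory dictionary (`REMOTE_CATALOG`, hypothetical), and the background source is assumed to reach the microphone as a delayed, scaled copy of the reference. All function names are illustrative.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical stand-in for the remote server that holds reference audio.
REMOTE_CATALOG: Dict[str, List[float]] = {
    "tv-show-episode": [1.0, -2.0, 3.0, -1.0],
}


def request_second_content(content_id: str) -> List[float]:
    """Model the request/response exchange with the remote server."""
    return list(REMOTE_CATALOG[content_id])


def best_alignment(mixed: Sequence[float],
                   reference: Sequence[float]) -> Tuple[int, float]:
    """Return the (offset, gain) at which the reference best explains the mix."""
    energy = sum(r * r for r in reference)
    if energy == 0.0 or len(reference) > len(mixed):
        return 0, 0.0
    best_offset, best_gain, best_score = 0, 0.0, 0.0
    for offset in range(len(mixed) - len(reference) + 1):
        dot = sum(mixed[offset + i] * r for i, r in enumerate(reference))
        gain = dot / energy   # least-squares gain at this offset
        score = gain * dot    # energy the scaled reference accounts for
        if score > best_score:
            best_offset, best_gain, best_score = offset, gain, score
    return best_offset, best_gain


def remove_background(mixed: Sequence[float],
                      reference: Sequence[float]) -> List[float]:
    """Subtract the aligned, scaled reference from the aggregated audio."""
    offset, gain = best_alignment(mixed, reference)
    cleaned = list(mixed)
    if gain == 0.0:
        return cleaned  # nothing to remove
    for i, r in enumerate(reference):
        cleaned[offset + i] -= gain * r
    return cleaned


# Toy aggregated audio: a short "voice" burst plus the reference at half volume,
# delayed by three samples.
voice = [0.8, -0.4, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
reference = request_second_content("tv-show-episode")
aggregated = list(voice)
for i, r in enumerate(reference):
    aggregated[3 + i] += 0.5 * r
cleaned = remove_background(aggregated, reference)
```

In this toy case the voice burst and the background do not overlap in time, so the subtraction recovers the voice samples exactly. With overlapping speech the least-squares gain is only an estimate, and a practical system would likely use an adaptive filter instead.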
4 Assignments
0 Petitions
Abstract
A voice interaction architecture has a hands-free, electronic voice controlled assistant that permits users to verbally request information from cloud services. The voice controlled assistant may be positioned in a room to receive voice commands from the user. The voice controlled assistant may also pick up background sources of speech, music, or other noise, such as from a television or stereo system, which may adversely impact the user's intended vocal input to the assistant. The assistant transmits the aggregated audio data (user command and background noise) over a network to the cloud services, which implement noise cancellation functionality to remove the background noise while isolating and preserving the user's command. Once the command is isolated, the cloud services can process and interpret the user input to perform some function, and return the response over the network to the voice controlled assistant for audible output to the user.
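The round trip described in the abstract can be sketched as two message types and two roles. This is an illustrative outline only: the class names, the callable standing in for the network, and the three placeholder stages (an amplitude gate in place of real noise cancellation, a canned interpreter, a canned responder) are all assumptions introduced here and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AggregatedAudio:
    """Upstream message: user command plus background noise, as captured."""
    device_id: str
    samples: List[float]


@dataclass
class AudioResponse:
    """Downstream message; `speech` stands in for synthesized audio."""
    device_id: str
    speech: str


class CloudServices:
    """Remote side: cancel noise, interpret the command, build a response."""

    def __init__(self,
                 cancel_noise: Callable[[List[float]], List[float]],
                 interpret: Callable[[List[float]], str],
                 respond: Callable[[str], str]) -> None:
        self.cancel_noise = cancel_noise
        self.interpret = interpret
        self.respond = respond

    def handle(self, request: AggregatedAudio) -> AudioResponse:
        isolated = self.cancel_noise(request.samples)
        command = self.interpret(isolated)
        return AudioResponse(request.device_id, self.respond(command))


@dataclass
class VoiceControlledAssistant:
    """Device side: capture audio, send it upstream, emit the reply."""
    device_id: str
    send: Callable[[AggregatedAudio], AudioResponse]  # stands in for the network
    emitted: List[str] = field(default_factory=list)  # stands in for the speaker

    def capture(self, samples: List[float]) -> None:
        response = self.send(AggregatedAudio(self.device_id, samples))
        self.emitted.append(response.speech)


# Placeholder stages, chosen only so the sketch runs end to end.
cloud = CloudServices(
    cancel_noise=lambda samples: [s for s in samples if abs(s) >= 0.5],
    interpret=lambda isolated: "what time is it" if isolated else "",
    respond=lambda command: "It is noon." if command
    else "Sorry, I did not catch that.",
)
assistant = VoiceControlledAssistant("living-room", send=cloud.handle)
assistant.capture([0.1, 0.9, -0.8, 0.05])  # loud samples survive the gate
assistant.capture([0.1, -0.2])             # only low-level noise
```

Keeping the device limited to capture and playback, with every processing stage behind `CloudServices.handle`, mirrors the split the abstract describes between the assistant and the cloud services.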
147 Citations
29 Claims
1. A system comprising:
a voice controlled assistant having a microphone to receive voice input and background noise;
the voice controlled assistant further having a network interface to transmit aggregated audio data representing the voice input and the background noise over a network;
a command response system remote from the voice controlled assistant and communicatively coupled to the voice controlled assistant to receive the aggregated audio data from the voice controlled assistant via the network, the command response system configured to:
identify a source of the background noise at least by:
identifying first audio content from the background noise;
sending a request to a remote server for second audio content that is associated with the first audio content; and
receiving the second audio content from the remote server;
remove, using the second audio content, at least a part of the background noise from the aggregated audio data;
identify the voice input;
produce an audio response for the voice controlled assistant, the audio response representative of a speech;
send the audio response over the network to the voice controlled assistant; and
the voice controlled assistant being configured to receive the audio response and to audibly emit the audio response representative of the speech through a speaker.
View Dependent Claims (2, 3, 4, 5, 6, 7, 8)
9. A system comprising:
a network accessible infrastructure of one or more processors and memory accessible by the one or more processors, the network accessible infrastructure residing at a data center location and being configured to receive over a network aggregated audio data from a first device that is at a user-based location distant and separate from the data center location;
one or more computer-executable instructions stored in the memory and executable on the one or more processors to:
receive the aggregated audio data from the first device, the aggregated audio data representing a voice command from a user and background noise from an environment surrounding the user, the background noise comprising audio data representing speech produced from a second device that is at the user-based location;
identify content in the background noise contained in the aggregated audio data by accessing content preferences previously associated with a profile of the user and comparing a portion of audio associated with the content preferences to the background noise;
at least partially remove the background noise from the aggregated audio data using the content; and
process the voice command extracted from the aggregated audio data after the background noise has been at least partially removed; and
a response encoder to generate a response for the first device.
View Dependent Claims (10, 11, 12, 13, 14, 15, 16, 17)
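The comparison recited in claim 9 (a portion of audio associated with the content preferences against the background noise) could be approached with normalized cross-correlation. The sketch below is one possible reading, not the claimed implementation: the profile is assumed to be a dictionary of short reference excerpts keyed by a content identifier, the 0.9 threshold is arbitrary, and the function names are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence


def normalized_peak_correlation(noise: Sequence[float],
                                excerpt: Sequence[float]) -> float:
    """Best normalized correlation of the excerpt over every window of noise."""
    excerpt_norm = math.sqrt(sum(e * e for e in excerpt))
    if excerpt_norm == 0.0 or len(excerpt) > len(noise):
        return 0.0
    best = 0.0
    for offset in range(len(noise) - len(excerpt) + 1):
        window = noise[offset:offset + len(excerpt)]
        window_norm = math.sqrt(sum(w * w for w in window))
        if window_norm == 0.0:
            continue  # silent window, nothing to compare
        dot = sum(w * e for w, e in zip(window, excerpt))
        best = max(best, dot / (window_norm * excerpt_norm))
    return best


def identify_from_preferences(noise: Sequence[float],
                              preferred_excerpts: Dict[str, Sequence[float]],
                              threshold: float = 0.9) -> Optional[str]:
    """Return the preferred content that best matches the noise, if any."""
    best_id, best_score = None, threshold
    for content_id, excerpt in preferred_excerpts.items():
        score = normalized_peak_correlation(noise, excerpt)
        if score >= best_score:
            best_id, best_score = content_id, score
    return best_id


# Hypothetical profile: excerpts of programs the user often watches.
profile = {
    "evening-news": [1.0, -1.0, 1.0, -1.0],
    "cooking-show": [1.0, 2.0, 3.0, 2.0],
}
# Background noise containing the cooking-show excerpt at half volume.
noise = [0.0, 0.1, 0.5, 1.0, 1.5, 1.0, 0.0]
match = identify_from_preferences(noise, profile)
```

Restricting the search to content drawn from the user's profile keeps the candidate set small, which is the practical appeal of using preferences rather than scanning a full catalog.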
18. One or more non-transitory computer readable media storing instructions that, when executed on one or more processors, perform acts comprising:
receiving aggregated audio data from a first device, the aggregated audio data containing an audio command from a user and background noise having content emitted from a second device, the background noise comprising audio data representing speech produced from the second device;
analyzing content preferences associated with a user account of the user with the content emitted from the second device, the content preferences including at least one of television viewing habits of the user or frequently viewed television programs associated with the user;
identifying the content emitted from the second device based at least in part on the content preferences;
at least partially removing the content emitted from the second device from the aggregated audio data to capture the audio command;
processing the audio command to generate a response representative of speech; and
sending the response back to the first device.
View Dependent Claims (19, 20, 21, 22, 23)
24. A method comprising:
capturing, by a client device at a first location, aggregated audio data representing an audio command from a user and ambient background noise;
transmitting the aggregated audio data from the first location to a second location;
identifying, at the second location by a computing system, content contributing to the ambient background noise represented in the aggregated audio data at least by:
identifying first audio content from the ambient background noise;
sending a request to a remote server for second audio content that is associated with the first audio content; and
receiving the second audio content from the remote server;
at least partially removing, by the computing system, the ambient background noise from the aggregated audio data using the second audio content;
processing, by the computing system, the audio command to generate a response representative of speech;
sending the response from the second location back to the first location; and
emitting the response in audible form to the user.
View Dependent Claims (25, 26, 27, 28, 29)
Specification