System for recognizing and responding to environmental noises

US 10,424,292 B1
Filed: 03/14/2013
Issued: 09/24/2019
Est. Priority Date: 03/14/2013
Status: Active Grant

Abstract

An audio controlled assistant captures environmental noise and converts the environmental noise into audio signals. The audio signals are provided to a system which analyzes the audio signals for a plurality of audio prompts, which have been customized for the acoustic environment surrounding the audio controlled assistant by an acoustic modeling system. The system is configured to detect the presence of an audio prompt in the audio signals and, in response, transmit instructions associated with the detected audio prompt to at least one of the audio controlled assistant or one or more cloud based services.

24 Claims

1. A method comprising:
- under control of one or more computing systems configured with executable instructions, receiving, from a device, over a network, audio data representing non-conversational noise and speech;
- analyzing the first audio data with respect to sound data, wherein the sound data is associated with a category of noise;
- determining that at least a portion of the first audio data corresponds to the sound data;
- associating the first audio data with one or more instructions, the instructions to cause a specific response from the device and based, at least in part, on the category;
- receiving, from the device, over the network, second audio data that represents sound detected by the device;
- determining that the second audio data represents the non-conversational noise based, at least in part, on the analyzing of the second audio data with respect to the sound data, wherein the at least the portion of the first audio data that corresponds to the sound data is different than the second audio data that represents the non-conversational noise; and
- sending, over the network, the instructions to the device at least partly in response to the determining that the second audio data represents the non-conversational noise.
- Dependent claims (2, 3, 4):
  - 2. The method of claim 1, further comprising:
    - identifying the non-conversational noise based at least in part on the analyzing of the characteristics.
  - 3. The method of claim 1, further comprising:
    - comparing characteristics of the non-conversational noise to predefined sound pattern templates;
    - determining that the characteristics correspond to the sound data; and
    - wherein the instructions are associated with the sound data.
  - 4. The system of claim 1, wherein the non-conversational noise includes noise that has no definition within a dictionary.
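
The steps of claim 1 can be read as an enroll-then-detect loop. The Python sketch below is one hypothetical reading, not the patented implementation: plain feature vectors stand in for audio data, cosine similarity stands in for "analyzing ... with respect to sound data", and the names `SoundTemplate`, `NoiseResponder`, `CATEGORY_INSTRUCTIONS`, the instruction strings, and the 0.9 threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from math import sqrt

# Hypothetical mapping from a noise category to device instructions.
CATEGORY_INSTRUCTIONS = {
    "doorbell": ["pause_media", "announce:someone is at the door"],
    "alarm": ["attenuate_audio:20"],
}


@dataclass
class SoundTemplate:
    """Reference "sound data" tagged with a category of noise."""
    category: str
    features: list


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class NoiseResponder:
    templates: list
    threshold: float = 0.9  # assumed similarity cut-off
    learned: dict = field(default_factory=dict)  # category -> instructions

    def _match(self, features):
        """Return the best-matching template, or None if below the threshold."""
        best = max(self.templates,
                   key=lambda t: cosine(features, t.features),
                   default=None)
        if best is not None and cosine(features, best.features) >= self.threshold:
            return best
        return None

    def enroll(self, first_audio):
        """First audio data: match it to sound data and associate instructions."""
        template = self._match(first_audio)
        if template is None:
            return None
        self.learned[template.category] = CATEGORY_INSTRUCTIONS.get(template.category, [])
        return template.category

    def handle(self, second_audio):
        """Second audio data: return instructions to send if the noise recurs."""
        template = self._match(second_audio)
        if template is None or template.category not in self.learned:
            return []
        return self.learned[template.category]
```

In this sketch the second audio data does not need to equal the first; it only needs to match the same sound data, which mirrors the claim's requirement that the two be different recordings of the same noise.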

5. A device comprising:
- one or more microphones to:
  - generate first audio data based, at least in part, on first sound detected from an environment in which the device is located, the first audio data representing a non-conversational sound; and
  - generate second audio data based, at least in part, on second sound detected from the environment, the second audio data representing the non-conversational sound; and
- one or more communication interfaces to:
  - send the first audio data to one or more remote systems;
  - send the second audio data to the one or more remote systems;
  - receive instructions from the one or more remote systems, wherein the instructions are based at least in part on the non-conversational sound as represented by the second audio data and the first audio data; and
  - send, based at least in part on receiving the instructions, a control signal to at least one second device within the environment.
- Dependent claims (6, 7):
  - 6. The device of claim 5, further comprising one or more output interfaces to output electronic media content, the electronic media content selected based at least in part on the instructions received from the one or more remote systems.
  - 7. The device of claim 5, further comprising one or more speakers to output audio content, and wherein the device attenuates the audio content by an amount specified in the instructions.
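
Claim 7 has the device attenuate its audio output by an amount carried in the instructions. A minimal sketch of that step follows, assuming the amount is expressed in decibels and that instructions arrive as dictionaries with `action` and `db` keys; both the unit and the instruction format are assumptions, since the claim does not specify them.

```python
def attenuate(samples, db):
    """Scale samples down by `db` decibels (amplitude gain = 10 ** (-db / 20))."""
    if db < 0:
        raise ValueError("attenuation must be non-negative")
    gain = 10 ** (-db / 20)
    return [s * gain for s in samples]


def apply_instructions(samples, instructions):
    """Apply every hypothetical 'attenuate' instruction to the outgoing samples."""
    for instruction in instructions:
        if instruction.get("action") == "attenuate":
            samples = attenuate(samples, instruction["db"])
    return samples
```

For example, an instruction of `{"action": "attenuate", "db": 20}` scales each sample to one tenth of its amplitude.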

8. A system comprising:
- one or more processors; and
- one or more computer-readable media storing instructions that, when executed by the one or more processors, cause the one or more processors to perform acts comprising:
  - receiving, from an electronic device, audio data that represents sound detected from an environment, the audio data including a first data portion representative of non-conversational noise and a second data portion representative of speech, the non-conversational noise being sound other than speech;
  - identifying the first data portion representative of non-conversational noise within the audio data;
  - analyzing the first data portion representative of the non-conversational noise using sound data; and
  - storing the first data portion that represents the non-conversational noise based, at least in part, on a similarity threshold between acoustic characteristics of the first data portion representative of the non-conversational noise and the sound data.
- Dependent claims (9, 10, 11, 12):
  - 9. The system of claim 8, the acts further comprising identifying the first data portion representative of the non-conversational noise more than a predetermined number of times in the audio data.
  - 10. The system of claim 8, wherein the non-conversational noise is environmental noise.
  - 11. The system of claim 8, the acts further comprising identifying that the first data portion representative of the non-conversational noise has no meaning within a selected language before the analyzing of the signal.
  - 12. The system of claim 8, the acts further comprising:
    - receiving, from the electronic device, additional audio data that represents additional sound detected from the environment;
    - analyzing the additional audio data with respect to the audio data;
    - detecting that the additional audio data represents the non-conversational noise; and
    - sending, to the electronic device, instructions associated with the audio data.
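
The storing step of claim 8 is gated by a similarity threshold between the noise portion's acoustic characteristics and the sound data. One way to sketch that gate, assuming the characteristics are fixed-length feature vectors and that similarity is derived from Euclidean distance (the claim names neither the features nor the metric, so `similarity`, `store_if_similar`, and the 0.8 default are illustrative only):

```python
from math import dist


def similarity(a, b):
    """Map Euclidean distance into (0, 1]; 1.0 means the vectors are identical."""
    return 1.0 / (1.0 + dist(a, b))


def store_if_similar(portion, sound_data, store, threshold=0.8):
    """Append `portion` to `store` when it is close enough to any reference vector.

    Returns True if the portion was stored, False otherwise.
    """
    best = max((similarity(portion, ref) for ref in sound_data), default=0.0)
    if best >= threshold:
        store.append(portion)
        return True
    return False
```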

13. A system comprising:
- one or more processors; and
- one or more non-transitory computer readable storage media storing instructions that, when executed on the one or more processors, cause the one or more processors to perform acts comprising:
  - receiving, from a first device, first audio data representing a non-conversational noise and speech;
  - associating a portion of the first audio data representative of the non-conversational noise with instructions, the instructions to cause specific responses from the device;
  - receiving, from the device, second audio data representing sound detected by the device;
  - separating the second audio data into a first portion of the second audio data and a second portion of the second audio data, the first portion of the second audio data representing first sound associated with speech and the second portion of the second audio data representing second sound associated with non-conversational noises;
  - detecting, based at least in part on the first audio data, that the second portion of the second audio data represents at least the non-conversational noise; and
  - sending the instructions to a second device, the instructions to cause the second device to perform one or more actions associated with the non-conversational noise.
- Dependent claims (14, 15, 16, 17):
  - 14. The system of claim 13, wherein the non-conversational noise includes noises that have no semantic definition within a dictionary.
  - 15. The system of claim 13, the acts further comprising:
    - analyzing the first audio data with respect to sound data, wherein the sound data is associated with a category of noise;
    - determining that the non-conventional noise represented by the first audio data corresponds to the sound data; and
    - generating the instructions based, at least in part, on the category.
  - 16. The system of claim 13, the acts further comprising:
    - analyzing the first audio data with respect to a sound data, wherein the sound data is associated with a category of noise;
    - determining that the non-conventional noise represented by the first audio data corresponds to the sound data; and
    - generating the instructions based, at least in part, on the category.
  - 17. The system of claim 13, the acts further comprising:
    - analyzing the first audio data with respect to a sound data, wherein the sound data is associated with a category of noise;
    - determining that the non-conventional noise represented by the first audio data corresponds to the sound data; and
    - generating the instructions based, at least in part, on the category.
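
Claim 13 adds two things to the basic flow: the second audio data is split into a speech portion and a non-conversational portion, and the instructions go to a second device. The sketch below shows only that control flow. The speech classifier, the matcher against enrolled noise, the transport, and the `"second-device"` identifier are all hypothetical and are passed in or named by the caller; the claim does not say how any of them work.

```python
def separate(frames, is_speech):
    """Partition frames into (speech, non_conversational) using a caller-supplied test."""
    speech, noise = [], []
    for frame in frames:
        (speech if is_speech(frame) else noise).append(frame)
    return speech, noise


def respond(frames, is_speech, matches_enrolled, instructions, send,
            target="second-device"):
    """Send `instructions` to `target` if any non-speech frame matches enrolled noise.

    Returns True when instructions were sent, False otherwise.
    """
    _speech, noise = separate(frames, is_speech)
    if any(matches_enrolled(frame) for frame in noise):
        send(target, instructions)
        return True
    return False
```

Only the non-conversational portion is checked against the enrolled noise here, so speech frames never trigger the response.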

18. A method comprising:
- receiving, from a remote device, audio data representing sound detected from an environment associated with an audio-controlled assistant, the audio data including a first data portion representative of non-speech-related noise and a second data portion representative of speech, the first data portion different than the second data portion;
- identifying, from the audio data, the first data portion representative of the non-speech-related noise;
- determining that the first data portion occurs more than a threshold number of times within the audio data;
- selecting instructions to associate with the first data portion based at least in part on the first data portion occurring more than the threshold number of times and at least in part on a characteristic of the non-speech related noise;
- associating the first data portion with the instructions to cause the audio-controlled assistant to perform a specific action in response to a future occurrence of the first data portion; and
- storing the first data portion.
- Dependent claims (19, 20, 21, 22, 23, 24):
  - 19. The method of claim 18, wherein the determining that the first data portion occurs more than the threshold number of times is within a defined period of time.
  - 20. The method of claim 18, wherein the identifying of the reoccurring noise comprises identifying that the reoccurring noise is represented by the audio data with a predefined periodicity.
  - 21. The method of claim 18, further comprising:
    - receiving additional audio data from the audio-controlled assistant, the additional audio data representing the non-speech related noise;
    - analyzing the additional audio data with respect to the first data portion;
    - detecting the additional audio data represents the first data portion; and
    - sending the instructions associated with the audio data to the audio-controlled assistant.
  - 22. The method of claim 18, further comprising:
    - comparing characteristics of the first data portion to sound data, the sound data being associated with a category of noise; and
    - the method further comprises:
      - detecting a match between the sound data and the reoccurring noise; and
      - assigning the first data portion to the category associated with the sound data.
  - 23. The method of claim 22, wherein the associating of the audio data comprises associating the audio data with the instructions based, at least in part, on the category, the category defining the specific action to be performed by the audio controlled assistant, a cloud-based service or both.
  - 24. The method of claim 22, wherein the detecting of the match comprises detecting the match based at least in part on the characteristics of the first data portion being within a threshold of similarity to the sound data.
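
Claims 18 to 20 turn on two timing tests: whether a noise occurs more than a threshold number of times (optionally within a defined period), and whether it recurs with a predefined periodicity. A minimal sketch of both checks follows, assuming each detected occurrence has already been reduced to a timestamp in seconds; the function names, the sliding-window approach, and the `tolerance` parameter are assumptions, not details taken from the patent.

```python
def occurs_more_than(timestamps, threshold, window):
    """True if more than `threshold` occurrences fall inside any span of `window` seconds."""
    ts = sorted(timestamps)
    start = 0
    for end, t in enumerate(ts):
        # Shrink the window from the left until it spans at most `window` seconds.
        while t - ts[start] > window:
            start += 1
        if end - start + 1 > threshold:
            return True
    return False


def is_periodic(timestamps, period, tolerance):
    """True if every gap between consecutive occurrences is within `tolerance` of `period`."""
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    return bool(gaps) and all(abs(g - period) <= tolerance for g in gaps)
```

For instance, `occurs_more_than([0, 10, 20, 500], threshold=2, window=60)` is true because three occurrences fall within the first 60 seconds, while a single occurrence can never count as periodic.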

Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors: Thimsen, John Daniel; Hart, Gregory Michael; Thomas, Ryan Paul
Primary Examiner(s): Opsasnick, Michael N
Application Number: US13/830,222
Time in Patent Office: 2,385 Days
Field of Search: 704270, 704275
CPC Class Codes:

G10L 15/20   Speech recognition techniqu...

G10L 15/22   Procedures used during a sp...

G10L 2015/223   Execution procedure of a sp...

G10L 25/84   for discriminating voice fr...