Methods and devices for selectively ignoring captured audio data

US 9,691,378 B1
Filed: 11/05/2015
Issued: 06/27/2017
Est. Priority Date: 11/05/2015
Status: Active Grant

Abstract

Systems and methods for selectively ignoring an occurrence of a wakeword within audio input data are provided herein. In some embodiments, a wakeword may be detected to have been uttered by an individual within a modified time window, which may account for hardware delays and echoing offsets. The detected wakeword that occurs during this modified time window may, in some embodiments, correspond to a word included within audio that is outputted by a voice activated electronic device. This may cause the voice activated electronic device to activate itself, stopping the audio from being outputted. By identifying when these occurrences of the wakeword within outputted audio are going to happen, the voice activated electronic device may selectively determine when to ignore the wakeword, and furthermore, when not to ignore the wakeword.

381 Citations

20 Claims

1. A method for selectively ignoring a set of temporally related sounds that is represented by data stored in memory on an electronic device, the method comprising:
- receiving, by the electronic device, audio data representing a word;
- receiving a word identifier with the audio data, the word identifier being unique to the word;
- receiving a data tag with the audio data, the data tag indicating a start time and an end time for the word within the audio data;
- determining that the word identifier is associated with a wakeword that is a series of temporally-related sounds that, when received by a microphone of the electronic device, causes functionality of the electronic device to be activated;
- determining a time window during which the word is to be outputted by a speaker of the electronic device by calculating an amount of time between the start time and the end time;
- outputting the audio data using the speaker;
- determining a hardware delay time associated with processing the audio data for playback, wherein determining the hardware delay time comprises:
  - determining an output time that the audio data begins to be outputted by the speaker; and
  - calculating a time difference between a processing time that the audio data begins to be processed for audio playback and the output time;
- receiving audio input data using the microphone;
- determining an echoing offset time for echoes subsequent to the audio data outputted by the speaker also being detected by the microphone, wherein determining the echoing offset time comprises:
  - determining an audio receipt time that audio input data is captured by the microphone; and
  - calculating another time difference between the output time and the audio receipt time;
- determining a modified time window by applying the hardware delay time and the echoing offset time to the time window;
- determining that a portion of the audio input data represents the wakeword;
- determining that a detected time that the portion is detected by the microphone is within the modified time window; and
- ignoring the portion such that functionality triggered by the wakeword remains inactive.

2. The method of claim 1, wherein determining the modified time window further comprises:
- determining a new start time by adding the time difference to the start time;
- determining an initial new end time by adding the time difference to the end time; and
- determining a final new end time by adding the additional time difference to the initial new end time.

3. The method of claim 1, wherein determining that the detected time that the portion is detected by the microphone is within the modified time window comprises:
- determining an initial time that the wakeword is detected within the audio input data;
- determining a beginning time and an ending time of the modified time window;
- determining that the initial time occurs at a later time than the beginning time; and
- determining that the initial time occurs at an earlier time than the ending time.

4. The method of claim 1, further comprising:
- receiving additional audio input data using the microphone;
- determining that the additional audio input data includes an occurrence of the wakeword;
- determining that the occurrence of the wakeword was captured outside the modified time window; and
- recording the additional audio input data.
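
The timing arithmetic recited in claims 1 through 3 can be illustrated with a minimal sketch. It is not the patentee's implementation. It assumes that all times are integer milliseconds on one shared clock, and it reads "the additional time difference" of claim 2 as the echoing offset of claim 1. The names `TimeWindow`, `hardware_delay`, `echo_offset`, `modified_window`, and `should_ignore` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeWindow:
    start_ms: int  # when the tagged word begins
    end_ms: int    # when the tagged word ends


def hardware_delay(processing_time_ms: int, output_time_ms: int) -> int:
    """Time between the start of playback processing and the start of speaker output."""
    return output_time_ms - processing_time_ms


def echo_offset(output_time_ms: int, receipt_time_ms: int) -> int:
    """Time between speaker output and capture of that audio by the microphone."""
    return receipt_time_ms - output_time_ms


def modified_window(window: TimeWindow, hw_delay_ms: int, echo_ms: int) -> TimeWindow:
    """Shift both edges by the hardware delay, then extend the end by the echo offset."""
    new_start = window.start_ms + hw_delay_ms
    initial_new_end = window.end_ms + hw_delay_ms
    final_new_end = initial_new_end + echo_ms
    return TimeWindow(new_start, final_new_end)


def should_ignore(detected_time_ms: int, window: TimeWindow) -> bool:
    """True when the detection is later than the window start and earlier than its end."""
    return window.start_ms < detected_time_ms < window.end_ms


# Hypothetical numbers: the data tag places the wakeword at 1000-1500 ms.
hw = hardware_delay(processing_time_ms=0, output_time_ms=50)   # 50
echo = echo_offset(output_time_ms=50, receipt_time_ms=80)      # 30
window = modified_window(TimeWindow(1000, 1500), hw, echo)     # 1050-1580 ms
ignore_playback_echo = should_ignore(1200, window)             # True
ignore_later_utterance = should_ignore(2000, window)           # False
```

Under these assumptions a detection at 1200 ms falls inside the modified window and would be ignored, while a detection at 2000 ms falls outside it and would be handled as in claim 4.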

5. A method for selectively ignoring a portion of captured audio, the method comprising:
- receiving, by an electronic device, audio data;
- receiving, by the electronic device, a data tag associated with a sound to be output based, at least in part, on the audio data;
- determining that the sound is a trigger for the electronic device;
- determining, based at least in part on the data tag, a time window that the trigger is to be outputted by the audio data;
- generating a modified time window based at least in part on at least one offset and the time window;
- causing the audio data to be outputted from at least one speaker;
- receiving audio input data;
- determining that the audio input data includes an occurrence of the trigger;
- determining that a time of the occurrence is during the modified time window; and
- ignoring a portion of the audio input data received during the modified time window.

6. The method of claim 5, wherein:
- determining that the audio input data includes the occurrence of the trigger comprises:
  - monitoring the audio input data captured by at least one audio input device of the electronic device;
  - analyzing the audio input data;
  - identifying each word included within the audio input data;
  - comparing each word that has been identified with the trigger; and
  - recognizing that one of the words that has been identified is the trigger; and
- ignoring comprises:
  - disabling the at least one audio input device in response to recognizing that one of the words is the trigger.

7. The method of claim 5, wherein ignoring comprises:
- disabling a trigger detector during the modified time window.

8. The method of claim 5, wherein ignoring comprises:
- removing power to at least one audio input device of the electronic device during the modified time window.

9. The method of claim 5, wherein ignoring comprises:
- generating an indication to delete the audio input data received during the modified time window.

10. The method of claim 5, further comprising:
- determining that the audio data includes an additional occurrence of the trigger;
- determining an additional time window for the additional occurrence;
- determining that the additional time window occurs at a later time than the modified time window; and
- analyzing audio input data received by at least one audio input device of the electronic device during the additional time window.

11. The method of claim 5, further comprising:
- calculating an amount of time between the audio data being processed by the electronic device and the audio data being outputted by the electronic device, wherein generating further comprises:
  - determining the modified time window based, at least in part, on a start time and an end time of the time window and the amount of time.

12. The method of claim 5, wherein determining the time window further comprises:
- determining an amount of time between an end time of the sound within the audio data that is outputted and one of a time of: a start time of a next word within a phrase or an end time of the audio data.
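
As a rough illustration of claims 5, 9, and 12, the sketch below derives windows from per-word data tags, computes the gap that follows a tagged word, and flags captured frames for deletion. It assumes a single offset applied to both window edges, one window per tagged occurrence of the trigger, and integer millisecond timestamps. `WordTag`, `trigger_windows`, `gap_after`, `frames_to_delete`, and the trigger word "computer" are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class WordTag:
    word: str
    start_ms: int
    end_ms: int


def trigger_windows(tags: List[WordTag], trigger: str, offset_ms: int) -> List[Tuple[int, int]]:
    """One offset-shifted (start, end) window per tagged occurrence of the trigger."""
    return [
        (tag.start_ms + offset_ms, tag.end_ms + offset_ms)
        for tag in tags
        if tag.word.lower() == trigger.lower()
    ]


def gap_after(tags: List[WordTag], index: int, audio_end_ms: int) -> int:
    """Time from the end of tags[index] to the next word's start, or to the end of the audio."""
    if index + 1 < len(tags):
        return tags[index + 1].start_ms - tags[index].end_ms
    return audio_end_ms - tags[index].end_ms


def frames_to_delete(frame_times_ms: List[int], windows: List[Tuple[int, int]]) -> List[bool]:
    """Flag each captured frame whose timestamp falls inside any window."""
    return [any(start <= t <= end for start, end in windows) for t in frame_times_ms]


# Hypothetical tagged phrase in which the trigger word occurs twice.
tags = [
    WordTag("my", 0, 200),
    WordTag("computer", 250, 800),
    WordTag("said", 900, 1100),
    WordTag("computer", 1200, 1700),
]
windows = trigger_windows(tags, "computer", offset_ms=50)   # [(300, 850), (1250, 1750)]
gap_mid_phrase = gap_after(tags, 1, audio_end_ms=1900)      # 100
gap_at_end = gap_after(tags, 3, audio_end_ms=1900)          # 200
flags = frames_to_delete([100, 400, 1000, 1300], windows)   # [False, True, False, True]
```

The claims do not say how the gap of claim 12 feeds into the window, so the sketch only computes it and leaves its use open.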

13. An electronic device, comprising:
- communications circuitry that receives audio data and a data tag associated with a sound to be output based, at least in part, on the audio data;
- at least one speaker that outputs the audio data;
- at least one audio input device that receives audio input data;
- memory that stores a trigger that activates the device; and
- at least one processor operable to:
  - determine that the sound is the trigger;
  - determine, based at least in part on the data tag, a time window that the trigger is to be outputted by the audio data;
  - generate a modified time window based at least in part on at least one offset and the time window;
  - determine that the audio input data received by the at least one audio input device includes an occurrence of the trigger;
  - determine that a time of the occurrence is during the modified time window; and
  - ignore a portion of the audio input data received during the modified time window.

14. The electronic device of claim 13, wherein the at least one processor is further operable to:
- monitor the audio input data captured by the at least one audio input device;
- analyze the audio input data;
- identify each word included within the audio input data;
- compare each word that has been identified with the trigger;
- recognize that one of the words that has been identified is the trigger; and
- disable the at least one audio input device in response to recognition of one of the words as the trigger.

15. The electronic device of claim 13, wherein the at least one processor is further operable to:
- disable a trigger detector during the modified time window.

16. The electronic device of claim 13, wherein the at least one processor is further operable to:
- remove power to the at least one audio input device during the modified time window.

17. The electronic device of claim 13, wherein the at least one processor is further operable to:
- generate an indication to delete the audio input data corresponding to the occurrence.

18. The electronic device of claim 13, wherein the at least one processor is further operable to:
- determine that the audio data received by the communications circuitry includes an additional occurrence of the trigger;
- determine an additional time window for the additional occurrence;
- determine that the additional time window occurs at a later time than the modified time window; and
- analyze audio input data received by the at least one audio input device during the additional time window.

19. The electronic device of claim 13, wherein the at least one processor is further operable to:
- calculate an amount of time between the audio data being processed and the audio data being outputted by the at least one speaker; and
- determine the modified time window based, at least in part, on a start time and an end time of the time window and the amount of time.

20. The electronic device of claim 13, wherein the at least one processor is further operable to:
- determine an amount of time between an end time of the sound within the audio data that is outputted and one of a time of: a start time of a next word within a phrase or an end time of the audio data.
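
Claims 15 and 16 have the processor disable a trigger detector, or remove power from the audio input device, during the modified time window. One way this might be scheduled is sketched below, assuming the windows are already known as (start, end) pairs in integer milliseconds. Merging overlapping windows is an added assumption, not something the claims recite, and `merge_windows`, `detector_schedule`, and `detector_enabled` are hypothetical names.

```python
from typing import List, Tuple

Window = Tuple[int, int]  # (start_ms, end_ms)


def merge_windows(windows: List[Window]) -> List[Window]:
    """Sort windows by start time and fuse any that overlap or touch."""
    merged: List[Window] = []
    for start, end in sorted(windows):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous window: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def detector_schedule(windows: List[Window]) -> List[Tuple[int, str]]:
    """Emit a disable event at each merged window start and an enable event at its end."""
    events: List[Tuple[int, str]] = []
    for start, end in merge_windows(windows):
        events.append((start, "disable"))
        events.append((end, "enable"))
    return events


def detector_enabled(t_ms: int, windows: List[Window]) -> bool:
    """The detector is treated as off from a window's start up to, but not including, its end."""
    return not any(start <= t_ms < end for start, end in windows)


# Hypothetical windows, given out of order and partly overlapping.
raw = [(1250, 1750), (300, 850), (800, 900)]
schedule = detector_schedule(raw)
# [(300, 'disable'), (900, 'enable'), (1250, 'disable'), (1750, 'enable')]
enabled_during_window = detector_enabled(500, raw)    # False
enabled_between = detector_enabled(1000, raw)         # True
```

Fusing overlapping windows avoids re-enabling the detector partway through a second occurrence of the trigger that is still being played back.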

Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors: Meyers, James David; Piersol, Kurt Wesley
Primary Examiner(s): Yen, Eric
Application Number: US14/934,069
Time in Patent Office: 600 Days
CPC Class Codes

G10L 15/04   Segmentation; Word boundary...

G10L 15/08   Speech classification or se...

G10L 15/20   Speech recognition techniqu...

G10L 15/22   Procedures used during a sp...

G10L 2015/088   Word spotting

G10L 2021/02082   the noise being echo, rever...

G10L 21/028   using properties of sound s...
