Audio message extraction

US 10,319,375 B2
Filed: 12/28/2016
Issued: 06/11/2019
Est. Priority Date: 12/28/2016
Status: Active Grant

Abstract

Audio data, corresponding to an utterance spoken by a person within a detection range of a voice communications device, can include an audio message portion. The audio data can be captured and analyzed to determine the intent to send a message. Based at least in part upon that intent, a remaining portion of the audio data can be analyzed to determine the intended message target or recipient, as well as the portion corresponding to the actual message payload. Once determined, the audio file can be trimmed to the message payload, and the message payload of the audio data can be delivered as an audio message to the target recipient.
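
As a rough illustration of the trimming step described in the abstract, the sketch below cuts a buffer of audio samples down to the message payload. The function name `trim_to_payload`, the list-of-samples representation, and the use of seconds for positions are assumptions made for illustration; the patent does not prescribe an implementation.

```python
from typing import List, Optional, Sequence


def trim_to_payload(
    samples: Sequence[int],
    sample_rate: int,
    start_s: float,
    end_s: Optional[float] = None,
) -> List[int]:
    """Return only the samples that fall inside the message payload.

    start_s and end_s are offsets in seconds from the start of the
    utterance. When end_s is None, the payload runs to the end of the clip.
    """
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    total = len(samples)
    # Convert seconds to sample indices and clamp them to the buffer.
    start = min(max(round(start_s * sample_rate), 0), total)
    if end_s is None:
        end = total
    else:
        end = min(max(round(end_s * sample_rate), start), total)
    return list(samples[start:end])
```

In this sketch the wakeword, the messaging command, and the recipient name all precede `start_s`, so they are dropped and only the spoken message remains.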

19 Claims

1. A system, comprising:
- at least one device processor;
  
  memory including instructions that, when executed by the at least one device processor, cause the system to:
  
  receive audio input data from a voice communications device associated with an account, the audio input data corresponding to an utterance received by a microphone of the voice communications device, wherein a beginning of the utterance is identified by the voice communications device in response to a wakeword being detected by the voice communications device;
  
  generate text data from the audio input data by performing automated speech recognition (ASR) on the audio input data;
  
  determine, from the text data, a messaging intent by performing natural language processing (NLP) on the text data;
  
  determine a slot pattern corresponding to the messaging intent, the slot pattern including at least a target slot and a message payload slot;
  
  determine respective portions of the text data that correspond to the target slot and the message payload slot;
  
  identify, based upon the text data corresponding to the target recipient slot and a contact list associated with the voice communications device, a recipient identifier;
  
  determine a first timestamp associated with the message payload slot;
  
  generate, based upon the first timestamp, audio message data including a portion of the audio data corresponding to the text data of the message payload slot; and
  
  send the audio message data for playback on an audio playback device associated with the recipient identifier.
- Dependent claims (2, 3, 4):
  - 2. The system of claim 1, wherein the instructions, when executed further cause the system to:
    - determine a second timestamp corresponding to an end location of the message payload slot; and
      
      include only a portion of the audio input data located between a time of the first timestamp and a time of the second timestamp in the audio message data.
  - 3. The system of claim 1, wherein the instructions, when executed further cause the system to:
    - determine message text corresponding to the audio message data; and
      
      transmit the message text for access via the recipient identifier, wherein the message text is able to be presented with, or separate from, the audio message data.
  - 4. The system of claim 1, wherein the instructions, when executed further cause the system to:
    - receive a request from the audio playback device to an address corresponding to the audio message data; and
      
      send the audio message data for playback on the audio playback device in response to the request.
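
The slot-filling and timestamp steps of claims 1 and 2 can be sketched as follows, assuming the ASR stage returns word-level timings. The `Word` and `MessageSlots` types, the function `fill_message_slots`, and the single slot pattern "tell <target> <payload>" are hypothetical and chosen only to keep the example small; a real NLP component would support many patterns and multi-word targets.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the utterance
    end: float


@dataclass
class MessageSlots:
    target: str
    payload_text: str
    first_timestamp: float   # beginning of the message payload slot
    second_timestamp: float  # end of the message payload slot


def fill_message_slots(words: List[Word]) -> Optional[MessageSlots]:
    """Match the hypothetical pattern 'tell <target> <payload...>'.

    Returns None when no messaging intent is found in the transcript.
    """
    for i, word in enumerate(words):
        # Require at least one target word and one payload word after "tell".
        if word.text.lower() == "tell" and i + 2 < len(words):
            target = words[i + 1]
            payload = words[i + 2:]
            return MessageSlots(
                target=target.text,
                payload_text=" ".join(w.text for w in payload),
                first_timestamp=payload[0].start,
                second_timestamp=payload[-1].end,
            )
    return None
```

The two timestamps mark the span of audio that would be kept in the audio message data; everything before the first timestamp (wakeword, command, recipient name) is left out.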

5. A computer-implemented method, comprising:
- receiving audio input data corresponding to an utterance received by at least one microphone of a voice communications device associated with an account, wherein a beginning of the utterance is identified by the voice communications device in response to a wakeword being detected by the voice communications device;
  
  determining a messaging intent represented by the audio input data;
  
  determining a slot pattern corresponding to the messaging intent, the slot pattern including at least a target slot and a message payload slot;
  
  determining, from the target slot, a recipient identifier represented by the audio input data;
  
  determining a message payload portion that corresponds to the message payload slot;
  
  determining a first time stamp identifying a beginning of the message payload portion; and
  
  generating, for playback on an audio playback device and accessible according to the recipient identifier, audio message data including the message payload portion starting from a location of the first time stamp in the audio input data.
- Dependent claims (6, 7, 8, 9, 10, 11, 12, 13):
  - 6. The computer-implemented method of claim 5, further comprising:
    - determining a set of time stamps identifying locations of the target slot and the message payload slot, the set of time stamps including the first time stamp; and
      
      determining the recipient identifier and the message payload portion based upon the locations of the set of time stamps with respect to the audio input data.
  - 7. The computer-implemented method of claim 5, further comprising:
    - generating tokenized text data from the audio input data by performing automated speech recognition (ASR) on the audio input data.
  - 8. The computer-implemented method of claim 7, further comprising:
    - determining at least the messaging intent, and respective words corresponding to the target slot and the message payload slot, by performing natural language processing (NLP) on the tokenized text data.
  - 9. The computer-implemented method of claim 5, further comprising:
    - determining, for the audio input data, an identity of a user having spoken the utterance; and
      
      determining a contact list for the user based upon the identity; and
      
      determining the recipient identifier based upon performing a lookup of a target from the target slot against the contact list for the user.
  - 10. The computer-implemented method of claim 5, further comprising:
    - receiving media input data including the audio input data and corresponding video input data; and
      
      extracting the audio input data for determining the messaging intent.
  - 11. The computer-implemented method of claim 10, further comprising:
    - including, for playback, a portion of the video input data corresponding to the message payload portion.
  - 12. The computer-implemented method of claim 5, further comprising:
    - determining that the recipient identifier is unable to be determined with at least a minimum level of confidence based on the audio input data;
      
      causing additional audio input data to be received that includes additional identifying information for a target of the messaging intent; and
      
      determining the recipient identifier based upon the additional identifying information.
  - 13. The computer-implemented method of claim 5, further comprising:
    - determining message text corresponding to the message payload slot; and
      
      providing the message text for access via the recipient identifier, wherein the message text is able to be presented with, or separate from, the audio message data.
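
Claims 9 and 12 describe looking up the spoken target in the user's contact list and asking for more information when the match is not confident enough. A minimal sketch of that decision is shown below. The function `resolve_recipient`, the 0.8 threshold, the name-to-identifier dictionary, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions for illustration, not the method the patent specifies.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional


def _score(spoken: str, contact_name: str) -> float:
    """Best similarity between the spoken target and a contact's full
    name or any single part of it (for example, a first name)."""
    spoken = spoken.lower()
    name = contact_name.lower()
    candidates = [name] + name.split()
    return max(SequenceMatcher(None, spoken, c).ratio() for c in candidates)


def resolve_recipient(
    spoken_target: str,
    contacts: Dict[str, str],  # contact name -> recipient identifier
    min_confidence: float = 0.8,
) -> Optional[str]:
    """Return a recipient identifier, or None when the caller should
    prompt the speaker for additional identifying information."""
    scored = sorted(
        ((_score(spoken_target, name), rid) for name, rid in contacts.items()),
        reverse=True,
    )
    if not scored or scored[0][0] < min_confidence:
        return None  # no contact matches well enough
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None  # two contacts tie, so the target is ambiguous
    return scored[0][1]
```

A `None` result corresponds to the case in claim 12: the system would request further audio input (such as a last name) and call the lookup again with the fuller target.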

14. A system, comprising:
- at least one device processor;
  
  memory including instructions that, when executed by the at least one device processor, cause the system to:
  
  receive media input data corresponding to an utterance received by a communications device associated with an account, wherein a beginning of the utterance is identified by the communications device in response to a wakeword being detected by the communications device;
  
  extract audio input data from the media input data;
  
  determine a messaging intent represented by the audio input data;
  
  determine a slot pattern corresponding to the messaging intent, the slot pattern including at least a target slot and a message payload slot;
  
  determine a recipient identifier represented by the audio input data;
  
  determine a message payload portion that corresponds to the message payload slot;
  
  determine a first time stamp identifying a beginning of the message payload portion; and
  
  generate, for playback on a playback device and accessible according to the recipient identifier, media message data including the message payload portion starting from a location of the first time stamp in the audio input data.
- Dependent claims (15, 16, 17, 18, 19):
  - 15. The system of claim 14, wherein the instructions, when executed further cause the system to:
    - determine a set of time stamps identifying locations of the target slot and the message payload slot in the audio input data; and
      
      determine the recipient identifier and the message payload portion based upon the locations of the set of time stamps with respect to the audio input data.
  - 16. The system of claim 14, wherein the instructions, when executed further cause the system to:
    - generate tokenized text data from the audio input data by performing automated speech recognition (ASR) on the audio input data; and
      
      determine at least the messaging intent, and respective words corresponding to the target slot and the message payload slot, by performing natural language processing (NLP) on the tokenized text data.
  - 17. The system of claim 14, wherein the instructions, when executed further cause the system to:
    - determine, for the media input data, an identity of a user having spoken the utterance; and
      
      determine a contact list for the user based upon the identity; and
      
      determine the recipient identifier based upon performing a lookup of a target from the target slot against the contact list for the user.
  - 18. The system of claim 14, wherein the instructions, when executed further cause the system to:
    - determine that the recipient identifier is unable to be determined with at least a minimum level of confidence based on the audio input data;
      
      cause additional audio input data to be received that includes additional identifying information for a target of the messaging intent; and
      
      determine the recipient identifier based upon the additional identifying information.
  - 19. The system of claim 14, wherein the instructions, when executed further cause the system to:
    - extract video input data from the media input data; and
      
      include, for playback on the playback device, a portion of the video input data corresponding to the message payload portion.
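
Claims 11 and 19 include the portion of video that corresponds to the message payload. One way to picture this, assuming a constant frame rate and payload time stamps measured from the start of the recording, is to map the time stamps to a frame range. The function `payload_frame_range` and its half-open range convention are hypothetical.

```python
import math
from typing import Tuple


def payload_frame_range(
    first_ts: float, second_ts: float, fps: float
) -> Tuple[int, int]:
    """Map payload time stamps (seconds) to a half-open frame range
    [first_frame, end_frame) at a constant frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    if first_ts < 0 or second_ts < first_ts:
        raise ValueError("time stamps must satisfy 0 <= first_ts <= second_ts")
    # Round outward so no frame that overlaps the payload is lost.
    first_frame = math.floor(first_ts * fps)
    end_frame = math.ceil(second_ts * fps)
    return first_frame, end_frame
```

Rounding outward keeps any frame that overlaps the spoken message, at the cost of up to one extra frame on each side.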

Current Assignee
Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee
Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors
Fritz, Neil Christopher; Bhagat, Lakshya; Southwood, Scott; Doran, Katelyn; Lounsbury, Brett; Devaraj, Christo Frank
Primary Examiner(s)
Abebe, Daniel

Application Number
US15/392,291
Publication Number
US 20180182380A1
Time in Patent Office
895 Days
Field of Search
704275
US Class Current
CPC Class Codes
G06F 40/295   Named entity recognition
G10L 15/1815   Semantic context, e.g. disa...
G10L 15/1822   Parsing for meaning underst...
G10L 15/22   Procedures used during a sp...
G10L 15/30   Distributed recognition, e....
G10L 2015/088   Word spotting
H04M 7/0042   where the data service is a...
H04W 4/12   Messaging; Mailboxes; Annou...
