Audio message extraction
First Claim
1. A system, comprising:
- at least one device processor;
- memory including instructions that, when executed by the at least one device processor, cause the system to:
  - receive audio input data from a voice communications device associated with an account, the audio input data corresponding to an utterance received by a microphone of the voice communications device, wherein a beginning of the utterance is identified by the voice communications device in response to a wakeword being detected by the voice communications device;
  - generate text data from the audio input data by performing automated speech recognition (ASR) on the audio input data;
  - determine, from the text data, a messaging intent by performing natural language processing (NLP) on the text data;
  - determine a slot pattern corresponding to the messaging intent, the slot pattern including at least a target slot and a message payload slot;
  - determine respective portions of the text data that correspond to the target slot and the message payload slot;
  - identify, based upon the text data corresponding to the target recipient slot and a contact list associated with the voice communications device, a recipient identifier;
  - determine a first timestamp associated with the message payload slot;
  - generate, based upon the first timestamp, audio message data including a portion of the audio data corresponding to the text data of the message payload slot; and
  - send the audio message data for playback on an audio playback device associated with the recipient identifier.
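The slot-filling, recipient lookup, and timestamp steps of claim 1 can be illustrated with a minimal sketch. It assumes the ASR output is a list of lowercase words with start and end times in seconds, and it substitutes a single regular expression for the NLP model; the `Word` type, `SLOT_PATTERN`, `CONTACTS` table, and `extract_message` function are hypothetical names, not taken from the patent. Under these assumptions, the "first timestamp" is simply the start time of the first word that falls in the payload slot.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Word:
    text: str     # lowercase ASR token
    start: float  # seconds from the beginning of the utterance
    end: float


# Hypothetical slot pattern for a messaging intent: carrier phrase,
# target slot, a second carrier word, then the message payload slot.
SLOT_PATTERN = re.compile(r"^send a message to (?P<target>.+?) saying (?P<payload>.+)$")

# Hypothetical contact list for the device: spoken name -> recipient identifier.
CONTACTS = {"mom": "acct-1001", "john smith": "acct-2002"}


def extract_message(words: list[Word]) -> Optional[tuple[str, float]]:
    """Return (recipient_id, payload_start_seconds), or None if nothing matches."""
    text = " ".join(w.text for w in words)
    match = SLOT_PATTERN.match(text)
    if match is None:
        return None  # not a messaging utterance under this pattern
    recipient_id = CONTACTS.get(match.group("target"))
    if recipient_id is None:
        return None  # target slot does not resolve to a known contact
    # Map the payload's character offset back to a word index by counting
    # the words that precede it in the joined transcript.
    payload_index = len(text[: match.start("payload")].split())
    return recipient_id, words[payload_index].start


if __name__ == "__main__":
    tokens = "send a message to mom saying i will be late".split()
    words = [Word(t, i * 0.5, i * 0.5 + 0.4) for i, t in enumerate(tokens)]
    print(extract_message(words))  # -> ('acct-1001', 3.0)
```

A production system would fill slots with a trained model and use the word timings reported by its ASR engine; the regex only keeps the example self-contained.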
1 Assignment
0 Petitions
Abstract
Audio data, corresponding to an utterance spoken by a person within a detection range of a voice communications device, can include an audio message portion. The audio data can be captured and analyzed to determine the intent to send a message. Based at least in part upon that intent, a remaining portion of the audio data can be analyzed to determine the intended message target or recipient, as well as the portion corresponding to the actual message payload. Once determined, the audio file can be trimmed to the message payload, and the message payload of the audio data can be delivered as an audio message to the target recipient.
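Trimming the captured audio to the message payload amounts to converting the payload's start time into a frame offset and discarding everything before it. The sketch below does this with Python's standard `wave` module on in-memory 16-bit mono PCM; the 16 kHz sample rate, the `trim_to_payload` and `make_wav` helpers, and the optional end time are assumptions for illustration, not details from the patent.

```python
import io
import wave
from typing import Optional


def trim_to_payload(wav_bytes: bytes, start_s: float, end_s: Optional[float] = None) -> bytes:
    """Return a new WAV file containing only [start_s, end_s) of the input."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        total = src.getnframes()
        # Convert seconds to frame indices, clamped to the recording length.
        first = min(total, max(0, round(start_s * rate)))
        last = total if end_s is None else min(total, max(first, round(end_s * rate)))
        src.setpos(first)
        frames = src.readframes(last - first)
    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        dst.setparams(params)    # same channels, sample width, and rate
        dst.writeframes(frames)  # the header's frame count is patched to match
    return out.getvalue()


def make_wav(n_frames: int, rate: int = 16000) -> bytes:
    """Build a silent 16-bit mono WAV file for demonstration."""
    out = io.BytesIO()
    with wave.open(out, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * n_frames)
    return out.getvalue()
```

For example, trimming a 2-second recording at 1.5 s leaves 0.5 s, or 8,000 frames at 16 kHz.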
19 Claims
5. A computer-implemented method, comprising:
- receiving audio input data corresponding to an utterance received by at least one microphone of a voice communications device associated with an account, wherein a beginning of the utterance is identified by the voice communications device in response to a wakeword being detected by the voice communications device;
- determining a messaging intent represented by the audio input data;
- determining a slot pattern corresponding to the messaging intent, the slot pattern including at least a target slot and a message payload slot;
- determining, from the target slot, a recipient identifier represented by the audio input data;
- determining a message payload portion that corresponds to the message payload slot;
- determining a first time stamp identifying a beginning of the message payload portion; and
- generating, for playback on an audio playback device and accessible according to the recipient identifier, audio message data including the message payload portion starting from a location of the first time stamp in the audio input data.

Dependent claims: 6, 7, 8, 9, 10, 11, 12, 13.
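Claim 5 requires only that the generated message be "accessible according to the recipient identifier". One simple way to picture this is an inbox keyed by recipient identifier that the recipient's playback device drains in arrival order. The sketch below is a hypothetical in-memory store; the `AudioMessage` fields, `MessageStore` class, and its methods are illustrative assumptions, not the patent's design.

```python
import itertools
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class AudioMessage:
    message_id: int
    sender_account: str
    audio: bytes            # payload audio, already trimmed at the first time stamp
    payload_start_s: float  # first time stamp within the original audio input data


class MessageStore:
    """Hypothetical in-memory mailbox keyed by recipient identifier."""

    def __init__(self) -> None:
        self._inbox: dict[str, deque[AudioMessage]] = defaultdict(deque)
        self._ids = itertools.count(1)

    def deliver(self, recipient_id: str, sender_account: str,
                audio: bytes, payload_start_s: float) -> int:
        """Queue a message for the recipient and return its id."""
        msg = AudioMessage(next(self._ids), sender_account, audio, payload_start_s)
        self._inbox[recipient_id].append(msg)
        return msg.message_id

    def next_for(self, recipient_id: str) -> Optional[AudioMessage]:
        """Pop the oldest unplayed message for a recipient's playback device."""
        inbox = self._inbox.get(recipient_id)  # .get() avoids creating empty inboxes
        return inbox.popleft() if inbox else None
```

A deployed service would persist messages and notify the recipient's devices rather than wait for them to poll; the queue only shows the lookup-by-identifier relationship.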
14. A system, comprising:
- at least one device processor;
- memory including instructions that, when executed by the at least one device processor, cause the system to:
  - receive media input data corresponding to an utterance received by a communications device associated with an account, wherein a beginning of the utterance is identified by the communications device in response to a wakeword being detected by the communications device;
  - extract audio input data from the media input data;
  - determine a messaging intent represented by the audio input data;
  - determine a slot pattern corresponding to the messaging intent, the slot pattern including at least a target slot and a message payload slot;
  - determine a recipient identifier represented by the audio input data;
  - determine a message payload portion that corresponds to the message payload slot;
  - determine a first time stamp identifying a beginning of the message payload portion; and
  - generate, for playback on a playback device and accessible according to the recipient identifier, media message data including the message payload portion starting from a location of the first time stamp in the audio input data.

Dependent claims: 15, 16, 17, 18, 19.
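Claim 14 generalizes the input to media: intent and slots are derived from the extracted audio, but the resulting media message starts at the same first time stamp. Assuming the media carries a video track alongside the audio, the sketch below cuts both tracks at that time stamp. The already-decoded `MediaInput` container and the `extract_audio` and `trim_media` helpers are hypothetical; a real system would demux and decode the input with a media library first.

```python
import math
from dataclasses import dataclass


@dataclass
class MediaInput:
    """Hypothetical decoded media: PCM audio samples plus video frames."""
    audio_samples: list[int]
    sample_rate: int
    video_frames: list[bytes]
    fps: float


def extract_audio(media: MediaInput) -> tuple[list[int], int]:
    # ASR and intent/slot processing run on the audio track only.
    return media.audio_samples, media.sample_rate


def trim_media(media: MediaInput, start_s: float) -> MediaInput:
    """Keep audio and video from the first time stamp onward."""
    start_s = max(0.0, start_s)
    first_sample = min(len(media.audio_samples), round(start_s * media.sample_rate))
    # First video frame whose display time (index / fps) is at or after start_s;
    # the small epsilon keeps float error from skipping a frame.
    first_frame = min(len(media.video_frames), math.ceil(start_s * media.fps - 1e-9))
    return MediaInput(media.audio_samples[first_sample:], media.sample_rate,
                      media.video_frames[first_frame:], media.fps)
```

Cutting both tracks at one time stamp keeps the trimmed audio and video aligned, so the recipient sees the sender start speaking the payload.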
Specification