Asynchronous audio messaging

US 10,002,611 B1
Filed: 05/15/2013
Issued: 06/19/2018
Est. Priority Date: 05/15/2013
Status: Active Grant

Abstract

Systems, devices, and techniques may provide asynchronous audio messaging. Asynchronous audio messaging may enable a user to quickly and easily create and transmit a message to a recipient. The user may simply record a message for a recipient. The message may include an indication of the recipient of the message, an action (e.g., to send a message, etc.) and/or other types of information. A messaging module may modify the message to create a modified version of the message and then generate an additional version of the modified message in a different media type. The modified message and the additional version of the modified message may be transmitted to the recipient. In some embodiments, the messaging module may transmit other information such as location information, an expiration, or other information derived from the message to enhance the message.
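
The abstract describes a message that travels in two media types together with optional extras such as an expiration or location information. As a rough illustration only, the Python sketch below bundles those pieces into one JSON payload. The field names (`audio_ref`, `expires_at`, `location`), the `build_message` helper, and the JSON encoding are assumptions made for this example; the patent does not specify a wire format.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class OutgoingMessage:
    """Hypothetical payload: modified audio, its text version, and extras."""
    recipient_address: str
    text: str                          # text version of the modified message
    audio_ref: str                     # reference to the modified audio clip
    expires_at: Optional[str] = None   # ISO 8601 timestamp, sent as metadata
    location: Optional[str] = None     # optional location information


def build_message(address: str, text: str, audio_ref: str,
                  created: datetime,
                  ttl: Optional[timedelta] = None,
                  location: Optional[str] = None) -> str:
    """Serialize the message; the expiration is derived from an assumed TTL."""
    expires = (created + ttl).isoformat() if ttl is not None else None
    msg = OutgoingMessage(address, text, audio_ref, expires, location)
    return json.dumps(asdict(msg), sort_keys=True)


if __name__ == "__main__":
    created = datetime(2013, 5, 15, 12, 0, tzinfo=timezone.utc)
    print(build_message("bob@example.com", "dinner is ready",
                        "clip-0001.wav", created, ttl=timedelta(hours=2)))
```

Keeping the expiration as a separate metadata field, rather than folding it into the audio or text, mirrors the way the abstract lists it as additional information that accompanies the message.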

22 Claims

1. A system comprising:
- one or more processors; and
  
  one or more memories coupled to the one or more processors, the one or more memories including instructions that upon execution cause the system to:
  
  receive a first audio signal that includes a representation of words spoken by a user;
  
  generate a first text by performing speech recognition on the first audio signal;
  
  analyze the first text to determine that the first audio signal includes a message to be sent to a recipient;
  
  truncate the first text and the first audio signal to create a second text and a second audio signal as the message to be transmitted to the recipient, wherein truncation to create the second audio signal includes removing a representation of a first portion of the words spoken by the user while retaining a representation of a second portion of the words spoken by the user as the second audio signal;
  
  determine a user profile associated with the user, wherein the user profile is one of a plurality of user profiles;
  
  determine, from a plurality of recipients different from the user, the recipient of the message based at least in part on the user profile;
  
  determine an electronic address associated with the recipient of the message;
  
  determine an expiration time associated with the second audio signal based at least in part on an analysis of the first audio signal;
  
  transmit the second audio signal and the second text to the electronic address; and
  
  transmit the expiration time to the electronic address as metadata associated with the second audio signal to update at least a portion of the second audio signal based at least in part on the expiration time of the second audio signal.
- Dependent claims (2, 3, 4, 5):
  - 2. The system as recited in claim 1, wherein determination of the recipient of the message includes determining the recipient from a previous message or determining the recipient from contextual information associated with the message.
  - 3. The system as recited in claim 1, wherein the instructions, upon further execution, cause the system to:
    - receive a third audio signal from another user device;
      
      generate a third text by performing speech recognition on the third audio signal;
      
      truncate the third text and the third audio signal to create a fourth text and a fourth audio signal; and
      
      cause a device associated with the user to announce availability of at least one of the fourth audio signal or the fourth text.
  - 4. The system as recited in claim 1, wherein the first portion of the words spoken by the user includes at least one of a command associated with the message or the recipient associated with the message.
  - 5. The system as recited in claim 1, wherein the truncation of the first text and the first audio signal to create the second text and the second audio signal as the message includes using a natural language understanding algorithm to determine to remove a representation of the first portion of the words spoken by the user.
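
The truncation step in claims 1, 4, and 5 removes one portion of the spoken words (for example, a command or the recipient) from both the text and the audio while keeping the rest. A minimal sketch of that idea follows. It assumes the speech recognizer returns word-level timestamps and that the removed portion is a leading run of words; neither assumption comes from the claims. The `Word` type, `truncate_message`, and `command_word_count` are hypothetical names, and the count would in practice come from something like the natural language understanding step mentioned in claim 5.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Word:
    text: str
    start: float  # start time in seconds (assumed recognizer output)
    end: float    # end time in seconds


def truncate_message(words: List[Word], samples: Sequence[int],
                     sample_rate: int,
                     command_word_count: int) -> Tuple[str, List[int]]:
    """Drop the leading command words from both the text and the audio."""
    if not 0 <= command_word_count <= len(words):
        raise ValueError("command_word_count is out of range")
    kept = words[command_word_count:]
    second_text = " ".join(w.text for w in kept)
    if not kept:
        return second_text, []
    # Cut the audio where the first retained word begins.
    cut = int(round(kept[0].start * sample_rate))
    return second_text, list(samples[cut:])


if __name__ == "__main__":
    words = [Word("tell", 0.0, 0.2), Word("Bob", 0.2, 0.5),
             Word("dinner", 0.5, 0.9), Word("is", 0.9, 1.0),
             Word("ready", 1.0, 1.4)]
    text, audio = truncate_message(words, list(range(14)), 10, 2)
    print(text, len(audio))
```

Cutting text and audio at the same word boundary keeps the two versions of the message aligned, which is the property the claim language appears to rely on.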

6. A computer-implemented method comprising:
- under control of one or more computing devices executing instructions,
  
  receiving a first audio signal, at least a portion of the first audio signal representing words spoken by a user;
  
  performing at least one of automatic speech recognition or natural language understanding on the at least a portion of the first audio signal to determine first text based at least in part on the words spoken by the user;
  
  creating a second audio signal based at least in part on the first audio signal, at least a portion of the second audio signal representing audio corresponding to at least a portion of the words spoken by the user from the first audio signal;
  
  creating a message associated with the second audio signal, the message including second text corresponding to the at least the portion of the words spoken by the user;
  
  determining a user profile associated with the user, the user profile being one of a plurality of user profiles associated with a device located in a user environment;
  
  determining, from a plurality of recipients different from the user, a recipient based at least in part on the user profile;
  
  determining an expiration time associated with the second audio signal based at least in part on an analysis of the first audio signal;
  
  transmitting the second audio signal and the message to an address associated with the recipient; and
  
  transmitting the expiration time to the address as metadata associated with the second audio signal.
- Dependent claims (7, 8, 9, 10, 11, 17, 18, 20, 21):
  - 7. The computer-implemented method as recited in claim 6, wherein the at least the portion of the words spoken by the user is a first portion of the words spoken by the user, the computer-implemented method further comprising truncating the first audio signal to remove a representation of a second portion of the words spoken by the user while preserving a representation of the first portion of the words spoken by the user.
  - 8. The computer-implemented method as recited in claim 6, further comprising determining, from the first audio signal, a command to transmit the message to the recipient, and wherein the message includes the at least the portion of the words spoken by the user and represented in the second audio signal.
  - 9. The computer-implemented method as recited in claim 6, further comprising determining the recipient of the second audio signal based at least in part on a previous message or based at least in part on contextual information associated with the first audio signal.
  - 10. The computer-implemented method as recited in claim 6, further comprising:
    - receiving a third audio signal from the user;
      
      determining that the third audio signal is associated with the first audio signal; and
      
      generating a fourth audio signal including at least a portion of the third audio signal and at least a portion of the second audio signal.
  - 11. The computer-implemented method as recited in claim 6, further comprising:
    - determining the recipient of the second audio signal;
      
      determining, from an address book storing the plurality of recipients, a preferred address associated with the recipient of the second audio signal; and
      
      using the preferred address as the address for the recipient.
  - 17. The computer-implemented method of claim 6, further comprising determining the user profile associated with the user based at least in part on the first audio signal.
  - 18. The computer-implemented method of claim 6, further comprising determining the user profile associated with the user based at least in part on a device identifier associated with the device located in the user environment.
  - 20. The computer-implemented method as recited in claim 6, wherein an individual recipient of the plurality of recipients represents an individual user different from the user, and wherein a plurality of electronic addresses is associated with the individual recipient.
  - 21. The computer-implemented method as recited in claim 6, further comprising determining the expiration time based at least in part on a contextual data associated with the first audio signal.
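
Claims 11 and 20 together describe an address book in which one recipient may have several electronic addresses, one of which is preferred. The lookup could be sketched as below. The `Contact` type, the `resolve_address` helper, the case-insensitive key, and the fallback to the first listed address when no preference is stored are all assumptions for illustration, not details from the claims.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Contact:
    name: str
    addresses: List[str]             # a recipient may have several addresses
    preferred: Optional[str] = None  # preferred address, if one is stored


def resolve_address(address_book: Dict[str, Contact], recipient: str) -> str:
    """Return the preferred address, else the first known address."""
    contact = address_book[recipient.lower()]  # KeyError if unknown
    if contact.preferred is not None:
        return contact.preferred
    if not contact.addresses:
        raise LookupError(f"no address stored for {contact.name}")
    return contact.addresses[0]  # assumed fallback, not from the claims


if __name__ == "__main__":
    book = {
        "bob": Contact("Bob", ["bob@example.com", "sms:+15550100"],
                       preferred="sms:+15550100"),
        "ann": Contact("Ann", ["ann@example.com"]),
    }
    print(resolve_address(book, "Bob"))
    print(resolve_address(book, "ann"))
```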

12. One or more non-transitory computer-readable media storing computer-executable instructions that, when executed by one or more processors, cause the one or more processors to:
- receive a first audio signal that includes a representation of words spoken by a user;
  
  perform at least one of automatic speech recognition or natural language understanding on at least a portion of the first audio signal to determine first text based at least in part on the representation of the words spoken by the user;
  
  create a second audio signal based at least in part on the first audio signal, the second audio signal including a representation of audio corresponding to at least a portion of the words spoken by the user from the first audio signal;
  
  create a message associated with the second audio signal, the message including second text corresponding to the representation of the at least the portion of the words spoken by the user;
  
  determine a user profile associated with the user, wherein the user profile is one of a plurality of user profiles;
  
  determine, from a plurality of recipients different from the user, a recipient, wherein the recipient is based at least in part on the user profile;
  
  determine an address associated with the recipient;
  
  determine an expiration time of the second audio signal based at least in part on an analysis of the first audio signal;
  
  transmit the second audio signal and the message to the address associated with the recipient; and
  
  transmit the expiration time to the address as metadata associated with the second audio signal.
- Dependent claims (13, 14, 15, 16, 19, 22):
  - 13. The one or more non-transitory computer-readable media as recited in claim 12, wherein creating the second audio signal includes removing at least one of an identification of the recipient or a command from the representation of the words spoken by the user.
  - 14. The one or more non-transitory computer-readable media as recited in claim 12, wherein the at least the portion of the words spoken by the user is a first portion of the words spoken by the user, and wherein creating the second audio signal includes truncating the first audio signal to remove a representation of a second portion of the words spoken by the user while preserving a representation of the first portion of the words spoken by the user.
  - 15. The one or more non-transitory computer-readable media as recited in claim 12, wherein generating the message includes using a speech-to-text algorithm to create a text-based message as the message.
  - 16. The one or more non-transitory computer-readable media as recited in claim 12, wherein the instructions, when executed by the one or more processors, cause the one or more processors to:
    - determine a first location of the user;
      
      transmit the first location of the user to the address for association with the second audio signal;
      
      determine a second location of the user that is different than the first location; and
      
      transmit, without input by the user, the second location to the address in association with the second audio signal based on the second location being different than the first location.
  - 19. The one or more non-transitory computer-readable media as recited in claim 12, wherein the instructions, when executed by the one or more processors, cause the one or more processors to perform voice recognition on the first audio signal or the second audio signal to determine the user profile associated with the user.
  - 22. The one or more non-transitory computer-readable media as recited in claim 12, wherein the metadata is first metadata, and wherein the instructions, when executed by the one or more processors, cause the one or more processors to transmit second metadata to the address to provide a visual indication based at least in part on the expiration time of the second audio signal.
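
Claim 16 describes sending the user's location with the message and then, without further user input, sending a new location only when it differs from the earlier one. One way to picture that behavior is the small sketch below. The `LocationUpdater` class, the `send` callback, and the use of exact coordinate equality are assumptions for this example; a practical system would more likely compare against a distance threshold.

```python
from typing import Callable, List, Optional, Tuple

Location = Tuple[float, float]  # (latitude, longitude), assumed representation


class LocationUpdater:
    """Send a location to an address only when it differs from the last one sent."""

    def __init__(self, send: Callable[[str, Location], None], address: str):
        self._send = send
        self._address = address
        self._last: Optional[Location] = None

    def observe(self, location: Location) -> bool:
        """Return True if the location was transmitted, False if unchanged."""
        if location == self._last:
            return False
        self._send(self._address, location)
        self._last = location
        return True


if __name__ == "__main__":
    sent: List[Tuple[str, Location]] = []
    updater = LocationUpdater(lambda a, loc: sent.append((a, loc)),
                              "bob@example.com")
    updater.observe((47.6, -122.3))   # first location: transmitted
    updater.observe((47.6, -122.3))   # unchanged: skipped
    updater.observe((47.7, -122.2))   # changed: transmitted
    print(len(sent))
```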

Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors: Typrin, Marcello
Primary Examiner(s): Mishra, Richa
Application Number: US13/895,007
Time in Patent Office: 1,861 Days
Field of Search: None
US Class Current:
CPC Class Codes:
- G10L 15/26: Speech to text systems G10L...
- G10L 2015/223: Execution procedure of a sp...
- H04L 51/10: Multimedia information
- H04L 51/214: using selective forwarding
