Voice-based determination of physical and emotional characteristics of users
First Claim
1. A speaker device comprising:
a microphone;
at least one memory that stores computer-executable instructions;
at least one processor configured to access the at least one memory and execute the computer-executable instructions to:
receive, using the microphone, first voice input from a user comprising a user utterance;
determine background noise in the first voice data;
determine that the user is in an ambient environment with multiple users;
generate a first tag indicative of a multiple user audience;
process the first voice data of the first voice input using a first signal processing algorithm;
determine that a physical status of the user is abnormal;
select a sore throat physical status for the user;
generate a second tag indicative of the sore throat physical status;
apply a second signal processing algorithm to the first voice data;
determine that an emotional status of the user indicates the user is excited;
select an excited emotional status for the user;
generate a third tag indicative of the excited emotional status;
send a content request comprising the first voice data, the first tag, the second tag, and the third tag to a server, wherein the server determines first audio content for presentation at the speaker device;
receive an indication of the first audio content; and
present the first audio content, wherein targeting criteria for the first audio content comprises the sore throat physical status, the excited emotional status, and the multiple user audience.
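The tagging pipeline of claim 1 can be sketched roughly as follows. This is a minimal illustration, not an implementation from the patent: every function, field, and tag name is an assumption, and the three boolean inputs stand in for the claimed background-noise analysis and the first and second signal processing passes.

```python
# Hypothetical sketch of the claim-1 tagging pipeline; all names are
# illustrative assumptions, not language from the patent.
from dataclasses import dataclass, field


@dataclass
class ContentRequest:
    """Bundle sent to the content-selection server."""
    voice_data: bytes
    tags: list = field(default_factory=list)


def build_content_request(voice_data: bytes,
                          multiple_voices: bool,
                          physical_abnormal: bool,
                          sounds_excited: bool) -> ContentRequest:
    """Generate the three data tags of claim 1 and attach them to the
    first voice data for transmission to the server."""
    request = ContentRequest(voice_data=voice_data)
    if multiple_voices:        # inferred from background noise in the voice data
        request.tags.append("audience:multiple_users")
    if physical_abnormal:      # result of the first signal processing algorithm
        request.tags.append("physical:sore_throat")
    if sounds_excited:         # result of the second signal processing algorithm
        request.tags.append("emotional:excited")
    return request


req = build_content_request(b"\x00\x01", True, True, True)
print(req.tags)
# → ['audience:multiple_users', 'physical:sore_throat', 'emotional:excited']
```

The server would then use these tags as targeting criteria when choosing the first audio content to return.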
Abstract
Systems, methods, and computer-readable media are disclosed for voice-based determination of physical and emotional characteristics of users. Example methods may include determining first voice data, wherein the first voice data is generated by a user, determining a first real-time user status of the user using the first voice data, generating a first data tag indicative of the first real-time user status, determining first audio content for presentation at a speaker device using the first data tag and the first voice data, and causing presentation of the first audio content via a speaker of the speaker device.
168 Citations
19 Claims
4. A method comprising:
determining, by one or more computer processors coupled to at least one memory, first voice data, wherein the first voice data is generated by a user;
determining a first real-time user status of the user using the first voice data;
generating a first data tag indicative of the first real-time user status;
determining candidate audio content for presentation using the first data tag, the candidate audio content comprising first audio content and second audio content;
determining that a first score for the first audio content is greater than a second score for the second audio content, wherein the first score is determined using a first targeting criteria, and the second score is determined using a second targeting criteria;
determining the first audio content for presentation at a speaker device; and
causing presentation of the first audio content via a speaker of the speaker device.
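The scoring step of claim 4 can be sketched as follows, under the assumption (not stated in the claim) that each candidate's targeting criteria are a set of tags and that a score counts how many of those criteria the user's real-time status tags satisfy; the function names and example candidates are hypothetical.

```python
# Illustrative sketch of the claim-4 selection step: score candidate audio
# content against targeting criteria using the user's real-time status
# tags, then pick the higher-scoring candidate. All names are assumptions.
def score(targeting_criteria: set, user_tags: set) -> int:
    # One point per targeting criterion matched by a user-status tag.
    return len(targeting_criteria & user_tags)


def select_audio_content(candidates: dict, user_tags: set) -> str:
    """Return the candidate whose targeting criteria best match the tags."""
    return max(candidates, key=lambda name: score(candidates[name], user_tags))


candidates = {
    "cough_drop_ad": {"physical:sore_throat"},
    "generic_playlist": set(),
}
user_tags = {"physical:sore_throat", "emotional:excited"}
print(select_audio_content(candidates, user_tags))
# → cough_drop_ad
```

Here the first score (1) exceeds the second score (0), so the first audio content is determined for presentation at the speaker device.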
17. A device comprising:
at least one memory that stores computer-executable instructions; and
at least one processor configured to access the at least one memory and execute the computer-executable instructions to:
determine first voice data, wherein the first voice data is generated by a user;
determine a first real-time user status of the user using the first voice data;
generate a first data tag indicative of the first real-time user status;
determine candidate audio content for presentation using the first data tag, the candidate audio content comprising first audio content and second audio content;
determine that a first score for the first audio content is greater than a second score for the second audio content, wherein the first score is determined using a first targeting criteria, and the second score is determined using a second targeting criteria;
determine the first audio content for presentation at a speaker device; and
present the first audio content via a speaker.