Audio selection based on user engagement

US 10,462,422 B1
Filed: 04/09/2018
Issued: 10/29/2019
Est. Priority Date: 04/09/2018
Status: Active Grant

Abstract

In one embodiment, a method includes receiving audio input data from a microphone array of at least two microphones. The audio input data is generated by a first sound source at a first location and a second sound source at a second location. The method also includes calculating a first engagement metric for the first sound source and a second engagement metric for the second sound source. The first engagement metric approximates an interest level of a receiving user for the first sound source, and the second engagement metric approximates an interest level from the receiving user for the second sound source. The method also includes determining that the first engagement metric is greater than the second engagement metric, and processing the audio input data to generate an audio output signal. The audio output signal may amplify sound generated by the first sound source relative to the second sound source.
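
The selection step described in the abstract can be illustrated with a minimal sketch. It assumes the two sources have already been separated into individual sample streams and that an engagement metric has already been predicted for each; the `SoundSource` type, the `mix_by_engagement` function, and the `boost` and `cut` gain values are hypothetical names and numbers chosen for illustration, not taken from the patent.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class SoundSource:
    """One separated sound source (all fields are illustrative assumptions)."""
    label: str                 # e.g. "voice" or "television"
    engagement: float          # predicted interest level of the receiving user
    samples: Sequence[float]   # mono samples already isolated for this source


def mix_by_engagement(first: SoundSource, second: SoundSource,
                      boost: float = 2.0, cut: float = 0.25) -> List[float]:
    """Amplify the source with the higher engagement metric, attenuate the other."""
    if first.engagement == second.engagement:
        # No preference between the sources: pass both through unchanged.
        return [a + b for a, b in zip(first.samples, second.samples)]
    if first.engagement > second.engagement:
        preferred, other = first, second
    else:
        preferred, other = second, first
    # Hypothetical fixed gains; a real system would likely adapt these.
    return [boost * p + cut * o for p, o in zip(preferred.samples, other.samples)]


speaker = SoundSource("voice", 0.9, [1.0, -1.0])
television = SoundSource("television", 0.2, [0.5, 0.5])
output = mix_by_engagement(speaker, television)
print(output)  # [2.125, -1.875]
```

The sketch only covers the comparison and the relative gain; how the engagement metrics are predicted and how the sources are separated are left out.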

17 Claims

1. A method comprising:
- receiving, during an audio-video communication session, audio input data from a microphone array comprising at least two microphones, wherein the audio input data is generated by a first sound source at a first location within an environment and a second sound source at a second location within the environment;
- determining a first classification for the first sound source and a second classification for the second sound source;
- predicting a first engagement metric for the first sound source and a second engagement metric for the second sound source, wherein:
  - the first engagement metric is based on the first classification and the second engagement metric is based on the second classification;
  - the first engagement metric approximates an interest level of a receiving user for the first sound source; and
  - the second engagement metric approximates an interest level from the receiving user for the second sound source;
- determining that the first engagement metric is greater than the second engagement metric;
- processing the audio input data to generate an audio output signal, wherein the audio output signal amplifies sound generated by the first sound source and attenuates sound generated by the second sound source; and
- sending the audio output signal to a computing device associated with the receiving user.

2. The method of claim 1, wherein the first classification for the first sound source is a human voice, and wherein the second classification for the second sound source is a non-human sound.

3. The method of claim 1, wherein the determining the first classification and the second classification is based on information received from a descriptive model for the audio-video communication session that comprises one or more descriptive characteristics about (1) an environment associated with the audio-video communication session, (2) one or more people within the environment, or (3) one or more contextual elements associated with the audio-video communication session.

4. The method of claim 1, wherein the processing the audio input data comprises acoustically beamforming a first audio input signal generated by the first source and a second audio input signal generated by the second source, wherein the acoustical beamforming comprises time delaying the second audio input signal such that the first sound source is amplified and the second sound source is attenuated.

5. The method of claim 1, wherein the first engagement metric and the second engagement metric are calculated based on a descriptive model for the audio-video communication session that comprises one or more descriptive characteristics about (1) an environment associated with the audio-video communication session, (2) one or more people within the environment, or (3) one or more contextual elements associated with the audio-video communication session.

6. The method of claim 1, further comprising:
- accessing a social graph comprising a plurality of nodes and a plurality of edges connecting the nodes, wherein:
  - a first node corresponds to the receiving user;
  - a second node corresponds to an entity associated with the first sound source; and
  - an edge between the first node and the second node represents a relationship between the receiving user and the entity; and
- increasing the first engagement metric based on the edge between the first node and the second node.

7. The method of claim 1, wherein the first engagement metric is calculated at least in part based on a count of words spoken by the first sound source, a distance between the first sound source and the microphone array, or an amount of time the first sound source has been present in the environment during the audio-video communication session; and the second engagement metric is calculated at least in part based on a count of words spoken by the second sound source, a distance between the second sound source and the microphone array, or an amount of time the second sound source has been present in the environment during the audio-video communication session.
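
The time-delay idea in claim 4 can be illustrated with classic two-microphone delay-and-sum beamforming. This is a minimal sketch under stated assumptions, not the patent's implementation: it delays one microphone channel by a whole number of samples so that the preferred source lines up across both channels and adds coherently, while the other source stays misaligned and is averaged down. The function names, the integer delays, and the toy impulse signals are all hypothetical.

```python
from typing import List, Sequence


def delay(signal: Sequence[float], n: int) -> List[float]:
    """Delay a signal by n whole samples, zero-padding the start, keeping its length."""
    if n <= 0:
        return list(signal)
    return ([0.0] * n + list(signal))[:len(signal)]


def delay_and_sum(mic_a: Sequence[float], mic_b: Sequence[float],
                  delay_a: int, delay_b: int) -> List[float]:
    """Average two microphone channels after applying per-channel steering delays."""
    a = delay(mic_a, delay_a)
    b = delay(mic_b, delay_b)
    return [(x + y) / 2.0 for x, y in zip(a, b)]


# Toy scene (assumed): the preferred source emits one impulse that reaches
# mic A at sample 2 and mic B at sample 4; the other source emits one impulse
# that reaches mic B at sample 1 and mic A at sample 3.
mic_a = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
mic_b = [0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]

# Delaying mic A by 2 samples aligns the preferred source on both channels.
steered = delay_and_sum(mic_a, mic_b, delay_a=2, delay_b=0)
print(steered)  # [0.0, 0.5, 0.0, 0.0, 1.0, 0.5, 0.0, 0.0]
```

In the output, the preferred source keeps its full amplitude at sample 4, while the other source appears only at half amplitude at samples 1 and 5. Real arrays would need fractional delays derived from the source locations and the microphone geometry.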

8. A computer-readable non-transitory storage medium embodying software that is operable when executed to:
- receive, during an audio-video communication session, audio input data from a microphone array comprising at least two microphones, wherein the audio input data is generated by a first sound source at a first location within an environment and a second sound source at a second location within the environment;
- determine a first classification for the first sound source and a second classification for the second sound source;
- predict a first engagement metric for the first sound source and a second engagement metric for the second sound source, wherein:
  - the first engagement metric is based on the first classification and the second engagement metric is based on the second classification;
  - the first engagement metric approximates an interest level of a receiving user for the first sound source; and
  - the second engagement metric approximates an interest level from the receiving user for the second sound source;
- determine that the first engagement metric is greater than the second engagement metric;
- process the audio input data to generate an audio output signal, wherein the audio output signal amplifies sound generated by the first sound source and attenuates sound generated by the second sound source; and
- send the audio output signal to a computing device associated with the receiving user.

9. The computer-readable non-transitory storage medium of claim 8, wherein the first classification for the first sound source is a human voice, and wherein the second classification for the second sound source is a non-human sound.

10. The computer-readable non-transitory storage medium of claim 8, wherein the determining the first classification and the second classification is based on information received from a descriptive model for the audio-video communication session that comprises one or more descriptive characteristics about (1) an environment associated with the audio-video communication session, (2) one or more people within the environment, or (3) one or more contextual elements associated with the audio-video communication session.

11. The computer-readable non-transitory storage medium of claim 8, wherein the processing the audio input data comprises acoustically beamforming a first audio input signal generated by the first source and a second audio input signal generated by the second source, wherein the acoustical beamforming comprises time delaying the second audio input signal such that the first sound source is amplified and the second sound source is attenuated.

12. The computer-readable non-transitory storage medium of claim 8, wherein the first engagement metric and the second engagement metric are calculated based on a descriptive model for the audio-video communication session that comprises one or more descriptive characteristics about (1) an environment associated with the audio-video communication session, (2) one or more people within the environment, or (3) one or more contextual elements associated with the audio-video communication session.

13. The computer-readable non-transitory storage medium of claim 8, wherein the software is further operable when executed to:
- access a social graph comprising a plurality of nodes and a plurality of edges connecting the nodes, wherein:
  - a first node corresponds to the receiving user;
  - a second node corresponds to an entity associated with the first sound source; and
  - an edge between the first node and the second node represents a relationship between the receiving user and the entity; and
- increase the first engagement metric based on the edge between the first node and the second node.

14. The computer-readable non-transitory storage medium of claim 8, wherein the first engagement metric is calculated at least in part based on a count of words spoken by the first sound source, a distance between the first sound source and the microphone array, or an amount of time the first sound source has been present in the environment during the audio-video communication session; and the second engagement metric is calculated at least in part based on a count of words spoken by the second sound source, a distance between the second sound source and the microphone array, or an amount of time the second sound source has been present in the environment during the audio-video communication session.
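
The engagement signals named in the dependent claims (word count, distance to the microphone array, time present, and a social-graph edge to the receiving user) can be combined in many ways; the claims do not give a formula. The sketch below is one hypothetical scoring scheme: the weights, the `boost` value, the `SourceStats` type, and the representation of the social graph as a set of undirected edges are all assumptions made for illustration. The claims list the three measurements as alternatives; the sketch simply uses all of them.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set


@dataclass
class SourceStats:
    """Per-source measurements (field names are illustrative assumptions)."""
    entity: str             # entity associated with the sound source
    word_count: int         # words spoken so far in the session
    distance_m: float       # distance from the microphone array, in meters
    seconds_present: float  # time present in the environment, in seconds


def base_engagement(stats: SourceStats) -> float:
    """Hypothetical weighting: more words, a closer source, and longer presence score higher."""
    return (0.01 * stats.word_count
            + 1.0 / (1.0 + stats.distance_m)
            + 0.001 * stats.seconds_present)


def engagement(stats: SourceStats, receiving_user: str,
               edges: Set[FrozenSet[str]], boost: float = 0.5) -> float:
    """Increase the score when the social graph has an edge between user and entity."""
    score = base_engagement(stats)
    if frozenset((receiving_user, stats.entity)) in edges:
        score += boost
    return score


# Toy social graph (assumed): one undirected edge between "alice" and "bob".
graph = {frozenset(("alice", "bob"))}
bob = SourceStats("bob", word_count=100, distance_m=1.0, seconds_present=500.0)
stranger = SourceStats("carol", word_count=100, distance_m=1.0, seconds_present=500.0)

print(engagement(bob, "alice", graph) > engagement(stranger, "alice", graph))  # True
```

With identical measurements, the source connected to the receiving user in the graph ends up with the higher metric, which is the behavior claims 6 and 13 describe.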

15. A system comprising:
- one or more processors; and
- a computer-readable non-transitory storage medium coupled to one or more of the processors and comprising instructions operable when executed by one or more of the processors to cause the system to:
  - receive, during an audio-video communication session, audio input data from a microphone array comprising at least two microphones, wherein the audio input data is generated by a first sound source at a first location within an environment and a second sound source at a second location within the environment;
  - determine a first classification for the first sound source and a second classification for the second sound source;
  - predict a first engagement metric for the first sound source and a second engagement metric for the second sound source, wherein:
    - the first engagement metric is based on the first classification and the second engagement metric is based on the second classification;
    - the first engagement metric approximates an interest level of a receiving user for the first sound source; and
    - the second engagement metric approximates an interest level from the receiving user for the second sound source;
  - determine that the first engagement metric is greater than the second engagement metric;
  - process the audio input data to generate an audio output signal, wherein the audio output signal amplifies sound generated by the first sound source and attenuates sound generated by the second sound source; and
  - send the audio output signal to a computing device associated with the receiving user.

16. The system of claim 15, wherein the first classification for the first sound source is a human voice, and wherein the second classification for the second sound source is a non-human sound.

17. The system of claim 15, wherein the determining the first classification and the second classification is based on information received from a descriptive model for the audio-video communication session that comprises one or more descriptive characteristics about (1) an environment associated with the audio-video communication session, (2) one or more people within the environment, or (3) one or more contextual elements associated with the audio-video communication session.

Current Assignee: Meta Platforms, Inc. (f/k/a Facebook, Inc.)
Original Assignee: Meta Platforms, Inc. (f/k/a Facebook, Inc.)
Inventors: Harrison, Jason Francis; Razzaq, Shahid; Hwang, Eric W.
Primary Examiner(s): Nguyen, Khai N.
Application Number: US15/949,011
Publication Number: US 20190313054A1
Time in Patent Office: 568 Days
Field of Search: 181210, 348 1401, 348 1402, 348 1403, 348 1404, 348 1405, 348 1406, 348 1407, 348 1408, 348 1409, 348 141, 348 1411, 348 1412, 348 1413, 348 1414, 348 1515, 348 1416, 381 7111, 381 7114, 381 731, 381 86, 381317, 382275, 37926503, 4554141, 455566, 701 36, 704270

CPC Class Codes

G06F 3/165   Management of the audio str...

G06Q 10/063114   Status monitoring or status...

H04N 7/147   Communication arrangements,...
