Systems and methods for composition of audio content from multi-object audio

US 10,365,885 B1
Filed: 02/21/2018
Issued: 07/30/2019
Est. Priority Date: 02/21/2018
Status: Active Grant

Abstract

Embodiments are related to processing of one or more input audio feeds for generation of a target audio stream that includes at least one object of interest to a listener. In some embodiments, the target audio stream may exclusively or primarily include the sound of the object of interest to the listener, without including other persons. This allows a listener to focus on an object of his or her interest and not necessarily have to listen to the performances of other objects in the input audio feed. Some embodiments contemplate multiple audio feeds and/or multiple objects of interest.

17 Claims

1. A method for composition of audio content comprising:
- receiving an input audio feed including one or more objects distributed in multiple frames, wherein an object of interest in the one or more objects is identifiable based on a unique characteristic;
  
  generating a fingerprint of at least a portion of the input audio feed;
  
  retrieving, from a database, a fingerprint of the object of interest;
  
  comparing the fingerprint of at least the portion of the input audio feed with the fingerprint of the object of interest to detect matched frames that include the fingerprint of the object of interest;
  
  compositing the matched frames to generate a target audio stream having the object of interest; and
  
  wherein the one or more objects distributed in multiple frames further includes a first object of interest and a second object of interest, further comprising:
  
  generating a fingerprint of the first object of interest and a fingerprint of the second object of interest; and
  
  comparing the fingerprint of at least the portion of the input audio feed respectively with:
  
  (i) the fingerprint of the first object of interest, and (ii) the fingerprint of the second object of interest to detect matched frames that include the fingerprint of the first object of interest or the fingerprint of the second object of interest; and
  
  compositing the matched frames to generate the target audio stream having the first object of interest or the second object of interest to be present in common or different frames.
- Dependent claims (2, 3, 4, 5, 6):
  - 2. The method of claim 1, wherein comparing the fingerprint of at least the portion of the input audio feed with the fingerprint of the object of interest is associated with partial elimination of objects that do not correspond to the object of interest, from the target audio stream.
  - 3. The method of claim 1, wherein comparing the fingerprint of at least the portion of the input audio feed with the fingerprint of the object of interest is associated with complete elimination of objects that do not correspond to the object of interest, from the target audio stream.
  - 4. The method of claim 1, further comprising:
    - creating a timeline of the input audio feed; and
      
      annotating the timeline of the input audio feed at positions corresponding to positions of the matched frames.
  - 5. The method of claim 4, wherein the annotating allows playing of the target audio stream at the positions of the matched frames.
  - 6. The method of claim 4, wherein the annotating allows selective seeking the positions of the matched frames, without generation of the target audio stream.
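
The steps of claim 1 can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the claims do not say how a fingerprint is computed or compared, so the sketch assumes a toy fingerprint (normalized magnitudes of a few low DFT bins per frame) and a cosine-similarity threshold. The names `split_frames`, `frame_fingerprint`, `match_frames`, `composite`, `tone`, the constant `FRAME_SIZE`, the threshold value, and the in-memory `database` dictionary are all hypothetical.

```python
import math

FRAME_SIZE = 64  # samples per frame (assumed; the claims give no frame size)

def split_frames(samples, size=FRAME_SIZE):
    """Cut a feed into consecutive, non-overlapping frames; a short tail is dropped."""
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]

def frame_fingerprint(frame, bins=8):
    """Toy fingerprint (assumption): unit-normalized DFT magnitudes of bins 1..bins."""
    n = len(frame)
    mags = []
    for k in range(1, bins + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        mags.append(math.hypot(re, im))
    norm = math.sqrt(sum(m * m for m in mags)) or 1.0  # avoid dividing by zero on silence
    return [m / norm for m in mags]

def similarity(fp_a, fp_b):
    """Cosine similarity of two unit-normalized fingerprints."""
    return sum(a * b for a, b in zip(fp_a, fp_b))

def match_frames(feed, object_fps, threshold=0.6):
    """Map each matched frame index to the objects of interest detected in it."""
    matches = {}
    for idx, frame in enumerate(split_frames(feed)):
        fp = frame_fingerprint(frame)
        hits = [name for name, ofp in object_fps.items()
                if similarity(fp, ofp) >= threshold]
        if hits:
            matches[idx] = hits
    return matches

def composite(feed, matches):
    """Concatenate the matched frames, in order, into the target audio stream."""
    frames = split_frames(feed)
    target = []
    for idx in sorted(matches):
        target.extend(frames[idx])
    return target

def tone(k, n=FRAME_SIZE):
    """Synthetic one-frame sine wave standing in for a sound source."""
    return [math.sin(2 * math.pi * k * t / n) for t in range(n)]

# Stand-in for the fingerprint database: one stored fingerprint per object of interest.
database = {"first": frame_fingerprint(tone(2)), "second": frame_fingerprint(tone(5))}

# Four frames: first object, an unrelated sound, second object, both objects together.
feed = tone(2) + tone(7) + tone(5) + [a + b for a, b in zip(tone(2), tone(5))]
matches = match_frames(feed, database)
target = composite(feed, matches)
```

In this toy feed the two objects of interest appear both in different frames (frames 0 and 2) and in a common frame (frame 3), while the unrelated frame 1 is left out of the target stream.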

7. A non-transitory computer-readable storage medium storing instructions configured for composition of audio content to perform a method comprising:
- receiving an input audio feed including one or more objects distributed in multiple frames, wherein an object of interest in the one or more objects is identifiable based on a unique characteristic;
  
  generating a fingerprint of at least a portion of the input audio feed;
  
  retrieving, from a database, a fingerprint of the object of interest;
  
  comparing the fingerprint of at least the portion of the input audio feed with the fingerprint of the object of interest to detect matched frames that include the fingerprint of the object of interest;
  
  compositing the matched frames to generate a target audio stream having the object of interest; and
  
  wherein the one or more objects distributed in multiple frames includes a first object of interest and a second object of interest, further comprising:
  
  generating a fingerprint of the first object of interest and a fingerprint of the second object of interest; and
  
  comparing the fingerprint of at least the portion of the input audio feed respectively with:
  
  (i) the fingerprint of the first object of interest and (ii) the fingerprint of the second object of interest to detect matched frames that include the fingerprint of the first object of interest or the fingerprint of the second object of interest; and
  
  compositing the matched frames to generate the target audio stream having the first object of interest or the second object of interest to be present in common or different frames.
- Dependent claims (8, 9, 10):
  - 8. The computer-readable storage medium of claim 7, wherein comparing the fingerprint of at least the portion of the input audio feed with the fingerprint of the object of interest is associated with partial elimination of objects that do not correspond to the object of interest, from the target audio stream.
  - 9. The computer-readable storage medium of claim 7, wherein comparing the fingerprint of at least the portion of the input audio feed with the fingerprint of the object of interest is associated with complete elimination of objects that do not correspond to the object of interest, from the target audio stream.
  - 10. The computer-readable storage medium of claim 7, the method further comprising:
    - creating a timeline of the input audio feed;
      
      annotating the timeline of the input audio feed at positions corresponding to positions of the matched frames.
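
The timeline steps recited in claims 4 and 10, together with the seeking behavior of claim 6, might look like the sketch below. It assumes a constant frame duration and stores each annotation as a start time in seconds; the `Timeline` class and its `annotate` and `seek_next` methods are hypothetical names, not taken from the patent.

```python
import bisect
from dataclasses import dataclass, field

@dataclass
class Timeline:
    """Timeline of an input audio feed with annotations at matched-frame positions."""
    frame_duration: float   # seconds per frame (assumed constant)
    total_frames: int
    annotations: list = field(default_factory=list)  # sorted start times, in seconds

    def annotate(self, matched_frame_indices):
        """Mark the start time of every matched frame that lies inside the feed."""
        valid = {i for i in matched_frame_indices if 0 <= i < self.total_frames}
        self.annotations = sorted(i * self.frame_duration for i in valid)

    def seek_next(self, position):
        """Return the first annotated position strictly after `position`, or None."""
        idx = bisect.bisect_right(self.annotations, position)
        return self.annotations[idx] if idx < len(self.annotations) else None

timeline = Timeline(frame_duration=0.5, total_frames=10)
timeline.annotate([3, 4, 8, 3, 42])  # duplicate and out-of-range indices are ignored
next_stop = timeline.seek_next(0.0)
```

Because the annotations live on the timeline of the original feed, a player could jump between matched positions without a target audio stream ever being generated, which is the case claim 6 describes.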

11. An apparatus for composition of audio content comprising:
- a memory;
  
  one or more processors electronically coupled to the memory and configured for:
  
  receiving an input audio feed including one or more objects distributed in multiple frames, wherein an object of interest in the one or more objects is identifiable based on a unique characteristic;
  
  generating a fingerprint of at least a portion of the input audio feed;
  
  retrieving, from a database, a fingerprint of the object of interest;
  
  comparing the fingerprint of at least the portion of the input audio feed with the fingerprint of the object of interest to detect matched frames that include the fingerprint of the object of interest;
  
  compositing the matched frames to generate a target audio stream having the object of interest; and
  
  wherein the one or more objects distributed in multiple frames includes a first object of interest and a second object of interest, further comprising:
  
  generating a fingerprint of the first object of interest and a fingerprint of the second object of interest; and
  
  comparing the fingerprint of at least the portion of the input audio feed respectively with:
  
  (i) the fingerprint of the first object of interest and (ii) the fingerprint of the second object of interest to detect matched frames that include the fingerprint of the first object of interest or the fingerprint of the second object of interest; and
  
  compositing the matched frames to generate the target audio stream having the first object of interest or the second object of interest to be present in common or different frames.
- Dependent claims (12, 13, 14):
  - 12. The apparatus of claim 11, wherein comparing the fingerprint of at least the portion of the input audio feed with the fingerprint of the object of interest is associated with partial elimination of objects that do not correspond to the object of interest, from the target audio stream.
  - 13. The apparatus of claim 11, wherein comparing the fingerprint of at least the portion of the input audio feed with the fingerprint of the object of interest is associated with complete elimination of objects that do not correspond to the object of interest, from the target audio stream.
  - 14. The apparatus of claim 11, wherein the one or more processors are further configured for:
    - creating a timeline of the input audio feed; and
      
      annotating the timeline of the input audio feed at positions corresponding to positions of the matched frames.
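
Claims 12 and 13 (like claims 2, 3, 8, and 9) distinguish partial from complete elimination of non-matching objects but do not say how either is achieved. One possible reading, offered only as an assumption, is a per-frame gain: matched frames pass through unchanged, while unmatched frames are attenuated (partial) or silenced (complete). The function `eliminate` and its `residual_gain` parameter are hypothetical.

```python
def eliminate(frames, matched, residual_gain=0.0):
    """Keep matched frames as-is and scale all other frames by `residual_gain`.

    residual_gain == 0.0 models complete elimination (assumed interpretation);
    0.0 < residual_gain < 1.0 models partial elimination.
    """
    if not 0.0 <= residual_gain < 1.0:
        raise ValueError("residual_gain must be in [0.0, 1.0)")
    output = []
    for idx, frame in enumerate(frames):
        gain = 1.0 if idx in matched else residual_gain
        output.append([sample * gain for sample in frame])
    return output

frames = [[1.0, -1.0], [0.5, 0.5], [2.0, 0.0]]
matched = {0, 2}  # indices of frames found to contain the object of interest

complete = eliminate(frames, matched)                     # frame 1 silenced
partial = eliminate(frames, matched, residual_gain=0.25)  # frame 1 attenuated
```

Dropping unmatched frames entirely, as in a stream composited only from matched frames, would be another way to realize complete elimination.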

15. A method for composition of audio content comprising:
- receiving an input audio feed including one or more objects distributed in multiple frames, wherein an object of interest in the one or more objects is identifiable based on a unique characteristic;
  
  generating a fingerprint of at least a portion of the input audio feed;
  
  retrieving, from a database, a fingerprint of the object of interest;
  
  comparing the fingerprint of at least the portion of the input audio feed with the fingerprint of the object of interest to detect matched frames that include the fingerprint of the object of interest; and
  
  compositing the matched frames to generate a target audio stream having the object of interest; and
  
  wherein the input audio feed includes a first feed and a second feed, wherein the matched frames include a first set of matched frames and a second set of matched frames, further comprising:
  
  generating a fingerprint of the first feed and a fingerprint of the second feed;
  
  comparing the fingerprint of the object of interest with the fingerprint of the first feed to detect a first set of matched frames;
  
  comparing the fingerprint of the object of interest with the fingerprint of the second feed to detect a second set of matched frames;
  
  upon detecting a frame in the first set of matched frames has a higher audio quality relative to a corresponding frame in the second set of matched frames, selecting the frame in the first set of matched frames; and
  
  generating the target audio stream having the object of interest, wherein the target audio stream includes the frame in the first set of matched frames.
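
The two-feed selection in claim 15 can be illustrated as below. The claim does not define "audio quality" or how frames in the two feeds correspond, so this sketch assumes the feeds are time-aligned (a frame corresponds to the frame with the same index) and uses RMS level as a stand-in quality score. The claim only covers the case where the first feed's frame is better; the handling of ties, of frames matched in only one feed, and of a better second-feed frame is an added assumption. `rms` and `select_best_frames` are hypothetical names.

```python
import math

def rms(frame):
    """Root-mean-square level, used here as a placeholder quality score."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0

def select_best_frames(first_matches, second_matches, quality=rms):
    """Pick, per frame index, the matched frame with the higher quality score.

    Both arguments map frame index -> samples for frames that matched the
    object of interest. Returns a list of (index, source, samples) tuples.
    """
    selected = []
    for idx in sorted(set(first_matches) | set(second_matches)):
        a = first_matches.get(idx)
        b = second_matches.get(idx)
        if b is None or (a is not None and quality(a) >= quality(b)):
            selected.append((idx, "first", a))   # first feed wins, including ties
        else:
            selected.append((idx, "second", b))
    return selected

first_set = {0: [0.5, -0.5], 1: [0.1, 0.1]}
second_set = {1: [0.4, -0.4], 2: [0.2, 0.2]}

selection = select_best_frames(first_set, second_set)
target_stream = [s for _, _, samples in selection for s in samples]
```

Passing a different `quality` callable (for example, one based on an estimated signal-to-noise ratio) changes the selection rule without touching the rest of the sketch.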

16. A non-transitory computer-readable storage medium storing instructions configured for composition of audio content to perform a method comprising:
- receiving an input audio feed including one or more objects distributed in multiple frames, wherein an object of interest in the one or more objects is identifiable based on a unique characteristic;
  
  generating a fingerprint of at least a portion of the input audio feed;
  
  retrieving, from a database, a fingerprint of the object of interest;
  
  comparing the fingerprint of at least the portion of the input audio feed with the fingerprint of the object of interest to detect matched frames that include the fingerprint of the object of interest; and
  
  compositing the matched frames to generate a target audio stream having the object of interest; and
  
  wherein the input audio feed includes a first feed and a second feed, wherein the matched frames include a first set of matched frames and a second set of matched frames, further comprising:
  
  generating a fingerprint of the first feed and a fingerprint of the second feed;
  
  comparing the fingerprint of the object of interest with the fingerprint of the first feed to detect a first set of matched frames; and
  
  comparing the fingerprint of the object of interest with the fingerprint of the second feed to detect a second set of matched frames;
  
  upon detecting a frame in the first set of matched frames has a higher audio quality relative to a corresponding frame in the second set of matched frames, selecting the frame in the first set of matched frames; and
  
  generating the target audio stream having the object of interest, wherein the target audio stream includes the frame in the first set of matched frames.

17. An apparatus for composition of audio content comprising:
- a memory;
  
  one or more processors electronically coupled to the memory and configured for:
  
  receiving an input audio feed including one or more objects distributed in multiple frames, wherein an object of interest in the one or more objects is identifiable based on a unique characteristic;
  
  generating a fingerprint of at least a portion of the input audio feed;
  
  retrieving, from a database, a fingerprint of the object of interest;
  
  comparing the fingerprint of at least the portion of the input audio feed with the fingerprint of the object of interest to detect matched frames that include the fingerprint of the object of interest;
  
  compositing the matched frames to generate a target audio stream having the object of interest; and
  
  wherein the input audio feed includes a first feed and a second feed, wherein the matched frames include a first set of matched frames and a second set of matched frames, further comprising:
  
  generating a fingerprint of the first feed and a fingerprint of the second feed;
  
  comparing the fingerprint of the object of interest with the fingerprint of the first feed to detect a first set of matched frames; and
  
  comparing the fingerprint of the object of interest with the fingerprint of the second feed to detect a second set of matched frames;
  
  upon detecting a frame in the first set of matched frames has a higher audio quality relative to a corresponding frame in the second set of matched frames, selecting the frame in the first set of matched frames; and
  
  generating the target audio stream having the object of interest, wherein the target audio stream includes the frame in the first set of matched frames.

Current Assignee: Dish Network Technologies India Private Limited (Echostar Corporation)
Original Assignee: Sling Media Pvt Ltd. (Echostar Corporation)
Inventors: Naik Raikar, Yatish Jayant; Rasool, Mohammed; Pallapothu, Trinadha Harish Babu
Primary Examiner(s): Gauthier, Gerald
Application Number: US15/901,703
Publication Number: US 20190258450A1
Time in Patent Office: 524 Days
Field of Search: 700/94, 704/246, 704/249, 704/500, 707/722, 707/769, 348/515, 84/625, 345/661, 381/22, 381/23, 381/300, 455/450, 705/14.52, 709/232, 715/202, 715/723, 725/1
CPC Class Codes:
G06F 3/165 Management of the audio str...
G10L 25/51 for comparison or discrimin...
