Eyes free entertainment

US 10,225,621 B1
Filed: 12/20/2017
Issued: 03/05/2019
Est. Priority Date: 12/20/2017
Status: Active Grant

Abstract

Disclosed herein are systems and methods for converting audio-video content into audio-only content. Audio-video content is readily accessible, but for various reasons users often cannot consume content visually. In those circumstances, for example when a user is interrupted during a movie to drive to pick up a spouse or child, the user may not want to forgo consuming the audio-video content. The audio-video content can be converted into audio-only content for the user to consume aurally, allowing the user to consume the content despite interruptions or other circumstances in which it cannot be consumed visually.

20 Claims

1. A method for converting audio-video content into audio-only content, the method comprising:
- decomposing, by a computer system, the audio-video content into a plurality of frames and a sound component;
  
  creating, by the computer system, an object layer, the creating the object layer comprising:
  
  for each frame in the plurality of frames:
  
  decomposing, by the computer system, the frame into one or more visual objects in the frame, and
  
  generating, by the computer system, a description of each of the one or more visual objects in the frame to create a plurality of object descriptions;
  
  generating, by the computer system, an object layer audio component based on the plurality of object descriptions;
  
  creating, by the computer system, a sound layer, the creating the sound layer comprising generating a sound layer audio component from the sound component;
  
  creating, by the computer system, a motion layer, the creating the motion layer comprising:
  
  analyzing, by the computer system, each frame in the plurality of frames to identify motion between consecutive frames, and
  
  generating, by the computer system, a motion layer audio component based on a description of the motion between consecutive frames;
  
  generating, by the computer system, an audio only output of the audio-video content based on the object layer audio component, the sound layer audio component, and the motion layer audio component; and
  
  transmitting, by the computer system, the audio only output to a device of a user.
  - 2. The method for converting audio-video content into audio-only content of claim 1, wherein:
    - creating the object layer further comprises associating source time codes with the object layer audio component;
      
      creating the sound layer further comprises associating the source time codes with the sound layer audio component;
      
      creating the motion layer further comprises associating the source time codes with the motion layer audio component; and
      
      generating the audio only output is further based on the source time codes.
  - 3. The method for converting audio-video content into audio-only content of claim 1, wherein creating the object layer further comprises:
    - identifying, by the computer system, the one or more visual objects using a catalog of object templates.
  - 4. The method for converting audio-video content into audio-only content of claim 1, wherein:
    - creating the object layer further comprises, for each frame of the plurality of frames:
      
      analyzing, by the computer system, the acuteness of each of the one or more visual objects in the frame,
      
      assigning, by the computer system, a relative acuteness to each of the one or more visual objects in the frame, and
      
      identifying, by the computer system, focal point objects based on the relative acuteness of each of the one or more visual objects in the frame; and
      
      generating a description of each of the one or more visual objects in the frame to create a plurality of object descriptions comprises generating, by the computer system, a more detailed description of focal point objects than of background objects.
  - 5. The method for converting audio-video content into audio-only content of claim 1, wherein generating a description of each of the one or more visual objects in the frame to create a plurality of object descriptions comprises:
    - for each of the one or more visual objects in the frame:
      
      identify, by the computer system, a type of the visual object,
      
      select, by the computer system based on the type of the visual object, an analysis template that defines attributes of the visual object, and
      
      identify and use, by the computer system, attribute values for the attributes of the visual object to describe the visual object.
  - 6. The method for converting audio-video content into audio-only content of claim 1, wherein generating an object layer audio component based on the plurality of object descriptions comprises:
    - grouping, by the computer system, the plurality of frames into a plurality of scenes;
      
      converting, by the computer system for each scene, the object descriptions associated with the scene into an audio only message to generate a plurality of audio only messages; and
      
      wherein the object layer audio component is generated using the plurality of audio only messages.
  - 7. The method for converting audio-video content into audio-only content of claim 1, wherein generating a sound layer audio component from the sound component comprises:
    - detecting, by the computer system, a language of speech in the sound component;
      
      converting, by the computer system, the speech to text in the language;
      
      assigning, by the computer system, attributes to each word in the text;
      
      assigning, by the computer system, an emotion to each word based on comparing the attributes for each word to language specific audio templates;
      
      converting, by the computer system, the text to a second language text; and
      
      generating, by the computer system, the sound layer audio component from the second language text based on the emotion assigned to each word.
  - 8. The method for converting audio-video content into audio-only content of claim 1, wherein analyzing each frame in the plurality of frames to identify motion between consecutive frames comprises:
    - for each of the one or more visual objects within each frame:
      
      locating, by the computer system, the visual object within a first frame;
      
      locating, by the computer system, the visual object within a consecutive frame;
      
      comparing, by the computer system, the location of the visual object within the first frame with the location of the visual object within the consecutive frame;
      
      identifying, by the computer system, motion of the visual object based on the comparing the location of the visual object operation; and
      
      determining, by the computer system, a type of the motion based on background objects in the first frame and the consecutive frame.
  - 9. The method for converting audio-video content into audio-only content of claim 8, wherein the type of the motion is one of object motion, camera motion, and editing motion.
  - 10. The method for converting audio-video content into audio-only content of claim 8, wherein analyzing each frame in the plurality of frames to identify motion between consecutive frames further comprises:
    - grouping, by the computer system, the plurality of frames into a plurality of scenes; and
      
      generating, by the computer system, an audio description of motion in each scene based on motion of each visual object within the scene.
  - 11. The method for converting audio-video content into audio-only content of claim 1, wherein generating an audio only output of the audio-video content based on the object layer audio component, the sound layer audio component, and the motion layer audio component comprises:
    - grouping, by the computer system, the plurality of frames into a plurality of scenes; and
      
      formatting, by the computer system for each scene of the plurality of scenes, the associated portion of the object layer audio component, the associated portion of the sound layer audio component, and the associated portion of the motion layer audio component using a smart language algorithm and natural language processing into human-understandable sentences.
  - 12. The method for converting audio-video content into audio-only content of claim 1, further comprising:
    - detecting, by the device of the user, that the user is in motion for a threshold period of time or a threshold distance;
      
      requesting, by the device of the user, an audio-video content conversion to audio only content from the computer system; and
      
      upon receiving the audio only content from the computer system, playing the audio only content.
  - 20. The system for converting audio-video content into audio-only content of claim 12, wherein the instructions that cause the processor to generate an audio only output of the audio-video content based on the object layer audio component, the sound layer audio component, and the motion layer audio component comprises instructions that cause the processor to:
    - group the plurality of frames into a plurality of scenes; and
      
      format, for each scene of the plurality of scenes, the associated portion of the object layer audio component, the associated portion of the sound layer audio component, and the associated portion of the motion layer audio component using a smart language algorithm and natural language processing into human-understandable sentences.
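
Claims 1 and 2 together describe three audio layers (object, sound, motion) that each carry source time codes and are combined into one audio-only output. The following is a minimal Python sketch of one plausible reading, in which segments from all three layers are interleaved in time-code order. The `Segment` type, the `merge_layers` function, the use of seconds as the time-code unit, and the sample text are all hypothetical illustrations; the claims do not specify a data model or a merge rule.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Segment:
    """One piece of layer output tagged with its source time code (hypothetical model)."""
    start: float  # source time code, assumed here to be seconds from the start
    layer: str    # "object", "sound", or "motion"
    text: str     # stand-in for the audio payload of this segment


def merge_layers(*layers: Iterable[Segment]) -> List[Segment]:
    """Interleave the segments of every layer in source-time-code order."""
    merged = [seg for layer in layers for seg in layer]
    # list.sort is stable, so segments that share a time code keep the
    # order in which their layers were passed in.
    merged.sort(key=lambda seg: seg.start)
    return merged


object_layer = [
    Segment(0.0, "object", "A red car is parked on a street."),
    Segment(4.0, "object", "A woman stands beside the car."),
]
sound_layer = [Segment(2.5, "sound", "Dialogue: 'Are you ready?'")]
motion_layer = [
    Segment(1.0, "motion", "The camera pans left."),
    Segment(4.0, "motion", "The woman walks toward the car."),
]

script = merge_layers(object_layer, sound_layer, motion_layer)
for seg in script:
    print(f"{seg.start:5.1f}s [{seg.layer}] {seg.text}")
```

Sorting on the time code alone keeps the sketch short; a real system would also have to decide how to handle segments that overlap in time, which the claims leave open.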

13. A system for converting audio-video content into audio-only content, the system comprising:
- a processor; and
  
  a memory having stored thereon instructions that, when executed by the processor, cause the processor to:
  
  decompose the audio-video content into a plurality of frames and a sound component;
  
  create an object layer, the create the object layer comprising:
  
  for each frame in the plurality of frames:
  
  decompose the frame into one or more visual objects in the frame, and
  
  generate a description of each of the one or more visual objects in the frame to create a plurality of object descriptions;
  
  generate an object layer audio component based on the plurality of object descriptions;
  
  create a sound layer, the create the sound layer comprising generating a sound layer audio component from the sound component;
  
  create a motion layer, the create the motion layer comprising:
  
  analyze each frame in the plurality of frames to identify motion between consecutive frames, and
  
  generate a motion layer audio component based on a description of the motion between consecutive frames;
  
  generate an audio only output of the audio-video content based on the object layer audio component, the sound layer audio component, and the motion layer audio component; and
  
  transmit the audio only output to a device of a user.
  - 14. The system for converting audio-video content into audio-only content of claim 13, wherein:
    - the instructions that cause the processor to create the object layer further comprises instructions that cause the processor to associate source time codes with the object layer audio component;
      
      the instructions that cause the processor to create the sound layer further comprises instructions that cause the processor to associate the source time codes with the sound layer audio component;
      
      the instructions that cause the processor to create the motion layer further comprises instructions that cause the processor to associate the source time codes with the motion layer audio component; and
      
      the generate the audio only output is further based on the source time codes.
  - 15. The system for converting audio-video content into audio-only content of claim 13, wherein:
    - the instructions that cause the processor to create the object layer further comprises instructions that cause the processor to, for each frame of the plurality of frames:
      
      analyze the acuteness of each of the one or more visual objects in the frame,
      
      assign a relative acuteness to each of the one or more visual objects in the frame, and
      
      identify focal point objects based on the relative acuteness of each of the one or more visual objects in the frame; and
      
      the instructions that cause the processor to generate a description of each of the one or more visual objects in the frame to create a plurality of object descriptions comprises instructions that cause the processor to generate a more detailed description of focal point objects than of background objects.
  - 16. The system for converting audio-video content into audio-only content of claim 13, wherein the instructions that cause the processor to generate a description of each of the one or more visual objects in the frame to create a plurality of object descriptions comprises instructions that cause the processor to:
    - for each of the one or more visual objects in the frame:
      
      identify a type of the visual object,
      
      select, based on the type of the visual object, an analysis template that defines attributes of the visual object, and
      
      identify and use attribute values for the attributes of the visual object to describe the visual object.
  - 17. The system for converting audio-video content into audio-only content of claim 13, wherein the instructions that cause the processor to generate an object layer audio component based on the plurality of object descriptions comprises instructions that cause the processor to:
    - group the plurality of frames into a plurality of scenes;
      
      convert, for each scene, the object descriptions associated with the scene into an audio only message to generate a plurality of audio only messages; and
      
      wherein the object layer audio component is generated using the plurality of audio only messages.
  - 18. The system for converting audio-video content into audio-only content of claim 13, wherein the instructions that cause the processor to generate a sound layer audio component from the sound component comprises instructions that cause the processor to:
    - detect a language of speech in the sound component;
      
      convert the speech to text in the language;
      
      assign attributes to each word in the text;
      
      assign an emotion to each word based on comparing the attributes for each word to language specific audio templates;
      
      convert the text to a second language text; and
      
      generate the sound layer audio component from the second language text based on the emotion assigned to each word.
  - 19. The system for converting audio-video content into audio-only content of claim 13, wherein the instructions that cause the processor to analyze each frame in the plurality of frames to identify motion between consecutive frames comprises instructions that cause the processor to:
    - for each of the one or more visual objects within each frame:
      
      locate the visual object within a first frame;
      
      locate the visual object within a consecutive frame;
      
      compare the location of the visual object within the first frame with the location of the visual object within the consecutive frame;
      
      identify motion of the visual object based on the comparing the location of the visual object operation; and
      
      determine a type of the motion based on background objects in the first frame and the consecutive frame;
      
      group the plurality of frames into a plurality of scenes; and
      
      generate an audio description of motion in each scene based on motion of each object within the scene.
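
Claims 8, 9, and 19 describe locating a visual object in two consecutive frames, comparing the locations, and using background objects to decide whether the motion is object motion, camera motion, or editing motion. The Python sketch below illustrates one possible heuristic for that decision. The frame representation (a dictionary of object identifiers to centre coordinates), the `classify_motion` function, the `tol` threshold, and the decision rule itself are assumptions made for illustration; the claims name the three motion types but do not say how they are distinguished.

```python
from typing import Dict, Iterable, Optional, Tuple

Point = Tuple[float, float]
Frame = Dict[str, Point]  # object id -> (x, y) centre; hypothetical representation


def displacement(a: Point, b: Point) -> Point:
    """Return the (dx, dy) shift from location a to location b."""
    return (b[0] - a[0], b[1] - a[1])


def classify_motion(obj_id: str, first: Frame, second: Frame,
                    background_ids: Iterable[str], tol: float = 1.0) -> Optional[str]:
    """Classify the motion of obj_id between two consecutive frames.

    Returns None when the object is missing from either frame or has not
    moved by more than tol. The rule used here is an assumed heuristic.
    """
    if obj_id not in first or obj_id not in second:
        return None
    dx, dy = displacement(first[obj_id], second[obj_id])
    if abs(dx) <= tol and abs(dy) <= tol:
        return None  # locations match within tolerance: no motion
    shared = [b for b in background_ids if b in first and b in second]
    if not shared:
        # No background object survives into the next frame: treat as a cut.
        return "editing motion"
    shifts = [displacement(first[b], second[b]) for b in shared]
    background_moved = all(abs(sx) > tol or abs(sy) > tol for sx, sy in shifts)
    # If the whole background shifted, the camera moved; otherwise the object did.
    return "camera motion" if background_moved else "object motion"


first = {"car": (10.0, 10.0), "tree": (50.0, 20.0), "house": (80.0, 20.0)}
background = ["tree", "house"]

moved_car = {"car": (20.0, 10.0), "tree": (50.0, 20.0), "house": (80.0, 20.0)}
panned = {"car": (20.0, 10.0), "tree": (60.0, 20.0), "house": (90.0, 20.0)}
cut = {"car": (40.0, 30.0), "lamp": (5.0, 5.0)}

print(classify_motion("car", first, moved_car, background))
print(classify_motion("car", first, panned, background))
print(classify_motion("car", first, cut, background))
```

Under these assumptions a stationary background implies the object itself moved, a uniformly shifted background implies the camera moved, and a wholly replaced background implies an edit between shots.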

Current Assignee: Dish Network LLC (Echostar Corporation)
Original Assignee: Dish Network LLC (Echostar Corporation)
Inventors: Newell, Nicholas B.; Kodam, Sheshank
Primary Examiner(s): Chevalier, Robert
Application Number: US15/849,431
Time in Patent Office: 440 Days
Field of Search: 386285, 386278, 386239, 386248, 386321, 386337, 386338

CPC Class Codes

G06F 40/30   Semantic analysis

G06F 40/40   Processing or translation o...

G10L 15/26   Speech to text systems G10L...

H04N 21/2335   involving reformatting oper...

H04N 21/23412   for generating or manipulat...

H04N 21/23418   involving operations for an...

H04N 21/234336   by media transcoding, e.g. ...

H04N 21/2353   specifically adapted to con...

H04N 21/43074   of additional data with con...

H04N 21/43078   for seamlessly watching con...

H04N 21/8106   involving special audio dat...

H04N 21/84   Generation or processing of...

H04N 21/8547   involving timestamps for sy...
