Encoding of video stream based on scene type
First Claim
1. A method for encoding a video stream using scene types each having a predefined set of one or more of a plurality of encoder parameters used by a video encoder to encode any given scene type, the method comprising:
receiving an input video stream;
dividing the input video stream into a plurality of scenes based on scene boundaries, each scene comprising a plurality of temporally contiguous image frames, wherein the dividing comprises determining a given scene boundary according to relatedness of two temporally contiguous image frames in the input video stream, and wherein the determining comprises:
scaling one or more high frequency elements of each image frame;
converting pixel data in the image frames into frequency coefficients;
removing the one or more high frequency elements of each image frame based on the converted frequency coefficients;
analyzing the image frames to determine a difference between temporally contiguous image frames, wherein a score is computed based on the difference; and
identifying a degree of unrelatedness between the image frames when the score exceeds a preset limit, wherein the preset limit score is at a threshold where a scene change occurs;
determining scene type for each of the plurality of scenes; and
encoding each of the plurality of scenes according to the scene type.
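The boundary-detection steps recited above can be illustrated with a short sketch. This is a hypothetical reading of the claim, not the patent's implementation: an orthonormal 2-D DCT stands in for the transform coder, the `keep` band size, the `preset_limit` value, and all function names are invented, and square grayscale frames are assumed for brevity.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis; stands in for the claim's transform coder."""
    k = np.arange(n)
    m = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0] /= np.sqrt(2)
    return m * np.sqrt(2.0 / n)

def frame_signature(frame, keep=8):
    """Convert pixel data into frequency coefficients, then remove the
    high-frequency elements by keeping only a keep x keep low band."""
    m = dct_matrix(frame.shape[0])
    coeffs = m @ frame.astype(float) @ m.T   # 2-D DCT of the frame
    return coeffs[:keep, :keep]              # high frequencies discarded

def scene_change_score(prev, curr, keep=8):
    """Difference between two temporally contiguous frames, scored on
    the retained low-frequency coefficients."""
    diff = frame_signature(prev, keep) - frame_signature(curr, keep)
    return float(np.abs(diff).mean())

def is_scene_boundary(prev, curr, preset_limit=10.0):
    """A score above the preset limit indicates a degree of
    unrelatedness between the frames, i.e. a scene change."""
    return scene_change_score(prev, curr) > preset_limit
```

Discarding high frequencies before differencing makes the score insensitive to noise and fine texture, so only gross changes in frame content push it past the limit.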
Abstract
An encoder for encoding a video stream or an image is described herein. The encoder receives an input video stream and outputs an encoded video stream that can be decoded at a decoder to recover, at least approximately, an instance of the input video stream. The encoder encodes a video stream by first identifying scene boundaries and encoding frames between scene boundaries using a set of parameters. For at least two different scene sequences, different sets of parameters are used, providing adaptive, scene-based encoding.
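The adaptive, scene-based encoding the abstract describes amounts to a dispatch over per-scene-type parameter sets. A minimal sketch, in which the scene-type names and parameter fields are purely illustrative (the patent does not enumerate them):

```python
# Hypothetical parameter sets keyed by scene type; the type labels and
# fields below are invented for illustration, not taken from the patent.
SCENE_TYPE_PARAMS = {
    "high_motion": {"gop_length": 12, "quantizer": 28, "b_frames": 0},
    "dialog":      {"gop_length": 48, "quantizer": 23, "b_frames": 2},
    "static":      {"gop_length": 96, "quantizer": 32, "b_frames": 3},
}

def encode_stream(scenes, classify, encode_scene):
    """Encode each scene with the predefined parameter set for its type,
    so at least two different scene sequences can use different sets."""
    encoded = []
    for scene in scenes:
        params = SCENE_TYPE_PARAMS[classify(scene)]
        encoded.append(encode_scene(scene, params))
    return encoded
```

`classify` and `encode_scene` are placeholders for the scene-type determination and the underlying video encoder; the point is only that the parameter set varies per scene rather than being fixed for the whole stream.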
30 Claims
1. A method for encoding a video stream using scene types each having a predefined set of one or more of a plurality of encoder parameters used by a video encoder to encode any given scene type, the method comprising:
receiving an input video stream;
dividing the input video stream into a plurality of scenes based on scene boundaries, each scene comprising a plurality of temporally contiguous image frames, wherein the dividing comprises determining a given scene boundary according to relatedness of two temporally contiguous image frames in the input video stream, and wherein the determining comprises:
scaling one or more high frequency elements of each image frame;
converting pixel data in the image frames into frequency coefficients;
removing the one or more high frequency elements of each image frame based on the converted frequency coefficients;
analyzing the image frames to determine a difference between temporally contiguous image frames, wherein a score is computed based on the difference; and
identifying a degree of unrelatedness between the image frames when the score exceeds a preset limit, wherein the preset limit score is at a threshold where a scene change occurs;
determining scene type for each of the plurality of scenes; and
encoding each of the plurality of scenes according to the scene type.
View Dependent Claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A method for encoding a video stream using scene types each having a predefined set of one or more of a plurality of encoder parameters used by a video encoder to encode any given scene type, the method comprising:
receiving an input video stream;
receiving scene boundary information that indicates positions in the input video stream where scene transitions occur, wherein a scene transition is determined based on relatedness of two temporally contiguous image frames in the input video stream;
dividing the input video stream into a plurality of scenes based on the scene boundary information, each scene comprising a plurality of temporally contiguous image frames;
determining scene type for each of the plurality of scenes based on an assessment in a predetermined scale computed by performing a sequential decision-making waterfall process; and
encoding each of the plurality of scenes according to the scene type, wherein the performing of the sequential decision-making waterfall process comprises:
determining a position of a given scene on a timeline of the input video stream to assign a score based on a predetermined scale according to the position;
determining a play-time length of the given scene to assign a score based on a predetermined scale according to the play-time length;
determining a motion estimation in the given scene to assign a score based on a predetermined scale according to a magnitude of a motion vector, wherein the motion estimation is a measure of the magnitude of the motion vector;
determining a difference in the given scene from a previous scene to assign a score based on a predetermined scale according to the difference;
determining a spectral data size of the given scene to assign a score based on a predetermined scale according to the spectral data size;
identifying facial structures utilizing facial recognition to assign a score based on a predetermined scale according to a number of the facial structures;
identifying textual information using optical character recognition in the given scene to assign a score based on a predetermined scale according to an amount of content of the textual information; and
determining a level of audience interest from screenplay structure information of the given scene to assign a score based on a predetermined scale according to the level of audience interest.
View Dependent Claims: 11, 12, 13, 14, 15
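The sequential decision-making waterfall of claim 10 can be read as a fixed pipeline of scoring steps, each mapping one scene attribute onto a predetermined scale, with the accumulated assessment determining the scene type. A sketch under loud assumptions: the attribute keys, the 0-2 scales, every threshold, and the scene-type labels are all invented for illustration.

```python
# One scoring step per attribute recited in the claim; each assigns a
# score on a small predetermined scale (0-2 here, chosen arbitrarily).
WATERFALL_STEPS = [
    ("position",      lambda s: 2 if s["position"] < 0.1 else 0),        # timeline position
    ("play_time",     lambda s: 1 if s["length_s"] > 30 else 0),         # play-time length
    ("motion",        lambda s: 2 if s["motion_vector_mag"] > 5.0 else 0),  # motion-vector magnitude
    ("scene_diff",    lambda s: 1 if s["diff_from_prev"] > 0.5 else 0),  # difference from previous scene
    ("spectral_size", lambda s: 1 if s["spectral_kb"] > 100 else 0),     # spectral data size
    ("faces",         lambda s: min(s["num_faces"], 2)),                 # facial recognition count
    ("ocr_text",      lambda s: 1 if s["ocr_chars"] > 20 else 0),        # OCR text amount
    ("interest",      lambda s: s["audience_interest"]),                 # 0-2, from screenplay info
]

def waterfall_assess(scene):
    """Run the steps in sequence; return per-step scores and the total."""
    scores = {name: scorer(scene) for name, scorer in WATERFALL_STEPS}
    return scores, sum(scores.values())

def scene_type_from_total(total):
    """Map the assessed total onto a scene type (labels are invented)."""
    if total >= 8:
        return "high_interest"
    if total >= 4:
        return "dialog"
    return "static"
```

Each step is independent of the others' outputs; the "waterfall" is the fixed order in which the attributes are examined and their scores accumulated before the final type decision.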
16. A video encoding apparatus for encoding a video stream using scene types each having a predefined set of one or more of a plurality of encoder parameters used by the video encoder to encode any given scene type, the apparatus comprising:
an input module for receiving an input video stream;
a video processing module to divide the video stream into a plurality of scenes based on scene boundaries, each scene comprising a plurality of temporally contiguous image frames, wherein the video processing module determines a given scene boundary according to the relatedness of two temporally contiguous image frames in the input video stream, wherein said determining comprises:
scaling one or more high frequency elements of each image frame, wherein a transform coder converts pixel data in the image frames into frequency coefficients,
removing the one or more high frequency elements of each image frame based on the converted frequency coefficients,
analyzing the image frames to determine a difference between temporally contiguous image frames, wherein a score is computed based on the difference, and
identifying a degree of unrelatedness between the image frames when the score exceeds a preset limit, wherein the preset limit score is at a threshold where a scene change occurs;
the video processing module to determine a scene type for each of the plurality of scenes; and
a video encoding module to encode each of the plurality of scenes according to the scene type.
View Dependent Claims: 17, 18, 19, 20, 21, 22, 23, 24
25. A video encoding apparatus for encoding a video stream using scene types each having a predefined set of one or more of a plurality of encoder parameters used by the video encoder to encode any given scene type, the apparatus comprising:
receiving means for receiving an input video stream;
dividing means for dividing the input video stream into a plurality of scenes based on scene boundaries, each scene comprising a plurality of temporally contiguous image frames, wherein the dividing means determines a given scene boundary according to the relatedness of two temporally contiguous image frames in the input video stream;
determining means for determining a scene type for each of the plurality of scenes based on an assessment in a predetermined scale computed by performing a sequential decision-making waterfall process, wherein the performing of the sequential decision-making waterfall process comprises:
determining a position of a given scene on a timeline of the input video stream to assign a score based on a predetermined scale according to the position;
determining a play-time length of the given scene to assign a score based on a predetermined scale according to the play-time length;
determining a motion estimation in the given scene to assign a score based on a predetermined scale according to a magnitude of a motion vector, wherein the motion estimation is a measure of the magnitude of the motion vector;
determining a difference in the given scene from a previous scene to assign a score based on a predetermined scale according to the difference;
determining a spectral data size of the given scene to assign a score based on a predetermined scale according to the spectral data size;
identifying facial structures utilizing facial recognition to assign a score based on a predetermined scale according to a number of the facial structures;
identifying textual information using optical character recognition in the given scene to assign a score based on a predetermined scale according to an amount of content of the textual information; and
determining a level of audience interest from screenplay structure information of the given scene to assign a score based on a predetermined scale according to the level of audience interest; and
encoding means for encoding each of the plurality of scenes based on the given scene's previously determined encoder parameters that were determined according to the scene type associated with each of the plurality of scenes.
26. A method for encoding a video stream using scene types each having a predefined set of one or more of a plurality of encoder parameters used by a video encoder to encode any given scene type, the method comprising:
receiving an input video stream;
dividing the input video stream into a plurality of scenes based on scene boundaries, each scene comprising a plurality of temporally contiguous image frames, wherein a given scene boundary is determined according to screenplay structure information of the input video stream, wherein the screenplay structure information includes information of story line organization of the input video stream;
determining scene type for each of the plurality of scenes based on an assessment in a predetermined scale computed by performing a sequential decision-making waterfall process; and
encoding each of the plurality of scenes according to the scene type, wherein the performing of the sequential decision-making waterfall process comprises:
determining a position of a given scene on a timeline of the input video stream to assign a score based on a predetermined scale according to the position;
determining a play-time length of the given scene to assign a score based on a predetermined scale according to the play-time length;
determining a motion estimation in the given scene to assign a score based on a predetermined scale according to a magnitude of a motion vector, wherein the motion estimation is a measure of the magnitude of the motion vector;
determining a difference in the given scene from a previous scene to assign a score based on a predetermined scale according to the difference;
determining a spectral data size of the given scene to assign a score based on a predetermined scale according to the spectral data size;
identifying facial structures utilizing facial recognition to assign a score based on a predetermined scale according to a number of the facial structures;
identifying textual information using optical character recognition in the given scene to assign a score based on a predetermined scale according to an amount of content of the textual information; and
determining a level of audience interest from screenplay structure information of the given scene to assign a score based on a predetermined scale according to the level of audience interest.
View Dependent Claims: 27, 28, 29, 30
Specification