Encoding of video stream based on scene type
First Claim
1. A method for encoding a video stream using scene types each having a predefined set of one or more of a plurality of encoder parameters used by a video encoder to encode any given scene type, the method comprising:
receiving an input video stream;
dividing the input video stream into a plurality of scenes based on scene boundaries, each scene comprising a plurality of temporally contiguous image frames, wherein the dividing comprises determining a given scene boundary according to relatedness of two temporally contiguous image frames in the input video stream, and wherein the determining comprises:
scaling one or more high frequency elements of each image frame;
converting pixel data in the image frames into frequency coefficients;
removing the one or more high frequency elements of each image frame based on the converted frequency coefficients;
analyzing the image frames to determine a difference between temporally contiguous image frames, wherein a score is computed based on the difference; and
identifying a degree of unrelatedness between the image frames when the score exceeds a preset limit, wherein the preset limit score is at a threshold where a scene change occurs;
determining scene type for each of the plurality of scenes; and
encoding each of the plurality of scenes according to the scene type.
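The boundary-detection steps recited above can be illustrated with a short sketch. This is a hypothetical reading of the claim, not the patent's implementation: an orthonormal 2-D DCT stands in for the transform coder, the `keep` band size, the `preset_limit` value, and all function names are invented, and square grayscale frames are assumed for brevity.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis; stands in for the claim's transform coder."""
    k = np.arange(n)
    m = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0] /= np.sqrt(2)
    return m * np.sqrt(2.0 / n)

def frame_signature(frame, keep=8):
    """Convert pixel data into frequency coefficients, then remove the
    high-frequency elements by keeping only a keep x keep low band."""
    m = dct_matrix(frame.shape[0])
    coeffs = m @ frame.astype(float) @ m.T   # 2-D DCT of the frame
    return coeffs[:keep, :keep]              # high frequencies discarded

def scene_change_score(prev, curr, keep=8):
    """Difference between two temporally contiguous frames, scored on
    the retained low-frequency coefficients."""
    diff = frame_signature(prev, keep) - frame_signature(curr, keep)
    return float(np.abs(diff).mean())

def is_scene_boundary(prev, curr, preset_limit=10.0):
    """A score above the preset limit indicates a degree of
    unrelatedness between the frames, i.e. a scene change."""
    return scene_change_score(prev, curr) > preset_limit
```

Discarding high frequencies before differencing makes the score insensitive to noise and fine texture, so only gross changes in frame content push it past the limit.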
Abstract
An encoder for encoding a video stream or an image is described herein. The encoder receives an input video stream and outputs an encoded video stream that can be decoded at a decoder to recover, at least approximately, an instance of the input video stream. The encoder encodes a video stream by first identifying scene boundaries and encoding frames between scene boundaries using a set of parameters. For at least two different scene sequences, different sets of parameters are used, providing adaptive, scene-based encoding.
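The adaptive, scene-based encoding the abstract describes amounts to a dispatch over per-scene-type parameter sets. A minimal sketch, in which the scene-type names and parameter fields are purely illustrative (the patent does not enumerate them):

```python
# Hypothetical parameter sets keyed by scene type; the type labels and
# fields below are invented for illustration, not taken from the patent.
SCENE_TYPE_PARAMS = {
    "high_motion": {"gop_length": 12, "quantizer": 28, "b_frames": 0},
    "dialog":      {"gop_length": 48, "quantizer": 23, "b_frames": 2},
    "static":      {"gop_length": 96, "quantizer": 32, "b_frames": 3},
}

def encode_stream(scenes, classify, encode_scene):
    """Encode each scene with the predefined parameter set for its type,
    so at least two different scene sequences can use different sets."""
    encoded = []
    for scene in scenes:
        params = SCENE_TYPE_PARAMS[classify(scene)]
        encoded.append(encode_scene(scene, params))
    return encoded
```

`classify` and `encode_scene` are placeholders for the scene-type determination and the underlying video encoder; the point is only that the parameter set varies per scene rather than being fixed for the whole stream.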
30 Claims
1. A method for encoding a video stream using scene types each having a predefined set of one or more of a plurality of encoder parameters used by a video encoder to encode any given scene type, the method comprising:
receiving an input video stream;
dividing the input video stream into a plurality of scenes based on scene boundaries, each scene comprising a plurality of temporally contiguous image frames, wherein the dividing comprises determining a given scene boundary according to relatedness of two temporally contiguous image frames in the input video stream, and wherein the determining comprises:
scaling one or more high frequency elements of each image frame;
converting pixel data in the image frames into frequency coefficients;
removing the one or more high frequency elements of each image frame based on the converted frequency coefficients;
analyzing the image frames to determine a difference between temporally contiguous image frames, wherein a score is computed based on the difference; and
identifying a degree of unrelatedness between the image frames when the score exceeds a preset limit, wherein the preset limit score is at a threshold where a scene change occurs;
determining scene type for each of the plurality of scenes; and
encoding each of the plurality of scenes according to the scene type.
View Dependent Claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A method for encoding a video stream using scene types each having a predefined set of one or more of a plurality of encoder parameters used by a video encoder to encode any given scene type, the method comprising:
receiving an input video stream;
receiving scene boundary information that indicates positions in the input video stream where scene transitions occur, wherein a scene transition is determined based on relatedness of two temporally contiguous image frames in the input video stream;
dividing the input video stream into a plurality of scenes based on the scene boundary information, each scene comprising a plurality of temporally contiguous image frames;
determining scene type for each of the plurality of scenes based on an assessment in a predetermined scale computed by performing a sequential decision-making waterfall process; and
encoding each of the plurality of scenes according to the scene type, wherein the performing of the sequential decision-making waterfall process comprises:
determining a position of a given scene on a timeline of the input video stream to assign a score based on a predetermined scale according to the position;
determining a play-time length of the given scene to assign a score based on a predetermined scale according to the play-time length;
determining a motion estimation in the given scene to assign a score based on a predetermined scale according to a magnitude of a motion vector, wherein the motion estimation is a measure of the magnitude of the motion vector;
determining a difference in the given scene from a previous scene to assign a score based on a predetermined scale according to the difference;
determining a spectral data size of the given scene to assign a score based on a predetermined scale according to the spectral data size;
identifying facial structures utilizing facial recognition to assign a score based on a predetermined scale according to a number of the facial structures;
identifying textual information using optical character recognition in the given scene to assign a score based on a predetermined scale according to an amount of content of the textual information; and
determining a level of audience interest from screenplay structure information of the given scene to assign a score based on a predetermined scale according to the level of audience interest.
View Dependent Claims: 11, 12, 13, 14, 15
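The sequential decision-making waterfall of claim 10 can be read as a fixed pipeline of scoring steps, each mapping one scene attribute onto a predetermined scale, with the accumulated assessment determining the scene type. A sketch under loud assumptions: the attribute keys, the 0-2 scales, every threshold, and the scene-type labels are all invented for illustration.

```python
# One scoring step per attribute recited in the claim; each assigns a
# score on a small predetermined scale (0-2 here, chosen arbitrarily).
WATERFALL_STEPS = [
    ("position",      lambda s: 2 if s["position"] < 0.1 else 0),        # timeline position
    ("play_time",     lambda s: 1 if s["length_s"] > 30 else 0),         # play-time length
    ("motion",        lambda s: 2 if s["motion_vector_mag"] > 5.0 else 0),  # motion-vector magnitude
    ("scene_diff",    lambda s: 1 if s["diff_from_prev"] > 0.5 else 0),  # difference from previous scene
    ("spectral_size", lambda s: 1 if s["spectral_kb"] > 100 else 0),     # spectral data size
    ("faces",         lambda s: min(s["num_faces"], 2)),                 # facial recognition count
    ("ocr_text",      lambda s: 1 if s["ocr_chars"] > 20 else 0),        # OCR text amount
    ("interest",      lambda s: s["audience_interest"]),                 # 0-2, from screenplay info
]

def waterfall_assess(scene):
    """Run the steps in sequence; return per-step scores and the total."""
    scores = {name: scorer(scene) for name, scorer in WATERFALL_STEPS}
    return scores, sum(scores.values())

def scene_type_from_total(total):
    """Map the assessed total onto a scene type (labels are invented)."""
    if total >= 8:
        return "high_interest"
    if total >= 4:
        return "dialog"
    return "static"
```

Each step is independent of the others' outputs; the "waterfall" is the fixed order in which the attributes are examined and their scores accumulated before the final type decision.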
16. A video encoding apparatus for encoding a video stream using scene types each having a predefined set of one or more of a plurality of encoder parameters used by the video encoder to encode any given scene type, the apparatus comprising:
an input module for receiving an input video stream;
a video processing module to divide the video stream into a plurality of scenes based on scene boundaries, each scene comprising a plurality of temporally contiguous image frames, wherein the video processing module determines a given scene boundary according to the relatedness of two temporally contiguous image frames in the input video stream, wherein said determining comprises:
scaling one or more high frequency elements of each image frame, wherein a transform coder converts pixel data in the image frames into frequency coefficients,
removing the one or more high frequency elements of each image frame based on the converted frequency coefficients,
analyzing the image frames to determine a difference between temporally contiguous image frames, wherein a score is computed based on the difference, and
identifying a degree of unrelatedness between the image frames when the score exceeds a preset limit, wherein the preset limit score is at a threshold where a scene change occurs;
the video processing module to determine a scene type for each of the plurality of scenes; and
a video encoding module to encode each of the plurality of scenes according to the scene type.
View Dependent Claims: 17, 18, 19, 20, 21, 22, 23, 24
25. A video encoding apparatus for encoding a video stream using scene types each having a predefined set of one or more of a plurality of encoder parameters used by the video encoder to encode any given scene type, the apparatus comprising:
receiving means for receiving an input video stream;
dividing means for dividing the input video stream into a plurality of scenes based on scene boundaries, each scene comprising a plurality of temporally contiguous image frames, wherein the dividing means determines a given scene boundary according to the relatedness of two temporally contiguous image frames in the input video stream;
determining means for determining a scene type for each of the plurality of scenes based on an assessment in a predetermined scale computed by performing a sequential decision-making waterfall process, wherein the performing of the sequential decision-making waterfall process comprises:
determining a position of a given scene on a timeline of the input video stream to assign a score based on a predetermined scale according to the position;
determining a play-time length of the given scene to assign a score based on a predetermined scale according to the play-time length;
determining a motion estimation in the given scene to assign a score based on a predetermined scale according to a magnitude of a motion vector, wherein the motion estimation is a measure of the magnitude of the motion vector;
determining a difference in the given scene from a previous scene to assign a score based on a predetermined scale according to the difference;
determining a spectral data size of the given scene to assign a score based on a predetermined scale according to the spectral data size;
identifying facial structures utilizing facial recognition to assign a score based on a predetermined scale according to a number of the facial structures;
identifying textual information using optical character recognition in the given scene to assign a score based on a predetermined scale according to an amount of content of the textual information; and
determining a level of audience interest from screenplay structure information of the given scene to assign a score based on a predetermined scale according to the level of audience interest; and
encoding means for encoding each of the plurality of scenes based on the given scene's previously determined encoder parameters that were determined according to the scene type associated with each of the plurality of scenes.
26. A method for encoding a video stream using scene types each having a predefined set of one or more of a plurality of encoder parameters used by a video encoder to encode any given scene type, the method comprising:
receiving an input video stream;
dividing the input video stream into a plurality of scenes based on scene boundaries, each scene comprising a plurality of temporally contiguous image frames, wherein a given scene boundary is determined according to screenplay structure information of the input video stream, wherein the screenplay structure information includes information of story line organization of the input video stream;
determining scene type for each of the plurality of scenes based on an assessment in a predetermined scale computed by performing a sequential decision-making waterfall process; and
encoding each of the plurality of scenes according to the scene type, wherein the performing of the sequential decision-making waterfall process comprises:
determining a position of a given scene on a timeline of the input video stream to assign a score based on a predetermined scale according to the position;
determining a play-time length of the given scene to assign a score based on a predetermined scale according to the play-time length;
determining a motion estimation in the given scene to assign a score based on a predetermined scale according to a magnitude of a motion vector, wherein the motion estimation is a measure of the magnitude of the motion vector;
determining a difference in the given scene from a previous scene to assign a score based on a predetermined scale according to the difference;
determining a spectral data size of the given scene to assign a score based on a predetermined scale according to the spectral data size;
identifying facial structures utilizing facial recognition to assign a score based on a predetermined scale according to a number of the facial structures;
identifying textual information using optical character recognition in the given scene to assign a score based on a predetermined scale according to an amount of content of the textual information; and
determining a level of audience interest from screenplay structure information of the given scene to assign a score based on a predetermined scale according to the level of audience interest.
View Dependent Claims: 27, 28, 29, 30
Specification