ENCODING BLOCKS IN VIDEO FRAMES CONTAINING TEXT USING HISTOGRAMS OF GRADIENTS

US 20160014421A1
Filed: 07/14/2014
Published: 01/14/2016
Est. Priority Date: 07/14/2014
Status: Active Grant

Abstract

A block input component of a video encoding pipeline may, for a block of pixels in a video frame, compute gradients in multiple directions, and may accumulate counts of the computed gradients in one or more histograms. The block input component may analyze the histogram(s) to compute block-level statistics and determine whether a dominant gradient direction exists in the block, indicating the likelihood that it represents an image containing text. If text is likely, various encoding parameter values may be selected to improve the quality of encoding for the block (e.g., by lowering a quantization parameter value). The computed statistics or selected encoding parameter values may be passed to other stages of the pipeline, and used to bias or control selection of a prediction mode, an encoding mode, or a motion vector. Frame-level or slice-level parameter values may be generated from gradient histograms of multiple blocks.
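
The flow summarized in the abstract can be illustrated with a minimal sketch. It is not taken from the patent: the function names, the 8-bin angle histogram, the `min_mag`, `share`, and `min_count` thresholds, and the QP delta of -4 are all assumptions chosen for illustration.

```python
import math

def gradient_angle_histogram(block, bins=8, min_mag=16):
    """Count unsigned gradient directions (folded into 0..pi) in a 2-D luma block."""
    hist = [0] * bins
    for y in range(len(block) - 1):
        for x in range(len(block[0]) - 1):
            gx = block[y][x + 1] - block[y][x]  # horizontal difference
            gy = block[y + 1][x] - block[y][x]  # vertical difference
            if abs(gx) + abs(gy) < min_mag:     # ignore near-flat pixels (assumed threshold)
                continue
            angle = math.atan2(gy, gx)
            if angle < 0:                       # fold opposite directions together
                angle += math.pi
            if angle >= math.pi:
                angle -= math.pi
            hist[min(int(angle / math.pi * bins), bins - 1)] += 1
    return hist

def has_dominant_direction(hist, share=0.5, min_count=8):
    """True when one bin holds at least `share` of all counted gradients."""
    total = sum(hist)
    return total >= min_count and max(hist) >= share * total

def choose_qp(block, base_qp=30, text_qp_delta=-4):
    """Lower the quantization parameter when the block is judged likely to contain text."""
    hist = gradient_angle_histogram(block)
    return base_qp + text_qp_delta if has_dominant_direction(hist) else base_qp
```

For example, a block of vertical stripes puts every counted gradient in a single bin and receives the lowered QP, while a flat block keeps the base QP. A real encoder would more plausibly derive these thresholds from training data or tuning.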

20 Claims

1. An apparatus, comprising:
- a block processing pipeline configured to process blocks of pixels from video frames;
  
  wherein the block processing pipeline comprises a block input component;
  
  wherein, for each of a plurality of blocks of pixels from a video frame, the block input component is configured to:
  
  receive input data representing the block of pixels;
  
  compute gradient values for the block of pixels in two or more directions;
  
  compute one or more histograms representing statistics derived from the gradient values for the block of pixels;
  
  determine a likelihood that the block of pixels represents a portion of the video frame that contains text, wherein to determine the likelihood that the block of pixels represents a portion of the video frame that contains text, the block input component is configured to determine a presence or absence of a dominant gradient direction in the block of pixels, dependent on the one or more computed histograms; and
  
  determine one or more parameter values for encoding the block of pixels, dependent on the likelihood that the block of pixels represents a portion of the video frame that contains text.
  - 2. The apparatus of claim 1, wherein the one or more parameter values comprise a quantization parameter value; and wherein, in response to a determination that it is likely that the block of pixels represents a portion of the video frame that contains text, the block input component is configured to compute a quantization parameter value for encoding the block of pixels that is lower than a quantization parameter value used for encoding blocks of pixels that do not represent portions of the video frame that contains text.
  - 3. The apparatus of claim 1, wherein the block input component is further configured to pass data representing the gradient values, the one or more histograms, the determined likelihood, or the one or more parameter values usable in encoding the block of pixels to one or more components in a subsequent stage of the block processing pipeline.
  - 4. The apparatus of claim 3, wherein the block processing pipeline further comprises an intra-estimation stage;
    - wherein the data comprises a parameter value indicating a dominant gradient direction in the block of pixels;
      
      wherein to pass the data, the block input component is configured to pass the data to a component of the intra-estimation stage; and
      
      wherein the component of the intra-estimation stage is configured to use the parameter value indicating the dominant gradient direction to bias selection of a prediction mode.
  - 5. The apparatus of claim 4, wherein to use the parameter value indicating the dominant gradient direction to bias selection of a prediction mode, the component of the intra-estimation stage is configured to compute a cost for each of two or more candidate prediction modes, wherein the computed cost for each of the two or more candidate prediction modes is dependent on the parameter value indicating the dominant gradient direction.
  - 6. The apparatus of claim 3, wherein the block processing pipeline further comprises a mode decision stage that is configured to determine a mode in which the block of pixels is to be encoded dependent, at least in part, on a respective cost of encoding the block of pixels in each of two or more modes;
    - wherein to pass the data, the block input component is configured to pass the data to a component of the mode decision stage; and
      
      wherein the component of the mode decision stage is configured to include the data as an input to bias or control the determination of the mode in which the block of pixels is to be encoded.
  - 7. The apparatus of claim 3, wherein the block processing pipeline further comprises a motion estimation stage that is configured to select a motion vector from among two or more candidate motion vectors;
    - wherein to pass the data, the block input component is configured to pass the data to a component of the motion estimation stage; and
      
      wherein the component of the mode decision stage is configured to include the data as an input to bias or control the selection of the motion vector from among the two or more candidate motion vectors.
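
Claims 4 and 5 describe biasing intra prediction mode selection with a cost that depends on the dominant gradient direction. One way this could look is sketched below. The mode names, the per-mode angles, the linear penalty, and the `weight` value are hypothetical, and the sketch assumes the caller has already converted the dominant gradient direction into a preferred prediction angle in degrees.

```python
def angular_distance(a, b):
    """Smallest difference between two undirected angles, in degrees (0..90)."""
    d = abs(a - b) % 180.0
    return min(d, 180.0 - d)

def biased_mode_costs(base_costs, mode_angles, preferred_angle, weight=0.5):
    """Add a direction-dependent penalty to each candidate mode's base cost.

    base_costs:      mode name -> cost from ordinary estimation (e.g. a SAD-like value)
    mode_angles:     mode name -> prediction angle in degrees; non-directional modes omitted
    preferred_angle: angle favoured by the dominant gradient direction, or None if absent
    """
    costs = {}
    for mode, cost in base_costs.items():
        angle = mode_angles.get(mode)
        penalty = 0.0
        if preferred_angle is not None and angle is not None:
            penalty = weight * angular_distance(angle, preferred_angle)
        costs[mode] = cost + penalty
    return costs

def select_mode(base_costs, mode_angles, preferred_angle, weight=0.5):
    """Pick the candidate mode with the lowest biased cost."""
    costs = biased_mode_costs(base_costs, mode_angles, preferred_angle, weight)
    return min(costs, key=costs.get)
```

With base costs of 100 (vertical), 98 (horizontal), and 105 (DC), an unbiased search picks the horizontal mode. A preferred angle of 90 degrees adds a penalty of 45 to the horizontal mode and none to the vertical mode, so the vertical mode wins.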

8. A method, comprising:
- inputting data representing a block of pixels from a video frame to a video encoding pipeline comprising a plurality of stages, each stage configured to perform at least one operation on blocks of pixels passing through the pipeline; and
  
  performing, by one or more stages of the pipeline:
  
  computing gradient values for the block of pixels in two or more directions;
  
  computing one or more histograms representing statistics derived from the gradient values for the block of pixels;
  
  determining that the block of pixels represents a portion of the video frame that is likely to contain text, wherein said determining comprises determining that there is a dominant gradient direction in the block of pixels, dependent on the one or more computed histograms;
  
  in response to said determining that the block of pixels represents a portion of the video frame that is likely to contain text, determining a quantization parameter value for use in encoding the block of pixels in the video encoding pipeline; and
  
  making the quantization parameter value available to one or more operations of the video encoding pipeline.
  - 9. The method of claim 8, wherein said determining a quantization parameter value comprises computing a quantization parameter for use in a luma reconstruction operation of the video encoding pipeline that is lower than a quantization parameter used in a luma reconstruction operation performed on a block of pixels that represents a portion of the video frame that does not contain text.
  - 10. The method of claim 8, wherein said determining a quantization parameter value comprises computing a quantization parameter for use in a chroma reconstruction operation of the video encoding pipeline.
  - 11. The method of claim 8, further comprising:
    - determining one or more other parameter values for use in encoding the block of pixels in the video encoding pipeline, dependent on said determining that the block of pixels represents a portion of the video frame that is likely to contain text; and
      
      making the one or more other parameter values available to one or more operations of the video encoding pipeline.
  - 12. The method of claim 8, wherein said computing the gradient values for the block of pixels in two or more directions comprises computing unsigned gradient values for the block of pixels in two or more directions; and wherein said computing one or more histograms comprises computing statistics derived from the unsigned gradient values for the block of pixels in the two or more directions.
  - 13. The method of claim 8, wherein said computing gradient values for the block of pixels in two or more directions comprises computing horizontal gradient values and vertical gradient values for the block of pixels;
    - wherein said computing one or more histograms comprises computing a histogram of the horizontal gradient values and a histogram of the vertical gradient values; and
      
      wherein each bin of the histogram of the horizontal gradient values and each bin of the histogram of the vertical gradient values comprises a count of the computed gradient values having a magnitude in a respective range of gradient magnitude values.
  - 14. The method of claim 8, wherein said computing gradient values for the block of pixels in two or more directions comprises computing horizontal gradient values and vertical gradient values at multiple points within the block of pixels; and wherein said computing one or more histograms comprises computing, dependent on the horizontal gradient values and vertical gradient values for the block of pixels, an angle representing a gradient direction at each of the multiple points within the block of pixels.
  - 15. The method of claim 8, wherein said computing one or more histograms further comprises computing a histogram of the angles representing the gradient directions at each of the multiple points within the block of pixels; and wherein each bin of the histogram of the angles comprises a count of the computed angles that fall within a respective range of angles.
  - 16. The method of claim 8, further comprising:
    - determining one or more other parameter values for use in encoding the block of pixels;
      
      for each of one or more other blocks of pixels in the video frame or in a slice of the video frame:
      
      computing gradient values for the other block of pixels in two or more directions;
      
      computing one or more other histograms representing statistics derived from the gradient values for the other block of pixels;
      
      determining a likelihood that the other block of pixels represents a portion of the video frame that contains text, dependent on the one or more other histograms; and
      
      determining one or more parameter values for use in encoding the other block of pixels in the video encoding pipeline, dependent on the determined likelihood;
      
      accumulating statistics for the block of pixels and the one or more other blocks of pixels in the video frame or in the slice of the video frame, dependent on the computed gradient values, the computed histograms, the determined likelihood, or the determined parameter values for the block of pixels and the one or more other blocks of pixels; and
      
      computing one or more slice-level or frame-level parameter values for use in encoding the video frame or a subsequent video frame, dependent on the accumulated statistics.
  - 17. The method of claim 8, wherein the method further comprises, prior to said receiving input data representing a block of pixels from a video frame:
    - receiving input data representing a plurality of training blocks of pixels, each representing an image, wherein for each of the plurality of training blocks of pixels, the presence or absence of text in the image is known;
      
      for each of the plurality of training blocks of pixels:
      
      computing gradient values for the training block of pixels in two or more directions; and
      
      computing one or more histograms representing statistics derived from the gradient values for the training block of pixels; and
      
      determining a decision function usable to classify other blocks of pixels in terms of the likelihood that they represent portions of a video frame that contain text, dependent on the computed gradient values for the plurality of training blocks or on the computed histograms for the plurality of training blocks; and
      
      wherein said determining that there is a dominant gradient direction in the block of pixels, dependent on the one or more computed histograms, comprises applying the decision function to the one or more computed histograms.
  - 18. The method of claim 8, where said determining that the block of pixels represents a portion of the video frame that is likely to contain text is further dependent on a measure of variance that was computed for the block of pixels.
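
Two of the dependent claims above lend themselves to a short sketch: the per-direction magnitude histograms of claim 13 and the trained decision function of claim 17. The code below is an assumption-laden illustration, not the patented method. The bin edges, the scalar `dominance_feature`, and the midpoint-threshold training rule are hypothetical choices; the claims do not prescribe any particular feature or classifier.

```python
def magnitude_histogram(values, bin_edges=(8, 32, 96)):
    """Count unsigned gradient values per magnitude range (bin edges are assumed)."""
    hist = [0] * (len(bin_edges) + 1)
    for v in values:
        mag = abs(v)
        hist[sum(mag >= edge for edge in bin_edges)] += 1
    return hist

def block_histograms(block):
    """Return (horizontal histogram, vertical histogram) for a 2-D luma block."""
    gx = [row[x + 1] - row[x] for row in block for x in range(len(row) - 1)]
    gy = [block[y + 1][x] - block[y][x]
          for y in range(len(block) - 1) for x in range(len(block[0]))]
    return magnitude_histogram(gx), magnitude_histogram(gy)

def dominance_feature(h_hist, v_hist):
    """Imbalance of strong gradients between the two directions, in [0, 1]."""
    strong_h, strong_v = h_hist[-1], v_hist[-1]
    total = strong_h + strong_v
    return abs(strong_h - strong_v) / total if total else 0.0

def train_threshold(training_blocks, labels, default=0.5):
    """Pick the feature threshold with the fewest errors on labelled training blocks."""
    feats = [dominance_feature(*block_histograms(b)) for b in training_blocks]
    values = sorted(set(feats))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best_t, best_err = default, None
    for t in candidates:
        err = sum((f >= t) != label for f, label in zip(feats, labels))
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t

def likely_text(block, threshold):
    """Apply the trained decision function to a new block."""
    return dominance_feature(*block_histograms(block)) >= threshold
```

Trained on a vertically striped block labelled as text and on checkerboard and flat blocks labelled as non-text, this toy rule learns a threshold of 0.5 and then also classifies a horizontally striped block as likely text. A practical decision function would use far richer features and could also take in the variance measure mentioned in claim 18.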

19. A device, comprising:
- a memory; and
  
  an apparatus configured to process video frames and to store the processed video frames as frame data to the memory;
  
  wherein the apparatus is configured to:
  
  receive input data representing a block of pixels from a video frame;
  
  compute gradient values for the block of pixels in two or more directions;
  
  compute one or more histograms representing statistics derived from the gradient values for the block of pixels;
  
  store data representing the one or more histograms in a data structure in the memory;
  
  determine a classification parameter value for the block of pixels, wherein the classification parameter value indicates a likelihood that the block of pixels represents a portion of the video frame that contains text, wherein to determine the classification parameter value, the apparatus is configured to determine a presence or absence of a dominant gradient direction in the block of pixels, dependent on the one or more computed histograms;
  
  store the classification parameter value in the data structure in the memory; and
  
  perform an encoding operation for the block of pixels, dependent on the stored data representing the one or more histograms or the stored classification parameter.
  - 20. The device of claim 19, wherein the apparatus comprises a block processing pipeline;
    - wherein the apparatus is further configured to:
      
      determine one or more parameter values for encoding the block of pixels, dependent on the determined classification parameter value; and
      
      store the one or more parameter values in the data structure; and
      
      wherein to perform the encoding operation for the block of pixels, the apparatus is further configured to:
      
      retrieve the stored data representing the one or more histograms, the stored classification parameter, or the one or more stored parameter values from the data structure in a stage of the block processing pipeline other than a stage of the block processing pipeline in which it was stored in the data structure.
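
Claims 19 and 20 describe storing per-block histograms, a classification value, and encoding parameters in a data structure in memory so that a different pipeline stage can read them later. The following is a rough sketch of that hand-off. The class names, the dictionary keyed by block position, the 0.5 likelihood cut-off, and the QP delta of 4 are hypothetical and only stand in for whatever structure an actual implementation would use.

```python
from dataclasses import dataclass, field

@dataclass
class BlockStats:
    """Per-block record written by an early stage and read by later stages."""
    histograms: dict
    text_likelihood: float
    params: dict = field(default_factory=dict)

class StatsStore:
    """In-memory table of BlockStats keyed by block position (assumed layout)."""
    def __init__(self):
        self._by_block = {}

    def put(self, block_id, stats):
        self._by_block[block_id] = stats

    def get(self, block_id):
        return self._by_block[block_id]

def input_stage(store, block_id, histograms, text_likelihood, base_qp=30):
    """Early stage: classify the block, pick a QP, and store both."""
    qp = base_qp - 4 if text_likelihood >= 0.5 else base_qp
    store.put(block_id, BlockStats(histograms, text_likelihood, {"qp": qp}))

def quantization_stage(store, block_id):
    """Later stage: retrieve the QP that the input stage stored for this block."""
    return store.get(block_id).params["qp"]
```

A block stored with a likelihood of 0.9 is later quantized with QP 26 and one stored with 0.1 with QP 30, without the later stage recomputing any gradients.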

Current Assignee: Apple Inc.
Original Assignee: Apple Inc.
Inventors: Cote, Guy; Shi, Xiaojin
Granted Patent: US 9,380,312 B2
US Class Current: 1/1
CPC Class Codes:
- G06T 1/20: Processor architectures; Pr...
- G06V 20/62: Text, e.g. of license plate...
- H04N 19/124: Quantisation
- H04N 19/139: Analysis of motion vectors,...
- H04N 19/14: Coding unit complexity, e.g...
- H04N 19/176: the region being a block, e...
- H04N 19/196: being specially adapted for...
- H04N 19/42: characterised by implementa...
