Visual processing using temporal and spatial interpolation

US 10,547,858 B2
Filed: 08/17/2017
Issued: 01/28/2020
Est. Priority Date: 02/19/2015
Status: Active Grant

Abstract

A method for enhancing at least a section of lower-quality visual data using a hierarchical algorithm, the method comprising receiving at least a plurality of neighbouring sections of lower-quality visual data. A plurality of input sections from the received plurality of neighbouring sections of lower quality visual data are selected and features are extracted from those plurality of input sections of lower-quality visual data. A target section based on the extracted features from the plurality of input sections of lower-quality visual data is then enhanced.

40 Claims

1. A method for upscaling at least a section of low-resolution video data using a convolutional neural network (CNN), the method comprising the steps of:
- receiving at least three consecutive frames of low-resolution video data;
- inputting the at least three consecutive frames of low-resolution video data into an initial layer of the CNN;
- extracting, using a plurality of hidden convolutional layers of the CNN, low-resolution features from the at least three consecutive frames of low-resolution video data; and
- enhancing, using a hidden convolutional layer of the CNN, the extracted low-resolution features from the three or more consecutive frames of low-resolution video data to generate a higher-resolution target section of video data corresponding to a middle frame of the at least three consecutive frames of low-resolution video data,
  wherein the CNN is trained on training data including ground truth sections of video data with corresponding sequences of three or more consecutive frames of sub-sampled video data to reproduce ground truth sections of video data from the corresponding frames of sub-sampled video data.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15):
  - 2. The method according to claim 1, wherein the higher-resolution target section of video data corresponds to a single frame of the low-resolution video data.
  - 3. The method according to claim 1, wherein the higher-resolution target section of video data does not correspond to one of the received frames of low-resolution video data.
  - 4. The method according to claim 1, wherein at least one of the frames of low-resolution video data occurs sequentially before the target section.
  - 5. The method according to claim 1, wherein at least one of the frames of low-resolution video data occurs sequentially after the target section.
  - 6. The method according to claim 1, wherein the CNN includes a plurality of layers.
  - 7. The method according to claim 6, wherein the plurality of layers includes one or more of a sequential layer, a recurrent layer, a recursive layer, a branching layer, or a merging layer.
  - 8. The method according to claim 1, further comprising using a predetermined selection criterion to determine the hierarchical CNN to be used from a library of CNNs.
  - 9. The method according to claim 8, wherein the predetermined selection criterion is based on generating the higher-resolution target section of video data from the low-resolution video data, based on a measure of at least one of:
    - an error rate;
    - a bit error rate;
    - a peak signal-to-noise ratio; or
    - a structural similarity index.
  - 10. The method according to claim 1, further comprising selecting the CNN from a library of CNNs based on standardised features extracted from the received frames of low-resolution video data.
  - 11. The method according to claim 1, further comprising developing the CNN using a learned approach that includes machine learning techniques.
  - 12. The method according to claim 1, further comprising performing image enhancement, using super-resolution techniques, with the CNN.
  - 13. The method according to claim 1, wherein the lower-quality visual data contains a higher amount of artefacts than the higher-quality visual data.
  - 15. The method of claim 1, wherein the at least three consecutive frames of low-resolution video data include multiple repetitions of a first frame or a last frame of video data.
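
The steps recited in claim 1 can be illustrated with a minimal NumPy sketch. It is not the patented implementation: the layer count, channel width, kernel size, ReLU activation, random (untrained) weights, and in particular the sub-pixel rearrangement used to reach the higher resolution are all assumptions made for illustration, and the function names are hypothetical. Three consecutive greyscale frames enter the initial layer as three input channels, two hidden convolutional layers extract features at low resolution, and a final convolutional layer produces the higher-resolution section for the middle frame.

```python
import numpy as np

def conv2d_same(x, w, b):
    """Stride-1 'same' convolution. x: (C_in, H, W); w: (C_out, C_in, k, k); b: (C_out,)."""
    c_out, c_in, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)), mode="edge")
    h, wd = x.shape[1:]
    out = np.zeros((c_out, h, wd))
    for i in range(k):
        for j in range(k):
            # Accumulate the contribution of kernel tap (i, j) over all input channels.
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out + b[:, None, None]

def pixel_shuffle(x, r):
    """Rearrange (r*r, H, W) feature maps into one (H*r, W*r) image (assumed upscaling step)."""
    c, h, w = x.shape
    assert c == r * r
    return x.reshape(r, r, h, w).transpose(2, 0, 3, 1).reshape(h * r, w * r)

def upscale_middle_frame(frames, params, r):
    """frames: (T, H, W) low-resolution frames, T >= 3; returns an (H*r, W*r) section."""
    x = frames  # the T consecutive frames are the input channels of the initial layer
    for w, b in params[:-1]:
        x = np.maximum(conv2d_same(x, w, b), 0.0)  # hidden layers: low-resolution features
    w, b = params[-1]
    return pixel_shuffle(conv2d_same(x, w, b), r)  # final conv layer, then rearrangement

def init_params(t, r, width=8, k=3, seed=0):
    """Random weights standing in for a trained network (hypothetical sizes)."""
    rng = np.random.default_rng(seed)
    shapes = [(width, t, k, k), (width, width, k, k), (r * r, width, k, k)]
    return [(rng.normal(0, 0.1, s), np.zeros(s[0])) for s in shapes]

frames = np.random.default_rng(1).random((3, 12, 16))  # three consecutive 12x16 frames
hr = upscale_middle_frame(frames, init_params(t=3, r=2), r=2)
print(hr.shape)  # (24, 32)
```

With trained weights in place of `init_params`, the same call would be applied to each window of frames in turn to upscale a whole sequence.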

14. A computer program product embodied on a non-transitory storage medium and comprising instructions that, when executed, cause a system to upscale at least a section of low-resolution video data using a CNN, by performing the steps of:
- receiving at least three consecutive frames of low-resolution video data;
- inputting the at least three consecutive frames of low-resolution video data into an initial layer of the CNN;
- extracting, using a plurality of hidden convolutional layers of the CNN, low-resolution features from the at least three consecutive frames of low-resolution video data; and
- enhancing, using a hidden convolutional layer of the CNN, the extracted low-resolution features from the three or more consecutive frames of low-resolution video data to generate a higher-resolution target section of video data corresponding to a middle frame of the at least three consecutive frames of low-resolution video data,
  wherein the CNN is trained on training data including ground truth sections of video data with corresponding sequences of three or more consecutive frames of sub-sampled video data to reproduce ground truth sections of video data from the corresponding frames of sub-sampled video data.
- Dependent claims (16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27):
  - 16. The computer program product of claim 14, wherein the higher-resolution target section of video data corresponds to a single frame of the low-resolution video data.
  - 17. The computer program product of claim 14, wherein the higher-resolution target section of video data does not correspond to one of the received frames of low-resolution video data.
  - 18. The computer program product of claim 14, wherein at least one of the frames of low-resolution video data occurs sequentially before the target section.
  - 19. The computer program product of claim 14, wherein at least one of the frames of low-resolution video data occurs sequentially after the target section.
  - 20. The computer program product of claim 14, wherein the CNN includes a plurality of layers.
  - 21. The computer program product of claim 20, wherein the plurality of layers includes one or more of a sequential layer, a recurrent layer, a recursive layer, a branching layer, or a merging layer.
  - 22. The computer program product of claim 14, further comprising instructions that, when executed, cause the system to use a predetermined selection criterion to determine the CNN to be used from a library of CNNs.
  - 23. The computer program product of claim 22, wherein the predetermined selection criterion is based on generating the higher-resolution target section of video data from the low-resolution video data, based on a measure of at least one of:
    - an error rate;
    - a bit error rate;
    - a peak signal-to-noise ratio; or
    - a structural similarity index.
  - 24. The computer program product of claim 14, further comprising instructions that, when executed, cause the system to select the CNN from a library of CNNs based on standardized features extracted from the received frames of low-resolution video data.
  - 25. The computer program product of claim 14, further comprising instructions that, when executed, cause the system to develop the CNN using a learned approach that includes machine learning techniques.
  - 26. The computer program product of claim 14, further comprising instructions that, when executed, cause the system to perform image enhancement, using super-resolution techniques, with the CNN.
  - 27. The computer program product of claim 14, wherein the lower-quality visual data contains a higher amount of artefacts than the higher-quality visual data.
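
The selection of a CNN from a library using a predetermined criterion (claims 8, 9, 22, 23, 35 and 36) can be sketched as follows. This is only an illustration under stated assumptions: peak signal-to-noise ratio is used as the measure, a higher-resolution reference is assumed to be available at selection time, and the two library entries are simple hypothetical stand-ins for trained networks. None of the function names come from the patent.

```python
import numpy as np

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, peak]."""
    diff = np.asarray(reference, float) - np.asarray(estimate, float)
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))

def select_model(library, lr_frames, reference):
    """Return the name of the library entry with the highest PSNR, plus all scores."""
    scores = {name: psnr(reference, model(lr_frames)) for name, model in library.items()}
    return max(scores, key=scores.get), scores

# Hypothetical stand-ins for trained CNNs; each maps (T, H, W) frames to a (2H, 2W) image.
def nearest(frames, r=2):
    mid = frames[len(frames) // 2]
    return np.repeat(np.repeat(mid, r, axis=0), r, axis=1)

def flat_mean(frames, r=2):
    mid = frames[len(frames) // 2]
    return np.full((mid.shape[0] * r, mid.shape[1] * r), mid.mean())

# A smooth diagonal gradient as the reference, sub-sampled by 2 to form three identical frames.
reference = np.add.outer(np.arange(8), np.arange(8)) / 14.0
lr_frames = np.stack([reference[::2, ::2]] * 3)

best, scores = select_model({"nearest": nearest, "flat_mean": flat_mean}, lr_frames, reference)
print(best)  # nearest
```

Swapping `psnr` for an error rate or a structural similarity index would leave `select_model` unchanged apart from the direction of the comparison for error-type measures.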

28. A system for upscaling at least a section of low-resolution video data using a CNN, the system comprising:
- at least one processor;
- memory storing instructions that, when executed by the at least one processor, cause the system to:
  - receive at least three consecutive frames of low-resolution video data;
  - input the at least three consecutive frames of low-resolution video data into an initial layer of the CNN;
  - extract, using a plurality of hidden convolutional layers of the CNN, low-resolution features from the at least three consecutive frames of low-resolution video data; and
  - enhance, using a hidden convolutional layer of the CNN, the extracted low-resolution features from the three or more consecutive frames of low-resolution video data to generate a higher-resolution target section of video data corresponding to a middle frame of the at least three consecutive frames of low-resolution video data,
    wherein the CNN is trained on training data including ground truth sections of video data with corresponding sequences of three or more consecutive frames of sub-sampled video data to reproduce ground truth sections of video data from the corresponding frames of sub-sampled video data.
- Dependent claims (29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40):
  - 29. The system of claim 28, wherein the higher-resolution target section of video data corresponds to a single frame of the low-resolution video data.
  - 30. The system of claim 28, wherein the higher-resolution target section of video data does not correspond to one of the received frames of low-resolution video data.
  - 31. The system of claim 28, wherein at least one of the frames of low-resolution video data occurs sequentially before the target section.
  - 32. The system of claim 28, wherein at least one of the frames of low-resolution video data occurs sequentially after the target section.
  - 33. The system of claim 28, wherein the CNN includes a plurality of layers.
  - 34. The system of claim 33, wherein the plurality of layers includes one or more of a sequential layer, a recurrent layer, a recursive layer, a branching layer, or a merging layer.
  - 35. The system of claim 28, wherein the instructions, when executed by the at least one processor, further cause the system to use a predetermined selection criterion to determine the CNN to be used from a library of CNNs.
  - 36. The system of claim 35, wherein the predetermined selection criterion is based on generating the higher-resolution target section of video data from the low-resolution video data, based on a measure of at least one of:
    - an error rate;
    - a bit error rate;
    - a peak signal-to-noise ratio; or
    - a structural similarity index.
  - 37. The system of claim 28, wherein the instructions, when executed by the at least one processor, further cause the system to select the CNN from a library of CNNs based on standardized features extracted from the received frames of low-resolution video data.
  - 38. The system of claim 28, wherein the instructions, when executed by the at least one processor, further cause the system to develop the CNN using a learned approach that includes machine learning techniques.
  - 39. The system of claim 28, wherein the instructions, when executed by the at least one processor, further cause the system to perform image enhancement, using super-resolution techniques, with the CNN.
  - 40. The system of claim 28, wherein the lower-quality visual data contains a higher amount of artefacts than the higher-quality visual data.
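
The training clause shared by the independent claims pairs each ground truth section with a sequence of three or more consecutive sub-sampled frames. One way such pairs might be assembled is sketched below. The block-averaging degradation, the window length of three, and the function names are assumptions for illustration; repeating the first or last frame at the sequence boundaries follows the idea in claim 15, applied here to training windows.

```python
import numpy as np

def subsample(frame, r):
    """Sub-sample a 2-D frame by averaging r x r blocks (an assumed degradation model)."""
    h, w = frame.shape
    h, w = h - h % r, w - w % r  # crop so both sides divide evenly by r
    return frame[:h, :w].reshape(h // r, r, w // r, r).mean(axis=(1, 3))

def training_pairs(hr_video, r, window=3):
    """Yield (lr_window, hr_target) pairs from an (N, H, W) ground truth video.

    lr_window has shape (window, H//r, W//r) and is centred on frame t;
    hr_target is ground truth frame t, cropped to match the sub-sampled size.
    """
    half = window // 2
    lr = np.stack([subsample(f, r) for f in hr_video])
    n = len(lr)
    h, w = lr.shape[1] * r, lr.shape[2] * r
    for t in range(n):
        # Clamp indices so the first or last frame is repeated at the sequence edges.
        idx = np.clip(np.arange(t - half, t + half + 1), 0, n - 1)
        yield lr[idx], hr_video[t, :h, :w]

video = np.random.default_rng(0).random((5, 8, 8))  # five 8x8 ground truth frames
pairs = list(training_pairs(video, r=2))
lr_window, hr_target = pairs[0]
print(lr_window.shape, hr_target.shape)  # (3, 4, 4) (8, 8)
```

A training loop would then adjust the CNN weights so that its output for each `lr_window` approaches the matching `hr_target`, for example under a mean squared error loss.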

Current Assignee: Magic Pony Technology Limited (X Holdings Corp.)
Original Assignee: Magic Pony Technology Limited (X Holdings Corp.)
Inventors: Wang, Zehan; Bishop, Robert David; Shi, Wenzhe; Caballero, Jose; Aitken, Andrew Peter; Totz, Johannes
Primary Examiner(s): Williams, Jeffery A

Application Number: US15/679,660
Publication Number: US 20170347110A1
Time in Patent Office: 894 Days
Field of Search: 37524001
CPC Class Codes

- G06F 18/22: Matching criteria, e.g. pro...
- G06N 3/04: Architecture, e.g. intercon...
- G06N 3/044: Recurrent networks, e.g. Ho...
- G06N 3/045: Combinations of networks
- G06N 3/049: Temporal neural networks, e...
- G06N 3/08: Learning methods
- G06T 2207/10016: Video; Image sequence
- G06T 2207/20081: Training; Learning
- G06T 2207/20084: Artificial neural networks ...
- G06T 3/40: Scaling of whole images or ...
- G06T 3/4007: based on interpolation, e.g...
- G06T 3/4046: using neural networks
- G06T 3/4053: based on super-resolution, ...
- G06T 5/00: Image enhancement or restor...
- G06T 5/60: using machine learning, e.g...
- G06T 5/70: Denoising; Smoothing
- G06T 7/11: Region-based segmentation
- H04N 19/117: Filters, e.g. for pre-proce...
- H04N 19/142: Detection of scene cut or s...
- H04N 19/154: Measured or subjectively es...
- H04N 19/172: the region being a picture,...
- H04N 19/176: the region being a block, e...
- H04N 19/177: the unit being a group of p...
- H04N 19/31: in the temporal domain
- H04N 19/33: in the spatial domain
- H04N 19/36: Scalability techniques invo...
- H04N 19/46: Embedding additional inform...
- H04N 19/463: by compressing encoding par...
- H04N 19/59: involving spatial sub-sampl...
- H04N 19/63: using sub-band based transf...
- H04N 19/80: Details of filtering operat...
- H04N 19/86: involving reduction of codi...
- H04N 19/87: involving scene cut or scen...
- H04N 7/0117: involving conversion of the...
