Video/audio signal processing method and video-audio signal processing apparatus

US 7,356,082 B1
Filed: 11/29/1999
Issued: 04/08/2008
Est. Priority Date: 11/29/1999
Status: Expired due to Fees

Abstract

A metadata extraction unit has a feature point selection and motion estimation unit 62 for extracting at least one feature point representing characteristics of the video/audio signals in a compressed domain of the video/audio signals. This reduces processing time and cost and allows the signals to be processed efficiently.

47 Claims

1. A video/audio signal processing method for processing supplied compression-encoded video/audio signals, said method comprising the steps of:
- parsing said video/audio signals in a compressed domain of the video/audio signals and extracting therefrom motion vectors of said video/audio signals, DCT-coefficients and macroblock-type;
  
  using said extracted motion vectors, DCT-coefficients and macroblock-type to extract at least one compressed domain feature point representing characteristics of said video/audio signals in a compressed domain of said video/audio signals;
  
  performing motion estimation of the extracted feature points;
  
  tracking the feature points associated with a motion vector through a pre-set number of frames of said video/audio signals; and
  
  calculating and extracting the block signature for the current block of high relevance as selected in a discrete-cosine-transform domain using part or all of DCT-coefficients in a block,
  
  wherein said extraction step includes a step of calculating the block relevance metric of all blocks according to said DCT-coefficients in the current frame to determine a block having high relevance as a candidate of the feature point selected as the next feature point based on said motion estimation step,
  
  wherein said extraction step includes a step of performing inverse transform of transforming said compressed domain only for the blocks of high relevance selected by said metric calculating step and of performing motion compensation for a prediction coded macroblock or a bidirectionally prediction coded macroblock.
- - 2. The video/audio processing method according to claim 1, wherein said inverse transform is inverse discrete cosine transform.
  - 3. The video/audio processing method according to claim 2 including calculating a block signature for the current block of high relevance as selected in a pel domain.
  - 4. The video/audio processing method according to claim 2, wherein said block relevance metric calculating step calculates a block relevance metric in the case when the current macro-block is an intra-type macroblock and the reference macroblock is a prediction coded macroblock or a bidirectionally prediction coded macroblock, said block relevance metric being calculated using a relevance measure as found based on the motion vector and the prediction error energy for an associated block by taking into account the reference macroblock.
  - 5. The video/audio processing method according to claim 2 including setting the block relevance metric to zero in the case when the current macroblock is a prediction coded macroblock or a bidirectionally prediction coded macroblock; and
    - updating the list of already tracked feature points from the reference frame.
  - 6. The video/audio processing method according to claim 2 including calculating a block relevance metric in the case when the current macro-block is an intra-coded macroblock and the reference macro-block is also an intra-coded macroblock, said block relevance metric being calculated using a relevance measure as found based on the DCT activity from a block in the current macroblock and on the DCT activity as found by taking into account the reference macroblock.
  - 7. The video/audio processing method according to claim 6, wherein said estimated camera motion is used to facilitate a transcoding process between one compressed video representation into an other compressed video representation.
  - 8. The video/audio processing method according to claim 1, wherein said current frame includes an arbitrarily shaped video object plane.
  - 9. The video/audio processing method according to claim 1 including calculating and extracting a block signature for the current block of high relevance as selected in a discrete cosine transform domain using part or all of individually weighted discrete cosine transform coefficients in a block.
  - 10. The video/audio processing method according to claim 1, wherein said motion estimation step includes a step of calculating an estimated motion vector, the position of a reference block and a search area in a reference frame.
  - 11. The video/audio processing method according to claim 10 including applying inverse transform of transforming said compressed domain to all blocks in an intra-macroblock in a search area of a reference frame.
  - 12. The video/audio processing method according to claim 11, wherein said inverse transform is inverse discrete cosine transform.
  - 13. The video/audio processing method according to claim 12 including performing inverse IDCT and motion compensation on all blocks in a prediction coded macroblock or in a bidirectional prediction coded macroblock in a search area of a reference frame.
  - 14. The video/audio processing method according to claim 10, wherein said motion estimation step and said feature point tracking step include a step of performing motion prediction or feature point tracking in a pel area for all search locations in the reference frame around the predicted motion vector in order to find the best motion vector which depicts the lowest distance of the current block to the reference block in terms of the sum of absolute error, mean square error or any other distance criteria.
  - 15. The video/audio processing method according to claim 14, wherein said motion estimation block performs motion estimation with variable block sizes.
  - 16. The video/audio processing method according to claim 14 including saving as a feature point list a feature point location, a block signature, a motion vector and the block distance for the best block position in a reference frame.
  - 17. The video/audio processing method according to claim 10, wherein said motion estimation block and said feature point tracking step include:
    - a step of performing motion estimation or feature point tracking in a discrete cosine transform domain for all search locations in the reference frame around the predicted motion vector in order to find the best motion vector which depicts the lowest distance of the current block to the reference block in terms of sum of absolute errors, mean square errors or any other distance criteria; and
      
      a step of calculating the block signature in the DCT domain of the block having said best motion vector position.
  - 18. The video/audio processing method according to claim 17 including saving the feature point location, the block signature, motion vector and the block distance for the best block position in a reference frame as a feature point list.
  - 19. The video/audio processing method according to claim 1, wherein the motion vector and the block signature for all relevant current blocks are determined.
  - 20. The video/audio processing method according to claim 1, wherein the video/audio signals are compression-encoded in accordance with MPEG1, MPEG2, MPEG4, DV, MJPEG, ITU-T recommendations H.261 or H.263.
  - 21. The video/audio processing method according to claim 1, wherein the extracted feature points are used along with metadata associated with these feature points for object motion estimation.
  - 22. The video/audio processing method according to claim 1, wherein the extracted feature points are used along with metadata associated with these feature points for estimating the camera motion.
  - 23. The video/audio processing method according to claim 1, wherein the extracted feature points are used along with metadata associated with these feature points for calculating a motion activity model for video.
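
As a rough illustration of the relevance-metric and block-signature steps recited in claims 1 and 9, the sketch below ranks 8x8 DCT blocks by a simple activity measure and builds a signature from a few low-frequency coefficients. The activity formula (sum of absolute AC coefficients), the 2x2 signature size, the weights, and all function names are assumptions made for illustration; the claims do not specify them.

```python
# Illustrative sketch only: the metric, signature size and names below are
# assumptions, not formulas taken from the patent.

def dct_activity(block):
    """Sum of absolute AC coefficients of one DCT block (DC term excluded)."""
    return sum(abs(c)
               for r, row in enumerate(block)
               for k, c in enumerate(row)
               if (r, k) != (0, 0))

def select_relevant_blocks(blocks, count):
    """Return the indices of the `count` blocks with the highest activity."""
    ranked = sorted(range(len(blocks)),
                    key=lambda i: dct_activity(blocks[i]),
                    reverse=True)
    return ranked[:count]

def block_signature(block, weights=None, size=2):
    """Signature from the low-frequency size x size corner, optionally weighted."""
    signature = []
    for r in range(size):
        for k in range(size):
            w = 1.0 if weights is None else weights[r][k]
            signature.append(w * block[r][k])
    return signature

def make_block(dc, ac):
    """Build a toy 8x8 DCT block with one DC and two AC coefficients set."""
    block = [[0.0] * 8 for _ in range(8)]
    block[0][0] = dc
    block[0][1] = ac
    block[1][0] = -ac
    return block

blocks = [make_block(100, 1), make_block(50, 20), make_block(10, 5)]
candidates = select_relevant_blocks(blocks, 2)
signatures = {i: block_signature(blocks[i]) for i in candidates}
```

Only the blocks listed in `candidates` would then be passed on to the inverse transform, which is the cost saving the independent claims describe.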

24. A video/audio signal processing apparatus for processing supplied compression-encoded video/audio signals, comprising:
- means for parsing said video/audio signals in a compressed domain of the video/audio signals to extract therefrom motion vectors of said video/audio signals, DCT-coefficients and macroblock-type;
  
  extraction means for using said extracted motion vectors, DCT-coefficients and macroblock-type to extract at least one compressed domain feature point representing characteristics of said video/audio signals in a compressed domain of said video/audio signals;
  
  means for performing motion estimation of the extracted feature points;
  
  means for tracking the feature points associated with a motion vector through a pre-set number of frames of said video/audio signals; and
  
  calculating and extraction means for calculating and extracting the block signature for the current block of high relevance as selected in a discrete-cosine-transform domain using part or all of DCT-coefficients in a block,
  
  wherein said extraction means calculates the block relevance metric of all blocks according to said DCT-coefficients in the current frame to determine a block having high relevance as a candidate of the feature point selected as the next feature point based on said motion estimation step,
  
  wherein said extraction means includes means for performing inverse transform of transforming said compressed domain only for the blocks of high relevance selected by said metric calculating means and of performing motion compensation for a prediction coded macroblock or a bidirectionally prediction coded macroblock.
- - 25. The video/audio processing apparatus according to claim 24, wherein said inverse transform is inverse discrete cosine transform.
  - 26. The video/audio processing apparatus according to claim 25, wherein said extraction means calculates and extracts a block signature for the current block of high relevance as selected in a discrete cosine transform domain using part or all of discrete cosine transform coefficients in a block.
  - 27. The video/audio processing apparatus according to claim 25, wherein said extraction means calculates and extracts a block signature for the current block of high relevance as selected in a discrete cosine transform domain using part or all of individually weighted discrete cosine transform coefficients in a block.
  - 28. The video/audio processing apparatus according to claim 25, wherein said extraction means calculates a block signature for the current block of high relevance as selected in a pel domain.
  - 29. The video/audio processing apparatus according to claim 25, wherein said block relevance metric calculating means calculates a block relevance metric in the case when the current macro-block is an intra-type macroblock and the reference macroblock is a prediction coded macroblock or a bidirectionally prediction coded macroblock, said block relevance metric being calculated using a relevance measure as found based on the motion vector and the prediction error energy for an associated block by taking into account the reference macroblock.
  - 30. The video/audio processing apparatus according to claim 25, wherein said extraction means sets the block relevance metric to zero in the case when the current macroblock is a prediction coded macroblock or a bidirectionally prediction coded macroblock and updates the list of already tracked feature points from the reference frame.
  - 31. The video/audio processing apparatus according to claim 25, wherein said extraction means calculates a block relevance metric in the case when the current macro-block is an intra-coded macroblock and the reference macro-block is also an intra-coded macroblock, said block relevance metric being calculated using a relevance measure as found based on the DCT activity from a block in the current macroblock and on the DCT activity as found by taking into account the reference macroblock.
  - 32. The video/audio processing apparatus according to claim 24, wherein said current frame includes an arbitrarily shaped video object plane.
  - 33. The video/audio processing apparatus according to claim 24, wherein said motion estimation means calculates an estimated motion vector, the position of a reference block and a search area in a reference frame.
  - 34. The video/audio processing apparatus according to claim 33, wherein said motion estimation means applies inverse transform of transforming said compressed domain to all blocks in an intra-macroblock in a search area of a reference frame.
  - 35. The video/audio processing apparatus according to claim 34, wherein said inverse transform is inverse discrete cosine transform.
  - 36. The video/audio processing apparatus according to claim 35, wherein said motion estimation means performs IDCT and motion compensation on all blocks in a prediction coded macroblock or in a bidirectional prediction coded macroblock in a search area of a reference frame.
  - 37. The video/audio processing apparatus according to claim 33, wherein said motion estimation means and said feature point tracking means performs motion prediction or feature point tracking in a pel area for all search locations in the reference frame around the predicted motion vector in order to find the best motion vector which depicts the lowest distance of the current block to the reference block in terms of the sum of absolute error, mean square error or any other distance criteria.
  - 38. The video/audio processing apparatus according to claim 37, wherein said motion estimation block performs motion estimation with variable block sizes.
  - 39. The video/audio processing apparatus according to claim 37, wherein said motion estimation means and said feature point tracking means saves a feature point location, a block signature, a motion vector and the block distance for the best block position in a reference frame as a feature point list.
  - 40. The video/audio processing apparatus according to claim 33, wherein said motion estimation block and said feature point tracking means performs motion estimation or feature point tracking in a discrete cosine transform domain for all search locations in the reference frame around the predicted motion vector in order to find the best motion vector which depicts the lowest distance of the current block to the reference block in terms of sum of absolute errors, mean square errors or any other distance criteria to calculate the block signature in the DCT domain of the block having said best motion vector position.
  - 41. The video/audio processing apparatus according to claim 40, wherein said motion estimation block and said feature point tracking means saves the feature point location, the block signature, motion vector and the block distance for the best block position in a reference frame as a feature point list.
  - 42. The video/audio processing apparatus according to claim 24, wherein the motion vector and the block signature for all relevant current blocks are determined.
  - 43. The video/audio processing apparatus according to claim 24, wherein the video/audio signals are compression-encoded in accordance with MPEG1, MPEG2, MPEG4, DV, MJPEG, ITU-T recommendations H.261 or H.263.
  - 44. The video/audio processing apparatus according to claim 24, wherein the extracted feature points are used along with metadata associated with these feature points for object motion estimation.
  - 45. The video/audio processing apparatus according to claim 24, wherein the extracted feature points are used along with metadata associated with these feature points for estimating the camera motion.
  - 46. The video/audio processing apparatus according to claim 45, wherein said estimated camera motion is used to facilitate a transcoding process between one compressed video representation into an other compressed video representation.
  - 47. The video/audio processing apparatus according to claim 24, wherein the extracted feature points are used along with metadata associated with these feature points for calculating a motion activity model for video.
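
The pel-domain search recited in claims 14 and 37 (and the feature point list of claims 16 and 39) can be sketched as a small block-matching routine that scans search locations around a predicted motion vector and keeps the one with the lowest sum of absolute differences. The frame layout (lists of rows), block size, search range, record fields, and function names are hypothetical choices for this example, and the block signature is omitted from the saved record for brevity.

```python
# Illustrative sketch only: frame layout, block size, search range and the
# record fields are assumptions, not details taken from the patent.

def sad(frame_a, ax, ay, frame_b, bx, by, size):
    """Sum of absolute differences between two size x size pel blocks."""
    return sum(abs(frame_a[ay + r][ax + c] - frame_b[by + r][bx + c])
               for r in range(size)
               for c in range(size))

def track_block(cur, ref, x, y, predicted_mv, search_range=2, size=4):
    """Search the reference frame around (x, y) + predicted_mv for the best match.

    Returns a feature point record with location, motion vector and block
    distance, or None if no candidate lies inside the reference frame.
    """
    height, width = len(ref), len(ref[0])
    px, py = predicted_mv
    best = None
    for dy in range(py - search_range, py + search_range + 1):
        for dx in range(px - search_range, px + search_range + 1):
            rx, ry = x + dx, y + dy
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue  # candidate block falls outside the reference frame
            distance = sad(cur, x, y, ref, rx, ry, size)
            if best is None or distance < best["distance"]:
                best = {"location": (x, y),
                        "motion_vector": (dx, dy),
                        "distance": distance}
    return best

# Toy frames: the current block at (4, 4) is a copy of the reference block at (6, 5).
ref = [[row * 12 + col for col in range(12)] for row in range(12)]
cur = [[0] * 12 for _ in range(12)]
for r in range(4):
    for c in range(4):
        cur[4 + r][4 + c] = ref[5 + r][6 + c]

feature_point_list = [track_block(cur, ref, 4, 4, predicted_mv=(1, 1))]
```

Swapping `sad` for a mean square error function would give the other distance criterion the claims name, without changing the search loop.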

Current Assignee: Sony Corporation (Sony Group Corp.)
Original Assignee: Sony Corporation (Sony Group Corp.)
Inventors: Kuhn, Peter M.
Primary Examiner(s): An; Shawn S.
Application Number: US09/890,230
Time in Patent Office: 3,053 Days
Field of Search: 375/240.16, 375/240.15, 375/240.12, 375/240.14, 375/240.2, 375/240.25, 375/240.26, 375/240.08, 348/699, 348/700, 382/233, 382/235, 382/243, 382/250, 382/238
US Class Current: 375/240.16

CPC Class Codes

G06F 16/739   in form of a video summary,...

G06F 16/745   the internal structure of a...

G06F 16/786   using motion, e.g. object m...

G06F 16/7864   using domain-transform feat...

G06T 7/20   Analysis of motion motion e...

G06V 20/40   in video content extracting...

H04N 19/14   Coding unit complexity, e.g...

H04N 19/48   using compressed domain pro...

H04N 19/513   Processing of motion vectors

H04N 19/527   Global motion vector estima...

H04N 19/54   using feature points or meshes

H04N 19/547   Motion estimation performed...

H04N 19/61   in combination with predict...

H04N 19/70   characterised by syntax asp...

H04N 19/87   involving scene cut or scen...

H04N 19/90   using coding techniques not...

H04N 23/6811   based on the image signal
