Semantics-based motion estimation for multi-view video coding

US 7,778,328 B2
Filed: 03/31/2004
Issued: 08/17/2010
Est. Priority Date: 08/07/2003
Status: Active Grant

Abstract

A motion estimation method and apparatus for video coding of a multi-view sequence is described. In one embodiment, a motion estimation method includes identifying one or more pixels in a first frame of a multi-view video sequence, and constraining a search range associated with a second frame of the multi-view video sequence based on an indication of a desired correlation between efficient coding and semantic accuracy. The semantic accuracy relies on use of geometric configurations of cameras capturing the multi-view video sequence. The method further includes searching the second frame within the constrained search range for a match of the pixels identified in the first frame.
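The search step described in the abstract can be illustrated with a minimal sketch. It assumes grayscale frames stored as lists of rows, a sum-of-absolute-differences (SAD) matching cost, and an epipolar line given in normalized form (a, b, c) with a*a + b*b == 1. The function names `sad` and `search_near_line` and the `half_height` parameter are hypothetical and do not come from the patent; the patent does not prescribe a particular matching cost.

```python
def sad(first, second, ax, ay, bx, by, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(first[ay + j][ax + i] - second[by + j][bx + i])
        for j in range(size)
        for i in range(size)
    )


def search_near_line(first, second, x, y, size, line, half_height):
    """Find the best match for the block at (x, y) of `first` in `second`.

    Only candidate positions whose perpendicular distance to the epipolar
    line is at most `half_height` are examined (assumed constraint form).
    Returns (cost, u, v) for the best candidate, or None if none qualifies.
    """
    a, b, c = line  # assumed normalized so that a*a + b*b == 1
    rows = len(second) - size + 1
    cols = len(second[0]) - size + 1
    best = None
    for v in range(rows):
        for u in range(cols):
            # Perpendicular distance from candidate (u, v) to the line.
            if abs(a * u + b * v + c) > half_height:
                continue
            cost = sad(first, second, x, y, u, v, size)
            if best is None or cost < best[0]:
                best = (cost, u, v)
    return best
```

Given a match at (u, v), a difference vector for the block could then be formed as (u - x, v - y). A smaller `half_height` keeps matches closer to the camera geometry, while a larger one admits more candidates.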

27 Claims

1. A difference vector estimation method comprising:
- identifying, by a computer, one or more pixels in a first frame of a multi-view video sequence;
  
  constraining a search range associated with a second frame of said multi-view video sequence to a first area vertically centered on an epipolar line in the second frame, wherein said epipolar line corresponds to said one or more pixels in the first frame, the first area is defined by having a vertical height specified by a first correlation between efficient compression and semantic accuracy received by the computer from a user, wherein said vertical height increases if the first correlation is weighted toward efficient compression and said vertical height decreases if the first correlation is weighted toward semantic accuracy, wherein semantic accuracy relies on use of geometric configurations of cameras capturing the multi-view video sequence, wherein the vertical direction is defined as the direction perpendicular to said epipolar line, and wherein said search range is further constrained using a disparity vector computed for said one or more pixels in the first frame and wherein the constrained search range is repositioned relative to said epipolar line using said disparity vector in addition to constraining said vertical height using the first correlation;
  
  searching the second frame within said constrained search range for a match of said one or more pixels identified in the first frame for subsequent use in computing a difference vector for said one or more pixels in the first frame, said difference vector to be transmitted as part of a compressed representation of the first frame;
  
  receiving a second correlation between efficient compression and semantic accuracy from the user; and
  
  searching a third frame within a search range constrained by a second correlation between efficient compression and semantic accuracy, the second correlation specified by the user and a value of the second correlation is different from a value of the first correlation.
  - 2. The method of claim 1 wherein the position of the epipolar line depends on the geometric configurations of the cameras.
  - 3. The method of claim 1 wherein the one or more pixels in the first frame represent a block.
  - 4. The method of claim 1 further comprising:
    - computing the epipolar line in the second frame.
  - 5. The method of claim 4 wherein the epipolar line is computed using a fundamental matrix.
  - 6. The method of claim 1 wherein constraining the search range comprises:
    - determining parameters of a window covering an initial seed and the epipolar line based on the first correlation between efficient compression and semantic accuracy.
  - 7. The method of claim 1 further comprising:
    - communicating to a user a user interface facilitating user input of the first correlation between efficient compression and semantic accuracy.
  - 8. The method of claim 7 wherein the user interface provides a slider to enable the user to specify the first correlation between efficient compression and semantic accuracy.
  - 9. The method of claim 7 wherein the user interface allows the user to modify a previously specified correlation between efficient compression and semantic accuracy at any time.
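Claims 5 and 8 mention computing the epipolar line from a fundamental matrix and letting the user set the correlation with a slider. The following is a minimal sketch under stated assumptions: the line is obtained with the standard relation l' = F x for a homogeneous pixel x = (x, y, 1), and the slider value is mapped linearly to a vertical height. The names `epipolar_line` and `window_height`, the 0.0 to 1.0 slider scale, and the height limits are hypothetical; the claims only require that the height grows toward efficient compression and shrinks toward semantic accuracy.

```python
def epipolar_line(F, x, y):
    """Return the epipolar line (a, b, c) in the second frame for pixel (x, y).

    F is a 3x3 fundamental matrix given as a list of rows. The result
    satisfies a*u + b*v + c = 0 and is scaled so that a*a + b*b == 1,
    which makes |a*u + b*v + c| a perpendicular distance in pixels.
    """
    p = (x, y, 1.0)
    a, b, c = (sum(F[r][k] * p[k] for k in range(3)) for r in range(3))
    norm = (a * a + b * b) ** 0.5
    if norm == 0.0:
        raise ValueError("degenerate epipolar line")
    return a / norm, b / norm, c / norm


def window_height(tradeoff, min_height=1.0, max_height=16.0):
    """Map a slider value to a vertical search height (assumed linear map).

    tradeoff = 0.0 is weighted fully toward semantic accuracy (narrow band);
    tradeoff = 1.0 is weighted fully toward efficient compression (tall band).
    """
    if not 0.0 <= tradeoff <= 1.0:
        raise ValueError("tradeoff must lie in [0, 1]")
    return min_height + tradeoff * (max_height - min_height)
```

For a rectified camera pair, where the epipolar lines are horizontal, `epipolar_line` reduces to the row of the source pixel, which is a convenient sanity check.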

10. A non-transitory computer readable memory medium that provides computer program instructions, which when executed on a computer processor cause the processor to perform operations comprising:
- identifying one or more pixels in a first frame of a multi-view video sequence;
  
  constraining a search range associated with a second frame of the multi-view video sequence to a first area vertically centered on an epipolar line in the second frame, wherein said epipolar line corresponds to the one or more pixels in the first frame, the first area is defined by having a vertical height specified by a first correlation between efficient compression and semantic accuracy received from a user, wherein said vertical height increases if the first correlation is weighted toward efficient compression and said vertical height decreases if the first correlation is weighted toward semantic accuracy, wherein semantic accuracy relies on use of geometric configurations of cameras capturing the multi-view video sequence, wherein the vertical dimension is defined as the direction perpendicular to said epipolar line, and wherein said search range is further constrained using a disparity vector computed for said one or more pixels of the first frame and wherein said constrained search range is repositioned relative to said epipolar line using said disparity vector in addition to constraining said vertical height using the first correlation;
  
  searching the second frame within said constrained search range for a match of said one or more pixels identified in the first frame for subsequent use in computing a difference vector for the one or more pixels, said difference vector to be transmitted as part of a compressed representation of the first frame;
  
  receiving a second correlation between efficient compression and semantic accuracy from the user; and
  
  searching a third frame within a search range constrained by a second correlation between efficient compression and semantic accuracy, the second correlation specified by the user and a value of the second correlation is different from a value of the first correlation.
  - 11. The computer readable memory medium of claim 10 wherein the position of the epipolar line depends on the geometric configurations of the cameras.
  - 12. The computer readable memory medium of claim 10 wherein the one or more pixels in the first frame represent a block.
  - 13. The computer readable memory medium of claim 10 wherein the operations further comprise:
    - computing the epipolar line in the second frame.
  - 14. The computer readable memory medium of claim 13 wherein the epipolar line is computed using a fundamental matrix.
  - 15. The computer readable memory medium of claim 10 wherein constraining the search range comprises:
    - determining parameters of a window covering an initial seed and the epipolar line based on the first correlation between efficient compression and semantic accuracy.
  - 16. The computer readable memory medium of claim 10 wherein the operations further comprise:
    - communicating to a user a user interface facilitating user input of the first correlation between efficient compression and semantic accuracy.
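The independent claims also reposition the constrained search range relative to the epipolar line using a disparity vector, and claims 6 and 15 refer to a window covering an initial seed and the epipolar line. One possible reading is sketched below, assuming the seed is the source pixel displaced by the disparity vector and the window is centered on the seed's perpendicular projection onto the line. The function name `reposition_window`, the returned dictionary layout, and the `half_width` parameter are hypothetical illustrations, not the patent's definition.

```python
def reposition_window(x, y, disparity, line, half_width, half_height):
    """Describe a search window slid along the epipolar line.

    `line` is (a, b, c) with a*a + b*b == 1. The seed (x, y) + disparity is
    projected onto the line, and that foot point becomes the window center
    (an assumed placement rule). `half_height` is measured perpendicular to
    the line, `half_width` along it.
    """
    a, b, c = line
    seed_x, seed_y = x + disparity[0], y + disparity[1]
    # Signed perpendicular distance from the seed to the epipolar line.
    distance = a * seed_x + b * seed_y + c
    center = (seed_x - distance * a, seed_y - distance * b)
    return {
        "center": center,
        "along": (-b, a),  # unit vector parallel to the epipolar line
        "half_width": half_width,
        "half_height": half_height,
    }
```

With this placement, the disparity vector only moves the window along the line, while the user-specified correlation independently controls how far the search may stray from it.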

17. A computerized system comprising:
- a memory; and
  
  at least one processor coupled to the memory, the at least one processor executing a set of instructions which cause the at least one processor to identify one or more pixels in a first frame of a multi-view video sequence, constrain a search range associated with a second frame of the multi-view video sequence to a first area vertically centered on an epipolar line in the second frame, wherein said epipolar line corresponds to said one or more pixels in the first frame, the first area is defined by having a vertical height specified by a first correlation between efficient compression and semantic accuracy received from a user, wherein said vertical height increases if the first correlation is weighted toward efficient compression and said vertical height decreases if the first correlation is weighted toward semantic accuracy, wherein semantic accuracy relies on use of geometric configurations of cameras capturing the multi-view video sequence, wherein the vertical dimension is defined as the direction perpendicular to said epipolar line, and wherein said search range is further constrained using a disparity vector computed for said one or more pixels in the first frame and wherein said constrained search range is repositioned relative to said epipolar line using said disparity vector in addition to constraining said vertical height using the first correlation, search the second frame within said constrained search range for a match of said one or more pixels identified in the first frame for subsequent use in computing a difference vector for said one or more pixels in the first frame, said difference vector to be transmitted as part of a compressed representation of the first frame, receive a second correlation between efficient compression and semantic accuracy from the user, and search a third frame within a search range constrained by a second correlation between efficient compression and semantic accuracy, the second correlation specified by the user and the second correlation different from the first correlation.
  - 18. The system of claim 17 wherein the position of the epipolar line depends on the geometric configurations of the cameras.
  - 19. The system of claim 17 wherein the one or more pixels in the first frame represent a block.
  - 20. The system of claim 17 wherein the processor is to constrain the search range by determining parameters of a window covering an initial seed and the epipolar line based on the first correlation between efficient compression and semantic accuracy.
  - 21. The system of claim 17 wherein the processor is further to communicate to the user a user interface facilitating user input of the first correlation between efficient compression and semantic accuracy.

22. A difference vector estimation apparatus comprising:
- a block identifier to identify one or more pixels in a first frame of a multi-view video sequence;
  
  a search range determinator to constrain a search range associated with a second frame of the multi-view video sequence to a first area vertically centered on an epipolar line in the second frame, wherein said epipolar line corresponds to said one or more pixels in the first frame, the first area is defined by having a vertical height specified by a first correlation between efficient compression and semantic accuracy received from a user, wherein said vertical height increases if the first correlation is weighted toward efficient compression and said vertical height decreases if the first correlation is weighted toward semantic accuracy, wherein semantic accuracy relies on use of geometric configurations of cameras capturing the multi-view video sequence, wherein the vertical direction is defined as the direction perpendicular to the epipolar line, wherein said search range determinator is configured to further constrain the search range using a disparity vector computed for said one or more pixels in the first frame and wherein said constrained search range is repositioned relative to said epipolar line using said disparity vector in addition to constraining said vertical height using the first correlation; and
  
  a searcher to search the second image within said constrained search range for a match of said one or more pixels identified in the first frame for use by a difference vector calculator to compute a difference vector for the one or more pixels, said difference vector to be transmitted as part of a compressed representation of the first frame, and to search a third image within a search range constrained by a second correlation between efficient compression and semantic accuracy, the second correlation received from the user and different from the first correlation.
  - 23. The apparatus of claim 22 wherein the position of the epipolar line depends on the geometric configurations of the cameras.
  - 24. The apparatus of claim 22 wherein the one or more pixels in the first frame represent a block.
  - 25. The apparatus of claim 22 wherein the search range determinator is further to compute the epipolar line in the second frame.
  - 26. The apparatus of claim 22 wherein the search range determinator is to constrain the search range by determining parameters of a window covering an initial seed and the epipolar line based on the first correlation between efficient compression and semantic accuracy.
  - 27. The apparatus of claim 22 wherein the search range determinator is further to communicate to the user a user interface facilitating user input of the first correlation between efficient compression and semantic accuracy.

Current Assignee
Sony Corporation (Sony Group Corp.), Sony Electronics Inc. (Sony Group Corp.)
Original Assignee
Sony Corporation (Sony Group Corp.), Sony Electronics Inc. (Sony Group Corp.)
Inventors
Vedula, Sundar; Puri, Rohit; Tabatabai, Ali J.
Primary Examiner(s)
Dastouri, Mehrdad
Assistant Examiner(s)
Hallenbeck-Huber, Jeremiah Charles

Application Number

US10/816,051
Publication Number

US 20050031035A1
Time in Patent Office

2,330 Days
Field of Search

375/240.12, 348/48, 382/154
US Class Current

375/240.12
CPC Class Codes

H04N 19/57   Motion estimation character...

H04N 19/597   specially adapted for multi...

H04N 19/61   in combination with predict...
