Systems and methods for error resilient scheme for low latency H.264 video coding

US 9,124,757 B2
Filed: 10/03/2011
Issued: 09/01/2015
Est. Priority Date: 10/04/2010
Status: Active Grant

Abstract

A new approach is proposed that contemplates systems and methods to support error resilient coding of H.264 compatible video streams for low latency/delay multimedia communication applications by utilizing and integrating a plurality of error resilient H.264 encoding/decoding schemes in an efficient manner. These error resilient H.264 encoding/decoding schemes can be used to offer a better quality video even when there is network loss of picture frames in the video stream. The approach can recover from such loss, and can do so faster than other techniques, without requiring additional data/frames to be sent over the network to achieve the same level of recovery.

122 Citations

21 Claims

1. A system, comprising:
- a virtual meeting room (VMR) engine capable of converting and composing in real time a plurality of video conference feeds from a plurality of participants to a composite video and audio stream compatible with each of a plurality of video conference endpoints, wherein the plurality of video conference endpoints are of different types, and
  
  a media processing node to support the VMR engine having a video encoder, wherein the video encoder, in operation,
  
  encodes and organizes a plurality of picture frames of a video stream at a plurality of temporal layers in a hierarchical P-structure, wherein the organization includes varying a number of layers of the hierarchical P-structure based on a frame rate of the video stream in order to ensure an identical structure length for different video streams of respective different frame rates;
  
  records one or more encoded reference frames of the video stream in a display picture buffer (DPB) associated with the video encoder, wherein each of the reference frames has been encoded by the video encoder;
  
  transmits the plurality of encoded picture frames of the video stream over a network to a video decoder, wherein the video decoder is at one of the plurality of video conference endpoints;
  
  in response to the video decoder providing a negative feedback on one or more frames lost en route from the video encoder to the video decoder, selects one of the reference frames in the DPB that is earlier in time than the one or more lost frames; and
  
  transmits the selected reference frame to the video decoder; and
  
  the video decoder, which in operation,
  
  receives the video stream transmitted over the network;
  
  transmits the negative feedback on the one or more lost frames to the video encoder through a back channel mechanism to trigger the selection of the reference frames; and
  
  recovers the one or more lost frames of the plurality of encoded picture frames during decoding of the video stream using a combination of
  
  1) decoding the picture frames of a lower temporal layer in the hierarchical P-structure than a temporal layer of the one or more lost frames and
  
  2) using the selected reference frame as a restarting point for continued decoding of the video stream.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
  - 2. The system of claim 1, wherein the video encoder and the video decoder are associated with different media processing nodes at different geographical locations.
  - 3. The system of claim 1, wherein the video encoder encodes the plurality of picture frames of the video stream as P picture frames using backward prediction under H.264 coding.
  - 4. The system of claim 1, wherein the video decoder predicts and decodes an encoded picture frame for a given temporal layer only from a temporal layer equal or lower than the given temporal layer.
  - 5. The system of claim 1, wherein the DPB contains enough pictures to cover a round trip communication delay between the video encoder and the video decoder.
  - 6. The system of claim 1, wherein the video decoder predicts a picture frame using the selected reference frame.
  - 7. The system of claim 1, wherein the video encoder records multiple encoded reference frames of the video stream in the DPB.
  - 8. The system of claim 1, wherein the video encoder transmits the reference frame to the video decoder even when a layer 0 frame in the hierarchical P-structure has been lost.
  - 9. The system of claim 1, wherein the video encoder selects the reference frame in a round robin fashion.
  - 10. The system of claim 1, wherein the video encoder selects the reference frame that is immediately before the one or more lost picture frames indicated by the video decoder.
  - 11. The system of claim 1, wherein the video encoder selects the reference frame based on an indication of a loss of picture frames in the video stream without detailed information on the one or more lost picture frames.
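
The layering described in claims 1 and 4 can be illustrated with a minimal Python sketch. It rests on assumptions that the claims do not state: the structure is taken to be dyadic (each added temporal layer doubles the frame rate), "structure length" is read as the duration in time of one structure, and the layer-0 rate of 3.75 frames per second is an arbitrary illustrative value. The names `num_layers`, `temporal_layer`, `reference_index` and `structure_seconds` are hypothetical and do not come from the patent.

```python
from math import log2
from typing import Optional

def num_layers(frame_rate: float, base_layer_rate: float = 3.75) -> int:
    """Layer count chosen so layer-0 frames recur at base_layer_rate.

    Assumption: dyadic structure; base_layer_rate is illustrative only.
    """
    return max(1, int(round(log2(frame_rate / base_layer_rate))) + 1)

def temporal_layer(index: int, layers: int) -> int:
    """Temporal layer of frame `index` in a structure of 2**(layers-1) frames."""
    period = 1 << (layers - 1)
    pos = index % period
    if pos == 0:
        return 0  # first frame of each structure sits in the base layer
    trailing_zeros = (pos & -pos).bit_length() - 1
    return (layers - 1) - trailing_zeros

def reference_index(index: int, layers: int) -> Optional[int]:
    """Earlier frame that `index` is predicted from (equal or lower layer)."""
    if index == 0:
        return None  # assumed to be coded without a reference
    period = 1 << (layers - 1)
    pos = index % period
    if pos == 0:
        return index - period      # layer 0 predicts from previous layer 0
    return index - (pos & -pos)    # clearing the lowest set bit lands on a lower layer

def structure_seconds(frame_rate: float) -> float:
    """Duration of one structure at the given frame rate."""
    return (1 << (num_layers(frame_rate) - 1)) / frame_rate

if __name__ == "__main__":
    for fps in (30, 15, 7.5):
        print(fps, num_layers(fps), structure_seconds(fps))
    print([temporal_layer(i, 4) for i in range(8)])
```

Under these assumptions a 30 fps stream uses four layers and a 15 fps stream uses three, and both structures span the same duration. Every frame above layer 0 is predicted from a strictly lower layer, so losing it leaves the lower layers decodable.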

12. A method, comprising:
- encoding and organizing, by a video encoder, a plurality of picture frames of a video stream at a plurality of temporal layers in a hierarchical P-structure, wherein the organizing includes varying a number of layers of the hierarchical P-structure based on a frame rate of the video stream in order to ensure an identical structure length for different video streams of respective different frame rates;
  
  recording, by the video encoder, one or more encoded reference frames of the video stream in a display picture buffer (DPB);
  
  transmitting, by the video encoder, the plurality of encoded picture frames of the video stream over a network to a video decoder, wherein the video decoder is at one of a plurality of video conference endpoints, and the plurality of video conference endpoints are of different types;
  
  accepting, by the video decoder, the video stream transmitted over the network, wherein one or more frames of the plurality of encoded picture frames of the video stream are lost en route from the video encoder to the video decoder;
  
  transmitting, by the video decoder, negative feedback on the one or more lost frames to the video encoder through a back channel mechanism to trigger a selection of the reference frames;
  
  in response to the video decoder providing the negative feedback on the one or more lost frames, selecting, by the video encoder, a reference frame in the DPB that is earlier in time than the one or more lost frames;
  
  transmitting, by the video encoder, the selected reference frame over the network to the video decoder; and
  
  recovering the one or more lost frames of the plurality of encoded picture frames during decoding of the video stream using a combination of
  
  1) decoding the encoded picture frames of a lower temporal layer in the hierarchical P-structure than a temporal layer of the one or more lost frames and
  
  2) using the selected reference frame as a restarting point for continued decoding of the video stream.
- Dependent Claims (13, 14, 15, 16, 17, 18, 19, 20, 21)
  - 13. The method of claim 12, further comprising encoding the plurality of picture frames of the video stream as P picture frames using backward prediction under H.264 coding.
  - 14. The method of claim 12, further comprising predicting and decoding an encoded picture frame for a given temporal layer only from a temporal layer equal or lower than the given temporal layer.
  - 15. The method of claim 12, further comprising recording enough pictures in the DPB to cover a round trip communication delay between the video encoder and the video decoder.
  - 16. The method of claim 12, further comprising predicting a picture frame using the selected reference frame.
  - 17. The method of claim 12, further comprising recording multiple encoded reference frames of the video stream in the DPB.
  - 18. The method of claim 12, further comprising transmitting the selected reference frame even when a layer 0 frame in the hierarchical P-structure has been lost.
  - 19. The method of claim 12, wherein the reference frame is selected in a round robin fashion.
  - 20. The method of claim 12, wherein the selected reference frame is immediately before the one or more lost picture frames.
  - 21. The method of claim 12, wherein the reference frame is selected based on an indication of a loss of picture frames in the video stream without detailed information on the one or more lost picture frames.
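
The encoder-side steps of claims 12, 15 and 20 (recording reference frames, sizing the buffer to the round trip delay, and selecting an earlier reference frame on negative feedback) might be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: frames are represented only by their indices, the class name `ReferenceBuffer` and its methods are hypothetical, and the one-frame margin added to the capacity is an arbitrary choice.

```python
import math
from collections import deque
from typing import Iterable, Optional

class ReferenceBuffer:
    """Toy stand-in for the encoder's buffer of already-encoded reference frames."""

    def __init__(self, frame_rate: int, round_trip_ms: int) -> None:
        # Claim 15: hold enough pictures to cover the round trip delay.
        # The extra "+ 1" frame of margin is an assumption of this sketch.
        self.capacity = math.ceil(frame_rate * round_trip_ms / 1000) + 1
        self._frames = deque(maxlen=self.capacity)  # oldest entries fall out

    def record(self, frame_index: int) -> None:
        """Remember an encoded reference frame by its index."""
        self._frames.append(frame_index)

    def select_on_nack(self, lost: Iterable[int]) -> Optional[int]:
        """Pick the stored frame immediately before the first lost frame.

        Returns None when no stored frame is earlier than the loss; what the
        encoder does in that case is outside the scope of this sketch.
        """
        first_lost = min(lost)
        earlier = [i for i in self._frames if i < first_lost]
        return max(earlier) if earlier else None

if __name__ == "__main__":
    buf = ReferenceBuffer(frame_rate=30, round_trip_ms=200)
    for i in range(20):
        buf.record(i)
    print(buf.capacity, buf.select_on_nack([17, 18]))
```

Choosing the newest stored frame that precedes the loss corresponds to claim 20; claims 19 and 21 describe other selection policies (round robin, or selection without detailed loss information) that this sketch does not cover.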

Current Assignee: Verizon Patent and Licensing Incorporated (Verizon Communications Inc.)
Original Assignee: Blue Jeans Network, Inc. (Verizon Communications Inc.)
Inventors: Weber, Emmanuel
Primary Examiner(s): Kelley, Christopher S
Assistant Examiner(s): Zhou, Zhihan
Application Number: US13/251,913
Publication Number: US 20120082226A1
Time in Patent Office: 1,429 Days
Field of Search: 375/240.12, 375/240.25, 375/240.21, 375/240.27, 375/240.29, 348/14.08, 348/14.09
US Class Current: 1/1

CPC Class Codes
H04N 19/114   Adapting the group of pictu...
H04N 19/166   concerning the amount of tr...
H04N 19/177   the unit being a group of p...
H04N 19/36   Scalability techniques invo...
H04N 19/895   in combination with error c...
H04N 7/141   between two video terminals...
H04N 7/152   Multipoint control units th...
