System and method for providing error resilience, random access and rate control in scalable video communications

US 8,718,137 B2
Filed: 08/12/2011
Issued: 05/06/2014
Est. Priority Date: 03/03/2006
Status: Active Grant

3 Assignments

0 Petitions

Abstract

Systems and methods for error resilient transmission, rate control, and random access in video communication systems that use scalable video coding are provided. Error resilience is obtained by using information from low resolution layers to conceal or compensate for the loss of high resolution layer information. The same mechanism is used for rate control by selectively eliminating high resolution layer information from transmitted signals; the receiver can compensate for this elimination using information from low resolution layers. Further, random access, or switching between low and high resolutions, is also achieved by using information from low resolution layers to compensate for high resolution spatial layer packets that may not have been received prior to the switching time.
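
The concealment mechanism described in the abstract can be illustrated with a minimal Python sketch. It assumes a single 2x spatial step (a 176x144 base layer and a 352x288 enhancement layer), one motion vector per macroblock, and nearest-neighbour mapping from each enhancement macroblock to its co-located base macroblock. The names `MotionVector`, `scale_motion_vector` and `conceal_from_base` are hypothetical and do not come from the patent or from any codec API; the sketch covers motion vectors only, not the residual, intra data or reference indicators also listed in claim 2.

```python
# Hypothetical sketch: conceal lost enhancement-layer motion vectors by
# scaling the co-located base-layer vectors to the enhancement resolution.
from dataclasses import dataclass
from typing import List, Optional, Tuple

Size = Tuple[int, int]  # (width, height) in pixels


@dataclass(frozen=True)
class MotionVector:
    dx: int  # horizontal displacement (units assumed, e.g. quarter-pel)
    dy: int  # vertical displacement


def scale_motion_vector(mv: MotionVector, src: Size, dst: Size) -> MotionVector:
    """Scale a motion vector from the src layer resolution to the dst resolution."""
    return MotionVector(round(mv.dx * dst[0] / src[0]), round(mv.dy * dst[1] / src[1]))


def conceal_from_base(
    base_mvs: List[List[MotionVector]],
    enh_mvs: List[List[Optional[MotionVector]]],
    base_size: Size,
    enh_size: Size,
) -> List[List[MotionVector]]:
    """Replace lost enhancement-layer vectors (None) with scaled base-layer vectors."""
    base_rows, base_cols = len(base_mvs), len(base_mvs[0])
    enh_rows, enh_cols = len(enh_mvs), len(enh_mvs[0])
    out = []
    for r in range(enh_rows):
        row = []
        for c in range(enh_cols):
            mv = enh_mvs[r][c]
            if mv is None:
                # Co-located base macroblock (nearest-neighbour mapping, assumed).
                br = r * base_rows // enh_rows
                bc = c * base_cols // enh_cols
                mv = scale_motion_vector(base_mvs[br][bc], base_size, enh_size)
            row.append(mv)
        out.append(row)
    return out
```

Vectors that did arrive are passed through unchanged; only the gaps are filled from the lower layer, which is the same substitution the abstract relies on when enhancement data is deliberately dropped for rate control.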

63 Citations

36 Claims

1. A video communication system comprising:
- a communication network,
- a conferencing server (including a combination of hardware and software) disposed in the network and linked to at least one receiving and at least one transmitting endpoint by at least one communication channel each over the communication network,
- at least one endpoint that transmits coded digital video using a scalable video coding format, and
- at least one receiving endpoint for decoding a digital video signal coded in a scalable video coding format supporting temporal scalability and at least one of spatial and quality scalability,
- wherein the scalable video coding format for spatial scalability includes a base spatial and at least one spatial enhancement layer, for quality scalability includes a base quality layer and at least one quality enhancement layer, and for temporal scalability includes a base temporal layer and at least one temporal enhancement layer, wherein the base temporal layers and enhancement temporal layers are interlinked by a threaded picture prediction structure for at least one of the spatial or quality scalability layers,
- wherein the conferencing server is configured to selectively eliminate or modify portions of the input video signals received from transmitting endpoints that correspond to layers higher than the base spatial or quality layer, prior to creating the output video signal that is forwarded to the at least one receiving endpoint, so that use of lower spatial or quality layer data is signaled or explicitly coded in the output video signal for use in decoding pictures at resolutions higher than the base spatial or quality layer, and
- wherein the conferencing server is further configured to control the transmission rate of the output video signal that is forwarded to the at least one receiving endpoint so that the retained portions of the input video signals received from transmitting endpoints that correspond to layers higher than the base spatial or quality layer do not adversely affect the smoothness of the output bit rate.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9):
  - 2. The system of claim 1, wherein the scalable video coding format is based on hybrid coding such as in H.264, VC-1 or AVS standards, and wherein the lower spatial or quality layer data that is signaled for use or explicitly coded in the output video signal forwarded to the at least one receiving endpoint is comprised of at least one of:
    - motion vector data, coded prediction error difference, intra data, and reference picture indicators, wherein the data is further appropriately scaled to the desired target resolution when explicitly coded in the output video signal that is transmitted to the one or more receiving endpoints.
  - 3. The system of claim 1 wherein the server is further configured to create the output video signal that is forwarded to the at least one receiving endpoint as one of:
    - a Transcoding Multipoint Control Unit using cascaded decoding and encoding;
      
      a Switching Multipoint Control Unit by selecting which input to transmit as output;
      
      a Scalable Video Communication Server using selective multiplexing; and
      
      a Compositing Scalable Video Communication Server using selective multiplexing and bitstream-level compositing.
  - 4. The system of claim 1, wherein an encoder of the at least one transmitting endpoint is configured to encode transmitted media as frames in a threaded coding structure having a number of different temporal levels, wherein a subset of the frames (“R”) is particularly selected for reliable transport and includes at least the frames of the lowest temporal layer in the threaded coding structure and such that the decoder can decode at least a portion of received media based on a reliably received frame of the type R after packet loss or error and thereafter is synchronized with the encoder, and wherein the server selectively eliminates portions of the input video signals received from transmitting endpoints that correspond to layers higher than the base spatial or quality layer in non-R frames only, prior to creating the output video signal that is forwarded to the at least one receiving endpoint.
  - 5. The system of claim 1, wherein the selective elimination or modification by the conferencing server is performed according to desired output bit rate requirements.
  - 6. The system of claim 1, wherein the at least one receiving endpoint is configured to display the decoded output picture at a desired spatial resolution that falls in between an immediately lower and an immediately higher spatial layer provided by the received coded video signal.
  - 7. The system of claim 6, wherein the at least one receiving endpoint is further configured to operate a decoding loop of the immediately higher spatial layer at the desired spatial resolution by scaling all coded data of the immediately higher spatial layer to the desired spatial resolution, and wherein the resultant drift is eliminated by using at least one of:
    - periodic intra pictures, periodic use of intra base layer mode, full resolution decoding of at least the lowest temporal layer of the immediately higher spatial layer.
  - 8. The system of claim 1, wherein the scalable video coding format is further configured with at least one of:
    - periodic intra pictures;
      
      periodic intra macroblocks; and
      
      threaded picture prediction;
      
      in order to avoid drift when the higher than the base spatial or quality layer's coded information that is modified or eliminated corresponds to the base temporal layer.
  - 9. The system of claim 1, wherein the receiving endpoint is further configured to operate at least one decoding loop for spatial or quality layers higher than the target spatial or quality layer for at least the base temporal layer, so that when the at least one receiving endpoint switches target layers it can immediately display decoded pictures at the new target layer resolution.
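
The "threaded picture prediction structure" of claim 1 and the R frames of claim 4 can be sketched for one common arrangement. This minimal Python sketch assumes three temporal levels with a period of four frames: level-0 frames (treated here as the R frames) predict only from the previous level-0 frame, and every other frame predicts from the nearest earlier frame of a lower temporal level. The claims do not fix the number of levels or the period, and the function names are hypothetical.

```python
# Hypothetical sketch of a dyadic threaded prediction structure.
from typing import Optional


def temporal_level(n: int, levels: int = 3) -> int:
    """Temporal level of frame n in a structure with a 2**(levels-1) frame period."""
    pos = n % (2 ** (levels - 1))
    if pos == 0:
        return 0  # base temporal layer (the R frames, in this sketch)
    level = levels - 1
    while pos % 2 == 0:  # each factor of two moves the frame one level down
        pos //= 2
        level -= 1
    return level


def reference_frame(n: int, levels: int = 3) -> Optional[int]:
    """Frame that frame n predicts from, or None for the initial intra frame."""
    lvl = temporal_level(n, levels)
    for m in range(n - 1, -1, -1):
        ref_lvl = temporal_level(m, levels)
        # Level 0 threads through level 0 only; higher levels use the
        # nearest earlier frame of a strictly lower level.
        if ref_lvl == 0 or ref_lvl < lvl:
            return m
    return None


def survives_layer_dropping(max_level: int, num_frames: int, levels: int = 3) -> bool:
    """True if keeping only levels <= max_level leaves every reference available."""
    kept = {n for n in range(num_frames) if temporal_level(n, levels) <= max_level}
    for n in kept:
        ref = reference_frame(n, levels)
        if ref is not None and ref not in kept:
            return False
    return True
```

For the first eight frames this yields levels `0, 2, 1, 2, 0, 2, 1, 2` and references `None, 0, 0, 2, 0, 4, 4, 6`. Because no frame predicts from a higher level, discarding the top levels never breaks prediction for what remains, which is why the lowest temporal layer is the natural candidate for reliable transport.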

10. A video communication system comprising:
- a communication network,
- one endpoint (including a combination of hardware and software) that transmits coded digital video using a scalable video coding format, and
- at least one receiving endpoint for decoding a digital video signal coded in a scalable video coding format supporting temporal scalability and at least one of spatial and quality scalability,
- wherein the scalable video coding format for spatial scalability includes a base spatial and at least one spatial enhancement layer, for quality scalability includes a base quality layer and at least one quality enhancement layer, and for temporal scalability includes a base temporal layer and at least one temporal enhancement layer, wherein the base temporal layers and enhancement temporal layers are interlinked by a threaded picture prediction structure for at least one of the spatial or quality scalability layers,
- wherein the transmitting endpoint is configured to selectively eliminate or modify portions of its coded video signal that correspond to layers higher than the base spatial or quality layer, prior to creating the output video signal that is forwarded to the at least one receiving endpoint, so that use of lower spatial or quality layer data is signaled or explicitly coded in the output video signal for use in decoding pictures at resolutions higher than the base spatial or quality layer, and
- wherein the transmitting endpoint is further configured to control the transmission rate of the output video signal that is transmitted to the at least one receiving endpoint so that the retained portions of its input video signal that correspond to layers higher than the base spatial or quality layer do not adversely affect the smoothness of the output bit rate.
- Dependent claims (11, 12, 13, 14, 15, 16, 17):
  - 11. The system of claim 10, wherein the scalable video coding format is based on hybrid coding such as in H.264, VC-1 or AVS standards, and wherein the lower spatial or quality layer data that is signaled for use or explicitly coded in the output video signal forwarded to the at least one receiving endpoint is comprised of at least one of:
    - motion vector data;
      
      coded prediction error difference;
      
      intra data; and
      
      reference picture indicators, wherein the data is further appropriately scaled to the desired target resolution when explicitly coded in the output video signal that is transmitted to the one or more receiving endpoints.
  - 12. The system of claim 10, wherein the transmitting endpoint is configured to encode transmitted media as frames in a threaded coding structure having a number of different temporal levels, wherein a subset of the frames (“R”) is particularly selected for reliable transport and includes at least the frames of the lowest temporal layer in the threaded coding structure and such that the decoder can decode at least a portion of received media based on a reliably received frame of the type R after packet loss or error and thereafter is synchronized with the encoder, and wherein the transmitting endpoint selectively eliminates portions of the input video signals received from transmitting endpoints that correspond to layers higher than the base spatial or quality layer in non-R frames only, prior to creating the output video signal that is transmitted to the at least one receiving endpoint.
  - 13. The system of claim 10, wherein the selective elimination or modification by the transmitting endpoint is performed according to desired output bit rate requirements.
  - 14. The system of claim 10, wherein the at least one receiving endpoint is configured to display the decoded output picture at a desired spatial resolution that falls in between an immediately lower and an immediately higher spatial layer provided by the received coded video signal.
  - 15. The system of claim 10, wherein the at least one receiving endpoint is further configured to operate a decoding loop of the immediately higher spatial layer at the desired spatial resolution by scaling all coded data of the immediately higher spatial layer to the desired spatial resolution, and wherein the resultant drift is eliminated by using at least one of:
    - periodic intra pictures, periodic use of intra base layer mode, full resolution decoding of at least the lowest temporal layer of the immediately higher spatial layer.
  - 16. The system of claim 10, wherein the scalable video coding format is further configured with at least one of:
    - periodic intra pictures;
      
      periodic intra macroblocks; and
      
      threaded picture prediction, in order to avoid drift when the higher than the base spatial or quality layer's coded information that is modified or eliminated corresponds to the base temporal layer.
  - 17. The system of claim 10, wherein the receiving endpoint is further configured to operate at least one decoding loop for spatial or quality layers higher than the target spatial or quality layer for at least the base temporal layer, so that when the at least one receiving endpoint switches target layers it can immediately display decoded pictures at the new target layer resolution.
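
Claims 12 and 13 (like claims 4 and 5) have the sender drop enhancement-layer data in non-R frames only, according to the output bit rate requirement. The minimal Python sketch below illustrates that policy. The packet fields, the per-interval bit budget, and the order in which droppable data is removed (highest temporal level first, then highest spatial layer) are assumptions for illustration; the claims do not prescribe this algorithm, and `LayerPacket` and `thin_for_budget` are hypothetical names.

```python
# Hypothetical sketch: thin spatial-enhancement data from non-R frames to meet a bit budget.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class LayerPacket:
    frame: int
    temporal_level: int  # 0 = base temporal layer (R frames)
    spatial_layer: int   # 0 = base spatial layer
    size_bits: int


def thin_for_budget(packets: List[LayerPacket], budget_bits: int) -> Tuple[List[LayerPacket], int]:
    """Drop spatial-enhancement packets of non-R frames until the total fits budget_bits.

    Returns the kept packets (in original order) and their total size. Base-layer
    and R-frame data are never dropped, so the total can stay above the budget
    once nothing droppable is left.
    """
    total = sum(p.size_bits for p in packets)
    droppable = [
        i for i, p in enumerate(packets) if p.spatial_layer > 0 and p.temporal_level > 0
    ]
    # Least important first: highest temporal level, then highest spatial layer.
    droppable.sort(key=lambda i: (packets[i].temporal_level, packets[i].spatial_layer), reverse=True)
    dropped = set()
    for i in droppable:
        if total <= budget_bits:
            break
        dropped.add(i)
        total -= packets[i].size_bits
    kept = [p for i, p in enumerate(packets) if i not in dropped]
    return kept, total
```

Because the R frames keep their enhancement data and, in a threaded structure, predict only from other R frames, any drift caused by the dropped data cannot propagate past the next R frame; between R frames the receiver falls back on the lower-layer data that remains.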

18. A method for video communication over a communication network, having a conferencing server disposed therein and linked to at least one receiving and at least one transmitting endpoint by at least one communication channel each over the communication network, the at least one endpoint transmitting coded digital video using a scalable video coding format, and the at least one receiving endpoint for decoding a digital video signal coded in a scalable video coding format supporting temporal scalability and at least one of spatial and quality scalability, wherein the scalable video coding format for spatial scalability includes a base spatial and at least one spatial enhancement layer, for quality scalability includes a base quality layer and at least one quality enhancement layer, and for temporal scalability includes a base temporal layer and at least one temporal enhancement layer, wherein the base temporal layers and enhancement temporal layers are interlinked by a threaded picture prediction structure for at least one of the spatial or quality scalability layers, the method comprising:
- at the conferencing server (including a combination of hardware and software), selectively eliminating or modifying portions of the input video signals received from transmitting endpoints that correspond to layers higher than the base spatial or quality layer prior to creating the output video signal that is forwarded to the at least one receiving endpoint, so that use of lower spatial or quality layer data is signaled or explicitly coded in the output video signal for use in decoding pictures at resolutions higher than the base spatial or quality layer, and
- at the conferencing server, controlling the transmission rate of the output video signal that is forwarded to the at least one receiving endpoint so that the retained portions of the input video signals received from transmitting endpoints that correspond to layers higher than the base spatial or quality layer do not adversely affect the smoothness of the output bit rate.
- Dependent claims (19, 20, 21, 22, 23, 24, 25, 26):
  - 19. The method of claim 18, wherein the scalable video coding format is based on hybrid coding such as in H.264, VC-1 or AVS standards, and wherein the lower spatial or quality layer data that is signaled for use or explicitly coded in the output video signal forwarded to the at least one receiving endpoint is comprised of at least one of:
    - motion vector data, coded prediction error difference, intra data, and reference picture indicators, wherein the data is further appropriately scaled to the desired target resolution when explicitly coded in the output video signal that is transmitted to the one or more receiving endpoints.
  - 20. The method of claim 18, wherein the server is further configured to create the output video signal that is forwarded to the at least one receiving endpoint as one of:
    - a Transcoding Multipoint Control Unit using cascaded decoding and encoding;
      
      a Switching Multipoint Control Unit by selecting which input to transmit as output;
      
      a Scalable Video Communication Server using selective multiplexing; and
      
      a Compositing Scalable Video Communication Server using selective multiplexing and bitstream-level compositing.
  - 21. The method of claim 18, further comprising, at an encoder of the at least one transmitting endpoint, encoding transmitted media as frames in a threaded coding structure having a number of different temporal levels, wherein a subset of the frames (“R”) is particularly selected for reliable transport and includes at least the frames of the lowest temporal layer in the threaded coding structure and such that the decoder can decode at least a portion of received media based on a reliably received frame of the type R after packet loss or error and thereafter is synchronized with the encoder, and wherein the server selectively eliminates or modifies portions of the input video signals received from transmitting endpoints that correspond to layers higher than the base spatial or quality layer in non-R frames only, prior to creating the output video signal that is forwarded to the at least one receiving endpoint.
  - 22. The method of claim 18, further comprising, at the conferencing server performing the selective elimination or modification according to desired output bit rate requirements.
  - 23. The method of claim 18, further comprising, at the at least one receiving endpoint displaying the decoded output picture at a desired spatial resolution that falls in between an immediately lower and an immediately higher spatial layer provided by the received coded video signal.
  - 24. The method of claim 23, further comprising, at the at least one receiving endpoint, operating a decoding loop of the immediately higher spatial layer at the desired spatial resolution by scaling all coded data of the immediately higher spatial layer to the desired spatial resolution, and wherein the resultant drift is eliminated by using at least one of:
    - periodic intra pictures, periodic use of intra base layer mode, full resolution decoding of at least the lowest temporal layer of the immediately higher spatial layer.
  - 25. The method of claim 18, wherein the scalable video coding format is further configured with at least one of:
    - periodic intra pictures;
      
      periodic intra macroblocks; and
      
      threaded picture prediction;
      
      in order to avoid drift when the higher than the base spatial or quality layer's coded information that is modified or eliminated corresponds to the base temporal layer.
  - 26. The method of claim 18, further comprising, at the at least one receiving endpoint operating at least one decoding loop for spatial or quality layers higher than the target spatial or quality layer for at least the base temporal layer, so that when the at least one receiving endpoint switches target layers it can immediately display decoded pictures at the new target layer resolution.
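
Claim 18 also requires that the retained enhancement-layer data not adversely affect the smoothness of the output bit rate, but the claims do not say how the server achieves this. One conventional option, shown here purely as an assumption, is a constant-rate pacer (a leaky-bucket style shaper) that schedules each packet so the outgoing link never carries more than `rate_bps`. The function `pace` and its parameters are hypothetical.

```python
# Hypothetical sketch: constant-rate pacing of the server's output packets.
from typing import List, Tuple


def pace(packets: List[Tuple[float, int]], rate_bps: float) -> List[float]:
    """Return the finish time (seconds) of each packet sent at a constant rate.

    packets: (arrival_time_s, size_bits) in arrival order. A packet starts
    transmitting once it has arrived and the previous packet has finished, so
    the output rate never exceeds rate_bps; bursts turn into queueing delay.
    """
    if rate_bps <= 0:
        raise ValueError("rate_bps must be positive")
    finish_times = []
    link_free_at = 0.0
    for arrival, size in packets:
        start = max(arrival, link_free_at)
        link_free_at = start + size / rate_bps
        finish_times.append(link_free_at)
    return finish_times
```

For three packets of 500, 500 and 1000 bits arriving together at 1 kbit/s, the finish times are 0.5 s, 1.0 s and 2.0 s. That queueing delay is what a server would weigh against thinning more aggressively under the bit-rate-driven elimination of claim 22.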

27. A video communication method comprising:
- a communication network,
- one endpoint (including a combination of hardware and software) that transmits coded digital video using a scalable video coding format, and
- at least one receiving endpoint for decoding a digital video signal coded in a scalable video coding format supporting temporal scalability and at least one of spatial and quality scalability,
- wherein the scalable video coding format for spatial scalability includes a base spatial and at least one spatial enhancement layer, for quality scalability includes a base quality layer and at least one quality enhancement layer, and for temporal scalability includes a base temporal layer and at least one temporal enhancement layer, wherein the base temporal layers and enhancement temporal layers are interlinked by a threaded picture prediction structure for at least one of the spatial or quality scalability layers,
- at the transmitting endpoint, selectively eliminating or modifying portions of its coded video signal that correspond to layers higher than the base spatial or quality layer, prior to creating the output video signal that is forwarded to the at least one receiving endpoint, so that use of lower spatial or quality layer data is signaled or explicitly coded in the output video signal for use in decoding pictures at resolutions higher than the base spatial or quality layer, and
- at the transmitting endpoint, controlling the transmission rate of the output video signal that is transmitted to the at least one receiving endpoint so that the retained portions of its input video signal that correspond to layers higher than the base spatial or quality layer do not adversely affect the smoothness of the output bit rate.
- Dependent claims (28, 29, 30, 31, 32, 33, 34):
  - 28. The method of claim 27, wherein the scalable video coding format is based on hybrid coding such as in H.264, VC-1 or AVS standards, and wherein the lower spatial or quality layer data that is signaled for use or explicitly coded in the output video signal forwarded to the at least one receiving endpoint is comprised of at least one of:
    - motion vector data;
      
      coded prediction error difference;
      
      intra data; and
      
      reference picture indicators, wherein the data is further appropriately scaled to the desired target resolution when explicitly coded in the output video signal that is transmitted to the one or more receiving endpoints.
  - 29. The method of claim 27, further comprising, at the transmitting endpoint encoding transmitted media as frames in a threaded coding structure having a number of different temporal levels, wherein a subset of the frames (“R”) is particularly selected for reliable transport and includes at least the frames of the lowest temporal layer in the threaded coding structure and such that the decoder can decode at least a portion of received media based on a reliably received frame of the type R after packet loss or error and thereafter is synchronized with the encoder, and wherein the transmitting endpoint selectively eliminates or modifies portions of its input video signal that correspond to layers higher than the base spatial or quality layer in non-R frames only, prior to creating the output video signal that is transmitted to the at least one receiving endpoint.
  - 30. The method of claim 27, further comprising, at the transmitting endpoint performing the selective elimination or modification according to desired output bit rate requirements.
  - 31. The method of claim 27, further comprising, at the at least one receiving endpoint displaying the decoded output picture at a desired spatial resolution that falls in between an immediately lower and an immediately higher spatial layer provided by the received coded video signal.
  - 32. The method of claim 31, further comprising, at the at least one receiving endpoint operating a decoding loop of the immediately higher spatial layer at the desired spatial resolution by scaling all coded data of the immediately higher spatial layer to the desired spatial resolution, and wherein the resultant drift is eliminated by using at least one of:
    - periodic intra pictures, periodic use of intra base layer mode, full resolution decoding of at least the lowest temporal layer of the immediately higher spatial layer.
  - 33. The method of claim 27, wherein the scalable video coding format is further configured with at least one of:
    - periodic intra pictures;
      
      periodic intra macroblocks; and
      
      threaded picture prediction, in order to avoid drift when the higher than the base spatial or quality layer's coded information that is modified or eliminated corresponds to the base temporal layer.
  - 34. The method of claim 27, further comprising, at the receiving endpoint operating at least one decoding loop for spatial or quality layers higher than the target spatial or quality layer for at least the base temporal layer, so that when the at least one receiving endpoint switches target layers it can immediately display decoded pictures at the new target layer resolution.
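
Claims 31 and 32 (like claims 6, 7, 14, 15, 23 and 24) cover displaying at a resolution between two coded spatial layers by running the immediately higher layer's decoding loop at the desired resolution, with all of that layer's coded data scaled down. The minimal Python sketch below only locates the bracketing layers and computes the factor that would be applied to the higher layer's coded data (for example its motion vectors). The three-layer 320x180 / 640x360 / 1280x720 configuration, the shared aspect ratio, and the name `bracket_layers` are assumptions.

```python
# Hypothetical sketch: find the spatial layers bracketing a target display size.
from typing import List, Tuple

Size = Tuple[int, int]  # (width, height)


def bracket_layers(layers: List[Size], target: Size) -> Tuple[int, int, float]:
    """Return (lower_index, higher_index, scale) for a target display size.

    layers must be sorted by increasing resolution and share one aspect ratio
    (assumed), so widths alone decide the bracket. scale is target width divided
    by the higher layer's width: the factor by which that layer's coded data
    would be scaled to run its decoding loop at the target resolution.
    """
    widths = [w for w, _ in layers]
    tw = target[0]
    if tw < widths[0]:
        raise ValueError("target is smaller than the base spatial layer")
    for i, w in enumerate(widths):
        if w == tw:
            return i, i, 1.0  # exact match: decode that layer as coded
        if w > tw:
            return i - 1, i, tw / w
    raise ValueError("target is larger than the highest spatial layer")
```

With the assumed layers, a 960x540 display maps to layers 1 and 2 with a scale of 0.75. Running the higher layer's loop at a resolution the encoder never used introduces drift, which claim 32 removes with periodic intra pictures, periodic intra base layer mode, or full resolution decoding of the lowest temporal layer.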

35. A non-transitory computer readable medium comprising a set of executable instructions to direct a processor to:
- communicate over a communication network, having a conferencing server disposed therein and linked to at least one receiving and at least one transmitting endpoint by at least one communication channel each over the communication network, the at least one endpoint transmitting coded digital video using a scalable video coding format, and the at least one receiving endpoint for decoding a digital video signal coded in a scalable video coding format supporting temporal scalability and at least one of spatial and quality scalability, wherein the scalable video coding format for spatial scalability includes a base spatial and at least one spatial enhancement layer, for quality scalability includes a base quality layer and at least one quality enhancement layer, and for temporal scalability includes a base temporal layer and at least one temporal enhancement layer, wherein the base temporal layers and enhancement temporal layers are interlinked by a threaded picture prediction structure for at least one of the spatial or quality scalability layers,
- at the conferencing server (including a combination of hardware and software), selectively eliminate or modify portions of the input video signals received from transmitting endpoints that correspond to layers higher than the base spatial or quality layer prior to creating the output video signal that is forwarded to the at least one receiving endpoint, so that use of lower spatial or quality layer data is signaled or explicitly coded in the output video signal for use in decoding pictures at resolutions higher than the base spatial or quality layer, and
- at the conferencing server, control the transmission rate of the output video signal that is forwarded to the at least one receiving endpoint so that the retained portions of the input video signals received from transmitting endpoints that correspond to layers higher than the base spatial or quality layer do not adversely affect the smoothness of the output bit rate.

36. A non-transitory computer readable medium comprising a set of executable instructions to direct a processor to:
- communicate over a communication network coupled to one endpoint (including a combination of hardware and software) that transmits coded digital video using a scalable video coding format, and further coupled to at least one receiving endpoint for decoding a digital video signal coded in a scalable video coding format supporting temporal scalability and at least one of spatial and quality scalability,
- wherein the scalable video coding format for spatial scalability includes a base spatial and at least one spatial enhancement layer, for quality scalability includes a base quality layer and at least one quality enhancement layer, and for temporal scalability includes a base temporal layer and at least one temporal enhancement layer, wherein the base temporal layers and enhancement temporal layers are interlinked by a threaded picture prediction structure for at least one of the spatial or quality scalability layers,
- at the transmitting endpoint, selectively eliminate or modify portions of its coded video signal that correspond to layers higher than the base spatial or quality layer, prior to creating the output video signal that is forwarded to the at least one receiving endpoint, so that use of lower spatial or quality layer data is signaled or explicitly coded in the output video signal for use in decoding pictures at resolutions higher than the base spatial or quality layer, and
- at the transmitting endpoint, control the transmission rate of the output video signal that is transmitted to the at least one receiving endpoint so that the retained portions of its input video signal that correspond to layers higher than the base spatial or quality layer do not adversely affect the smoothness of the output bit rate.

Current Assignee: Vidyo Incorporated (Enghouse Systems Limited)
Original Assignee: Vidyo Incorporated (Enghouse Systems Limited)
Inventors: Eleftheriadis, Alexandros; Hong, Danny; Shapiro, Ofer; Wiegand, Thomas
Primary Examiner(s): HOLDER, BRADLEY W
Application Number: US13/209,023
Publication Number: US 20110305275A1
Time in Patent Office: 998 Days
Field of Search: 375/240.12, 375/240, 375/240.01
US Class Current: 375/240.12

CPC Class Codes

H04N 19/29   involving scalability at th...

H04N 19/30   using hierarchical techniqu...

H04N 19/31   in the temporal domain

H04N 19/33   in the spatial domain

H04N 19/36   Scalability techniques invo...

H04N 19/44   Decoders specially adapted ...

H04N 19/593   involving spatial predictio...

H04N 19/65   using error resilience

H04N 19/89   involving methods or arrang...

H04N 19/895   in combination with error c...

H04N 7/15   Conference systems
