Method and apparatus for dynamically adjusting the playout delay of audio signals
Abstract
Disclosed is a method and apparatus for dynamically adjusting the playout delay of audio signals. It comprises three kinds of dynamic adjustment: playout delay, silence length, and jitter buffer size. The playout delay is adjusted in real time according to the probability distribution of the number of packets buffered in a jitter buffer. A voice activity detection mechanism detects silence within a voice packet. By dynamically adjusting the silence length in the voice packets, the invention reduces the impact of network variation on voice quality. It also overcomes the drawback of conventional techniques for estimating playout delay and reduces the overall computational complexity of determining the playout delay for the voice packets.
6 Claims
1. A method for dynamically adjusting playout delay of audio signals encoded into a sequence of voice packets and transmitted from a transmitting end through a packet-switched network to a receiving end, said method comprising the steps of:
storing a plurality of said voice packets in a jitter buffer at said receiving end, and dynamically determining whether to adjust silence length in said voice packets based on the number of said voice packets in said jitter buffer in order to adjust said playout delay;
dividing said jitter buffer into three zones for temporarily storing said voice packets, and providing dynamic adjustment of silence length to extend or shrink said playout delay; and
dynamically adjusting the sizes of said three zones of said jitter buffer according to the number of said voice packets in said jitter buffer;
wherein said step of dynamically adjusting the sizes of said three zones further comprises the steps of:
mapping said jitter buffer into five zones according to the number of said voice packets in said jitter buffer, said five zones including a no data to play zone A0, an extending silence zone A1, a normal delay zone A2, a shrinking silence zone A3, and a discarding voice packet zone A4, thereby said jitter buffer being divided into said zone A1, said zone A2, and said zone A3 with said zone A2 having a lower bound of normal delay L and an upper bound of normal delay U;
using a probability model to obtain PTn(Ai) of said zone Ai over a next time interval [Tn,Tn+1], said PTn(Ai) being the probability that the number of said voice packets in said jitter buffer falls into said zone Ai in the time interval [Tn,Tn+1], i being an integer number from 0 to 4 and n being a natural number; and
comparing pre-defined values TA0, TA1 and TA3, with said probability PTn(A0), PTn(A1), and PTn(A3) to determine whether to adjust said upper bound of normal delay U and said lower bound of normal delay L.
Dependent claims: 2, 3.
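As an illustration of the last two steps of claim 1, the following Python sketch estimates the zone probabilities and compares them with the pre-defined thresholds. It is not the claimed probability model, which the claim leaves unspecified: here PTn(Ai) is approximated by the empirical share of recent buffer-occupancy samples that fall in zone Ai. The function names, the boundary conventions (inclusive L and U, and a `max_delay` cutoff for zone A4), and the pairing of TA0 and TA1 with L and of TA3 with U are assumptions made for the example.

```python
from collections import Counter


def zone_of(count, lower, upper, max_delay):
    """Map a packet count to a zone index 0..4 (A0..A4).

    Boundary handling (inclusive lower/upper) is an assumption.
    """
    if count == 0:
        return 0  # A0: no data to play
    if count < lower:
        return 1  # A1: extending silence
    if count <= upper:
        return 2  # A2: normal delay
    if count <= max_delay:
        return 3  # A3: shrinking silence
    return 4      # A4: discarding voice packets


def zone_probabilities(samples, lower, upper, max_delay):
    """Empirical stand-in for PTn(Ai): fraction of samples in each zone."""
    if not samples:
        raise ValueError("need at least one occupancy sample")
    hits = Counter(zone_of(c, lower, upper, max_delay) for c in samples)
    return [hits[i] / len(samples) for i in range(5)]


def bounds_need_adjusting(probs, t_a0, t_a1, t_a3):
    """Return (adjust_lower, adjust_upper) flags.

    Hypothetical pairing: A0/A1 thresholds drive L, the A3 threshold drives U.
    The direction and size of any adjustment are deliberately left out.
    """
    adjust_lower = probs[0] > t_a0 or probs[1] > t_a1
    adjust_upper = probs[3] > t_a3
    return adjust_lower, adjust_upper


if __name__ == "__main__":
    recent_counts = [0, 1, 2, 5, 5, 6, 9, 12, 5, 5]  # made-up occupancy samples
    p = zone_probabilities(recent_counts, lower=3, upper=8, max_delay=10)
    print(p, bounds_need_adjusting(p, t_a0=0.05, t_a1=0.3, t_a3=0.2))
```

The sketch only reports whether L or U should be revisited, because the claim states that the comparison determines whether to adjust the bounds and does not say by how much or in which direction.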
4. An apparatus used in a packet-switched network for dynamically adjusting playout delay of audio signals, comprising:
a jitter buffer for temporarily storing a plurality of received voice packets, and delaying and re-ordering playout time of said voice packets;
a dynamic playout delay adjustment module for dividing said jitter buffer into three zones, and dynamically extending or shrinking silence length of said voice packets to adjust said playout delay of said voice packets according to the number of said voice packets in said jitter buffer;
a dynamic silence length adjustment module for dynamically adjusting a shrinking size or an extending size of said silence length according to the number of said voice packets in said jitter buffer; and
a dynamic jitter buffer zone adjustment module for dynamically adjusting the sizes of said three zones of said jitter buffer according to the number of said voice packets in said jitter buffer;
wherein at least one of said jitter buffer, said dynamic playout delay adjustment module, said dynamic silence length adjustment module and said dynamic jitter buffer zone adjustment module in said apparatus is a hardware module, and said jitter buffer is mapped into an extending silence zone A1 in which the number of said voice packets in said jitter buffer is below a lower bound of normal delay L, a normal delay zone A2 in which the number of said voice packets in said jitter buffer is in a normal range between said lower bound of normal delay L and an upper bound of normal delay U, and a shrinking silence zone A3 in which the number of said voice packets in said jitter buffer is above said upper bound of normal delay U;
when said jitter buffer contains no voice packets for playout, said jitter buffer falls into a no data to play zone A0; and
when said jitter buffer contains more voice packets for playout than a maximum acceptable delay Max, said jitter buffer falls into a discarding voice packet zone A4.
Dependent claims: 5, 6.
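The behaviour of the dynamic silence length adjustment module in claim 4 can be pictured with a short Python sketch. The claim says only that the extending or shrinking size depends on the number of buffered packets; the linear rule below (one packet duration of silence per packet of distance from the normal zone), the function name, and the `ms_per_packet` parameter are assumptions chosen for illustration, not the claimed sizing rule.

```python
def silence_adjustment_ms(count, lower, upper, max_delay, ms_per_packet=20):
    """Return a signed silence adjustment in milliseconds.

    Positive values extend silence (zone A1), negative values shrink it
    (zone A3), and zero means no silence adjustment (zones A0, A2, A4).
    The linear sizing rule is hypothetical.
    """
    if count == 0 or count > max_delay:
        # A0 (no data to play) and A4 (discard packets) are outside
        # the range where silence length is adjusted.
        return 0
    if count < lower:
        # A1: buffer is running low, so lengthen silence to delay playout.
        return (lower - count) * ms_per_packet
    if count > upper:
        # A3: buffer is running high, so shorten silence to catch up.
        return -(count - upper) * ms_per_packet
    # A2: occupancy is within the normal delay range [lower, upper].
    return 0


if __name__ == "__main__":
    for n in (0, 1, 5, 9, 12):
        print(n, silence_adjustment_ms(n, lower=3, upper=8, max_delay=10))
```

A signed return value keeps the extend and shrink cases in one code path; a real module would also need to cap the adjustment at the silence actually available in the packet.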
Specification