Sound packet reproducing method, sound packet reproducing apparatus, sound packet reproducing program, and recording medium
Abstract
The present invention prevents a receiving buffer from becoming empty by: storing received packets in the receiving buffer; detecting, by a state detecting part, the largest arrival delay jitter of the packets and the buffer level of the receiving buffer; obtaining, by a control part, an optimum buffer level for the largest delay jitter using a predetermined table; determining, based on the detected buffer level and the optimum buffer level, the level of urgency of the need to adjust the buffer level; and expanding or reducing, by a consumption adjusting part, the waveform of a decoded audio data stream of the current frame decoded from a packet read out of the receiving buffer, thereby adjusting the consumption of reproduction frames on the basis of the urgency level, the detected buffer level, and the optimum buffer level.
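The control flow summarized in the abstract can be illustrated with a minimal Python sketch: a largest observed jitter is mapped to an optimum buffer level through a lookup table, and the gap between the detected and optimum levels is graded on a small scale. The table values, the clamping to five levels, and every function name below are assumptions made for illustration; the source states only that a predetermined table and an urgency level are used.

```python
from bisect import bisect_left

# Hypothetical table: (upper bound of largest jitter in ms, optimum buffered packets).
# The patent only says the relation is "predetermined"; these numbers are invented.
JITTER_TO_OPTIMUM = [(20, 1), (40, 2), (80, 4), (160, 8)]


def largest_jitter(jitters_ms, window):
    """Largest arrival jitter among the most recent `window` observations (window > 0)."""
    recent = jitters_ms[-window:]
    return max(recent) if recent else 0.0


def optimum_buffered_packets(jitter_ms):
    """Look up the optimum buffer level for a given largest jitter."""
    bounds = [bound for bound, _ in JITTER_TO_OPTIMUM]
    # Jitter beyond the last bound falls back to the last table row.
    row = min(bisect_left(bounds, jitter_ms), len(JITTER_TO_OPTIMUM) - 1)
    return JITTER_TO_OPTIMUM[row][1]


def urgency_level(buffered, optimum, max_level=2):
    """Grade (buffered - optimum) on a scale from -max_level to +max_level.

    Negative values mean the buffer is below the optimum (risk of running empty),
    positive values mean it holds more packets than needed (extra delay).
    """
    return max(-max_level, min(max_level, buffered - optimum))


jitter = largest_jitter([5.0, 35.0, 10.0], window=3)
optimum = optimum_buffered_packets(jitter)
level = urgency_level(buffered=0, optimum=optimum)
```

In this sketch a negative level would call for expanding the waveform (slower consumption of packets) and a positive level for reducing it.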
12 Claims
1. A reproducing method for receiving a stream of sent audio packets containing an audio code generated by encoding an input audio data stream frame by frame and reproducing an audio signal, comprising:
(a) storing received packets in a receiving buffer;
(b) detecting, in a control part, a largest delay jitter and a number of buffered packets, the largest jitter being any of a largest value or statistical value of jitter obtained by observing arrival jitter of the received packets over a predetermined period of time and the number of buffered packets being a number of packets stored in the receiving buffer;
(c) obtaining, in a control part and based on the largest delay jitter, an optimum number of buffered packets by using a predetermined relation between the largest delay jitter and the optimum number of buffered packets, the optimum number of buffered packets being an optimum number of packets to be stored in the receiving buffer;
(d) determining, in the control part, on a scale of a plurality of levels, a difference between the detected number of buffered packets and the optimum number of buffered packets;
(e) retrieving, by the control part, a packet corresponding to a current frame from the receiving buffer and decoding an audio code in the packet to obtain a decoded audio data stream in the current frame; and
(f) performing, in a consumption adjusting part, any of expansion, reduction, and preservation of a waveform of the decoded audio data stream in accordance with a rule to make the number of buffered packets close to the optimum number of buffered packets, the rule being established for each level of the difference, and outputting a result as audio data of the current frame,
wherein step (f) includes obtaining a pitch length of the decoded audio data stream, analyzing the audio data stream to determine whether the audio data stream is in a voice segment or a non-voice segment, and performing any of expansion, reduction, and preservation by inserting or removing a waveform corresponding to the pitch length in the decoded audio string or by not changing the decoded audio signal string, on the basis of a result of the determination of voice/non-voice segment and a result of the determination of the difference.
Dependent claims: 2, 3
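Step (f) of claim 1 can be sketched in Python as follows. This is a simplified illustration, not the patented implementation: the autocorrelation pitch estimate, the rule that voice segments are altered only at a difference level of magnitude 2 or more, and all function names are assumptions. A practical system would also cross-fade at the splice point, which is omitted here.

```python
def estimate_pitch_length(frame, min_lag, max_lag):
    """Pitch length in samples, taken as the lag with the largest autocorrelation."""
    def autocorr(lag):
        return sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
    return max(range(min_lag, max_lag + 1), key=autocorr)


def choose_action(is_voice, level):
    """Hypothetical rule per difference level (negative: buffer below optimum)."""
    if level == 0:
        return "preserve"
    if is_voice and abs(level) < 2:
        # Assumed policy: leave speech untouched unless the situation is urgent.
        return "preserve"
    return "expand" if level < 0 else "reduce"


def adjust_frame(frame, pitch, action):
    """Insert or remove one pitch-length waveform, or return the frame unchanged."""
    if action == "expand":
        return frame + frame[-pitch:]   # repeat the last pitch period once
    if action == "reduce":
        return frame[:-pitch]           # drop the last pitch period
    return list(frame)


frame = [0, 1, 0, -1] * 10              # toy signal with a period of 4 samples
pitch = estimate_pitch_length(frame, min_lag=2, max_lag=8)
longer = adjust_frame(frame, pitch, choose_action(is_voice=False, level=-1))
shorter = adjust_frame(frame, pitch, choose_action(is_voice=True, level=2))
```

Expanding lengthens playback of the current frame so packets are consumed more slowly and the buffer can refill; reducing does the opposite.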
4. A reproducing method for receiving a stream of sent audio packets containing an audio code generated by encoding an input audio data stream frame by frame and reproducing an audio signal, comprising:
(a) storing received packets in a receiving buffer;
(b) detecting, in a control part, a largest delay jitter and a number of buffered packets, the largest jitter being any of a largest value or statistical value of jitter obtained by observing arrival jitter of the received packets over a predetermined period of time and the number of buffered packets being a number of packets stored in the receiving buffer;
(c) obtaining, in a control part and based on the largest delay jitter, an optimum number of buffered packets by using a predetermined relation between the largest delay jitter and the optimum number of buffered packets, the optimum number of buffered packets being an optimum number of packets to be stored in the receiving buffer;
(d) determining, in the control part, on a scale of a plurality of levels, a difference between the detected number of buffered packets and the optimum number of buffered packets;
(e) retrieving, by the control part, a packet corresponding to a current frame from the receiving buffer and decoding an audio code in the packet to obtain a decoded audio data stream in the current frame; and
(f) performing, in a consumption adjusting part, any of expansion, reduction, and preservation of a waveform of the decoded audio data stream in accordance with a rule to make the number of buffered packets close to the optimum number of buffered packets, the rule being established for each level of the difference, and outputting a result as audio data of the current frame,
wherein step (f) includes obtaining the pitch length of the decoded audio data stream, analyzing the decoded audio data stream to determine which of a voiced sound segment, an unvoiced sound segment, a background noise segment, and a silence segment the decoded audio data stream is in, and performing any of expansion, reduction, and preservation of the decoded audio data stream by inserting or removing a waveform corresponding to the pitch length in the decoded audio data stream or by not changing the decoded audio data stream, on the basis of the result of the segment determination and the result of the determination of the difference level.
Dependent claims: 5
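The four-way segment determination of claim 4 lends itself to a table-driven rule. The Python sketch below shows one possible shape for such a rule; the thresholds (silence and background noise adjusted at any nonzero level, unvoiced sound from level 2, voiced sound from level 3) and all names are hypothetical and are not taken from the claim, which requires only that a rule exist for each difference level.

```python
from enum import Enum


class Segment(Enum):
    VOICED = "voiced"
    UNVOICED = "unvoiced"
    BACKGROUND_NOISE = "background_noise"
    SILENCE = "silence"


# Hypothetical policy: the smallest |difference level| at which a segment
# of the given type may be expanded or reduced.
MIN_LEVEL_TO_ADJUST = {
    Segment.SILENCE: 1,
    Segment.BACKGROUND_NOISE: 1,
    Segment.UNVOICED: 2,
    Segment.VOICED: 3,
}


def action_for(segment, level):
    """Return 'expand', 'reduce', or 'preserve' for the current frame.

    `level` is negative when fewer packets are buffered than the optimum
    and positive when more are buffered.
    """
    if abs(level) < MIN_LEVEL_TO_ADJUST[segment]:
        return "preserve"
    return "expand" if level < 0 else "reduce"
```

Keeping the policy in a dictionary means the rule for each segment type can be tuned without touching the decision logic.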
6. A reproducing apparatus for audio packets which receives a stream of sent audio packets containing an audio code generated by encoding an input audio data stream frame by frame and reproduces an audio signal, comprising:
a packet receiving part configured to receive audio packets from a packet communication network;
a receiving buffer configured to temporarily store the received packets and configured to read out packets in response to a request;
a state detecting part configured to detect a largest delay jitter and a number of buffered packets, the largest jitter being any of a largest value or statistical value of jitter obtained by observing arrival jitter of the received packets over a predetermined period of time and the number of buffered packets being a number of packets stored in the receiving buffer;
a control part configured to obtain based on the largest delay jitter an optimum number of buffered packets by using a predetermined relation between the largest delay jitter and the optimum number of buffered packets, the optimum number of buffered packets being an optimum number of packets to be stored in the receiving buffer, determine, on a scale of a plurality of levels, a difference between the detected number of buffered packets and the optimum number of buffered packets, and generate a control signal to perform any of expansion, reduction, and preservation of a waveform of the decoded audio data stream in accordance with a rule to make the number of buffered packets close to the optimum number of buffered packets, the rule being established for each level of the difference;
an audio packet decoding part configured to decode an audio code in a packet corresponding to a current frame extracted from the receiving buffer to obtain a decoded audio data stream in the current frame;
a consumption adjusting part configured to perform any of expansion, reduction, and preservation of the waveform of the decoded audio data stream in accordance with the control signal and configured to output a result as sound data of the current frame; and
an audio analyzing part configured to analyze the decoded audio data stream to determine whether the decoded audio data stream is in a voice segment or a non-voice segment, the audio analyzing part providing a result of the determination to the control part, the audio control part obtaining a pitch length of the decoded audio data stream and providing the pitch length to the consumption adjusting part,
wherein, the control part provides control to cause the consumption adjusting part to perform any of expansion, reduction, and preservation of the decoded audio data stream of the current frame, on the basis of a result of the segment determination and a result of the difference level determination, and the consumption adjusting part inserts or removes a waveform corresponding to the pitch length in the decoded audio data stream or does not change the decoded audio data stream, in accordance with the control.
Dependent claims: 7, 8
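The state detecting part of claim 6 may report either the largest value or a statistical value of the observed arrival jitter. The Python sketch below shows both options under stated assumptions: jitter is defined here as the absolute deviation of each inter-arrival time from the nominal frame interval, and the statistical value is a nearest-rank percentile. Neither definition is given in the claim, and the function names are hypothetical.

```python
import math


def arrival_jitter_ms(arrival_ms, frame_ms):
    """Per-packet jitter: |inter-arrival time - nominal frame interval| (assumed definition)."""
    return [abs((later - earlier) - frame_ms)
            for earlier, later in zip(arrival_ms, arrival_ms[1:])]


def jitter_measure(jitters, percentile=None):
    """Largest jitter in the window, or a nearest-rank percentile if one is requested."""
    if not jitters:
        return 0.0
    if percentile is None:
        return max(jitters)
    ordered = sorted(jitters)
    rank = max(1, math.ceil(percentile / 100 * len(ordered)))
    return ordered[rank - 1]


# Packets nominally 20 ms apart; arrival times in ms over the observation window.
jitters = arrival_jitter_ms([0, 20, 45, 60, 100], frame_ms=20)
worst = jitter_measure(jitters)
median_like = jitter_measure(jitters, percentile=50)
```

A percentile is less sensitive to a single late packet than the maximum, at the cost of occasionally underestimating the buffer depth needed.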
9. A reproducing apparatus for audio packets which receives a stream of sent audio packets containing an audio code generated by encoding an input audio data stream frame by frame and reproduces an audio signal, comprising:
a packet receiving part configured to receive audio packets from a packet communication network;
a receiving buffer configured to temporarily store the received packets and reading out packets in response to a request;
a state detecting part configured to detect a largest delay jitter and a number of buffered packets, the largest jitter being any of a largest value or statistical value of jitter obtained by observing arrival jitter of the received packets over a predetermined period of time and the number of buffered packets being a number of packets stored in the receiving buffer;
a control part configured to obtain based on the largest delay jitter an optimum number of buffered packets by using a predetermined relation between the largest delay jitter and the optimum number of buffered packets, the optimum number of buffered packets being the optimum number of packets to be stored in the receiving buffer, determine, on a scale of a plurality of levels, a difference between the detected number of buffered packets and the optimum number of buffered packets, and generate a control signal for instructing to perform any of expansion, reduction, and preservation of a waveform of the decoded audio data stream in accordance with a rule to make the number of buffered packets close to the optimum number of buffered packets, the rule being established for each level of the difference;
an audio packet decoding part configured to decode an audio code in a packet corresponding to a current frame extracted from the receiving buffer to obtain a decoded audio data stream in the current frame; and
a consumption adjusting part configured to perform any of expansion, reduction, and preservation of the waveform of the decoded audio data stream in accordance with the control signal and outputs a result as sound data of the current frame,
wherein the audio analyzing part analyzes the decoded audio data stream to determine whether the decoded audio data stream includes a voiced sound segment, an unvoiced sound segment, a background noise segment, and a silence segment, provides a result of the determination to the control part, obtains a pitch length of the decoded audio data stream, and provides the pitch length to the consumption adjusting part;
the control part provides a control based on a result of the segment determination and a result of the difference level determination to the consumption adjusting part to perform any of expansion, reduction, and preservation of the decoded audio data stream of a current frame; and
the consumption adjusting part, in accordance with the control, inserts or removes a waveform corresponding to the pitch length in the decoded audio data stream or does not change the decoded audio data stream.
Dependent claims: 10
11. A computer-readable recording medium storing computer-readable instructions thereon, the computer-readable instructions when executed by a computer cause the computer to perform the method comprising:
storing received packets in a receiving buffer;
detecting a largest delay jitter and a number of buffered packets, the largest jitter being any of a largest value or statistical value of jitter obtained by observing arrival jitter of the received packets over a predetermined period of time and the number of buffered packets being a number of packets stored in the receiving buffer;
obtaining, based on the largest delay jitter, an optimum number of buffered packets by using a predetermined relation between the largest delay jitter and the optimum number of buffered packets, the optimum number of buffered packets being an optimum number of packets to be stored in the receiving buffer;
determining, on a scale of a plurality of levels, a difference between the detected number of buffered packets and the optimum number of buffered packets;
retrieving a packet corresponding to a current frame from the receiving buffer and decoding an audio code in the packet to obtain a decoded audio data stream in the current frame; and
performing any of expansion, reduction, and preservation of a waveform of the decoded audio data stream in accordance with a rule to make the number of buffered packets close to the optimum number of buffered packets, the rule being established for each level of the difference, and outputting a result as audio data of the current frame,
wherein the performing includes obtaining a pitch length of the decoded audio data stream, analyzing the audio data stream to determine whether the audio data stream is in a voice segment or a non-voice segment, and performing any of expansion, reduction, and preservation by inserting or removing a waveform corresponding to the pitch length in the decoded audio string or by not changing the decoded audio signal string, on the basis of a result of the determination of voice/non-voice segment and a result of the determination of the difference level.
12. A computer-readable medium storing computer-readable instructions thereon, the computer readable instructions when executed by a computer cause the computer to perform the method comprising:
storing received packets in a receiving buffer;
detecting a largest delay jitter and a number of buffered packets, the largest jitter being any of a largest value or statistical value of jitter obtained by observing arrival jitter of the received packets over a predetermined period of time and the number of buffered packets being a number of packets stored in the receiving buffer;
obtaining, based on the largest delay jitter, an optimum number of buffered packets by using a predetermined relation between the largest delay jitter and the optimum number of buffered packets, the optimum number of buffered packets being an optimum number of packets to be stored in the receiving buffer;
determining, on a scale of a plurality of levels, a difference between the detected number of buffered packets and the optimum number of buffered packets;
retrieving a packet corresponding to a current frame from the receiving buffer and decoding an audio code in the packet to obtain a decoded audio data stream in the current frame; and
performing any of expansion, reduction, and preservation of a waveform of the decoded audio data stream in accordance with a rule to make the number of buffered packets close to the optimum number of buffered packets, the rule being established for each level of the difference, and outputting a result as audio data of the current frame,
wherein the performing includes obtaining the pitch length of the decoded audio data stream, analyzing the decoded audio data stream to determine which of a voiced sound segment, an unvoiced sound segment, a background noise segment, and a silence segment the decoded audio data stream is in, and performing any of expansion, reduction, and preservation of the decoded audio data stream by inserting or removing a waveform corresponding to the pitch length in the decoded audio data stream or by not changing the decoded audio data stream, on the basis of a result of the segment determination and a result of the determination of the difference level.
Specification