Packet prioritization and associated bandwidth and buffer management techniques for audio over IP
Abstract
The present invention is directed to voice communication devices in which an audio stream is divided into a sequence of individual packets, each of which is routed via pathways that can vary depending on the availability of network resources. All embodiments of the invention rely on an acoustic prioritization agent that assigns a priority value to the packets. The priority value is based on factors such as whether the packet contains voice activity and the degree of acoustic similarity between this packet and adjacent packets in the sequence. A confidence level, associated with the priority value, may also be assigned. In one embodiment, network congestion is reduced by deliberately failing to transmit packets that are judged to be acoustically similar to adjacent packets; the expectation is that, under these circumstances, traditional packet loss concealment algorithms in the receiving device will construct an acceptably accurate replica of the missing packet. In another embodiment, the receiving device can reduce the number of packets stored in its jitter buffer, and therefore the latency of the speech signal, by selectively deleting one or more packets within sustained silences or non-varying speech events. In both embodiments, the ability of the system to drop appropriate packets may be enhanced by taking into account the confidence levels associated with the priority assessments.
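The prioritization idea in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the RMS-energy voice test, the cosine-similarity measure, the thresholds, and every name (`Priority`, `prioritize`, `ENERGY_FLOOR`, `SIMILARITY_DROP`) are assumptions chosen only to show how voice activity, a confidence level, and similarity to the adjacent frame could combine into a drop decision.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence

ENERGY_FLOOR = 1e-4     # hypothetical RMS level separating silence from speech
SIMILARITY_DROP = 0.95  # hypothetical similarity above which a frame is redundant


@dataclass
class Priority:
    is_voice: bool
    confidence: float  # 0..1, how sure the sketch is about is_voice
    similarity: float  # 0..1, similarity to the previous frame
    droppable: bool


def rms(frame: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    # Stand-in for a real acoustic similarity measure (assumption).
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 1.0 if na == nb else 0.0
    dot = sum(x * y for x, y in zip(a, b))
    return min(1.0, max(0.0, dot / (na * nb)))


def prioritize(frame: Sequence[float],
               previous: Optional[Sequence[float]]) -> Priority:
    level = rms(frame)
    is_voice = level >= ENERGY_FLOOR
    # Confidence grows with the log-distance from the threshold, capped at 1.
    confidence = min(1.0, abs(math.log10(max(level, 1e-12) / ENERGY_FLOOR)))
    similarity = cosine_similarity(frame, previous) if previous is not None else 0.0
    # Drop confident silence, or frames nearly identical to their neighbor.
    droppable = (not is_voice and confidence >= 0.5) or similarity >= SIMILARITY_DROP
    return Priority(is_voice, confidence, similarity, droppable)
```

In this sketch a sender could skip transmitting frames marked `droppable`, relying on the receiver's packet loss concealment to fill the gap, as the abstract describes.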
74 Claims
1. A method for processing voice communications over a data network, comprising:
(a) receiving a voice stream from a user, the voice stream comprising a plurality of temporally distinct segments;
(b) processing at least one selected first segment of the voice stream, wherein the processing step comprises at least one of the following substeps:
(i) determining whether or not the contents of the selected first segment are the product of voice activity and, when the contents are determined not to be the product of voice activity, a level of confidence that the voice activity determination is accurate;
(ii) determining a type of voice activity associated with the contents of the first segment; and
(iii) comparing the first segment with a second segment of the voice stream to determine a degree of acoustic similarity between the first and second segments, wherein the processing of the first segment is based on at least one of the level of confidence, the type of voice activity, and the degree of acoustic similarity.
Dependent claims: 2-21.
22. A method for managing a receive buffer, comprising:
providing a receive buffer, the receive buffer containing a plurality of packets associated with voice communications; and
based on a level of importance associated with at least some of the plurality of packets, removing at least some of the packets from the receive buffer while leaving other packets in the receive buffer.
Dependent claims: 23-34.
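A receive-buffer manager of the kind claimed here might look like the following sketch. The importance scale (0 is least important), the `max_droppable` cutoff, and the names `BufferedPacket` and `prune_receive_buffer` are hypothetical; the claim itself does not fix how importance is encoded or how many packets are removed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BufferedPacket:
    seq: int         # playout order
    importance: int  # hypothetical scale: 0 = least important


def prune_receive_buffer(buffer: List[BufferedPacket],
                         target_depth: int,
                         max_droppable: int = 1) -> List[BufferedPacket]:
    """Shrink the buffer toward target_depth by removing low-importance packets.

    Only packets with importance <= max_droppable are candidates, so the
    result may stay above target_depth when too few candidates exist.
    """
    excess = len(buffer) - target_depth
    if excess <= 0:
        return list(buffer)
    candidates = [p for p in buffer if p.importance <= max_droppable]
    # Least important first; ties broken by oldest sequence number.
    victims = sorted(candidates, key=lambda p: (p.importance, p.seq))[:excess]
    victim_seqs = {p.seq for p in victims}
    # Remaining packets keep their original playout order.
    return [p for p in buffer if p.seq not in victim_seqs]
```

Removing packets from the middle of a silence or a steady vowel shortens the queue, and therefore the playout delay, without touching packets marked as important.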
35. A system for transmitting voice communications over a data network, comprising:
(a) an input operable to receive a voice stream from a user, the voice stream comprising a plurality of temporally distinct segments;
(b) a packet protocol interface operable to convert at least one selected first segment of the voice stream into at least a first packet; and
(c) an acoustic prioritization agent operable to control processing of at least one of the first segment and the at least a first packet based on at least one of a level of confidence that the contents of the selected first segment are not the product of voice activity, a type of voice activity associated with the contents of the first segment, and a degree of acoustic similarity between the first segment and a second segment of the voice stream.
Dependent claims: 36-53.
54. A system for managing a receive buffer, comprising:
a receive buffer containing a plurality of packets associated with voice communications; and
a buffer manager operable to remove at least some of the packets from the receive buffer while leaving other packets in the receive buffer based on a level of importance associated with at least some of the plurality of packets.
Dependent claims: 55-59.
60. A packet, comprising:
a packet header comprising transmission information and a payload comprising one or more frames of a voice stream, wherein at least one of the packet header and payload comprises a value marker, wherein a value of the value marker is indicative of a level of importance of the payload to maintaining a selected quality of voice communication.
Dependent claims: 61-62.
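One way to picture such a packet is a fixed header that carries the value marker next to the usual transmission fields. The layout below (16-bit sequence number, 32-bit timestamp, 8-bit importance) is purely an assumption for illustration; the claim does not specify field sizes, and this is not the RTP header format.

```python
import struct
from dataclasses import dataclass

# Hypothetical header layout, network byte order:
# sequence number (uint16), timestamp (uint32), value marker (uint8).
HEADER = struct.Struct("!HIB")


@dataclass
class VoicePacket:
    seq: int
    timestamp: int
    importance: int  # value marker, 0..255; higher means more important (assumption)
    payload: bytes   # one or more encoded voice frames

    def encode(self) -> bytes:
        return HEADER.pack(self.seq, self.timestamp, self.importance) + self.payload

    @classmethod
    def decode(cls, data: bytes) -> "VoicePacket":
        seq, timestamp, importance = HEADER.unpack_from(data)
        return cls(seq, timestamp, importance, data[HEADER.size:])
```

Placing the marker in the header lets a router or buffer manager read the importance without decoding the voice payload.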
63. A method for processing voice communications over a data network, comprising:
(a) receiving a first voice stream from a first user, the voice stream comprising a plurality of temporally distinct segments associated with a plurality of packets and the voice stream being a part of a session between at least the first user and a second user, wherein the session has an associated at least one of a jitter value, a latency value, a number of missing packets, a number of packets received out-of-order, a processing delay, a propagation delay, a receive buffer delay, and a number of packets enqueued in a receive buffer; and
(b) comparing the at least one of a jitter value, a latency value, a number of missing packets, a number of packets received out-of-order, a processing delay, a propagation delay, a receive buffer delay, and a number of packets enqueued in a receive buffer with a predetermined threshold;
(i) when the at least one of a jitter value, a latency value, a number of missing packets, a number of packets received out-of-order, a processing delay, a propagation delay, a receive buffer delay, and a number of packets enqueued in a receive buffer exceeds the predetermined threshold, not transmitting at least some of the plurality of packets; and
(ii) when the at least one of a jitter value, a latency value, a number of missing packets, a number of packets received out-of-order, a processing delay, a propagation delay, a receive buffer delay, and a number of packets enqueued in a receive buffer is less than the predetermined threshold, transmitting the at least some of the plurality of packets.
Dependent claims: 64-74.
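The threshold comparison in steps (b)(i) and (b)(ii) can be sketched as a simple gate on the outgoing queue. Which packets are withheld is an assumption here (those below a hypothetical `min_importance`), as is the function name `select_for_transmission`; the claim only requires that at least some packets are not transmitted while the session metric exceeds the threshold. A metric exactly equal to the threshold is not addressed by the claim and is treated as "not exceeded" in this sketch.

```python
from typing import List, Tuple

Packet = Tuple[int, int]  # (sequence number, importance); hypothetical representation


def select_for_transmission(packets: List[Packet],
                            metric_value: float,
                            threshold: float,
                            min_importance: int = 1) -> List[Packet]:
    """Return the packets to send given one session metric (e.g. jitter in ms)."""
    if metric_value > threshold:
        # Congested: withhold packets whose importance is below the cutoff.
        return [p for p in packets if p[1] >= min_importance]
    # Not congested: transmit everything.
    return list(packets)
```

For example, with jitter as the metric, a sender would call this once per batch and pass only the returned packets to the network layer.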
Specification