Packet prioritization and associated bandwidth and buffer management techniques for audio over IP
Abstract
The present invention is directed to voice communication devices in which an audio stream is divided into a sequence of individual packets, each of which is routed via pathways that can vary depending on the availability of network resources. All embodiments of the invention rely on an acoustic prioritization agent that assigns a priority value to the packets. The priority value is based on factors such as whether the packet contains voice activity and the degree of acoustic similarity between this packet and adjacent packets in the sequence. A confidence level, associated with the priority value, may also be assigned. In one embodiment, network congestion is reduced by deliberately failing to transmit packets that are judged to be acoustically similar to adjacent packets; the expectation is that, under these circumstances, traditional packet loss concealment algorithms in the receiving device will construct an acceptably accurate replica of the missing packet. In another embodiment, the receiving device can reduce the number of packets stored in its jitter buffer, and therefore the latency of the speech signal, by selectively deleting one or more packets within sustained silences or non-varying speech events. In both embodiments, the ability of the system to drop appropriate packets may be enhanced by taking into account the confidence levels associated with the priority assessments.
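The receiver-side embodiment described above can be illustrated with a minimal sketch. It assumes each buffered packet already carries a priority value and a confidence level assigned by the acoustic prioritization agent. The `Packet` fields, the priority scale (0 meaning most expendable), and the `trim_jitter_buffer` function with its thresholds are hypothetical names chosen for illustration, not taken from the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    seq: int           # position of the packet in the stream
    priority: int      # assumed scale: 0 = most expendable, higher = more important
    confidence: float  # 0.0 to 1.0: how sure the agent is about the priority


def trim_jitter_buffer(buffer: List[Packet], target_len: int,
                       max_droppable_priority: int = 0,
                       min_confidence: float = 0.8) -> List[Packet]:
    """Shorten the buffer toward target_len by deleting expendable packets.

    Only packets whose priority is low AND whose assessment is confident
    are eligible; the most confident assessments are deleted first.
    """
    excess = len(buffer) - target_len
    if excess <= 0:
        return list(buffer)
    candidates = [p for p in buffer
                  if p.priority <= max_droppable_priority
                  and p.confidence >= min_confidence]
    candidates.sort(key=lambda p: p.confidence, reverse=True)
    to_drop = {p.seq for p in candidates[:excess]}
    # Preserve the original playout order of the surviving packets.
    return [p for p in buffer if p.seq not in to_drop]
```

If too few packets qualify, the buffer simply stays longer than the target, which favors speech quality over latency. That trade-off is a design choice of this sketch rather than something the abstract specifies.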
51 Claims
1. A method for processing voice communications over a data network, comprising:
(a) receiving a voice stream from a user, the voice stream comprising a plurality of temporally distinct segments; and
(b) processing at least first, second and third segments of the voice stream according to the following substeps:
(i) selecting the first segment, wherein the contents of the selected first segment are not the product of voice activity;
(ii) determining that the contents of the selected first segment are not the product of voice activity;
(iii) determining a level of confidence that the voice activity determination for the selected first segment is accurate;
(iv) when the level of confidence is one of less than and greater than a predetermined threshold, not transmitting the selected first segment to a selected endpoint;
(v) selecting the second segment, wherein the contents of the selected second segment are the product of voice activity and wherein the second and third segments are temporally adjacent to one another;
(vi) determining that the contents of the selected segment are the product of voice activity;
(vii) comparing the selected second segment with the third segment to determine a degree of acoustic similarity between the second and third segments; and
(viii) when the selected second segment is similar to the third segment, at least one of not transmitting the selected second segment to the selected endpoint and dropping the second segment during transmission.
Dependent claims: 2-19.
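The substeps of claim 1 can be sketched on the sending side as follows. The claim does not say how voice activity, confidence, or acoustic similarity are computed, so this sketch substitutes simple stand-ins: an RMS energy threshold for voice activity, distance from that threshold for confidence, and zero-lag normalized correlation for similarity. All function names and threshold values are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def rms(seg: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in seg) / len(seg))


def voice_activity(seg: Sequence[float],
                   energy_threshold: float = 0.05) -> Tuple[bool, float]:
    """Assumed detector: returns (is_voice, confidence in [0, 1])."""
    level = rms(seg)
    is_voice = level >= energy_threshold
    # Confidence grows with the distance from the decision threshold.
    confidence = min(1.0, abs(level - energy_threshold) / energy_threshold)
    return is_voice, confidence


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed measure: normalized correlation at zero lag, in [-1, 1]."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den else 0.0


def select_for_transmission(segments: List[Sequence[float]],
                            conf_threshold: float = 0.5,
                            sim_threshold: float = 0.95) -> List[int]:
    """Return the indices of the segments that would be transmitted."""
    keep: List[int] = []
    dropped_prev_voiced = False
    for i, seg in enumerate(segments):
        is_voice, conf = voice_activity(seg)
        if not is_voice:
            dropped_prev_voiced = False
            if conf > conf_threshold:
                continue  # confidently non-voice: not transmitted
            keep.append(i)  # uncertain: transmit to be safe
            continue
        nxt = segments[i + 1] if i + 1 < len(segments) else None
        if (not dropped_prev_voiced and nxt is not None
                and similarity(seg, nxt) >= sim_threshold):
            dropped_prev_voiced = True
            continue  # similar to its neighbor: left to loss concealment
        dropped_prev_voiced = False
        keep.append(i)
    return keep
```

The sketch picks the "greater than" alternative for the confidence test. It also never drops two voiced segments in a row, on the assumption that concealment at the receiver works best on isolated gaps. Neither choice is dictated by the claim.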
20. A computer readable circuit containing processor executable instructions to perform steps comprising:
(a) receiving a voice stream from a user, the voice stream comprising a plurality of temporally distinct segments; and
(b) processing at least first, second and third segments of the voice stream according to the following substeps:
(i) selecting the first segment, wherein the contents of the selected first segment are not the product of voice activity;
(ii) determining that the contents of the selected first segment are not the product of voice activity;
(iii) determining a level of confidence that the voice activity determination for the selected first segment is accurate;
(iv) when the level of confidence is one of less than and greater than a predetermined threshold, not transmitting the selected first segment to a selected endpoint;
(v) selecting the second segment, wherein the contents of the selected second segment are the product of voice activity and wherein the second and third segments are temporally adjacent to one another;
(vi) determining that the contents of the selected segment are the product of voice activity;
(vii) comparing the selected second segment with the third segment to determine a degree of acoustic similarity between the second and third segments; and
(viii) when the selected second segment is similar to the third segment, at least one of not transmitting the selected second segment to the selected endpoint and dropping the second segment during transmission.
21. A logic circuit configured to perform steps comprising:
(a) receiving a voice stream from a user, the voice stream comprising a plurality of temporally distinct segments; and
(b) processing at least first, second and third segments of the voice stream according to the following substeps:
(i) selecting the first segment, wherein the contents of the selected first segment are not the product of voice activity;
(ii) determining that the contents of the selected first segment are not the product of voice activity;
(iii) determining a level of confidence that the voice activity determination for the selected first segment is accurate;
(iv) when the level of confidence is one of less than and greater than a predetermined threshold, not transmitting the selected first segment to a selected endpoint;
(v) selecting the second segment, wherein the contents of the selected second segment are the product of voice activity and wherein the second and third segments are temporally adjacent to one another;
(vi) determining that the contents of the selected segment are the product of voice activity;
(vii) comparing the selected second segment with the third segment to determine a degree of acoustic similarity between the second and third segments; and
(viii) when the selected second segment is similar to the third segment, at least one of not transmitting the selected second segment to the selected endpoint and dropping the second segment during transmission.
22. A method for processing voice communications over a data network, comprising:
(a) receiving a voice stream from a user, the voice stream comprising a plurality of temporally distinct segments; and
(b) processing the segments of the voice stream according to the following rules:
(i) determining whether or not the content of a selected segment is a product of voice activity;
(ii) when the content of the selected segment is determined not to be the product of voice activity, determining a level of confidence that the voice activity determination for the selected segment is accurate;
(iii) when the level of confidence is one of less than and greater than a predetermined threshold, not transmitting the selected segment to a selected endpoint;
(iv) when the content of the selected segment is determined to be the product of voice activity, comparing the selected segment with at least one temporally adjacent segment to determine a degree of acoustic similarity between the selected and at least one temporally adjacent segments; and
(v) when the selected segment is similar to the at least one temporally adjacent segment, at least one of not transmitting the selected segment to the selected endpoint and transmitting a packet comprising the selected segment with a level of importance lower than a packet comprising a dissimilar segment.
Dependent claims: 23-31.
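Rules (i) through (v) of claim 22 amount to a small decision table, sketched below. The inputs are assumed to come from a voice activity detector and a similarity comparison that the claim leaves unspecified. The `Importance` levels, the `classify` function, its thresholds, and the `suppress_similar` switch are hypothetical names introduced for illustration.

```python
from enum import IntEnum
from typing import Optional


class Importance(IntEnum):
    LOW = 0     # similar to a neighbor: first to go under congestion
    NORMAL = 1  # dissimilar speech, or a segment the detector is unsure about


def classify(is_voice: bool, confidence: float, similarity: float,
             conf_threshold: float = 0.5, sim_threshold: float = 0.95,
             suppress_similar: bool = False) -> Optional[Importance]:
    """Return None when the segment is not transmitted, else its importance.

    Assumption: the 'greater than' alternative of rule (iii) is used, and an
    uncertain non-voice segment is sent at NORMAL importance.
    """
    if not is_voice:
        # Rules (ii) and (iii): confidently non-voice segments are withheld.
        return None if confidence > conf_threshold else Importance.NORMAL
    if similarity >= sim_threshold:
        # Rule (v): either withhold the segment or send it at lower importance.
        return None if suppress_similar else Importance.LOW
    return Importance.NORMAL
```

Marking a similar segment as `LOW` instead of withholding it leaves the drop decision to the network, so the segment is lost only when congestion actually occurs.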
32. A computer readable medium comprising processor-executable instructions operable to perform steps comprising:
(a) receiving a voice stream from a user, the voice stream comprising a plurality of temporally distinct segments; and
(b) processing the segments of the voice stream according to the following rules:
(i) determining whether or not the content of a selected segment is a product of voice activity;
(ii) when the content of the selected segment is determined not to be the product of voice activity, determining a level of confidence that the voice activity determination for the selected segment is accurate;
(iii) when the level of confidence is one of less than and greater than a predetermined threshold, not transmitting the selected segment to a selected endpoint;
(iv) when the content of the selected segment is determined to be the product of voice activity, comparing the selected segment with at least one temporally adjacent segment to determine a degree of acoustic similarity between the selected and at least one temporally adjacent segments; and
(v) when the selected segment is similar to the at least one temporally adjacent segment, at least one of not transmitting the selected segment to the selected endpoint and transmitting a packet comprising the selected segment with a level of importance lower than a packet comprising a dissimilar segment.
Dependent claims: 33-41.
42. A logic circuit operable to perform steps comprising:
(a) receiving a voice stream from a user, the voice stream comprising a plurality of temporally distinct segments; and
(b) processing the segments of the voice stream according to the following rules:
(i) determining whether or not the content of a selected segment is a product of voice activity;
(ii) when the content of the selected segment is determined not to be the product of voice activity, determining a level of confidence that the voice activity determination for the selected segment is accurate;
(iii) when the level of confidence is one of less than and greater than a predetermined threshold, not transmitting the selected segment to a selected endpoint;
(iv) when the content of the selected segment is determined to be the product of voice activity, comparing the selected segment with at least one temporally adjacent segment to determine a degree of acoustic similarity between the selected and at least one temporally adjacent segments; and
(v) when the selected segment is similar to the at least one temporally adjacent segment, at least one of not transmitting the selected segment to the selected endpoint and transmitting a packet comprising the selected segment with a level of importance lower than a packet comprising a dissimilar segment.
Dependent claims: 43-51.
Specification