Packet prioritization and associated bandwidth and buffer management techniques for audio over IP

US 8,370,515 B2
Filed: 03/26/2010
Issued: 02/05/2013
Est. Priority Date: 09/30/2002
Status: Expired due to Fees

17 Assignments

0 Petitions

Abstract

The present invention is directed to voice communication devices in which an audio stream is divided into a sequence of individual packets, each of which is routed via pathways that can vary depending on the availability of network resources. All embodiments of the invention rely on an acoustic prioritization agent that assigns a priority value to the packets. The priority value is based on factors such as whether the packet contains voice activity and the degree of acoustic similarity between this packet and adjacent packets in the sequence. A confidence level, associated with the priority value, may also be assigned. In one embodiment, network congestion is reduced by deliberately failing to transmit packets that are judged to be acoustically similar to adjacent packets; the expectation is that, under these circumstances, traditional packet loss concealment algorithms in the receiving device will construct an acceptably accurate replica of the missing packet. In another embodiment, the receiving device can reduce the number of packets stored in its jitter buffer, and therefore the latency of the speech signal, by selectively deleting one or more packets within sustained silences or non-varying speech events. In both embodiments, the ability of the system to drop appropriate packets may be enhanced by taking into account the confidence levels associated with the priority assessments.
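
The receiver-side embodiment described in the abstract (shrinking the jitter buffer by deleting packets inside sustained silences or non-varying speech) could be sketched roughly as follows. This is only an illustration, not the patented implementation: `BufferedPacket`, `drop_score`, `trim_jitter_buffer`, the 0.0 to 1.0 scales, and the `min_score` cutoff are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BufferedPacket:
    seq: int           # sequence number within the stream
    is_voice: bool     # voice activity decision attached to the packet
    confidence: float  # assumed 0.0-1.0 confidence in that decision
    similarity: float  # assumed 0.0-1.0 acoustic similarity to the previous packet


def drop_score(p: BufferedPacket) -> float:
    """Higher means safer to delete (hypothetical scoring rule)."""
    if not p.is_voice:
        # Silence: deleting is safest when the silence decision is confident.
        return p.confidence
    # Speech: deleting is safest when the packet barely differs from its neighbor.
    return p.similarity


def trim_jitter_buffer(buffer: List[BufferedPacket],
                       target_depth: int,
                       min_score: float = 0.8) -> List[BufferedPacket]:
    """Delete the safest packets until the buffer reaches target_depth.

    Packets scoring below min_score are never deleted, so the buffer may
    stay above target_depth when too few safe candidates exist.
    """
    excess = len(buffer) - target_depth
    if excess <= 0:
        return list(buffer)
    candidates = sorted(
        (p for p in buffer if drop_score(p) >= min_score),
        key=drop_score,
        reverse=True,
    )
    to_drop = {p.seq for p in candidates[:excess]}
    # Preserve the original playout order of the surviving packets.
    return [p for p in buffer if p.seq not in to_drop]


if __name__ == "__main__":
    buf = [
        BufferedPacket(0, True, 0.90, 0.20),   # changing speech: keep
        BufferedPacket(1, False, 0.95, 0.00),  # confident silence: droppable
        BufferedPacket(2, False, 0.50, 0.00),  # uncertain silence: keep
        BufferedPacket(3, True, 0.90, 0.85),   # non-varying speech: droppable
        BufferedPacket(4, True, 0.90, 0.10),   # changing speech: keep
    ]
    print([p.seq for p in trim_jitter_buffer(buf, target_depth=3)])
```

The cutoff reflects the abstract's point that confidence levels can improve the choice of packets to drop: a low-confidence silence decision is treated as too risky to delete even when the buffer is over its target.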

243 Citations

11 Claims

1. A method for processing voice communications over a data network, comprising:
- receiving a voice stream from a first user, the voice stream comprising a plurality of temporally distinct segments associated with a plurality of packets and the voice stream being a part of a session between at least the first user and a second user, wherein the session has an associated at least one of a jitter value, a latency value, a number of missing packets, a number of packets received out-of-order, a processing delay, a propagation delay, a receive buffer delay, and a number of packets enqueued in a receive buffer and
  
  comparing the at least one of a jitter value, a latency value, a number of missing packets, a number of packets received out-of-order, a processing delay, a propagation delay, a receive buffer delay, and a number of packets enqueued in a receive buffer with a predetermined threshold;
  
  (i) when the at least one of a jitter value, a latency value, a number of missing packets, a number of packets received out-of-order, a processing delay, a propagation delay, a receive buffer delay, and a number of packets enqueued in a receive buffer exceeds the predetermined threshold, not transmitting at least some of the plurality of packets and
  
  (ii) when the at least one of a jitter value, a latency value, a number of missing packets, a number of packets received out-of-order, a processing delay, a propagation delay, a receive buffer delay, and a number of packets enqueued in a receive buffer is less than the predetermined threshold, transmitting the at least some of the plurality of packets; and
  
  further comprising at least one of the following steps;
  
  determining whether or not the contents of a selected first segment of the plurality of temporally distinct segments of the voice stream are the product of voice activity and, when the contents are determined not to be the product of voice activity, indicate a level of confidence that the voice activity determination is accurate; and
  
  comparing the selected first segment with a second segment of the plurality of temporally distinct segments to determine a degree of acoustic similarity between the first and second segments, wherein the processing of the first segment is based on at least one of the level of confidence, the type of voice activity, and the degree of acoustic similarity.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11):
  - 2. The method of claim 1, further comprising:
    - based on the at least one of the level of confidence, type of voice activity and the degree of acoustic similarity, assigning an importance to the first segment.
  - 3. The method of claim 2, wherein the importance is a value marker and further comprising:
    - incorporating the value marker into a first packet comprising the first segment.
  - 4. The method of claim 2, wherein the importance is a service class assigned to a first packet comprising the first segment.
  - 5. The method of claim 2, wherein the importance is a transmission priority assigned to a first packet comprising the first segment.
  - 6. The method of claim 1, wherein in the processing step a first packet comprising the first segment is not transmitted when the at least one of the level of confidence and the degree of acoustic similarity is one of less than and greater than a predetermined threshold.
  - 7. The method of claim 6, wherein a first packet associated with the first segment is not transmitted and further comprising:
    - later reconstructing the first segment with a packet loss concealment algorithm.
  - 8. The method of claim 1, wherein the type of voice activity is a plosive.
  - 9. The method of claim 1, wherein the at least one step is the determining step and wherein the level of confidence is based on a degree of similarity of temporally adjacent packets.
  - 10. The method of claim 1, wherein the at least one step is the determining step and wherein the level of confidence is based on the significance of audio in a packet to receiver understanding or fidelity.
  - 11. The method of claim 1, wherein the at least one step is the determining step and wherein the level of confidence permits a voice activity detector to provide a ternary output.
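
The flow recited in claim 1, together with the ternary voice activity output of claim 11, can be pictured with a small sketch. Everything below is an assumption made for illustration and is not taken from the specification: the RMS-energy detector, the `silence_rms` and `voice_rms` levels, the zero-lag correlation used as an acoustic similarity measure, the `sim_limit` value, and the rule that two consecutive frames are never withheld. The claim also does not say what happens when the session metric equals the threshold; this sketch transmits everything in that case.

```python
import math
from typing import List, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one audio frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def ternary_vad(frame: Sequence[float],
                silence_rms: float = 0.01,
                voice_rms: float = 0.05) -> str:
    """Return 'silence', 'voice', or 'uncertain' (hypothetical energy rule)."""
    level = rms(frame)
    if level < silence_rms:
        return "silence"
    if level > voice_rms:
        return "voice"
    return "uncertain"  # low confidence either way


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Zero-lag normalized correlation clipped to [0, 1]; 0.0 if either frame is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return max(0.0, dot / norm) if norm > 0 else 0.0


def select_for_transmission(frames: List[Sequence[float]],
                            metric: float,
                            threshold: float,
                            sim_limit: float = 0.9) -> List[int]:
    """Return the indices of the frames that would be transmitted."""
    if metric <= threshold:
        # Session metric (jitter, latency, queue depth, ...) is acceptable.
        return list(range(len(frames)))
    keep: List[int] = []
    prev = None
    prev_withheld = False
    for i, frame in enumerate(frames):
        label = ternary_vad(frame)
        redundant = prev is not None and similarity(frame, prev) >= sim_limit
        # Withhold confident silence, or speech that repeats its neighbor,
        # but never two frames in a row (assumed to help loss concealment).
        withhold = not prev_withheld and (
            label == "silence" or (label == "voice" and redundant)
        )
        if not withhold:
            keep.append(i)
        prev, prev_withheld = frame, withhold
    return keep


if __name__ == "__main__":
    tone = [0.1, -0.1, 0.1, -0.1]
    frames = [[0.0] * 4, tone, tone, tone,
              [0.02, -0.02, 0.02, -0.02], [0.1, 0.1, -0.1, -0.1]]
    print(select_for_transmission(frames, metric=120.0, threshold=100.0))
```

Frames labeled `uncertain` are always sent in this sketch, which is one way the level of confidence could influence processing; the withheld frames are the ones a receiver would be expected to rebuild with packet loss concealment, as in claim 7.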

Current Assignee: Arlington Technologies, LLC (Dominion Harbor Enterprises, LLC)
Original Assignee: Avaya Incorporated
Inventors: Gentle, Christopher R.; Michaelis, Paul Roller
Primary Examiner(s): Tran, Philip B
Application Number: US12/748,094
Publication Number: US 20100182930A1
Time in Patent Office: 1,047 Days
Field of Search: 709/231, 709/238, 709/223-224
US Class Current: 709/231

CPC Class Codes
H04L 47/10   Flow control; Congestion co...
H04L 47/12   Avoiding congestion; Recove...
H04L 47/2408   for supporting different se...
H04L 47/2416   Real-time traffic
H04L 47/2433   Allocation of priorities to...
H04L 47/283   in response to processing d...
H04L 47/31   by tagging of packets, e.g....
H04L 47/32   by discarding or delaying d...
H04L 65/1101   Session protocols
H04L 65/70   Media network packetisation
