Method and apparatus for enhancing voice intelligibility in voice-over-IP network applications with late arriving packets

US 20060074681A1
Filed: 09/24/2004
Published: 04/06/2006
Est. Priority Date: 09/24/2004
Status: Active Grant

Abstract

A method and apparatus for enhancing voice intelligibility for network communications of speech such as, for example, VoIP (Voice-Over-Internet-Protocol), in the presence of packets which arrive too late for normal playout. When a late speech packet is received by a speech decoder, that packet and, if necessary, one or more additional packets subsequent thereto, are played out over a shorter than normal duration so that the decoder can “catch up” with the encoder. Since a voice frame is usually decoded in several sub-frames—typically two or three—this shortened playout may be achieved, for example, by skipping one sub-frame from each frame to be shortened.
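
The sub-frame skipping described in the abstract can be illustrated with a minimal Python sketch. It assumes a decoded frame is available as a list of sub-frames, that the decoder trails by a whole number of frame durations, and that the last sub-frame is the one dropped; the function names and these choices are illustrative assumptions, not details taken from the patent. Under these assumptions, a frame with K sub-frames that loses one sub-frame recovers 1/K of a frame duration, so K shortened frames recover one full frame of delay.

```python
from fractions import Fraction
import math


def shorten_frame(subframes, skip_index=-1):
    """Return the frame with one sub-frame dropped.

    `subframes` is a list of decoded sub-frames (each a list of samples).
    Dropping the last sub-frame by default is an illustrative assumption.
    """
    if len(subframes) < 2:
        raise ValueError("need at least two sub-frames to skip one")
    idx = skip_index % len(subframes)
    return [sf for i, sf in enumerate(subframes) if i != idx]


def frames_needed_to_catch_up(lag_frames, subframes_per_frame, skipped_per_frame=1):
    """Number of shortened frames needed to recover `lag_frames` of delay."""
    saved_per_frame = Fraction(skipped_per_frame, subframes_per_frame)
    return math.ceil(Fraction(lag_frames) / saved_per_frame)


# Hypothetical frame with three sub-frames of two samples each.
frame = [[1, 2], [3, 4], [5, 6]]
shortened = shorten_frame(frame)
needed = frames_needed_to_catch_up(lag_frames=1, subframes_per_frame=3)
```

With two sub-frames per frame, the same calculation gives two shortened frames: the late packet itself and one packet after it.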

20 Claims

1. A method for playing out speech received as a sequence of encoded speech packets over a packet-based communications network, the method comprising the steps of:
- determining that a given speech packet has not been received prior to a time when said given speech packet is to be decoded for playout;
- replacing said given speech packet with replacement speech data with use of a packet loss concealment technique;
- playing out said replacement speech data in place of said given speech packet;
- receiving said given speech packet at a time subsequent to said playing out of said replacement speech data;
- modifying said given speech packet which has been received to generate a time scale modified version thereof, said time scale modified version of said given speech packet comprising speech having a reduced time length relative to said given speech packet; and
- playing out said time scale modified version of said given speech packet after said replacement speech packet has been played out.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10):
  - 2. The method of claim 1 wherein said step of determining that said given speech packet has not been received prior to the time when said given speech packet is to be decoded for playout comprises determining that a jitter buffer is empty at said time when said given speech packet is to be decoded for playout.
  - 3. The method of claim 1 where said replacement speech data is generated based on a previous speech packet in said sequence of encoded speech packets.
  - 4. The method of claim 3 wherein said packet loss concealment technique comprises replacing said given speech packet with a duplicate of an immediately previous speech packet in said sequence of encoded speech packets.
  - 5. The method of claim 1 wherein said time scale modified version of said given speech packet is generated from said given speech packet with use of a pitch synchronous overlap add (PSOLA) technique.
  - 6. The method of claim 1 wherein said given speech packet comprises a speech frame consisting of a plurality of sub-frames, and wherein said time scale modified version of said given speech packet is generated from said given speech packet by eliminating one or more of said plurality of sub-frames therefrom.
  - 7. The method of claim 1 further comprising the step of determining that said given speech packet which has been received at a time subsequent to said playing out of said replacement speech data has also been received at a time prior to a predetermined time limit after said time when said given speech packet was to be decoded for playout.
  - 8. The method of claim 1 further comprising the steps of:
    - receiving one or more speech packets subsequent to said given speech packet in said sequence of speech packets;
    - modifying a number of said subsequent speech packets to generate a corresponding time scale modified version thereof, said time scale modified version of each of said number of subsequent speech packets comprising speech having a reduced time length relative to said corresponding subsequent speech packet; and
    - playing out each of said number of said time scale modified versions of said subsequent speech packets after said time scale modified version of said given speech packet has been played out.
  - 9. The method of claim 8 wherein said number has a fixed value such that after said number of said time scale modified versions of said subsequent speech packets have been played out, said sequence of encoded speech packets as received are synchronized with said playing out thereof.
  - 10. The method of claim 1 wherein the speech received as a sequence of encoded speech packets over a packet-based communications network comprises Voice-over-IP.
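
One possible reading of the method of claims 1 to 9 can be sketched as a small playout simulation. This is a hedged illustration, not the patented implementation: the frame size, the two-sub-frame split, the `late_limit` parameter, the data layout (`packets`, `arrival`), and all function names are assumptions introduced here. Concealment repeats the previous frame (as in claim 4), shortening drops one sub-frame (as in claim 6), a late packet is accepted only within a time limit (as in claim 7), and later packets keep being shortened until playout is back in step (as in claims 8 and 9).

```python
FRAME = 160                 # samples per frame (assumed: 20 ms at 8 kHz)
SUBFRAMES = 2               # sub-frames per frame (assumed)
SAVED = FRAME // SUBFRAMES  # samples saved by dropping one sub-frame


def conceal(prev):
    """Repeat the previous frame; use silence if there is none yet."""
    return list(prev) if prev is not None else [0] * FRAME


def shorten(frame):
    """Drop the last sub-frame of a decoded frame."""
    return frame[:FRAME - SAVED]


def play_out(packets, arrival, late_limit=FRAME):
    """Simulate playout; return (log, audio).

    packets[seq] -> decoded frame, a list of FRAME samples
    arrival[seq] -> arrival time in samples, or None if never received
    late_limit   -> how long after its deadline a packet is still accepted
    """
    log, audio, prev = [], [], None
    now = 0           # playout clock, in samples
    lag = 0           # samples by which playout trails the sender
    missed_at = None  # deadline of the packet still being waited for
    seq = 0
    while seq < len(packets):
        t = arrival[seq]
        deadline = now if missed_at is None else missed_at
        if t is not None and t <= now and t - deadline <= late_limit:
            if missed_at is not None:      # late packet finally arrived
                lag += now - missed_at     # time spent on concealment
                missed_at = None
            frame = packets[seq]
            if lag >= SAVED:               # still behind: play shortened
                out, kind = shorten(frame), "short"
                lag -= SAVED
            else:
                out, kind = frame, "normal"
            log.append((kind, seq, len(out)))
            prev = frame
            seq += 1
        elif missed_at is not None and now - missed_at >= late_limit:
            # Too late: one concealment frame stands in for this packet.
            lag += (now - missed_at) - FRAME
            missed_at = None
            seq += 1
            continue
        else:                              # jitter buffer empty: conceal
            if missed_at is None:
                missed_at = now
            out = conceal(prev)
            log.append(("conceal", seq, len(out)))
        audio.extend(out)
        now += len(out)
    return log, audio


# Hypothetical run: packet 2 is due at sample 320 but arrives at 400.
packets = [[n] * FRAME for n in range(5)]
log, audio = play_out(packets, arrival=[0, 160, 400, 480, 640])
```

In this run the log shows packet 2 concealed once, then packets 2 and 3 each played at half length, after which packet 4 plays normally and the total playout time equals five frame durations.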

11. An apparatus for playing out speech received as a sequence of encoded speech packets over a packet-based communications network, the apparatus comprising a processor adapted to:
- determine that a given speech packet has not been received prior to a time when said given speech packet is to be decoded for playout;
- replace said given speech packet with replacement speech data with use of a packet loss concealment technique;
- play out said replacement speech data in place of said given speech packet;
- receive said given speech packet at a time subsequent to said playing out of said replacement speech data;
- modify said given speech packet which has been received to generate a time scale modified version thereof, said time scale modified version of said given speech packet comprising speech having a reduced time length relative to said given speech packet; and
- play out said time scale modified version of said given speech packet after said replacement speech packet has been played out.
- Dependent claims (12, 13, 14, 15, 16, 17, 18, 19, 20):
  - 12. The apparatus of claim 11 wherein said determining that said given speech packet has not been received prior to the time when said given speech packet is to be decoded for playout comprises determining that a jitter buffer is empty at said time when said given speech packet is to be decoded for playout.
  - 13. The apparatus of claim 11 where said replacement speech data is generated based on a previous speech packet in said sequence of encoded speech packets.
  - 14. The apparatus of claim 13 wherein said packet loss concealment technique comprises replacing said given speech packet with a duplicate of an immediately previous speech packet in said sequence of encoded speech packets.
  - 15. The apparatus of claim 11 wherein said time scale modified version of said given speech packet is generated from said given speech packet with use of a pitch synchronous overlap add (PSOLA) technique.
  - 16. The apparatus of claim 11 wherein said given speech packet comprises a speech frame consisting of a plurality of sub-frames, and wherein said time scale modified version of said given speech packet is generated from said given speech packet by eliminating one or more of said plurality of sub-frames therefrom.
  - 17. The apparatus of claim 11 wherein said processor is further adapted to determine that said given speech packet which has been received at a time subsequent to said playing out of said replacement speech data has also been received at a time prior to a predetermined time limit after said time when said given speech packet was to be decoded for playout.
  - 18. The apparatus of claim 11 wherein said processor is further adapted to:
    - receive one or more speech packets subsequent to said given speech packet in said sequence of speech packets;
    - modify a number of said subsequent speech packets to generate a corresponding time scale modified version thereof, said time scale modified version of each of said number of subsequent speech packets comprising speech having a reduced time length relative to said corresponding subsequent speech packet; and
    - play out each of said number of said time scale modified versions of said subsequent speech packets after said time scale modified version of said given speech packet has been played out.
  - 19. The apparatus of claim 18 wherein said number has a fixed value such that after said number of said time scale modified versions of said subsequent speech packets have been played out, said sequence of encoded speech packets as received are synchronized with said playing out thereof.
  - 20. The apparatus of claim 11 wherein the speech received as a sequence of encoded speech packets over a packet-based communications network comprises Voice-over-IP.
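
Claims 5 and 15 name a pitch synchronous overlap add (PSOLA) technique without detailing it here. The sketch below is not PSOLA proper; it is a much simplified pitch-period-aligned cross-fade, shown only to illustrate the idea of shortening speech by merging two neighbouring pitch cycles into one. The autocorrelation-based period estimate, the linear fade, and the names `estimate_period`, `remove_one_period`, `min_lag`, `max_lag`, and `start` are all assumptions made for this example.

```python
import math


def estimate_period(x, min_lag, max_lag):
    """Pick the lag in [min_lag, max_lag] with the highest normalized
    autocorrelation; a crude stand-in for a real pitch estimator."""
    best_lag, best_score = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        a, b = x[:-lag], x[lag:]
        num = sum(p * q for p, q in zip(a, b))
        den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b)) or 1.0
        score = num / den
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def remove_one_period(x, period, start):
    """Shorten x by `period` samples with a linear cross-fade.

    x[start:start+period] is faded out while the following cycle,
    x[start+period:start+2*period], is faded in, so two pitch cycles
    are merged into one.
    """
    if start < 0 or start + 2 * period > len(x):
        raise ValueError("cross-fade region does not fit inside the signal")
    merged = []
    for i in range(period):
        w = i / period
        merged.append((1 - w) * x[start + i] + w * x[start + period + i])
    return x[:start] + merged + x[start + 2 * period:]


# Hypothetical test signal: a sine wave with a 40-sample period.
signal = [math.sin(2 * math.pi * n / 40) for n in range(160)]
period = estimate_period(signal, min_lag=20, max_lag=60)
shorter = remove_one_period(signal, period, start=40)
```

Because the cut length equals one pitch period, a perfectly periodic input is shortened without any discontinuity; real speech is only approximately periodic, which is why practical systems use windowed overlap-add around pitch marks.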

Current Assignee: WSOU Investments, LLC (WSOU Holdings, LLC)
Original Assignee: Alcatel-Lucent USA, Inc. (Nokia Corporation)
Inventors: Recchione, Michael Charles; Lee, Minkyu; McGowan, James William; Janiszewski, Thomas John
Granted Patent: US 7,783,482 B2
US Class Current: 704/270
CPC Class Codes:
G10L 19/005   Correction of errors induce...
G10L 21/0364   for improving intelligibility
G10L 21/04   Time compression or expansion
