Robust real-time speech codec

US 20050228651A1
Filed: 03/31/2004
Published: 10/13/2005
Est. Priority Date: 03/31/2004
Status: Active Grant

2 Assignments

0 Petitions

Abstract

Various strategies for rate/quality control and loss resiliency in an audio codec are described. The various strategies can be used in combination or independently. For example, a real-time speech codec uses intra frame coding/decoding, adaptive multi-mode forward error correction [“FEC”], and rate/quality control techniques. Intra frames help a decoder recover quickly from packet losses, while compression efficiency is still emphasized with predicted frames. Various strategies for inserting intra frames and signaling intra/predicted frames are described. With the adaptive multi-mode FEC, an encoder adaptively selects between multiple modes to efficiently and quickly provide a level of FEC that takes into account the bandwidth currently available for FEC. The FEC information itself may be predictively encoded and decoded relative to primary encoded information. Various rate/quality and FEC control strategies allow additional adaptation to available bandwidth and network conditions.

169 Citations

70 Claims

1. In an audio processing tool, a method comprising:
- processing plural frames for an audio signal, wherein the plural frames include a mix of one or more intra frames and one or more predicted frames, wherein at least one of the one or more predicted frames uses long-term prediction from outside of the predicted frame, and wherein each of the one or more intra frames uses no long-term prediction from outside of the intra frame; and
  
  outputting a result.
- Dependent claims (2-16):
  - 2. The method of claim 1 wherein the audio processing tool is a real-time speech encoder that uses linear prediction and the result is encoded speech.
  - 3. The method of claim 1 further comprising adjusting intra frame usage during encoding to change intra frame rate and/or locations.
  - 4. The method of claim 1 wherein the audio processing tool is a real-time speech decoder that uses linear prediction and the result is reconstructed speech.
  - 5. The method of claim 1 wherein at least one of the one or more intra frames uses short-term prediction from outside of the intra frame in linear prediction filtering.
  - 6. The method of claim 1 wherein the processing comprises, for each of the one or more intra frames, reconstructing an excitation using one or more excitation codebook index values but no pitch values introducing long-term prediction.
  - 7. The method of claim 1 wherein the one or more predicted frames each include plural predicted sub-frames and no intra sub-frames.
  - 8. The method of claim 1 wherein the one or more intra frames each include plural intra sub-frames and no predicted sub-frames.
  - 9. The method of claim 1 wherein at least one of the intra frames includes at least one intra sub-frame and at least one predicted sub-frame that uses prediction within the intra frame.
  - 10. The method of claim 1 wherein each of the one or more intra frames and the one or more predicted frames are sub-classes of voiced frames.
  - 11. The method of claim 1 wherein grouping of plural consecutive intra frames prevents prediction over the intra frame grouping.
  - 12. The method of claim 1 further comprising, for the one or more predicted frames but not the one or more intra frames, interpolating linear prediction coefficient information across frames.
  - 13. The method of claim 12 wherein the information comprises LSP values.
  - 14. The method of claim 1 wherein frame-level type signaling information differentiates the one or more intra frames from the one or more predicted frames.
  - 15. The method of claim 1 wherein each of the plural frames is encapsulated in a single packet.
  - 16. A computer-readable medium storing computer-executable instructions for causing a computer system programmed thereby to perform the method of claim 1.
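
The loss-recovery behaviour that claims 1-16 aim at can be illustrated with a minimal sketch. It assumes a deliberately simplified model that is not taken from the patent: a predicted frame ("P") is cleanly decodable only if the frame before it was, while an intra frame ("I") needs nothing from outside itself. The function name `decodable_frames` and the frame labels are hypothetical, and the model ignores the short-term prediction that claim 5 allows as well as any loss concealment a real decoder would apply.

```python
from typing import List, Set


def decodable_frames(frame_types: List[str], lost: Set[int]) -> List[bool]:
    """Return, per frame, whether it can be reconstructed cleanly.

    Simplified assumption: 'P' frames depend on the previous frame's
    history (long-term prediction), 'I' frames depend on nothing outside
    the frame itself.
    """
    ok = []
    history_valid = False  # no usable history before the first frame
    for index, frame_type in enumerate(frame_types):
        if index in lost:
            clean = False          # the packet never arrived
        elif frame_type == "I":
            clean = True           # intra frame: self-contained
        else:
            clean = history_valid  # predicted frame: needs clean history
        ok.append(clean)
        history_valid = clean
    return ok


frames = ["I", "P", "P", "P", "I", "P", "P"]
result = decodable_frames(frames, lost={2})
```

In this model the loss of frame 2 also corrupts frame 3, and decoding is clean again from the intra frame at index 4 onward.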

17. In an audio encoder, a method of encoding plural frames for an audio signal, the method comprising:
- encoding plural predicted frames of the plural frames; and
  
  encoding plural intra frames of the plural frames, wherein the encoder sets intra frame usage and inserts the plural intra frames among the plural predicted frames according to the intra frame usage.
- Dependent claims (18-22):
  - 18. The method of claim 17 wherein the encoder sets an intra frame rate indicating a regular interval between intra frames.
  - 19. The method of claim 18 wherein the encoder adjusts the intra frame rate during encoding based at least in part on network loss rate and/or decoder loss rate.
  - 20. The method of claim 17 wherein the encoder selectively inserts intra frames to be quick recovery locations.
  - 21. The method of claim 17 wherein the audio encoder is a real-time speech encoder that uses linear prediction.
  - 22. A computer-readable medium storing computer-executable instructions for causing a computer system programmed thereby to perform the method of claim 17.
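
Claims 18 and 19 describe a regular intra frame interval that the encoder adjusts from loss feedback. The sketch below shows one way such a policy might look. The mapping from loss rate to interval (roughly one intra frame per expected loss, clamped to a range) is an assumption chosen for illustration, and `intra_interval`, `assign_frame_types`, and the default bounds are hypothetical names and values, not part of the patent.

```python
from typing import List


def intra_interval(loss_rate: float, min_interval: int = 4,
                   max_interval: int = 64) -> int:
    """Map a reported loss rate in [0, 1] to a spacing between intra frames.

    Hypothetical policy: higher loss gives a shorter interval, so that
    recovery points occur more often.
    """
    if not 0.0 <= loss_rate <= 1.0:
        raise ValueError("loss_rate must be within [0, 1]")
    if loss_rate == 0.0:
        return max_interval
    interval = int(round(1.0 / loss_rate))
    return max(min_interval, min(max_interval, interval))


def assign_frame_types(num_frames: int, interval: int) -> List[str]:
    """Place an intra frame every `interval` frames, predicted frames elsewhere."""
    return ["I" if i % interval == 0 else "P" for i in range(num_frames)]


schedule = assign_frame_types(8, intra_interval(0.25))
```

A real encoder could also place intra frames at selected locations (claim 20) rather than only at fixed intervals; that choice is not modelled here.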

23. In an audio decoder, a method comprising:
- decoding plural frames for an audio signal, wherein the plural frames include one or more intra frames and one or more predicted frames, and wherein frame-level type signaling information differentiates the one or more intra frames from the one or more predicted frames in a bitstream; and
  
  outputting decoded information.
- Dependent claims (24-27):
  - 24. The method of claim 23 wherein the frame-level type signaling information is a single bit per frame.
  - 25. The method of claim 23 wherein the frame-level type signaling information uses multiple bits per frame.
  - 26. The method of claim 23 wherein the audio decoder is a real-time speech decoder that uses linear prediction.
  - 27. A computer-readable medium storing computer-executable instructions for causing a computer system programmed thereby to perform the method of claim 23.
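
The single-bit signaling of claim 24 can be sketched as a flag in a frame header. The layout below (a one-byte header whose top bit marks an intra frame and whose low seven bits carry a payload length) is purely an assumption for illustration; the patent text quoted here does not specify a bitstream layout, and `INTRA_FLAG`, `pack_header`, and `unpack_header` are hypothetical names.

```python
from typing import Tuple

INTRA_FLAG = 0x80  # hypothetical: top bit of a one-byte frame header


def pack_header(is_intra: bool, payload_len: int) -> int:
    """Build a one-byte header: 1 type bit plus a 7-bit payload length."""
    if not 0 <= payload_len < 128:
        raise ValueError("payload_len must fit in 7 bits")
    return (INTRA_FLAG if is_intra else 0) | payload_len


def unpack_header(header: int) -> Tuple[bool, int]:
    """Recover (is_intra, payload_len) from a header byte."""
    return bool(header & INTRA_FLAG), header & 0x7F


header = pack_header(True, 20)
frame_kind = unpack_header(header)
```

A multi-bit variant (claim 25) would widen the type field, for example to distinguish further frame classes.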

28. In a speech processing tool, a method comprising:
- processing a frame for a speech signal, including processing primary encoded information for the frame and one or more versions of forward error correction information for the frame, wherein the primary encoded information comprises plural parameter values, and wherein each of the one or more versions of forward error correction information comprises a subset of the plural parameter values selected based at least in part on an estimate of extra available bits; and
  
  outputting a result.
- Dependent claims (29-34):
  - 29. The method of claim 28 wherein the subset is also selected based at least in part on network loss rate or decoder loss rate.
  - 30. The method of claim 28 wherein the subset is also selected based at least in part on frame class.
  - 31. The method of claim 28 wherein the primary encoded information is packed into a single packet with forward error correction information for a preceding frame.
  - 32. The method of claim 28 wherein the speech processing tool is a real-time speech encoder that uses linear prediction, wherein the result is encoded speech, and wherein the plural parameter values are plural linear prediction parameter values.
  - 33. The method of claim 28 wherein the speech processing tool is a real-time speech decoder that uses linear prediction, wherein the result is reconstructed speech, and wherein the plural parameter values are plural linear prediction parameter values.
  - 34. A computer-readable medium storing computer-executable instructions for causing a computer system programmed thereby to perform the method of claim 28.
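
Claim 28 selects which parameter values go into the forward error correction information from an estimate of extra available bits. A minimal sketch of that selection follows. The mode table, the parameter names, and the bit costs are all invented for illustration and are not figures from the patent.

```python
from typing import Optional, Tuple

# Hypothetical FEC modes, richest first: (name, parameters carried, cost in bits).
FEC_MODES = [
    ("full", ("lsp", "pitch", "gain", "excitation"), 120),
    ("medium", ("lsp", "pitch", "gain"), 60),
    ("minimal", ("lsp",), 24),
]


def select_fec_mode(extra_bits: int) -> Optional[Tuple[str, Tuple[str, ...]]]:
    """Pick the richest parameter subset whose cost fits the extra bits.

    Returns None when even the smallest subset does not fit, in which
    case no FEC information would be sent for the frame.
    """
    for name, params, cost in FEC_MODES:
        if cost <= extra_bits:
            return name, params
    return None


choice = select_fec_mode(70)
```

Claims 29 and 30 add loss rate and frame class as further inputs; one way to model that would be a separate mode table per frame class, which this sketch leaves out.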

35. In a speech processing tool, a method comprising:
- processing a frame for a speech signal, including processing primary encoded information for the frame and plural versions of forward error correction information for the frame, wherein the primary encoded information comprises plural parameter values, and wherein each of the plural versions of forward error correction information comprises a different subset of the plural parameter values for the frame; and
  
  outputting a result.
- Dependent claims (36-39):
  - 36. The method of claim 35 wherein each of the plural versions of forward error correction information is packed into a different packet for network transmission.
  - 37. The method of claim 35 wherein the speech processing tool is a real-time speech encoder that uses linear prediction, wherein the result is encoded speech, and wherein the plural parameter values are plural linear prediction parameter values.
  - 38. The method of claim 35 wherein the speech processing tool is a real-time speech decoder that uses linear prediction, wherein the result is reconstructed speech, and wherein the plural parameter values are plural linear prediction parameter values.
  - 39. A computer-readable medium storing computer-executable instructions for causing a computer system programmed thereby to perform the method of claim 35.

40. In an audio processing tool, a method comprising:
- processing encoded information for an audio signal, wherein the encoded information includes forward error correction information for a first frame and primary encoded information for a second frame, and wherein at least some of the forward error correction information for the first frame is predictively encoded relative to the primary encoded information for the second frame; and
  
  outputting a result.
- Dependent claims (41-48):
  - 41. The method of claim 40 wherein a single packet includes the forward error correction information for the first frame and the primary encoded information for the second frame.
  - 42. The method of claim 41 wherein the single packet further includes forward error correction information for one or more other frames.
  - 43. The method of claim 40 wherein the second frame is the current frame and the first frame is a preceding frame.
  - 44. The method of claim 40 wherein the forward error correction information for the first frame comprises linear prediction coefficient information predicted from corresponding coefficient information for the second frame.
  - 45. The method of claim 44 wherein the forward error correction information for the first frame comprises one or more excitation parameters predicted from corresponding excitation parameters for the second frame.
  - 46. The method of claim 40 wherein the audio processing tool is a real-time speech encoder and the result is encoded speech.
  - 47. The method of claim 40 wherein the audio processing tool is a real-time speech decoder and the result is reconstructed speech.
  - 48. A computer-readable medium storing computer-executable instructions for causing a computer system programmed thereby to perform the method of claim 40.
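
The predictive coding of forward error correction information in claims 40-44 can be sketched as follows, assuming the packet carries the current frame's primary LSP values and the preceding frame's LSP values are sent only as quantized differences from them. The integer representation (values in Hz), the 10 Hz step size, and the function names are assumptions made for the example, not details from the patent.

```python
from typing import List

STEP_HZ = 10  # hypothetical quantizer step for the differences


def encode_fec_lsp(prev_lsp: List[int], cur_lsp: List[int]) -> List[int]:
    """Code the previous frame's LSPs as quantized offsets from the
    current frame's (already decoded) primary LSPs."""
    return [int(round((p - c) / STEP_HZ)) for p, c in zip(prev_lsp, cur_lsp)]


def decode_fec_lsp(deltas: List[int], cur_lsp: List[int]) -> List[int]:
    """Rebuild the previous frame's LSPs from the current frame's LSPs
    plus the transmitted offsets."""
    return [c + d * STEP_HZ for d, c in zip(deltas, cur_lsp)]


prev_frame = [300, 620, 1150]  # LSPs of the frame that may have been lost
cur_frame = [320, 600, 1180]   # primary LSPs of the frame in this packet
deltas = encode_fec_lsp(prev_frame, cur_frame)
recovered = decode_fec_lsp(deltas, cur_frame)
```

Because adjacent speech frames tend to have similar spectra, the offsets are small and should cost fewer bits than sending the previous frame's values outright. Claim 45 extends the same idea to excitation parameters.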

49. In a real-time speech encoder that uses linear prediction, a method comprising:
- encoding a speech signal as plural linear prediction parameters, including adjusting bitrate and quality for a current frame of the speech signal based at least in part on (a) complexity of the current frame, (b) complexity and/or rate of at least some surrounding segments of the speech signal, (c) desired operating rate, (d) currently available network bandwidth, and (e) current network congestion or noise conditions or decoder feedback; and
  
  outputting encoded speech.
- Dependent claims (50-51):
  - 50. The method of claim 49 wherein the encoder performs the adjusting bitrate and quality on a frame-by-frame basis.
  - 51. A computer-readable medium storing computer-executable instructions for causing a computer system programmed thereby to perform the method of claim 49.

52. In an encoder-side audio processing tool, a method of encoding one or more frames of an audio signal, the method comprising:
- estimating a number of extra available bits for a segment of the audio signal after basic encoding; and
  
  using at least some of the extra available bits for adaptive forward error correction.
- Dependent claims (53-59):
  - 53. The method of claim 52 wherein the audio processing tool is adapted to encode speech in real-time using linear prediction.
  - 54. The method of claim 52 wherein the estimating is based at least in part upon currently available network bandwidth and complexity of the audio signal.
  - 55. The method of claim 52 further comprising:
    - setting an allocation between forward error correction improvement and primary encoding quality improvement; and
      
      applying the extra available bits according to the allocation for the forward error correction and/or primary encoding.
  - 56. The method of claim 55 wherein the encoder adjusts the allocation during encoding based at least in part on network loss rate or decoder loss rate.
  - 57. The method of claim 55 wherein the audio processing tool improves error resiliency by increasing intra frame usage.
  - 58. The method of claim 55 wherein the audio processing tool improves forward error correction using fuller parameter sets for forward error correction information.
  - 59. A computer-readable medium storing computer-executable instructions for causing a computer system programmed thereby to perform the method of claim 52.
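
Claims 52-56 estimate the bits left over after basic encoding and divide them between forward error correction and primary encoding quality. The sketch below works through that arithmetic under stated assumptions: the per-frame budget is bandwidth times frame duration, and the FEC share grows linearly with loss rate up to a cap. The formula, the factor of 4, the 80% cap, and the function names are hypothetical.

```python
from typing import Tuple


def estimate_extra_bits(bandwidth_bps: int, frame_ms: int,
                        primary_bits: int) -> int:
    """Bits left in the per-frame budget after basic (primary) encoding."""
    budget = bandwidth_bps * frame_ms // 1000
    return max(0, budget - primary_bits)


def split_extra_bits(extra_bits: int, loss_rate: float,
                     max_fec_share: float = 0.8) -> Tuple[int, int]:
    """Divide extra bits into (fec_bits, quality_bits).

    Hypothetical policy: the FEC share rises with the loss rate and
    saturates at `max_fec_share`.
    """
    share = min(max_fec_share, loss_rate * 4)
    fec_bits = int(round(extra_bits * share))
    return fec_bits, extra_bits - fec_bits


# 16 kbit/s channel, 20 ms frames, 240 bits spent on primary encoding.
extra = estimate_extra_bits(16000, 20, 240)
allocation = split_extra_bits(extra, loss_rate=0.1)
```

With no reported loss, this policy gives every extra bit to primary quality; under heavy loss most of them go to FEC.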

60. In a real-time speech encoder that uses linear prediction, a method comprising:
- encoding a speech signal as plural linear prediction parameters, including adjusting bitrate and quality for a current segment of the speech signal based at least in part on a quality smoothness criteria for a transition between a previous segment and the current segment; and
  
  outputting encoded speech.
- Dependent claims (61-63):
  - 61. The method of claim 60 wherein the adjustment is also based at least in part on desired operating rate, speech class, complexity of the current segment, and complexity or bitrate of the previous segment.
  - 62. The method of claim 61 wherein the adjustment is also based at least in part on network bandwidth, network loss rate, or decoder loss rate.
  - 63. A computer-readable medium storing computer-executable instructions for causing a computer system programmed thereby to perform the method of claim 60.
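
One simple way to picture the smoothness constraint of claim 60 is to limit how far the bitrate may move between consecutive segments. The sketch below uses bitrate as a stand-in for quality, which is an assumption: the claim speaks of quality smoothness and does not say how it is measured. The step limit and the name `smooth_bitrate` are hypothetical.

```python
def smooth_bitrate(prev_bps: int, target_bps: int,
                   max_step_bps: int = 2000) -> int:
    """Move toward the target bitrate, but by at most `max_step_bps`
    relative to the previous segment, to avoid an abrupt quality change."""
    low = prev_bps - max_step_bps
    high = prev_bps + max_step_bps
    return max(low, min(high, target_bps))


# The rate controller asks for a jump from 12 kbit/s to 20 kbit/s.
next_rate = smooth_bitrate(12000, 20000)
```

The target itself would come from the other factors listed in claims 61 and 62, such as desired operating rate, speech class, and network conditions.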

64. In an audio processing tool, a method comprising:
- processing a frame for an audio signal, including processing first information that represents the frame as a predicted frame or intra frame, and further including processing second information that represents the frame as an intra frame; and
  
  outputting a result.
- Dependent claims (65-70):
  - 65. The method of claim 64 wherein the first information is for primary encoding and the second information is for forward error correction.
  - 66. The method of claim 64 wherein the second information is for primary encoding and the first information is for forward error correction.
  - 67. The method of claim 64 wherein the predicted frame representation uses long-term prediction from outside of the frame, and wherein the intra frame representation uses no long-term prediction from outside of the frame.
  - 68. The method of claim 64 wherein the audio processing tool is a real-time speech encoder that uses linear prediction, and wherein the result is encoded speech.
  - 69. The method of claim 64 wherein the audio processing tool is a real-time speech decoder that uses linear prediction, and wherein the result is reconstructed speech.
  - 70. A computer-readable medium storing computer-executable instructions for causing a computer system programmed thereby to perform the method of claim 64.

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Khalil, Hosam A.; Chen, Wei-Ge; Koishida, Kazuhito; Wang, Tian; Han, Mu

Granted Patent: US 7,668,712 B2
US Class Current: 704/207
CPC Class Codes:
- G10L 19/005   Correction of errors induce...
- G10L 19/08   Determination or coding of ...
- G10L 19/22   Mode decision, i.e. based o...
