Systems, methods, and apparatus for signal encoding using pitch-regularizing and non-pitch-regularizing coding

US 9,653,088 B2
Filed: 06/12/2008
Issued: 05/16/2017
Est. Priority Date: 06/13/2007
Status: Active Grant

Abstract

A time shift calculated during a pitch-regularizing (PR) encoding of a frame of an audio signal is used to time-shift a segment of another frame during a non-PR encoding.
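
As a rough illustration of the abstract (a minimal sketch, not the patented implementation), the Python below applies one integer time shift to a segment of a PR-coded frame and reuses the same value for the following non-PR frame. The helper name `shift_segment`, the sign convention (a positive shift delays the segment), the integer-valued shift, and the zero fill for samples outside the buffer are all assumptions made for illustration.

```python
def shift_segment(signal, start, length, shift):
    """Read `length` samples beginning at `start`, delayed by `shift` samples.

    Hypothetical helper: out[i] = signal[start + i - shift], with 0.0 used
    wherever the index falls outside the buffer.
    """
    out = []
    for i in range(length):
        j = start + i - shift
        out.append(signal[j] if 0 <= j < len(signal) else 0.0)
    return out


# Two consecutive frames of M samples each, stored in one residual buffer.
M = 8
residual = [float(n) for n in range(2 * M)]

# Assumed value; a PR coder such as RCELP would compute it while encoding frame 1.
time_shift = 2

frame1_modified = shift_segment(residual, 0, M, time_shift)  # PR-coded frame
frame2_modified = shift_segment(residual, M, M, time_shift)  # non-PR frame, same shift

print(frame1_modified)
print(frame2_modified)
```

Because both segments use the same shift value, the modified signal stays continuous across the frame boundary in this toy example: the last sample of the first segment (5.0) is followed directly by 6.0.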

73 Claims

1. A method of processing frames of an audio signal, said method comprising:
- classifying each of a first frame of the audio signal and a second frame of the audio signal as a frame type from a set of frame types comprising a voiced speech frame, an unvoiced speech frame, a transitional frame, a generic audio frame, and an inactive frame containing only one or more of background noise and silence;
  
  encoding the first frame of the audio signal according to a relaxed code excited linear prediction (RCELP) coding scheme to produce a first encoded frame;
  
  encoding the second frame of the audio signal according to a non-pitch-regularizing (non-PR) coding scheme to produce a second encoded frame,
  
  wherein the second frame is a generic audio frame, and
  
  wherein the second frame follows and is consecutive to the first frame in the audio signal, and
  
  wherein said encoding the first frame includes time-modifying, based on a time shift, a segment of a first signal that is based on the first frame, said time-modifying including one among (A) time-shifting the segment of the first frame according to the time shift and (B) time-warping the segment of the first signal based on the time shift, and
  
  wherein said time-modifying a segment of a first signal includes changing a position of a pitch pulse of the segment relative to another pitch pulse of the first signal, and
  
  wherein said encoding the second frame includes time-modifying, based on the time shift, a segment of a second signal that is based on the second frame, wherein the time shift is applied to at least one sample of the segment of the second signal by a same shift value as at least one sample of the segment of the first signal, said time-modifying including one among (A) time-shifting the segment of the second frame according to the time shift and (B) time-warping the segment of the second signal based on the time shift; and
  
  transmitting the first encoded frame and the second encoded frame to a decoder that synthesizes the first encoded frame and the second encoded frame and outputs a synthesized audio signal.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 72, 73)
- - 2. The method of claim 1, wherein said first encoded frame is based on the time-modified segment of the first signal, and wherein said second encoded frame is based on the time-modified segment of the second signal.
  - 3. The method of claim 1, wherein the first signal is a residual of the first frame, and wherein the second signal is a residual of the second frame.
  - 4. The method of claim 1, wherein the first and second signals are weighted audio signals.
  - 5. The method of claim 1, wherein said encoding the first frame includes calculating the time shift based on information from a residual of a third frame that precedes the first frame in the audio signal.
  - 6. The method of claim 5, wherein said calculating the time shift includes mapping samples of the residual of the third frame to a delay contour of the audio signal.
  - 7. The method of claim 6, wherein said encoding the first frame includes computing the delay contour based on information relating to a pitch period of the audio signal.
  - 8. The method of claim 1, wherein the non-PR coding scheme is one among (A) a noise-excited linear prediction coding scheme, (B) a modified discrete cosine transform coding scheme, and (C) a prototype waveform interpolation coding scheme.
  - 9. The method of claim 1, wherein the non-PR coding scheme is a modified discrete cosine transform coding scheme.
  - 10. The method according to claim 1, wherein said encoding the second frame includes:
    - performing a modified discrete cosine transform (MDCT) operation on a residual of the second frame to obtain an encoded residual; and
      
      performing an inverse MDCT operation on a signal that is based on the encoded residual to obtain a decoded residual, wherein the second signal is based on the decoded residual.
  - 11. The method according to claim 1, wherein said encoding the second frame includes:
    - generating a residual of the second frame, wherein the second signal is the generated residual;
      
      subsequent to said time-modifying a segment of the second signal, performing a modified discrete cosine transform operation on the generated residual, including the time-modified segment, to obtain an encoded residual; and
      
      producing the second encoded frame based on the encoded residual.
  - 12. The method of claim 1, wherein said method comprises time-shifting, according to the time shift, a segment of a residual of a frame that follows the second frame in the audio signal.
  - 13. The method of claim 1, wherein said method includes time-modifying, based on the time shift, a segment of a third signal that is based on a third frame of the audio signal which follows the second frame, and wherein said encoding the second frame includes performing a modified discrete cosine transform (MDCT) operation over a window that includes samples of the time-modified segments of the second and third signals.
  - 14. The method of claim 13, wherein the second signal has a length of M samples and the third signal has a length of M samples, and wherein said performing an MDCT operation includes producing a set of M MDCT coefficients that is based on (A) M samples of the second signal, including the time-modified segment, and (B) not more than 3M/4 samples of the third signal.
  - 15. The method of claim 13, wherein the second signal has a length of M samples and the third signal has a length of M samples, and wherein said performing an MDCT operation includes producing a set of M MDCT coefficients that is based on a sequence of 2M samples which (A) includes M samples of the second signal, including the time-modified segment, (B) begins with a sequence of at least M/8 samples of zero value, and (C) ends with a sequence of at least M/8 samples of zero value.
  - 72. The method of claim 1, wherein the second frame comprises music.
  - 73. The method of claim 1, wherein the time shift is computed based on the first frame and used to time-modify the first frame entirely.
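
The sample budget recited in claims 14 and 15 can be checked with a short sketch. The Python below is an illustration under stated assumptions, not the patented encoder: the layout (zero pad, M current-frame samples, look-ahead samples, zero pad) is one arrangement that satisfies both claims at once, the function names `build_mdct_input` and `mdct` are hypothetical, and no analysis window is applied.

```python
import math


def build_mdct_input(current, lookahead, pad):
    """Assemble a 2M-sample sequence (assumed layout):
    `pad` zeros, M current-frame samples, M - 2*pad look-ahead samples, `pad` zeros.
    """
    m = len(current)
    if 8 * pad < m or 2 * pad > m:
        raise ValueError("pad must satisfy M/8 <= pad <= M/2")
    n_look = m - 2 * pad  # at most 3M/4 whenever pad >= M/8
    if len(lookahead) < n_look:
        raise ValueError("not enough look-ahead samples")
    zeros = [0.0] * pad
    return zeros + list(current) + list(lookahead[:n_look]) + zeros


def mdct(x):
    """Direct O(M^2) MDCT: 2M input samples produce M coefficients."""
    m = len(x) // 2
    n0 = (m + 1) / 2
    return [
        sum(x[n] * math.cos(math.pi / m * (n + n0) * (k + 0.5)) for n in range(2 * m))
        for k in range(m)
    ]


M = 8
second_signal = [1.0] * M  # time-modified samples of the second signal (toy values)
third_signal = [2.0] * M   # time-modified samples of the third signal (toy values)

sequence = build_mdct_input(second_signal, third_signal, pad=M // 8)
coefficients = mdct(sequence)
print(len(sequence), len(coefficients))
```

With the minimum pad of M/8 zeros at each end, exactly 3M/4 look-ahead samples fit in the 2M-sample sequence, which is why the two limits in claims 14 and 15 are compatible in this layout.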

16. An apparatus for processing frames of an audio signal, said apparatus comprising:
- means for classifying each of a first frame of the audio signal and a second frame of the audio signal as a frame type from a set of frame types comprising a voiced speech frame, an unvoiced speech frame, a transitional frame, a generic audio frame, and an inactive frame containing only one or more of background noise and silence;
  
  means for encoding the first frame of the audio signal according to a relaxed code excited linear prediction (RCELP) coding scheme to produce a first encoded frame;
  
  means for encoding the second frame of the audio signal according to a non-pitch-regularizing (non-PR) coding scheme to produce a second encoded frame,
  
  wherein the second frame is a generic audio frame, and
  
  wherein the second frame follows and is consecutive to the first frame in the audio signal, and
  
  wherein said means for encoding the first frame includes means for time-modifying, based on a time shift, a segment of a first signal that is based on the first frame, said means for time-modifying being configured to perform one among (A) time-shifting the segment of the first frame according to the time shift and (B) time-warping the segment of the first signal based on the time shift, and
  
  wherein said means for time-modifying a segment of a first signal is configured to change a position of a pitch pulse of the segment relative to another pitch pulse of the first signal, and
  
  wherein said means for encoding the second frame includes means for time-modifying, based on the time shift, a segment of a second signal that is based on the second frame, wherein the time shift is applied to at least one sample of the segment of the second signal by a same shift value as at least one sample of the segment of the first signal, said means for time-modifying being configured to perform one among (A) time-shifting the segment of the second frame according to the time shift and (B) time-warping the segment of the second signal based on the time shift; and
  
  means for transmitting the first encoded frame and the second encoded frame to a means for decoding having means for synthesizing the first encoded frame and the second encoded frame and means for outputting a synthesized audio signal.
- Dependent Claims (17, 18, 19, 20, 21, 22, 23)
- - 17. The apparatus of claim 16, wherein the first signal is a residual of the first frame, and wherein the second signal is a residual of the second frame.
  - 18. The apparatus of claim 16, wherein the first and second signals are weighted audio signals.
  - 19. The apparatus of claim 16, wherein said means for encoding the first frame includes means for calculating the time shift based on information from a residual of a third frame that precedes the first frame in the audio signal.
  - 20. The apparatus of claim 16, wherein said means for encoding the second frame includes:
    - means for generating a residual of the second frame, wherein the second signal is the generated residual; and
      
      means for performing a modified discrete cosine transform operation on the generated residual, including the time-modified segment, to obtain an encoded residual, wherein said means for encoding the second frame is configured to produce the second encoded frame based on the encoded residual.
  - 21. The apparatus of claim 16, wherein said means for time-modifying a segment of the second signal is configured to time-shift, according to the time shift, a segment of a residual of a frame that follows the second frame in the audio signal.
  - 22. The apparatus of claim 16, wherein said means for time-modifying a segment of a second signal is configured to time-modify, based on the time shift, a segment of a third signal that is based on a third frame of the audio signal which follows the second frame, and wherein said means for encoding the second frame includes means for performing a modified discrete cosine transform (MDCT) operation over a window that includes samples of the time-modified segments of the second and third signals.
  - 23. The apparatus of claim 22, wherein the second signal has a length of M samples and the third signal has a length of M samples, and wherein said means for performing an MDCT operation is configured to produce a set of M MDCT coefficients that is based on (A) M samples of the second signal, including the time-modified segment, and (B) not more than 3M/4 samples of the third signal.

24. An apparatus for processing frames of an audio signal, said apparatus comprising:
- a processor comprising a first frame encoder and a second frame encoder, wherein the processor is configured to classify each of a first frame of the audio signal and a second frame of the audio signal as a frame type from a set of frame types comprising a voiced speech frame, an unvoiced speech frame, a transitional frame, a generic audio frame, and an inactive frame containing only one or more of background noise and silence;
  
  the first frame encoder configured to encode the first frame of the audio signal according to a relaxed code excited linear prediction (RCELP) coding scheme to produce a first encoded frame;
  
  the second frame encoder configured to encode the second frame of the audio signal according to a non-pitch-regularizing (non-PR) coding scheme to produce a second encoded frame,
  
  wherein the second frame is a generic audio frame, and
  
  wherein the second frame follows and is consecutive to the first frame in the audio signal, and
  
  wherein said first frame encoder includes a first time modifier configured to time-modify, based on a time shift, a segment of a first signal that is based on the first frame, said first time modifier being configured to perform one among (A) time-shifting the segment of the first frame according to the time shift and (B) time-warping the segment of the first signal based on the time shift, and
  
  wherein said first time modifier is configured to change a position of a pitch pulse of the segment relative to another pitch pulse of the first signal, and
  
  wherein said second frame encoder includes a second time modifier configured to time-modify, based on the time shift, a segment of a second signal that is based on the second frame, wherein the time shift is applied to at least one sample of the segment of the second signal by a same shift value as at least one sample of the segment of the first signal, said second time modifier being configured to perform one among (A) time-shifting the segment of the second frame according to the time shift and (B) time-warping the segment of the second signal based on the time shift; and
  
  a transmitter configured to transmit the first encoded frame and the second encoded frame to a decoder that is configured to synthesize the first encoded frame and the second encoded frame and output a synthesized audio signal.
- Dependent Claims (25, 26, 27, 28, 29, 30, 31)
- - 25. The apparatus of claim 24, wherein the first signal is a residual of the first frame, and wherein the second signal is a residual of the second frame.
  - 26. The apparatus of claim 24, wherein the first and second signals are weighted audio signals.
  - 27. The apparatus of claim 24, wherein said first frame encoder includes a time shift calculator configured to calculate the time shift based on information from a residual of a third frame that precedes the first frame in the audio signal.
  - 28. The apparatus of claim 24, wherein said second frame encoder includes:
    - a residual generator configured to generate a residual of the second frame, wherein the second signal is the generated residual; and
      
      a modified discrete cosine transform (MDCT) module configured to perform an MDCT operation on the generated residual, including the time-modified segment, to obtain an encoded residual, wherein said second frame encoder is configured to produce the second encoded frame based on the encoded residual.
  - 29. The apparatus of claim 24, wherein said second time modifier is configured to time-shift, according to the time shift, a segment of a residual of a frame that follows the second frame in the audio signal.
  - 30. The apparatus of claim 24, wherein said second time modifier is configured to time-modify, based on the time shift, a segment of a third signal that is based on a third frame of the audio signal which follows the second frame, and wherein said second frame encoder includes a modified discrete cosine transform (MDCT) module configured to perform an MDCT operation over a window that includes samples of the time-modified segments of the second and third signals.
  - 31. The apparatus of claim 30, wherein the second signal has a length of M samples and the third signal has a length of M samples, and wherein said MDCT module is configured to produce a set of M MDCT coefficients that is based on (A) M samples of the second signal, including the time-modified segment, and (B) not more than 3M/4 samples of the third signal.

32. A non-transitory computer-readable medium comprising instructions which when executed by a processor cause the processor to:
- classify each of a first frame of an audio signal and a second frame of the audio signal as a frame type from a set of frame types comprising a voiced speech frame, an unvoiced speech frame, a transitional frame, a generic audio frame, and an inactive frame containing only one or more of background noise and silence;
  
  encode the first frame of the audio signal according to a relaxed code excited linear prediction (RCELP) coding scheme to produce a first encoded frame;
  
  encode the second frame of the audio signal according to a non-pitch-regularizing (non-PR) coding scheme to produce a second encoded frame,
  
  wherein the second frame is a generic audio frame, and
  
  wherein the second frame follows and is consecutive to the first frame in the audio signal, and
  
  wherein said instructions which when executed cause the processor to encode the first frame include instructions to time-modify, based on a time shift, a segment of a first signal that is based on the first frame, said instructions to time-modify including one among (A) instructions to time-shift the segment of the first frame according to the time shift and (B) instructions to time-warp the segment of the first signal based on the time shift, and
  
  wherein said instructions to time-modify a segment of a first signal include instructions to change a position of a pitch pulse of the segment relative to another pitch pulse of the first signal, and
  
  wherein said instructions which when executed cause the processor to encode the second frame include instructions to time-modify, based on the time shift, a segment of a second signal that is based on the second frame, wherein the time shift is applied to at least one sample of the segment of the second signal by a same shift value as at least one sample of the segment of the first signal, said instructions to time-modify including one among (A) instructions to time-shift the segment of the second frame according to the time shift and (B) instructions to time-warp the segment of the second signal based on the time shift; and
  
  transmit the first encoded frame and the second encoded frame to a decoder that synthesizes the first encoded frame and the second encoded frame and outputs a synthesized audio signal.

33. A method of processing frames of an audio signal, said method comprising:
- classifying each of a first frame of the audio signal and a second frame of the audio signal as a frame type from a set of frame types comprising a voiced speech frame, an unvoiced speech frame, a transitional frame, a generic audio frame, and an inactive frame containing only one or more of background noise and silence;
  
  encoding the first frame of the audio signal according to a first coding scheme to produce a first encoded frame, wherein the first frame is a generic audio frame;
  
  encoding the second frame of the audio signal according to a relaxed code excited linear prediction (RCELP) coding scheme to produce a second encoded frame,
  
  wherein the second frame follows and is consecutive to the first frame in the audio signal, and
  
  wherein the first coding scheme is a non-pitch-regularizing (non-PR) coding scheme, and
  
  wherein said encoding the first frame includes time-modifying, based on a first time shift, a segment of a first signal that is based on the first frame, wherein the first time shift is applied to at least one sample of the segment of the first signal by a same shift value as at least one sample of a segment of a signal of a preceding frame, said time-modifying including one among (A) time-shifting the segment of the first signal according to the first time shift and (B) time-warping the segment of the first signal based on the first time shift; and
  
  wherein said encoding the second frame includes time-modifying, based on a second time shift, a segment of a second signal that is based on the second frame, said time-modifying including one among (A) time-shifting the segment of the second signal according to the second time shift and (B) time-warping the segment of the second signal based on the second time shift,
  
  wherein said time-modifying a segment of a second signal includes changing a position of a pitch pulse of the segment relative to another pitch pulse of the second signal, and
  
  wherein the second time shift is based on information from the time-modified segment of the first signal; and
  
  transmitting the first encoded frame and the second encoded frame to a decoder that synthesizes the first encoded frame and the second encoded frame and outputs a synthesized audio signal.
- Dependent Claims (34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50)
- - 34. The method of claim 33, wherein said first encoded frame is based on the time-modified segment of the first signal, and wherein said second encoded frame is based on the time-modified segment of the second signal.
  - 35. The method of claim 33, wherein the first signal is a residual of the first frame, and wherein the second signal is a residual of the second frame.
  - 36. The method of claim 33, wherein the first and second signals are weighted audio signals.
  - 37. The method according to claim 33, wherein said time-modifying a segment of the second signal includes calculating the second time shift based on information from the time-modified segment of the first signal, and wherein said calculating the second time shift includes mapping the time-modified segment of the first signal to a delay contour that is based on information from the second frame.
  - 38. The method according to claim 37, wherein said second time shift is based on a correlation between samples of the mapped segment and samples of a temporary modified residual, and wherein the temporary modified residual is based on (A) samples of a residual of the second frame and (B) the first time shift.
  - 39. The method according to claim 33, wherein the second signal is a residual of the second frame, and wherein said time-modifying a segment of the second signal includes time-shifting a first segment of the residual according to the second time shift, and wherein said method comprises:
    - calculating a third time shift that is different than the second time shift, based on information from the time-modified segment of the first signal; and
      
      time-shifting a second segment of the residual according to the third time shift.
  - 40. The method according to claim 33, wherein the second signal is a residual of the second frame, and wherein said time-modifying a segment of the second signal includes time-shifting a first segment of the residual according to the second time shift, and wherein said method comprises:
    - calculating a third time shift that is different than the second time shift, based on information from the time-modified first segment of the residual; and
      
      time-shifting a second segment of the residual according to the third time shift.
  - 41. The method according to claim 33, wherein said time-modifying a segment of the second signal includes mapping samples of the time-modified segment of the first signal to a delay contour that is based on information from the second frame.
  - 42. The method according to claim 33, wherein said method comprises:
    - storing a sequence based on the time-modified segment of the first signal to an adaptive codebook buffer; and
      
      subsequent to said storing, mapping samples of the adaptive codebook buffer to a delay contour that is based on information from the second frame.
  - 43. The method according to claim 33, wherein the second signal is a residual of the second frame, and wherein said time-modifying a segment of the second signal includes time-warping the residual of the second frame, and wherein said method comprises time-warping a residual of a third frame of the audio signal based on information from the time-warped residual of the second frame, wherein the third frame is consecutive to the second frame in the audio signal.
  - 44. The method according to claim 33, wherein the second signal is a residual of the second frame, and wherein said time-modifying a segment of the second signal includes calculating the second time shift based on (A) information from the time-modified segment of the first signal and (B) information from the residual of the second frame.
  - 45. The method of claim 33, wherein the non-PR coding scheme is one among (A) a noise-excited linear prediction coding scheme, (B) a modified discrete cosine transform coding scheme, and (C) a prototype waveform interpolation coding scheme.
  - 46. The method of claim 33, wherein the non-PR coding scheme is a modified discrete cosine transform coding scheme.
  - 47. The method according to claim 33, wherein said encoding the first frame includes:
    - performing a modified discrete cosine transform (MDCT) operation on a residual of the first frame to obtain an encoded residual; and
      
      performing an inverse MDCT operation on a signal that is based on the encoded residual to obtain a decoded residual, wherein the first signal is based on the decoded residual.
  - 48. The method according to claim 33, wherein said encoding the first frame includes:
    - generating a residual of the first frame, wherein the first signal is the generated residual;
      
      subsequent to said time-modifying a segment of the first signal, performing a modified discrete cosine transform operation on the generated residual, including the time-modified segment, to obtain an encoded residual; and
      
      producing the first encoded frame based on the encoded residual.
  - 49. The method according to claim 33, wherein the first signal has a length of M samples and the second signal has a length of M samples, and wherein said encoding the first frame includes producing a set of M modified discrete cosine transform (MDCT) coefficients that is based on M samples of the first signal, including the time-modified segment, and not more than 3M/4 samples of the second signal.
  - 50. The method according to claim 33, wherein the first signal has a length of M samples and the second signal has a length of M samples, and wherein said encoding the first frame includes producing a set of M modified discrete cosine transform (MDCT) coefficients that is based on a sequence of 2M samples which (A) includes M samples of the first signal, including the time-modified segment, (B) begins with a sequence of at least M/8 samples of zero value, and (C) ends with a sequence of at least M/8 samples of zero value.
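
Claims 41 and 42 recite mapping samples of past time-modified signal, held in an adaptive codebook buffer, to a delay contour. The Python below is a minimal sketch of one way such a mapping could work, not the method disclosed in the patent. It assumes a delay contour that interpolates linearly between the previous and current pitch lags and uses linear interpolation for fractional read positions; the function names `linear_delay_contour` and `map_to_contour` are hypothetical.

```python
import math


def linear_delay_contour(prev_lag, cur_lag, n):
    """Assumed contour: delay moves linearly from prev_lag to cur_lag over n samples."""
    return [prev_lag + (cur_lag - prev_lag) * (i + 1) / n for i in range(n)]


def map_to_contour(history, contour):
    """Extend `history` one sample at a time by reading the buffer `d` samples back.

    Fractional read positions are linearly interpolated (an assumption).
    Each delay must be at least 1 and no longer than the buffer built so far.
    """
    buf = list(history)
    out = []
    for d in contour:
        pos = len(buf) - d
        if d < 1 or pos < 0:
            raise ValueError("delay must be between 1 and the buffer length")
        i0 = math.floor(pos)
        frac = pos - i0
        a = buf[i0]
        b = buf[i0 + 1] if i0 + 1 < len(buf) else a
        sample = a + frac * (b - a)
        buf.append(sample)  # later samples may read from newly mapped ones
        out.append(sample)
    return out


# Adaptive codebook buffer holding a pulse train with a pitch period of 4 samples.
acb_buffer = [1.0, 0.0, 0.0, 0.0] * 3
contour = linear_delay_contour(4.0, 4.0, 8)
mapped = map_to_contour(acb_buffer, contour)
print(mapped)
```

With a constant delay equal to the pitch period, the mapped samples simply continue the pulse train, which gives a target against which a residual segment of the next frame can be aligned.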

51. An apparatus for processing frames of an audio signal, said apparatus comprising:
- means for classifying each of a first frame of the audio signal and a second frame of the audio signal as a frame type from a set of frame types comprising a voiced speech frame, an unvoiced speech frame, a transitional frame, a generic audio frame, and an inactive frame containing only one or more of background noise and silence;
  
  means for encoding the first frame of the audio signal according to a first coding scheme to produce a first encoded frame, wherein the first frame is a generic audio frame;
  
  means for encoding the second frame of the audio signal according to a relaxed code excited linear prediction (RCELP) coding scheme to produce a second encoded frame,
  
  wherein the second frame follows and is consecutive to the first frame in the audio signal, and
  
  wherein the first coding scheme is a non-pitch-regularizing (non-PR) coding scheme, and
  
  wherein said means for encoding the first frame includes means for time-modifying, based on a first time shift, a segment of a first signal that is based on the first frame, wherein the first time shift is applied to at least one sample of the segment of the first signal by a same shift value as at least one sample of a segment of a signal of a preceding frame, said means for time-modifying being configured to perform one among (A) time-shifting the segment of the first signal according to the first time shift and (B) time-warping the segment of the first signal based on the first time shift; and
  
  wherein said means for encoding the second frame includes means for time-modifying, based on a second time shift, a segment of a second signal that is based on the second frame, said means for time-modifying being configured to perform one among (A) time-shifting the segment of the second signal according to the second time shift and (B) time-warping the segment of the second signal based on the second time shift,
  
  wherein said means for time-modifying a segment of a second signal is configured to change a position of a pitch pulse of the segment relative to another pitch pulse of the second signal, and
  
  wherein the second time shift is based on information from the time-modified segment of the first signal; and
  
  means for transmitting the first encoded frame and the second encoded frame to a means for decoding having means for synthesizing the first encoded frame and the second encoded frame and means for outputting a synthesized audio signal.
- Dependent Claims (52, 53, 54, 55, 56, 57, 58, 59, 60)
- - 52. The apparatus of claim 51, wherein the first signal is a residual of the first frame, and wherein the second signal is a residual of the second frame.
  - 53. The apparatus of claim 51, wherein the first and second signals are weighted audio signals.
  - 54. The apparatus according to claim 51, wherein said means for time-modifying a segment of the second signal includes means for calculating the second time shift based on information from the time-modified segment of the first signal, and wherein said means for calculating the second time shift includes means for mapping the time-modified segment of the first signal to a delay contour that is based on information from the second frame.
  - 55. The apparatus according to claim 54, wherein said second time shift is based on a correlation between samples of the mapped segment and samples of a temporary modified residual, and wherein the temporary modified residual is based on (A) samples of a residual of the second frame and (B) the first time shift.
  - 56. The apparatus according to claim 51, wherein the second signal is a residual of the second frame, and wherein said means for time-modifying a segment of the second signal is configured to time-shift a first segment of the residual according to the second time shift, and wherein said apparatus comprises:
    - means for calculating a third time shift that is different than the second time shift, based on information from the time-modified first segment of the residual; and
      
      means for time-shifting a second segment of the residual according to the third time shift.
  - 57. The apparatus according to claim 51, wherein the second signal is a residual of the second frame, and wherein said means for time-modifying a segment of the second signal includes means for calculating the second time shift based on (A) information from the time-modified segment of the first signal and (B) information from the residual of the second frame.
  - 58. The apparatus according to claim 51, wherein said means for encoding the first frame includes:
    - means for generating a residual of the first frame, wherein the first signal is the generated residual; and
      
      means for performing a modified discrete cosine transform operation on the generated residual, including the time-modified segment, to obtain an encoded residual, and wherein said means for encoding the first frame is configured to produce the first encoded frame based on the encoded residual.
  - 59. The apparatus according to claim 51, wherein the first signal has a length of M samples and the second signal has a length of M samples, and wherein said means for encoding the first frame includes means for producing a set of M modified discrete cosine transform (MDCT) coefficients that is based on M samples of the first signal, including the time-modified segment, and not more than 3M/4 samples of the second signal.
  - 60. The apparatus according to claim 51, wherein the first signal has a length of M samples and the second signal has a length of M samples, and wherein said means for encoding the first frame includes means for producing a set of M modified discrete cosine transform (MDCT) coefficients that is based on a sequence of 2M samples which (A) includes M samples of the first signal, including the time-modified segment, (B) begins with a sequence of at least M/8 samples of zero value, and (C) ends with a sequence of at least M/8 samples of zero value.
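
As an illustration of the sample layout recited in claims 59 and 60, the following Python sketch builds a sequence of 2M samples and computes M MDCT coefficients from it. The function names `build_mdct_input` and `mdct`, the parameter `pad`, and the ordering of the samples (leading zeros, first signal, lookahead from the second signal, trailing zeros) are assumptions made for this example and are not taken from the claims. No analysis window is applied.

```python
import math


def build_mdct_input(first, second, pad):
    """Assumed layout: `pad` zeros, M samples of `first`,
    M - 2*pad lookahead samples of `second`, then `pad` zeros (2M total)."""
    m = len(first)
    if len(second) != m:
        raise ValueError("both signals must have M samples")
    if pad * 8 < m or pad * 2 > m:
        raise ValueError("pad must lie between M/8 and M/2")
    lookahead = m - 2 * pad  # at most 3M/4 because pad >= M/8
    return [0.0] * pad + list(first) + list(second[:lookahead]) + [0.0] * pad


def mdct(seq):
    """Direct O(M^2) MDCT: 2M input samples in, M coefficients out."""
    m = len(seq) // 2
    return [
        sum(
            x * math.cos(math.pi / m * (n + 0.5 + m / 2) * (k + 0.5))
            for n, x in enumerate(seq)
        )
        for k in range(m)
    ]


if __name__ == "__main__":
    M = 16
    seq = build_mdct_input([1.0] * M, [2.0] * M, pad=M // 8)
    print(len(seq), len(mdct(seq)))
```

With `pad` at its minimum of M/8, the lookahead into the second signal is M - 2(M/8) = 3M/4 samples, so under this assumed layout the zero-padding bound of claim 60 and the lookahead bound of claim 59 coincide.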

61. An apparatus for processing frames of an audio signal, said apparatus comprising:
- a processor comprising a first frame encoder and a second frame encoder, wherein the processor is configured to classify each of a first frame of the audio signal and a second frame of the audio signal as a frame type from a set of frame types comprising a voiced speech frame, an unvoiced speech frame, a transitional frame, a generic audio frame, and an inactive frame containing only one or more of background noise and silence;
  
  the first frame encoder configured to encode the first frame of the audio signal according to a first coding scheme to produce a first encoded frame, wherein the first frame is a generic audio frame;
  
  the second frame encoder configured to encode the second frame of the audio signal according to a relaxed code excited linear prediction (RCELP) coding scheme to produce a second encoded frame, wherein the second frame follows and is consecutive to the first frame in the audio signal, and wherein the first coding scheme is a non-pitch-regularizing (non-PR) coding scheme, and wherein said first frame encoder includes a first time modifier configured to time-modify, based on a first time shift, a segment of a first signal that is based on the first frame, wherein the first time shift is applied to at least one sample of the segment of the first signal by a same shift value as at least one sample of a segment of a signal of a preceding frame, said first time modifier being configured to perform one among (A) time-shifting the segment of the first signal according to the first time shift and (B) time-warping the segment of the first signal based on the first time shift; and
  
  wherein said second frame encoder includes a second time modifier configured to time-modify, based on a second time shift, a segment of a second signal that is based on the second frame, said second time modifier being configured to perform one among (A) time-shifting the segment of the second signal according to the second time shift and (B) time-warping the segment of the second signal based on the second time shift, wherein said second time modifier is configured to change a position of a pitch pulse of the segment of a second signal relative to another pitch pulse of the second signal, and wherein the second time shift is based on information from the time-modified segment of the first signal; and
  
  a transmitter configured to transmit the first encoded frame and the second encoded frame to a decoder that is configured to synthesize the first encoded frame and the second encoded frame and output a synthesized audio signal.
- Dependent Claims (62, 63, 64, 65, 66, 67, 68, 69, 70)
  - 62. The apparatus of claim 61, wherein the first signal is a residual of the first frame, and wherein the second signal is a residual of the second frame.
  - 63. The apparatus of claim 61, wherein the first and second signals are weighted audio signals.
  - 64. The apparatus according to claim 61, wherein said second time modifier includes a time shift calculator configured to calculate the second time shift based on information from the time-modified segment of the first signal, and wherein said time shift calculator includes a mapper configured to map the time-modified segment of the first signal to a delay contour that is based on information from the second frame.
  - 65. The apparatus according to claim 64, wherein said second time shift is based on a correlation between samples of the mapped segment and samples of a temporary modified residual, and wherein the temporary modified residual is based on (A) samples of a residual of the second frame and (B) the first time shift.
  - 66. The apparatus according to claim 61, wherein the second signal is a residual of the second frame, and wherein said second time modifier is configured to time-shift a first segment of the residual according to the second time shift, and wherein said apparatus further comprises a time shift calculator, wherein said time shift calculator is configured to calculate a third time shift that is different than the second time shift, based on information from the time-modified first segment of the residual, and wherein said apparatus further comprises a second time shifter, wherein said second time shifter is configured to time-shift a second segment of the residual according to the third time shift.
  - 67. The apparatus according to claim 61, wherein the second signal is a residual of the second frame, and wherein said second time modifier includes a time shift calculator configured to calculate the second time shift based on (A) information from the time-modified segment of the first signal and (B) information from the residual of the second frame.
  - 68. The apparatus according to claim 61, wherein said first frame encoder includes:
    - a residual generator configured to generate a residual of the first frame, wherein the first signal is the generated residual; and
      
      a modified discrete cosine transform (MDCT) module configured to perform an MDCT operation on the generated residual, including the time-modified segment, to obtain an encoded residual, and wherein said first frame encoder is configured to produce the first encoded frame based on the encoded residual.
  - 69. The apparatus according to claim 61, wherein the first signal has a length of M samples and the second signal has a length of M samples, and wherein said first frame encoder includes a modified discrete cosine transform (MDCT) module configured to produce a set of M MDCT coefficients that is based on M samples of the first signal, including the time-modified segment, and not more than 3M/4 samples of the second signal.
  - 70. The apparatus according to claim 61, wherein the first signal has a length of M samples and the second signal has a length of M samples, and wherein said first frame encoder includes a modified discrete cosine transform (MDCT) module configured to produce a set of M MDCT coefficients that is based on a sequence of 2M samples which (A) includes M samples of the first signal, including the time-modified segment, (B) begins with a sequence of at least M/8 samples of zero value, and (C) ends with a sequence of at least M/8 samples of zero value.
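
Claims 64 and 65 recite a time shift that is based on a correlation between a mapped segment of the previously time-modified signal and samples of a residual. A minimal Python sketch of such a search follows, assuming integer shifts and a plain (unnormalized) dot-product correlation. The function name `best_time_shift` and the parameters `target`, `start`, and `max_shift` are hypothetical and do not appear in the claims; the delay-contour mapping and the temporary modified residual are not modeled.

```python
def best_time_shift(target, residual, start, max_shift):
    """Return the integer shift in [-max_shift, max_shift] whose residual
    window, residual[start+shift : start+shift+len(target)], correlates
    best with `target`. Shifts whose window leaves the residual are skipped."""
    n = len(target)
    best, best_score = 0, float("-inf")
    for shift in range(-max_shift, max_shift + 1):
        lo = start + shift
        if lo < 0 or lo + n > len(residual):
            continue
        # Assumed correlation measure: unnormalized dot product.
        score = sum(t * r for t, r in zip(target, residual[lo:lo + n]))
        if score > best_score:
            best, best_score = shift, score
    return best


if __name__ == "__main__":
    residual = [0.0] * 20
    residual[12] = 1.0                 # a single pitch pulse
    target = [0.0, 0.0, 1.0, 0.0, 0.0]  # pulse expected at offset 2
    print(best_time_shift(target, residual, start=8, max_shift=3))
```

In the usage example, the pulse sits at offset 4 of the unshifted window, so the search selects the shift that moves it to offset 2, where the target expects it.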

71. A non-transitory computer-readable medium comprising instructions which when executed by a processor cause the processor to:
- classify each of a first frame of an audio signal and a second frame of the audio signal as a frame type from a set of frame types comprising a voiced speech frame, an unvoiced speech frame, a transitional frame, a generic audio frame, and an inactive frame containing only one or more of background noise and silence;
  
  encode the first frame of the audio signal according to a first coding scheme to produce a first encoded frame, wherein the first frame is a generic audio frame;
  
  encode the second frame of the audio signal according to a relaxed code excited linear prediction (RCELP) coding scheme to produce a second encoded frame, wherein the second frame follows and is consecutive to the first frame in the audio signal, and wherein the first coding scheme is a non-pitch-regularizing (non-PR) coding scheme, and wherein said instructions which when executed by a processor cause the processor to encode the first frame include instructions to time-modify, based on a first time shift, a segment of a first signal that is based on the first frame, wherein the first time shift is applied to at least one sample of the segment of the first signal by a same shift value as at least one sample of a segment of a signal of a preceding frame, said instructions to time-modify including one among (A) instructions to time-shift the segment of the first signal according to the first time shift and (B) instructions to time-warp the segment of the first signal based on the first time shift; and
  
  wherein said instructions which when executed by a processor cause the processor to encode the second frame include instructions to time-modify, based on a second time shift, a segment of a second signal that is based on the second frame, said instructions to time-modify including one among (A) instructions to time-shift the segment of the second signal according to the second time shift and (B) instructions to time-warp the segment of the second signal based on the second time shift, wherein said instructions to time-modify a segment of a second signal include instructions to change a position of a pitch pulse of the segment relative to another pitch pulse of the second signal, and wherein the second time shift is based on information from the time-modified segment of the first signal; and
  
  transmit the first encoded frame and the second encoded frame to a decoder that synthesizes the first encoded frame and the second encoded frame and outputs a synthesized audio signal.
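
The claims above recite a time shift that is carried across consecutive frames coded by different schemes: the non-PR frame is time-modified by the same shift value applied to a preceding frame, and the following RCELP frame derives its shift from the time-modified signal. The Python sketch below shows only that bookkeeping, under simplifying assumptions: shifts are whole samples, reads outside the signal return zero, and `update_shift` is a hypothetical callback standing in for the RCELP shift search. None of these names come from the claims.

```python
def time_shift(signal, start, length, shift):
    """Read `length` samples beginning at start + shift.
    Reads that fall outside the signal return 0.0 (an assumption)."""
    return [
        signal[i + shift] if 0 <= i + shift < len(signal) else 0.0
        for i in range(start, start + length)
    ]


def encode_frames(signal, frame_types, frame_len, update_shift):
    """Return one (frame_type, shift, time_modified_segment) tuple per frame.

    `update_shift(prev_shift, frame_index)` is a hypothetical stand-in for
    the pitch-regularizing shift search; the non-PR path never calls it.
    """
    shift = 0
    out = []
    for idx, ftype in enumerate(frame_types):
        if ftype != "generic_audio":
            # Pitch-regularizing path: derive a new shift from the carried one.
            shift = update_shift(shift, idx)
        # Both paths time-modify with the current shift, so a generic audio
        # frame reuses the value inherited from the preceding frame.
        segment = time_shift(signal, idx * frame_len, frame_len, shift)
        out.append((ftype, shift, segment))
    return out


if __name__ == "__main__":
    sig = [float(i) for i in range(12)]
    types = ["voiced", "generic_audio", "voiced"]
    for row in encode_frames(sig, types, 4, lambda s, i: s + 1):
        print(row)
```

Keeping a single shift variable outside the per-scheme branches is one way to ensure that neither coder sees a discontinuity in the shift when the coding scheme changes between frames.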

Current Assignee: Qualcomm, Inc.
Original Assignee: Qualcomm, Inc.
Inventors: Krishnan, Venkatesh; Rajendran, Vivek; Kandhadai, Ananthapadmanabhan A.
Primary Examiner(s): Serrou, Abdelali
Application Number: US12/137,700
Publication Number: US 20080312914A1
Time in Patent Office: 3,260 Days
Field of Search: 704214, 704 16, 704500-504, 704208
US Class Current
CPC Class Codes

G10L 19/022   Blocking, i.e. grouping of ...

G10L 19/08   Determination or coding of ...

G10L 19/18   Vocoders using multiple modes
