Systems, methods, and apparatus for signal encoding using pitch-regularizing and non-pitch-regularizing coding
First Claim
Patent Images
1. A method of processing frames of an audio signal, said method comprising:
- classifying each of a first frame of the audio signal and a second frame of the audio signal as a frame type from a set of frame types comprising a voiced speech frame, an unvoiced speech frame, a transitional frame, a generic audio frame, and an inactive frame containing only one or more of background noise and silence;
encoding the first frame of the audio signal according to a relaxed code excited linear prediction (RCELP) coding scheme to produce a first encoded frame;
encoding the second frame of the audio signal according to a non-pitch-regularizing (non-PR) coding scheme to produce a second encoded frame,wherein the second frame is a generic audio frame, andwherein the second frame follows and is consecutive to the first frame in the audio signal, andwherein said encoding the first frame includes time-modifying, based on a time shift, a segment of a first signal that is based on the first frame, said time-modifying including one among (A) time-shifting the segment of the first frame according to the time shift and (B) time-warping the segment of the first signal based on the time shift, andwherein said time-modifying a segment of a first signal includes changing a position of a pitch pulse of the segment relative to another pitch pulse of the first signal, andwherein said encoding the second frame includes time-modifying, based on the time shift, a segment of a second signal that is based on the second frame, wherein the time shift is applied to at least one sample of the segment of the second signal by a same shift value as at least one sample of the segment of the first signal, said time-modifying including one among (A) time-shifting the segment of the second frame according to the time shift and (B) time-warping the segment of the second signal based on the time shift; and
transmitting the first encoded frame and the second encoded frame to a decoder that synthesizes the first encoded frame and the second encoded frame and outputs a synthesized audio signal.
1 Assignment
0 Petitions
Accused Products
Abstract
A time shift calculated during a pitch-regularizing (PR) encoding of a frame of an audio signal is used to time-shift a segment of another frame during a non-PR encoding.
-
Citations
73 Claims
-
1. A method of processing frames of an audio signal, said method comprising:
-
classifying each of a first frame of the audio signal and a second frame of the audio signal as a frame type from a set of frame types comprising a voiced speech frame, an unvoiced speech frame, a transitional frame, a generic audio frame, and an inactive frame containing only one or more of background noise and silence; encoding the first frame of the audio signal according to a relaxed code excited linear prediction (RCELP) coding scheme to produce a first encoded frame; encoding the second frame of the audio signal according to a non-pitch-regularizing (non-PR) coding scheme to produce a second encoded frame, wherein the second frame is a generic audio frame, and wherein the second frame follows and is consecutive to the first frame in the audio signal, and wherein said encoding the first frame includes time-modifying, based on a time shift, a segment of a first signal that is based on the first frame, said time-modifying including one among (A) time-shifting the segment of the first frame according to the time shift and (B) time-warping the segment of the first signal based on the time shift, and wherein said time-modifying a segment of a first signal includes changing a position of a pitch pulse of the segment relative to another pitch pulse of the first signal, and wherein said encoding the second frame includes time-modifying, based on the time shift, a segment of a second signal that is based on the second frame, wherein the time shift is applied to at least one sample of the segment of the second signal by a same shift value as at least one sample of the segment of the first signal, said time-modifying including one among (A) time-shifting the segment of the second frame according to the time shift and (B) time-warping the segment of the second signal based on the time shift; and transmitting the first encoded frame and the second encoded frame to a decoder that synthesizes the first encoded frame and the second encoded frame and outputs a synthesized audio signal. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 72, 73)
-
-
16. An apparatus for processing frames of an audio signal, said apparatus comprising:
-
means for classifying each of a first frame of the audio signal and a second frame of the audio signal as a frame type from a set of frame types comprising a voiced speech frame, an unvoiced speech frame, a transitional frame, a generic audio frame, and an inactive frame containing only one or more of background noise and silence; means for encoding the first frame of the audio signal according to a relaxed code excited linear prediction (RCELP) coding scheme to produce a first encoded frame; means for encoding the second frame of the audio signal according to a non-pitch-regularizing (non-PR) coding scheme to produce a second encoded frame, wherein the second frame is a generic audio frame, and wherein the second frame follows and is consecutive to the first frame in the audio signal, and wherein said means for encoding the first frame includes means for time-modifying, based on a time shift, a segment of a first signal that is based on the first frame, said means for time-modifying being configured to perform one among (A) time-shifting the segment of the first frame according to the time shift and (B) time-warping the segment of the first signal based on the time shift, and wherein said means for time-modifying a segment of a first signal is configured to change a position of a pitch pulse of the segment relative to another pitch pulse of the first signal, and wherein said means for encoding the second frame includes means for time-modifying, based on the time shift, a segment of a second signal that is based on the second frame, wherein the time shift is applied to at least one sample of the segment of the second signal by a same shift value as at least one sample of the segment of the first signal, said means for time-modifying being configured to perform one among (A) time-shifting the segment of the second frame according to the time shift and (B) time-warping the segment of the second signal based on the time shift; and means for transmitting the first encoded frame and the second encoded frame to a means for decoding having means for synthesizing the first encoded frame and the second encoded frame and means for outputting a synthesized audio signal. - View Dependent Claims (17, 18, 19, 20, 21, 22, 23)
-
-
24. An apparatus for processing frames of an audio signal, said apparatus comprising:
-
a processor comprising a first frame encoder and a second frame encoder, wherein the processor is configured to classify each of a first frame of the audio signal and a second frame of the audio signal as a frame type from a set of frame types comprising a voiced speech frame, an unvoiced speech frame, a transitional frame, a generic audio frame, and an inactive frame containing only one or more of background noise and silence; the first frame encoder configured to encode the first frame of the audio signal according to a relaxed code excited linear prediction (RCELP) coding scheme to produce a first encoded frame; the second frame encoder configured to encode the second frame of the audio signal according to a non-pitch-regularizing (non-PR) coding scheme to produce a second encoded frame, wherein the second frame is a generic audio frame, and wherein the second frame follows and is consecutive to the first frame in the audio signal, and wherein said first frame encoder includes a first time modifier configured to time-modify, based on a time shift, a segment of a first signal that is based on the first frame, said first time modifier being configured to perform one among (A) time-shifting the segment of the first frame according to the time shift and (B) time-warping the segment of the first signal based on the time shift, and wherein said first time modifier is configured to change a position of a pitch pulse of the segment relative to another pitch pulse of the first signal, and wherein said second frame encoder includes a second time modifier configured to time-modify, based on the time shift, a segment of a second signal that is based on the second frame, wherein the time shift is applied to at least one sample of the segment of the second signal by a same shift value as at least one sample of the segment of the first signal, said second time modifier being configured to perform one among (A) time-shifting the segment of the second frame according to the time shift and (B) time-warping the segment of the second signal based on the time shift; and a transmitter configured to transmit the first encoded frame and the second encoded frame to a decoder that is configured to synthesize the first encoded frame and the second encoded frame and output a synthesized audio signal. - View Dependent Claims (25, 26, 27, 28, 29, 30, 31)
-
-
32. A non-transitory computer-readable medium comprising instructions which when executed by a processor cause the processor to:
-
classify each of a first frame of an audio signal and a second frame of the audio signal as a frame type from a set of frame types comprising a voiced speech frame, an unvoiced speech frame, a transitional frame, a generic audio frame, and an inactive frame containing only one or more of background noise and silence; encode the first frame of the audio signal according to a relaxed code excited linear prediction (RCELP) coding scheme to produce a first encoded frame; encode the second frame of the audio signal according to a non-pitch-regularizing (non-PR) coding scheme to produce a second encoded frame, wherein the second frame is a generic audio frame, and wherein the second frame follows and is consecutive to the first frame in the audio signal, and wherein said instructions which when executed cause the processor to encode the first frame include instructions to time-modify, based on a time shift, a segment of a first signal that is based on the first frame, said instructions to time-modify including one among (A) instructions to time-shift the segment of the first frame according to the time shift and (B) instructions to time-warp the segment of the first signal based on the time shift, and wherein said instructions to time-modify a segment of a first signal include instructions to change a position of a pitch pulse of the segment relative to another pitch pulse of the first signal, and wherein said instructions which when executed cause the processor to encode the second frame include instructions to time-modify, based on the time shift, a segment of a second signal that is based on the second frame, wherein the time shift is applied to at least one sample of the segment of the second signal by a same shift value as at least one sample of the segment of the first signal, said instructions to time-modify including one among (A) instructions to time-shift the segment of the second frame according to the time shift and (B) instructions to time-warp the segment of the second signal based on the time shift; and transmit the first encoded frame and the second encoded frame to a decoder that synthesizes the first encoded frame and the second encoded frame and outputs a synthesized audio signal.
-
-
33. A method of processing frames of an audio signal, said method comprising:
-
classifying each of a first frame of the audio signal and a second frame of the audio signal as a frame type from a set of frame types comprising a voiced speech frame, an unvoiced speech frame, a transitional frame, a generic audio frame, and an inactive frame containing only one or more of background noise and silence; encoding the first frame of the audio signal according to a first coding scheme to produce a first encoded frame, wherein the first frame is a generic audio frame; encoding the second frame of the audio signal according to a relaxed code excited linear prediction (RCELP) coding scheme to produce a second encoded frame, wherein the second frame follows and is consecutive to the first frame in the audio signal, and wherein the first coding scheme is a non-pitch-regularizing (non-PR) coding scheme, and wherein said encoding the first frame includes time-modifying, based on a first time shift, a segment of a first signal that is based on the first frame, wherein the first time shift is applied to at least one sample of the segment of the first signal by a same shift value as at least one sample of a segment of a signal of a preceding frame, said time-modifying including one among (A) time-shifting the segment of the first signal according to the first time shift and (B) time-warping the segment of the first signal based on the first time shift; and wherein said encoding the second frame includes time-modifying, based on a second time shift, a segment of a second signal that is based on the second frame, said time-modifying including one among (A) time-shifting the segment of the second signal according to the second time shift and (B) time-warping the segment of the second signal based on the second time shift, wherein said time-modifying a segment of a second signal includes changing a position of a pitch pulse of the segment relative to another pitch pulse of the second signal, and wherein the second time shift is based on information from the time-modified segment of the first signal; and transmitting the first encoded frame and the second encoded frame to a decoder that synthesizes the first encoded frame and the second encoded frame and outputs a synthesized audio signal. - View Dependent Claims (34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50)
-
-
51. An apparatus for processing frames of an audio signal, said apparatus comprising:
-
means for classifying each of a first frame of the audio signal and a second frame of the audio signal as a frame type from a set of frame types comprising a voiced speech frame, an unvoiced speech frame, a transitional frame, a generic audio frame, and an inactive frame containing only one or more of background noise and silence; means for encoding the first frame of the audio signal according to a first coding scheme to produce a first encoded frame, wherein the first frame is a generic audio frame; means for encoding the second frame of the audio signal according to a relaxed code excited linear prediction (RCELP) coding scheme to produce a second encoded frame, wherein the second frame follows and is consecutive to the first frame in the audio signal, and wherein the first coding scheme is a non-pitch-regularizing (non-PR) coding scheme, and wherein said means for encoding the first frame includes means for time-modifying, based on a first time shift, a segment of a first signal that is based on the first frame, wherein the first time shift is applied to at least one sample of the segment of the first signal by a same shift value as at least one sample of a segment of a signal of a preceding frame, said means for time-modifying being configured to perform one among (A) time-shifting the segment of the first signal according to the first time shift and (B) time-warping the segment of the first signal based on the first time shift; and wherein said means for encoding the second frame includes means for time-modifying, based on a second time shift, a segment of a second signal that is based on the second frame, said means for time-modifying being configured to perform one among (A) time-shifting the segment of the second signal according to the second time shift and (B) time-warping the segment of the second signal based on the second time shift, wherein said means for time-modifying a segment of a second signal is configured to change a position of a pitch pulse of the segment relative to another pitch pulse of the second signal, and wherein the second time shift is based on information from the time-modified segment of the first signal; and means for transmitting the first encoded frame and the second encoded frame to a means for decoding having means for synthesizing the first encoded frame and the second encoded frame and means for outputting a synthesized audio signal. - View Dependent Claims (52, 53, 54, 55, 56, 57, 58, 59, 60)
-
-
61. An apparatus for processing frames of an audio signal, said apparatus comprising:
-
a processor comprising a first frame encoder and a second frame encoder, wherein the processor is configured to classify each of a first frame of the audio signal and a second frame of the audio signal as a frame type from a set of frame types comprising a voiced speech frame, an unvoiced speech frame, a transitional frame, a generic audio frame, and an inactive frame containing only one or more of background noise and silence; the first frame encoder configured to encode the first frame of the audio signal according to a first coding scheme to produce a first encoded frame, wherein the first frame is a generic audio frame; the second frame encoder configured to encode the second frame of the audio signal according to a relaxed code excited linear prediction (RCELP) coding scheme to produce a second encoded frame, wherein the second frame follows and is consecutive to the first frame in the audio signal, and wherein the first coding scheme is a non-pitch-regularizing (non-PR) coding scheme, and wherein said first frame encoder includes a first time modifier configured to time-modify, based on a first time shift, a segment of a first signal that is based on the first frame, wherein the first time shift is applied to at least one sample of the segment of the first signal by a same shift value as at least one sample of a segment of a signal of a preceding frame, said first time modifier being configured to perform one among (A) time-shifting the segment of the first signal according to the first time shift and (B) time-warping the segment of the first signal based on the first time shift; and wherein said second frame encoder includes a second time modifier configured to time-modify, based on a second time shift, a segment of a second signal that is based on the second frame, said second time modifier being configured to perform one among (A) time-shifting the segment of the second signal according to the second time shift and (B) time-warping the segment of the second signal based on the second time shift, wherein said second time modifier is configured to change a position of a pitch pulse of the segment of a second signal relative to another pitch pulse of the second signal, and wherein the second time shift is based on information from the time-modified segment of the first signal; and a transmitter configured to transmit the first encoded frame and the second encoded frame to a decoder that is configured to synthesize the first encoded frame and the second encoded frame and output a synthesized audio signal. - View Dependent Claims (62, 63, 64, 65, 66, 67, 68, 69, 70)
-
-
71. A non-transitory computer-readable medium comprising instructions which when executed by a processor cause the processor to:
-
classify each of a first frame of an audio signal and a second frame of the audio signal as a frame type from a set of frame types comprising a voiced speech frame, an unvoiced speech frame, a transitional frame, a generic audio frame, and an inactive frame containing only one or more of background noise and silence; encode the first frame of the audio signal according to a first coding scheme to produce a first encoded frame, wherein the first frame is a generic audio frame; encode the second frame of the audio signal according to a relaxed code excited linear prediction (RCELP) coding scheme to produce a second encoded frame, wherein the second frame follows and is consecutive to the first frame in the audio signal, and wherein the first coding scheme is a non-pitch-regularizing (non-PR) coding scheme, and wherein said instructions which when executed by a processor cause the processor to encode the first frame include instructions to time-modify, based on a first time shift, a segment of a first signal that is based on the first frame, wherein the first time shift is applied to at least one sample of the segment of the first signal by a same shift value as at least one sample of a segment of a signal of a preceding frame, said instructions to time-modify including one among (A) instructions to time-shift the segment of the first signal according to the first time shift and (B) instructions to time-warp the segment of the first signal based on the first time shift; and wherein said instructions which when executed by a processor cause the processor to encode the second frame include instructions to time-modify, based on a second time shift, a segment of a second signal that is based on the second frame, said instructions to time-modify including one among (A) instructions to time-shift the segment of the second signal according to the second time shift and (B) instructions to time-warp the segment of the second signal based on the second time shift, wherein said instructions to time-modify a segment of a second signal include instructions to change a position of a pitch pulse of the segment relative to another pitch pulse of the second signal, and wherein the second time shift is based on information from the time-modified segment of the first signal; and transmit the first encoded frame and the second encoded frame to a decoder that synthesizes the first encoded frame and the second encoded frame and outputs a synthesized audio signal.
-
Specification