Method and device for efficient frame erasure concealment in linear predictive based speech codecs
Abstract
The present invention relates to a method and device for improving concealment of frame erasure caused by frames of an encoded sound signal erased during transmission from an encoder (106) to a decoder (110), and for accelerating recovery of the decoder after non erased frames of the encoded sound signal have been received. For that purpose, concealment/recovery parameters are determined in the encoder or decoder. When determined in the encoder (106), the concealment/recovery parameters are transmitted to the decoder (110). In the decoder, frame erasure concealment and decoder recovery are conducted in response to the concealment/recovery parameters. The concealment/recovery parameters may be selected from the group consisting of: a signal classification parameter, an energy information parameter and a phase information parameter. The determination of the concealment/recovery parameters comprises classifying the successive frames of the encoded sound signal as unvoiced, unvoiced transition, voiced transition, voiced, or onset, and this classification is determined on the basis of at least a part of the following parameters: a normalized correlation parameter, a spectral tilt parameter, a signal-to-noise ratio parameter, a pitch stability parameter, a relative frame energy parameter, and a zero crossing parameter.
258 Citations
177 Claims
-
1. A method of concealing frame erasure caused by frames of an encoded sound signal erased during transmission from an encoder to a decoder, comprising:
-
determining, in the encoder, concealment/recovery parameters;
transmitting to the decoder concealment/recovery parameters determined in the encoder; and
in the decoder, conducting frame erasure concealment and decoder recovery in response to the received concealment/recovery parameters.
Dependent claims: 2–45.
-
2. A method as defined in claim 1, further comprising quantizing, in the encoder, the concealment/recovery parameters prior to transmitting said concealment/recovery parameters to the decoder.
-
3. A method as defined in claim 1, wherein the concealment/recovery parameters are selected from the group consisting of:
- a signal classification parameter, an energy information parameter and a phase information parameter.
-
4. A method as defined in claim 3, wherein determination of the phase information parameter comprises determining a position of a first glottal pulse in a frame of the encoded sound signal.
-
5. A method as defined in claim 1, wherein conducting frame erasure concealment and decoder recovery comprises conducting decoder recovery in response to a determined position of a first glottal pulse after at least one lost voice onset.
-
6. A method as defined in claim 1, wherein conducting frame erasure concealment and decoder recovery comprises, when at least one onset frame is lost, constructing a periodic excitation part artificially as a low-pass filtered periodic train of pulses separated by a pitch period.
-
7. A method as defined in claim 6, wherein:
-
the method comprises quantizing the position of the first glottal pulse prior to transmission of said position of the first glottal pulse to the decoder; and
constructing a periodic excitation part comprises realizing the low-pass filtered periodic train of pulses by:
centering a first impulse response of a low-pass filter on the quantized position of the first glottal pulse with respect to the beginning of a frame; and
placing remaining impulse responses of the low-pass filter each with a distance corresponding to an average pitch value from the preceding impulse response up to the end of a last subframe affected by the artificial construction.
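The construction in claims 6 and 7 can be illustrated with a short sketch. The helper below is hypothetical and not part of the claims; the frame length, pitch value, and filter response used in the example are assumptions:

```python
import numpy as np

def build_artificial_onset(frame_len, first_pulse_pos, pitch, h_lp):
    """Sketch of claims 6-7: build the periodic excitation part as a
    low-pass filtered periodic train of pulses.  The first impulse
    response is centered on the quantized first-glottal-pulse position;
    the following ones are placed one average pitch period apart until
    the end of the constructed segment."""
    exc = np.zeros(frame_len)
    half = len(h_lp) // 2
    pos = first_pulse_pos
    while pos < frame_len:
        for i, h in enumerate(h_lp):
            k = pos - half + i          # center h_lp on pos
            if 0 <= k < frame_len:
                exc[k] += h
        pos += pitch                     # next pulse one pitch period later
    return exc
```

With a 3-tap response centered at position 5 and a pitch of 20 samples, pulses appear at samples 5, 25, 45, and so on.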
-
-
8. A method as defined in claim 4, wherein determination of the phase information parameter further comprises encoding, in the encoder, the shape, sign and amplitude of the first glottal pulse and transmitting the encoded shape, sign and amplitude from the encoder to the decoder.
-
9. A method as defined in claim 4, wherein determining the position of the first glottal pulse comprises:
-
measuring the first glottal pulse as a sample of maximum amplitude within a pitch period; and
quantizing the position of the sample of maximum amplitude within the pitch period.
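Claim 9 reduces to a maximum search followed by quantization. A minimal sketch, in which the 6-bit budget and the uniform quantizer are assumptions not stated in the claim:

```python
import numpy as np

def first_glottal_pulse_position(residual, pitch, bits=6):
    """Sketch of claim 9: take the first glottal pulse as the
    maximum-amplitude sample inside the first pitch period, then
    uniformly quantize its position within that period."""
    pos = int(np.argmax(np.abs(residual[:pitch])))
    step = max(1, pitch // (1 << bits))   # quantizer step size (assumed uniform)
    return pos, (pos // step) * step      # raw and quantized position
```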
-
-
10. A method as defined in claim 1, wherein:
-
the sound signal is a speech signal; and
determination, in the encoder, of concealment/recovery parameters comprises classifying successive frames of the encoded sound signal as unvoiced, unvoiced transition, voiced transition, voiced, or onset.
-
-
11. A method as defined in claim 10, wherein classifying the successive frames comprises classifying as unvoiced every frame which is an unvoiced frame, every frame without active speech, and every voiced offset frame having an end tending to be unvoiced.
-
12. A method as defined in claim 10, wherein classifying the successive frames comprises classifying as unvoiced transition every unvoiced frame having an end with a possible voiced onset which is too short or not built well enough to be processed as a voiced frame.
-
13. A method as defined in claim 10, wherein classifying the successive frames comprises classifying as voiced transition every voiced frame with relatively weak voiced characteristics, including voiced frames with rapidly changing characteristics and voiced offsets lasting the whole frame, wherein a frame classified as voiced transition follows only frames classified as voiced transition, voiced or onset.
-
14. A method as defined in claim 10, wherein classifying the successive frames comprises classifying as voiced every voiced frame with stable characteristics, wherein a frame classified as voiced follows only frames classified as voiced transition, voiced or onset.
-
15. A method as defined in claim 10, wherein classifying the successive frames comprises classifying as onset every voiced frame with stable characteristics following a frame classified as unvoiced or unvoiced transition.
-
16. A method as defined in claim 10, comprising determining the classification of the successive frames of the encoded sound signal on the basis of at least a part of the following parameters:
- a normalized correlation parameter, a spectral tilt parameter, a signal-to-noise ratio parameter, a pitch stability parameter, a relative frame energy parameter, and a zero crossing parameter.
-
17. A method as defined in claim 16, wherein determining the classification of the successive frames comprises:
-
computing a figure of merit on the basis of the normalized correlation parameter, spectral tilt parameter, signal-to-noise ratio parameter, pitch stability parameter, relative frame energy parameter, and zero crossing parameter; and
comparing the figure of merit to thresholds to determine the classification.
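Claim 17 only requires that one figure of merit be computed and thresholded; the weights and threshold values below are illustrative assumptions. The onset class is omitted because, per claim 15, it additionally depends on the previous frame's classification:

```python
def classify_frame(norm_corr, tilt, snr, pitch_stab, rel_energy, zc):
    """Sketch of claim 17: merge the six parameters into a single
    figure of merit and compare it against thresholds.  Each input is
    assumed pre-scaled to roughly [0, 1], with large values indicating
    voiced-like behaviour (many zero crossings indicate the opposite,
    hence the 1 - zc term).  Weights and thresholds are assumptions."""
    fm = (2.0 * norm_corr + tilt + snr + pitch_stab
          + rel_energy + (1.0 - zc)) / 7.0
    if fm >= 0.66:
        return "voiced"
    if fm >= 0.49:
        return "voiced transition"
    if fm >= 0.31:
        return "unvoiced transition"
    return "unvoiced"
```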
-
-
18. A method as defined in claim 16, comprising calculating the normalized correlation parameter on the basis of a current weighted version of the speech signal and a past weighted version of said speech signal.
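The normalized correlation of claim 18 can be sketched as follows; the two vectors are assumed to be equal-length segments already aligned at the pitch lag:

```python
import math
import numpy as np

def normalized_correlation(sw_curr, sw_past):
    """Sketch of claim 18: correlation between the current weighted
    speech and its past (pitch-delayed) weighted version, normalized
    by the two energies so the result lies in [-1, 1]."""
    num = float(np.dot(sw_curr, sw_past))
    den = math.sqrt(float(np.dot(sw_curr, sw_curr))
                    * float(np.dot(sw_past, sw_past))) + 1e-12
    return num / den
```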
-
19. A method as defined in claim 16, comprising estimating the spectral tilt parameter as a ratio between an energy concentrated in low frequencies and an energy concentrated in high frequencies.
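A sketch of the tilt estimate in claim 19; the FFT-based band split, the 2 kHz split frequency, and the 8 kHz sampling rate are assumptions (the claim only requires a low-band to high-band energy ratio):

```python
import numpy as np

def spectral_tilt(frame, split_hz=2000, fs=8000):
    """Sketch of claim 19: tilt as the ratio between the energy
    concentrated in low frequencies and the energy concentrated in
    high frequencies."""
    spec = np.abs(np.fft.rfft(frame)) ** 2
    k = int(split_hz * len(frame) / fs)          # band-split bin
    e_low = float(np.sum(spec[:k]))
    e_high = float(np.sum(spec[k:])) + 1e-12     # avoid division by zero
    return e_low / e_high
```

A low-frequency tone yields a tilt well above 1; a high-frequency tone yields a tilt below 1.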
-
20. A method as defined in claim 16, comprising estimating the signal-to-noise ratio parameter as a ratio between an energy of a weighted version of the speech signal of a current frame and an energy of an error between said weighted version of the speech signal of the current frame and a weighted version of a synthesized speech signal of said current frame.
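Claim 20's signal-to-noise ratio parameter, sketched directly from its definition:

```python
import numpy as np

def snr_parameter(weighted_speech, weighted_synth):
    """Sketch of claim 20: ratio between the energy of the weighted
    speech of the current frame and the energy of the error between
    that weighted speech and the weighted synthesized speech."""
    err = weighted_speech - weighted_synth
    e_sig = float(np.sum(weighted_speech ** 2))
    e_err = float(np.sum(err ** 2)) + 1e-12   # avoid division by zero
    return e_sig / e_err
```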
-
21. A method as defined in claim 16, comprising computing the pitch stability parameter in response to open-loop pitch estimates for a first half of a current frame, a second half of the current frame and a look-ahead.
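Claim 21 does not fix a formula; one plausible sketch uses the sum of absolute differences between the three open-loop pitch estimates (smaller means more stable), which is an assumption:

```python
def pitch_stability(p_first_half, p_second_half, p_lookahead):
    """Sketch of claim 21: stability computed from the open-loop pitch
    estimates of the first half-frame, the second half-frame, and the
    look-ahead.  The absolute-difference metric is an assumption."""
    return (abs(p_second_half - p_first_half)
            + abs(p_lookahead - p_second_half))
```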
-
22. A method as defined in claim 16, comprising computing the relative frame energy parameter as a difference between an energy of a current frame and a long-term average of an energy of active speech frames.
-
23. A method as defined in claim 16, comprising determining the zero-crossing parameter as a number of times a sign of the speech signal changes from a first polarity to a second polarity.
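The zero-crossing parameter of claim 23 counts transitions from one polarity to the other; the choice of direction (non-negative to negative) in this sketch is an assumption:

```python
import numpy as np

def zero_crossing_parameter(frame):
    """Sketch of claim 23: number of times the sign of the signal
    changes from a first polarity (non-negative) to a second
    polarity (negative)."""
    nonneg = frame >= 0
    return int(np.count_nonzero(nonneg[:-1] & ~nonneg[1:]))
```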
-
24. A method as defined in claim 16, comprising computing at least one of the normalized correlation parameter, spectral tilt parameter, signal-to-noise ratio parameter, pitch stability parameter, relative frame energy parameter, and zero crossing parameter using an available look-ahead to take into consideration the behavior of the speech signal in the following frame.
-
25. A method as defined in claim 16, further comprising determining the classification of the successive frames of the encoded sound signal also on the basis of a voice activity detection flag.
-
26. A method as defined in claim 3 wherein:
-
the sound signal is a speech signal;
determination, in the encoder, of concealment/recovery parameters comprises classifying successive frames of the encoded sound signal as unvoiced, unvoiced transition, voiced transition, voiced, or onset; and
determining concealment/recovery parameters comprises calculating the energy information parameter in relation to a maximum of a signal energy for frames classified as voiced or onset, and calculating the energy information parameter in relation to an average energy per sample for other frames.
-
-
27. A method as defined in claim 1, wherein determining, in the encoder, concealment/recovery parameters comprises computing a voicing information parameter.
-
28. A method as defined in claim 27, wherein:
-
the sound signal is a speech signal;
determination, in the encoder, of concealment/recovery parameters comprises classifying successive frames of the encoded sound signal;
said method comprises determining the classification of the successive frames of the encoded sound signal on the basis of a normalized correlation parameter; and
computing the voicing information parameter comprises estimating said voicing information parameter on the basis of the normalized correlation.
-
-
29. A method as defined in claim 1, wherein conducting frame erasure concealment and decoder recovery comprises:
-
following receiving a non erased unvoiced frame after frame erasure, generating no periodic part of a LP filter excitation signal;
following receiving, after frame erasure, of a non erased frame other than unvoiced, constructing a periodic part of the LP filter excitation signal by repeating a last pitch period of a previous frame.
-
-
30. A method as defined in claim 29, wherein constructing the periodic part of the LP filter excitation signal comprises filtering the repeated last pitch period of the previous frame through a low-pass filter.
-
31. A method as defined in claim 30, wherein:
-
determining concealment/recovery parameters comprises computing a voicing information parameter;
the low-pass filter has a cut-off frequency; and
constructing the periodic part of the excitation signal comprises dynamically adjusting the cut-off frequency in relation to the voicing information parameter.
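Claims 29 through 31 can be combined into one sketch: repeat the last pitch period of the previous excitation and low-pass filter it, with the filter strength tied to the voicing information. The one-pole filter and the voicing-to-coefficient mapping below are assumptions; the claims only require a low-pass filter whose cut-off frequency is adjusted in relation to the voicing information:

```python
import numpy as np

def concealed_periodic_part(past_exc, pitch, frame_len, voicing):
    """Sketch of claims 29-31: construct the periodic part of the LP
    filter excitation for a concealed frame by repeating the last
    pitch period of the previous frame, then smoothing it.  Weakly
    voiced signals (small voicing) get stronger smoothing, mimicking
    a lower cut-off frequency."""
    last_period = past_exc[-pitch:]
    reps = int(np.ceil(frame_len / pitch))
    exc = np.tile(last_period, reps)[:frame_len]
    a = 0.5 * (1.0 - voicing)            # assumed voicing-to-coefficient map
    out = np.empty_like(exc)
    prev = past_exc[-1]
    for i, x in enumerate(exc):
        prev = (1.0 - a) * x + a * prev  # simple one-pole low-pass
        out[i] = prev
    return out
```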
-
-
32. A method as defined in claim 1, wherein conducting frame erasure concealment and decoder recovery comprises randomly generating a non-periodic, innovation part of a LP filter excitation signal.
-
33. A method as defined in claim 32, wherein randomly generating the non-periodic, innovation part of the LP filter excitation signal comprises generating a random noise.
-
34. A method as defined in claim 32, wherein randomly generating the non-periodic, innovation part of the LP filter excitation signal comprises randomly generating vector indexes of an innovation codebook.
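Claims 32 to 34 can be sketched as drawing codebook indexes at random during concealment; the per-subframe index layout and bit width below are assumptions:

```python
import random

def random_innovation_indexes(n_subframes, codebook_bits, seed=None):
    """Sketch of claims 32-34: during frame erasure the non-periodic,
    innovation part of the excitation carries no meaningful
    information, so one codebook index per subframe is simply drawn
    at random."""
    rng = random.Random(seed)   # seed only for reproducible tests
    return [rng.randrange(1 << codebook_bits) for _ in range(n_subframes)]
```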
-
35. A method as defined in claim 32, wherein:
-
the sound signal is a speech signal;
determination of concealment/recovery parameters comprises classifying successive frames of the encoded sound signal as unvoiced, unvoiced transition, voiced transition, voiced, or onset; and
randomly generating the non-periodic, innovation part of the LP filter excitation signal further comprises:
if the last correctly received frame is different from unvoiced, filtering the innovation part of the excitation signal through a high pass filter; and
if the last correctly received frame is unvoiced, using only the innovation part of the excitation signal.
-
-
36. A method as defined in claim 1, wherein:
-
the sound signal is a speech signal;
determination, in the encoder, of concealment/recovery parameters comprises classifying successive frames of the encoded sound signal as unvoiced, unvoiced transition, voiced transition, voiced, or onset;
conducting frame erasure concealment and decoder recovery comprises, when an onset frame is lost which is indicated by the presence of a voiced frame following frame erasure and an unvoiced frame before frame erasure, artificially reconstructing the lost onset by constructing a periodic part of an excitation signal as a low-pass filtered periodic train of pulses separated by a pitch period.
-
-
37. A method as defined in claim 36, wherein conducting frame erasure concealment and decoder recovery further comprises constructing an innovation part of the excitation signal by means of normal decoding.
-
38. A method as defined in claim 37, wherein constructing an innovation part of the excitation signal comprises randomly choosing entries of an innovation codebook.
-
39. A method as defined in claim 36, wherein artificially reconstructing the lost onset frame comprises limiting a length of the artificially reconstructed onset so that at least one entire pitch period is constructed by the onset artificial reconstruction, said reconstruction being continued until the end of a current subframe.
-
40. A method as defined in claim 39, wherein conducting frame erasure concealment and decoder recovery further comprises, after artificial reconstruction of the lost onset, resuming a regular CELP processing wherein the pitch period is a rounded average of decoded pitch periods of all subframes where the artificial onset reconstruction is used.
-
41. A method as defined in claim 3, wherein conducting frame erasure concealment and decoder recovery comprises:
-
controlling an energy of a synthesized sound signal produced by the decoder, controlling energy of the synthesized sound signal comprising scaling the synthesized sound signal to render an energy of said synthesized sound signal at the beginning of a first non erased frame received following frame erasure similar to an energy of said synthesized signal at the end of a last frame erased during said frame erasure; and
converging the energy of the synthesized sound signal in the received first non erased frame to an energy corresponding to the received energy information parameter toward the end of said received first non erased frame while limiting an increase in energy.
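The two-step energy control of claim 41 can be sketched as a gain ramp across the first good frame. The linear interpolation, the 8-sample window used to measure the starting energy, and the increase-limiting factor are assumptions; the claim only requires matching the erasure-end energy at the frame start, converging toward the transmitted energy information, and limiting the energy increase:

```python
import math
import numpy as np

def scale_recovered_frame(synth, e_erased_end, e_target, max_up=2.0):
    """Sketch of claim 41: scale the synthesized signal of the first
    non-erased frame so its energy starts near the energy at the end
    of the erasure, then converges toward the transmitted energy
    target while limiting the energy increase."""
    e_start = float(np.mean(synth[:8] ** 2)) + 1e-12
    g0 = math.sqrt(e_erased_end / e_start)   # gain matching erasure end
    g1 = math.sqrt(e_target / e_start)       # gain matching target energy
    g1 = min(g1, max_up * g0)                # limit the energy increase
    gains = np.linspace(g0, g1, len(synth))  # converge across the frame
    return synth * gains
```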
-
-
42. A method as defined in claim 3, wherein:
-
the energy information parameter is not transmitted from the encoder to the decoder; and
conducting frame erasure concealment and decoder recovery comprises, when a gain of a LP filter of a first non erased frame received following frame erasure is higher than a gain of a LP filter of a last frame erased during said frame erasure, adjusting the energy of an LP filter excitation signal produced in the decoder during the received first non erased frame to a gain of the LP filter of said received first non erased frame.
-
-
43. A method as defined in claim 42 wherein:
-
adjusting the energy of an LP filter excitation signal produced in the decoder during the received first non erased frame to a gain of the LP filter of said received first non erased frame comprises using the following relation:
where E1 is the energy at the end of the current frame, E_LP0 is the energy of the impulse response of the LP filter to the last non erased frame received before the frame erasure, and E_LP1 is the energy of the impulse response of the LP filter to the received first non erased frame following frame erasure.
-
-
44. A method as defined in claim 41, wherein:
-
the sound signal is a speech signal;
determination, in the encoder, of concealment/recovery parameters comprises classifying successive frames of the encoded sound signal as unvoiced, unvoiced transition, voiced transition, voiced, or onset; and
when the first non erased frame received after a frame erasure is classified as ONSET, conducting frame erasure concealment and decoder recovery comprises limiting to a given value a gain used for scaling the synthesized sound signal.
-
-
45. A method as defined in claim 41, wherein:
-
the sound signal is a speech signal;
determination, in the encoder, of concealment/recovery parameters comprises classifying successive frames of the encoded sound signal as unvoiced, unvoiced transition, voiced transition, voiced, or onset; and
said method comprising making a gain used for scaling the synthesized sound signal at the beginning of the first non erased frame received after frame erasure equal to a gain used at the end of said received first non erased frame;
during a transition from a voiced frame to an unvoiced frame, in the case of a last non erased frame received before frame erasure classified as voiced transition, voiced or onset and a first non erased frame received after frame erasure classified as unvoiced; and
during a transition from a non-active speech period to an active speech period, when the last non erased frame received before frame erasure is encoded as comfort noise and the first non erased frame received after frame erasure is encoded as active speech.
-
46. A method of concealing frame erasure caused by frames of an encoded sound signal erased during transmission from an encoder to a decoder, comprising:
-
determining, in the encoder, concealment/recovery parameters; and
transmitting to the decoder concealment/recovery parameters determined in the encoder.
Dependent claims: 47–70.
-
47. A method as defined in claim 46, further comprising quantizing, in the encoder, the concealment/recovery parameters prior to transmitting said concealment/recovery parameters to the decoder.
-
48. A method as defined in claim 46, wherein the concealment/recovery parameters are selected from the group consisting of:
- a signal classification parameter, an energy information parameter and a phase information parameter.
-
49. A method as defined in claim 48, wherein determination of the phase information parameter comprises determining a position of a first glottal pulse in a frame of the encoded sound signal.
-
50. A method as defined in claim 49, wherein determination of the phase information parameter further comprises encoding, in the encoder, the shape, sign and amplitude of the first glottal pulse and transmitting the encoded shape, sign and amplitude from the encoder to the decoder.
-
51. A method as defined in claim 49, wherein determining the position of the first glottal pulse comprises:
-
measuring the first glottal pulse as a sample of maximum amplitude within a pitch period; and
quantizing the position of the sample of maximum amplitude within the pitch period.
-
-
52. A method as defined in claim 46, wherein:
-
the sound signal is a speech signal; and
determination, in the encoder, of concealment/recovery parameters comprises classifying successive frames of the encoded sound signal as unvoiced, unvoiced transition, voiced transition, voiced, or onset.
-
-
53. A method as defined in claim 52, wherein classifying the successive frames comprises classifying as unvoiced every frame which is an unvoiced frame, every frame without active speech, and every voiced offset frame having an end tending to be unvoiced.
-
54. A method as defined in claim 52, wherein classifying the successive frames comprises classifying as unvoiced transition every unvoiced frame having an end with a possible voiced onset which is too short or not built well enough to be processed as a voiced frame.
-
55. A method as defined in claim 52, wherein classifying the successive frames comprises classifying as voiced transition every voiced frame with relatively weak voiced characteristics, including voiced frames with rapidly changing characteristics and voiced offsets lasting the whole frame, wherein a frame classified as voiced transition follows only frames classified as voiced transition, voiced or onset.
-
56. A method as defined in claim 52, wherein classifying the successive frames comprises classifying as voiced every voiced frame with stable characteristics, wherein a frame classified as voiced follows only frames classified as voiced transition, voiced or onset.
-
57. A method as defined in claim 52, wherein classifying the successive frames comprises classifying as onset every voiced frame with stable characteristics following a frame classified as unvoiced or unvoiced transition.
-
58. A method as defined in claim 52, comprising determining the classification of the successive frames of the encoded sound signal on the basis of at least a part of the following parameters:
- a normalized correlation parameter, a spectral tilt parameter, a signal-to-noise ratio parameter, a pitch stability parameter, a relative frame energy parameter, and a zero crossing parameter.
-
59. A method as defined in claim 58, wherein determining the classification of the successive frames comprises:
- computing a figure of merit on the basis of the normalized correlation parameter, spectral tilt parameter, signal-to-noise ratio parameter, pitch stability parameter, relative frame energy parameter, and zero crossing parameter; and
comparing the figure of merit to thresholds to determine the classification.
-
60. A method as defined in claim 58, comprising calculating the normalized correlation parameter on the basis of a current weighted version of the speech signal and a past weighted version of said speech signal.
-
61. A method as defined in claim 58, comprising estimating the spectral tilt parameter as a ratio between an energy concentrated in low frequencies and an energy concentrated in high frequencies.
-
62. A method as defined in claim 58, comprising estimating the signal-to-noise ratio parameter as a ratio between an energy of a weighted version of the speech signal of a current frame and an energy of an error between said weighted version of the speech signal of the current frame and a weighted version of a synthesized speech signal of said current frame.
-
63. A method as defined in claim 58, comprising computing the pitch stability parameter in response to open-loop pitch estimates for a first half of a current frame, a second half of the current frame and a look-ahead.
-
64. A method as defined in claim 58, comprising computing the relative frame energy parameter as a difference between an energy of a current frame and a long-term average of an energy of active speech frames.
-
65. A method as defined in claim 58, comprising determining the zero-crossing parameter as a number of times a sign of the speech signal changes from a first polarity to a second polarity.
-
66. A method as defined in claim 58, comprising computing at least one of the normalized correlation parameter, spectral tilt parameter, signal-to-noise ratio parameter, pitch stability parameter, relative frame energy parameter, and zero crossing parameter using an available look-ahead to take into consideration the behavior of the speech signal in the following frame.
-
67. A method as defined in claim 58, further comprising determining the classification of the successive frames of the encoded sound signal also on the basis of a voice activity detection flag.
-
68. A method as defined in claim 48 wherein:
-
the sound signal is a speech signal;
determination, in the encoder, of concealment/recovery parameters comprises classifying successive frames of the encoded sound signal as unvoiced, unvoiced transition, voiced transition, voiced, or onset; and
determining concealment/recovery parameters comprises calculating the energy information parameter in relation to a maximum of a signal energy for frames classified as voiced or onset, and calculating the energy information parameter in relation to an average energy per sample for other frames.
-
-
69. A method as defined in claim 46, wherein determining, in the encoder, concealment/recovery parameters comprises computing a voicing information parameter.
-
70. A method as defined in claim 68, wherein:
-
the sound signal is a speech signal;
determination, in the encoder, of concealment/recovery parameters comprises classifying successive frames of the encoded sound signal;
said method comprises determining the classification of the successive frames of the encoded sound signal on the basis of a normalized correlation parameter; and
computing the voicing information parameter comprises estimating said voicing information parameter on the basis of the normalized correlation.
-
71. A method for the concealment of frame erasure caused by frames erased during transmission of a sound signal encoded under the form of signal-encoding parameters from an encoder to a decoder, comprising:
-
determining, in the decoder, concealment/recovery parameters from the signal-encoding parameters;
in the decoder, conducting erased frame concealment and decoder recovery in response to concealment/recovery parameters determined in the decoder.
Dependent claims: 72–87.
-
72. A method as defined in claim 71, wherein the concealment/recovery parameters are selected from the group consisting of:
- a signal classification parameter, an energy information parameter and a phase information parameter.
-
73. A method as defined in claim 71, wherein:
-
the sound signal is a speech signal; and
determination, in the decoder, of concealment/recovery parameters comprises classifying successive frames of the encoded sound signal as unvoiced, unvoiced transition, voiced transition, voiced, or onset.
-
-
74. A method as defined in claim 71, wherein determining, in the decoder, concealment/recovery parameters comprises computing a voicing information parameter.
-
75. A method as defined in claim 71, wherein conducting frame erasure concealment and decoder recovery comprises:
-
following receiving a non erased unvoiced frame after frame erasure, generating no periodic part of a LP filter excitation signal;
following receiving, after frame erasure, of a non erased frame other than unvoiced, constructing a periodic part of the LP filter excitation signal by repeating a last pitch period of a previous frame.
-
-
76. A method as defined in claim 75, wherein constructing the periodic part of the excitation signal comprises filtering the repeated last pitch period of the previous frame through a low-pass filter.
-
77. A method as defined in claim 76, wherein:
-
determining, in the decoder, concealment/recovery parameters comprises computing a voicing information parameter;
the low-pass filter has a cut-off frequency; and
constructing the periodic part of the LP filter excitation signal comprises dynamically adjusting the cut-off frequency in relation to the voicing information parameter.
-
-
78. A method as defined in claim 71, wherein conducting frame erasure concealment and decoder recovery comprises randomly generating a non-periodic, innovation part of a LP filter excitation signal.
-
79. A method as defined in claim 78, wherein randomly generating the non-periodic, innovation part of the LP filter excitation signal comprises generating a random noise.
-
80. A method as defined in claim 78, wherein randomly generating the non-periodic, innovation part of the LP filter excitation signal comprises randomly generating vector indexes of an innovation codebook.
-
81. A method as defined in claim 78, wherein:
-
the sound signal is a speech signal;
determination, in the decoder, of concealment/recovery parameters comprises classifying successive frames of the encoded sound signal as unvoiced, unvoiced transition, voiced transition, voiced, or onset; and
randomly generating the non-periodic, innovation part of the LP filter excitation signal further comprises:
if the last received non erased frame is different from unvoiced, filtering the innovation part of the LP filter excitation signal through a high pass filter; and
if the last received non erased frame is unvoiced, using only the innovation part of the LP filter excitation signal.
-
-
82. A method as defined in claim 78, wherein:
-
the sound signal is a speech signal;
determination, in the decoder, of concealment/recovery parameters comprises classifying successive frames of the encoded sound signal as unvoiced, unvoiced transition, voiced transition, voiced, or onset;
conducting frame erasure concealment and decoder recovery comprises, when an onset frame is lost which is indicated by the presence of a voiced frame following frame erasure and an unvoiced frame before frame erasure, artificially reconstructing the lost onset by constructing a periodic part of an excitation signal as a low-pass filtered periodic train of pulses separated by a pitch period.
-
-
83. A method as defined in claim 82, wherein conducting frame erasure concealment and decoder recovery further comprises constructing an innovation part of the LP filter excitation signal by means of normal decoding.
-
84. A method as defined in claim 83, wherein constructing an innovation part of the LP filter excitation signal comprises randomly choosing entries of an innovation codebook.
-
85. A method as defined in claim 82, wherein artificially reconstructing the lost onset comprises limiting a length of the artificially reconstructed onset so that at least one entire pitch period is constructed by the onset artificial reconstruction, said reconstruction being continued until the end of a current subframe.
-
86. A method as defined in claim 85, wherein conducting frame erasure concealment and decoder recovery further comprises, after artificial reconstruction of the lost onset, resuming a regular CELP processing wherein the pitch period is a rounded average of decoded pitch periods of all subframes where the artificial onset reconstruction is used.
-
87. A method as defined in claim 72, wherein:
-
the energy information parameter is not transmitted from the encoder to the decoder; and
conducting frame erasure concealment and decoder recovery comprises, when a gain of a LP filter of a first non erased frame received following frame erasure is higher than a gain of a LP filter of a last frame erased during said frame erasure, adjusting the energy of an LP filter excitation signal produced in the decoder during the received first non erased frame to a gain of the LP filter of said received first non erased frame using the following relation;
Eq = E1 · (ELP0 / ELP1), where E1 is the energy at the end of the current frame, ELP0 is the energy of an impulse response of the LP filter to the last non erased frame received before the frame erasure, and ELP1 is the energy of the impulse response of the LP filter to the received first non erased frame following frame erasure.
-
-
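The adjustment of claim 87 can be sketched as follows. The claim defines E1, ELP0 and ELP1 but the relation itself did not survive in this listing, so the sketch assumes the excitation energy is corrected by the ratio ELP0/ELP1 (a per-sample gain of sqrt(ELP0/ELP1)) and truncates the LP impulse responses to 64 samples; both are assumptions, not the patent's stated values:

```python
import numpy as np

def lp_impulse_response_energy(a, n=64):
    """Energy of the impulse response of the LP synthesis filter
    1 / A(z), with A(z) = 1 + a[0] z^-1 + a[1] z^-2 + ...,
    truncated to n samples (truncation length is an assumption)."""
    h = np.zeros(n)
    h[0] = 1.0
    for i in range(1, n):
        h[i] = -sum(a[j] * h[i - 1 - j] for j in range(min(len(a), i)))
    return float(np.dot(h, h))

def adjust_excitation(excitation, a_last_erased, a_first_good):
    """When the LP filter of the first good frame has a higher gain
    than the one used during concealment, scale the excitation down
    so the synthesized energy does not jump (hypothetical reading of
    the claimed relation: energy corrected by ELP0 / ELP1)."""
    e_lp0 = lp_impulse_response_energy(a_last_erased)
    e_lp1 = lp_impulse_response_energy(a_first_good)
    if e_lp1 > e_lp0:                      # new filter gain is higher
        return excitation * np.sqrt(e_lp0 / e_lp1)
    return excitation
```

Note that no energy information parameter is needed here: both impulse-response energies are computable in the decoder, which matches the claim's premise that the parameter is not transmitted.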
72. A method as defined in claim 71, wherein the concealment/recovery parameters are selected from the group consisting of:
- a signal classification parameter, an energy information parameter and a phase information parameter.
-
88. A device for conducting concealment of frame erasure caused by frames of an encoded sound signal erased during transmission from an encoder to a decoder, comprising:
-
means for determining, in the encoder, concealment/recovery parameters;
means for transmitting to the decoder concealment/recovery parameters determined in the encoder; and
in the decoder, means for conducting frame erasure concealment and decoder recovery in response to received concealment/recovery parameters determined by the determining means. - View Dependent Claims (89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 175)
-
89. A device as defined in claim 88, further comprising means for quantizing, in the encoder, the concealment/recovery parameters prior to transmitting said concealment/recovery parameters to the decoder.
-
90. A device as defined in claim 88, wherein the concealment/recovery parameters are selected from the group consisting of:
- a signal classification parameter, an energy information parameter and a phase information parameter.
-
91. A device as defined in claim 90, wherein the means for determining the phase information parameter comprises means for determining the position of a first glottal pulse in a frame of the encoded sound signal.
-
92. A device as defined in claim 88, wherein the means for conducting frame erasure concealment and decoder recovery comprises means for conducting decoder recovery in response to a determined position of a first glottal pulse after at least one lost voice onset.
-
93. A device as defined in claim 88, wherein the means for conducting frame erasure concealment and decoder recovery comprises means for constructing, when at least one onset frame is lost, a periodic excitation part artificially as a low-pass filtered periodic train of pulses separated by a pitch period.
-
94. A device as defined in claim 93, wherein:
-
the device comprises means for quantizing the position of the first glottal pulse prior to transmission of said position of the first glottal pulse to the decoder; and
the means for constructing a periodic excitation part comprises means for realizing the low-pass filtered periodic train of pulses by;
centering a first impulse response of a low-pass filter on the quantized position of the first glottal pulse with respect to the beginning of a frame; and
placing remaining impulse responses of the low-pass filter each with a distance corresponding to an average pitch value from the preceding impulse response up to the end of a last subframe affected by the artificial construction.
-
-
95. A device as defined in claim 91, wherein the means for determining the phase information parameter further comprises means for encoding, in the encoder, the shape, sign and amplitude of the first glottal pulse and means for transmitting the encoded shape, sign and amplitude from the encoder to the decoder.
-
96. A device as defined in claim 91, wherein the means for determining the position of the first glottal pulse comprises:
-
means for measuring the first glottal pulse as a sample of maximum amplitude within a pitch period; and
means for quantizing the position of the sample of maximum amplitude within the pitch period.
-
-
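The phase information of claims 91 and 96 is the position of the first glottal pulse, measured as the sample of maximum amplitude within a pitch period and then quantized. A minimal sketch, assuming the LP residual as input and a uniform 2-sample quantization grid (the grid is an illustrative choice):

```python
import numpy as np

def first_glottal_pulse_position(residual, pitch_period, resolution=2):
    """Locate the first glottal pulse as the sample of maximum
    absolute amplitude within the first pitch period of the frame,
    then quantize its position on a uniform grid (claim 96; the
    resolution of 2 samples is an assumption)."""
    pos = int(np.argmax(np.abs(residual[:pitch_period])))
    return (pos // resolution) * resolution
```

The decoder can then center the first impulse response of the low-pass filter on this quantized position when building the artificial pulse train of claim 94.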
97. A device as defined in claim 88, wherein:
-
the sound signal is a speech signal; and
the means for determining, in the encoder, concealment/recovery parameters comprises means for classifying successive frames of the encoded sound signal as unvoiced, unvoiced transition, voiced transition, voiced, or onset.
-
-
98. A device as defined in claim 97, wherein the means for classifying the successive frames comprises means for classifying as unvoiced every frame which is an unvoiced frame, every frame without active speech, and every voiced offset frame having an end tending to be unvoiced.
-
99. A device as defined in claim 97, wherein the means for classifying the successive frames comprises means for classifying as unvoiced transition every unvoiced frame having an end with a possible voiced onset which is too short or not built well enough to be processed as a voiced frame.
-
100. A device as defined in claim 97, wherein the means for classifying the successive frames comprises means for classifying as voiced transition every voiced frame with relatively weak voiced characteristics, including voiced frames with rapidly changing characteristics and voiced offsets lasting the whole frame, wherein a frame classified as voiced transition follows only frames classified as voiced transition, voiced or onset.
-
101. A device as defined in claim 97, wherein the means for classifying the successive frames comprises means for classifying as voiced every voiced frame with stable characteristics, wherein a frame classified as voiced follows only frames classified as voiced transition, voiced or onset.
-
102. A device as defined in claim 97, wherein the means for classifying the successive frames comprises means for classifying as onset every voiced frame with stable characteristics following a frame classified as unvoiced or unvoiced transition.
-
103. A device as defined in claim 97, comprising means for determining the classification of the successive frames of the encoded sound signal on the basis of at least a part of the following parameters:
- a normalized correlation parameter, a spectral tilt parameter, a signal-to-noise ratio parameter, a pitch stability parameter, a relative frame energy parameter, and a zero crossing parameter.
-
104. A device as defined in claim 103, wherein the means for determining the classification of the successive frames comprises:
-
means for computing a figure of merit on the basis of the normalized correlation parameter, spectral tilt parameter, signal-to-noise ratio parameter, pitch stability parameter, relative frame energy parameter, and zero crossing parameter; and
means for comparing the figure of merit to thresholds to determine the classification.
-
-
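Claims 103–104 classify each frame by merging the six parameters into a figure of merit compared against thresholds. The sketch below assumes the parameters are pre-scaled to [0, 1] with larger meaning more voiced; the weights and thresholds are illustrative, not the patent's values:

```python
def classify_frame(norm_corr, tilt, snr, pitch_stab, rel_energy, zc):
    """Figure-of-merit frame classifier (claim 104, sketch).
    Inputs are assumed pre-scaled to [0, 1]; the doubled weight on the
    normalized correlation and the threshold values are assumptions."""
    merit = (2 * norm_corr + tilt + snr + pitch_stab
             + rel_energy + zc) / 7.0
    if merit >= 0.66:
        return "VOICED"
    if merit >= 0.49:
        return "VOICED TRANSITION"
    if merit >= 0.31:
        return "UNVOICED TRANSITION"
    return "UNVOICED"
```

ONSET would then be assigned separately, when a frame scoring as voiced follows a frame classified as unvoiced or unvoiced transition (claim 102).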
105. A device as defined in claim 103, comprising means for calculating the normalized correlation parameter on the basis of a current weighted version of the speech signal and a past weighted version of said speech signal.
-
106. A device as defined in claim 103, comprising means for estimating the spectral tilt parameter as a ratio between an energy concentrated in low frequencies and an energy concentrated in high frequencies.
-
107. A device as defined in claim 103, comprising means for estimating the signal-to-noise ratio parameter as a ratio between an energy of a weighted version of the speech signal of a current frame and an energy of an error between said weighted version of the speech signal of the current frame and a weighted version of a synthesized speech signal of said current frame.
-
108. A device as defined in claim 103, comprising means for computing the pitch stability parameter in response to open-loop pitch estimates for a first half of a current frame, a second half of the current frame and a look-ahead.
-
109. A device as defined in claim 103, comprising means for computing the relative frame energy parameter as a difference between an energy of a current frame and a long-term average of an energy of active speech frames.
-
110. A device as defined in claim 103, comprising means for determining the zero-crossing parameter as a number of times a sign of the speech signal changes from a first polarity to a second polarity.
-
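Claims 105–110 define the individual classification parameters. The following sketches follow the claimed definitions directly; only the spectral split point and the small regularization constants are illustrative:

```python
import numpy as np

def normalized_correlation(cur, past):
    """Correlation between the current weighted signal and its past
    (pitch-lagged) weighted version, normalized to [-1, 1] (claim 105)."""
    denom = np.sqrt(np.dot(cur, cur) * np.dot(past, past)) + 1e-12
    return float(np.dot(cur, past) / denom)

def spectral_tilt(frame):
    """Ratio between energy concentrated in low frequencies and energy
    concentrated in high frequencies (claim 106); the half-spectrum
    split point is an assumption."""
    spec = np.abs(np.fft.rfft(frame)) ** 2
    k = len(spec) // 2
    return float(spec[:k].sum() / (spec[k:].sum() + 1e-12))

def pitch_stability(p0, p1, p2):
    """Variation of the open-loop pitch estimates over the first
    half-frame, second half-frame and look-ahead (claim 108);
    smaller means more stable."""
    return abs(p1 - p0) + abs(p2 - p1)

def relative_frame_energy(frame_energy_db, long_term_avg_db):
    """Current frame energy minus the long-term average energy of
    active speech frames, in dB (claim 109)."""
    return frame_energy_db - long_term_avg_db

def zero_crossings(frame):
    """Number of times the signal's sign changes from one polarity
    to the other (claim 110)."""
    s = np.sign(frame)
    s[s == 0] = 1.0
    return int(np.count_nonzero(s[1:] != s[:-1]))
```

Per claim 111, any of these may be computed over the available look-ahead so the classification reflects the behavior of the following frame as well.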
111. A device as defined in claim 103, comprising means for computing at least one of the normalized correlation parameter, spectral tilt parameter, signal-to-noise ratio parameter, pitch stability parameter, relative frame energy parameter, and zero crossing parameter using an available look-ahead to take into consideration the behavior of the speech signal in the following frame.
-
112. A device as defined in claim 103, further comprising means for determining the classification of the successive frames of the encoded sound signal also on the basis of a voice activity detection flag.
-
113. A device as defined in claim 90, wherein:
-
the sound signal is a speech signal;
the means for determining, in the encoder, concealment/recovery parameters comprises means for classifying successive frames of the encoded sound signal as unvoiced, unvoiced transition, voiced transition, voiced, or onset; and
the means for determining concealment/recovery parameters comprises means for calculating the energy information parameter in relation to a maximum of a signal energy for frames classified as voiced or onset, and means for calculating the energy information parameter in relation to an average energy per sample for other frames.
-
-
114. A device as defined in claim 88, wherein the means for determining, in the encoder, concealment/recovery parameters comprises means for computing a voicing information parameter.
-
115. A device as defined in claim 114, wherein:
-
the sound signal is a speech signal;
the means for determining, in the encoder, concealment/recovery parameters comprises means for classifying successive frames of the encoded sound signal;
said device comprises means for determining the classification of the successive frames of the encoded sound signal on the basis of a normalized correlation parameter; and
the means for computing the voicing information parameter comprises means for estimating said voicing information parameter on the basis of the normalized correlation.
-
-
116. A device as defined in claim 88, wherein the means for conducting frame erasure concealment and decoder recovery comprises:
-
following receiving a non erased unvoiced frame after frame erasure, means for generating no periodic part of a LP filter excitation signal;
following receiving, after frame erasure, a non erased frame other than unvoiced, means for constructing a periodic part of the LP filter excitation signal by repeating a last pitch period of a previous frame.
-
-
117. A device as defined in claim 116, wherein the means for constructing the periodic part of the LP filter excitation signal comprises a low-pass filter for filtering the repeated last pitch period of the previous frame.
-
118. A device as defined in claim 117, wherein:
-
the means for determining concealment/recovery parameters comprises means for computing a voicing information parameter;
the low-pass filter has a cut-off frequency; and
the means for constructing the periodic part of the excitation signal comprises means for dynamically adjusting the cut-off frequency in relation to the voicing information parameter.
-
-
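Claims 116–118 build the periodic part of the concealed excitation by repeating the last pitch period of the previous frame through a low-pass filter whose cut-off tracks the voicing information. A minimal sketch, using a one-pole smoother as a stand-in low-pass and an assumed linear mapping from voicing (in [0, 1]) to the smoothing coefficient:

```python
import numpy as np

def concealed_periodic_part(past_excitation, pitch_period, out_len, voicing):
    """Repeat the last pitch period of the previous frame (claim 116)
    and low-pass filter it (claim 117), with the effective cut-off
    dynamically tied to the voicing information (claim 118).
    The one-pole filter and the voicing-to-alpha mapping are
    illustrative: strong voicing -> high cut-off (little smoothing)."""
    last = past_excitation[-pitch_period:]
    reps = int(np.ceil(out_len / pitch_period))
    out = np.tile(last, reps)[:out_len]
    alpha = 0.1 + 0.8 * float(voicing)    # assumed mapping, voicing in [0, 1]
    y = np.empty(out_len)
    acc = float(out[0])
    for i, x in enumerate(out):
        acc += alpha * (x - acc)          # one-pole low-pass
        y[i] = acc
    return y
```

When the first good frame after the erasure is unvoiced, claim 116 instead generates no periodic part at all.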
119. A device as defined in claim 88, wherein the means for conducting frame erasure concealment and decoder recovery comprises means for randomly generating a non-periodic, innovation part of a LP filter excitation signal.
-
120. A device as defined in claim 119, wherein the means for randomly generating the non-periodic, innovation part of the LP filter excitation signal comprises means for generating a random noise.
-
121. A device as defined in claim 119, wherein the means for randomly generating the non-periodic, innovation part of the LP filter excitation signal comprises means for randomly generating vector indexes of an innovation codebook.
-
122. A device as defined in claim 119, wherein:
-
the sound signal is a speech signal;
the means for determining concealment/recovery parameters comprises means for classifying successive frames of the encoded sound signal as unvoiced, unvoiced transition, voiced transition, voiced, or onset; and
the means for randomly generating the non-periodic, innovation part of the LP filter excitation signal further comprises;
if the last correctly received frame is different from unvoiced, a high-pass filter for filtering the innovation part of the excitation signal; and
if the last correctly received frame is unvoiced, means for using only the innovation part of the excitation signal.
-
-
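Claims 119–122 generate the non-periodic, innovation part of the excitation randomly, high-pass filtering it unless the last good frame was unvoiced. A sketch, using a first difference as a crude stand-in high-pass (an illustrative choice; randomly drawing codebook vector indexes per claim 121 would serve the same role):

```python
import numpy as np

def random_innovation(n, last_good_class, scale=1.0, seed=0):
    """Randomly generated innovation part of the LP filter excitation
    during an erasure (claim 120). If the last correctly received
    frame was not unvoiced, the noise is high-pass filtered
    (claim 122); the first-difference filter is an assumption."""
    rng = np.random.default_rng(seed)
    noise = scale * rng.standard_normal(n)
    if last_good_class != "UNVOICED":
        noise = np.diff(noise, prepend=noise[0])   # crude high-pass
    return noise
```

If the last good frame was unvoiced, only this innovation part is used as the excitation, with no periodic contribution.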
123. A device as defined in claim 88, wherein:
-
the sound signal is a speech signal;
the means for determining, in the encoder, concealment/recovery parameters comprises means for classifying successive frames of the encoded sound signal as unvoiced, unvoiced transition, voiced transition, voiced, or onset;
the means for conducting frame erasure concealment and decoder recovery comprises, when an onset frame is lost which is indicated by the presence of a voiced frame following frame erasure and an unvoiced frame before frame erasure, means for artificially reconstructing the lost onset by constructing a periodic part of an excitation signal as a low-pass filtered periodic train of pulses separated by a pitch period.
-
-
124. A device as defined in claim 123, wherein the means for conducting frame erasure concealment and decoder recovery further comprises means for constructing an innovation part of the excitation signal by means of normal decoding.
-
125. A device as defined in claim 124, wherein the means for constructing an innovation part of the excitation signal comprises means for randomly choosing entries of an innovation codebook.
-
126. A device as defined in claim 123, wherein the means for artificially reconstructing the lost onset comprises means for limiting a length of the artificially reconstructed onset so that at least one entire pitch period is constructed by the onset artificial reconstruction, said reconstruction being continued until the end of a current subframe.
-
127. A device as defined in claim 126, wherein the means for conducting frame erasure concealment and decoder recovery further comprises, after artificial reconstruction of the lost onset, means for resuming a regular CELP processing wherein the pitch period is a rounded average of decoded pitch periods of all subframes where the artificial onset reconstruction is used.
-
128. A device as defined in claim 90, wherein the means for conducting frame erasure concealment and decoder recovery comprises:
-
means for controlling an energy of a synthesized sound signal produced by the decoder, the means for controlling energy of the synthesized sound signal comprising means for scaling the synthesized sound signal to render an energy of said synthesized sound signal at the beginning of a first non erased frame received following frame erasure similar to an energy of said synthesized signal at the end of a last frame erased during said frame erasure; and
means for converging the energy of the synthesized sound signal in the received first non erased frame to an energy corresponding to the received energy information parameter toward the end of said received first non erased frame while limiting an increase in energy.
-
-
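The energy control of claim 128 scales the first good frame so its start matches the energy at the end of the concealed frame, then converges toward the transmitted energy information by the frame's end, with the increase limited. A sketch, assuming square-root gains from energy ratios, linear per-sample gain interpolation, and an illustrative gain cap:

```python
import numpy as np

def energy_control_gains(synth_begin_energy, concealed_end_energy,
                         target_end_energy, max_gain=2.0):
    """g0 matches the frame's initial energy to the energy at the end
    of the concealed frame; g1 converges to the received energy
    information parameter; both are capped to limit any energy
    increase (the cap value is an assumption)."""
    g0 = np.sqrt(concealed_end_energy / (synth_begin_energy + 1e-12))
    g1 = np.sqrt(target_end_energy / (concealed_end_energy + 1e-12))
    return min(g0, max_gain), min(g1, max_gain)

def apply_energy_control(frame, g0, g1):
    """Sample-wise linear interpolation of the gain from g0 to g1
    across the first good frame (interpolation shape assumed)."""
    g = np.linspace(g0, g1, len(frame))
    return frame * g
```

Per claim 131, when that first good frame is classified as ONSET the scaling gain is further limited to a given value, and per claim 132 the two end gains are forced equal across certain voiced-to-unvoiced and comfort-noise-to-active-speech transitions.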
129. A device as defined in claim 90, wherein:
-
the energy information parameter is not transmitted from the encoder to the decoder; and
the means for conducting frame erasure concealment and decoder recovery comprises, when a gain of a LP filter of a first non erased frame received following frame erasure is higher than a gain of a LP filter of a last frame erased during said frame erasure, means for adjusting the energy of an LP filter excitation signal produced in the decoder during the received first non erased frame to a gain of the LP filter of said received first non erased frame.
-
-
130. A device as defined in claim 129, wherein:
-
the means for adjusting the energy of an LP filter excitation signal produced in the decoder during the received first non erased frame to a gain of the LP filter of said received first non erased frame comprises means for using the following relation;
Eq = E1 · (ELP0 / ELP1), where E1 is the energy at the end of the current frame, ELP0 is the energy of an impulse response of the LP filter to the last non erased frame received before the frame erasure, and ELP1 is the energy of the impulse response of the LP filter to the received first non erased frame following frame erasure.
-
-
131. A device as defined in claim 128, wherein:
-
the sound signal is a speech signal;
the means for determining, in the encoder, concealment/recovery parameters comprises means for classifying successive frames of the encoded sound signal as unvoiced, unvoiced transition, voiced transition, voiced, or onset; and
when the first non erased frame received after a frame erasure is classified as ONSET, the means for conducting frame erasure concealment and decoder recovery comprises means for limiting to a given value a gain used for scaling the synthesized sound signal.
-
-
132. A device as defined in claim 128, wherein:
-
the sound signal is a speech signal;
the means for determining, in the encoder, concealment/recovery parameters comprises means for classifying successive frames of the encoded sound signal as unvoiced, unvoiced transition, voiced transition, voiced, or onset; and
said device comprises means for making a gain used for scaling the synthesized sound signal at the beginning of the first non erased frame received after frame erasure equal to a gain used at the end of said received first non erased frame;
during a transition from a voiced frame to an unvoiced frame, in the case of a last non erased frame received before frame erasure classified as voiced transition, voiced or onset and a first non erased frame received after frame erasure classified as unvoiced; and
during a transition from a non-active speech period to an active speech period, when the last non erased frame received before frame erasure is encoded as comfort noise and the first non erased frame received after frame erasure is encoded as active speech.
-
-
175. A system for encoding and decoding a sound signal, comprising:
-
a sound signal encoder responsive to the sound signal for producing a set of signal-encoding parameters;
means for transmitting the signal-encoding parameters to a decoder;
said decoder for synthesizing the sound signal in response to the signal-encoding parameters; and
a device as recited in claim 88, for concealing frame erasure caused by frames of the encoded sound signal erased during transmission from the encoder to the decoder.
-
-
-
-
133. A device for conducting concealment of frame erasure caused by frames of an encoded sound signal erased during transmission from an encoder to a decoder, comprising:
-
means for determining, in the encoder, concealment/recovery parameters; and
means for transmitting to the decoder concealment/recovery parameters determined in the encoder. - View Dependent Claims (134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 177)
-
134. A device as defined in claim 133, further comprising means for quantizing, in the encoder, the concealment/recovery parameters prior to transmitting said concealment/recovery parameters to the decoder.
-
135. A device as defined in claim 133, wherein the concealment/recovery parameters are selected from the group consisting of:
- a signal classification parameter, an energy information parameter and a phase information parameter.
-
136. A device as defined in claim 135, wherein the means for determining the phase information parameter comprises means for determining the position of a first glottal pulse in a frame of the encoded sound signal.
-
137. A device as defined in claim 136, wherein the means for determining the phase information parameter further comprises means for encoding, in the encoder, the shape, sign and amplitude of the first glottal pulse and means for transmitting the encoded shape, sign and amplitude from the encoder to the decoder.
-
138. A device as defined in claim 136, wherein the means for determining the position of the first glottal pulse comprises:
-
means for measuring the first glottal pulse as a sample of maximum amplitude within a pitch period; and
means for quantizing the position of the sample of maximum amplitude within the pitch period.
-
-
139. A device as defined in claim 133, wherein:
-
the sound signal is a speech signal; and
the means for determining, in the encoder, concealment/recovery parameters comprises means for classifying successive frames of the encoded sound signal as unvoiced, unvoiced transition, voiced transition, voiced, or onset.
-
-
140. A device as defined in claim 139, wherein the means for classifying the successive frames comprises means for classifying as unvoiced every frame which is an unvoiced frame, every frame without active speech, and every voiced offset frame having an end tending to be unvoiced.
-
141. A device as defined in claim 139, wherein the means for classifying the successive frames comprises means for classifying as unvoiced transition every unvoiced frame having an end with a possible voiced onset which is too short or not built well enough to be processed as a voiced frame.
-
142. A device as defined in claim 139, wherein the means for classifying the successive frames comprises means for classifying as voiced transition every voiced frame with relatively weak voiced characteristics, including voiced frames with rapidly changing characteristics and voiced offsets lasting the whole frame, wherein a frame classified as voiced transition follows only frames classified as voiced transition, voiced or onset.
-
143. A device as defined in claim 139, wherein the means for classifying the successive frames comprises means for classifying as voiced every voiced frame with stable characteristics, wherein a frame classified as voiced follows only frames classified as voiced transition, voiced or onset.
-
144. A device as defined in claim 139, wherein the means for classifying the successive frames comprises means for classifying as onset every voiced frame with stable characteristics following a frame classified as unvoiced or unvoiced transition.
-
145. A device as defined in claim 139, comprising means for determining the classification of the successive frames of the encoded sound signal on the basis of at least a part of the following parameters:
- a normalized correlation parameter, a spectral tilt parameter, a signal-to-noise ratio parameter, a pitch stability parameter, a relative frame energy parameter, and a zero crossing parameter.
-
146. A device as defined in claim 145, wherein the means for determining the classification of the successive frames comprises:
-
means for computing a figure of merit on the basis of the normalized correlation parameter, spectral tilt parameter, signal-to-noise ratio parameter, pitch stability parameter, relative frame energy parameter, and zero crossing parameter; and
means for comparing the figure of merit to thresholds to determine the classification.
-
-
147. A device as defined in claim 145, comprising means for calculating the normalized correlation parameter on the basis of a current weighted version of the speech signal and a past weighted version of said speech signal.
-
148. A device as defined in claim 145, comprising means for estimating the spectral tilt parameter as a ratio between an energy concentrated in low frequencies and an energy concentrated in high frequencies.
-
149. A device as defined in claim 145, comprising means for estimating the signal-to-noise ratio parameter as a ratio between an energy of a weighted version of the speech signal of a current frame and an energy of an error between said weighted version of the speech signal of the current frame and a weighted version of a synthesized speech signal of said current frame.
-
150. A device as defined in claim 145, comprising means for computing the pitch stability parameter in response to open-loop pitch estimates for a first half of a current frame, a second half of the current frame and a look-ahead.
-
151. A device as defined in claim 145, comprising means for computing the relative frame energy parameter as a difference between an energy of a current frame and a long-term average of an energy of active speech frames.
-
152. A device as defined in claim 145, comprising means for determining the zero-crossing parameter as a number of times a sign of the speech signal changes from a first polarity to a second polarity.
-
153. A device as defined in claim 145, comprising means for computing at least one of the normalized correlation parameter, spectral tilt parameter, signal-to-noise ratio parameter, pitch stability parameter, relative frame energy parameter, and zero crossing parameter using an available look-ahead to take into consideration the behavior of the speech signal in the following frame.
-
154. A device as defined in claim 145, further comprising means for determining the classification of the successive frames of the encoded sound signal also on the basis of a voice activity detection flag.
-
155. A device as defined in claim 135, wherein:
-
the sound signal is a speech signal;
the means for determining, in the encoder, concealment/recovery parameters comprises means for classifying successive frames of the encoded sound signal as unvoiced, unvoiced transition, voiced transition, voiced, or onset; and
the means for determining concealment/recovery parameters comprises means for calculating the energy information parameter in relation to a maximum of a signal energy for frames classified as voiced or onset, and means for calculating the energy information parameter in relation to an average energy per sample for other frames.
-
-
156. A device as defined in claim 133, wherein the means for determining, in the encoder, concealment/recovery parameters comprises means for computing a voicing information parameter.
-
157. A device as defined in claim 156, wherein:
-
the sound signal is a speech signal;
the means for determining, in the encoder, concealment/recovery parameters comprises means for classifying successive frames of the encoded sound signal;
said device comprises means for determining the classification of the successive frames of the encoded sound signal on the basis of a normalized correlation parameter; and
the means for computing the voicing information parameter comprises means for estimating said voicing information parameter on the basis of the normalized correlation.
-
-
177. An encoder for encoding a sound signal comprising:
-
means responsive to the sound signal for producing a set of signal-encoding parameters;
means for transmitting the set of signal-encoding parameters to a decoder responsive to the signal-encoding parameters for recovering the sound signal; and
a device as recited in claim 133, for conducting concealment of frame erasure caused by frames erased during transmission of the signal-encoding parameters from the encoder to the decoder.
-
-
-
-
158. A device for the concealment of frame erasure caused by frames erased during transmission of a sound signal encoded under the form of signal-encoding parameters from an encoder to a decoder, comprising:
-
means for determining, in the decoder, concealment/recovery parameters from the signal-encoding parameters; and
in the decoder, means for conducting erased frame concealment and decoder recovery in response to concealment/recovery parameters determined by the determining means. - View Dependent Claims (159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 176)
-
159. A device as defined in claim 158, wherein the concealment/recovery parameters are selected from the group consisting of:
- a signal classification parameter, an energy information parameter and a phase information parameter.
-
160. A device as defined in claim 158, wherein:
-
the sound signal is a speech signal; and
the means for determining, in the decoder, concealment/recovery parameters comprises means for classifying successive frames of the encoded sound signal as unvoiced, unvoiced transition, voiced transition, voiced, or onset.
-
-
161. A device as defined in claim 158, wherein the means for determining, in the decoder, concealment/recovery parameters comprises means for computing a voicing information parameter.
-
162. A device as defined in claim 158, wherein the means for conducting frame erasure concealment and decoder recovery comprises:
-
following receiving a non erased unvoiced frame after frame erasure, means for generating no periodic part of a LP filter excitation signal;
following receiving, after frame erasure, a non erased frame other than unvoiced, means for constructing a periodic part of the LP filter excitation signal by repeating a last pitch period of a previous frame.
-
-
163. A device as defined in claim 162, wherein the means for constructing the periodic part of the excitation signal comprises a low-pass filter for filtering the repeated last pitch period of the previous frame.
-
164. A device as defined in claim 163, wherein:
-
the means for determining, in the decoder, concealment/recovery parameters comprises means for computing a voicing information parameter;
the low-pass filter has a cut-off frequency; and
the means for constructing the periodic part of the LP filter excitation signal comprises means for dynamically adjusting the cut-off frequency in relation to the voicing information parameter.
-
-
165. A device as defined in claim 158, wherein the means for conducting frame erasure concealment and decoder recovery comprises means for randomly generating a non-periodic, innovation part of a LP filter excitation signal.
-
166. A device as defined in claim 165, wherein the means for randomly generating the non-periodic, innovation part of the LP filter excitation signal comprises means for generating a random noise.
-
167. A device as defined in claim 165, wherein the means for randomly generating the non-periodic, innovation part of the LP filter excitation signal comprises means for randomly generating vector indexes of an innovation codebook.
-
168. A device as defined in claim 165, wherein:
-
the sound signal is a speech signal;
the means for determining, in the decoder, concealment/recovery parameters comprises means for classifying successive frames of the encoded sound signal as unvoiced, unvoiced transition, voiced transition, voiced, or onset; and
the means for randomly generating the non-periodic, innovation part of the LP filter excitation signal further comprises;
if the last received non erased frame is different from unvoiced, a high-pass filter for filtering the innovation part of the LP filter excitation signal; and
if the last received non erased frame is unvoiced, means for using only the innovation part of the LP filter excitation signal.
-
-
169. A device as defined in claim 165, wherein:
-
the sound signal is a speech signal;
the means for determining, in the decoder, concealment/recovery parameters comprises means for classifying successive frames of the encoded sound signal as unvoiced, unvoiced transition, voiced transition, voiced, or onset;
the means for conducting frame erasure concealment and decoder recovery comprises, when an onset frame is lost which is indicated by the presence of a voiced frame following frame erasure and an unvoiced frame before frame erasure, means for artificially reconstructing the lost onset by constructing a periodic part of an excitation signal as a low-pass filtered periodic train of pulses separated by a pitch period.
-
-
170. A device as defined in claim 169, wherein the means for conducting frame erasure concealment and decoder recovery further comprises means for constructing an innovation part of the LP filter excitation signal by means of normal decoding.
-
171. A device as defined in claim 170, wherein the means for constructing an innovation part of the LP filter excitation signal comprises means for randomly choosing entries of an innovation codebook.
-
172. A device as defined in claim 169, wherein the means for artificially reconstructing the lost onset comprises means for limiting a length of the artificially reconstructed onset so that at least one entire pitch period is constructed by the onset artificial reconstruction, said reconstruction being continued until the end of a current subframe.
-
173. A device as defined in claim 172, wherein the means for conducting frame erasure concealment and decoder recovery further comprises, after artificial reconstruction of the lost onset, means for resuming a regular CELP processing wherein the pitch period is a rounded average of decoded pitch periods of all subframes where the artificial onset reconstruction is used.
-
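Claims 169-173 describe reconstructing a lost onset as a low-pass filtered periodic train of pulses separated by the pitch period, limited so that at least one entire pitch period is constructed and continued to the end of the current subframe. A minimal sketch, assuming unit pulses and a 3-tap moving average as the (unspecified) low-pass filter; the function name and parameters are hypothetical:

```python
def artificial_onset(subframe_len, n_subframes, pitch_period):
    """Sketch of the artificial onset reconstruction of claims 169 and 172.

    Builds a periodic train of unit pulses spaced pitch_period samples
    apart, over the smallest number of subframes that contains at least
    one entire pitch period, then low-pass filters it (here a simple
    3-tap moving average, purely illustrative).
    """
    # Smallest number of subframes covering one full pitch period.
    needed = -(-pitch_period // subframe_len)  # ceiling division
    n = min(needed, n_subframes)
    length = n * subframe_len  # continue to the end of the last subframe
    pulses = [0.0] * length
    for i in range(0, length, pitch_period):
        pulses[i] = 1.0
    # Illustrative low-pass filtering of the pulse train.
    return [sum(pulses[max(0, i - 1):i + 2]) / 3.0 for i in range(length)]
```

Per claim 173, regular CELP processing would then resume with the pitch period set to the rounded average of the decoded pitch periods of the subframes where the artificial reconstruction was used, e.g. `round(sum(pitches) / len(pitches))`.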
174. A device as defined in claim 159, wherein:
-
the energy information parameter is not transmitted from the encoder to the decoder; and
the means for conducting frame erasure concealment and decoder recovery comprises, when a gain of a LP filter of a first non erased frame received following frame erasure is higher than a gain of a LP filter of a last frame erased during said frame erasure, means for adjusting the energy of an LP filter excitation signal produced in the decoder during the received first non erased frame to a gain of the LP filter of said received first non erased frame using the following relation:
where E_1 is the energy at the end of the current frame, E_LP0 is the energy of an impulse response of the LP filter to the last non erased frame received before the frame erasure, and E_LP1 is the energy of the impulse response of the LP filter to the received first non erased frame following frame erasure.
-
-
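The relation itself is not reproduced in this excerpt; only its terms (E_1, E_LP0, E_LP1) are defined. As a loudly labeled assumption, the sketch below scales the excitation by sqrt(E_LP0/E_LP1), i.e. by the ratio of LP-filter impulse-response energies before and after the erasure, which counteracts the energy jump when the new LP filter has higher gain. This is one plausible form, not the claimed relation.

```python
import math

def adjust_excitation_energy(excitation, e_lp0, e_lp1):
    """Sketch of the decoder-side energy adjustment of claim 174.

    ASSUMPTION: the scaling factor sqrt(e_lp0 / e_lp1) is an illustrative
    stand-in for the relation omitted from this excerpt. e_lp0 and e_lp1
    are the impulse-response energies of the LP filter before and after
    the frame erasure, as defined in the claim.
    """
    g = math.sqrt(e_lp0 / e_lp1)
    return [g * x for x in excitation]
```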
176. A decoder for decoding an encoded sound signal comprising:
-
means responsive to the encoded sound signal for recovering from said encoded sound signal a set of signal-encoding parameters;
means for synthesizing the sound signal in response to the signal-encoding parameters; and
a device as recited in claim 158, for concealing frame erasure caused by frames of the encoded sound signal erased during transmission from an encoder to the decoder.
-
-
159. A device as defined in claim 158, wherein the concealment/recovery parameters are selected from the group consisting of: a signal classification parameter, an energy information parameter, and a phase information parameter.
-
Current Assignee: VoiceAge EVS LLC (SoftBank Group Corp.)
-
Original Assignee: VoiceAge Corporation
-
Inventors: Jelinek, Milan; Gournay, Philippe
-
Granted Patent
-
US Class (Current): 704/219
-
CPC Class Codes: G10L 19/00 (Speech or audio signals ana...); G10L 19/005 (Correction of errors induce...)