Method and system for pitch contour quantization in audio coding
First Claim
1. A method for coding an audio signal, comprising:
- receiving pitch contour data indicative of the audio signal, the pitch contour data comprising a plurality of pitch values obtained from an audio segment at a plurality of sampling points at regular time intervals;
creating, in response to the pitch contour data obtained at said regular time intervals, a plurality of pitch contour segment candidates, each segment candidate corresponding to a sub-segment of the audio signal, wherein each sub-segment has a start-point pitch value and an end-point pitch value selected from said plurality of pitch values and each segment candidate has a start-segment pitch value at a start segment point and an end-segment pitch value at an end segment point, the start segment point aligned with the sampling point of the start-point pitch value and the end segment point aligned with the sampling point of the end-point pitch value;
measuring deviation between each of the pitch contour segment candidates and said pitch values in the corresponding sub-segment;
selecting, among said segment candidates, a plurality of consecutive simplified contour segments to represent the audio segment based on the measured deviations and one or more pre-selected criteria, wherein the start-segment pitch values at the start segment points of at least some simplified contour segments are different from the start-point pitch values of the corresponding sub-segments and the end-segment pitch values at the end segment points of at least some simplified contour segments are different from the end-point pitch values of the corresponding sub-segments, wherein each of the simplified contour segments is selected from a corresponding group of segment candidates, and wherein the simplified contour segments comprise a first contour segment and a plurality of subsequent contour segments, and wherein, in said creating, the start-segment pitch value of the group of segment candidates corresponding to each of the subsequent contour segments is the same as the end-segment pitch value of the simplified contour segment immediately preceding said each of the subsequent contour segments, and the start-segment pitch value of the group of segment candidates corresponding to the first contour segment is selected based on the start-segment pitch value of the sub-segment corresponding to first contour segment, and wherein the sub-segment corresponding to the first contour segment is representative of the pitch contour data first available after an inactive or unvoiced speech or at the beginning of an encoding process;
coding the sub-segment of the audio signal corresponding to the simplified contour segment with characteristics of the simplified contour segment.
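The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is one possible reading, not the patented implementation: the greedy "longest segment within tolerance" rule stands in for the unspecified pre-selected criteria, and the names `simplify_contour`, `max_len`, `max_dev`, `step`, and `search` are all hypothetical. Candidate end values are taken from a small quantized grid around the measured pitch, so a chosen end point may differ from the pitch value at that sampling point, and each later segment starts at the end value of the segment before it.

```python
from typing import List, Optional, Tuple

Point = Tuple[int, float]  # (sampling-point index, segment pitch value)


def quantize(value: float, step: float) -> float:
    """Snap a pitch value to a uniform grid (an assumed quantizer)."""
    return round(value / step) * step


def max_deviation(pitch: List[float], i0: int, v0: float, i1: int, v1: float) -> float:
    """Largest absolute gap between a linear candidate and the pitch values it spans."""
    worst = 0.0
    for k in range(i0, i1 + 1):
        line = v0 + (v1 - v0) * (k - i0) / (i1 - i0)
        worst = max(worst, abs(line - pitch[k]))
    return worst


def simplify_contour(pitch: List[float], max_len: int = 8, max_dev: float = 2.0,
                     step: float = 1.0, search: int = 2) -> List[Point]:
    """Greedy piecewise-linear simplification of a non-empty pitch contour."""
    # First segment: its start value is derived from the first available pitch value.
    points: List[Point] = [(0, quantize(pitch[0], step))]
    last = len(pitch) - 1
    while points[-1][0] < last:
        # Subsequent segments start where the previous segment ended.
        i0, v0 = points[-1]
        best_ok: Optional[Tuple[int, float, float]] = None
        fallback: Optional[Tuple[int, float, float]] = None
        for i1 in range(i0 + 1, min(i0 + max_len, last) + 1):
            center = quantize(pitch[i1], step)
            for m in range(-search, search + 1):
                v1 = center + m * step  # candidate end-segment pitch value
                dev = max_deviation(pitch, i0, v0, i1, v1)
                if dev <= max_dev:
                    # Assumed criterion: longest segment, then smallest deviation.
                    if (best_ok is None or i1 > best_ok[0]
                            or (i1 == best_ok[0] and dev < best_ok[2])):
                        best_ok = (i1, v1, dev)
                elif i1 == i0 + 1 and (fallback is None or dev < fallback[2]):
                    # No tolerance met yet: remember the best one-step candidate.
                    fallback = (i1, v1, dev)
        chosen = best_ok or fallback
        points.append((chosen[0], chosen[1]))
    return points


contour = [100, 102, 104, 106, 108, 107, 106, 105, 104]
print(simplify_contour(contour, max_dev=1.0))
# [(0, 100.0), (4, 108.0), (8, 104.0)]
```

In this example nine pitch values are represented by three end points. A coder along these lines would transmit the end points rather than every pitch value.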
Abstract
A method and device for improving coding efficiency in audio coding. From the pitch values of a pitch contour of an audio signal, a plurality of simplified pitch contour segments are generated to approximate the pitch contour, based on one or more pre-selected criteria. The contour segments can be linear or non-linear, with each contour segment represented by a first end point and a second end point. If the contour segments are linear, then only the information regarding the end points, instead of the pitch values, is provided to a decoder for reconstructing the audio signal. The contour segment can have a fixed maximum length or a variable length, but the deviation between a contour segment and the pitch values in that segment is limited by a maximum value.
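For the linear case described in the abstract, a decoder only needs the segment end points. The following Python sketch shows one way such a reconstruction could work, assuming the end points arrive as (sampling-point index, pitch value) pairs in increasing index order. The function name `reconstruct_contour` and this data layout are illustrative assumptions, not part of the source.

```python
from typing import List, Tuple


def reconstruct_contour(points: List[Tuple[int, float]]) -> List[float]:
    """Rebuild one pitch value per sampling point by linear interpolation.

    `points` holds the end points of consecutive linear segments; each
    segment starts at the end point of the segment before it.
    """
    contour = [points[0][1]]
    for (i0, v0), (i1, v1) in zip(points, points[1:]):
        for k in range(i0 + 1, i1 + 1):
            contour.append(v0 + (v1 - v0) * (k - i0) / (i1 - i0))
    return contour


print(reconstruct_contour([(0, 100.0), (4, 108.0), (8, 104.0)]))
# [100.0, 102.0, 104.0, 106.0, 108.0, 107.0, 106.0, 105.0, 104.0]
```

Here three end points stand in for nine pitch values. The reconstructed values match the original contour only to within the deviation limit the encoder enforced.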
23 Claims
11. An apparatus comprising:
-
an input end for receiving pitch contour data, the pitch contour data comprising a plurality of pitch values obtained from an audio segment of an audio signal at a plurality of sampling points at regular time intervals; and
a data processing module, responsive to the pitch contour data obtained from said regular time intervals, for generating a plurality of pitch contour segment candidates, each segment candidate corresponding to a sub-segment of the audio signal, wherein each sub-segment has a start-point pitch value and an end-point pitch value selected from said plurality of pitch values and each segment candidate has a start-segment pitch value at a start segment point and an end-segment pitch value at an end segment point, the start segment point aligned with the sampling point of the start-point pitch value and the end segment point aligned with the sampling point of the end-point pitch value, and wherein the processing module is configured to measure deviation between each of the pitch contour segment candidates and said pitch values in the corresponding sub-segment; and to select, among said segment candidates, a plurality of consecutive simplified contour segments to represent the audio segment based on the measured deviations and pre-selected criteria, wherein the start-segment pitch values at the start segment points of at least some selected segment candidates are different from the start-point pitch values of the corresponding sub-segments and the end-segment pitch values at the end segment points of at least some simplified contour segments are different from the end-point pitch values of the corresponding sub-segments, wherein each of the simplified contour segments is selected from a corresponding group of segment candidates, and wherein the simplified contour segments comprise a first contour segment and a plurality of subsequent contour segments, and wherein, in said generating, the start-segment pitch value of the group of segment candidates corresponding to each of the subsequent contour segments is the same as the end-segment pitch value of the simplified contour segment immediately preceding said each of the subsequent contour segments, and the start-segment pitch value of the group of segment candidates corresponding to the first contour segment is selected based on the start-segment pitch value of the sub-segment corresponding to first contour segment, and wherein the sub-segment corresponding to the first contour segment is representative of the pitch contour data first available after an inactive or unvoiced speech or at the beginning of an encoding process.
-
-
17. An apparatus comprising:
-
an input for receiving audio data indicative of a plurality of consecutive simplified contour segments, the consecutive simplified contour segments selected from a plurality of pitch contour segment candidates, wherein the pitch contour segment candidates are generated in response to pitch contour data comprising a plurality of pitch values obtained from an audio segment of an audio signal at a plurality of sampling points at regular time intervals, each segment candidate corresponding to a sub-segment of the audio signal, wherein each sub-segment has a start-point pitch value and an end-point pitch value selected from said plurality of pitch values and each segment candidate has a start segment point and an end segment point, the start segment point aligned with the sampling point of the start-point pitch value and the end segment point aligned with the sampling point of the end-point pitch value, and wherein the plurality of consecutive simplified contour segments are selected among said segment candidates based on pre-selected criteria and on deviation between each of the segment candidate and said pitch values in the corresponding sub-segment, and wherein each of the simplified segments is defined by a first end point having a first pitch value and a second end point having a second pitch value, and wherein the first pitch values at the first end points of at least some simplified segments are different from the start-point pitch values of the corresponding sub-segments and the second pitch values at the second end points of at least some simplified segments are different from the end-point pitch values of the corresponding sub-segments, and wherein the received audio data comprises the end points defining the sub segments, wherein each of the simplified contour segments is selected from a corresponding group of segment candidates, and wherein the simplified contour segments comprise a first contour segment and a plurality of subsequent contour segments, and wherein, in generating the pitch contour segment candidates, the start-segment pitch value of the group of segment candidates corresponding to each of the subsequent contour segments is the same as the end-segment pitch value of the simplified contour segment immediately preceding said each of the subsequent contour segments, and the start-segment pitch value of the group of segment candidates corresponding to the first contour segment is selected based on the start-segment pitch value of the sub-segment corresponding to first contour segment, and wherein the sub-segment corresponding to the first contour segment is representative of the pitch contour data first available after an inactive or unvoiced speech or at the beginning of an encoding process; and
a reconstructing module configured to reconstruct the audio segment based on the received audio data.
-
-
21. A communication network, comprising:
-
a plurality of base stations; and
a plurality of mobile stations communicating with the base stations, wherein at least one of the mobile stations comprises:
an input for receiving audio data indicative of a plurality of consecutive simplified contour segments, the consecutive simplified contour segments selected from a plurality of pitch contour segment candidates, wherein the pitch contour segment candidates are generated in response to pitch contour data comprising a plurality of pitch values obtained from an audio segment of an audio signal at a plurality of sampling points at regular time intervals, each segment candidate corresponding to a sub-segment of the audio signal, wherein each sub-segment has a start-point pitch value and an end-point pitch value selected from said plurality of pitch values and each segment candidate has a start segment point and an end segment point, the start segment point aligned with the sampling point of the start-point pitch value and the end segment point aligned with the sampling point of the end-point pitch value, and wherein the plurality of consecutive simplified contour segments are selected among said segment candidates based on pre-selected criteria and on deviation between each of the segment candidate and said pitch values in the corresponding sub-segment, and wherein each of the simplified segments is defined by a first end point having a first pitch value and a second end point having a second pitch value, and wherein the first pitch values of the first end points of at least some simplified segments are different from the start-point pitch values of the corresponding sub-segments and the second pitch values of the second end points of at least some simplified segments are different from the end-point pitch values of the corresponding sub-segments, and wherein the received audio data comprises the end points defining the sub-segments, wherein each of the simplified contour segments is selected from a corresponding group of segment candidates, and wherein the simplified contour segments comprise a first contour segment and a plurality of subsequent contour segments, and wherein, in generating the pitch contour segment candidates, the start-segment pitch value of the group of segment candidates corresponding to each of the subsequent contour segments is the same as the end-segment pitch value of the simplified contour segment immediately preceding said each of the subsequent contour segments, and the start-segment pitch value of the group of segment candidates corresponding to the first contour segment is selected based on the start-segment pitch value of the sub-segment corresponding to first contour segment, and wherein the sub-segment corresponding to the first contour segment is representative of the pitch contour data first available after an inactive or unvoiced speech or at the beginning of an encoding process; and
a reconstructing module configured to reconstruct the audio segment based on the received audio data.
-
-
22. An apparatus comprising:
-
means for receiving pitch contour data, the pitch contour data comprising a plurality of pitch values obtained from an audio segment of an audio signal at a plurality of sampling points at regular time intervals;
means, responsive to the pitch contour data obtained from said regular time intervals, for generating a plurality of pitch contour segment candidates, each segment candidate corresponding to a sub-segment of the audio signal, wherein each sub-segment has a start-point pitch value and an end-point pitch value selected from said plurality of pitch values and each segment candidate has a start-segment pitch value at a start segment point and an end-segment pitch value at an end segment point, the start segment point aligned with the sampling point of the start-point pitch value and the end segment point aligned with the sampling point of the end-point pitch value,
means, for measuring deviation between each of the pitch contour segment candidates and said pitch values in the corresponding sub-segment, and
means for selecting, among said segment candidates, a plurality of consecutive simplified contour segments to represent the audio segment based on the measured deviations and pre-selected criteria, wherein the start-segment pitch values of the start segment points of at least some selected segment candidates are different from the start-point pitch values of the corresponding sub-segments and the end-segment pitch values of the end segment points of at least some simplified contour segments are different from the end-point pitch values of the corresponding sub-segments, wherein each of the simplified contour segments is selected from a corresponding group of segment candidates, and wherein the simplified contour segments comprise a first contour segment and a plurality of subsequent contour segments, and wherein, in generating the plurality of pitch contour segment candidates, the start-segment pitch value of the group of segment candidates corresponding to each of the subsequent contour segments is the same as the end-segment pitch value of the simplified contour segment immediately preceding said each of the subsequent contour segments, and the start-segment pitch value of the group of segment candidates corresponding to the first contour segment is selected based on the start-segment pitch value of the sub-segment corresponding to first contour segment, and wherein the sub-segment corresponding to the first contour segment is representative of the pitch contour data first available after an inactive or unvoiced speech or at the beginning of an encoding process.
-
Specification