Speech enhancement apparatus, speech recording apparatus, speech enhancement program, speech recording program, speech enhancing method, and speech recording method
Abstract
The invention automatically detects and corrects defective portions of reproduced speech: those related to plosives, such as the existence or absence of plosive portions and the phoneme lengths of the aspirated portions that follow them, and those related to amplitude variations of fricatives. Speech in which consonants and unvoiced vowels are unclear and discordant is input into a speech enhancement apparatus according to the present invention. The apparatus splits the speech into phonemes and classifies each phoneme as one of an unvoiced plosive, a voiced plosive, an unvoiced fricative, a voiced fricative, an affricate, or an unvoiced vowel. Each phoneme determined to need correction is then corrected, so that the output speech has clear, non-discordant consonants and unvoiced vowels.
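As a rough illustration of the six-way classification named in the abstract, the sketch below maps three boolean features to a phoneme type. The decision rules, the `has_frication` flag, and all names are assumptions made for this example; the source does not state how the classification is actually decided.

```python
from dataclasses import dataclass
from enum import Enum


class PhonemeType(Enum):
    """The six phoneme types listed in the abstract."""
    UNVOICED_PLOSIVE = "unvoiced plosive"
    VOICED_PLOSIVE = "voiced plosive"
    UNVOICED_FRICATIVE = "unvoiced fricative"
    VOICED_FRICATIVE = "voiced fricative"
    AFFRICATE = "affricate"
    UNVOICED_VOWEL = "unvoiced vowel"


@dataclass
class PhonemeFeatures:
    has_plosive: bool    # a plosive (burst) portion was detected
    has_frication: bool  # hypothetical flag: sustained noise-like portion
    is_periodic: bool    # a periodic waveform (voicing) is present


def classify(f: PhonemeFeatures) -> PhonemeType:
    # Illustrative rules only; they are NOT taken from the source.
    if f.has_plosive and f.has_frication:
        return PhonemeType.AFFRICATE
    if f.has_plosive:
        return (PhonemeType.VOICED_PLOSIVE if f.is_periodic
                else PhonemeType.UNVOICED_PLOSIVE)
    if f.has_frication:
        return (PhonemeType.VOICED_FRICATIVE if f.is_periodic
                else PhonemeType.UNVOICED_FRICATIVE)
    # Fallback assumed for this sketch: no burst and no frication.
    return PhonemeType.UNVOICED_VOWEL
```

For example, `classify(PhonemeFeatures(True, False, False))` yields `PhonemeType.UNVOICED_PLOSIVE` under these assumed rules.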
9 Claims
1. A speech enhancement apparatus that corrects and outputs unclear portions of input speech data, the speech enhancement apparatus comprising:
- a voiced/unvoiced-boundary-data output unit that determines a separation of voiced/unvoiced of the input speech data and outputs voiced/unvoiced boundary data as phoneme boundary data that splits the input speech data into a plurality of phonemes;
- a waveform-feature-quantity calculating unit that calculates a waveform feature quantity of the input speech data for each of the plurality of phonemes, the input speech data being input along with the phoneme boundary data, wherein the waveform feature quantity includes at least one of amplitude values, amplitude variation rates, existence or absence of periodic waveforms, of the phonemes, existence or absence of plosive portions of the phonemes, lengths of the plosive portions, existence or absence of aspirated portions that continue after the plosive portions, lengths of the aspirated portions, and phoneme types of the phonemes before and after the phonemes;
- a correction determining unit that determines a necessity of correction of the input speech data for each of the plurality of phonemes, based on the waveform feature quantity calculated by the waveform-feature-quantity calculating unit; and
- a waveform correcting unit that corrects a phoneme of the plurality of phonemes which is determined to be corrected by the correction determining unit by using waveform data that is prior stored in a phonemewise-waveform-data storage unit,
wherein the waveform-feature-quantity calculating unit includes
  - a speech data splitting unit that splits the input speech data into the phonemes based on the phoneme boundary data,
  - an amplitude variation measuring unit that measures amplitude values, amplitude variation rates, and existence or absence of periodic waveforms of the phonemes, based on the phonemes that are split by the speech data splitting unit,
  - a plosive portion/aspirated portion detecting unit that detects plosive portions and aspirated portions of the phonemes, based on the amplitude values and the amplitude variation rates that are measured by the amplitude variation measuring unit and the input speech data that is split by the speech data splitting unit,
  - a phoneme classifying unit that classifies phoneme types of the phonemes, based on a detection result by the plosive portion/aspirated portion detecting unit, and the amplitude values, the amplitude variation rates, and existence or absence of the periodic waveforms that are measured by the amplitude variation measuring unit, and
  - a phonemewise-feature-quantity calculating unit that calculates a feature quantity for each of the phonemes that are classified by the phoneme classifying unit.
Dependent claims: 2, 3, 4, 5, 6
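The splitting and amplitude-measuring units of claim 1 can be pictured with a minimal sketch. It assumes RMS amplitude over fixed-length frames, frame-to-frame differences as the "amplitude variation rate", and a normalized autocorrelation peak as the test for a periodic waveform. These measures, the function names, and the 0.5 threshold are illustrative choices, not details given in the claim.

```python
import math


def split_by_boundaries(samples, boundaries):
    """Split a sample list into segments at the given boundary indices."""
    edges = [0, *boundaries, len(samples)]
    return [samples[a:b] for a, b in zip(edges, edges[1:])]


def frame_amplitudes(segment, frame_len):
    """RMS amplitude of each full, non-overlapping frame in the segment."""
    starts = range(0, len(segment) - frame_len + 1, frame_len)
    return [
        math.sqrt(sum(x * x for x in segment[s:s + frame_len]) / frame_len)
        for s in starts
    ]


def variation_rates(amplitudes):
    """Change in amplitude between consecutive frames."""
    return [b - a for a, b in zip(amplitudes, amplitudes[1:])]


def is_periodic(segment, min_lag, max_lag, threshold=0.5):
    """True if the normalized autocorrelation peaks above the threshold.

    Assumes max_lag is smaller than len(segment).
    """
    energy = sum(x * x for x in segment)
    if energy == 0:
        return False
    best = max(
        sum(segment[i] * segment[i + lag] for i in range(len(segment) - lag))
        / energy
        for lag in range(min_lag, max_lag + 1)
    )
    return best >= threshold


# Usage: a 20-sample-period sine is periodic; a single click is not.
sine = [math.sin(2 * math.pi * i / 20) for i in range(200)]
click = [1.0] + [0.0] * 199
print(is_periodic(sine, 10, 40), is_periodic(click, 10, 40))
```

A plosive/aspirated-portion detector would then work on these per-frame amplitudes and rates, but the claim does not say how, so it is left out here.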
7. A speech recording apparatus that records input speech data in a phonemewise-waveform-data storage unit, the speech recording apparatus comprising:
- a phoneme-identification-data output unit that assigns phoneme identification data to the input speech data, based on the input speech data and a string of phonemes that is output by carrying out a language process on text data of the input speech data, determines boundaries of the phoneme identification data, and outputs boundary data of the phoneme identification data as phoneme boundary data;
- a waveform-feature-quantity calculating unit that calculates a waveform feature quantity of the input speech data for each of the phonemes, the input speech data being input along with the boundary data of the phoneme identification data output by the phoneme-identification-data output unit, wherein the waveform feature quantity includes at least one of amplitude values, amplitude variation rates, existence or absence of periodic waveforms, of the phonemes, existence or absence of plosive portions of the phonemes, lengths of the plosive portions, existence or absence of aspirated portions that continue after the plosive portions, lengths of the aspirated portions, and phoneme types of the phonemes before and after the phonemes;
- a condition sufficiency determining unit that determines whether the input speech data satisfies predetermined conditions for each phoneme, based on the waveform feature quantity calculated by the waveform-feature-quantity calculating unit; and
- a phonemewise-waveform-data recording unit that records in the phonemewise-waveform-data storage unit, the input speech data of each phoneme that is determined to be satisfied the predetermined conditions, based on a determination by the condition sufficiency determining unit,
wherein the waveform-feature-quantity calculating unit includes
  - a speech data splitting unit that splits the input speech data into the phonemes based on the phoneme boundary data,
  - an amplitude variation measuring unit that measures an amplitude value and an amplitude variation rate for each of the phonemes that are split by the speech data splitting unit,
  - a plosive portion/aspirated portion detecting unit that detects plosive portions and aspirated portions of the phonemes, based on the amplitude value and the amplitude variation rate that are measured by the amplitude variation measuring unit and the input speech data that is split by the speech data splitting unit,
  - a phoneme classifying unit that classifies each of the phonemes into phoneme types, based on the amplitude value and the amplitude variation rate that are measured by the amplitude variation measuring unit, and
  - a phonemewise-feature-quantity calculating unit that calculates a feature quantity for each of the phonemes that are classified by the phoneme classifying unit according to each of the phoneme types.
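The recording step of claim 7 stores a phoneme's waveform only when its features satisfy predetermined conditions. The sketch below assumes one such condition (a detected plosive whose aspirated portion is no longer than a limit) and uses a plain dictionary as the phonemewise storage. The condition, the 80 ms limit, and every name are hypothetical, since the claim leaves the conditions open.

```python
from dataclasses import dataclass


@dataclass
class PlosiveFeatures:
    has_plosive: bool     # a plosive (burst) portion was detected
    aspiration_ms: float  # length of the aspirated portion after the burst


def satisfies_conditions(f, max_aspiration_ms=80.0):
    # Hypothetical "predetermined conditions"; not specified by the claim.
    return f.has_plosive and 0.0 < f.aspiration_ms <= max_aspiration_ms


def record_clear_phonemes(storage, items, max_aspiration_ms=80.0):
    """Store waveforms whose features pass the condition check.

    items: iterable of (phoneme_id, waveform, PlosiveFeatures).
    storage: dict mapping phoneme_id to a list of recorded waveforms.
    Returns the phoneme ids that were recorded, in input order.
    """
    recorded = []
    for phoneme_id, waveform, feats in items:
        if satisfies_conditions(feats, max_aspiration_ms):
            storage.setdefault(phoneme_id, []).append(list(waveform))
            recorded.append(phoneme_id)
    return recorded
```

Keying the storage by phoneme identifier reflects the "phonemewise" storage named in the claim; a real system would presumably also key on the neighbouring phoneme types, which the claim lists among the feature quantities.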
8. A speech enhancing method that corrects and outputs unclear portions of input speech data, the speech enhancing method comprising:
- determining a separation of voiced/unvoiced of the input speech data and outputting voiced/unvoiced boundary data as phoneme boundary data that splits the input speech data into a plurality of phonemes;
- calculating a waveform feature quantity of the input speech data for each of the plurality of the phonemes, the input speech data being input along with the phoneme boundary data, wherein the waveform feature quantity includes at least one of amplitude values, amplitude variation rates, existence or absence of periodic waveforms, of the phonemes, existence or absence of plosive portions of the phonemes, lengths of the plosive portions, existence or absence of aspirated portions that continue after the plosive portions, lengths of the aspirated portions, and phoneme types of the phonemes before and after the phonemes;
- determining a necessity of correction of the input speech data for each of the plurality of phonemes, based on the waveform feature quantity calculated in the calculating; and
- correcting a phoneme of the plurality of phonemes which is determined to be corrected in the determining, by using waveform data that is prior stored in a phonemewise-waveform-data storage unit,
wherein the calculating includes
  - splitting the input speech data into the phonemes based on the phoneme boundary data,
  - measuring amplitude values, amplitude variation rates, and existence or absence of periodic waveforms of the phonemes, based on the phonemes that are split in the splitting,
  - detecting plosive portions and aspirated portions of the phonemes, based on the amplitude values and the amplitude variation rates that are measured in the measuring and the input speech data that is split in the splitting,
  - classifying phoneme types of the phonemes, based on a detection result in the detecting, and the amplitude values, the amplitude variation rates, and existence or absence of the periodic waveforms that are measured in the measuring, and
  - calculating a feature quantity for each of the phonemes that are classified in the classifying.
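The last two steps of the method in claim 8, deciding whether a phoneme needs correction and correcting it from stored waveform data, might look like the following sketch. It assumes a single criterion (peak amplitude below a threshold) and whole-segment replacement; the claim fixes neither, and the names and the 0.1 threshold are hypothetical.

```python
def needs_correction(samples, min_peak=0.1):
    """Assumed criterion: the phoneme is too weak to be heard clearly."""
    peak = max((abs(x) for x in samples), default=0.0)
    return peak < min_peak


def enhance(segments, stored, min_peak=0.1):
    """Rebuild the speech, swapping in stored waveforms where needed.

    segments: list of (phoneme_type, samples) in playback order.
    stored: dict mapping phoneme_type to a previously stored waveform.
    """
    output = []
    for phoneme_type, samples in segments:
        if needs_correction(samples, min_peak) and phoneme_type in stored:
            output.extend(stored[phoneme_type])  # corrected phoneme
        else:
            output.extend(samples)               # left as input
    return output


# Usage: the weak plosive is replaced, the fricative passes through.
segments = [
    ("unvoiced plosive", [0.0, 0.01]),
    ("voiced fricative", [0.3, -0.2]),
]
stored = {"unvoiced plosive": [0.5, -0.4, 0.1]}
print(enhance(segments, stored))  # [0.5, -0.4, 0.1, 0.3, -0.2]
```

Phonemes with no stored counterpart are passed through unchanged in this sketch, which is a design choice of the example rather than something the claim states.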
9. A speech recording method that corrects and outputs unclear portions of input speech data, the speech recording method comprising:
- assigning phoneme identification data to the input speech data, based on the input speech data and a string of phonemes that is output by carrying out a language process on text data of the input speech data, determining boundaries of the phoneme identification data, and outputting boundary data of the phoneme identification data as phoneme boundary data;
- calculating a waveform feature quantity of the input speech data for each of the phonemes, the input speech data being input along with the boundary data of the phoneme identification data output from the outputting, wherein the waveform feature quantity includes at least one of amplitude values, amplitude variation rates, existence or absence of periodic waveforms, of the phonemes, existence or absence of plosive portions of the phonemes, lengths of the plosive portions, existence or absence of aspirated portions that continue after the plosive portions, lengths of the aspirated portions, and phoneme types of the phonemes before and after the phonemes;
- determining whether the input speech data satisfies predetermined conditions for each phoneme, based on the waveform feature quantity calculated in the calculating; and
- recording in the phonemewise-waveform-data storage unit, the input speech data of each phoneme that is determined to be satisfied the predetermined conditions, based on a determination in the determining,
wherein the calculating includes
  - splitting the input speech into the phonemes based on the phoneme boundary data,
  - measuring an amplitude value and an amplitude variation rate for each of the phonemes that are split in the splitting,
  - detecting plosive portions and aspirated portions of the phonemes, based on the amplitude value and the amplitude variation rate that are measured in the measuring and the input speech data that is split in the splitting,
  - classifying each of the phonemes into phoneme types, based on the amplitude value and the amplitude variation rate that are measured in the measuring, and
  - calculating a feature quantity for each of the phonemes that are classified in the classifying according to each of the phoneme types.
Specification