Speech enhancement apparatus, speech recording apparatus, speech enhancement program, speech recording program, speech enhancing method, and speech recording method

US 8,190,432 B2
Filed: 07/31/2007
Issued: 05/29/2012
Est. Priority Date: 09/13/2006
Status: Active Grant

1 Assignment

0 Petitions

Abstract

The invention aims to automatically detect and correct, in reproduced speech, defective portions related to plosives, such as the presence or absence of plosive portions and the lengths of the aspirated portions that follow them, as well as defective portions related to amplitude variations of fricatives. Speech in which consonants and unvoiced vowels are unclear and discordant is input into the speech enhancement apparatus according to the present invention. The apparatus splits the speech into phonemes and classifies each phoneme as one of the following: an unvoiced plosive, a voiced plosive, an unvoiced fricative, a voiced fricative, an affricate, or an unvoiced vowel. It then determines for each phoneme whether correction is necessary and corrects it accordingly, producing output speech in which the consonants and unvoiced vowels are clear and no longer discordant.
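The six phoneme classes named in the abstract can be illustrated with a minimal Python sketch. The `PhonemeType` enum mirrors the abstract; everything else (the `SegmentFeatures` fields, the `classify` function, and its millisecond thresholds) is a hypothetical rule set chosen for illustration, not the decision logic disclosed in the patent.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PhonemeType(Enum):
    """The six phoneme classes listed in the abstract."""
    UNVOICED_PLOSIVE = "unvoiced plosive"
    VOICED_PLOSIVE = "voiced plosive"
    UNVOICED_FRICATIVE = "unvoiced fricative"
    VOICED_FRICATIVE = "voiced fricative"
    AFFRICATE = "affricate"
    UNVOICED_VOWEL = "unvoiced vowel"


@dataclass
class SegmentFeatures:
    """Hypothetical per-phoneme measurements used by this toy classifier."""
    periodic: bool       # a periodic (voiced) waveform was found
    has_plosive: bool    # a burst (plosive portion) was detected
    frication_ms: float  # duration of sustained noise, in milliseconds


def classify(f: SegmentFeatures,
             affricate_ms: float = 40.0,
             fricative_ms: float = 30.0) -> Optional[PhonemeType]:
    """Toy rule set; both thresholds are illustrative assumptions."""
    if f.has_plosive and f.frication_ms >= affricate_ms:
        # A burst followed by long frication is treated as an affricate.
        return PhonemeType.AFFRICATE
    if f.has_plosive:
        return PhonemeType.VOICED_PLOSIVE if f.periodic else PhonemeType.UNVOICED_PLOSIVE
    if f.frication_ms >= fricative_ms:
        return PhonemeType.VOICED_FRICATIVE if f.periodic else PhonemeType.UNVOICED_FRICATIVE
    if not f.periodic:
        # Aperiodic segment with no burst and little frication: devoiced vowel.
        return PhonemeType.UNVOICED_VOWEL
    return None  # e.g. voiced vowels or nasals, outside the six classes
```

Phonemes outside the six classes, such as voiced vowels, return `None` in this sketch, reflecting the abstract's focus on consonants and unvoiced vowels.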

15 Citations

9 Claims

1. A speech enhancement apparatus that corrects and outputs unclear portions of input speech data, the speech enhancement apparatus comprising:
- a voiced/unvoiced-boundary-data output unit that determines a separation of voiced/unvoiced of the input speech data and outputs voiced/unvoiced boundary data as phoneme boundary data that splits the input speech data into a plurality of phonemes;

  a waveform-feature-quantity calculating unit that calculates a waveform feature quantity of the input speech data for each of the plurality of phonemes, the input speech data being input along with the phoneme boundary data, wherein the waveform feature quantity includes at least one of amplitude values, amplitude variation rates, existence or absence of periodic waveforms, of the phonemes, existence or absence of plosive portions of the phonemes, lengths of the plosive portions, existence or absence of aspirated portions that continue after the plosive portions, lengths of the aspirated portions, and phoneme types of the phonemes before and after the phonemes;

  a correction determining unit that determines a necessity of correction of the input speech data for each of the plurality of phonemes, based on the waveform feature quantity calculated by the waveform-feature-quantity calculating unit; and

  a waveform correcting unit that corrects a phoneme of the plurality of phonemes which is determined to be corrected by the correction determining unit by using waveform data that is prior stored in a phonemewise-waveform-data storage unit, wherein the waveform-feature-quantity calculating unit includes
    - a speech data splitting unit that splits the input speech data into the phonemes based on the phoneme boundary data,
    - an amplitude variation measuring unit that measures amplitude values, amplitude variation rates, and existence or absence of periodic waveforms of the phonemes, based on the phonemes that are split by the speech data splitting unit,
    - a plosive portion/aspirated portion detecting unit that detects plosive portions and aspirated portions of the phonemes, based on the amplitude values and the amplitude variation rates that are measured by the amplitude variation measuring unit and the input speech data that is split by the speech data splitting unit,
    - a phoneme classifying unit that classifies phoneme types of the phonemes, based on a detection result by the plosive portion/aspirated portion detecting unit, and the amplitude values, the amplitude variation rates, and existence or absence of the periodic waveforms that are measured by the amplitude variation measuring unit, and
    - a phonemewise-feature-quantity calculating unit that calculates a feature quantity for each of the phonemes that are classified by the phoneme classifying unit.
  - 2. The speech enhancement apparatus according to claim 1, further comprising:
    - a phoneme-identification-data output unit that assigns phoneme identification data to the input speech data based on the input speech data and a phoneme string that is output by carrying out a language process on text data of the input speech data, determines boundaries of the phoneme identification data, and outputs boundary data of the phoneme identification data as the phoneme boundary data, wherein the waveform-feature-quantity calculating unit calculates the waveform feature quantity of the input speech data for each of the phonemes, the input speech data being input along with the boundary data of the phoneme identification data output by the phoneme-identification-data output unit.
  - 3. The speech enhancement apparatus according to claim 1, wherein the phonemewise-feature-quantity calculating unit calculates as the feature quantity, at least one of the amplitude values, the amplitude variation rates, and existence or absence of the periodic waveforms that are measured by the amplitude variation measuring unit, existence or absence of the plosive portions of the phonemes, lengths of the plosive portions, existence or absence of the aspirated portions that continue after the plosive portions, and lengths of the aspirated portions that are detected by the plosive portion/aspirated portion detecting unit, and the phoneme types of the phonemes before and after the phonemes that are classified by the phoneme classifying unit.
  - 4. The speech enhancement apparatus according to claim 1, wherein the correction determining unit determines whether correction of the input speech data is necessitated for each phoneme according to the phoneme types that are classified by the phoneme classifying unit.
  - 5. The speech enhancement apparatus according to claim 1, wherein the waveform-feature-quantity calculating unit further includes a phoneme environment detecting unit that detects a difference of pronounced/silent and a difference of voiced/unvoiced in the phonemes before and after the phonemes that are split by the speech data splitting unit, and wherein the correction determining unit determines the necessity of correction of the input speech data for each phoneme, based on a detection result by the phoneme environment detecting unit along with the waveform feature quantity that is calculated by the waveform-feature-quantity calculating unit.
  - 6. The speech enhancement apparatus according to claim 1, further comprising an output speech data synthesizer that synthesizes the input speech data with the input speech data of each phoneme that is corrected by the waveform correcting unit, and outputs the synthesized input speech data, based on the phoneme boundary data and a determination result by the correction determining unit.
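A rough Python sketch of how the correction determining unit, the waveform correcting unit, and the output speech data synthesizer of claim 6 might fit together, assuming the phonemes have already been split and measured. The `Segment` type, the `needs_correction` rule (flagging plosives whose burst is missing or whose aspiration is too short), its thresholds, and the use of a plain dictionary as the phonemewise-waveform-data storage unit are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Segment:
    """One phoneme after splitting and measurement (hypothetical structure)."""
    label: str             # phoneme label used as the storage key, e.g. "k"
    kind: str              # phoneme type, e.g. "unvoiced plosive"
    samples: List[float]   # waveform of this phoneme
    plosive_ms: float      # detected burst length (0.0 if none)
    aspiration_ms: float   # detected aspiration length after the burst


def needs_correction(seg: Segment,
                     min_plosive_ms: float = 2.0,
                     min_aspiration_ms: float = 10.0) -> bool:
    """Hypothetical rule: flag plosives with a missing burst or short aspiration."""
    if seg.kind not in ("unvoiced plosive", "voiced plosive"):
        return False  # this sketch only checks plosives
    if seg.plosive_ms < min_plosive_ms:
        return True   # burst missing or too short
    # Only unvoiced plosives are expected to carry aspiration here.
    return seg.kind == "unvoiced plosive" and seg.aspiration_ms < min_aspiration_ms


def enhance(segments: List[Segment], store: Dict[str, List[float]]) -> List[float]:
    """Substitute stored waveforms for flagged phonemes and concatenate the output."""
    output: List[float] = []
    for seg in segments:
        if needs_correction(seg) and seg.label in store:
            output.extend(store[seg.label])  # stored clean waveform replaces the phoneme
        else:
            output.extend(seg.samples)       # phoneme passes through unchanged
    return output
```

Splicing by plain concatenation keeps the sketch short; a practical system would presumably also match levels and smooth the joins between original and substituted waveforms. A flagged phoneme with no stored counterpart is passed through unchanged.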

7. A speech recording apparatus that records input speech data in a phonemewise-waveform-data storage unit, the speech recording apparatus comprising:
- a phoneme-identification-data output unit that assigns phoneme identification data to the input speech data, based on the input speech data and a string of phonemes that is output by carrying out a language process on text data of the input speech data, determines boundaries of the phoneme identification data, and outputs boundary data of the phoneme identification data as phoneme boundary data;

  a waveform-feature-quantity calculating unit that calculates a waveform feature quantity of the input speech data for each of the phonemes, the input speech data being input along with the boundary data of the phoneme identification data output by the phoneme-identification-data output unit, wherein the waveform feature quantity includes at least one of amplitude values, amplitude variation rates, existence or absence of periodic waveforms, of the phonemes, existence or absence of plosive portions of the phonemes, lengths of the plosive portions, existence or absence of aspirated portions that continue after the plosive portions, lengths of the aspirated portions, and phoneme types of the phonemes before and after the phonemes;

  a condition sufficiency determining unit that determines whether the input speech data satisfies predetermined conditions for each phoneme, based on the waveform feature quantity calculated by the waveform-feature-quantity calculating unit; and

  a phonemewise-waveform-data recording unit that records in the phonemewise-waveform-data storage unit, the input speech data of each phoneme that is determined to be satisfied the predetermined conditions, based on a determination by the condition sufficiency determining unit, wherein the waveform-feature-quantity calculating unit includes
    - a speech data splitting unit that splits the input speech data into the phonemes based on the phoneme boundary data,
    - an amplitude variation measuring unit that measures an amplitude value and an amplitude variation rate for each of the phonemes that are split by the speech data splitting unit,
    - a plosive portion/aspirated portion detecting unit that detects plosive portions and aspirated portions of the phonemes, based on the amplitude value and the amplitude variation rate that are measured by the amplitude variation measuring unit and the input speech data that is split by the speech data splitting unit,
    - a phoneme classifying unit that classifies each of the phonemes into phoneme types, based on the amplitude value and the amplitude variation rate that are measured by the amplitude variation measuring unit, and
    - a phonemewise-feature-quantity calculating unit that calculates a feature quantity for each of the phonemes that are classified by the phoneme classifying unit according to each of the phoneme types.
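Claim 7 stores only those phonemes that satisfy predetermined conditions, but the claim text does not spell those conditions out. The Python sketch below therefore assumes a few plausible checks (a minimum peak level, a detectable burst, and an aspiration length within a range for unvoiced plosives); the `Candidate` and `PhonemewiseStore` names, the `satisfies_conditions` function, and all thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Candidate:
    """A measured phoneme offered for recording (hypothetical structure)."""
    label: str
    kind: str
    samples: List[float]
    peak_amplitude: float
    plosive_ms: float
    aspiration_ms: float


def satisfies_conditions(c: Candidate,
                         min_peak: float = 0.1,
                         min_plosive_ms: float = 2.0,
                         aspiration_range_ms: Tuple[float, float] = (10.0, 80.0)) -> bool:
    """Hypothetical 'predetermined conditions' for a usable reference phoneme."""
    if c.peak_amplitude < min_peak:
        return False  # too quiet to serve as a reference
    if c.kind == "unvoiced plosive":
        lo, hi = aspiration_range_ms
        return c.plosive_ms >= min_plosive_ms and lo <= c.aspiration_ms <= hi
    if c.kind == "voiced plosive":
        return c.plosive_ms >= min_plosive_ms
    return True  # other types: only the level check applies here


class PhonemewiseStore:
    """In-memory stand-in for the phonemewise-waveform-data storage unit."""

    def __init__(self) -> None:
        self._data: Dict[str, List[float]] = {}

    def record(self, c: Candidate) -> bool:
        """Store the candidate if it qualifies; return whether it qualified."""
        if not satisfies_conditions(c):
            return False
        # Keep the first qualifying example for each label.
        self._data.setdefault(c.label, list(c.samples))
        return True

    def get(self, label: str) -> Optional[List[float]]:
        return self._data.get(label)
```

Keeping the first qualifying example per label is only one possible policy; keeping the best-scoring example would be an equally plausible alternative.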

8. A speech enhancing method that corrects and outputs unclear portions of input speech data, the speech enhancing method comprising:
- determining a separation of voiced/unvoiced of the input speech data and outputting voiced/unvoiced boundary data as phoneme boundary data that splits the input speech data into a plurality of phonemes;

  calculating a waveform feature quantity of the input speech data for each of the plurality of the phonemes, the input speech data being input along with the phoneme boundary data, wherein the waveform feature quantity includes at least one of amplitude values, amplitude variation rates, existence or absence of periodic waveforms, of the phonemes, existence or absence of plosive portions of the phonemes, lengths of the plosive portions, existence or absence of aspirated portions that continue after the plosive portions, lengths of the aspirated portions, and phoneme types of the phonemes before and after the phonemes;

  determining a necessity of correction of the input speech data for each of the plurality of phonemes, based on the waveform feature quantity calculated in the calculating; and

  correcting a phoneme of the plurality of phonemes which is determined to be corrected in the determining, by using waveform data that is prior stored in a phonemewise-waveform-data storage unit, wherein the calculating includes
    - splitting the input speech data into the phonemes based on the phoneme boundary data,
    - measuring amplitude values, amplitude variation rates, and existence or absence of periodic waveforms of the phonemes, based on the phonemes that are split in the splitting,
    - detecting plosive portions and aspirated portions of the phonemes, based on the amplitude values and the amplitude variation rates that are measured in the measuring and the input speech data that is split in the splitting,
    - classifying phoneme types of the phonemes, based on a detection result in the detecting, and the amplitude values, the amplitude variation rates, and existence or absence of the periodic waveforms that are measured in the measuring, and
    - calculating a feature quantity for each of the phonemes that are classified in the classifying.
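The first step of claim 8, determining voiced/unvoiced boundaries and splitting the speech at them, can be sketched in Python with the classic short-time energy and zero-crossing-rate heuristic. The claim does not say which voicing detector is used, so this technique, the frame length, both thresholds, and the function names `frame_voicing`, `voicing_boundaries`, and `split_at` are assumptions.

```python
from typing import List


def frame_voicing(samples: List[float], frame_len: int,
                  energy_thr: float, zcr_thr: float) -> List[bool]:
    """One voiced (True) / unvoiced (False) decision per non-overlapping frame."""
    decisions: List[bool] = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len  # mean power of the frame
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        zcr = crossings / (frame_len - 1)  # sign changes per adjacent sample pair
        # Voiced speech tends to be energetic with few zero crossings;
        # unvoiced frames are either quiet or noise-like (many crossings).
        decisions.append(energy >= energy_thr and zcr < zcr_thr)
    return decisions


def voicing_boundaries(decisions: List[bool], frame_len: int) -> List[int]:
    """Sample indices at which the voiced/unvoiced decision changes."""
    return [i * frame_len for i in range(1, len(decisions))
            if decisions[i] != decisions[i - 1]]


def split_at(samples: List[float], boundaries: List[int]) -> List[List[float]]:
    """Split the signal into segments at the given sample indices."""
    edges = [0, *boundaries, len(samples)]
    return [samples[a:b] for a, b in zip(edges, edges[1:])]
```

Frame-level decisions give boundaries only to the nearest frame; finer placement, or the text-driven phoneme alignment of claim 2, would be needed for accurate phoneme edges.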

9. A speech recording method that corrects and outputs unclear portions of input speech data, the speech recording method comprising:
- assigning phoneme identification data to the input speech data, based on the input speech data and a string of phonemes that is output by carrying out a language process on text data of the input speech data, determining boundaries of the phoneme identification data, and outputting boundary data of the phoneme identification data as phoneme boundary data;

  calculating a waveform feature quantity of the input speech data for each of the phonemes, the input speech data being input along with the boundary data of the phoneme identification data output from the outputting, wherein the waveform feature quantity includes at least one of amplitude values, amplitude variation rates, existence or absence of periodic waveforms, of the phonemes, existence or absence of plosive portions of the phonemes, lengths of the plosive portions, existence or absence of aspirated portions that continue after the plosive portions, lengths of the aspirated portions, and phoneme types of the phonemes before and after the phonemes;

  determining whether the input speech data satisfies predetermined conditions for each phoneme, based on the waveform feature quantity calculated in the calculating; and

  recording in the phonemewise-waveform-data storage unit, the input speech data of each phoneme that is determined to be satisfied the predetermined conditions, based on a determination in the determining, wherein the calculating includes
    - splitting the input speech into the phonemes based on the phoneme boundary data,
    - measuring an amplitude value and an amplitude variation rate for each of the phonemes that are split in the splitting,
    - detecting plosive portions and aspirated portions of the phonemes, based on the amplitude value and the amplitude variation rate that are measured in the measuring and the input speech data that is split in the splitting,
    - classifying each of the phonemes into phoneme types, based on the amplitude value and the amplitude variation rate that are measured in the measuring, and
    - calculating a feature quantity for each of the phonemes that are classified in the classifying according to each of the phoneme types.
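Claim 9 measures an amplitude value and an amplitude variation rate per phoneme and uses them to detect plosive and aspirated portions. The Python sketch below assumes frame RMS as the amplitude value, the ratio of consecutive frame RMS values as the variation rate, and a sharp rise out of near-silence (closure, then burst) followed by a band of moderate energy as the plosive/aspiration pattern. These definitions, the thresholds, and all function names are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def frame_rms(samples: List[float], frame_len: int) -> List[float]:
    """Amplitude value of each non-overlapping frame, taken as its RMS."""
    return [
        math.sqrt(sum(x * x for x in samples[s:s + frame_len]) / frame_len)
        for s in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def variation_rates(rms: List[float], eps: float = 1e-9) -> List[float]:
    """Amplitude variation rate: ratio of each frame's RMS to the previous one."""
    return [rms[i] / (rms[i - 1] + eps) for i in range(1, len(rms))]


def detect_plosive(rms: List[float],
                   silence_thr: float = 0.01,
                   min_jump: float = 10.0,
                   aspiration_thr: float = 0.05) -> Tuple[Optional[int], int]:
    """Return (burst frame index, aspiration frame count), or (None, 0)."""
    rates = variation_rates(rms)
    for i in range(1, len(rms)):
        # Burst: a sharp rise in level out of near-silence (the closure).
        if rms[i - 1] < silence_thr and rates[i - 1] >= min_jump and rms[i] > aspiration_thr:
            burst_level = rms[i]
            count = 0
            for level in rms[i + 1:]:
                # Aspiration: above the floor but weaker than the burst.
                if aspiration_thr <= level < burst_level:
                    count += 1
                else:
                    break
            return i, count
    return None, 0


def frames_to_ms(frames: int, frame_len: int, sample_rate: int) -> float:
    """Convert a frame count into a length in milliseconds."""
    return 1000.0 * frames * frame_len / sample_rate
```

For example, three aspiration frames of 80 samples at 16 kHz correspond to `frames_to_ms(3, 80, 16000)`, that is 15 ms of aspiration.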

Current Assignee: Fujitsu Limited
Original Assignee: Fujitsu Limited
Inventors: Matsumoto, Chikako
Primary Examiner(s): Jackson, Jakieda
Application Number: US11/882,312
Publication Number: US 20080065381A1
Time in Patent Office: 1,764 Days
Field of Search: 704/254
US Class Current: 704/254
CPC Class Codes:
G10L 2021/0575 Aids for the handicapped in...
G10L 21/0364 for improving intelligibility
