Speech enhancement apparatus, speech recording apparatus, speech enhancement program, speech recording program, speech enhancing method, and speech recording method

US 20080065381A1
Filed: 07/31/2007
Published: 03/13/2008
Est. Priority Date: 09/13/2006
Status: Active Grant

1 Assignment

0 Petitions

Abstract

The invention aims to automatically detect and correct, in reproduced speech, defective portions related to plosives (such as the presence or absence of plosive portions and the lengths of the aspirated portions that follow them) or to amplitude variations of fricatives. Speech in which consonants and unvoiced vowels are unclear and discordant is input into a speech enhancement apparatus according to the present invention. The apparatus splits the speech into phonemes and classifies each phoneme as one of six types: unvoiced plosive, voiced plosive, unvoiced fricative, voiced fricative, affricate, or unvoiced vowel. It then determines whether each phoneme needs correction and corrects it accordingly, so that the output speech has clear, non-discordant consonants and unvoiced vowels.
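
The abstract names six phoneme types but not the rule that assigns a phoneme to one of them. The following is a minimal Python sketch under stated assumptions: each phoneme has already been reduced to a few measured features (burst present, length of the noise following the burst, periodicity, frication noise, mean amplitude), and a fixed decision order with two thresholds picks the type. The `PhonemeFeatures` fields, the decision order, and both threshold values are hypothetical and not taken from the patent; separating an affricate from an aspirated plosive by noise length alone is a deliberately crude simplification.

```python
from dataclasses import dataclass

@dataclass
class PhonemeFeatures:
    has_plosive: bool     # a burst (plosive portion) was detected
    aspiration_ms: float  # length of the noise portion following the burst
    periodic: bool        # a periodic (voiced) waveform is present
    noisy: bool           # aperiodic frication noise is present
    rms: float            # mean amplitude, normalized to full scale 1.0

AFFRICATE_MIN_MS = 40.0   # hypothetical threshold
FRICATIVE_MIN_RMS = 0.05  # hypothetical threshold

def classify(f: PhonemeFeatures) -> str:
    """Assign one of the six target types, or "other" for non-targets."""
    if f.has_plosive:
        # Long noise after the burst is treated as affrication (crude assumption).
        if f.aspiration_ms >= AFFRICATE_MIN_MS:
            return "affricate"
        return "voiced plosive" if f.periodic else "unvoiced plosive"
    if f.noisy and f.periodic:
        return "voiced fricative"
    if f.noisy and f.rms >= FRICATIVE_MIN_RMS:
        return "unvoiced fricative"
    if not f.periodic:
        # Weak, aperiodic segment in a vowel slot.
        return "unvoiced vowel"
    return "other"  # e.g. an ordinary voiced vowel, not a correction target
```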

28 Claims

1. A speech enhancement apparatus that corrects and outputs unclear portions of input speech data, the speech enhancement apparatus comprising:
- a waveform-feature-quantity calculating unit that calculates a waveform feature quantity of the speech data for each phoneme, the speech data being input along with phoneme boundary data that splits the speech data into phonemes;
  
  a correction determining unit that determines a necessity of correction of the speech data for each phoneme, based on the waveform feature quantity calculated by the waveform-feature-quantity calculating unit; and
  
  a waveform correcting unit that corrects, for each phoneme, the speech data whose necessity of correction is determined by the correction determining unit, by using waveform data previously stored in a phonemewise-waveform-data storage unit.
  - 2. The speech enhancement apparatus according to claim 1, further comprising:
    - a voiced/unvoiced-boundary-data output unit that determines a separation of voiced/unvoiced of the speech data and outputs voiced/unvoiced boundary data as the phoneme boundary data, wherein the waveform-feature-quantity calculating unit calculates the waveform feature quantity of the speech data for each phoneme, the speech data being input along with the voiced/unvoiced boundary data output by the voiced/unvoiced-boundary-data output unit.
  - 3. The speech enhancement apparatus according to claim 1, further comprising:
    - a phoneme-identification-data output unit that assigns phoneme identification data to the speech data based on the input speech data and a phoneme string that is output by carrying out a language process on text data of the speech data, determines boundaries of the phoneme identification data, and outputs boundary data of the phoneme identification data as the phoneme boundary data, wherein the waveform-feature-quantity calculating unit calculates the waveform feature quantity of the speech data for each phoneme, the speech data being input along with the boundary data of the phoneme identification data output by the phoneme-identification-data output unit.
  - 4. The speech enhancement apparatus according to claim 2, wherein the waveform-feature-quantity calculating unit includes: a speech data splitting unit that splits the input speech data into the phonemes based on the phoneme boundary data; an amplitude variation measuring unit that measures amplitude values, amplitude variation rates, and existence or absence of periodic waveforms of the phonemes, based on the phonemes that are split by the speech data splitting unit; a plosive portion/aspirated portion detecting unit that detects plosive portions and aspirated portions of the phonemes, based on the amplitude values and the amplitude variation rates that are measured by the amplitude variation measuring unit and the speech data that is split by the speech data splitting unit; a phoneme classifying unit that classifies phoneme types of the phonemes, based on a detection result by the plosive portion/aspirated portion detecting unit, and the amplitude values, the amplitude variation rates, and existence or absence of the periodic waveforms that are measured by the amplitude variation measuring unit; and a phonemewise-feature-quantity calculating unit that calculates the feature quantity for each of the phonemes that are classified by the phoneme classifying unit.
  - 5. The speech enhancement apparatus according to claim 3, wherein the waveform-feature-quantity calculating unit includes: a speech data splitting unit that splits the input speech data into the phonemes based on the phoneme boundary data; an amplitude variation measuring unit that measures amplitude values, amplitude variation rates, and existence or absence of periodic waveforms of the phonemes, based on the phonemes that are split by the speech data splitting unit; a plosive portion/aspirated portion detecting unit that detects plosive portions and aspirated portions of the phonemes, based on the amplitude values and the amplitude variation rates that are measured by the amplitude variation measuring unit and the speech data that is split by the speech data splitting unit; a phoneme classifying unit that classifies phoneme types of the phonemes, based on a detection result by the plosive portion/aspirated portion detecting unit, and the amplitude values, the amplitude variation rates, and existence or absence of the periodic waveforms that are measured by the amplitude variation measuring unit; and a phonemewise-feature-quantity calculating unit that calculates the feature quantity for each of the phonemes that are classified by the phoneme classifying unit.
  - 6. The speech enhancement apparatus according to claim 4, wherein the phonemewise-feature-quantity calculating unit calculates, as the feature quantity, at least one of the amplitude values, the amplitude variation rates, and existence or absence of the periodic waveforms that are measured by the amplitude variation measuring unit, existence or absence of the plosive portions of the phonemes, lengths of the plosive portions, existence or absence of the aspirated portions that continue after the plosive portions, and lengths of the aspirated portions that are detected by the plosive portion/aspirated portion detecting unit, and the phoneme types of the phonemes before and after the phonemes that are classified by the phoneme classifying unit.
  - 7. The speech enhancement apparatus according to claim 5, wherein the phonemewise-feature-quantity calculating unit calculates, as the feature quantity, at least one of the amplitude values, the amplitude variation rates, and existence or absence of the periodic waveforms that are measured by the amplitude variation measuring unit, existence or absence of the plosive portions of the phonemes, lengths of the plosive portions, existence or absence of the aspirated portions that continue after the plosive portions, and lengths of the aspirated portions that are detected by the plosive portion/aspirated portion detecting unit, and the phoneme types of the phonemes before and after the phonemes that are classified by the phoneme classifying unit.
  - 8. The speech enhancement apparatus according to claim 4, wherein the correction determining unit determines whether correction of the speech data is necessitated for each phoneme according to the phoneme types that are classified by the phoneme classifying unit.
  - 9. The speech enhancement apparatus according to claim 5, wherein the correction determining unit determines whether correction of the speech data is necessitated for each phoneme according to the phoneme types that are classified by the phoneme classifying unit.
  - 10. The speech enhancement apparatus according to claim 4, wherein the waveform-feature-quantity calculating unit further includes a phoneme environment detecting unit that detects a difference of pronounced/silent and a difference of voiced/unvoiced in the phonemes before and after the phonemes that are split by the speech data splitting unit, and wherein the correction determining unit determines the necessity of correction of the speech data for each phoneme, based on a detection result by the phoneme environment detecting unit along with the waveform feature quantity that is calculated by the waveform-feature-quantity calculating unit.
  - 11. The speech enhancement apparatus according to claim 5, wherein the waveform-feature-quantity calculating unit further includes a phoneme environment detecting unit that detects a difference of pronounced/silent and a difference of voiced/unvoiced in the phonemes before and after the phonemes that are split by the speech data splitting unit, and wherein the correction determining unit determines the necessity of correction of the speech data for each phoneme, based on a detection result by the phoneme environment detecting unit along with the waveform feature quantity that is calculated by the waveform-feature-quantity calculating unit.
  - 12. The speech enhancement apparatus according to claim 1, further comprising an output speech data synthesizer that synthesizes the input speech data with the speech data of each phoneme that is corrected by the waveform correcting unit, and outputs the synthesized speech data, based on the phoneme boundary data and a determination result by the correction determining unit.
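
Claims 4 and 6 list the measurements (per-phoneme splitting, amplitude values, amplitude variation rates, and plosive and aspirated portions) without saying how they are obtained. A minimal pure-Python sketch, assuming mono samples as a list of floats, phoneme boundaries given as sample indices, and per-frame periodicity flags supplied by a separate voicing detector. The function names, the rule that a burst is the first frame whose RMS rises by at least `rise_ratio` over the previous frame, and the treatment of the aspirated portion as the run of non-periodic frames between the burst and voicing onset are illustrative assumptions, not the patent's method.

```python
import math

def split_by_boundaries(samples, boundaries):
    """Split samples into phoneme segments at the given sample indices."""
    edges = [0] + list(boundaries) + [len(samples)]
    return [samples[a:b] for a, b in zip(edges, edges[1:])]

def frame_rms(segment, frame_len):
    """RMS amplitude of consecutive non-overlapping frames (a trailing partial frame is dropped)."""
    return [
        math.sqrt(sum(x * x for x in segment[i:i + frame_len]) / frame_len)
        for i in range(0, len(segment) - frame_len + 1, frame_len)
    ]

def variation_rates(rms, floor=1e-6):
    """Frame-to-frame amplitude ratio; values well above 1 mark sudden rises."""
    return [rms[i] / max(rms[i - 1], floor) for i in range(1, len(rms))]

def detect_burst(rms, rise_ratio=4.0, floor=1e-6):
    """Frame index of the first sudden amplitude rise (taken as a plosive burst), or None."""
    for i, rate in enumerate(variation_rates(rms, floor), start=1):
        if rate >= rise_ratio:
            return i
    return None

def aspiration_length(burst, periodic):
    """Number of non-periodic frames after the burst, up to the first periodic (voiced) frame."""
    if burst is None:
        return 0
    n = 0
    for flag in periodic[burst + 1:]:
        if flag:
            break
        n += 1
    return n
```

Multiplying the returned frame count by the frame duration gives an aspiration length in milliseconds, the kind of quantity claim 6 lists as a feature.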

13. A speech recording apparatus that records input speech data in a phonemewise-waveform-data storage unit, the speech recording apparatus comprising:
- a phoneme-identification-data output unit that assigns phoneme identification data to the speech data, based on the input speech data and a phoneme string that is output by carrying out a language process on text data of the speech data, determines boundaries of the phoneme identification data, and outputs boundary data of the phoneme identification data as the phoneme boundary data;
  
  a waveform-feature-quantity calculating unit that calculates a waveform feature quantity of the speech data for each phoneme, the speech data being input along with the boundary data of the phoneme identification data output by the phoneme-identification-data output unit;
  
  a condition sufficiency determining unit that determines whether the speech data satisfies predetermined conditions for each phoneme, based on the waveform feature quantity calculated by the waveform-feature-quantity calculating unit; and
  
  a phonemewise-waveform-data recording unit that records, in the phonemewise-waveform-data storage unit, the speech data of each phoneme that is determined to satisfy the predetermined conditions, based on a determination by the condition sufficiency determining unit.
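
Claim 13 leaves the "predetermined conditions" open. One way to picture the condition sufficiency determining unit and the recording unit together is a table of per-type predicates plus a store keyed by phoneme label. In this sketch the `CONDITIONS` table, its thresholds, the feature-dictionary keys, and the use of a `defaultdict(list)` as the phonemewise-waveform-data storage unit are all hypothetical.

```python
from collections import defaultdict

# Hypothetical per-type conditions a phoneme must satisfy to be kept as a reference waveform.
CONDITIONS = {
    "unvoiced plosive": lambda f: f["has_plosive"] and 10.0 <= f["aspiration_ms"] <= 60.0,
    "voiced plosive": lambda f: f["has_plosive"],
    "unvoiced fricative": lambda f: f["rms"] >= 0.05,
}

def record_phonemes(segments, labels, features, store):
    """Store each segment whose phoneme type has a condition that it satisfies.

    segments: per-phoneme sample lists; labels: phoneme identification data
    (e.g. "k"); features: dicts with a "type" key plus the measured values.
    Returns the number of segments recorded.
    """
    recorded = 0
    for seg, label, feat in zip(segments, labels, features):
        check = CONDITIONS.get(feat["type"])
        if check is not None and check(feat):
            store[label].append(list(seg))
            recorded += 1
    return recorded

store = defaultdict(list)  # stands in for the phonemewise-waveform-data storage unit
```

Phoneme types without an entry in the table (for example ordinary voiced vowels) are simply never recorded in this sketch.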

14. A computer-readable recording medium that stores therein a speech enhancing program that causes a computer to correct and output unclear portions of input speech data, the speech enhancing program causing the computer to execute:
- calculating a waveform feature quantity of the speech data for each phoneme, the speech data being input along with phoneme boundary data that splits the speech data into phonemes;
  
  determining a necessity of correction of the speech data for each phoneme, based on the waveform feature quantity calculated in the calculating; and
  
  correcting, for each phoneme, the speech data whose necessity of correction is determined in the determining, by using waveform data previously stored in a phonemewise-waveform-data storage unit.
  - 15. The computer-readable recording medium according to claim 14, the speech enhancing program further causing the computer to execute:
    - determining a separation of voiced/unvoiced of the speech data and outputting voiced/unvoiced boundary data as the phoneme boundary data, wherein the calculating calculates the waveform feature quantity of the speech data for each phoneme, the speech data being input along with the voiced/unvoiced boundary data output from the outputting.
  - 16. The computer-readable recording medium according to claim 14, the speech enhancing program further causing the computer to execute:
    - assigning phoneme identification data to the speech data based on the input speech data and a phoneme string that is output by carrying out a language process on text data of the speech data, determining boundaries of the phoneme identification data, and outputting boundary data of the phoneme identification data as the phoneme boundary data, wherein the calculating calculates the waveform feature quantity of the speech data for each phoneme, the speech data being input along with the boundary data of the phoneme identification data output from the outputting.
  - 17. The computer-readable recording medium according to claim 15, wherein the calculating includes: splitting the input speech data into the phonemes based on the phoneme boundary data; measuring amplitude values, amplitude variation rates, and existence or absence of periodic waveforms of the phonemes, based on the phonemes that are split in splitting; detecting plosive portions and aspirated portions of the phonemes, based on the amplitude values and the amplitude variation rates that are measured in measuring and the speech data that is split in splitting; classifying phoneme types of the phonemes, based on a detection result in detecting, and the amplitude values, the amplitude variation rates, and existence or absence of the periodic waveforms that are measured in measuring; and calculating the feature quantity for each of the phonemes that are classified in classifying.
  - 18. The computer-readable recording medium according to claim 16, wherein the calculating includes: splitting the input speech data into the phonemes based on the phoneme boundary data; measuring amplitude values, amplitude variation rates, and existence or absence of periodic waveforms of the phonemes, based on the phonemes that are split in splitting; detecting plosive portions and aspirated portions of the phonemes, based on the amplitude values and the amplitude variation rates that are measured in measuring and the speech data that is split in splitting; classifying phoneme types of the phonemes, based on a detection result in detecting, and the amplitude values, the amplitude variation rates, and existence or absence of the periodic waveforms that are measured in measuring; and calculating the feature quantity for each of the phonemes that are classified in classifying.
  - 19. The computer-readable recording medium according to claim 17, wherein the calculating calculates, as the feature quantity, at least one of the amplitude values, the amplitude variation rates, and existence or absence of the periodic waveforms that are measured in measuring, existence or absence of the plosive portions of the phonemes, lengths of the plosive portions, existence or absence of the aspirated portions that continue after the plosive portions, and lengths of the aspirated portions that are detected in detecting, and the phoneme types of the phonemes before and after the phonemes that are classified in classifying.
  - 20. The computer-readable recording medium according to claim 18, wherein the calculating calculates, as the feature quantity, at least one of the amplitude values, the amplitude variation rates, and existence or absence of the periodic waveforms that are measured in measuring, existence or absence of the plosive portions of the phonemes, lengths of the plosive portions, existence or absence of the aspirated portions that continue after the plosive portions, and lengths of the aspirated portions that are detected in detecting, and the phoneme types of the phonemes before and after the phonemes that are classified in classifying.
  - 21. The computer-readable recording medium according to claim 17, wherein the determining determines whether correction of the speech data is necessitated for each phoneme according to the phoneme types that are classified in classifying.
  - 22. The computer-readable recording medium according to claim 18, wherein the determining determines whether correction of the speech data is necessitated for each phoneme according to the phoneme types that are classified in classifying.
  - 23. The computer-readable recording medium according to claim 17, wherein the calculating further includes detecting a difference of pronounced/silent and a difference of voiced/unvoiced in the phonemes before and after the phonemes that are split in splitting, and wherein the determining determines the necessity of correction of the speech data for each phoneme, based on a detection result from detecting along with the waveform feature quantity that is calculated in calculating.
  - 24. The computer-readable recording medium according to claim 18, wherein the calculating further includes detecting a difference of pronounced/silent and a difference of voiced/unvoiced in the phonemes before and after the phonemes that are split in splitting, and wherein the determining determines the necessity of correction of the speech data for each phoneme, based on a detection result from detecting along with the waveform feature quantity that is calculated in calculating.
  - 25. The computer-readable recording medium according to claim 14, the speech enhancing program further causing the computer to execute:
    - synthesizing the input speech data with the speech data of each phoneme that is corrected in the correcting, and outputting the synthesized speech data, based on the phoneme boundary data and a determination result from the determining.
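
Claims 2 and 15 require a voiced/unvoiced separation that produces boundary data, without specifying the detector. The sketch below swaps in a common textbook heuristic, frame energy combined with zero-crossing rate, as an assumption; the function names, the frame length, and both thresholds are illustrative, not the patent's.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / max(len(frame) - 1, 1)

def is_voiced(frame, energy_min=0.01, zcr_max=0.25):
    """Voiced if the frame is loud enough and has few zero crossings (hypothetical thresholds)."""
    energy = sum(x * x for x in frame) / len(frame)
    return energy >= energy_min and zero_crossing_rate(frame) <= zcr_max

def voicing_boundaries(samples, frame_len):
    """Sample indices where the frame-level voiced/unvoiced decision changes."""
    flags = [
        is_voiced(samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]
    return [i * frame_len for i in range(1, len(flags)) if flags[i] != flags[i - 1]]
```

Voiced frames tend to combine high energy with few zero crossings, while unvoiced frication is noise-like with many crossings; silence fails the energy test and is therefore treated as unvoiced here. The resolution of the boundaries is one frame.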

26. A computer-readable recording medium that stores therein a speech recording program that causes a computer to record input speech data in a phonemewise-waveform-data storage unit, the speech recording program causing the computer to execute:
- assigning phoneme identification data to the speech data, based on the input speech data and a phoneme string that is output by carrying out a language process on text data of the speech data, determining boundaries of the phoneme identification data, and outputting boundary data of the phoneme identification data as the phoneme boundary data;
  
  calculating a waveform feature quantity of the speech data for each phoneme, the speech data being input along with the boundary data of the phoneme identification data output from the outputting;
  
  determining whether the speech data satisfies predetermined conditions for each phoneme, based on the waveform feature quantity calculated in calculating; and
  
  recording, in the phonemewise-waveform-data storage unit, the speech data of each phoneme that is determined to satisfy the predetermined conditions, based on a determination in the determining.

27. A speech enhancing method that corrects and outputs unclear portions of input speech data, the speech enhancing method comprising:
- calculating a waveform feature quantity of the speech data for each phoneme, the speech data being input along with phoneme boundary data that splits the speech data into phonemes;
  
  determining a necessity of correction of the speech data for each phoneme, based on the waveform feature quantity calculated in calculating; and
  
  correcting, for each phoneme, the speech data whose necessity of correction is determined in the determining, by using waveform data previously stored in a phonemewise-waveform-data storage unit.
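
Claim 27, like claims 1, 12, and 14, combines a per-phoneme correction decision with replacement from stored waveforms and reassembly of the output. A minimal sketch, assuming phoneme types and features have already been computed and that the store maps a phoneme label to a list of recorded waveforms. The `needs_correction` rules and thresholds are hypothetical, and the correction simply substitutes the first stored example; a practical system would likely also match length and level and smooth the joins, details the claims do not give.

```python
def needs_correction(ptype, feat, min_aspiration_ms=10.0, min_rms=0.02):
    """Hypothetical decision rule for whether a phoneme should be corrected."""
    if ptype == "unvoiced plosive":
        # Missing burst, or aspiration too short to be heard clearly.
        return (not feat["has_plosive"]) or feat["aspiration_ms"] < min_aspiration_ms
    if ptype == "voiced plosive":
        return not feat["has_plosive"]
    if ptype in ("unvoiced fricative", "voiced fricative"):
        # Frication too weak.
        return feat["rms"] < min_rms
    return False

def enhance(segments, labels, types, features, store):
    """Rebuild the output, substituting a stored waveform for each phoneme judged defective."""
    out = []
    for seg, label, ptype, feat in zip(segments, labels, types, features):
        stored = store.get(label)
        if stored and needs_correction(ptype, feat):
            out.extend(stored[0])  # naive: first stored example, no length or level matching
        else:
            out.extend(seg)        # keep the original phoneme unchanged
    return out
```

Phonemes with no stored example are passed through unchanged even if they are judged defective, since there is nothing to substitute.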

28. A speech recording method that corrects and outputs unclear portions of input speech data, the speech recording method comprising:
- assigning phoneme identification data to the speech data, based on the input speech data and a phoneme string that is output by carrying out a language process on text data of the speech data, determining boundaries of the phoneme identification data, and outputting boundary data of the phoneme identification data as the phoneme boundary data;
  
  calculating a waveform feature quantity of the speech data for each phoneme, the speech data being input along with the boundary data of the phoneme identification data output from the outputting;
  
  determining whether the speech data satisfies predetermined conditions for each phoneme, based on the waveform feature quantity calculated in calculating; and
  
  recording, in the phonemewise-waveform-data storage unit, the speech data of each phoneme that is determined to satisfy the predetermined conditions, based on a determination in the determining.

Current Assignee
Fujitsu Limited
Original Assignee
Fujitsu Limited
Inventors
Matsumoto, Chikako

Granted Patent

US 8,190,432 B2
US Class Current

704/254
CPC Class Codes

G10L 2021/0575 Aids for the handicapped in...

G10L 21/0364 for improving intelligibility
