Voice synthesizing system, segment generation apparatus for generating segments for voice synthesis, voice synthesizing method and storage medium storing program therefor

US 7,089,187 B2
Filed: 09/26/2002
Issued: 08/08/2006
Est. Priority Date: 09/27/2001
Status: Expired due to Fees

Abstract

A voice synthesizing system keeps both the required amount of calculation and the required file size small. The system includes a compressed pitch segment database storing compressed voice waveform segments; a pitch developing portion that, when a voice waveform segment necessary for voice waveform synthesis is demanded, reads out the compressed data of that segment from the database and decompresses it to reproduce the original voice waveform segment; and a cache processing portion that temporarily stores voice waveform segments already used in voice waveform synthesis. When a voice waveform segment is demanded, the cache processing portion returns it to the demander if it is already stored; otherwise it obtains the segment from the database via the pitch developing portion, holds it, and returns it to the demander.
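
The cache-then-decompress flow described in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration, not taken from the patent: the class and function names are hypothetical, `zlib` stands in for whatever codec is actually used, segments are assumed to be 16-bit PCM samples, and least-recently-used eviction is an assumed policy (the abstract only says segments are stored temporarily).

```python
import zlib
from array import array
from collections import OrderedDict


class CompressedPitchSegmentDatabase:
    """Hypothetical store: segment id -> compressed 16-bit PCM samples."""

    def __init__(self):
        self._blobs = {}

    def register(self, segment_id, samples):
        # zlib is only a stand-in codec; the source does not name one.
        self._blobs[segment_id] = zlib.compress(array("h", samples).tobytes())

    def read_compressed(self, segment_id):
        return self._blobs[segment_id]


def develop_pitch(compressed):
    """Pitch developing step: decompress one segment back to samples."""
    samples = array("h")
    samples.frombytes(zlib.decompress(compressed))
    return samples.tolist()


class SegmentCache:
    """Cache processing step with an assumed LRU eviction policy."""

    def __init__(self, database, capacity=256):
        self._db = database
        self._capacity = capacity
        self._held = OrderedDict()
        self.misses = 0  # counts how often decompression was needed

    def get(self, segment_id):
        if segment_id in self._held:
            # Already stored: return it without touching the database.
            self._held.move_to_end(segment_id)
            return self._held[segment_id]
        # Not stored: read, decompress, hold, then return.
        self.misses += 1
        samples = develop_pitch(self._db.read_compressed(segment_id))
        self._held[segment_id] = samples
        if len(self._held) > self._capacity:
            self._held.popitem(last=False)  # drop the least recently used
        return samples
```

In this sketch a repeated request for the same segment costs one dictionary lookup, and decompression runs only on a miss, which is the trade the abstract describes between calculation amount and file size.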

37 Claims

1. A voice synthesizing system synthesizing a predetermined voice waveform by overlaying a plurality of voice waveform segments in a waveform concatenation method, comprising:
- a compressed pitch segment database storing respective voice waveform segments compressed per pitch unit;
  
  a pitch developing portion reading out compressed data of the voice waveform segment from said compressed pitch segment database and decompressing the read out compressed data for reproducing an original voice waveform segment when the voice waveform segment necessary for voice waveform synthesis is demanded;
  
  a cache processing portion temporarily storing the voice waveform segment already used in voice waveform synthesis, and when voice waveform segment necessary for voice waveform synthesis is demanded, returning demanded voice waveform segment to a demander when demanded voice waveform segment is already stored, and obtaining the voice waveform segment from said compressed pitch segment database via said pitch developing portion to hold the obtained voice waveform segment and conjunction therewith to return to the demander when demanded voice waveform segment is not stored.
- - 2. A voice synthesizing system as set forth in claim 1, which further comprises:
    - a continuity table respectively storing number of sequential voice waveform segment and amplitude multiplying factors per voice waveform segment with respect to a representative voice waveform segment when a plurality of sequential voice waveform segments can be replaced with one representative voice waveform segment; and
      
      a pitch index converting portion obtaining the voice waveform segment from said cache processing portion with reference to said continuity table and returns the voice waveform segment to the demander with amplification thereof by a value of said amplitude multiplying factor when the voice waveform segment necessary for voice waveform synthesis is demanded, said compressed pitch segment database stores said representative voice waveform segments and the voice waveform segments which cannot be replaced with said representative voice waveform segment.
  - 3. A voice synthesizing system as set forth in claim 1, which comprises:
    - a pitch index table storing amplitude multiplying factor per voice waveform segment with respect to said representative voice waveform segment and number of samples for shifting voice waveform segment in time direction when a plurality of voice waveform segments can be replaced with one representative voice waveform segment; and
      
      a pitch index converting portion obtaining the voice waveform segment from said cache processing portion with reference to said pitch index table, amplifying the voice waveform segments by a value of said amplitude multiplying factor, and returning the voice waveform segments to the demander with shifting the voice waveform segment in time direction with said number of samples, when the voice waveform segment necessary for voice waveform synthesis is demanded, said compressed pitch segment database stores said representative voice waveform segments and the voice waveform segments which cannot be replaced with said representative voice waveform segment.
  - 4. A voice synthesizing system as set forth in claim 1, which comprises:
    - a continuity table respectively storing number of sequential voice waveform segment and amplitude multiplying factors per voice waveform segment with respect to a representative voice waveform segment when a plurality of sequential voice waveform segments can be replaced with one representative voice waveform segment; and
      
      a pitch index table storing amplitude multiplying factor per voice waveform segment with respect to said representative voice waveform segment and number of samples for shifting voice waveform segment in time direction when a plurality of voice waveform segments can be replaced with one representative voice waveform segment; and
      
      a pitch index converting portion obtaining the voice waveform segment from said cache processing portion with reference to one of said continuity table and said pitch index table, amplifying the voice waveform segments at least by a value of said amplitude multiplying factor, and returning the voice waveform segments to the demander with shifting the voice waveform segment in time direction with said number of samples, when the voice waveform segment necessary for voice waveform synthesis is demanded, said compressed pitch segment database stores said representative voice waveform segments and the voice waveform segments which cannot be replaced with said representative voice waveform segment.
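
As a rough illustration of the pitch index converting portion in claims 3 and 4, the sketch below looks up a demanded segment in a pitch index table, fetches the representative segment, multiplies it by the amplitude factor, and shifts it in time. The names `PitchIndexEntry`, `shift_samples`, and `convert_pitch_index` are hypothetical, `fetch` stands for any callable that returns a decoded segment (for example a cache lookup), and zero-padding at the edges while keeping the segment length is an assumption the claims do not specify. A continuity table lookup would look the same with the shift fixed at zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PitchIndexEntry:
    representative_id: int  # segment actually stored in the database
    gain: float             # amplitude multiplying factor
    shift: int              # samples to shift in the time direction


def shift_samples(samples, shift):
    """Shift by `shift` samples (positive = later), zero-padding, same length."""
    if shift >= 0:
        return ([0.0] * shift + list(samples))[:len(samples)]
    padded = list(samples) + [0.0] * -shift
    return padded[-shift:]


def convert_pitch_index(segment_id, pitch_index_table, fetch):
    """Return the demanded segment, rebuilt from its representative if needed."""
    entry = pitch_index_table.get(segment_id)
    if entry is None:
        # Segment could not be replaced, so it is stored as itself.
        return list(fetch(segment_id))
    representative = fetch(entry.representative_id)
    scaled = [entry.gain * s for s in representative]
    return shift_samples(scaled, entry.shift)
```

With a table entry `PitchIndexEntry(10, 0.5, 1)` for segment 11, a request for segment 11 returns segment 10 at half amplitude, delayed by one sample.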

5. A voice waveform segment generating apparatus for voice synthesis extracting a plurality of voice waveform segments from a voice waveform of an original human speech and generating information for selecting voice waveform segment necessary for voice synthesis among extracted voice waveform segments, comprising:
- a sequential representative pitch segment determining portion selecting a range where voice waveform segments are regarded as the same voice waveform segment in a sequential zone and selecting representative voice waveform segment among voice waveform segments in said range;
  
  a pitch segment registering portion storing said representative waveform segment and the voice waveform segments out of said range in a database in compressed form; and
  
  a continuity table generating portion calculating number of sequential voice waveform segments in said range and amplitude multiplying factor per voice waveform segment with respect to said voice waveform segment and storing in a storage device in a form of table.
- - 6. A voice waveform segment generating apparatus as set forth in claim 5, wherein said sequential representative pitch segment determining portion sets the voice waveform segments contained in said range in number less than a predetermined number.
  - 7. A voice synthesizing segment generating apparatus as set forth in claim 6, which further comprises a class discriminating portion dividing the voice waveform segments including result of selection by said continuous representative pitch segment determining portion into a preliminarily set plurality of classes using a phoneme, in which the voice waveform segment belongs, a preceding phoneme immediately preceding to said phoneme, in which the voice waveform segment belongs, and a following phoneme immediately following to said phoneme, in which the voice waveform segment belongs, and said representative pitch segment determining portion selects set of the voice waveform segment regarded as the same voice waveform segment per said class.
  - 8. A voice synthesizing segment generating apparatus as set forth in claim 6, wherein said representative pitch segment determining portion selects representative voice waveform segments of the immediately preceding and immediately following sets and the voice waveform segments sequential in time when the representative voice waveform segment is selected among the voice waveform segments in said set.
  - 9. A voice synthesizing segment generating apparatus as set forth in claim 6, which further comprises a phase replacing portion performing predetermined phase replacement for the phoneme and the voice waveform segments preliminarily determined depending upon phonemic environment.
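
One way to picture the continuity table generation of claims 5 and 6 is the sketch below. It scans segments in order, groups consecutive segments that are "regarded as the same" into a run, and records one amplitude multiplying factor per segment relative to the run's representative. The claims do not say how sameness is judged or which segment becomes the representative, so this sketch assumes cosine similarity against a threshold and takes the first segment of each run as representative. The function names and the `threshold` and `run_limit` parameters are hypothetical.

```python
import math


def gain_and_similarity(segment, representative):
    """Least-squares gain g (segment ~ g * representative) and cosine similarity."""
    dot = sum(a * b for a, b in zip(segment, representative))
    rep_energy = sum(b * b for b in representative)
    seg_energy = sum(a * a for a in segment)
    if rep_energy == 0.0 or seg_energy == 0.0:
        return 0.0, 0.0
    return dot / rep_energy, dot / math.sqrt(rep_energy * seg_energy)


def build_continuity_table(segments, threshold=0.95, run_limit=8):
    """Return (representative_index, gains) per run; run length stays below run_limit."""
    table = []
    i = 0
    while i < len(segments):
        gains = [1.0]  # the representative maps onto itself
        j = i + 1
        while j < len(segments) and len(gains) + 1 < run_limit:
            if len(segments[j]) != len(segments[i]):
                break  # assumed: only equal-length segments are compared
            gain, similarity = gain_and_similarity(segments[j], segments[i])
            if similarity < threshold:
                break
            gains.append(gain)
            j += 1
        table.append((i, gains))
        i = j
    return table
```

Only the representative of each run would then need to be compressed and stored; the other segments in the run are recovered by scaling it with the recorded gains.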

10. A voice waveform segment generating apparatus for voice synthesis extracting a plurality of voice waveform segments from a voice waveform of an original human speech and generating information for selecting voice waveform segment necessary for voice synthesis among extracted voice waveform segments, comprising:
- a representative pitch segment determining portion selecting a set of voice waveform segments which can be regarded as the same voice waveform and selecting representative voice waveform segment among voice waveform segments in said set;
  
  a pitch segment registering portion storing said representative waveform segment and the voice waveform segments out of said set in a database in compressed form; and
  
  a pitch index table generating portion calculating amplitude multiplying factor per each voice waveform segment in said set with respect to said representative voice waveform segments and number of samples for shifting the voice waveform segment in time direction, and storing in a storage device in a form of table.
- - 11. A voice waveform segment generating apparatus as set forth in claim 10, wherein said representative pitch segment determining portion sets the voice waveform segments contained in said sets in number less than a predetermined number.
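
The pitch index table of claim 10 pairs each segment in a set with an amplitude multiplying factor and a number of samples to shift, both relative to the representative. The claims do not state how these values are computed, so the following is only a sketch under assumptions: it searches a small range of lags for the one maximizing cross-correlation and derives the gain from that correlation divided by the representative's energy (a simplification that ignores samples shifted out of the window). `best_shift_and_gain`, `build_pitch_index_table`, and `max_shift` are hypothetical names.

```python
def best_shift_and_gain(segment, representative, max_shift=4):
    """Find (shift, gain) so that segment[n] ~ gain * representative[n - shift]."""
    rep_energy = sum(r * r for r in representative)
    if rep_energy == 0.0:
        return 0, 0.0
    best_shift, best_dot = 0, float("-inf")
    for shift in range(-max_shift, max_shift + 1):
        dot = 0.0
        for n, value in enumerate(segment):
            m = n - shift
            if 0 <= m < len(representative):
                dot += value * representative[m]
        if dot > best_dot:  # keep the lag with the largest correlation
            best_shift, best_dot = shift, dot
    return best_shift, best_dot / rep_energy


def build_pitch_index_table(segment_set, representative_id, max_shift=4):
    """Map every segment id in the set to its (shift, gain) pair."""
    representative = segment_set[representative_id]
    return {
        segment_id: best_shift_and_gain(samples, representative, max_shift)
        for segment_id, samples in segment_set.items()
    }
```

For a segment that is the representative delayed by one sample at half amplitude, this search yields a shift of 1 and a gain of 0.5, and the representative itself maps to shift 0 and gain 1.0.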

12. A voice waveform segment generating apparatus for voice synthesis extracting a plurality of voice waveform segments from a voice waveform of an original human speech and generating information for selecting voice waveform segment necessary for voice synthesis among extracted voice waveform segments, comprising:
- a sequential representative pitch segment determining portion selecting a range where voice waveform segments are regarded as the same voice waveform segment in a sequential zone and selecting representative voice waveform segment among voice waveform segments in said range;
  
  a representative pitch segment determining portion selecting a set of voice waveform segments which can be regarded as the same voice waveform with respect to the result of selection by said sequential representative pitch segment determining portion and selecting representative voice waveform segment among voice waveform segments in said set;
  
  a pitch segment registering portion storing said representative waveform segment and the voice waveform segments out of said set in a database in compressed form;
  
  a continuity table generating portion calculating number of voice waveform segments in said range and amplitude multiplying factor per voice waveform segment with respect to said voice waveform segment and storing in a storage device in a form of table; and
  
  a pitch index table generating portion calculating amplitude multiplying factor per each voice waveform segment in said set with respect to said representative voice waveform segments and number of samples for shifting the voice waveform segment in time direction, and storing in a storage device in a form of table.
- - 13. A voice synthesizing segment generating apparatus as set forth in claim 12, wherein said sequential representative pitch segment determining portion sets the voice waveform segments contained in said range in number less than a predetermined number, and said representative pitch segment determining portion sets the voice waveform segments contained in said sets in number less than a predetermined number.

14. A voice synthesizing method for synthesizing a desired voice waveform by overlaying a plurality of voice waveform segments in waveform concatenation method, comprising the steps of:
- preliminarily storing compressed voice waveform segments in a database;
  
  returning the voice waveform segment to a demander when the voice waveform segment necessary for voice waveform synthesis is demanded and if the demanded voice waveform segment is already stored in a cache memory;
  
  reading out the compressed data of the voice waveform segment from said database storing the compressed data of the voice waveform segments and reproducing an original voice waveform segment by decompressing the read out compressed data if the demanded voice waveform segment is not stored in a cache memory; and
  
  storing the reproduced voice waveform segment in said cache memory and returning to said demander.
- - 15. A voice synthesizing method as set forth in claim 14, which comprises the steps of:
    - preliminarily storing number of sequential voice waveform segment and amplitude multiplying factor per each voice waveform segment in said storage device with respect to said representative voice waveform segment when a plurality of sequential voice waveform segments can be replaced with one representative voice waveform segment;
      
      obtaining the voice waveform segment from said cache memory when the voice waveform segment necessary for voice waveform synthesis is demanded; and
      
      returning the voice waveform segment to the demander with amplification by a value of said amplitude multiplying factor.
  - 16. A voice synthesizing method as set forth in claim 14, which comprises the steps of:
    - preliminarily storing amplitude multiplying factor per each voice waveform segment with respect to said representative voice waveform segment and number of samples for shifting the voice waveform segments in time direction in said storage device when a plurality of voice waveform segments can be replaced with one representative voice waveform segment;
      
      obtaining the voice waveform segment from said cache memory when the voice waveform segment necessary for voice waveform synthesis is demanded; and
      
      returning the voice waveform segment to the demander with amplification by a value of said amplitude multiplying factor and shifting the voice waveform segment by said sample number.

17. A voice synthesizing segment generating method extracting a plurality of voice waveform segments from an originally spoken human speech and generating information for selecting the voice waveform segment necessary for voice synthesis from the extracted voice waveform segment, comprising the steps of:
- selecting range, in which the voice waveform segments are regarded as the same within a sequential zone among all of voice waveform segments consisting the original speech, and selecting a representative voice waveform segment from the voice waveform segment within said range;
  
  storing said representative voice waveform segments and said voice waveform segment other than said range in a database in compressed form; and
  
  calculating number of sequential voice waveform segments within said range and amplitude multiplying factor per each waveform segment with respect to said representative voice waveform segment and storing in a storage device in a form of table.
- - 18. A voice synthesizing segment generating method as set forth in claim 17, wherein number of the voice waveform segments contained in said range is less than a predetermined number.

19. A voice synthesizing segment generating method extracting a plurality of voice waveform segments from an originally spoken human speech and generating information for selecting the voice waveform segment necessary for voice synthesis from the extracted voice waveform segment, comprising the steps of:
- selecting set of the voice waveform segments regarded as the same among all of voice waveform segments consisting the original speech, and selecting a representative voice waveform segment from the voice waveform segment within said set;
  
  storing said representative voice waveform segments and said voice waveform segment other than said set in a database in compressed form; and
  
  calculating amplitude multiplying factor per each waveform segment with respect to said representative voice waveform segment and number of samples for shifting the voice wave form in a time direction, in said set and storing in a storage device in a form of table.
- - 20. A voice synthesizing segment generating method as set forth in claim 19, wherein number of the voice waveform segments contained in said set is less than a predetermined number.
  - 21. A voice synthesizing segment generating method as set forth in claim 19, which further comprises steps of dividing the voice waveform segments including result of selection by said continuous representative pitch segment determining portion into a preliminarily set plurality of classes using a phoneme, in which the voice waveform segment belongs, a preceding phoneme immediately preceding to said phoneme, in which the voice waveform segment belongs, and a following phoneme immediately following to said phoneme, in which the voice waveform segment belongs, and selecting set of the voice waveform segment regarded as the same voice waveform segment per said class.
  - 22. A voice synthesizing segment generating method as set forth in claim 19, wherein representative voice waveform segments of the immediately preceding and immediately following sets and the voice waveform segments sequential in time are selected when the representative voice waveform segment is selected among the voice waveform segments in said set.
  - 23. A voice synthesizing segment generating method as set forth in claim 19, which further comprises a step of performing predetermined phase replacement for the phoneme and the voice waveform segments preliminarily determined depending upon phonemic environment.

24. A voice synthesizing segment generating method extracting a plurality of voice waveform segments from an originally spoken human speech and generating information for selecting the voice waveform segment necessary for voice synthesis from the extracted voice waveform segment, comprising the steps of:
- selecting range, in which the voice waveform segments are regarded as the same within a sequential zone among all of voice waveform segments consisting the original speech, and selecting a representative voice waveform segment from the voice waveform segment within said range;
  
  with respect to the result of selection, selecting set of the voice waveform segments regarded as the same voice waveform segment, and selecting a representative voice waveform segment from the voice waveform segment within said set;
  
  storing said representative voice waveform segments in said set and said voice waveform segment other than said set in a database in compressed form;
  
  calculating number of sequential voice waveform segments within said range and amplitude multiplying factor per each waveform segment with respect to said representative voice waveform segment and storing in a storage device in a form of table; and
  
  calculating amplitude multiplying factor per each waveform segment in said set with respect to said representative voice waveform segment and number of samples for shifting the voice wave form in a time direction, in said set and storing in a storage device in a form of table.
- - 25. A voice synthesizing segment generating method as set forth in claim 24, wherein number of the voice waveform segments contained in said range is less than a predetermined number, and number of the voice waveform segments contained in said set is less than a predetermined number.

26. A storage medium recording a program for synthesizing a desired voice waveform by overlaying a plurality of voice waveform segments in waveform concatenation method, said program comprising the steps of:
- preliminarily storing compressed voice waveform segments in a database;
  
  returning the voice waveform segment to a demander when the voice waveform segment necessary for voice waveform synthesis is demanded and if the demanded voice waveform segment is already stored in a cache memory;
  
  reading out the compressed data of the voice waveform segment from said database storing the compressed data of the voice waveform segments and reproducing an original voice waveform segment by decompressing the read out compressed data if the demanded voice waveform segment is not stored in a cache memory; and
  
  storing the reproduced voice waveform segment in said cache memory and returning to said demander.
- - 27. A storage medium as set forth in claim 26, wherein said program further comprises the steps of:
    - storing number of sequential voice waveform segment and amplitude multiplying factor per each voice waveform segment with respect to said representative voice waveform segment in a storage device when a plurality of sequential voice waveform segments can be replaced preliminarily with one representative voice waveform segment;
      
      obtaining the voice waveform segment from said cache memory when the voice waveform segment necessary for voice waveform synthesis is demanded; and
      
      returning the voice waveform segment to the demander with amplification by a value of said amplitude multiplying factor.
  - 28. A storage medium as set forth in claim 26, wherein said program further comprises the steps of:
    - storing amplitude multiplying factor per each voice waveform segment with respect to said representative voice waveform segment and number of samples for shifting the voice waveform segments in time direction in a storage device when a plurality of voice waveform segments can be replaced preliminarily with one representative voice waveform segment;
      
      obtaining the voice waveform segment from said cache memory when the voice waveform segment necessary for voice waveform synthesis is demanded; and
      
      returning the voice waveform segment to the demander with amplification by a value of said amplitude multiplying factor and shifting the voice waveform segment by said sample number.

29. A storage medium recording a program extracting a plurality of voice waveform segments from an originally spoken human speech and generating information for selecting the voice waveform segment necessary for voice synthesis from the extracted voice waveform segment, said program comprising the steps of:
- selecting range, in which the voice waveform segments are regarded as the same within a sequential zone among all of voice waveform segments consisting the original speech, and selecting a representative voice waveform segment from the voice waveform segment within said range;
  
  storing said representative voice waveform segments and said voice waveform segment other than said range in a database in compressed form; and
  
  calculating number of sequential voice waveform segments within said range and amplitude multiplying factor per each waveform segment with respect to said representative voice waveform segment and storing in a storage device in a form of table.
- - 30. A storage medium as set forth in claim 29, wherein number of the voice waveform segments contained in said range is less than a predetermined number.
  - 31. A storage medium as set forth in claim 30, wherein number of the voice waveform segments contained in said set is less than a predetermined number.

32. A storage medium recording a program extracting a plurality of voice waveform segments from an originally spoken human speech and generating information for selecting the voice waveform segment necessary for voice synthesis from the extracted voice waveform segment, said program comprising the steps of:
- selecting set of the voice waveform segments regarded as the same among all of voice waveform segments consisting the original speech, and selecting a representative voice waveform segment from the voice waveform segment within said set;
  
  storing said representative voice waveform segments and said voice waveform segment other than said set in a database in compressed form; and
  
  calculating amplitude multiplying factor per each waveform segment with respect to said representative voice waveform segment and number of samples for shifting the voice wave form in a time direction, in said set and storing in a storage device in a form of table.
- - 33. A storage segment as set forth in claim 32, wherein said program further comprises steps of dividing the voice waveform segments including result of selection by said continuous representative pitch segment determining portion into a preliminarily set plurality of classes using a phoneme, in which the voice waveform segment belongs, a preceding phoneme immediately preceding to said phoneme, in which the voice waveform segment belongs, and a following phoneme immediately following to said phoneme, in which the voice waveform segment belongs, and selecting set of the voice waveform segment regarded as the same voice waveform segment per said class.
  - 34. A storage medium as set forth in claim 32, wherein representative voice waveform segments of the immediately preceding and immediately following sets and the voice waveform segments sequential in time are selected when the representative voice waveform segment is selected among the voice waveform segments in said set.
  - 35. A storage program as set forth in claim 32, wherein said program further comprises a step of performing predetermined phase replacement for the phoneme and the voice waveform segments preliminarily determined depending upon phonemic environment.

36. A storage medium recording a program extracting a plurality of voice waveform segments from an originally spoken human speech and generating information for selecting the voice waveform segment necessary for voice synthesis from the extracted voice waveform segment, said program comprising the steps of:
- selecting range, in which the voice waveform segments are regarded as the same within a sequential zone among all of voice waveform segments consisting the original speech, and selecting a representative voice waveform segment from the voice waveform segment within said range;
  
  with respect to the result of selection, selecting set of the voice waveform segments regarded as the same voice waveform segment, and selecting a representative voice waveform segment from the voice waveform segment within said set;
  
  storing said representative voice waveform segments and said voice waveform segment other than said set in a database in compressed form;
  
  calculating number of the voice waveform segments within said range and amplitude multiplying factor per each waveform segment with respect to said representative voice waveform segment and storing in a storage device in a form of table; and
  
  calculating amplitude multiplying factor per each waveform segment within said set with respect to said representative voice waveform segment and number of samples for shifting the voice wave form in a time direction and storing in a storage device in a form of table.
- - 37. A storage medium as set forth in claim 36, wherein number of the voice waveform segments contained in said range is less than a predetermined number, and number of the voice waveform segments contained in said set is less than a predetermined number.

Current Assignee: NEC Corporation
Original Assignee: NEC Corporation
Inventors: Kondo, Reishi; Hattori, Hiroaki
Primary Examiner(s): Knepper, David D.
Application Number: US10/254,666
Publication Number: US 20030061051A1
Time in Patent Office: 1,412 Days
Field of Search: None
US Class Current: 704/267
CPC Class Codes:
G10L 13/04 Details of speech synthesis...
G10L 13/06 Elementary speech units use...