Synthesising speech by converting phonemes to digital waveforms

US 6,502,074 B1
Filed: 10/02/1997
Issued: 12/31/2002
Est. Priority Date: 08/04/1993
Status: Expired due to Term

1 Assignment

0 Petitions

Abstract

This invention relates to the generation of synthetic speech, and specifically to the production of a digital waveform from a text in phonemes. The invention uses a linked database which comprises an extended text in phonemes and its equivalent in the form of a digital waveform. The two portions of the database are linked by a parameter which establishes equivalent points in both the phoneme text and the digital waveform. The input text (in phonemes) is analyzed to locate matching portions in the phoneme portion of the database. This matching utilises exact equivalence of phonemes where possible; otherwise, relationships between phonemes are utilised. The selection process identifies input phonemes in context, which improves the conversion. Once the input text has been analyzed into strings that match the phoneme (input) portion of the database, beginning and ending parameters are established for each section. The output is produced by abutting the sections of the digital waveform defined by these beginning and ending parameters.
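The linked database described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: it assumes the location parameter is a phoneme index into the extended text, and that a hypothetical `boundaries` table maps each index to a sample offset in the extended waveform, so that a (begin, end) pair selects one contiguous portion. The names `LinkedDatabase`, `portion` and `synthesise` are invented for this sketch.

```python
from dataclasses import dataclass


@dataclass
class LinkedDatabase:
    """Hypothetical linked database: an extended phoneme text plus its waveform.

    Assumption: boundaries[i] is the sample offset at which phoneme i starts,
    and boundaries[-1] is the end of the waveform, so
    len(boundaries) == len(phonemes) + 1.
    """

    phonemes: list[str]
    samples: list[int]
    boundaries: list[int]

    def portion(self, begin: int, end: int) -> list[int]:
        """Return the waveform for phonemes[begin:end] (one located portion)."""
        if not 0 <= begin < end <= len(self.phonemes):
            raise ValueError("invalid location parameters")
        return self.samples[self.boundaries[begin]:self.boundaries[end]]


def synthesise(db: LinkedDatabase, spans: list[tuple[int, int]]) -> list[int]:
    """Abut the portions defined by (begin, end) pairs, keeping the input order."""
    output: list[int] = []
    for begin, end in spans:
        output.extend(db.portion(begin, end))
    return output


db = LinkedDatabase(["h", "e", "l", "o"], list(range(10)), [0, 2, 5, 7, 10])
print(synthesise(db, [(2, 4), (0, 1)]))  # [5, 6, 7, 8, 9, 0, 1]
```

Here the pair (2, 4) selects samples 5 to 9 and (0, 1) selects samples 0 and 1; the portions are joined in the order the spans are given, not in database order.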

13 Claims

1. A method of converting an input signal representing a text in phonemes into an output digital waveform signal convertible into an acoustic waveform corresponding to said text, wherein said method comprises:
- (a) dividing said input signal into input segments, each of which is stored in an access section of a linked database;
  
  (b) for each input segment identified in step (a), retrieving an output segment of said digital waveform from an output section of the database, said output segment being that which is linked to the input segment; and
  
  (c) joining the digital output segments retrieved in step (b), said output segments being kept in the same order as the respectively associated input segments whereby the resulting output digital signal is a waveform corresponding to the input signal waveform;
  
  the output section of the database containing an extended digital waveform containing plural contextual occurrences of each of plural phonemes in extended speech representing signals of the phonemes to be converted and having a location parameter for identifying any point therein whereby the establishment of beginning and ending location parameters defines a portion of said extended digital waveform;
  
  step (a) including establishing beginning and ending location parameters for segments of the input signal; and
  
  step (c) including utilizing the parameters established in step (a) for retrieving a portion of stored digital waveform.
- Dependent claims (2, 3, 4):
  - 2. A method according to claim 1, further comprising comparing input windows of the input signal with stored windows contained in the input section of the database to establish a closest match for the input signal.
  - 3. A method according to claim 2, further comprising establishing said window to have a length equivalent to 5 phonemes.
  - 4. A method according to claim 3, in which the input section of the database is organized into three hierarchical levels;
    - namely
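Claims 2 and 3 compare five-phoneme input windows with stored windows to find the closest match, and the abstract says exact phoneme equivalence is used where possible, with relationships between phonemes used otherwise. A sketch of one possible scoring rule follows; the phoneme classes, the 2/1/0 weights and the function names are hypothetical assumptions, not values taken from the patent.

```python
# Hypothetical broad phoneme classes; the patent does not specify these.
PHONEME_CLASS = {
    "p": "stop", "b": "stop", "t": "stop", "d": "stop", "k": "stop", "g": "stop",
    "m": "nasal", "n": "nasal",
    "s": "fricative", "z": "fricative", "f": "fricative", "v": "fricative",
    "a": "vowel", "e": "vowel", "i": "vowel", "o": "vowel", "u": "vowel",
}

WINDOW = 5  # window length in phonemes (claim 3)


def phoneme_score(x: str, y: str) -> int:
    """Assumed weights: 2 for exact equivalence, 1 for the same class, else 0."""
    if x == y:
        return 2
    cx, cy = PHONEME_CLASS.get(x), PHONEME_CLASS.get(y)
    return 1 if cx is not None and cx == cy else 0


def window_score(a: list[str], b: list[str]) -> int:
    """Sum the per-position scores of two equal-length windows."""
    return sum(phoneme_score(x, y) for x, y in zip(a, b))


def closest_window(input_window: list[str], stored_text: list[str]) -> tuple[int, int]:
    """Return (start, score) of the best stored window; the first one wins ties."""
    best_start, best_score = -1, -1
    for start in range(len(stored_text) - WINDOW + 1):
        score = window_score(input_window, stored_text[start:start + WINDOW])
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score
```

With the stored text `s a t i n d o m e k`, the input window `t o m e k` has no exact match, but the window starting at index 5 (`d o m e k`) scores 9 because `t` and `d` share the assumed "stop" class.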

5. A method of converting an input signal into an output signal, wherein:
- (a) said input signal represents a text in phonemes;
  
  (b) said output signal is a digital waveform convertible into an acoustic waveform corresponding to said text;
  
  (c) a database is used having an input section and an output section;
  
  (d) said output section containing an extended digital waveform having a location parameter for identifying any point therein whereby the establishment of beginning and ending location parameters defines a portion of said extended digital waveform;
  
  (e) said input section containing segments of an extended phoneme text corresponding to the extended waveform contained in the output section;
  
  said method comprising the steps of:
  
  (i) dividing said input signal into input segments;
  
  (ii) matching said input segments with segments contained in the input section of the database thereby establishing beginning and ending location parameters;
  
  (iii) retrieving from the output section of said database segments of extended digital waveform corresponding to said beginning and ending location parameters; and
  
  (iv) joining the output segments of digital waveform so retrieved, said segments being kept in the same order as the corresponding input segments.
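Steps (i) to (iv) of claim 5 can be illustrated with a greedy longest-match segmentation. The claim does not say how the input is divided, so the greedy strategy and the name `segment_input` are assumptions made for this sketch; it returns the (begin, end) location parameters that step (iii) would use to retrieve portions of the extended waveform.

```python
def segment_input(input_phonemes: list[str], db_text: list[str]) -> list[tuple[int, int]]:
    """Greedy longest-match division of the input (an assumed strategy).

    Returns (begin, end) location parameters into db_text such that
    concatenating db_text[begin:end] in order reproduces the input.
    """
    spans: list[tuple[int, int]] = []
    i = 0
    while i < len(input_phonemes):
        best = None  # (begin, end) of the longest match found so far
        for start in range(len(db_text)):
            length = 0
            while (i + length < len(input_phonemes)
                   and start + length < len(db_text)
                   and db_text[start + length] == input_phonemes[i + length]):
                length += 1
            if length > 0 and (best is None or length > best[1] - best[0]):
                best = (start, start + length)
        if best is None:
            raise ValueError(f"phoneme {input_phonemes[i]!r} not in database")
        spans.append(best)
        i += best[1] - best[0]  # continue after the matched segment
    return spans


db_text = list("helloworld")
print(segment_input(list("worhel"), db_text))  # [(5, 8), (0, 3)]
```

Preferring the longest stored match keeps segments long, which reduces the number of joins in step (iv); the patent's own selection criteria may differ.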

6. A method of converting an input signal into an output signal, wherein:
- (a) said input signal represents an input text in phonemes;
  
  (b) said output signal is a digital waveform convertible into an acoustic waveform corresponding to said input text;
  
  (c) a database is used having an input section and an output section;
  
  (d) said output section containing an extended digital waveform having a location parameter for identifying any point therein whereby the establishment of beginning and ending location parameters defines a portion of said extended digital waveform;
  
  (e) said input section defining context windows of an extended phoneme text corresponding to the extended waveform contained in the output section;
  
  said method comprising the steps of:
  
  (i) dividing said input signal into input segments;
  
  (ii) matching said input segments with context windows contained in the input section of the database thereby establishing beginning and ending location parameters;
  
  (iii) retrieving from the output section of said database segments of extended waveform corresponding to said beginning and ending location parameters; and
  
  (iv) joining the output segments of a digital waveform, said joined segments being kept in the same order as the corresponding input segments.
- Dependent claims (7, 8):
  - 7. A method as in claim 6 wherein each context window has a length equivalent to five phonemes.
  - 8. A method as in claim 7 in which:

9. A method of converting a string of input phoneme text signals into an output digital waveform signal representing acoustic speech, said method comprising the steps of:
- (a) storing extended digital speech waveform signals, representing plural utterances of each phoneme to be converted, in a corresponding plurality of speech contexts with different preceding and/or succeeding phonemes;
  
  (b) dividing an input string of phonemes into input subsets of N contiguous phonemes, N being an integer;
  
  (c) matching each said input subset with a most similar corresponding subset of N contiguous phonemes in said stored extended digital speech waveform;
  
  (d) selecting a portion of the stored extended digital speech waveform corresponding to at least one phoneme of the match subset; and
  
  repeating at least steps (c) and (d) while concatenating the thus-selected portions of the extended digital speech waveform to provide said converted output digital waveform signal representing acoustic speech.
- Dependent claims (10, 11):
  - 10. A method as in claim 9 wherein N equals five.
  - 11. A method as in claim 9 wherein:
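Claims 9 and 10 describe matching each subset of N = 5 contiguous input phonemes against the stored speech and selecting the waveform of at least one phoneme of the matched subset, repeating while concatenating. The sketch below centres a window on each input phoneme, restricts candidates to stored occurrences of the same phoneme (in the spirit of claim 13's choice of the closest context among plural occurrences), and takes the waveform of the centre phoneme. The padding symbol `#`, the position-by-position similarity count and the per-phoneme `stored_units` table are hypothetical assumptions.

```python
N = 5  # subset length (claim 10); N // 2 phonemes of context on each side
PAD = "#"  # hypothetical boundary symbol used to pad both ends


def windows(phonemes: list[str], n: int = N) -> list[list[str]]:
    """Return one n-phoneme window centred on each phoneme, padded at the edges."""
    half = n // 2
    padded = [PAD] * half + list(phonemes) + [PAD] * half
    return [padded[i:i + n] for i in range(len(phonemes))]


def similarity(a: list[str], b: list[str]) -> int:
    """Count positions where the two windows hold the same phoneme."""
    return sum(x == y for x, y in zip(a, b))


def convert(input_phonemes: list[str], stored_phonemes: list[str],
            stored_units: list[list[int]]) -> list[int]:
    """stored_units[j] is the (assumed) waveform of stored_phonemes[j]."""
    stored_windows = windows(stored_phonemes)
    centre = N // 2
    output: list[int] = []
    for win in windows(input_phonemes):
        # Only stored occurrences of the same phoneme are eligible.
        candidates = [j for j, s in enumerate(stored_windows) if s[centre] == win[centre]]
        if not candidates:
            raise ValueError(f"phoneme {win[centre]!r} not in database")
        best = max(candidates, key=lambda j: similarity(win, stored_windows[j]))
        output.extend(stored_units[best])  # concatenate in input order
    return output


print(convert(["t", "a", "p"], list("katap"), [[0], [1], [2], [3], [4]]))  # [2, 3, 4]
```

With the stored text `k a t a p`, the input `t a p` picks the second `a` (index 3), whose context `a t a p #` agrees with the input context `# t a p #` at four positions, rather than the first `a` (index 1), which agrees at only two.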

12. A method for converting an input signal representing an input text in phonemes into an output digital waveform signal which is, in turn, convertible into an acoustic waveform corresponding to said input text, said method utilizing a linked database having an output section containing an extended digital waveform corresponding to an extended text in phonemes, said text including plural occurrences of individual phonemes in different contexts whereby the extended digital waveform includes plural digital waveforms for the same phoneme in different contexts and said linked database having a location parameter for identifying any point in said extended text and an equivalent point in the extended digital waveform, whereby the establishment of beginning and ending parameters in the extended text defines a portion of said digital waveform, said method including:
- (a) dividing said input signal into input segments corresponding to portions of digital waveform contained in the output section of the linked database;
  
  (b) establishing beginning and ending parameters for input segments identified in step (a);
  
  (c) utilizing parameters established in step (b) for retrieving portions of stored digital waveform; and
  
  (d) joining the portions retrieved in step (c) in the same order as the respective input segments to produce said output digital waveform signal convertible into said acoustic waveform.

13. A method for converting an input signal representing an input text in phonemes into an output digital waveform signal which is, in turn, convertible into an acoustic waveform corresponding to said text, said method utilizing a linked database having an input section and an output section wherein the input section contains signals representing an extended text in phonemes including plural occurrences of individual phonemes in different contexts and the output section contains an extended digital waveform corresponding to the extended text of the input section of the database and having a location parameter for identifying any point in said extended text whereby the establishment of beginning and ending parameters defines a portion of said digital waveform, said method including:
- (a) dividing said input signal into input segments containing input phonemes;
  
  (b) comparing said input phonemes with the extended text contained in the input section of the database to identify the plural occurrences of said input phonemes and selecting from said plural occurrences of said input phonemes closest contexts based on the respective input segments, whereby beginning and ending parameters corresponding to input phonemes are established;
  
  (c) utilizing the parameters established in step (b) for retrieving portions of stored digital waveform corresponding to input phonemes;
  
  (d) joining the portions retrieved in step (c) in the same order as the respective input phonemes to produce said output digital waveform signal convertible into said acoustic waveform.

Current Assignee
British Telecommunications PLC (BT Group PLC)
Original Assignee
British Telecommunications PLC (BT Group PLC)
Inventors
Breen, Andrew Paul
Primary Examiner(s)
Hudspeth, David R.
Assistant Examiner(s)
Opsasnick, Michael N.

Application Number

US08/942,482
Time in Patent Office

1,916 Days
Field of Search

704/258, 704/260, 704/211, 704/243, 704/252
US Class Current

704/260
CPC Class Codes

G10L 13/07 Concatenation rules