Speech synthesis

US 4,908,867 A
Filed: 11/19/1987
Issued: 03/13/1990
Est. Priority Date: 11/19/1987
Status: Expired due to Term


Abstract

Coded text is converted to phonetic data to drive a synthesis filter. Accent data are also obtained, from which a pitch contour is derived for a variable-pitch excitation source. Recognition of the beginning of a paragraph makes the pitch contour there higher than at a later part of the paragraph; this initial elevation falls after each of the subgroups into which phrases are divided. Accents within a phrase are assigned pitch values that are highest for the first accent and somewhat lower for the last, while the remaining accents alternate between two still lower values. Accents on repeated words may be suppressed.
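
The abstract describes a source-filter arrangement: a variable-pitch excitation source feeds a synthesis filter. The minimal Python sketch below illustrates that arrangement under stated assumptions; it is not the patented implementation. The 8 kHz sample rate, the fixed formant resonances and the names `impulse_train`, `resonator` and `synthesise` are all hypothetical, and a real system would update the filter coefficients from the phonetic data frame by frame rather than keep them fixed.

```python
import math

SAMPLE_RATE = 8000  # Hz; assumed for illustration, not taken from the patent


def impulse_train(pitch_hz):
    """Excitation generator: emit a unit impulse once per pitch period.

    pitch_hz holds one pitch value per output sample, so the spacing between
    impulses tracks the pitch contour as it varies.
    """
    out = []
    phase = 1.0  # start "due", so the first sample carries an impulse
    for f0 in pitch_hz:
        if phase >= 1.0:
            out.append(1.0)
            phase -= 1.0
        else:
            out.append(0.0)
        phase += f0 / SAMPLE_RATE  # fraction of a pitch period per sample
    return out


def resonator(x, freq_hz, bandwidth_hz):
    """Two-pole all-pole section: y[n] = x[n] + a1*y[n-1] + a2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth_hz / SAMPLE_RATE)  # pole radius (< 1, stable)
    theta = 2 * math.pi * freq_hz / SAMPLE_RATE          # pole angle
    a1, a2 = 2 * r * math.cos(theta), -r * r
    y1 = y2 = 0.0
    y = []
    for sample in x:
        out = sample + a1 * y1 + a2 * y2
        y.append(out)
        y2, y1 = y1, out
    return y


def synthesise(pitch_hz, formants):
    """Filter the excitation through a cascade of hypothetical fixed formant resonators."""
    signal = impulse_train(pitch_hz)
    for freq, bw in formants:
        signal = resonator(signal, freq, bw)
    return signal


contour = [125.0] * 640  # 80 ms of a flat 125 Hz contour
speech = synthesise(contour, [(500, 100), (1500, 120)])
```

At 125 Hz and 8 kHz each pitch period is exactly 64 samples, so the excitation carries impulses at samples 0, 64, 128 and so on; a falling or rising contour simply stretches or compresses that spacing.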

13 Claims

1. A speech synthesiser comprising:
- (a) means for deriving, from coded text input thereto, phonetic data indicative of the properties of a synthesis filter and accent data indicating the occurrence of accents on words and to identify phrase groups of words delimited by punctuation marks;
  
  (b) means for deriving from the accent data a pitch contour;
  
  (c) an excitation generator responsive to the pitch contour to produce an excitation signal of varying pitch; and
  
  (d) filter means responsive to the phonetic data to filter the excitation signal to produce synthetic speech;
  
  wherein each phrase group comprises one or more subgroups and the deriving means are arranged in operation in response to paragraph division within the text to produce a pitch contour which, for a given textual content, is, for each of a plurality of subgroups at the commencement of a paragraph, higher than for a subgroup at an intermediate part of a paragraph by a factor which falls from a value greater than unity at the commencement of the paragraph to a value of unity at said intermediate part, the factor falling stepwise at the boundary between each one of said plurality of subgroups and the subgroup which follows it.
  2. A speech synthesiser according to claim 1 in which the said factor falls at each subgroup by a constant proportion of its previous value.
  
  13. A speech synthesiser according to claim 1 or 3 wherein the deriving means are arranged in operation to suppress accents on words which, in accordance with a predetermined criterion, resemble words previously processed.
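
Claims 1 and 2 describe a paragraph-initial pitch elevation: each early subgroup is scaled by a factor above unity that steps down at every subgroup boundary until it reaches unity. A minimal Python sketch follows, assuming the factor is multiplied by a constant ratio at each boundary and clamped at unity once it would drop below it. The starting factor of 1.3, the ratio of 0.9 and the function names are hypothetical choices for illustration, not values from the patent.

```python
def paragraph_factors(num_subgroups, initial_factor=1.3, ratio=0.9):
    """Pitch-scaling factor for each successive subgroup of a paragraph.

    The factor starts above unity at the first subgroup, is multiplied by a
    constant ratio at each subgroup boundary (cf. claim 2), and is clamped at
    unity once it would fall below it (an assumption of this sketch).
    """
    if not (initial_factor > 1.0 and 0.0 < ratio < 1.0):
        raise ValueError("need initial_factor > 1 and 0 < ratio < 1")
    factors = []
    factor = initial_factor
    for _ in range(num_subgroups):
        factors.append(factor)
        factor = max(1.0, factor * ratio)  # stepwise fall, never below unity
    return factors


def elevate_paragraph(subgroup_pitches, initial_factor=1.3, ratio=0.9):
    """Scale every pitch value in each subgroup by that subgroup's factor."""
    factors = paragraph_factors(len(subgroup_pitches), initial_factor, ratio)
    return [[p * f for p in group] for group, f in zip(subgroup_pitches, factors)]
```

With the defaults, `paragraph_factors(6)` gives approximately 1.3, 1.17 and 1.053 for the first three subgroups and exactly 1.0 thereafter, so the "intermediate part" of the paragraph is reached at the fourth subgroup. Whether the patented system decays the factor itself or only its excess over unity is not stated in the claims; this sketch picks the former.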

3. A speech synthesiser comprising:
- (a) means for deriving, from coded text input thereto, phonetic data indicative of the properties of a synthesis filter and accent data indicating the occurrence of accents on words and to identify phrase groups of words delimited by punctuation marks;
  
  (b) means for deriving from the accent data a pitch contour;
  
  (c) an excitation generator responsive to the pitch contour to produce an excitation signal of varying pitch;
  
  (d) filter means responsive to the phonetic data to filter the excitation signal to produce synthetic speech;
  
  wherein each phrase group comprises one or more subgroups and the deriving means are arranged in operation in response to paragraph division within the text to produce a pitch contour which, for a given textual content, is, for each of a plurality of subgroups at the commencement of a paragraph, higher than for a subgroup at an intermediate part of a paragraph by a factor which falls from a value greater than unity at the commencement of the paragraph to a value of unity at said intermediate part, the factor falling stepwise at the boundary between each one of said plurality of subgroups and the subgroup which follows it; and
  
  (e) means assigning each word to a first class having a relatively high contextual significance or a second class having a relatively lower contextual significance and the boundaries between subgroups are defined as occurring after any word of the first class which is followed by a word of the second class.
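
Claim 3(e) places a subgroup boundary after any word of the higher-significance class that is followed by a word of the lower-significance class. The Python sketch below assumes, for illustration only, that the first class means content words and the second class means function words, identified by a deliberately tiny hypothetical word list; the names `FUNCTION_WORDS`, `is_first_class` and `split_subgroups` are likewise hypothetical.

```python
# Hypothetical, deliberately tiny list of low-significance ("second class") words.
FUNCTION_WORDS = {
    "a", "an", "the", "of", "to", "in", "on", "and", "or", "but",
    "is", "was", "for", "with", "at", "by",
}


def is_first_class(word):
    """Treat any word not in the function-word list as high significance."""
    return word.lower() not in FUNCTION_WORDS


def split_subgroups(words):
    """Close a subgroup after a first-class word that precedes a second-class word."""
    groups = []
    current = []
    for i, word in enumerate(words):
        current.append(word)
        nxt = words[i + 1] if i + 1 < len(words) else None
        if nxt is not None and is_first_class(word) and not is_first_class(nxt):
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


print(split_subgroups("a man of the people and a friend of the poor".split()))
# [['a', 'man'], ['of', 'the', 'people'], ['and', 'a', 'friend'], ['of', 'the', 'poor']]
```

Each resulting subgroup tends to open with function words and close on a content word, which is the unit over which the paragraph-initial factor of claims 1 and 2 steps down.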

4. A speech synthesiser comprising:
- (a) means for deriving, from coded text input thereto, phonetic data indicative of the properties of a synthesis filter and accent data indicating the occurrence of accents on words and to identify phrase groups of words delimited by punctuation marks;
  
  (b) means for deriving from the accent data a pitch contour;
  
  (c) an excitation generator responsive to the pitch contour to produce an excitation signal of varying pitch; and
  
  (d) filter means responsive to the phonetic data to filter the excitation signal to produce synthetic speech;
  
  wherein the deriving means are arranged in operation to assign pitch representative values to the accents within each phrase group, the values comprising:
  
  (i) a first value assigned to the first accent in the group;
  
  (ii) a second value, lower than the first, assigned to the last accent in the group; and
  
  (iii) further values, lower than the first and second values, assigned to the remaining accents in the group such that the majority of those further values form a sequence in which the difference between successive values is alternately positive and negative;
  
  and to derive a pitch contour from those values; and
  
  wherein the further values consist of a third value and a fourth value lower than the third, the last of the remaining accents is assigned the fourth value, and of the other remaining accents the first and odd numbered ones are assigned the third value and the even numbered ones are assigned the fourth value.
  5. A speech synthesiser according to claim 4 in which each phrase group comprises one or more subgroups and pitch values are also assigned to boundaries between subgroups.
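
Claim 4 fixes the order of pitch values within a phrase group: the first accent receives the highest value, the last accent (as the abstract puts it) a somewhat lower one, and the remaining accents alternate between a third and a lower fourth value, with the last of them always taking the fourth. A minimal Python sketch of that assignment follows. The default values in hertz and the treatment of a single-accent group are assumptions for illustration only.

```python
def assign_accent_values(num_accents, v1=200.0, v2=170.0, v3=160.0, v4=140.0):
    """Pitch target (Hz) for each accent in one phrase group, in order.

    v1: first accent; v2: last accent; v3/v4: the remaining accents.
    The default values are hypothetical.
    """
    if not (v4 < v3 < v2 < v1):
        raise ValueError("expected v4 < v3 < v2 < v1")
    if num_accents <= 0:
        return []
    if num_accents == 1:
        return [v1]  # assumption: a lone accent is treated as the first accent
    remaining = num_accents - 2
    middle = []
    for k in range(1, remaining + 1):  # k numbers the remaining accents from 1
        if k == remaining:
            middle.append(v4)          # last of the remaining accents
        elif k % 2 == 1:
            middle.append(v3)          # first and other odd-numbered accents
        else:
            middle.append(v4)          # even-numbered accents
    return [v1] + middle + [v2]


print(assign_accent_values(6))  # [200.0, 160.0, 140.0, 160.0, 140.0, 170.0]
```

For six accents the middle four alternate high, low, high, low, which is the "alternately positive and negative" sequence of differences required by (iii).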

6. A speech synthesiser comprising:
- (a) means for deriving, from coded text input thereto, phonetic data indicative of the properties of a synthesis filter and accent data indicating the occurrence of accents on words and to identify phrase groups of words delimited by punctuation marks;
  
  (b) means for deriving from the accent data a pitch contour;
  
  (c) an excitation generator responsive to the pitch contour to produce an excitation signal of varying pitch; and
  
  (d) filter means responsive to the phonetic data to filter the excitation signal to produce synthetic speech;
  
  wherein the deriving means are arranged in operation to assign pitch representative values to the accents within each phrase group, the values comprising:
  
  (i) a first value assigned to the first accent in the group;
  
  (ii) a second value, lower than the first, assigned to the last accent in the group; and
  
  (iii) further values, lower than the first and second values, assigned to the remaining accents in the group such that the majority of those further values form a sequence in which the difference between successive values is alternately positive and negative;
  
  and to derive a pitch contour from those values; and
  
  wherein each phrase group comprises one or more subgroups and the deriving means is arranged in operation in response to paragraph division within the text to produce a pitch contour which, for a given textual content, is, for each of a plurality of subgroups at the commencement of a paragraph, higher than for a subgroup at an intermediate part of a paragraph by a factor which falls from a value greater than unity at the commencement of the paragraph to a value of unity at said intermediate part, the factor falling stepwise at the boundary between each one of said plurality of subgroups and the subgroup which follows it.
  7. A speech synthesiser according to claim 6 in which the said factor falls at each subgroup by a constant proportion of its previous value.

8. A speech synthesiser comprising:
- (a) means for deriving, from coded text input thereto, phonetic data indicative of the properties of a synthesis filter and accent data indicating the occurrence of accents on words and to identify phrase groups of words delimited by punctuation marks;
  
  (b) means for deriving from the accent data a pitch contour;
  
  (c) an excitation generator responsive to the pitch contour to produce an excitation signal of varying pitch; and
  
  (d) filter means responsive to the phonetic data to filter the excitation signal to produce synthetic speech;
  
  wherein the deriving means are arranged in operation to assign pitch representative values to the accents within each phrase group, the values comprising:
  
  (i) a first value assigned to the first accent in the group;
  
  (ii) a second value, lower than the first, assigned to the last accent in the group; and
  
  (iii) further values, lower than the first and second values, assigned to the remaining accents in the group such that the majority of those further values form a sequence in which the difference between successive values is alternately positive and negative;
  
  and to derive a pitch contour from those values; and
  
  wherein the deriving means is arranged in operation to derive the pitch contour from the values by (a) linear interpolation between the values and (b) filtering of the resulting contour.
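
Claim 8 builds the contour in two stages: linear interpolation between the assigned pitch values, then filtering of the result. The claim does not name a filter, so the Python sketch below assumes a centred moving average as a simple low-pass smoother and a 10 ms frame step; the placement of the targets in time and the names `interpolate` and `smooth` are hypothetical.

```python
def interpolate(points, step):
    """Linearly interpolate sorted (time_s, pitch_hz) points onto a uniform grid."""
    if len(points) < 2 or step <= 0:
        raise ValueError("need at least two points and a positive step")
    t0, t_end = points[0][0], points[-1][0]
    n = int(round((t_end - t0) / step)) + 1
    contour = []
    j = 0  # index of the segment [points[j], points[j + 1]] containing t
    for i in range(n):
        t = t0 + i * step
        while j < len(points) - 2 and t > points[j + 1][0]:
            j += 1
        (ta, va), (tb, vb) = points[j], points[j + 1]
        frac = 0.0 if tb == ta else (t - ta) / (tb - ta)
        frac = min(max(frac, 0.0), 1.0)  # guard against rounding at the ends
        contour.append(va + frac * (vb - va))
    return contour


def smooth(contour, width=5):
    """Centred moving-average filter; near the edges it averages the samples available."""
    half = width // 2
    out = []
    for i in range(len(contour)):
        window = contour[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


# Hypothetical accent targets: (time in seconds, pitch in Hz).
targets = [(0.00, 200.0), (0.25, 140.0), (0.50, 160.0), (0.80, 170.0)]
pitch = smooth(interpolate(targets, 0.01))  # one value per 10 ms frame
```

Interpolation alone leaves sharp corners at each target; the smoothing pass rounds them off, which is presumably why the claim adds a filtering stage after interpolation.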

9. A speech synthesiser comprising:
- (a) means for deriving, from coded text input thereto, phonetic data indicative of the properties of a synthesis filter and accent data indicating the occurrence of accents on words;
  
  (b) means for deriving from the accent data a pitch contour;
  
  (c) an excitation generator responsive to the pitch contour to produce an excitation signal of varying pitch; and
  
  (d) filter means responsive to the phonetic data to filter the excitation signal to produce synthetic speech;
  
  wherein the deriving means are arranged in operation to suppress accents on words which, in accordance with a predetermined criterion, resemble words previously processed, wherein the predetermined criterion is one of identity of words.

10. A speech synthesiser comprising:
- (a) means for deriving, from coded text input thereto, phonetic data indicative of the properties of a synthesis filter and accent data indicating the occurrence of accents on words;
  
  (b) means for deriving from the accent data a pitch contour;
  
  (c) an excitation generator responsive to the pitch contour to produce an excitation signal of varying pitch; and
  
  (d) filter means responsive to the phonetic data to filter the excitation signal to produce synthetic speech;
  
  wherein the deriving means are arranged in operation to suppress accents on words which, in accordance with a predetermined criterion, resemble words previously processed, wherein the predetermined criterion is that the stem of the word is the same as that of the earlier word.
  11. A speech synthesiser according to claim 9 or 10 in which the deriving means includes a store for storing a word list of predetermined size to which previously processed words are added, organized such that when a new word is added the least recently added word is discarded, the suppression of accents being performed only in respect of words resembling those in the list.
  
  12. A speech synthesiser according to claim 11 in which the deriving means is arranged to recognise the end of a paragraph and, upon such recognition, to erase the list.
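
Claims 9 to 12 suppress accents on words that resemble recently processed words, using either word identity or a shared stem, a fixed-size list that discards the least recently added entry, and a reset at each paragraph end. The Python sketch below models the list with `collections.deque(maxlen=...)`, which drops the oldest entry when full. The suffix-stripping stemmer, the lower-casing used for "identity", the list size of 10 and the class and method names are all hypothetical.

```python
from collections import deque


def crude_stem(word):
    """Hypothetical suffix-stripping stemmer, for illustration only."""
    w = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


class AccentSuppressor:
    def __init__(self, size=10, use_stems=True):
        # Fixed-size word list: appending to a full deque discards the oldest entry.
        self.recent = deque(maxlen=size)
        # Stem comparison (claim 10) or case-insensitive identity (claim 9).
        self.key = crude_stem if use_stems else str.lower

    def keep_accent(self, word):
        """Return False if the word resembles one in the list; always record it."""
        k = self.key(word)
        keep = k not in self.recent
        self.recent.append(k)
        return keep

    def end_paragraph(self):
        """Erase the list when a paragraph ends (claim 12)."""
        self.recent.clear()


s = AccentSuppressor(size=3)
print([s.keep_accent(w) for w in ["train", "trains", "station", "train"]])
# [True, False, True, False]
```

Every processed word is added to the list, including ones whose accent was just suppressed, so a word repeated within the window keeps being de-accented until it ages out or the paragraph ends.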

Current Assignee
British Telecommunications PLC (BT Group PLC)
Original Assignee
British Telecommunications PLC (BT Group PLC)
Inventors
Silverman, Kim E. A.
Primary Examiner(s)
Wong, Peter S.
Assistant Examiner(s)
Jones, Judson

Application Number

US07/122,804
Time in Patent Office

845 Days
Field of Search

381/38, 381/44, 381/51, 381/36
US Class Current

704/260
CPC Class Codes

G10L 13/04 Details of speech synthesis...

G10L 13/10 Prosody rules derived from ...
