Speech synthesis apparatus and method

US 7,191,132 B2
Filed: 05/31/2002
Issued: 03/13/2007
Est. Priority Date: 06/04/2001
Status: Expired due to Fees

Abstract

A speech synthesiser is provided with a dialog-style selection arrangement responsive to a factor affecting intelligibility of speech output by the apparatus to select a dialog style intended to provide at least a minimum level of intelligibility of speech output by the synthesiser. The selected dialog style is used by a speech-application text provider when generating text-form utterances for a current speech application, these text-form utterances then being converted into speech form by a text-to-speech converter. The factor affecting intelligibility may be a measure of the intelligibility of the speech-form output or an environmental factor such as background noise in the user's environment.
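
As a rough illustration of the loop the abstract describes, the following Python sketch wires a style selector, a text provider and a stand-in text-to-speech step together. The style names, the prompt wording, the 0-to-1 intelligibility scale, the `minimum` and `headroom` values, and the `fake_tts` function are all assumptions made for illustration; none of them come from the patent.

```python
from dataclasses import dataclass

# Hypothetical dialog styles, ordered from most natural to most intelligible.
STYLES = ["natural", "plain", "terse"]

# Hypothetical text for one prompt in each style (not from the patent).
PROMPTS = {
    "natural": "Could you let me know which city you'd like to fly to?",
    "plain": "Which city do you want to fly to?",
    "terse": "Destination city?",
}


@dataclass
class StyleSelector:
    """Moves between styles so that intelligibility stays above a minimum."""
    minimum: float = 0.6   # assumed minimum intelligibility, on a 0..1 scale
    headroom: float = 0.2  # assumed margin before relaxing toward naturalness
    index: int = 0         # start with the most natural style

    def update(self, intelligibility: float) -> str:
        if intelligibility < self.minimum and self.index < len(STYLES) - 1:
            self.index += 1  # reduced intelligibility: favour intelligibility
        elif intelligibility > self.minimum + self.headroom and self.index > 0:
            self.index -= 1  # improved intelligibility: favour naturalness
        return STYLES[self.index]

    @property
    def style(self) -> str:
        return STYLES[self.index]


def fake_tts(text: str, noise: float) -> float:
    """Stand-in for a TTS converter: returns only an intelligibility factor.

    Assumption: shorter utterances and less noise score higher.
    """
    length_penalty = min(len(text) / 100.0, 1.0) * 0.3
    return max(0.0, min(1.0, 1.0 - noise - length_penalty))


def run_turns(noise_levels):
    """Run one prompt per noise level; return the (style, factor) used each turn."""
    selector = StyleSelector()
    history = []
    for noise in noise_levels:
        text = PROMPTS[selector.style]      # text provider uses current style
        factor = fake_tts(text, noise)      # converter also yields the factor
        history.append((selector.style, round(factor, 2)))
        selector.update(factor)             # selector reacts for the next turn
    return history
```

Calling `run_turns` with a list of per-turn noise values shows the style drifting toward the terse end while noise is high and stepping back once the factor clears the minimum by the assumed headroom.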

18 Claims

1. Speech synthesis apparatus comprising:
- a dialog-style selection arrangement responsive to at least one factor affecting intelligibility of speech output as heard by a user, to select a dialog style intended to provide at least a minimum level of intelligibility;
  
  a speech-application text provider arranged to provide text-form utterances for a current speech application in the dialog style selected by the selection arrangement;
  
  a text-to-speech converter arranged to convert text-form utterances received from the speech-application text provider into speech form and arranged to generate the said at least one factor; and
  
  wherein the selection arrangement is operative to select a dialog style intended to balance intelligibility and naturalness whilst maintaining said minimum level of intelligibility whereby changes in said at least one factor indicating improved intelligibility of speech output lead to changes in dialog style in favor of naturalness whilst changes in said at least one factor indicating reduced intelligibility of speech output lead to changes in dialog style in favor of intelligibility.
- - 2. Apparatus according to claim 1, wherein the said at least one factor is a measure of the intelligibility of the speech form actually produced by the text-to-speech converter.
  - 3. Apparatus according to claim 2, wherein the text-to-speech converter includes a concatenative speech generator which in generating a speech-form utterance, produces an accumulated unit selection cost in respect of the speech units used to make up the speech-form utterance;
    - the selection arrangement comprising a comparator for comparing the selection cost produced by the speech generator against one or more stored threshold values, in order to select the dialog style.
  - 4. Apparatus according to claim 1, further comprising an output buffer for temporarily storing the latest speech-form utterance generated by the text-to-speech converter, the selection arrangement releasing this speech-form utterance for output only if said at least one factor indicates that a change in dialog style is not currently required.
  - 5. Apparatus according to claim 1, further comprising an arrangement for receiving sound signals from the user, and a background-noise analyser for processing said sound signals to provide a measure of the background noise level in the user's environment, this measure constituting the said at least one factor to which the dialog-style selection arrangement is responsive.
  - 6. A speech synthesis apparatus according to claim 5, further comprising a speech input channel with a speech recogniser, the speech input channel constituting said arrangement for receiving sound signals from the user;
    - said background-noise analyser being operative to receive inputs from the text-to-speech converter and the speech recogniser to indicate periods when speech is being produced or received, and the analyser being further operative to effect its background noise measure outside of such periods.
  - 7. Apparatus according to claim 1, wherein the speech-application text provider comprises a dialog manager for running a speech application in the form of multiple scripts each corresponding to a different dialog style, the dialog manager being operative to use the script corresponding to the currently-selected dialog style.
  - 8. Apparatus according to claim 1, wherein the speech-application text provider comprises a language generator responsive to speech-application input information indicative of at least the content of a desired speech output, to generate a corresponding text-form utterance;
    - the language generator being operative to generate said text-form utterance according to one of a set of dialog-style rules, the set of rules used being dependent on the currently-selected dialog style.
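
The background-noise measurement of claims 5 and 6 can be pictured with a minimal Python sketch that skips any audio frame during which the system or the user is speaking. The frame layout, the flag names, the dB-relative-to-full-scale measure, and the simple averaging of per-frame levels are assumptions chosen for brevity, not details taken from the claims.

```python
import math
from typing import Iterable, Optional, Tuple

# Each frame: (samples, tts_active, user_speaking). The two flags stand in for
# the inputs claim 6 says come from the TTS converter and speech recogniser.
Frame = Tuple[list, bool, bool]


def rms_db(samples) -> float:
    """Level of one frame in dB relative to a full-scale amplitude of 1.0."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)


def background_noise_db(frames: Iterable[Frame]) -> Optional[float]:
    """Average level over frames where neither system nor user is speaking."""
    levels = [
        rms_db(samples)
        for samples, tts_active, user_speaking in frames
        if not tts_active and not user_speaking
    ]
    finite = [level for level in levels if math.isfinite(level)]
    if not finite:
        return None  # no usable quiet period observed yet
    # Simplification: averages dB values directly rather than signal power.
    return sum(finite) / len(finite)
```

Returning `None` when no quiet frame has been seen leaves it to the caller to decide whether to keep the previous measure or fall back to a default style.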

9. A method of generating speech output for a current speech application comprising the steps of:
- (a) in dependence on at least one factor affecting intelligibility of speech output as heard by a user, dynamically selecting a dialog style intended to provide at least a minimum level of intelligibility;
  
  (b) providing text-form utterances for a current speech application in the dialog style selected in step (a); and
  
  (c) converting the text-form utterances into speech form and generating the said at least one factor based on converting the text-form utterances into speech form; and
  
  wherein step (a) is effected in a manner so as to balance intelligibility and naturalness whilst maintaining said minimum level of intelligibility whereby changes in said at least one factor indicating improved intelligibility of speech output lead to changes in dialog style in favor of naturalness whilst changes in said at least one factor indicating reduced intelligibility of speech output lead to changes in dialog style in favor of intelligibility.
- - 10. A method according to claim 9, wherein the said at least one factor is a measure of the intelligibility of the speech form actually produced by the text-to-speech conversion.
  - 11. A method according to claim 10, wherein step (c) is effected using a concatenative speech generator which in generating a speech-form utterance, produces an accumulated unit selection cost in respect of the speech units used to make up the speech-form utterance;
    - step (a) comparing this selection cost against one or more stored threshold values, in order to select the dialog style.
  - 12. A method according to claim 9, further involving temporarily storing the latest speech form generated in step (c) and then releasing this speech form for output only if said at least one factor indicates that a change in dialog style is not currently required.
  - 13. A method according to claim 9, further involving receiving sound signals from the user and processing said sound signals to provide a measure of the background noise level in the user's environment, this measure constituting the said at least one factor to which the dialog-style selection arrangement is responsive.
  - 14. A method according to claim 13, wherein the signals received and processed to provide said measure of the background noise level are selected to be signals received outside of a period when said speech form produced in step (c) is being output.
  - 15. A method according to claim 9, wherein step (b) involves selecting from multiple scripts each corresponding to a different dialog style, the script corresponding to the dialog style selected in step (a).
  - 16. A method according to claim 9, wherein step (b) involves generating a text-form utterance on the basis of speech-application input information indicative of at least the content of a desired speech output, the text-form utterance being generated according to one of a set of dialog-style rules, the set of rules used being dependent on the dialog style selected in step (a).
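
Claims 12 and 15 together suggest a hold-then-release pattern: synthesise from the script for the current style, hold the result, and release it only if the factor does not call for a different style. The Python sketch below is one possible reading. The script contents, the `synthesise` and `style_needed` callables, and the `max_attempts` cap are hypothetical and are not specified by the claims.

```python
from typing import Callable, Dict, Optional

# Hypothetical scripts: one per dialog style, each mapping a prompt id to text.
SCRIPTS: Dict[str, Dict[str, str]] = {
    "natural": {"ask_date": "And when were you thinking of travelling?"},
    "intelligible": {"ask_date": "What is your travel date?"},
}


def speak_prompt(
    prompt_id: str,
    style: str,
    synthesise: Callable[[str], tuple],
    style_needed: Callable[[float, str], str],
    max_attempts: int = 3,
) -> Optional[tuple]:
    """Return (style, audio) once no change of style is required, else None.

    synthesise(text) -> (audio, factor) and style_needed(factor, style) -> style
    are caller-supplied stand-ins for step (c) and step (a) respectively.
    """
    for _ in range(max_attempts):
        text = SCRIPTS[style][prompt_id]      # step (b): script for this style
        audio, factor = synthesise(text)      # step (c): audio is held back
        wanted = style_needed(factor, style)  # step (a): re-evaluate the style
        if wanted == style:
            return style, audio               # release the held audio
        style = wanted                        # discard it and try the new style
    return None
```

The attempt cap is there only so the sketch cannot loop forever if the two callables disagree indefinitely.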

17. Speech synthesis apparatus comprising:
- a dialog-style selection arrangement responsive to at least one factor affecting intelligibility of speech output as heard by a user, to select a dialog style intended to provide at least a minimum level of intelligibility;
  
  a speech-application text provider arranged to provide text-form utterances for a current speech application in the dialog style selected by the selection arrangement;
  
  a text-to-speech converter arranged to convert text-form utterances received from the speech-application text provider into speech form and arranged to generate the said at least one factor; and
  
  wherein the said at least one factor is a measure of the intelligibility of the speech form actually produced by the text-to-speech converter, wherein the text-to-speech converter is arranged to generate, in the course of converting a text-form utterance into speech form, values of predetermined features that are indicative of the intelligibility of the speech form of the utterance, the selection arrangement comprising;
  
  a classifier responsive to the feature values generated by the text-to-speech converter to provide a measure of the intelligibility of the speech form of the utterance concerned; and
  
  a comparator for comparing the measure produced by the classifier against one or more stored threshold values, in order to select the dialog style.
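
The classifier-plus-comparator arrangement of claim 17 might look roughly like the Python sketch below. The feature names, weights, bias, threshold values and style names are invented for illustration, and the logistic function is simply one convenient classifier form; the claim does not say which features or which classifier are used.

```python
import math
from bisect import bisect_right

# Hypothetical feature weights; a real classifier would be trained, for
# example on listener judgements. The feature names are illustrative only.
WEIGHTS = {"join_cost": -2.0, "unknown_words": -1.5, "speaking_rate": -0.5}
BIAS = 2.0

# Stored thresholds on the classifier output, ascending, and the styles they
# separate: below 0.4 -> most intelligible style, 0.7 and above -> most natural.
THRESHOLDS = [0.4, 0.7]
STYLES = ["intelligible", "balanced", "natural"]


def classify(features: dict) -> float:
    """Map TTS feature values to an intelligibility measure in (0, 1)."""
    score = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-score))


def compare(measure: float) -> str:
    """Comparator: pick the style whose threshold band contains the measure."""
    return STYLES[bisect_right(THRESHOLDS, measure)]


def select_style(features: dict) -> str:
    """Classifier followed by comparator, as two separate stages."""
    return compare(classify(features))
```

Keeping `classify` and `compare` separate mirrors the two elements recited in the claim and lets the stored thresholds be tuned without touching the classifier.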

18. A method of generating speech output for a current speech application comprising the steps of:
- (a) in dependence on at least one factor affecting intelligibility of speech output as heard by a user, dynamically selecting a dialog style intended to provide at least a minimum level of intelligibility;
  
  (b) providing text-form utterances for a current speech application in the dialog style selected in step (a); and
  
  (c) converting the text-form utterances into speech form and generating the said at least one factor based on converting the text-form utterances into speech form; and
  
  wherein step (a) is effected in a manner so as to balance intelligibility and naturalness whilst maintaining said minimum level of intelligibility whereby changes in said at least one factor indicating improved intelligibility of speech output lead to changes in dialog style in favor of naturalness whilst changes in said at least one factor indicating reduced intelligibility of speech output lead to changes in dialog style in favor of intelligibility;
  
  wherein the said at least one factor is a measure of the intelligibility of the speech form actually produced by the text-to-speech conversion;
  
  wherein step (c) involves generating in the course of converting a text-form utterance into speech form, values of predetermined features that are indicative of the intelligibility of the speech form of the utterance, step (a) involving;
  
  using a classifier responsive to the said values of predetermined features to provide a measure of the intelligibility of the speech form of the utterance concerned; and
  
  comparing the measure produced by the classifier against one or more stored threshold values, in order to select the dialog style.

Current Assignee: Hewlett-Packard Development Company, L.P. (HP Inc.)
Original Assignee: Hewlett-Packard Development Company, L.P. (HP Inc.)
Inventors: Hickey, Marianne; Brittan, Paul St John; Tucker, Roger Cecil Ferry
Primary Examiner(s): Dorvil; Richemond
Assistant Examiner(s): Han; Qi
Application Number: US10/158,084
Publication Number: US 20020184030A1
Time in Patent Office: 1,747 Days
Field of Search: 704/260, 704258-278
US Class Current: 704/260
CPC Class Codes:
G10L 13/08   Text analysis or generation...
G10L 2021/02168   the estimation exclusively ...
G10L 21/0208   Noise filtering
