Text-based speech synthesis method containing synthetic speech comparisons and updates

US 6,546,369 B1
Filed: 05/05/2000
Issued: 04/08/2003
Est. Priority Date: 05/05/1999
Status: Expired due to Term


Abstract

The invention specifies a simple reproduction method that improves pronunciation in voice-controlled systems with text-based speech synthesis, even when the stored string of characters to be synthesized does not follow the general rules of speech reproduction. The method avoids the current state-of-the-art practice of “copying” the originally spoken input into the otherwise synthesized output, which should significantly increase user acceptance of the voice-controlled system. More specifically, a stored string of characters is described phonetically according to general rules and converted to a purely synthetic form. When there is actual spoken speech input that corresponds to the stored string, the converted string is compared to the speech input before it is reproduced. When the converted string deviates from the speech input by more than a threshold value, at least one variation of the converted string is created. This variation is then output instead of the converted string, provided that it deviates from the speech input by less than the threshold value.
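The control flow described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the speech input has already been reduced to a sequence of phoneme symbols, and the names `mismatch_ratio`, `choose_reproduction` and `make_variations` are hypothetical. The fallback to the synthetic form when no variation qualifies is also an assumption, since the abstract does not say what happens in that case.

```python
from typing import Callable, Iterable, List, Optional, Sequence


def mismatch_ratio(a: Sequence[str], b: Sequence[str]) -> float:
    """Toy deviation measure (assumption): share of positions that differ."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    differing = sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))
    return differing / longest


def choose_reproduction(
    synthetic: Sequence[str],
    speech_input: Optional[Sequence[str]],
    make_variations: Callable[[Sequence[str]], Iterable[Sequence[str]]],
    threshold: float,
    deviation: Callable[[Sequence[str], Sequence[str]], float] = mismatch_ratio,
) -> List[str]:
    """Return the phoneme sequence to reproduce for one stored string."""
    # Without spoken input there is nothing to compare against.
    if speech_input is None:
        return list(synthetic)
    # The purely synthetic form is kept if it is already close enough.
    if deviation(synthetic, speech_input) <= threshold:
        return list(synthetic)
    # Otherwise create variations and output the first one below the threshold.
    for variation in make_variations(synthetic):
        if deviation(variation, speech_input) < threshold:
            return list(variation)
    # Assumption: fall back to the synthetic form if no variation qualifies.
    return list(synthetic)
```

With a synthetic form `["s", "m", "i", "th"]`, a spoken form `["s", "m", "ai", "th"]`, a threshold of 0.2 and a variation generator that swaps `"i"` for `"ai"`, the sketch returns the variation rather than the synthetic form.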

23 Claims

1. A reproduction method for voice-controlled systems with text-based speech synthesis, comprising the steps of:
- converting a stored string of characters described phonetically according to general rules into a pure synthetic form;
  
  if there is an actually spoken speech input that corresponds to said stored string of characters, comparing said pure synthetic form of said string of characters with said speech input before reproduction of said string of characters;
  
  if a deviation is detected in said pure synthetic form of said string of characters that has a value greater than a threshold value, creating at least one variation of said pure synthetic form of said string of characters;
  
  comparing one of said variations with said speech input; and
  
  outputting one of said variations instead of said pure synthetic form of said string of characters, if the deviation of one of said variations from said speech input is less than said threshold value.
- 2. A reproduction method according to claim 1, wherein one variation of the converted string of characters is created in said creating step, and
- 3. A method according to claim 2, wherein before comparing the speech input with the converted string of characters or the variation created from the converted string of characters, the speech input and the converted string of characters or the variation created will be segmented.
- 4. A reproduction method according to claim 1, wherein at least two variations of the converted string of characters will be created in said creating step, and wherein when there is more than one variation of the converted string of characters having a deviation from the speech input that is below the threshold value, the variation of the converted string of characters with the smallest deviation from the speech input will be reproduced.
- 5. A method according to claim 4, wherein before comparing the speech input with the converted string of characters or the variation(s) created from the converted string of characters, the speech input and the converted train of characters or the variation created will be segmented.
- 6. A method according to claim 1, wherein before comparing the speech input with the converted string of characters or the variation(s) created from it, the speech input and the converted train of characters or the variation(s) created will be segmented.
- 7. A reproduction method according to claim 6, wherein the same segmenting approach will be used to segment the speech input and the converted string of characters or the variation created from the converted string of characters.
- 8. A reproduction method according to claim 6, wherein different segmenting approaches will be used to segment the speech input and the converted string of characters or the variation created from the converted string of characters.
- 9. A reproduction method according to claim 6, wherein an explicit segmenting approach will be used to segment the converted string of characters or the variation created from the converted string of characters, and an implicit segmenting approach will be used to segment the speech input.
- 10. A reproduction method according to claim 6, wherein the corresponding segments of the converted string of characters provided in segmented form and of the segmented speech input will be examined for common features, and wherein the phoneme present in the segment of the converted string of characters will be replaced by a replacement phoneme when there is a deviation in two corresponding segments that is above the threshold value.
- 11. A reproduction method according to claim 10, wherein each phoneme is linked to at least one replacement phoneme that is similar to the phoneme.
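
The segment-wise replacement recited in claims 10 and 11 might look like the sketch below. It is illustrative only: segments are assumed to be phoneme symbols already aligned one to one, and the `REPLACEMENTS` table, the `segment_deviation` callback and the rule of picking the closest candidate are assumptions that the claims do not specify.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical table linking each phoneme to similar replacement phonemes.
REPLACEMENTS: Dict[str, List[str]] = {
    "z": ["s", "ts"],
    "o": ["o:"],
    "e": ["e:"],
}


def replace_deviating_segments(
    synthetic_segments: Sequence[str],
    speech_segments: Sequence[str],
    segment_deviation: Callable[[str, str], float],
    threshold: float,
    replacements: Dict[str, List[str]] = REPLACEMENTS,
) -> List[str]:
    """Build a variation by swapping phonemes in segments that deviate."""
    variation = list(synthetic_segments)
    for i, (syn, spoken) in enumerate(zip(synthetic_segments, speech_segments)):
        # Only segments whose deviation exceeds the threshold are touched.
        if segment_deviation(syn, spoken) > threshold:
            candidates = replacements.get(syn, [])
            if candidates:
                # Assumption: choose the linked phoneme closest to the input.
                variation[i] = min(
                    candidates, key=lambda c: segment_deviation(c, spoken)
                )
    return variation
```

Because only deviating segments are rewritten, the rest of the synthetic form is left exactly as the general rules produced it.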

12. A reproduction method for voice-controlled systems with text-based speech synthesis, said reproduction method comprising the steps of:
- when there is actual spoken speech input that corresponds to a stored string of characters, comparing a converted string of characters to the speech input before reproduction of the string of characters described phonetically according to general rules and converted to a purely synthetic form;
  
  when a deviation is detected in the converted string of characters that has a value above a threshold value, creating at least one variation of the converted string of characters; and
  
  outputting at least one variation of the converting string of characters having been created instead of the converted string of characters as long as the deviation of at least one variation of the converted string of characters having been from the speech input is below the threshold value when the two are compared, wherein as soon as a variation of a string of characters has been determined to be worthy of reproduction, the peculiarities arising in conjunction with the reproduction of the string of characters will be stored with a reference to the string of characters.
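
The storing step at the end of claim 12 can be pictured as a lookup keyed by the stored string. The sketch below is an assumption-laden illustration: the claim speaks of storing "peculiarities" with a reference to the string, whereas this code simply stores the whole accepted phoneme sequence, and `PronunciationStore`, `phonemes_for` and `rule_based` are hypothetical names.

```python
from typing import Callable, Dict, List, Optional, Sequence


class PronunciationStore:
    """Keeps accepted variations, keyed by the stored string of characters."""

    def __init__(self) -> None:
        self._accepted: Dict[str, List[str]] = {}

    def remember(self, text: str, variation: Sequence[str]) -> None:
        # Store a copy so later changes by the caller do not leak in.
        self._accepted[text] = list(variation)

    def lookup(self, text: str) -> Optional[List[str]]:
        stored = self._accepted.get(text)
        return list(stored) if stored is not None else None


def phonemes_for(
    text: str,
    store: PronunciationStore,
    rule_based: Callable[[str], Sequence[str]],
) -> List[str]:
    """Prefer a previously accepted variation over the general rules."""
    stored = store.lookup(text)
    return stored if stored is not None else list(rule_based(text))
```

Once a variation has been accepted for a string, later reproductions of that string can skip the comparison and reuse the stored result.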

13. A reproduction method for voice-controlled systems with text-based speech synthesis, said reproduction method comprising the steps of:
- when there is actual spoken speech input that corresponds to a stored string of characters, comparing a converted string of characters to the speech input before reproduction of the string of characters described phonetically according to general rules and converted to a purely synthetic form;
  
  when a deviation is detected in the converted string of characters that has a value above a threshold value, creating at least one variation of the converted string of characters; and
  
  outputting said at least one variation of the converting string of characters having been created instead of the converted string of characters as long as the deviation of at least one variation of the converted string of characters having been from the speech input is below the threshold value when the two are compared, and wherein before comparing the speech input with the converted string of characters or the variation created from the converted string of characters, the speech input and the converted string of characters or the variation created will be segmented, wherein the same segmenting approach will be used to segment the speech input and the converted string of characters or the variation created from the converted string of characters, wherein the corresponding segments of the converted string of characters provided in segmented form and of the segmented speech input will be examined for common features and that the phoneme present in the segment of the converted train of characters will be replaced by a replacement phoneme when there is a deviation in two corresponding segments that is above the threshold value.

14. A reproduction method for voice-controlled systems with text-based speech synthesis, said reproduction method comprising the steps of:
- when there is actual spoken speech input that corresponds to a stored string of characters, comparing a converted string of characters to the speech input before reproduction of the string of characters described phonetically according to general rules and converted to a purely synthetic form;
  
  when a deviation is detected in the converted string of characters that has a value above a threshold value, creating at least one variation of the converted string of characters; and
  
  outputting at least one variation of the converting string of characters having been created instead of the converted string of characters as long as the deviation of at least one variation of the converted string of characters having been from the speech input is below the threshold value when the two are compared, wherein before comparing the speech input with the converted string of characters or the variation created from the converted string of characters, the speech input and the converted string of characters or the variation created will be segmented, wherein different segmenting approaches will be used to segment the speech input and the converted string of characters or the variation created from the converted string of characters, and wherein the corresponding segments of the converted string of characters provided in segmented form and of the segmented speech input will be examined for common features and that the phoneme present in the segment of the converted train of characters will be replaced by a replacement phoneme when there is a deviation in two corresponding segments that is above the threshold value.

15. A reproduction method for voice-controlled systems with text-based speech synthesis, said reproduction method comprising the steps of:
- when there is actual spoken speech input that corresponds to a stored string of characters, comparing a converted string of characters to the speech input before reproduction of the string of characters described phonetically according to general rules and converted to a purely synthetic form;
  
  when a deviation is detected in the converted string of characters that has a value above a threshold value, creating at least one variation of the converted string of characters; and
  
  outputting at least one variation of the converting string of characters having been created instead of the converted string of characters as long as the deviation of said at least one variation of the converted string of characters having been from the speech input is below the threshold value when the two are compared, wherein before comparing the speech input with the converted string of characters or the variation created from the converted string of characters, the speech input and the converted string of characters or the variation created will be segmented, wherein an explicit segmenting approach will be used to segment the converted string of characters or the variation created from the converted string of characters, and an implicit segmenting approach will be used to segment the speech input, and wherein the corresponding segments of the converted string of characters provided in segmented form and of the segmented speech input will be examined for common features and that the phoneme present in the segment of the converted train of characters will be replaced by a replacement phoneme when there is a deviation in two corresponding segments that is above the threshold value.
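
Claim 15 does not define "explicit" and "implicit" segmenting. One plausible reading, assumed here purely for illustration, is that the synthetic form can be segmented from phoneme durations the synthesizer already knows, while the speech input has to be segmented from the signal itself. The function names, the millisecond units and the energy-jump heuristic below are all assumptions.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[int, int]  # (start_ms, end_ms)


def explicit_segments(phoneme_durations_ms: Sequence[Tuple[str, int]]) -> List[Segment]:
    """Boundaries derived from known phoneme durations of the synthetic form."""
    segments: List[Segment] = []
    start = 0
    for _phoneme, duration in phoneme_durations_ms:
        segments.append((start, start + duration))
        start += duration
    return segments


def implicit_segments(
    frame_energy: Sequence[float], frame_ms: int, jump: float
) -> List[Segment]:
    """Boundaries inferred from the signal: split where energy jumps sharply."""
    if not frame_energy:
        return []
    boundaries = [0]
    for i in range(1, len(frame_energy)):
        # A large change between neighbouring frames starts a new segment.
        if abs(frame_energy[i] - frame_energy[i - 1]) > jump:
            boundaries.append(i)
    boundaries.append(len(frame_energy))
    return [(a * frame_ms, b * frame_ms) for a, b in zip(boundaries, boundaries[1:])]
```

Both functions return time spans in the same form, so corresponding segments could then be paired up for the comparison step.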

16. A reproduction method for voice-controlled systems with text-based speech synthesis, said reproduction method comprising the steps of:
- when there is actual spoken speech input that corresponds to a stored string of characters, comparing a converted string of characters to the speech input before reproduction of the string of characters described phonetically according to general rules and converted to a purely synthetic form;
  
  when a deviation is detected in the converted string of characters that has a value above a threshold value, creating at least one variation of the converted string of characters; and
  
  outputting at least one variation of the converting string of characters having been created instead of the converted string of characters as long as the deviation of at least one variation of the converted string of characters having been from the speech input is below the threshold value when the two are compared, wherein one variation of the converted string of characters is created by said creating step, and wherein said creating step will be executed at least one more time to create a new variation of the converted string of characters if in the outputting step the deviation of the variation from the speech input is always above the threshold value when the two are compared, and wherein as soon as a variation of a string of characters has been determined to be worthy of reproduction, the peculiarities arising in conjunction with the reproduction of the string of characters will be stored with a reference to the string of characters.
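
The repeated creating step in claim 16 amounts to a retry loop: one variation is created at a time, and a new one is created whenever the previous one still deviates too much. The sketch below is a hedged illustration; the `max_attempts` bound and the `create_variation(synthetic, attempt)` signature are assumptions, since the claim only requires that the step run "at least one more time".

```python
from typing import Callable, List, Optional, Sequence


def refine_until_accepted(
    synthetic: Sequence[str],
    speech_input: Sequence[str],
    create_variation: Callable[[Sequence[str], int], Sequence[str]],
    deviation: Callable[[Sequence[str], Sequence[str]], float],
    threshold: float,
    max_attempts: int = 5,
) -> Optional[List[str]]:
    """Create one variation per attempt until one falls below the threshold."""
    for attempt in range(max_attempts):
        variation = create_variation(synthetic, attempt)
        if deviation(variation, speech_input) < threshold:
            return list(variation)
    # Assumption: give up after max_attempts and let the caller decide.
    return None
```

Returning `None` leaves the fallback policy to the caller, for example reproducing the purely synthetic form.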

17. A reproduction method for voice-controlled systems with text-based speech synthesis, said reproduction method comprising the steps of:
- when there is actual spoken speech input that corresponds to a stored string of characters, comparing a converted string of characters to the speech input before reproduction of the string of characters described phonetically according to general rules and converted to a purely synthetic form;
  
  when a deviation is detected in the converted string of characters that has a value above a threshold value, creating at least one variation of the converted string of characters; and
  
  outputting at least one variation of the converting string of characters having been created instead of the converted string of characters as long as the deviation of at least one variation of the converted string of characters having been from the speech input is below the threshold value when the two are compared, wherein at least two variations of the converted string of characters will be created by said creating step, wherein when there is more than one variation of the converted string of characters having a deviation from the speech input that is below the threshold value, the variation of the converted string of characters with the smallest deviation from the speech input will be reproduced, and wherein as soon as a variation of a string of characters has been determined to be worthy of reproduction, the peculiarities arising in conjunction with the reproduction of the string of characters will be stored with a reference to the string of characters.
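
The selection rule in claim 17, where several variations fall below the threshold and the one with the smallest deviation is reproduced, reduces to a filtered minimum. A minimal sketch, assuming a caller-supplied `deviation` function and the hypothetical name `best_variation`:

```python
from typing import Callable, List, Optional, Sequence


def best_variation(
    variations: Sequence[Sequence[str]],
    speech_input: Sequence[str],
    deviation: Callable[[Sequence[str], Sequence[str]], float],
    threshold: float,
) -> Optional[List[str]]:
    """Pick the variation with the smallest deviation below the threshold."""
    scored = [(deviation(v, speech_input), v) for v in variations]
    accepted = [(d, v) for d, v in scored if d < threshold]
    if not accepted:
        # No variation is close enough to the speech input.
        return None
    # Compare on the deviation only, never on the phoneme lists themselves.
    return list(min(accepted, key=lambda pair: pair[0])[1])
```

If two variations tie, `min` keeps the first one encountered, so the order in which variations are created acts as the tie-breaker in this sketch.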

18. A reproduction method for voice-controlled systems with text-based speech synthesis, said reproduction method comprising the steps of:
- when there is actual spoken speech input that corresponds to a stored string of characters, comparing a converted string of characters to the speech input before reproduction of the string of characters described phonetically according to general rules and converted to a purely synthetic form;
  
  when a deviation is detected in the converted string of characters that has a value above a threshold value, creating at least one variation of the converted string of characters; and
  
  outputting at least one variation of the converting string of characters having been created instead of the converted string of characters as long as the deviation of at least one variation of the converted string of characters having been from the speech input is below the threshold value when the two are compared, wherein before comparing the speech input with the converted string of characters or the variation created from the converted string of characters, the speech input and the converted string of characters or the variation created will be segmented, and wherein as soon as a variation of a string of characters has been determined to be worthy of reproduction, the peculiarities arising in conjunction with the reproduction of the string of characters will be stored with a reference to the string of characters.

19. A reproduction method for voice-controlled systems with text-based speech synthesis, said reproduction method comprising the steps of:
- when there is actual spoken speech input that corresponds to a stored string of characters, comparing a converted string of characters to the speech input before reproduction of the string of characters described phonetically according to general rules and converted to a purely synthetic form;
  
  when a deviation is detected in the converted string of characters that has a value above a threshold value, creating at least one variation of the converted string of characters; and
  
  outputting at least one variation of the converting string of characters having been created instead of the converted string of characters as long as the deviation of at least one variation of the converted string of characters having been from the speech input is below the threshold value when the two are compared, wherein before comparing the speech input with the converted string of characters or the variation created from the converted string of characters, the speech input and the converted string of characters or the variation created will be segmented, wherein the same segmenting approach will be used to segment the speech input and the converted string of characters or the variation created from the converted string of characters, and wherein as soon as a variation of a string of characters has been determined to be worthy of reproduction, the peculiarities arising in conjunction with the reproduction of the string of characters will be stored with a reference to the string of characters.

20. A reproduction method for voice-controlled systems with text-based speech synthesis, said reproduction method comprising the steps of:
- when there is actual spoken speech input that corresponds to a stored string of characters, comparing a converted string of characters to the speech input before reproduction of the string of characters described phonetically according to general rules and converted to a purely synthetic form;
  
  when a deviation is detected in the converted string of characters that has a value above a threshold value, creating at least one variation of the converted string of characters; and
  
  outputting at least one variation of the converting string of characters having been created instead of the converted string of characters as long as the deviation of at least one variation of the converted string of characters having been from the speech input is below the threshold value when the two are compared, wherein before comparing the speech input with the converted string of characters or the variation created from the converted string of characters, the speech input and the converted string of characters or the variation created will be segmented, wherein different segmenting approaches will be used to segment the speech input and the converted string of characters or the variation created from the converted string of characters, and wherein as soon as a variation of a string of characters has been determined to be worthy of reproduction, the peculiarities arising in conjunction with the reproduction of the string of characters will be stored with a reference to the string of characters.

21. A reproduction method for voice-controlled systems with text-based speech synthesis, said reproduction method comprising the steps of:
- when there is actual spoken speech input that corresponds to a stored string of characters, comparing a converted string of characters to the speech input before reproduction of the string of characters described phonetically according to general rules and converted to a purely synthetic form;
  
  when a deviation is detected in the converted string of characters that has a value above a threshold value, creating at least one variation of the converted string of characters; and
  
  outputting at least one variation of the converting string of characters having been created instead of the converted string of characters as long as the deviation of at least one variation of the converted string of characters having been from the speech input is below the threshold value when the two are compared, wherein before comparing the speech input with the converted string of characters or the variation created from the converted string of characters, the speech input and the converted string of characters or the variation created will be segmented, wherein an explicit segmenting approach will be used to segment the converted string of characters or the variation created from the converted string of characters, and an implicit segmenting approach will be used to segment the speech input, and wherein as soon as a variation of a string of characters has been determined to be worthy of reproduction, the peculiarities arising in conjunction with the reproduction of the string of characters will be stored with a reference to the string of characters.

22. A reproduction method for voice-controlled systems with text-based speech synthesis, said reproduction method comprising the steps of:
- when there is actual spoken speech input that corresponds to a stored string of characters, comparing a converted string of characters to the speech input before reproduction of the string of characters described phonetically according to general rules and converted to a purely synthetic form;
  
  when a deviation is detected in the converted string of characters that has a value above a threshold value, creating at least one variation of the converted string of characters; and
  
  outputting at least one variation of the converting string of characters having been created instead of the converted string of characters as long as the deviation of at least one variation of the converted string of characters having been from the speech input is below the threshold value when the two are compared, wherein before comparing the speech input with the converted string of characters or the variation created from the converted string of characters, the speech input and the converted string of characters or the variation created will be segmented, wherein the corresponding segments of the converted string of characters provided in segmented form and of the segmented speech input will be examined for common features, wherein the phoneme present in the segment of the converted string of characters will be replaced by a replacement phoneme when there is a deviation in two corresponding segments that is above the threshold value, and wherein as soon as a variation of a string of characters has been determined to be worthy of reproduction, the peculiarities arising in conjunction with the reproduction of the string of characters will be stored with a reference to the string of characters.

23. A reproduction method for voice-controlled systems with text-based speech synthesis, said reproduction method comprising the steps of:
- when there is actual spoken speech input that corresponds to a stored string of characters, comparing a converted string of characters to the speech input before reproduction of the string of characters described phonetically according to general rules and converted to a purely synthetic form;
  
  when a deviation is detected in the converted string of characters that has a value above a threshold value, creating at least one variation of the converted string of characters; and
  
  outputting at least one variation of the converting string of characters having been created instead of the converted string of characters as long as the deviation of at least one variation of the converted string of characters having been from the speech input is below the threshold value when the two are compared, wherein before comparing the speech input with the converted string of characters or the variation created from the converted string of characters, the speech input and the converted string of characters or the variation created will be segmented, wherein the corresponding segments of the converted string of characters provided in segmented form and of the segmented speech input will be examined for common features, wherein the phoneme present in the segment of the converted string of characters will be replaced by a replacement phoneme when there is a deviation in two corresponding segments that is above the threshold value, and wherein each phoneme is linked to at least one replacement phoneme that is similar to the phoneme, wherein as soon as a variation of a string of characters has been determined to be worthy of reproduction, the peculiarities arising in conjunction with the reproduction of the string of characters will be stored with a reference to the string of characters.

Current Assignee: RPX Corporation
Original Assignee: Nokia Corporation
Inventors: Dufhues, Frank; Buth, Peter
Primary Examiner(s): MCFADDEN, SUSAN IRIS
Application Number: US09/564,787
Time in Patent Office: 1,068 Days
Field of Search: 704/260, 704/270, 704/275
US Class Current: 704/275
CPC Class Codes: G10L 13/04 Details of speech synthesis...