Text-based speech synthesis method containing synthetic speech comparisons and updates
Abstract
The invention specifies a simple reproduction method with improved pronunciation for voice-controlled systems with text-based speech synthesis, even when the stored string of characters to be synthesized does not follow the general rules of speech reproduction. The method avoids the current state-of-the-art approach of “copying” the original spoken input into the otherwise synthesized reproduction text, which is intended to significantly increase user acceptance of the voice-controlled system. More specifically, a stored string of characters is described phonetically according to general rules and converted to a purely synthetic form. When there is actual spoken speech input that corresponds to the stored string of characters, the converted string of characters is compared with the speech input before reproduction. When the converted string of characters is found to deviate from the speech input by a value above a threshold value, at least one variation of the converted string of characters is created. This variation is then output instead of the converted string of characters, provided that the variation deviates from the speech input by a value below the threshold value.
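The control flow described in the abstract can be illustrated with a minimal sketch. It is not taken from the patent: the names `choose_pronunciation`, `make_variations` and `mismatch_ratio` are hypothetical, phoneme sequences are modeled as lists of symbols, and the deviation measure is a toy stand-in, since the text above does not specify how the deviation is computed.

```python
from itertools import zip_longest
from typing import Callable, Iterable, List, Optional, Sequence


def mismatch_ratio(a: Sequence[str], b: Sequence[str]) -> float:
    """Toy deviation (an assumption): fraction of positions whose symbols differ."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    differing = sum(1 for x, y in zip_longest(a, b) if x != y)
    return differing / longest


def choose_pronunciation(
    synthetic: Sequence[str],
    spoken: Sequence[str],
    make_variations: Callable[[Sequence[str]], Iterable[Sequence[str]]],
    threshold: float,
    deviation: Callable[[Sequence[str], Sequence[str]], float] = mismatch_ratio,
) -> Optional[List[str]]:
    # Keep the rule-based synthetic form unless its deviation exceeds the threshold.
    if deviation(synthetic, spoken) <= threshold:
        return list(synthetic)
    # Otherwise try variations until one deviates by less than the threshold.
    for candidate in make_variations(synthetic):
        if deviation(candidate, spoken) < threshold:
            return list(candidate)
    # No acceptable variation was found; the fallback is left to the caller (assumption).
    return None
```

Returning `None` when no variation qualifies is a design choice of this sketch; the abstract does not say what is reproduced in that case.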
23 Claims
1. A reproduction method for voice-controlled systems with text-based speech synthesis, comprising the steps of:
converting a stored string of characters described phonetically according to general rules into a pure synthetic form;
if there is an actually spoken speech input that corresponds to said stored string of characters, comparing said pure synthetic form of said string of characters with said speech input before reproduction of said string of characters;
if a deviation is detected in said pure synthetic form of said string of characters that has a value greater than a threshold value, creating at least one variation of said pure synthetic form of said string of characters;
comparing one of said variations with said speech input; and
outputting one of said variations instead of said pure synthetic form of said string of characters, if the deviation of one of said variations from said speech input is less than said threshold value.
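2. A reproduction method according to claim 1, wherein one variation of the converted string of characters is created in said creating step and
wherein said creating step will be executed at least one more time to create a new variation of the converted string of characters if in said outputting step the deviation of the variation from the speech input is always above the threshold value when the two are compared.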
3. A method according to claim 2, wherein before comparing the speech input with the converted string of characters or the variation created from the converted string of characters, the speech input and the converted string of characters or the variation created will be segmented.
4. A reproduction method according to claim 1, wherein at least two variations of the converted string of characters will be created in said creating step and
wherein when there is more than one variation of the converted string of characters having a deviation from the speech input that is below the threshold value, the variation of the converted string of characters with the smallest deviation from the speech input will be reproduced.
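The selection rule of claim 4 can be sketched as follows. This is an illustration under assumptions, not the patented implementation: `best_variation` is a hypothetical name, and the `deviation` callable stands in for whatever comparison the system actually uses.

```python
from typing import Callable, Iterable, List, Optional, Sequence, Tuple


def best_variation(
    variations: Iterable[Sequence[str]],
    spoken: Sequence[str],
    deviation: Callable[[Sequence[str], Sequence[str]], float],
    threshold: float,
) -> Optional[List[str]]:
    # Score every variation against the speech input.
    scored: List[Tuple[float, Sequence[str]]] = [
        (deviation(v, spoken), v) for v in variations
    ]
    # Keep only variations whose deviation is below the threshold.
    acceptable = [(d, v) for d, v in scored if d < threshold]
    if not acceptable:
        return None
    # Among the acceptable ones, reproduce the one with the smallest deviation.
    return list(min(acceptable, key=lambda pair: pair[0])[1])
```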
5. A method according to claim 4, wherein before comparing the speech input with the converted string of characters or the variation(s) created from the converted string of characters, the speech input and the converted string of characters or the variation created will be segmented.
6. A method according to claim 1, wherein before comparing the speech input with the converted string of characters or the variation(s) created from it, the speech input and the converted string of characters or the variation(s) created will be segmented.
7. A reproduction method according to claim 6, wherein the same segmenting approach will be used to segment the speech input and the converted string of characters or the variation created from the converted string of characters.
8. A reproduction method according to claim 6, wherein different segmenting approaches will be used to segment the speech input and the converted string of characters or the variation created from the converted string of characters.
9. A reproduction method according to claim 6, wherein an explicit segmenting approach will be used to segment the converted string of characters or the variation created from the converted string of characters, and an implicit segmenting approach will be used to segment the speech input.
10. A reproduction method according to claim 6, wherein the corresponding segments of the converted string of characters provided in segmented form and of the segmented speech input will be examined for common features, and
wherein the phoneme present in the segment of the converted string of characters will be replaced by a replacement phoneme when there is a deviation in two corresponding segments that is above the threshold value.
11. A reproduction method according to claim 10, wherein each phoneme is linked to at least one replacement phoneme that is similar to the phoneme.
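The segment-wise replacement of claims 10 and 11 can be sketched as below. Several points are assumptions made for illustration: the function name `replace_deviating_phonemes`, the one-to-one alignment of segments, the use of phoneme labels for the spoken segments, and taking the first linked replacement. The claims only require that each phoneme is linked to at least one similar replacement phoneme.

```python
from typing import Callable, Dict, List, Sequence


def replace_deviating_phonemes(
    synthetic_segments: Sequence[str],
    spoken_segments: Sequence[str],
    segment_deviation: Callable[[str, str], float],
    threshold: float,
    replacements: Dict[str, List[str]],
) -> List[str]:
    # Assumption: segmentation has already aligned the two sequences one-to-one.
    if len(synthetic_segments) != len(spoken_segments):
        raise ValueError("segments must be aligned one-to-one")
    varied: List[str] = []
    for phoneme, heard in zip(synthetic_segments, spoken_segments):
        linked = replacements.get(phoneme, [])
        if segment_deviation(phoneme, heard) > threshold and linked:
            # Deviation above the threshold: swap in a linked, similar phoneme.
            varied.append(linked[0])
        else:
            # Segment is close enough, or no replacement is linked: keep it.
            varied.append(phoneme)
    return varied
```

The result is one variation of the converted string of characters; it would still have to be compared with the speech input as a whole before being output.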
12. A reproduction method for voice-controlled systems with text-based speech synthesis, said reproduction method comprising the steps of:
when there is actual spoken speech input that corresponds to a stored string of characters, comparing a converted string of characters to the speech input before reproduction of the string of characters described phonetically according to general rules and converted to a purely synthetic form;
when a deviation is detected in the converted string of characters that has a value above a threshold value, creating at least one variation of the converted string of characters; and
outputting the at least one variation of the converted string of characters having been created instead of the converted string of characters as long as the deviation of said at least one variation from the speech input is below the threshold value when the two are compared,
wherein as soon as a variation of a string of characters has been determined to be worthy of reproduction, the peculiarities arising in conjunction with the reproduction of the string of characters will be stored with a reference to the string of characters.
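The storing step of claim 12 can be pictured as a lookup table keyed by the stored string of characters. This is a minimal sketch under assumptions: the class `PronunciationStore` and its methods are hypothetical, and the "peculiarities" are modeled simply as the accepted phoneme sequence, which the claim does not prescribe.

```python
from typing import Dict, Sequence, Tuple


class PronunciationStore:
    """Remembers accepted variations, referenced by the stored string of characters."""

    def __init__(self) -> None:
        self._by_text: Dict[str, Tuple[str, ...]] = {}

    def remember(self, text: str, accepted: Sequence[str]) -> None:
        # Store the accepted variation with a reference to the string of characters.
        self._by_text[text] = tuple(accepted)

    def lookup(self, text: str, rule_based: Sequence[str]) -> Tuple[str, ...]:
        # Reuse a stored variation if one exists, else fall back to the rule-based form.
        return self._by_text.get(text, tuple(rule_based))
```

With such a table, the comparison against the speech input would not need to be repeated for a string whose variation has already been accepted.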
13. A reproduction method for voice-controlled systems with text-based speech synthesis, said reproduction method comprising the steps of:
when there is actual spoken speech input that corresponds to a stored string of characters, comparing a converted string of characters to the speech input before reproduction of the string of characters described phonetically according to general rules and converted to a purely synthetic form;
when a deviation is detected in the converted string of characters that has a value above a threshold value, creating at least one variation of the converted string of characters; and
outputting said at least one variation of the converted string of characters having been created instead of the converted string of characters as long as the deviation of said at least one variation from the speech input is below the threshold value when the two are compared,
wherein before comparing the speech input with the converted string of characters or the variation created from the converted string of characters, the speech input and the converted string of characters or the variation created will be segmented,
wherein the same segmenting approach will be used to segment the speech input and the converted string of characters or the variation created from the converted string of characters,
wherein the corresponding segments of the converted string of characters provided in segmented form and of the segmented speech input will be examined for common features, and
wherein the phoneme present in the segment of the converted string of characters will be replaced by a replacement phoneme when there is a deviation in two corresponding segments that is above the threshold value.
14. A reproduction method for voice-controlled systems with text-based speech synthesis, said reproduction method comprising the steps of:
when there is actual spoken speech input that corresponds to a stored string of characters, comparing a converted string of characters to the speech input before reproduction of the string of characters described phonetically according to general rules and converted to a purely synthetic form;
when a deviation is detected in the converted string of characters that has a value above a threshold value, creating at least one variation of the converted string of characters; and
outputting the at least one variation of the converted string of characters having been created instead of the converted string of characters as long as the deviation of said at least one variation from the speech input is below the threshold value when the two are compared,
wherein before comparing the speech input with the converted string of characters or the variation created from the converted string of characters, the speech input and the converted string of characters or the variation created will be segmented,
wherein different segmenting approaches will be used to segment the speech input and the converted string of characters or the variation created from the converted string of characters,
wherein the corresponding segments of the converted string of characters provided in segmented form and of the segmented speech input will be examined for common features, and
wherein the phoneme present in the segment of the converted string of characters will be replaced by a replacement phoneme when there is a deviation in two corresponding segments that is above the threshold value.
15. A reproduction method for voice-controlled systems with text-based speech synthesis, said reproduction method comprising the steps of:
when there is actual spoken speech input that corresponds to a stored string of characters, comparing a converted string of characters to the speech input before reproduction of the string of characters described phonetically according to general rules and converted to a purely synthetic form;
when a deviation is detected in the converted string of characters that has a value above a threshold value, creating at least one variation of the converted string of characters; and
outputting the at least one variation of the converted string of characters having been created instead of the converted string of characters as long as the deviation of said at least one variation from the speech input is below the threshold value when the two are compared,
wherein before comparing the speech input with the converted string of characters or the variation created from the converted string of characters, the speech input and the converted string of characters or the variation created will be segmented,
wherein an explicit segmenting approach will be used to segment the converted string of characters or the variation created from the converted string of characters, and an implicit segmenting approach will be used to segment the speech input,
wherein the corresponding segments of the converted string of characters provided in segmented form and of the segmented speech input will be examined for common features, and
wherein the phoneme present in the segment of the converted string of characters will be replaced by a replacement phoneme when there is a deviation in two corresponding segments that is above the threshold value.
16. A reproduction method for voice-controlled systems with text-based speech synthesis, said reproduction method comprising the steps of:
when there is actual spoken speech input that corresponds to a stored string of characters, comparing a converted string of characters to the speech input before reproduction of the string of characters described phonetically according to general rules and converted to a purely synthetic form;
when a deviation is detected in the converted string of characters that has a value above a threshold value, creating at least one variation of the converted string of characters; and
outputting the at least one variation of the converted string of characters having been created instead of the converted string of characters as long as the deviation of said at least one variation from the speech input is below the threshold value when the two are compared,
wherein one variation of the converted string of characters is created by said creating step,
wherein said creating step will be executed at least one more time to create a new variation of the converted string of characters if in the outputting step the deviation of the variation from the speech input is always above the threshold value when the two are compared, and
wherein as soon as a variation of a string of characters has been determined to be worthy of reproduction, the peculiarities arising in conjunction with the reproduction of the string of characters will be stored with a reference to the string of characters.
17. A reproduction method for voice-controlled systems with text-based speech synthesis, said reproduction method comprising the steps of:
when there is actual spoken speech input that corresponds to a stored string of characters, comparing a converted string of characters to the speech input before reproduction of the string of characters described phonetically according to general rules and converted to a purely synthetic form;
when a deviation is detected in the converted string of characters that has a value above a threshold value, creating at least one variation of the converted string of characters; and
outputting the at least one variation of the converted string of characters having been created instead of the converted string of characters as long as the deviation of said at least one variation from the speech input is below the threshold value when the two are compared,
wherein at least two variations of the converted string of characters will be created by said creating step,
wherein when there is more than one variation of the converted string of characters having a deviation from the speech input that is below the threshold value, the variation of the converted string of characters with the smallest deviation from the speech input will be reproduced, and
wherein as soon as a variation of a string of characters has been determined to be worthy of reproduction, the peculiarities arising in conjunction with the reproduction of the string of characters will be stored with a reference to the string of characters.
18. A reproduction method for voice-controlled systems with text-based speech synthesis, said reproduction method comprising the steps of:
when there is actual spoken speech input that corresponds to a stored string of characters, comparing a converted string of characters to the speech input before reproduction of the string of characters described phonetically according to general rules and converted to a purely synthetic form;
when a deviation is detected in the converted string of characters that has a value above a threshold value, creating at least one variation of the converted string of characters; and
outputting the at least one variation of the converted string of characters having been created instead of the converted string of characters as long as the deviation of said at least one variation from the speech input is below the threshold value when the two are compared,
wherein before comparing the speech input with the converted string of characters or the variation created from the converted string of characters, the speech input and the converted string of characters or the variation created will be segmented, and
wherein as soon as a variation of a string of characters has been determined to be worthy of reproduction, the peculiarities arising in conjunction with the reproduction of the string of characters will be stored with a reference to the string of characters.
19. A reproduction method for voice-controlled systems with text-based speech synthesis, said reproduction method comprising the steps of:
when there is actual spoken speech input that corresponds to a stored string of characters, comparing a converted string of characters to the speech input before reproduction of the string of characters described phonetically according to general rules and converted to a purely synthetic form;
when a deviation is detected in the converted string of characters that has a value above a threshold value, creating at least one variation of the converted string of characters; and
outputting the at least one variation of the converted string of characters having been created instead of the converted string of characters as long as the deviation of said at least one variation from the speech input is below the threshold value when the two are compared,
wherein before comparing the speech input with the converted string of characters or the variation created from the converted string of characters, the speech input and the converted string of characters or the variation created will be segmented,
wherein the same segmenting approach will be used to segment the speech input and the converted string of characters or the variation created from the converted string of characters, and
wherein as soon as a variation of a string of characters has been determined to be worthy of reproduction, the peculiarities arising in conjunction with the reproduction of the string of characters will be stored with a reference to the string of characters.
20. A reproduction method for voice-controlled systems with text-based speech synthesis, said reproduction method comprising the steps of:
when there is actual spoken speech input that corresponds to a stored string of characters, comparing a converted string of characters to the speech input before reproduction of the string of characters described phonetically according to general rules and converted to a purely synthetic form;
when a deviation is detected in the converted string of characters that has a value above a threshold value, creating at least one variation of the converted string of characters; and
outputting the at least one variation of the converted string of characters having been created instead of the converted string of characters as long as the deviation of said at least one variation from the speech input is below the threshold value when the two are compared,
wherein before comparing the speech input with the converted string of characters or the variation created from the converted string of characters, the speech input and the converted string of characters or the variation created will be segmented,
wherein different segmenting approaches will be used to segment the speech input and the converted string of characters or the variation created from the converted string of characters, and
wherein as soon as a variation of a string of characters has been determined to be worthy of reproduction, the peculiarities arising in conjunction with the reproduction of the string of characters will be stored with a reference to the string of characters.
21. A reproduction method for voice-controlled systems with text-based speech synthesis, said reproduction method comprising the steps of:
when there is actual spoken speech input that corresponds to a stored string of characters, comparing a converted string of characters to the speech input before reproduction of the string of characters described phonetically according to general rules and converted to a purely synthetic form;
when a deviation is detected in the converted string of characters that has a value above a threshold value, creating at least one variation of the converted string of characters; and
outputting the at least one variation of the converted string of characters having been created instead of the converted string of characters as long as the deviation of said at least one variation from the speech input is below the threshold value when the two are compared,
wherein before comparing the speech input with the converted string of characters or the variation created from the converted string of characters, the speech input and the converted string of characters or the variation created will be segmented,
wherein an explicit segmenting approach will be used to segment the converted string of characters or the variation created from the converted string of characters, and an implicit segmenting approach will be used to segment the speech input, and
wherein as soon as a variation of a string of characters has been determined to be worthy of reproduction, the peculiarities arising in conjunction with the reproduction of the string of characters will be stored with a reference to the string of characters.
22. A reproduction method for voice-controlled systems with text-based speech synthesis, said reproduction method comprising the steps of:
when there is actual spoken speech input that corresponds to a stored string of characters, comparing a converted string of characters to the speech input before reproduction of the string of characters described phonetically according to general rules and converted to a purely synthetic form;
when a deviation is detected in the converted string of characters that has a value above a threshold value, creating at least one variation of the converted string of characters; and
outputting the at least one variation of the converted string of characters having been created instead of the converted string of characters as long as the deviation of said at least one variation from the speech input is below the threshold value when the two are compared,
wherein before comparing the speech input with the converted string of characters or the variation created from the converted string of characters, the speech input and the converted string of characters or the variation created will be segmented,
wherein the corresponding segments of the converted string of characters provided in segmented form and of the segmented speech input will be examined for common features,
wherein the phoneme present in the segment of the converted string of characters will be replaced by a replacement phoneme when there is a deviation in two corresponding segments that is above the threshold value, and
wherein as soon as a variation of a string of characters has been determined to be worthy of reproduction, the peculiarities arising in conjunction with the reproduction of the string of characters will be stored with a reference to the string of characters.
23. A reproduction method for voice-controlled systems with text-based speech synthesis, said reproduction method comprising the steps of:
when there is actual spoken speech input that corresponds to a stored string of characters, comparing a converted string of characters to the speech input before reproduction of the string of characters described phonetically according to general rules and converted to a purely synthetic form;
when a deviation is detected in the converted string of characters that has a value above a threshold value, creating at least one variation of the converted string of characters; and
outputting the at least one variation of the converted string of characters having been created instead of the converted string of characters as long as the deviation of said at least one variation from the speech input is below the threshold value when the two are compared,
wherein before comparing the speech input with the converted string of characters or the variation created from the converted string of characters, the speech input and the converted string of characters or the variation created will be segmented,
wherein the corresponding segments of the converted string of characters provided in segmented form and of the segmented speech input will be examined for common features,
wherein the phoneme present in the segment of the converted string of characters will be replaced by a replacement phoneme when there is a deviation in two corresponding segments that is above the threshold value,
wherein each phoneme is linked to at least one replacement phoneme that is similar to the phoneme, and
wherein as soon as a variation of a string of characters has been determined to be worthy of reproduction, the peculiarities arising in conjunction with the reproduction of the string of characters will be stored with a reference to the string of characters.
Specification