Combined statistical and rule-based part-of-speech tagging for text-to-speech synthesis
Abstract
In response to a word of a text sequence, a first part-of-speech (POS) tag is generated using a statistical POS tagger based on a corpus of trained text sequences, each representing a likely POS of a word for a given text sequence. A second POS tag is generated using a rule-based POS tagger based on a set of one or more rules associated with a type of an application associated with the text sequence. A final POS tag is assigned to the word of the text sequence for text-to-speech (TTS) synthesis based on the first POS tag and the second POS tag.
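The abstract describes two taggers whose outputs are combined into one final tag. The Python sketch below is a minimal, hypothetical illustration of that flow, not the patented method. The "statistical" tagger is a simple most-frequent-tag (unigram) model trained on a toy tagged corpus, the rule-based tagger is a hypothetical per-application lookup table `APP_RULES`, and the combination policy (use the rule tag whenever a rule fires) is an assumption that stands in for the confidence-based selection recited in the claims. Both taggers share one Penn-Treebank-style tag set here for simplicity. The example word `record` is a heteronym whose pronunciation depends on whether it is a noun or a verb, which is one reason POS matters for TTS.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

TaggedSentence = List[Tuple[str, str]]

# Toy training corpus (hypothetical); real statistical taggers use far larger corpora.
CORPUS: List[TaggedSentence] = [
    [("the", "DT"), ("record", "NN"), ("is", "VBZ"), ("old", "JJ")],
    [("a", "DT"), ("record", "NN")],
    [("record", "VB"), ("it", "PRP")],
]

# Hypothetical application-specific rules: word -> tag, keyed by application type.
APP_RULES: Dict[str, Dict[str, str]] = {
    "voice_recorder": {"record": "VB"},  # commands such as "record it"
}


def train_unigram(corpus: Iterable[TaggedSentence]) -> Dict[str, str]:
    """Map each word to the tag it carries most often in the corpus."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def tag_sequence(words: List[str], app_type: str,
                 model: Dict[str, str], default: str = "NN") -> List[str]:
    """Tag each word, preferring a rule-based tag when a rule fires."""
    rules = APP_RULES.get(app_type, {})
    tags = []
    for word in words:
        first = model.get(word.lower(), default)          # statistical tag
        second: Optional[str] = rules.get(word.lower())   # rule-based tag, if any
        tags.append(second if second is not None else first)
    return tags


model = train_unigram(CORPUS)
print(tag_sequence(["record", "it"], "voice_recorder", model))  # ['VB', 'PRP']
print(tag_sequence(["record", "it"], "e_reader", model))        # ['NN', 'PRP']
```

Unknown words fall back to a default tag (`NN`), which is another simplifying assumption of this sketch.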
17 Claims
1. A computer-implemented method for text-to-speech (TTS) synthesis, comprising:
in response to a word of a text sequence, generating a first part-of-speech (POS) tag using a statistical POS tagger based on a corpus of trained text sequences, each representing a likely POS of a word for a given text sequence, wherein the first POS tag is selected from a first POS tag set;
generating a second POS tag using a rule-based POS tagger based on a set of one or more rules associated with a type of an application associated with the text sequence, wherein the second POS tag is selected from a second POS tag set that is different from the first POS tag set;
calculating a first confidence score for the second POS tag based on statistical data from applying a rule associated with the second POS tag, wherein the first confidence score is calculated based on a percentage of successful applications of the rule in previous TTS synthesis;
designating the second POS tag as the final POS tag if the first confidence score is greater than or equal to a first predetermined threshold;
designating the first POS tag as the final POS tag if the first confidence score is less than the first predetermined threshold;
assigning a final POS tag to the word of the text sequence for TTS synthesis based on the first POS tag and the second POS tag;
adjusting the first confidence score for the rule for future TTS synthesis based on whether the second POS tag has been selected as the final POS tag; and
removing the rule from the set of one or more rules if the first confidence score is below a second predetermined threshold.
Dependent claims: 2, 3, 4, 5.
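Claim 1's selection, adjustment, and pruning steps can be read as bookkeeping over per-rule counts. The Python below is a hypothetical sketch of that reading: the class `RuleStats`, the function `choose_tag`, the threshold values, the rule identifiers, and the tag strings are all assumptions, and the claim does not say how a rule with no history is scored (the sketch assumes a prior of 1/1). Confidence is the fraction of the rule's previous applications that succeeded, where, following the claim's wording, an application counts as successful when the rule's tag was selected as the final tag.

```python
from dataclasses import dataclass
from typing import Dict, Set

# Hypothetical values; the claim only requires two predetermined thresholds.
SELECT_THRESHOLD = 0.6  # first predetermined threshold
REMOVE_THRESHOLD = 0.3  # second predetermined threshold


@dataclass
class RuleStats:
    """Application history of one rule in the rule-based tagger."""
    successes: int = 1     # assumed prior: a new rule starts at 1/1
    applications: int = 1

    @property
    def confidence(self) -> float:
        # Percentage (expressed as a fraction) of successful previous applications.
        return self.successes / self.applications

    def record(self, selected: bool) -> None:
        # Adjust the score for future synthesis: one more application,
        # counted as a success only if the rule's tag became the final tag.
        self.applications += 1
        if selected:
            self.successes += 1


def choose_tag(stat_tag: str, rule_tag: str, rule_id: str,
               stats: Dict[str, RuleStats], rules: Set[str]) -> str:
    """Select the final tag, update the rule's score, and prune weak rules."""
    rs = stats.setdefault(rule_id, RuleStats())
    use_rule = rs.confidence >= SELECT_THRESHOLD
    final_tag = rule_tag if use_rule else stat_tag
    rs.record(selected=use_rule)
    if rs.confidence < REMOVE_THRESHOLD:
        rules.discard(rule_id)  # drop the rule from the rule set
    return final_tag


stats = {"r1": RuleStats(9, 10), "r2": RuleStats(2, 5)}
rules = {"r1", "r2"}
print(choose_tag("NN", "VERB", "r1", stats, rules))  # VERB (0.9 >= 0.6)
print(choose_tag("NN", "NOUN", "r2", stats, rules))  # NN (0.4 < 0.6); now 2/6
print(choose_tag("NN", "NOUN", "r2", stats, rules))  # NN; now 2/7 < 0.3, r2 removed
print(sorted(rules))                                 # ['r1']
```

Under this literal reading, a rule whose score falls below the first threshold is never selected again, so its score can only decline until it is pruned. A system that counted success by some external check (for example, whether the tag later proved correct) would behave differently; the claim text does not specify such a check.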
6. A non-transitory machine-readable storage medium having instructions stored therein, which when executed by a machine, cause the machine to perform a method for text-to-speech (TTS) synthesis, the method comprising:
in response to a word of a text sequence, generating a first part-of-speech (POS) tag using a statistical POS tagger based on a corpus of trained text sequences, each representing a likely POS of a word for a given text sequence, wherein the first POS tag is selected from a first POS tag set;
generating a second POS tag using a rule-based POS tagger based on a set of one or more rules associated with a type of an application associated with the text sequence, wherein the second POS tag is selected from a second POS tag set that is different from the first POS tag set;
calculating a first confidence score for the second POS tag based on statistical data from applying a rule associated with the second POS tag, wherein the first confidence score is calculated based on a percentage of successful applications of the rule in previous TTS synthesis;
designating the second POS tag as the final POS tag if the first confidence score is greater than or equal to a first predetermined threshold;
designating the first POS tag as the final POS tag if the first confidence score is less than the first predetermined threshold;
assigning a final POS tag to the word of the text sequence for TTS synthesis based on the first POS tag and the second POS tag;
adjusting the first confidence score for the rule for future TTS synthesis based on whether the second POS tag has been selected as the final POS tag; and
removing the rule from the set of one or more rules if the first confidence score is below a second predetermined threshold.
Dependent claims: 7, 8, 9, 10.
11. A computer-implemented method for text-to-speech (TTS) synthesis, the method comprising:
in response to a word of a text sequence, generating a first part-of-speech (POS) tag using a statistical POS tagger based on a corpus of trained text sequences, each representing a likely POS of a word for a given text sequence, wherein the first POS tag is selected from a first POS tag set;
generating a second POS tag using a rule-based POS tagger based on a set of one or more rules associated with a type of an application associated with the text sequence, wherein the second POS tag is selected from a second POS tag set that is different from the first POS tag set;
converting the second POS tag to a corresponding tag in the first POS tag set; and
assigning a final POS tag to the word of the text sequence for TTS synthesis based on the first POS tag and the second POS tag.
Dependent claim: 12.
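Claim 11 converts the rule-based tag into the statistical tagger's tag set before the final tag is assigned. A minimal Python sketch follows, assuming a hypothetical navigation application whose rule tag set contains labels such as `STREET_NAME`, and a Penn-Treebank-style first tag set. The mapping table `RULE_TO_STAT`, the helper names, and the policy of preferring the converted rule tag are illustrative assumptions; the claim does not say how the two tags are weighed once they share a tag set.

```python
from typing import Dict, Optional

# Hypothetical mapping: second (rule-based) tag set -> first (statistical) tag set.
RULE_TO_STAT: Dict[str, str] = {
    "STREET_NAME": "NNP",
    "HOUSE_NUMBER": "CD",
    "ACTION": "VB",
}


def convert_rule_tag(rule_tag: str) -> Optional[str]:
    """Return the first-tag-set equivalent, or None if no mapping exists."""
    return RULE_TO_STAT.get(rule_tag)


def assign_final(stat_tag: str, rule_tag: Optional[str]) -> str:
    """Prefer the converted rule tag; fall back to the statistical tag."""
    if rule_tag is None:
        return stat_tag
    converted = convert_rule_tag(rule_tag)
    return converted if converted is not None else stat_tag


print(assign_final("NN", "STREET_NAME"))  # NNP
print(assign_final("VBZ", "UNKNOWN"))     # VBZ (no mapping, keep statistical tag)
print(assign_final("NN", None))           # NN (no rule fired)
```

Once both tags live in the first tag set they can be compared directly, for example to check whether the two taggers agree.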
13. A computer-implemented method for text-to-speech (TTS) synthesis, the method comprising:
in response to a word of a text sequence, generating a first part-of-speech (POS) tag using a statistical POS tagger based on a corpus of trained text sequences, each representing a likely POS of a word for a given text sequence, wherein the first POS tag is selected from a first POS tag set;
generating a second POS tag using a rule-based POS tagger based on a set of one or more rules associated with a type of an application associated with the text sequence, wherein the second POS tag is selected from a second POS tag set that is different from the first POS tag set;
converting the first POS tag to a corresponding tag in the second POS tag set; and
assigning a final POS tag to the word of the text sequence for TTS synthesis based on the first POS tag and the second POS tag.
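Claim 13 converts in the opposite direction, mapping the statistical tag into the rule-based tag set. When the statistical set is the finer-grained one, this direction is many-to-one. The Python sketch below assumes a Penn-Treebank-style first tag set and a hypothetical coarse second tag set (`NOUN`, `VERB`, and so on); it uses tag prefixes as a shortcut that happens to work for the Penn tag families listed, and, as a further assumption, keeps the rule tag whenever a rule fired.

```python
from typing import List, Optional, Tuple

# Hypothetical coarse second tag set, keyed by Penn Treebank tag prefixes.
PREFIX_TO_COARSE: List[Tuple[str, str]] = [
    ("NN", "NOUN"),  # NN, NNS, NNP, NNPS
    ("VB", "VERB"),  # VB, VBD, VBG, VBN, VBP, VBZ
    ("JJ", "ADJ"),   # JJ, JJR, JJS
    ("RB", "ADV"),   # RB, RBR, RBS
    ("CD", "NUM"),
]


def convert_stat_tag(penn_tag: str) -> str:
    """Map a fine-grained statistical tag onto the coarse rule tag set."""
    for prefix, coarse in PREFIX_TO_COARSE:
        if penn_tag.startswith(prefix):
            return coarse
    return "OTHER"  # assumed catch-all for unmapped tags


def assign_final(stat_tag: str, rule_tag: Optional[str]) -> str:
    """Keep the rule tag if a rule fired; otherwise use the converted tag."""
    return rule_tag if rule_tag is not None else convert_stat_tag(stat_tag)


print(convert_stat_tag("NNS"))     # NOUN
print(assign_final("VBD", None))   # VERB
print(assign_final("NN", "VERB"))  # VERB (rule tag wins)
```

The cost of this direction is lost detail: `NN` and `NNS` both become `NOUN`, so any distinction the statistical tagger drew between them is unavailable downstream.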
14. A computer-implemented method for text-to-speech (TTS) synthesis, the method comprising:
in response to a word of a text sequence, generating a first part-of-speech (POS) tag using a statistical POS tagger;
generating a second POS tag using a rule-based POS tagger;
calculating a confidence score for the second POS tag based on statistical data from applying a rule associated with the second POS tag;
assigning a final POS tag to the word of the text sequence for TTS synthesis, including:
  assigning the second POS tag as the final POS tag if the confidence score is greater than or equal to a first predetermined threshold; and
  assigning the first POS tag as the final POS tag if the confidence score is less than the first predetermined threshold;
adjusting the confidence score for the rule for future TTS synthesis based on whether the second POS tag has been selected as the final POS tag; and
removing the rule from the set of one or more rules if the confidence score is below a second predetermined threshold.
Dependent claims: 15, 16.
17. A system, comprising:
one or more processors; and
memory having instructions stored thereon, the instructions, when executed by the one or more processors, cause the processors to perform operations comprising:
  in response to a word of a text sequence, generating a first part-of-speech (POS) tag using a statistical POS tagger;
  generating a second POS tag using a rule-based POS tagger;
  calculating a confidence score for the second POS tag based on statistical data from applying a rule associated with the second POS tag;
  assigning a final POS tag to the word of the text sequence for TTS synthesis, including:
    assigning the second POS tag as the final POS tag if the confidence score is greater than or equal to a first predetermined threshold; and
    assigning the first POS tag as the final POS tag if the confidence score is less than the first predetermined threshold;
  adjusting the confidence score for the rule for future TTS synthesis based on whether the second POS tag has been selected as the final POS tag; and
  removing the rule from the set of one or more rules if the confidence score is below a second predetermined threshold.
Specification