Method and apparatus for generating dialog prosody structure, and speech synthesis method and system employing the same
Abstract
A dialog prosody structure generating method and apparatus, and a speech synthesis method and system employing the dialog prosody structure generation method and apparatus, are provided. The speech synthesis method using the dialog prosody structure generation method includes: determining a system speaking style based on a user utterance; if the system speaking style is dialog speech, generating dialog prosody information by reflecting discourse information between a user and a system; and synthesizing a system utterance based on the generated dialog prosody information.
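The control flow summarized in the abstract can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the patent: the speech act labels, the rule in `DIALOG_SPEECH_ACTS`, and the shape of the returned plan are assumptions chosen only to show the branch on speaking style.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SpeakingStyle(Enum):
    READ = "read speech"
    DIALOG = "dialog speech"


@dataclass
class UserUtterance:
    text: str
    speech_act: str  # hypothetical label, e.g. "wh-question" or "inform"


# Assumption: these speech acts call for a dialog-style reply.
DIALOG_SPEECH_ACTS = {"wh-question", "yes-no-question", "confirm"}


def determine_speaking_style(user: UserUtterance) -> SpeakingStyle:
    """Pick a system speaking style from the user's speech act."""
    if user.speech_act in DIALOG_SPEECH_ACTS:
        return SpeakingStyle.DIALOG
    return SpeakingStyle.READ


def plan_system_utterance(user: UserUtterance, system_text: str) -> dict:
    """Attach dialog prosody information only when the style is dialog speech."""
    style = determine_speaking_style(user)
    prosody: Optional[dict] = None
    if style is SpeakingStyle.DIALOG:
        # Placeholder for discourse-aware prosody; a real system would
        # fill in boundary levels, accents, and durations here.
        prosody = {"discourse_context": user.speech_act}
    return {"text": system_text, "style": style, "prosody": prosody}
```

A synthesizer would then consume the returned plan; that stage is omitted because the abstract does not describe its internals.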
25 Citations
25 Claims
1. A dialog prosody structure generating method comprising:
- generating discourse information based on a speech act of a user utterance for a semantic structure of a system utterance corresponding to the user utterance;
- generating prosody information including an utterance boundary level indicating a duration of a silent period between each semantic unit for the discourse information of the semantic structure, wherein the utterance boundary level is adjusted based on closeness between semantic units, which is determined by syntax and case; and
- generating an intonation pattern for the semantic structure of the system utterance based on the prosody information using by at least one computer system, wherein the speech act provides a speech act classification of the user utterance, so that even when speech acts are identical, the generated intonation pattern varies according to the speech act classification of the user utterance.
Dependent claims: 2, 3, 4, 5, 25
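As an illustration of the boundary-level element recited in claim 1, the sketch below lowers a baseline boundary level when two adjacent semantic units are close. The closeness rules, the level scale, and the pause durations in `PAUSE_MS` are all assumptions made for the example; the claim states only that closeness is determined by syntax and case and that the level indicates the duration of a silent period.

```python
from dataclasses import dataclass
from typing import List

# Assumed mapping from boundary level to silent-period duration (ms).
PAUSE_MS = {0: 0, 1: 50, 2: 150, 3: 300}


@dataclass
class SemanticUnit:
    text: str
    syntactic_role: str  # hypothetical: "modifier", "argument", "predicate", ...
    case: str            # hypothetical: "nominative", "accusative", "none"


def closeness(left: SemanticUnit, right: SemanticUnit) -> int:
    """Hypothetical closeness score derived from syntax and case."""
    score = 0
    if left.syntactic_role == "modifier":
        score += 1  # a modifier binds tightly to the unit it precedes
    if left.case == "accusative" and right.syntactic_role == "predicate":
        score += 1  # an object is close to its predicate
    return score


def boundary_levels(units: List[SemanticUnit], base_level: int = 2) -> List[int]:
    """One boundary level per gap between adjacent units; closer means lower."""
    return [
        max(0, base_level - closeness(left, right))
        for left, right in zip(units, units[1:])
    ]


units = [
    SemanticUnit("tomorrow", "adverbial", "none"),
    SemanticUnit("the meeting", "argument", "nominative"),
    SemanticUnit("room B", "argument", "accusative"),
    SemanticUnit("uses", "predicate", "none"),
]
levels = boundary_levels(units)
pauses = [PAUSE_MS[level] for level in levels]
```

In this example only the object and its predicate are judged close, so the pause before the predicate is shorter than the other two.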
6. A dialog prosody structure generating apparatus comprising:
- a dialog information database which manages an entire dialog between a user and a system, and stores information and a dialog history required to proceed with the dialog based on speech acts and intention;
- a discourse information generation unit which generates semantic units of a system utterance corresponding to a user utterance by referring to the dialog information database, and generates discourse information for each semantic unit based on a speech act of the user utterance;
- a prosody information generation unit which generates prosody information including an utterance boundary level indicating a duration of a silent period between each semantic unit, for the discourse information of each semantic unit, wherein the utterance boundary level is adjusted based on closeness between semantic units, which is determined by syntax and case; and
- an intonation pattern generation unit which generates an intonation pattern for each semantic unit based on the prosody information using by at least one computer system, wherein the speech act provides a speech act classification of the user utterance, so that even when speech acts are identical, the generated intonation pattern varies according to the speech act classification of the user utterance.
Dependent claims: 7, 8
9. A speech synthesis method comprising:
- determining a system speaking style based on a user utterance;
- generating dialog prosody information including an utterance boundary level indicating a duration of a silent period between each semantic unit by reflecting discourse information between a user and a system when the system speaking style is determined as dialog speech, wherein the utterance boundary level is adjusted based on closeness between semantic units, which is determined by syntax and case; and
- synthesizing a system utterance based on the generated dialog prosody information using by at least one computer system, wherein the speech act provides a speech act classification of the user utterance, so that even when speech acts are identical, the synthesized system utterance varies according to the speech act classification of the user utterance.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
17. A speech synthesis system comprising:
- a dialog information database which manages an entire dialog between a user and a system, and stores information and a dialog history required to proceed with the dialog based on speech acts and intention;
- a system speaking style determination unit which, by referring to the dialog information database, determines a system speaking style based on a user utterance;
- a dialog prosody generation unit which, when the system speaking style is determined as dialog speech, generates dialog prosody information including an utterance boundary level indicating a duration of a silent period between each semantic unit by referring to the dialog information database and reflecting discourse information between a user and the system, wherein the utterance boundary level is adjusted based on closeness between semantic units, which is determined by syntax and case; and
- a synthesis unit which synthesizes a system utterance based on the generated dialog prosody information using by at least one computer system, wherein the speech act provides a speech act classification of the user utterance, so that even when speech acts are identical, the synthesized system utterance varies according to the speech act classification of the user utterance.
Dependent claims: 18
19. A non-transitory computer-readable recording medium having embodied thereon a computer program used by at least one computer system for executing a dialog prosody structure generating method, the method comprising:
- generating discourse information based on a speech act of a user utterance for a semantic structure of a system utterance corresponding to the user utterance;
- generating prosody information including an utterance boundary level indicating a duration of a silent period between each semantic unit for the discourse information of the semantic structure, wherein the utterance boundary level is adjusted based on closeness between semantic units, which is determined by syntax and case; and
- generating an intonation pattern for the semantic structure of a system utterance based on the prosody information, wherein the speech act provides a speech act classification of the user utterance, so that even when speech acts are identical, the generated intonation pattern varies according to the speech act classification of the user utterance.
20. A non-transitory computer-readable recording medium having embodied thereon a computer program used by at least one computer system for executing a speech synthesis method, the method comprising:
- determining a system speaking style based on a user utterance;
- generating dialog prosody information including an utterance boundary level indicating a duration of a silent period between each semantic unit by reflecting discourse information between a user and a system when the system speaking style is determined as dialog speech, wherein the utterance boundary level is adjusted based on closeness between semantic units, which is determined by syntax and case; and
- synthesizing a system utterance based on the generated dialog prosody information, wherein the speech act provides a speech act classification of the user utterance, so that even when speech acts are identical, the synthesized system utterance varies according to the speech act classification of the user utterance.
21. A prosody structure generation apparatus comprising:
- a dialog information database which manages an entire dialog between a user and a system, and stores information and a dialog history required for the dialog to proceed based on a speech acts and intention;
- a system speaking style determination unit which determines a speech act and intention by analyzing a user utterance obtained through a speech recognition process with reference to the dialog information database, and determines the system speaking style as either read speech or dialog speech according to the determined speech act and intention associated with the user utterance; and
- a dialog prosody generation unit including a discourse information generation unit, a prosody information generation unit, and an intonation pattern generation unit,
- wherein the discourse information generation unit receives a user utterance from the system speaking style determination unit and generates a discourse information structure in which a different emphasis part is set according to whether the speech act and included semantic unit of a system utterance corresponding to whether the user utterance is new information or old information,
- wherein the prosody information generation unit receives discourse information structure from the discourse information generation unit, and a semantic structure, a sentence structure, and a morpheme structure of a system utterance, and generates prosody information in which an emphasis tag including an utterance boundary level, accent, and utterance duration is set on the basis of the types of semantic words, a closeness between polymorphemes, and a number of syllables that can be spoken at a time, wherein the utterance boundary level is adjusted based on closeness between semantic units, which is determined by syntax and case, and
- wherein the intonation pattern generation unit receives inputs of the semantic structure of a system utterance including prosody information, extracts a plurality of characteristics in each semantic unit and compares the plurality of characteristics with the characteristics of each semantic unit of an intonation pattern database with contents of characteristics of each semantic unit and their index, searches for a semantic unit having closest characteristics, and generates an intonation pattern according to a result of the search using by at least one computer system, wherein the speech act provides a speech act classification of the user utterance, so that even when speech acts are identical, the generated intonation pattern varies according to the speech act classification of the user utterance.
Dependent claims: 22, 23, 24
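The closest-characteristics search recited for the intonation pattern generation unit in claim 21 resembles a nearest-neighbor lookup. The sketch below is one possible reading, not the patented implementation: the three-number feature vector (syllable count, boundary level, accent flag), the Euclidean distance, and the contents of `INTONATION_DB` are assumptions introduced for illustration.

```python
import math
from typing import List, Sequence, Tuple

# Assumed database entries: (characteristics of a semantic unit, pattern label).
# Characteristics here are (syllable count, boundary level, accent flag).
INTONATION_DB: List[Tuple[Tuple[float, float, float], str]] = [
    ((2.0, 1.0, 0.0), "flat"),
    ((3.0, 2.0, 1.0), "rise-fall"),
    ((4.0, 3.0, 0.0), "final-fall"),
]


def closest_pattern(
    features: Sequence[float],
    database: List[Tuple[Tuple[float, float, float], str]] = INTONATION_DB,
) -> str:
    """Return the pattern of the database unit nearest to `features`."""
    _, pattern = min(database, key=lambda entry: math.dist(entry[0], features))
    return pattern


# Characteristics extracted from one semantic unit of the system utterance.
pattern = closest_pattern((3.0, 2.0, 0.8))
```

A linear scan is enough for a small table like this one; the claim mentions an index over the database, which a larger system would presumably use to avoid comparing against every entry.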
Specification