Method and apparatus for generating dialog prosody structure, and speech synthesis method and system employing the same

US 8,234,118 B2
Filed: 05/19/2005
Issued: 07/31/2012
Est. Priority Date: 05/21/2004
Status: Expired due to Fees

Abstract

A dialog prosody structure generating method and apparatus, and a speech synthesis method and system employing the dialog prosody structure generation method and apparatus, are provided. The speech synthesis method using the dialog prosody structure generation method includes: determining a system speaking style based on a user utterance; if the system speaking style is dialog speech, generating dialog prosody information by reflecting discourse information between a user and a system; and synthesizing a system utterance based on the generated dialog prosody information.

25 Claims

1. A dialog prosody structure generating method comprising:
- generating discourse information based on a speech act of a user utterance for a semantic structure of a system utterance corresponding to the user utterance;
  
  generating prosody information including an utterance boundary level indicating a duration of a silent period between each semantic unit for the discourse information of the semantic structure, wherein the utterance boundary level is adjusted based on closeness between semantic units, which is determined by syntax and case; and
  
  generating an intonation pattern for the semantic structure of the system utterance based on the prosody information using by at least one computer system, wherein the speech act provides a speech act classification of the user utterance, so that even when speech acts are identical, the generated intonation pattern varies according to the speech act classification of the user utterance.
- - 2. The method of claim 1, further comprising adjusting an emphasis tag on repeated information when the semantic structure of a current system utterance is identical to that of a previous system utterance.
  - 3. The method of claim 1, further comprising adjusting an emphasis tag on repeated information when the semantic structure and a surface structure of a current system utterance are identical to those of a previous system utterance.
  - 4. The method of claim 1, wherein generating discourse information includes:
    - analyzing the semantic structure of the system utterance corresponding to the user utterance by referring to a dialog database, and generating semantic units;
      
      selecting a semantic unit to be emphasized based on the speech act of the user utterance, and adding a first emphasis tag; and
      
      generating a discourse information structure by combining the semantic unit to which the first emphasis tag is added with the remaining semantic units.
  - 5. The method of claim 1, wherein generating prosody information includes:
    - setting a different utterance boundary level depending on whether a semantic unit of the system utterance is new information or old information;
      
      adjusting the utterance boundary level according to closeness between semantic units;
      
      readjusting the utterance boundary level based on the number of syllables capable of being spoken at one time; and
      
      generating prosody information by adding the readjusted utterance boundary level, an accent, and a speech duration time, as a second emphasis tag, to each semantic unit.
  - 25. The method of claim 1, wherein the speech act classification of the user utterance includes one of an interrogatory, an answer to an interrogatory, a statement to inform, a greeting, and a request.
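
The steps recited in claim 5 can be read as a small pipeline over the semantic units of a system utterance. The following Python sketch is illustrative only: the numeric levels, the closeness scale and threshold, the syllable limit, and all names (`SemanticUnit`, `boundary_levels`, `MAX_SYLLABLES_PER_BREATH`) are assumptions made for this example and are not taken from the patent. The accent and speech duration parts of the second emphasis tag are omitted.

```python
from dataclasses import dataclass

# Hypothetical scale: 0 = no pause, 3 = longest pause. Not from the patent.
MAX_LEVEL = 3
MAX_SYLLABLES_PER_BREATH = 12  # assumed limit on syllables spoken at one time


@dataclass
class SemanticUnit:
    text: str
    syllables: int
    is_new: bool              # new vs. old information
    closeness_to_next: float  # 0.0 (loose) .. 1.0 (tight); assumed to be derived from syntax and case


def boundary_levels(units):
    """Return one boundary level per unit (the pause that follows that unit)."""
    levels = []
    run = 0  # syllables spoken since the last strong boundary
    for i, unit in enumerate(units):
        last = i == len(units) - 1
        # Step 1: the initial level depends on new vs. old information.
        level = 2 if unit.is_new else 1
        # Step 2: closely bound units get a weaker boundary between them.
        if not last and unit.closeness_to_next >= 0.5:
            level = max(0, level - 1)
        # Step 3: readjust when too many syllables would be spoken at one time.
        run += unit.syllables
        if not last and run + units[i + 1].syllables > MAX_SYLLABLES_PER_BREATH:
            level = max(level, 2)
        if last:
            level = MAX_LEVEL  # assumed: the utterance ends with the strongest boundary
        if level >= 2:
            run = 0
        levels.append(level)
    return levels
```

In this sketch a unit of old information that is tightly bound to its successor receives level 0, while a long run of syllables forces a level 2 boundary regardless of closeness.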

6. A dialog prosody structure generating apparatus comprising:
- a dialog information database which manages an entire dialog between a user and a system, and stores information and a dialog history required to proceed with the dialog based on speech acts and intention;
  
  a discourse information generation unit which generates semantic units of a system utterance corresponding to a user utterance by referring to the dialog information database, and generates discourse information for each semantic unit based on a speech act of the user utterance;
  
  a prosody information generation unit which generates prosody information including an utterance boundary level indicating a duration of a silent period between each semantic unit, for the discourse information of each semantic unit, wherein the utterance boundary level is adjusted based on closeness between semantic units, which is determined by syntax and case; and
  
  an intonation pattern generation unit which generates an intonation pattern for each semantic unit based on the prosody information using by at least one computer system, wherein the speech act provides a speech act classification of the user utterance, so that even when speech acts are identical, the generated intonation pattern varies according to the speech act classification of the user utterance.
- - 7. The apparatus of claim 6, further comprising a repeated information application unit which, by referring to the dialog information database, adjusts an emphasis tag on repeated information when the semantic structure of a current system utterance is identical to that of a previous system utterance.
  - 8. The apparatus of claim 6, further comprising a repeated information application unit which, by referring to the dialog information database, adjusts an emphasis tag on repeated information when the semantic structure and a surface structure of a current system utterance are identical to those of a previous system utterance.
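
The repeated information check of claims 7 and 8 amounts to comparing the current system utterance with the previous one. The sketch below is a minimal illustration under stated assumptions: the `SystemUtterance` representation, the `require_same_surface` flag, and the choice to mark the tag with `extended_pitch_range` are all hypothetical. The claims above say only that an emphasis tag is adjusted.

```python
from dataclasses import dataclass, field


@dataclass
class SystemUtterance:
    semantic: tuple   # assumed encoding, e.g. (("city", "Seoul"),)
    surface: str      # the actual word string
    emphasis: dict = field(default_factory=dict)  # unit name -> emphasis tag dict


def apply_repeated_information(current, previous, require_same_surface=False):
    """Adjust emphasis tags in `current` if it repeats `previous`.

    require_same_surface=False mirrors claim 7 (semantic structure only);
    True mirrors claim 8 (semantic and surface structure).
    Returns True if an adjustment was made.
    """
    if previous is None or current.semantic != previous.semantic:
        return False
    if require_same_surface and current.surface != previous.surface:
        return False
    for tag in current.emphasis.values():
        # Assumed adjustment: flag the tag so a later stage can widen the pitch range.
        tag["extended_pitch_range"] = True
    return True
```

The previous utterance would come from the dialog history held in the dialog information database. Here it is simply passed in as an argument.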

9. A speech synthesis method comprising:
- determining a system speaking style based on a user utterance;
  
  generating dialog prosody information including an utterance boundary level indicating a duration of a silent period between each semantic unit by reflecting discourse information between a user and a system when the system speaking style is determined as dialog speech, wherein the utterance boundary level is adjusted based on closeness between semantic units, which is determined by syntax and case; and
  
  synthesizing a system utterance based on the generated dialog prosody information using by at least one computer system, wherein the speech act provides a speech act classification of the user utterance, so that even when speech acts are identical, the synthesized system utterance varies according to the speech act classification of the user utterance.
- - 10. The method of claim 9, further comprising synthesizing a system utterance based on prosody information generated according to a rule when the system speaking style is determined as read speech.
  - 11. The method of claim 9, wherein determining the system speaking style includes:
    - determining a speech act and intention associated with the user utterance by referring to a dialog information database; and
      
      determining the system speaking style as one of read speech and dialog speech according to the speech act and intention associated with the user utterance.
  - 12. The method of claim 9, wherein generating dialog prosody information includes:
    - generating discourse information for the semantic structure of the system utterance based on the speech act of the user utterance;
      
      generating prosody information, including an utterance boundary level, for the discourse information of the semantic structure; and
      
      generating an intonation pattern for the semantic structure of the system utterance based on the prosody information.
  - 13. The method of claim 12, wherein generating dialog prosody information includes adjusting an emphasis tag on repeated information when the semantic structure of a current system utterance is identical to that of a previous system utterance.
  - 14. The method of claim 12, wherein generating dialog prosody information includes adjusting an emphasis tag on repeated information when the semantic structure and a surface structure of a current system utterance are identical to those of a previous system utterance.
  - 15. The method of claim 12, wherein generating discourse information includes:
    - generating semantic units by referring to a dialog information database and analyzing the semantic structure of the system utterance corresponding to the user utterance;
      
      selecting a semantic unit to be emphasized based on the speech act of the user utterance, and adding a first emphasis tag to the semantic unit; and
      
      generating a discourse information structure by combining the semantic unit to which the first emphasis tag is added with the remaining semantic units.
  - 16. The method of claim 15, wherein adding the first emphasis tag comprises:
    - setting a different utterance boundary level depending on whether a semantic unit of the system utterance is new information or old information;
      
      adjusting the utterance boundary level according to a closeness between the semantic units;
      
      readjusting the utterance boundary level based on the number of syllables capable of being spoken at one time; and
      
      generating prosody information by adding the adjusted utterance boundary level, an accent, and a speech duration time, as a second emphasis tag, to each semantic unit.
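
The control flow of claims 9 to 11 (choose a speaking style, then route to dialog or rule-based prosody generation) might be sketched as follows. This is not the patented implementation: the lookup table `DIALOG_PAIRS`, its entries, and the function names are hypothetical, and the prosody generators and synthesizer are passed in as plain callables.

```python
from enum import Enum


class SpeakingStyle(Enum):
    READ = "read"
    DIALOG = "dialog"


# Hypothetical criterion: (speech act, intention) pairs that call for dialog speech.
DIALOG_PAIRS = {("ask", "confirm"), ("request", "repeat")}


def determine_style(speech_act, intention):
    """Pick read or dialog speech from the user's speech act and intention."""
    if (speech_act, intention) in DIALOG_PAIRS:
        return SpeakingStyle.DIALOG
    return SpeakingStyle.READ


def synthesize(system_text, speech_act, intention, dialog_prosody, rule_prosody, synthesizer):
    """Route the system utterance through dialog or rule-based prosody generation."""
    style = determine_style(speech_act, intention)
    if style is SpeakingStyle.DIALOG:
        prosody = dialog_prosody(system_text, speech_act)
    else:
        prosody = rule_prosody(system_text)
    return synthesizer(system_text, prosody)
```

A fixed table is used here only to keep the example short. Any criterion that maps speech act and intention to one of the two styles would fit the same shape.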

17. A speech synthesis system comprising:
- a dialog information database which manages an entire dialog between a user and a system, and stores information and a dialog history required to proceed with the dialog based on speech acts and intention;
  
  a system speaking style determination unit which, by referring to the dialog information database, determines a system speaking style based on a user utterance;
  
  a dialog prosody generation unit which, when the system speaking style is determined as dialog speech, generates dialog prosody information including an utterance boundary level indicating a duration of a silent period between each semantic unit by referring to the dialog information database and reflecting discourse information between a user and the system, wherein the utterance boundary level is adjusted based on closeness between semantic units, which is determined by syntax and case; and
  
  a synthesis unit which synthesizes a system utterance based on the generated dialog prosody information using by at least one computer system, wherein the speech act provides a speech act classification of the user utterance, so that even when speech acts are identical, the synthesized system utterance varies according to the speech act classification of the user utterance.
- - 18. The system of claim 17, further comprising a system utterance generation unit which, when the system speaking style is determined as read speech, generates a system utterance corresponding to the user utterance by referring to the dialog information database and provides the generated system utterance to the synthesis unit.

19. A non-transitory computer-readable recording medium having embodied thereon a computer program used by at least one computer system for executing a dialog prosody structure generating method, the method comprising:
- generating discourse information based on a speech act of a user utterance for a semantic structure of a system utterance corresponding to the user utterance;
  
  generating prosody information including an utterance boundary level indicating a duration of a silent period between each semantic unit for the discourse information of the semantic structure, wherein the utterance boundary level is adjusted based on closeness between semantic units, which is determined by syntax and case; and
  
  generating an intonation pattern for the semantic structure of a system utterance based on the prosody information, wherein the speech act provides a speech act classification of the user utterance, so that even when speech acts are identical, the generated intonation pattern varies according to the speech act classification of the user utterance.
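
The first step recited above, generating discourse information from the speech act of the user utterance, is elaborated in claims 4 and 15 as selecting a semantic unit to emphasize and combining it with the remaining units. A minimal sketch, assuming a hypothetical table `FOCUS_BY_ACT` and a simple `(slot, text)` representation of semantic units, neither of which appears in the patent:

```python
# Hypothetical mapping from the user's speech act classification to the slot
# of the system utterance that should carry the first emphasis tag.
FOCUS_BY_ACT = {"ask_where": "location", "ask_when": "time"}


def build_discourse_info(semantic_units, user_speech_act):
    """Tag the unit selected by the speech act and keep the rest untagged.

    semantic_units: list of (slot, text) pairs for the system utterance.
    Returns a list of dicts in the original order.
    """
    focus = FOCUS_BY_ACT.get(user_speech_act)
    return [
        {"slot": slot, "text": text, "emphasis": slot == focus}
        for slot, text in semantic_units
    ]
```

With the same system utterance, a "where" question and a "when" question place the emphasis on different units, which is the kind of variation the final wherein clause describes.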

20. A non-transitory computer-readable recording medium having embodied thereon a computer program used by at least one computer system for executing a speech synthesis method, the method comprising:
- determining a system speaking style based on a user utterance;
  
  generating dialog prosody information including an utterance boundary level indicating a duration of a silent period between each semantic unit by reflecting discourse information between a user and a system when the system speaking style is determined as dialog speech, wherein the utterance boundary level is adjusted based on closeness between semantic units, which is determined by syntax and case; and
  
  synthesizing a system utterance based on the generated dialog prosody information, wherein the speech act provides a speech act classification of the user utterance, so that even when speech acts are identical, the synthesized system utterance varies according to the speech act classification of the user utterance.

21. A prosody structure generation apparatus comprising:
- a dialog information database which manages an entire dialog between a user and a system, and stores information and a dialog history required for the dialog to proceed based on speech acts and intention;
  
  a system speaking style determination unit which determines a speech act and intention by analyzing a user utterance obtained through a speech recognition process with reference to the dialog information database, and determines the system speaking style as either read speech or dialog speech according to the determined speech act and intention associated with the user utterance; and
  
  a dialog prosody generation unit including a discourse information generation unit, a prosody information generation unit, and an intonation pattern generation unit,
  
  wherein the discourse information generation unit receives a user utterance from the system speaking style determination unit and generates a discourse information structure in which a different emphasis part is set according to whether the speech act and included semantic unit of a system utterance corresponding to whether the user utterance is new information or old information,
  
  wherein the prosody information generation unit receives discourse information structure from the discourse information generation unit, and a semantic structure, a sentence structure, and a morpheme structure of a system utterance, and generates prosody information in which an emphasis tag including an utterance boundary level, accent, and utterance duration is set on the basis of the types of semantic words, a closeness between polymorphemes, and a number of syllables that can be spoken at a time, wherein the utterance boundary level is adjusted based on closeness between semantic units, which is determined by syntax and case, and
  
  wherein the intonation pattern generation unit receives inputs of the semantic structure of a system utterance including prosody information, extracts a plurality of characteristics in each semantic unit and compares the plurality of characteristics with the characteristics of each semantic unit of an intonation pattern database with contents of characteristics of each semantic unit and their index, searches for a semantic unit having closest characteristics, and generates an intonation pattern according to a result of the search using by at least one computer system,
  
  wherein the speech act provides a speech act classification of the user utterance, so that even when speech acts are identical, the generated intonation pattern varies according to the speech act classification of the user utterance.
- - 22. The apparatus of claim 21, wherein the dialog information database expresses system utterances corresponding to user utterances that are input, according to the speech act and intention, and stores the system utterances as elements of a database.
  - 23. The apparatus of claim 21, wherein the system speaking style determination unit sets a criterion to determine a system speaking style, by determining the criterion statistically or experimentally in advance, corresponding to speech act and intention.
  - 24. The apparatus of claim 21, wherein the dialog prosody generation unit includes a repeated information application unit which adds, by referring to the dialog history stored in the dialog information database, an extended pitch range to the emphasis tag or adjusts an already set accent or utterance duration, depending on whether a current system utterance has the same meaning as the previous system utterance, and provides finally generated prosody information to the intonation pattern generation unit.
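
The search performed by the intonation pattern generation unit of claim 21 (extract characteristics per semantic unit, compare them with an indexed database, take the closest entry) resembles a nearest-neighbor lookup. The sketch below assumes numeric feature vectors and Euclidean distance. The claim does not specify the distance measure, and `PATTERN_DB` with its contents is invented for illustration.

```python
import math

# Hypothetical database: index -> (feature vector, stored intonation pattern).
PATTERN_DB = {
    0: ((1.0, 0.0, 3.0), "rise"),
    1: ((0.0, 1.0, 2.0), "fall"),
    2: ((1.0, 1.0, 5.0), "rise-fall"),
}


def closest_pattern(features, db=PATTERN_DB):
    """Return (index, pattern) of the database entry nearest to `features`."""
    # Euclidean distance is an assumed stand-in for "closest characteristics".
    best_index = min(db, key=lambda i: math.dist(features, db[i][0]))
    return best_index, db[best_index][1]
```

For example, `closest_pattern((0.9, 0.1, 3.2))` selects entry 0 in this toy database, because its feature vector is the nearest of the three.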

Current Assignee: Samsung Electronics Co. Ltd.
Original Assignee: Samsung Electronics Co. Ltd.
Inventors: Pyo, Kyoungnan; Lee, Jaewon
Primary Examiner(s): Godbold, Douglas
Assistant Examiner(s): GUERRA-ERAZO, EDGAR X
Application Number: US11/132,195
Publication Number: US 20050261905A1
Time in Patent Office: 2,630 Days
Field of Search: 704/258, 704/260, 704/268, 704/257, 704/231, 704/253, 704/215, 704/267, 704/251, 704/252, 704/255, 704/261, 704/243, 704/219, 704/275, 704/270, 704/270.1, 704/4, 704/9, 704/10
US Class Current: 704/260
CPC Class Codes:
G10L 13/10   Prosody rules derived from ...
G10L 15/22   Procedures used during a sp...
G10L 2015/228   of application context
