Apparatus, method, and medium for generating grammar network for use in speech recognition and dialogue speech recognition

US 20060173686A1
Filed: 02/01/2006
Published: 08/03/2006
Est. Priority Date: 02/01/2005
Status: Active Grant

Abstract

A method, apparatus, and medium for generating a grammar network for speech recognition, and a method, apparatus, and medium for dialogue speech recognition employing the same, are provided. The apparatus for generating a grammar network for speech recognition includes: a dialogue history storage unit storing a dialogue history between a system and a user; a semantic map formed by clustering words forming each dialogue sentence included in a dialogue sentence corpus depending on semantic correlation, and generating a first candidate group formed of a plurality of words having the semantic correlation extracted for each word forming a dialogue sentence provided from the dialogue history storage unit; an acoustic map formed by clustering words forming each dialogue sentence included in the dialogue sentence corpus depending on acoustic similarity, and generating a second candidate group formed of a plurality of words having an acoustic similarity extracted for each word forming the dialogue sentence provided from the dialogue history storage unit and each word of the first candidate group; and a grammar network construction unit constructing a grammar network by combining the first candidate group and the second candidate group.

24 Claims

1. An apparatus for generating a grammar network for speech recognition comprising:
- a dialogue history storage unit storing a dialogue history between a system and a user;
  
  a semantic map formed by clustering words forming each dialogue sentence included in a dialogue sentence corpus depending on semantic correlation, and generating a first candidate group formed of a plurality of words having the semantic correlation extracted for each word forming a dialogue sentence provided from the dialogue history storage unit;
  
  an acoustic map formed by clustering words forming each dialogue sentence included in the dialogue sentence corpus depending on acoustic similarity, and generating a second candidate group formed of a plurality of words having an acoustic similarity extracted for each word forming the dialogue sentence provided from the dialogue history storage unit and each word of the first candidate group; and
  
  a grammar network construction unit constructing a grammar network by combining words included in the first candidate group and the words included in the second candidate group.
- - 2. The apparatus of claim 1, wherein the dialogue history storage unit stores contents of a latest dialogue, and the stored contents are updated as the dialogue proceeds.
  - 3. The apparatus of claim 1, wherein the semantic map and the acoustic map are activated by a dialogue sentence most recently recognized by the system and a dialogue sentence most recently output by the system among the dialogue history stored in the dialogue history storage unit.
  - 4. The apparatus of claim 1, wherein each word is a basic element forming each dialogue sentence, each word is a word or a word string formed of one or more syllables, and each word comprises a single meaning and a single pronunciation as a pair.
  - 5. The apparatus of claim 1, wherein the dialogue sentence corpus is obtained by arranging all contents available between the system and the user, as sequential dialogue sentences, comprising a variety of usages, in the form of a database.
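
The storage side of claims 1 to 5 can be pictured with a small sketch. This is only an illustration under stated assumptions, not the patented implementation: the names `Word` and `DialogueHistory`, the phoneme-string format, and the two-turn window (one system output plus one recognized user sentence) are all hypothetical choices made here.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """A basic sentence element: one meaning paired with one pronunciation (claim 4)."""
    meaning: str
    pronunciation: str  # hypothetical phoneme string, e.g. "s ow l"


class DialogueHistory:
    """Keeps only the latest dialogue turns, updated as the dialogue proceeds (claim 2)."""

    def __init__(self, max_turns: int = 2) -> None:
        # max_turns=2 is an assumption: the latest system output plus the
        # latest recognized user sentence (compare claim 3).
        self._turns = deque(maxlen=max_turns)

    def add_turn(self, speaker: str, words: list[Word]) -> None:
        # Appending beyond max_turns silently drops the oldest turn.
        self._turns.append((speaker, tuple(words)))

    def active_words(self) -> list[Word]:
        """Words from the stored turns, in order, without duplicates."""
        seen, out = set(), []
        for _, words in self._turns:
            for w in words:
                if w not in seen:
                    seen.add(w)
                    out.append(w)
        return out
```

The words returned by `active_words()` would be the ones handed to the semantic map and the acoustic map to activate them.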

6. A method of generating a grammar network for speech recognition comprising:
- forming a semantic map by clustering words forming each dialogue sentence included in a dialogue sentence corpus depending on semantic correlation;
  
  forming an acoustic map by clustering words forming each dialogue sentence included in the dialogue sentence corpus depending on acoustic similarity;
  
  activating the semantic map and generating a first candidate group formed of a plurality of words having the semantic correlation extracted for each word forming a dialogue sentence included in a dialogue history performed between a system and a user;
  
  activating the acoustic map and generating a second candidate group formed of a plurality of words having an acoustic similarity extracted for each word forming the dialogue sentence included in the dialogue history and each word of the first candidate group; and
  
  generating a grammar network by combining the first candidate group and the second candidate group.
- - 7. The method of claim 6, wherein the semantic map and the acoustic map are activated whenever words are uttered by the user.
  - 8. The method of claim 6, wherein the first and second candidate groups are formed of words having acoustic similarity and semantic correlation with words included in a dialogue sentence that has been recognized most recently by the system and words included in a dialogue sentence that has been output most recently by the system.
  - 9. The method of claim 6, wherein each word is a basic element forming each dialogue sentence, each word is a word or a word string formed of one or more syllables, and each word comprises a single meaning and a single pronunciation as a pair.
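
The steps of claim 6 can be sketched end to end on toy data. The claims do not say how the clustering is done, so this sketch swaps in two simple stand-ins that are assumptions of this example only: sentence co-occurrence for semantic correlation, and Levenshtein distance over phoneme strings for acoustic similarity. The corpus, the pronunciations, and all function names are invented for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Toy dialogue sentence corpus and pronunciations (all invented).
CORPUS = [
    ["book", "flight", "seoul"],
    ["cancel", "flight"],
    ["cook", "dinner"],
    ["fight", "night"],
]
PRONUNCIATIONS = {
    "book": "b uh k", "cook": "k uh k", "flight": "f l ay t", "fight": "f ay t",
    "night": "n ay t", "seoul": "s ow l", "cancel": "k ae n s ah l", "dinner": "d ih n er",
}


def edit_distance(a, b):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def build_semantic_map(corpus):
    """Stand-in for semantic clustering: words sharing a sentence are related."""
    related = defaultdict(set)
    for sentence in corpus:
        for a, b in combinations(set(sentence), 2):
            related[a].add(b)
            related[b].add(a)
    return related


def build_acoustic_map(prons, max_dist=1):
    """Stand-in for acoustic clustering: phoneme strings within max_dist edits."""
    similar = defaultdict(set)
    for a, b in combinations(prons, 2):
        if edit_distance(prons[a].split(), prons[b].split()) <= max_dist:
            similar[a].add(b)
            similar[b].add(a)
    return similar


def generate_grammar_network(history_words, semantic_map, acoustic_map):
    """Return (first group, second group, combined network) for the history words."""
    first = set()
    for w in history_words:
        first |= semantic_map.get(w, set())
    second = set()
    # The acoustic map is queried with the history words AND the first group.
    for w in set(history_words) | first:
        second |= acoustic_map.get(w, set())
    return first, second, first | second
```

With the dialogue history containing only "flight", the first candidate group is the set of words that share a sentence with it, and the second group adds acoustically confusable words such as "fight" (near "flight") and "cook" (near "book").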

10. An apparatus for speech recognition comprising:
- a feature extraction unit extracting features from a user's voice and generating a feature vector string;
  
  a grammar network generation unit generating a grammar network by activating a semantic map and an acoustic map by using contents of a dialogue most recently spoken, whenever the user speaks;
  
  a loading unit loading the grammar network generated by the grammar network generation unit; and
  
  a searching unit searching the grammar network loaded in the loading unit, by using the feature vector string, and generating a candidate recognition sentence formed of a word string matching the feature vector string.
- - 11. The apparatus of claim 10, wherein the grammar network generation unit comprises:
    - a dialogue history storage unit storing a dialogue history between the system and the user;
      
      a semantic map formed by clustering words forming each dialogue sentence included in a dialogue sentence corpus depending on semantic correlation, and generating a first candidate group formed of a plurality of words having the semantic correlation extracted for each word forming a dialogue sentence provided from the dialogue history storage unit;
      
      an acoustic map formed by clustering words forming each dialogue sentence included in the dialogue sentence corpus depending on acoustic similarity, and generating a second candidate group formed of a plurality of words having an acoustic similarity extracted for each word forming the dialogue sentence provided from the dialogue history storage unit and each word of the first candidate group; and
      
      a grammar network construction unit constructing a grammar network by combining words included in the first candidate group and the words included in the second candidate group.
  - 12. The apparatus of claim 11, wherein the dialogue history storage unit stores contents of a latest dialogue, and the stored contents are updated as the dialogue proceeds.
  - 13. The apparatus of claim 11, wherein the semantic map and the acoustic map are activated by a dialogue sentence most recently recognized by the system and a dialogue sentence most recently output by the system among the dialogue history stored in the dialogue history storage unit.
  - 14. The apparatus of claim 11, wherein each word is a basic element forming each dialogue sentence, each word is a word or a word string formed of one or more syllables, and each word comprises a single meaning and a single pronunciation as a pair.
  - 15. The apparatus of claim 11, wherein the dialogue sentence corpus is obtained by arranging all contents available between the system and the user, as sequential dialogue sentences, comprising a variety of usages, in the form of a database.
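
The loading and searching units of claim 10 can be illustrated with a deliberately simplified sketch. A large assumption applies: the "feature vector string" is modeled here as a sequence of phoneme symbols so that exact matching works, whereas a real recognizer would score acoustic feature vectors against acoustic models. `extract_features`, `Recognizer`, and the sample network are hypothetical names and data.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Pron = Tuple[str, ...]


def extract_features(phoneme_text: str) -> List[str]:
    """Placeholder for feature extraction: splits a phoneme transcript into symbols."""
    return phoneme_text.split()


class Recognizer:
    def __init__(self) -> None:
        self._network: Dict[str, Pron] = {}

    def load(self, network: Dict[str, Pron]) -> None:
        """Loading unit: replace the active grammar network."""
        self._network = dict(network)

    def search(self, features: Sequence[str]) -> Optional[List[str]]:
        """Searching unit: find a word string whose pronunciations spell out `features`.

        best[i] holds one word sequence that exactly covers features[:i], or None.
        """
        n = len(features)
        best: List[Optional[List[str]]] = [None] * (n + 1)
        best[0] = []
        for i in range(n):
            if best[i] is None:
                continue
            for word, pron in self._network.items():
                j = i + len(pron)
                if j <= n and best[j] is None and tuple(features[i:j]) == pron:
                    best[j] = best[i] + [word]
        return best[n]
```

The point of the sketch is that the search can only return words present in the currently loaded network, so narrowing the network from the dialogue context narrows the search space.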

16. A method of speech recognition comprising:
- extracting features from a user's voice and generating a feature vector string;
  
  generating a grammar network by activating a semantic map and an acoustic map by using contents of a dialogue most recently spoken, whenever the user speaks;
  
  loading the grammar network; and
  
  searching the loaded grammar network, by using the feature vector string, and generating a candidate recognition sentence formed of a word string matching the feature vector string.
- - 17. The method of claim 16, wherein the generation of the grammar network comprises:
    - forming a semantic map by clustering words forming each dialogue sentence included in a dialogue sentence corpus depending on semantic correlation;
      
      forming an acoustic map by clustering words forming each dialogue sentence included in the dialogue sentence corpus depending on acoustic similarity;
      
      activating the semantic map and generating a first candidate group formed of a plurality of words having the semantic correlation extracted for each word forming a dialogue sentence included in a dialogue history performed between a system and a user;
      
      activating the acoustic map and generating a second candidate group formed of a plurality of words having an acoustic similarity extracted for each word forming the dialogue sentence included in the dialogue history and each word of the first candidate group; and
      
      generating a grammar network by combining the first candidate group and the second candidate group.
  - 18. The method of claim 17, wherein the first and second candidate groups are formed of words having acoustic similarity and semantic correlation with words included in a dialogue sentence that has been recognized most recently by the system and words included in a dialogue sentence that has been output most recently by the system.
  - 19. The method of claim 17, wherein each word is a basic element forming each dialogue sentence, is a word or a word string formed of one or more syllables, and comprises a single meaning and a single pronunciation as a pair.
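
Claim 16 regenerates the grammar network whenever the user speaks, using the most recent dialogue content. A minimal sketch of that per-turn regeneration follows. It assumes the two maps are already available as plain dictionaries from a word to its related words; the function name `grammar_for_turn` and the sample maps are invented for this example.

```python
def grammar_for_turn(last_system, last_user, semantic_map, acoustic_map):
    """Build the word set for the next recognition pass from the latest turns."""
    seeds = set(last_system) | set(last_user)
    # First candidate group: semantically correlated words.
    first = set().union(*(semantic_map.get(w, set()) for w in seeds))
    # Second candidate group: acoustically similar to the seeds or the first group.
    second = set().union(*(acoustic_map.get(w, set()) for w in seeds | first))
    return first | second


# Hypothetical, hand-written maps.
SEMANTIC = {"destination": {"seoul", "busan"}, "seoul": {"date", "monday"}}
ACOUSTIC = {"seoul": {"soul"}, "monday": {"sunday"}}

# Turn 1: the system has asked about the destination; the user has not spoken yet.
turn1 = grammar_for_turn(["destination"], [], SEMANTIC, ACOUSTIC)
# Turn 2: the system asks about the date after recognizing "seoul".
turn2 = grammar_for_turn(["date"], ["seoul"], SEMANTIC, ACOUSTIC)
```

In this toy run the network shifts from destination-related words in the first turn to date-related words in the second, which is the behavior the claim describes.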

20. At least one computer readable medium storing instructions that control at least one processor for executing a method of generating a grammar network for speech recognition, wherein the method comprises:
- forming a semantic map by clustering words forming each dialogue sentence included in a dialogue sentence corpus depending on semantic correlation;
  
  forming an acoustic map by clustering words forming each dialogue sentence included in the dialogue sentence corpus depending on acoustic similarity;
  
  activating the semantic map and generating a first candidate group formed of a plurality of words having the semantic correlation extracted for each word forming a dialogue sentence included in a dialogue history performed between a system and a user;
  
  activating the acoustic map and generating a second candidate group formed of a plurality of words having an acoustic similarity extracted for each word forming the dialogue sentence included in the dialogue history and each word of the first candidate group; and
  
  generating a grammar network by combining the first candidate group and the second candidate group.

21. At least one computer readable medium storing instructions that control at least one processor for executing a method of speech recognition, wherein the method comprises:
- extracting features from a user's voice and generating a feature vector string;
  
  generating a grammar network by activating a semantic map and an acoustic map by using contents of a dialogue most recently spoken, whenever the user speaks;
  
  loading the grammar network; and
  
  searching the loaded grammar network, by using the feature vector string, and generating a candidate recognition sentence formed of a word string matching the feature vector string.

22. A method of speech recognition comprising:
- extracting features from a user's voice and generating a feature vector string;
  
  generating a grammar network by activating a semantic map and an acoustic map by using contents of a dialogue spoken by a user; and
  
  searching the grammar network, by using the feature vector string, and generating a candidate recognition sentence formed of a word string matching the feature vector string.

23. The method of claim 22, wherein the generation of a grammar network comprises combining a first candidate group formed by activation of the semantic map and a second candidate group formed by activation of the acoustic map.

24. At least one computer readable medium storing instructions that control at least one processor for executing a method of speech recognition, wherein the method comprises:
- extracting features from a user's voice and generating a feature vector string;
  
  generating a grammar network by activating a semantic map and an acoustic map by using contents of a dialogue spoken by a user; and
  
  searching the grammar network, by using the feature vector string, and generating a candidate recognition sentence formed of a word string matching the feature vector string.

Current Assignee: Samsung Electronics Co. Ltd.
Original Assignee: Samsung Electronics Co. Ltd.
Inventors: Hwang, Kwangil
Granted Patent: US 7,606,708 B2
Current US Class: 704/257
CPC Class Codes:
G10L 15/06   Creation of reference templ...
G10L 15/183   using context dependencies,...
G10L 15/19   Grammatical context, e.g. d...
