Speech recognition method and apparatus using lexicon group tree

US 20060173673A1
Filed: 01/31/2006
Published: 08/03/2006
Est. Priority Date: 02/02/2005
Status: Active Grant



Abstract

A method and an apparatus for selecting a vocabulary closest to an input speech from among lexicons stored in memory, wherein a centroid lexicon representing lexicons belonging to a predetermined lexicon group is generated. Two lexicons, having a longest distance therebetween in the lexicon group, are selected using the centroid lexicon from the lexicon group, and a node indicating the lexicon group branches based on the two selected lexicons. A node having low group similarity is selected from among current terminal nodes, including branch nodes, and the above procedure is repeatedly performed on a lexicon group indicated by the selected node.
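
The repeated "pick the least similar terminal node, then branch it" procedure in the abstract can be sketched as a loop over open leaf groups. This is a minimal illustration, not the patented implementation: `grow_leaves`, `make_centroid`, `split`, `spread`, `max_spread` and `min_size` are hypothetical names, and the sketch assumes that a larger spread means lower group similarity.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def grow_leaves(
    root: List[T],
    make_centroid: Callable[[List[T]], T],
    split: Callable[[List[T], T], Tuple[List[T], List[T]]],
    spread: Callable[[List[T], T], float],
    max_spread: float,
    min_size: int,
) -> List[List[T]]:
    """Return the terminal groups left after repeated branching (hypothetical helper)."""
    open_leaves: List[List[T]] = [root]
    closed: List[List[T]] = []
    while open_leaves:
        # Score every open terminal node; a larger spread is read as lower similarity.
        scored = [(spread(g, make_centroid(g)), i) for i, g in enumerate(open_leaves)]
        worst_spread, idx = max(scored)
        group = open_leaves.pop(idx)
        # Stop branching when the group is similar enough or already small enough.
        if worst_spread <= max_spread or len(group) <= min_size:
            closed.append(group)
            continue
        left, right = split(group, make_centroid(group))
        # Guard against degenerate splits that would never terminate.
        if not left or not right or len(left) == len(group) or len(right) == len(group):
            closed.append(group)
            continue
        open_leaves.extend([left, right])
    return closed
```

Passing the centroid, split and spread functions in as parameters keeps the loop independent of how lexicons and distances are actually modelled.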


28 Claims

1. A method of generating a lexicon group tree, comprising the steps of:
- (a) generating a centroid lexicon representing lexicons belonging to a predetermined lexicon group;
  
  (b) selecting two lexicons, having a longest distance therebetween in the lexicon group, using the centroid lexicon from the lexicon group, and branching a node indicating the lexicon group, based on the two selected lexicons; and
  
  (c) selecting a node having low group similarity from among current terminal nodes, including branch nodes, and repeatedly performing steps (a) and (b) on a lexicon group indicated by the selected node.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
- - 2. The lexicon group tree generation method according to claim 1, wherein the step (a) comprises the steps of:
    - (a1) initializing a virtual centroid lexicon of the lexicon group; and
      
      (a2) updating the centroid lexicon using the initialized centroid lexicon.
  - 3. The lexicon group tree generation method according to claim 2, wherein the step (a1) comprises the steps of:
    - (a11) multiplying a number smaller than 1 by the average number of states of the lexicon group, thus determining the initial number of states expressed by a predetermined integer;
      
      (a12) uniformly segmenting each lexicon, existing in the lexicon group, into states depending on the initial number of states;
      
      (a13) allocating the uniformly segmented states to states of the centroid lexicon; and
      
      (a14) obtaining virtual average models for respective states of the centroid lexicon.
  - 4. The lexicon group tree generation method according to claim 2, wherein the step (a2) comprises the steps of:
    - (a21) performing distance matching with each lexicon in the corresponding lexicon group, based on the initialized centroid lexicon;
      
      (a22) allocating states matched through the matching step to respective states of the centroid lexicon; and
      
      (a23) averaging models allocated to the states of the centroid lexicon and obtaining an average vector, thus updating the centroid lexicon.
  - 5. The lexicon group tree generation method according to claim 3, wherein the initial number of states increases as depth of a tree increases.
  - 6. The lexicon group tree generation method according to claim 1, wherein the step (b) comprises the steps of:
    - (b1) selecting a first lexicon having a longest distance to the centroid lexicon from the lexicon group;
      
      (b2) selecting a second lexicon having a longest distance to the first lexicon from the lexicon group; and
      
      (b3) bisecting remaining lexicons belonging to the lexicon group, based on the two selected lexicons.
  - 7. The lexicon group tree generation method according to claim 6, wherein the step (b3) comprises the step of allocating each of the lexicons of the lexicon group to a node on which a closer one of the two selected lexicons is based, and allocating a corresponding lexicon to two nodes on which the two lexicons are based when a distance between the centroid lexicon and the corresponding lexicon is within a predetermined threshold value.
  - 8. The lexicon group tree generation method according to claim 1, wherein the step (c) comprises the step of selecting all of two or more nodes, having group similarity lower than a predetermined threshold value, from among the current terminal nodes, thus repeatedly performing steps (a) and (b) on the selected nodes.
  - 9. The lexicon group tree generation method according to claim 1, wherein the distance between lexicons is determined by generating states of the two lexicons in a two-dimensional coordinate system and calculating cumulative distances at respective coordinate points.
  - 10. The lexicon group tree generation method according to claim 1, wherein the steps (a) and (b) are repeatedly performed until a variance, indicating group similarity, becomes lower than a predetermined threshold value and/or until the number of lexicons, belonging to a node, decreases to a predetermined number or less.
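
Claims 6, 7 and 9 together describe a distance between lexicons and a branching rule built on it. The following is a rough sketch under stated assumptions: each state is modelled as a plain mean vector, the local cost is squared Euclidean distance, and the cumulative distance is a standard dynamic-time-warping recursion. The names `state_distance`, `cumulative_distance`, `branch` and `overlap` are illustrative and do not come from the patent.

```python
from typing import List, Sequence, Tuple

State = Sequence[float]    # assumption: one state is a mean vector
Lexicon = Sequence[State]  # assumption: a lexicon is an ordered sequence of states


def state_distance(a: State, b: State) -> float:
    """Squared Euclidean distance between two state vectors (assumed local cost)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cumulative_distance(p: Lexicon, q: Lexicon) -> float:
    """Place the states of p and q on a 2-D grid and accumulate the cheapest path."""
    inf = float("inf")
    n, m = len(p), len(q)
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
            acc[i][j] = state_distance(p[i - 1], q[j - 1]) + best_prev
    return acc[n][m]


def branch(
    group: List[Lexicon], centroid: Lexicon, overlap: float = 0.0
) -> Tuple[List[Lexicon], List[Lexicon]]:
    """Split a group around its two most distant lexicons (steps b1 to b3)."""
    # (b1) first seed: the lexicon farthest from the centroid.
    first = max(group, key=lambda lex: cumulative_distance(lex, centroid))
    # (b2) second seed: the lexicon farthest from the first seed.
    second = max(group, key=lambda lex: cumulative_distance(lex, first))
    left: List[Lexicon] = []
    right: List[Lexicon] = []
    for lex in group:
        d1 = cumulative_distance(lex, first)
        d2 = cumulative_distance(lex, second)
        # Claim 7: lexicons close to the centroid are placed in both child nodes.
        near_centroid = cumulative_distance(lex, centroid) < overlap
        if d1 <= d2 or near_centroid:
            left.append(lex)
        if d2 < d1 or near_centroid:
            right.append(lex)
    return left, right
```

With the default `overlap` of 0.0 the split is a strict bisection; a positive value duplicates borderline lexicons into both children, which trades memory for a lower chance of descending into the wrong branch.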

11. A method of recognizing speech, comprising the steps of:
- (a) segmenting an input acoustic signal into frames;
  
  (b) performing a feature transform on the segmented acoustic signal;
  
  (c) determining similarities between centroid lexicons, representing two branch nodes, and the feature-transformed acoustic signal, and selecting a node having higher similarity;
  
  (d) repeatedly performing step (c) until the selected node is a terminal node; and
  
  (e) loading a lexicon group of the terminal node if the selected node is the terminal node, and selecting a lexicon having higher similarity between the lexicon and the feature-transformed acoustic signal from the loaded lexicon group.
- View Dependent Claims (12, 13, 14)
- - 12. The speech recognition method according to claim 11, wherein the step (b) comprises the steps of:
    - transforming the frames into signal frames in a frequency domain; and
      
      linearly transforming the signal frames in the frequency domain into frames in a dimensional space in which features of input speech can be sufficiently exhibited.
  - 13. The speech recognition method according to claim 11, wherein the similarity is determined by a cumulative distance calculated between input speech and the centroid lexicon, or between the input speech and each lexicon belonging to the loaded lexicon group, the calculation of the cumulative distance being performed in frames.
  - 14. The speech recognition method according to claim 11, wherein the centroid lexicon means a virtual centroid lexicon or one of actual lexicons having a shortest distance to the virtual centroid lexicon.
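
Steps (c) to (e) of claim 11 amount to a descent through a binary tree followed by a search inside one leaf. A minimal sketch is given below, assuming that a smaller distance means a higher similarity. The `Node` class, the `recognize` function, the `models` dictionary and the `frame_distance` stand-in are all hypothetical; in particular `frame_distance` is a simplified placeholder for the cumulative distance of claim 13.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Features = Sequence[Sequence[float]]  # assumption: one feature vector per frame


@dataclass
class Node:
    """Tree node: branch nodes carry children, terminal nodes carry lexicon names."""
    centroid: Features
    lexicons: List[str] = field(default_factory=list)
    children: Optional[Tuple["Node", "Node"]] = None


def frame_distance(a: Features, b: Features) -> float:
    """Toy frame-by-frame squared distance (stand-in for a cumulative distance)."""
    return sum((x - y) ** 2 for fa, fb in zip(a, b) for x, y in zip(fa, fb))


def recognize(
    features: Features,
    root: Node,
    models: Dict[str, Features],
    distance: Callable[[Features, Features], float] = frame_distance,
) -> str:
    node = root
    # (c), (d): follow the child whose centroid is closer until a terminal node.
    while node.children is not None:
        left, right = node.children
        if distance(features, left.centroid) <= distance(features, right.centroid):
            node = left
        else:
            node = right
    # (e): search only the lexicon group attached to the terminal node.
    return min(node.lexicons, key=lambda name: distance(features, models[name]))
```

Only the lexicons of the selected terminal node are compared against the input, which is where the saving over an exhaustive search comes from.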

15. A device for generating a lexicon group tree, comprising:
- a centroid lexicon generation unit for generating a centroid lexicon representing lexicons belonging to a predetermined lexicon group;
  
  a node branching determination unit for selecting a node having low group similarity from among current terminal nodes; and
  
  a node branching unit for selecting two lexicons, having a longest distance therebetween in the lexicon group, using the centroid lexicon from the lexicon group, and branching a node indicating the lexicon group, based on the two selected lexicons.
- View Dependent Claims (16, 17, 18, 19, 20, 21, 22, 23, 24)
- - 16. The lexicon group tree generation device according to claim 15, wherein the centroid lexicon generation unit initializes a virtual centroid lexicon of the lexicon group, and updates the centroid lexicon using the initialized centroid lexicon.
  - 17. The lexicon group tree generation device according to claim 16, wherein the centroid lexicon generation unit multiplies a number smaller than 1 by the average number of states of the lexicon group to determine the initial number of states expressed by a predetermined integer, uniformly segments each lexicon, existing in the lexicon group, into states depending on the initial number of states, allocates the uniformly segmented states to states of the centroid lexicon, and obtains virtual average models for respective states of the centroid lexicon.
  - 18. The lexicon group tree generation device according to claim 16, wherein the centroid lexicon generation unit performs distance matching with each lexicon in the corresponding lexicon group, based on the initialized centroid lexicon, allocates states matched through the matching step to respective states of the centroid lexicon, and updates the centroid lexicon by averaging models allocated to the states of the centroid lexicon and obtaining an average vector.
  - 19. The lexicon group tree generation device according to claim 17, wherein the initial number of states increases as depth of a tree increases.
  - 20. The lexicon group tree generation device according to claim 15, wherein the node branching unit selects a first lexicon having a longest distance to the centroid lexicon from the lexicon group, selects a second lexicon having a longest distance to the first lexicon from the lexicon group, and bisects remaining lexicons belonging to the lexicon group, based on the two selected lexicons.
  - 21. The lexicon group tree generation device according to claim 20, wherein the node branching unit allocates each of the lexicons of the lexicon group to a node on which a closer one of the two selected lexicons is based, and allocates a corresponding lexicon to both the two nodes on which the two lexicons are based when a distance between the centroid lexicon and the corresponding lexicon is within a predetermined threshold value.
  - 22. The lexicon group tree generation device according to claim 15, wherein the node branching determination unit selects all nodes, having group similarity lower than a predetermined threshold value, from among the current terminal nodes.
  - 23. The lexicon group tree generation device according to claim 15, wherein the distance between lexicons is determined by generating states of the two lexicons in a two-dimensional coordinate system and calculating cumulative distances at respective coordinate points.
  - 24. The lexicon group tree generation device according to claim 15, wherein the node branching is repeatedly performed until a variance, indicating group similarity, becomes lower than a predetermined threshold value and/or until the number of lexicons, belonging to a node, decreases to a predetermined number or less.
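
The centroid initialization recited in claims 3 and 17 can be illustrated as follows. This is a sketch under assumptions: states are plain vectors, the "virtual average model" of a centroid state is the arithmetic mean of the states allocated to it, and the scaling factor is exposed as a hypothetical `factor` parameter. The function name `init_centroid` is also illustrative.

```python
from typing import List, Sequence

State = Sequence[float]    # assumption: one state is a mean vector
Lexicon = Sequence[State]  # assumption: a lexicon is an ordered sequence of states


def init_centroid(group: Sequence[Lexicon], factor: float = 0.8) -> List[List[float]]:
    """Build an initial virtual centroid by uniform segmentation and averaging."""
    # Initial state count: a factor below 1 times the average state count.
    avg_states = sum(len(lex) for lex in group) / len(group)
    n_states = max(1, int(factor * avg_states))
    dim = len(group[0][0])
    sums = [[0.0] * dim for _ in range(n_states)]
    counts = [0] * n_states
    for lex in group:
        for i, state in enumerate(lex):
            # Uniform segmentation: map state i of this lexicon to a centroid state.
            k = min(n_states - 1, i * n_states // len(lex))
            counts[k] += 1
            for d, value in enumerate(state):
                sums[k][d] += value
    # Average the states allocated to each centroid state.
    return [[s / max(c, 1) for s in row] for row, c in zip(sums, counts)]
```

The update step of claims 4 and 18 would replace the uniform allocation with one obtained by distance matching against this initial centroid; that step is not shown here.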

25. A device for recognizing speech, comprising:
- a frame segmentation unit for segmenting an input acoustic signal into frames;
  
  a feature transform unit for performing a feature transform on the segmented acoustic signal;
  
  a node branching determination unit for repeatedly performing a procedure of determining similarities between centroid lexicons, representing two branch nodes, and the feature-transformed acoustic signal and selecting a node having higher similarity until the selected node is a terminal node; and
  
  a lexicon selection unit for loading a lexicon group of the terminal node if the selected node is the terminal node, and selecting a lexicon having higher similarity between the lexicon and the feature-transformed acoustic signal from the loaded lexicon group.
- View Dependent Claims (26, 27, 28)
- - 26. The speech recognition device according to claim 25, wherein the feature transform unit transforms the frames into signal frames in a frequency domain, and then linearly transforms the signal frames in the frequency domain into frames in a dimensional space in which features of input speech can be sufficiently exhibited.
  - 27. The speech recognition device according to claim 25, wherein the similarity is determined by a cumulative distance calculated between input speech and the centroid lexicon, or between the input speech and each lexicon belonging to the loaded lexicon group, the calculation of the cumulative distance being performed in frames.
  - 28. The speech recognition device according to claim 25, wherein the centroid lexicon means a virtual centroid lexicon or one of actual lexicons having a shortest distance to the virtual centroid lexicon.
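
The front end in claims 12 and 26 (frames, then a frequency-domain transform, then a linear transform) can be sketched in a few lines. The claims do not name a specific transform, so this example assumes a naive DFT magnitude spectrum and a caller-supplied projection matrix. `segment`, `magnitude_spectrum`, `project`, `feature_transform`, `frame_len` and `hop` are hypothetical names.

```python
import cmath
from typing import List, Sequence


def segment(signal: Sequence[float], frame_len: int, hop: int) -> List[Sequence[float]]:
    """Cut the signal into frames of frame_len samples, advancing by hop samples."""
    return [signal[s:s + frame_len] for s in range(0, len(signal) - frame_len + 1, hop)]


def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """Naive DFT magnitudes for bins 0..n/2 (assumed frequency-domain transform)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def project(spectrum: Sequence[float], matrix: Sequence[Sequence[float]]) -> List[float]:
    """Linear transform: one output value per matrix row."""
    return [sum(w * s for w, s in zip(row, spectrum)) for row in matrix]


def feature_transform(
    signal: Sequence[float],
    frame_len: int,
    hop: int,
    matrix: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Frame the signal, move each frame to the frequency domain, then project it."""
    return [project(magnitude_spectrum(f), matrix) for f in segment(signal, frame_len, hop)]
```

The naive DFT costs quadratic time per frame and is used only to keep the sketch dependency-free; a practical front end would use an FFT.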


Current Assignee: Samsung Electronics Co. Ltd.
Original Assignee: Samsung Electronics Co. Ltd.
Inventors: Choi, In-jeong; Han, Ick-sang; Jeong, Sang-bae; Kim, Jeong-su

Granted Patent: US 7,953,594 B2
US Class Current: 704/9
CPC Class Codes:
G06F 40/279 Recognition of textual enti...
G10L 15/197 Probabilistic grammars, e.g...
