State transition model design method and voice recognition method and apparatus using same
First Claim
1. A method of recognizing patterns in an input speech signal using a designed state transition model in which a state shared structure of the state transition model is designed, the method comprising:
a step of inputting a speech signal;
a step of recognizing patterns in the input speech signal using a designed state transition model, the designing of the state transition model comprising:
a step of setting the states of a triphone state transition model in an acoustic space as initial clusters;
a clustering step of generating a cluster containing said initial clusters by top-down clustering;
a step of determining a state shared structure by assigning a short distance cluster among clusters generated by said clustering step, to the state transition model; and
a step of learning a state shared model by analyzing the states of the triphones in accordance with the determined state shared structure; and
a step of outputting a speech signal representing the pattern recognized in said recognizing step.
Abstract
A method of designing a state transition model capable of high-speed voice recognition, and a voice recognition method and apparatus using that model, are provided. The design method determines the state shared structure of the state transition model. It includes a step of setting the states of a triphone state transition model in an acoustic space as initial clusters; a clustering step of generating a cluster containing the initial clusters by top-down clustering; a step of determining a state shared structure by assigning a short-distance cluster, among the clusters generated by the clustering step, to the state transition model; and a step of learning a state shared model by analyzing the states of the triphones in accordance with the determined state shared structure.
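The design steps named in the abstract can be illustrated with a minimal sketch. It is not the patented procedure: the abstract does not specify a split criterion or a distance measure, so the sketch assumes that each triphone state is summarized by a mean vector, that clusters are split at the mean of their widest dimension, and that "short distance" means smallest squared Euclidean distance to a cluster centroid. The state names (such as `a-k+i/s1`), the function names, and the data are hypothetical.

```python
from statistics import fmean

def centroid(points):
    """Mean vector of a list of equal-length tuples."""
    return tuple(fmean(dim) for dim in zip(*points))

def sq_dist(a, b):
    """Squared Euclidean distance (assumed distance measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def scatter(points):
    """Total squared distance of the points from their centroid."""
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)

def split(points):
    """Split one cluster in two at the mean of its widest dimension."""
    c = centroid(points)
    dims = list(zip(*points))
    widest = max(range(len(dims)), key=lambda d: max(dims[d]) - min(dims[d]))
    low = [p for p in points if p[widest] <= c[widest]]
    high = [p for p in points if p[widest] > c[widest]]
    return low, high

def top_down_clusters(state_means, n_clusters):
    """Start from one cluster holding every state; split until n_clusters."""
    clusters = [list(state_means.values())]
    while len(clusters) < n_clusters:
        splittable = [c for c in clusters if len(set(c)) > 1]
        if not splittable:
            break
        target = max(splittable, key=scatter)  # split the most spread-out cluster
        clusters.remove(target)
        clusters.extend(split(target))
    return clusters

def shared_structure(state_means, clusters):
    """Tie each triphone state to the cluster with the nearest centroid."""
    centroids = [centroid(c) for c in clusters]
    return {
        name: min(range(len(centroids)), key=lambda k: sq_dist(mean, centroids[k]))
        for name, mean in state_means.items()
    }

def learn_shared_means(state_means, tying):
    """Stand-in for training: average the means of the states tied together."""
    members = {}
    for name, cluster_id in tying.items():
        members.setdefault(cluster_id, []).append(state_means[name])
    return {cid: centroid(pts) for cid, pts in members.items()}

# Hypothetical triphone states, each reduced to a 2-D mean in acoustic space.
state_means = {
    "a-k+i/s1": (0.0, 0.0),
    "a-k+u/s1": (0.2, 0.1),
    "i-k+a/s1": (5.0, 5.0),
    "u-k+a/s1": (5.2, 4.9),
}
clusters = top_down_clusters(state_means, n_clusters=2)
tying = shared_structure(state_means, clusters)
shared = learn_shared_means(state_means, tying)
print(tying)
print(shared)
```

In a real system the final step would re-estimate the shared states from training speech rather than average the existing means; the averaging here only keeps the example self-contained.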
11 Claims
5. A computer usable medium having computer readable program code means embodied therein for causing a computer to store information on a method of designing a state transition model in which a state shared structure of the state transition model is designed, the computer readable program code means comprising:
first computer readable program code means for causing the computer to input a speech signal;
second computer readable program code means for causing the computer to recognize patterns in the input speech signal using a designed state transition model, said second computer readable program code means comprising means for causing the computer to design the state transition model comprising:
third computer readable program code means for causing the computer to set the states of a triphone state transition model in an acoustic space as initial clusters;
fourth computer readable program code means for causing the computer to generate a cluster containing the initial clusters by top-down clustering;
fifth computer readable program code means for causing the computer to determine a state shared structure by assigning a short distance cluster among clusters caused to be generated by the computer by said fourth computer readable program code means, to the state transition model; and
sixth computer readable program code means for causing the computer to learn a state shared model by analyzing the states of the triphones in accordance with the determined state shared structure; and
seventh computer readable program code means for causing the computer to output a speech signal representing the recognized pattern the computer is caused to recognize by said second computer readable program code means.
6. A voice recognition apparatus using a state transition model, comprising:
input means for inputting a speech signal containing voice information;
processing means for processing the speech signal, said processing means comprising analyzing means for analyzing the voice information contained in the input speech signal input from said input means;
likelihood generating means for generating a likelihood signal using the voice information analyzed by said analyzing means and the state transition model;
means for determining a language series responsive to the likelihood signal and outputting an electrical signal representing a recognition result comprising the determined language series; and
means for determining the state transition model comprising:
means for setting the states of a triphone state transition model in an acoustic space as initial clusters;
means for generating a cluster containing said initial clusters by top-down clustering;
means for determining a state shared structure by assigning a short distance cluster among clusters generated by said cluster generating means, to the state transition model; and
means for learning a state shared model by analyzing the states of the triphones in accordance with the determined state shared structure.
10. A voice recognition method using a state transition model, comprising:
an input step of inputting a speech signal containing voice information;
a processing step of processing the electrical signal, said processing step comprising an analyzing step of analyzing the voice information contained in the input speech signal input by said input step;
a likelihood signal generating step of generating a likelihood signal using the voice information analyzed by said analyzing means and the state transition model;
a step of determining a language series responsive to the likelihood signal and outputting a speech signal representing a recognition result comprising the determined language series; and
a step of determining the state transition model comprising:
a step of setting the states of a triphone state transition model in an acoustic space as initial clusters;
a step of generating a cluster containing said initial clusters by top-down clustering;
a step of determining a state shared structure by assigning a short distance cluster among clusters generated by said cluster generating step, to the state transition model; and
a step of learning a state shared model by analyzing the states of the triphones in accordance with the determined state shared structure.
11. A computer usable medium having computer readable program code means embodied therein for causing a computer to store information on a voice recognition method using a state transition model, the computer readable program code means comprising:
first computer readable program code means for causing the computer to input a speech signal containing voice information;
second computer readable program code means for causing the computer to process the speech signal, said second computer readable program code means causing the computer to analyze the voice information contained in the input speech signal caused to be input by said first computer readable program code means;
third computer readable program code means for causing the computer to generate a likelihood signal using the voice information analyzed by said analyzing means and the state transition model;
fourth computer readable program code means for causing the computer to determine a language series responsive to the likelihood signal and to output a speech signal representing a recognition result comprising the determined language series; and
fifth computer readable program code means for causing the computer to determine the state transition model comprising:
sixth computer readable program code means for causing the computer to set the states of a triphone state transition model in an acoustic space as initial clusters;
seventh computer readable program code means for causing the computer to generate a cluster containing said initial clusters by top-down clustering;
eighth computer readable program code means for causing the computer to determine a state shared structure by assigning a short distance cluster among clusters caused to be generated by the computer by said seventh computer readable program code means, to the state transition model; and
ninth computer readable program code means for causing the computer to learn a state shared model by analyzing the states of the triphones in accordance with the determined state shared structure.
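The likelihood generating step recited in claims 6, 10, and 11 can be sketched as follows, to show one way a state shared structure may reduce computation during recognition. The claims do not say how the likelihood signal is computed, so this is an illustration under stated assumptions only: each shared state is modeled as a diagonal-covariance Gaussian, and every name, the tying table, and the numbers are hypothetical.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian (assumed state model)."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def frame_likelihoods(frame, shared_states, tying):
    """Score one analyzed frame against every triphone state.

    Each shared state is evaluated once; triphone states tied to it
    reuse that score through a table lookup.
    """
    shared_scores = {
        sid: log_gauss(frame, mean, var)
        for sid, (mean, var) in shared_states.items()
    }
    return {name: shared_scores[sid] for name, sid in tying.items()}

# Hypothetical shared states: id -> (mean vector, variance vector).
shared_states = {
    0: ((0.0, 0.0), (1.0, 1.0)),
    1: ((5.0, 5.0), (1.0, 1.0)),
}
# Hypothetical state shared structure: triphone state -> shared state id.
tying = {
    "a-k+i/s1": 0,
    "a-k+u/s1": 0,
    "i-k+a/s1": 1,
    "u-k+a/s1": 1,
}
scores = frame_likelihoods((0.1, 0.0), shared_states, tying)
best = max(scores, key=scores.get)
print(best, scores[best])
```

With four triphone states tied to two shared states, only two Gaussian evaluations are needed per frame instead of four. Determining the language series would additionally require a search over state sequences, which this sketch leaves out.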
Specification