State transition model design method and voice recognition method and apparatus using same

US 5,812,975 A
Filed: 06/18/1996
Issued: 09/22/1998
Est. Priority Date: 06/19/1995
Status: Expired due to Fees

Abstract

A method of designing a state transition model capable of high-speed voice recognition, and a voice recognition method and apparatus using the state transition model, are provided. The method designs a state shared structure for the state transition model. It includes a step of setting the states of a triphone state transition model in an acoustic space as initial clusters, a clustering step of generating a cluster containing the initial clusters by top-down clustering, a step of determining a state shared structure by assigning a short-distance cluster, among the clusters generated by the clustering step, to the state transition model, and a step of learning a state shared model by analyzing the states of the triphones in accordance with the determined state shared structure.
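
The clustering and assignment steps named in the abstract can be illustrated with a minimal Python sketch. It is not the patented procedure: the specification is not reproduced in this record, so the sketch assumes each triphone state is summarized by a mean vector, uses a simple binary-split (divisive) scheme with squared Euclidean distance as the top-down clustering, and reads "short distance cluster" as the nearest cluster centroid. All function names are hypothetical.

```python
def centroid(points):
    """Mean vector of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def distortion(points):
    """Total squared distance of the points to their centroid."""
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)


def split(points, iters=10):
    """Split one cluster in two with a small 2-means loop (assumed scheme)."""
    c = centroid(points)
    a = max(points, key=lambda p: sq_dist(p, c))  # seed 1: farthest from centroid
    b = max(points, key=lambda p: sq_dist(p, a))  # seed 2: farthest from seed 1
    left, right = None, None
    for _ in range(iters):
        new_left = [p for p in points if sq_dist(p, a) <= sq_dist(p, b)]
        new_right = [p for p in points if sq_dist(p, a) > sq_dist(p, b)]
        if not new_left or not new_right:
            break  # keep the last valid split
        left, right = new_left, new_right
        a, b = centroid(left), centroid(right)
    return left, right


def top_down_cluster(state_means, n_clusters):
    """Start from one cluster holding every state and split until n_clusters."""
    clusters = [list(state_means)]
    while len(clusters) < n_clusters:
        candidates = [c for c in clusters if len(c) > 1 and distortion(c) > 0]
        if not candidates:
            break  # nothing left that can be split
        worst = max(candidates, key=distortion)
        clusters.remove(worst)
        clusters.extend(split(worst))
    return [centroid(c) for c in clusters]


def assign(state_means, centroids):
    """Index of the nearest centroid for each state (the shared structure)."""
    return [min(range(len(centroids)), key=lambda k: sq_dist(m, centroids[k]))
            for m in state_means]


# Four toy triphone-state means: two near the origin, two near (5, 5).
means = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
centroids = top_down_cluster(means, 2)
labels = assign(means, centroids)
```

In this toy run the two states near the origin end up sharing one cluster and the two states near (5, 5) share the other. The final step in the abstract, learning the state shared model, would then retrain parameters with states tied according to `labels`; that step is outside this sketch.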

89 Citations

11 Claims

1. A method of recognizing patterns in an input speech signal using a designed state transition model in which a state shared structure of the state transition model is designed, the method comprising:
- a step of inputting a speech signal;
  
  a step of recognizing patterns in the input speech signal using a designed state transition model, the designing of the state transition model comprising:
  
  a step of setting the states of a triphone state transition model in an acoustic space as initial clusters;
  
  a clustering step of generating a cluster containing said initial clusters by top-down clustering;
  
  a step of determining a state shared structure by assigning a short distance cluster among clusters generated by said clustering step, to the state transition model; and
  
  a step of learning a state shared model by analyzing the states of the triphones in accordance with the determined state shared structure; and
  
  a step of outputting a speech signal representing the pattern recognized in said recognizing step.

2. A method according to claim 1, wherein said clustering step executes clustering to generate a predetermined number of clusters by a Euclid distance calculation, and after generating the predetermined number of clusters, to generate clusters by accurate distance calculation.

3. A method according to claim 2, wherein said accurate distance calculation uses a Bhattacharyya distance.

4. A method according to claim 1, wherein said clustering step is defined by an output probability of states.
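
To make the two distance measures in claims 2 and 3 concrete, the following Python sketch contrasts a Euclidean distance between state means with the Bhattacharyya distance between two Gaussian output distributions. Modeling each state as a single Gaussian with a diagonal covariance is an assumption made here for illustration; the claims do not specify the form of the output probability, and the function names are hypothetical.

```python
import math


def euclidean_distance(mean_a, mean_b):
    """Cheap distance that looks only at the state means."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(mean_a, mean_b)))


def bhattacharyya_distance(mean_a, var_a, mean_b, var_b):
    """Bhattacharyya distance between two diagonal-covariance Gaussians.

    Per dimension, with v = (var_a + var_b) / 2:
        (mean_a - mean_b)^2 / (8 v) + 0.5 * ln(v / sqrt(var_a * var_b))
    """
    total = 0.0
    for ma, va, mb, vb in zip(mean_a, var_a, mean_b, var_b):
        v = 0.5 * (va + vb)
        total += 0.125 * (ma - mb) ** 2 / v
        total += 0.5 * math.log(v / math.sqrt(va * vb))
    return total


# Same means, different variances: the Euclidean distance cannot tell
# the two states apart, while the Bhattacharyya distance can.
d_euclid = euclidean_distance([0.0], [0.0])
d_bhatt = bhattacharyya_distance([0.0], [1.0], [0.0], [4.0])
```

The example shows one reason a distribution-aware distance may count as the more "accurate" one: it also reflects differences in variance. A clustering routine could plausibly use the cheaper measure while forming the first predetermined number of clusters and switch to the more expensive one afterwards, as claim 2 describes.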

5. A computer usable medium having computer readable program code means embodied therein for causing a computer to store information on a method of designing a state transition model in which a state shared structure of the state transition model is designed, the computer readable program code means comprising:
- first computer readable program code means for causing the computer to input a speech signal;
  
  second computer readable program code means for causing the computer to recognize patterns in the input speech signal using a designed state transition model, said second computer readable program code means comprising means for causing the computer to design the state transition model comprising:
  
  third computer readable program code means for causing the computer to set the states of a triphone state transition model in an acoustic space as initial clusters;
  
  fourth computer readable program code means for causing the computer to generate a cluster containing the initial clusters by top-down clustering;
  
  fifth computer readable program code means for causing the computer to determine a state shared structure by assigning a short distance cluster among clusters caused to be generated by the computer by said fourth computer readable program code means, to the state transition model; and
  
  sixth computer readable program code means for causing the computer to learn a state shared model by analyzing the states of the triphones in accordance with the determined state shared structure; and
  
  seventh computer readable program code means for causing the computer to output a speech signal representing the recognized pattern the computer is caused to recognize by said second computer readable program code means.

6. A voice recognition apparatus using a state transition model, comprising:
- input means for inputting a speech signal containing voice information;
  
  processing means for processing the speech signal, said processing means comprising analyzing means for analyzing the voice information contained in the input speech signal input from said input means;
  
  likelihood generating means for generating a likelihood signal using the voice information analyzed by said analyzing means and the state transition model;
  
  means for determining a language series responsive to the likelihood signal and outputting an electrical signal representing a recognition result comprising the determined language series; and
  
  means for determining the state transition model comprising:
  
  means for setting the states of a triphone state transition model in an acoustic space as initial clusters;
  
  means for generating a cluster containing said initial clusters by top-down clustering;
  
  means for determining a state shared structure by assigning a short distance cluster among clusters generated by said cluster generating means, to the state transition model; and
  
  means for learning a state shared model by analyzing the states of the triphones in accordance with the determined state shared structure.

7. A voice recognition apparatus according to claim 6, wherein the top-down clustering generates a recognition model by executing clustering to generate a predetermined number of clusters by a Euclid distance calculation, and after generating the predetermined number of clusters, to generate clusters by accurate distance calculation.

8. A voice recognition apparatus according to claim 7, wherein said accurate distance calculation uses a Bhattacharyya distance.

9. A voice recognition apparatus according to claim 6, wherein the top-down clustering is defined by an output probability of states.

10. A voice recognition method using a state transition model, comprising:
- an input step of inputting a speech signal containing voice information;
  
  a processing step of processing the electrical signal, said processing step comprising an analyzing step of analyzing the voice information contained in the input speech signal input by said input step;
  
  a likelihood signal generating step of generating a likelihood signal using the voice information analyzed by said analyzing means and the state transition model;
  
  a step of determining a language series responsive to the likelihood signal and outputting a speech signal representing a recognition result comprising the determined language series; and
  
  a step of determining the state transition model comprising:
  
  a step of setting the states of a triphone state transition model in an acoustic space as initial clusters;
  
  a step of generating a cluster containing said initial clusters by top-down clustering;
  
  a step of determining a state shared structure by assigning a short distance cluster among clusters generated by said cluster generating step, to the state transition model; and
  
  a step of learning a state shared model by analyzing the states of the triphones in accordance with the determined state shared structure.
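
The likelihood generating step can be pictured with a short Python sketch in which several triphone states point at the same shared state, so each shared output distribution is evaluated only once per analyzed frame. This is an illustration under stated assumptions, not the claimed implementation: the diagonal Gaussian output model, the `tying` table, the triphone state labels, and the function names are all hypothetical.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def state_log_likelihoods(frame, shared_states, tying):
    """Score every triphone state for one frame via its shared state.

    shared_states: shared state id -> (mean, variance)
    tying:         triphone state name -> shared state id
    """
    # One evaluation per shared state, reused by every state tied to it.
    cache = {sid: log_gaussian(frame, mean, var)
             for sid, (mean, var) in shared_states.items()}
    return {state: cache[sid] for state, sid in tying.items()}


# Hypothetical tying: two triphone states share state 0, one uses state 1.
shared_states = {0: ([0.0], [1.0]), 1: ([3.0], [1.0])}
tying = {"a-b+c/2": 0, "d-b+c/2": 0, "a-b+e/2": 1}
scores = state_log_likelihoods([0.0], shared_states, tying)
```

Here three triphone states are scored with only two Gaussian evaluations. Reducing the number of distinct output distributions in this way is one plausible route to faster recognition, although this record does not state how the speed-up is obtained.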

11. A computer usable medium having computer readable program code means embodied therein for causing a computer to store information on a voice recognition method using a state transition model, the computer readable program code means comprising:
- first computer readable program code means for causing the computer to input a speech signal containing voice information;
  
  second computer readable program code means for causing the computer to process the speech signal, said second computer readable program code means causing the computer to analyze the voice information contained in the input speech signal caused to be input by said first computer readable program code means;
  
  third computer readable program code means for causing the computer to generate a likelihood signal using the voice information analyzed by said analyzing means and the state transition model;
  
  fourth computer readable program code means for causing the computer to determine a language series responsive to the likelihood signal and to output a speech signal representing a recognition result comprising the determined language series; and
  
  fifth computer readable program code means for causing the computer to determine the state transition model comprising:
  
  sixth computer readable program code means for causing the computer to set the states of a triphone state transition model in an acoustic space as initial clusters;
  
  seventh computer readable program code means for causing the computer to generate a cluster containing said initial clusters by top-down clustering;
  
  eighth computer readable program code means for causing the computer to determine a state shared structure by assigning a short distance cluster among clusters caused to be generated by the computer by said seventh computer readable program code means, to the state transition model; and
  
  ninth computer readable program code means for causing the computer to learn a state shared model by analyzing the states of the triphones in accordance with the determined state shared structure.

Current Assignee: Canon Kabushiki Kaisha (Canon Inc.)
Original Assignee: Canon Kabushiki Kaisha (Canon Inc.)
Inventors: Ohora, Yasunori; Komori, Yasuhiro
Primary Examiner(s): Hudspeth, David R.
Assistant Examiner(s): Chawan, Vijay B.
Application Number: US08/665,503
Time in Patent Office: 826 Days
Field of Search: 395/2.65, 395/2.54, 395/2.45, 395/2.49, 395/2.53, 395/2.52, 395/2.64
US Class Current: 704/256
CPC Class Codes:
- G10L 15/063: Training
- G10L 15/144: Training of HMMs
- G10L 2015/022: Demisyllables, biphones or ...
- G10L 2015/0631: Creating reference template...
