SPEECH CLASSIFICATION APPARATUS, SPEECH CLASSIFICATION METHOD, AND SPEECH CLASSIFICATION PROGRAM

US 20100138223A1
Filed: 03/13/2008
Published: 06/03/2010
Est. Priority Date: 03/26/2007
Status: Active Grant

Abstract

An object of the present invention is to allow classification of sequentially input speech signals with good accuracy based on similarity of speakers and environments by using a realistic memory use amount, a realistic processing speed, and an on-line operation. A speech classification probability calculation means 103 calculates a probability (probability of classification into each cluster) that a latest one of the speech signals (speech data) belongs to each cluster based on a generative model which is a probability model. A parameter updating means 107 successively estimates parameters that define the generative model based on the probability of classification of the speech data into each cluster calculated by the speech classification probability calculation means 103 (in FIG. 1).
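The loop described in the abstract (classify the latest input under the current model, then fold that input back into the parameters) can be sketched as follows. This is a minimal illustration only, not the patented implementation: it assumes scalar inputs, one Gaussian per cluster with a shared fixed variance, and a simple soft-count update. The class name `OnlineMixture` and all values are hypothetical.

```python
import math

class OnlineMixture:
    """Toy generative model: one 1-D Gaussian per cluster, updated one input at a time."""

    def __init__(self, means, var=1.0):
        self.var = var                       # shared, fixed variance (assumption)
        self.count = [1.0] * len(means)      # soft counts per cluster (pseudo-count of 1)
        self.total = list(means)             # soft sums of inputs per cluster

    def means(self):
        return [s / n for s, n in zip(self.total, self.count)]

    def weights(self):
        z = sum(self.count)
        return [n / z for n in self.count]

    def classify(self, x):
        """Probability that x belongs to each cluster under the latest parameters."""
        # Log domain; the Gaussian normalizer cancels because the variance is shared.
        logs = [math.log(w) - 0.5 * (x - m) ** 2 / self.var
                for w, m in zip(self.weights(), self.means())]
        top = max(logs)
        unnorm = [math.exp(v - top) for v in logs]
        z = sum(unnorm)
        return [u / z for u in unnorm]

    def update(self, x, post):
        """Fold the latest input into the statistics, weighted by its posterior."""
        for k, p in enumerate(post):
            self.count[k] += p
            self.total[k] += p * x

    def step(self, x):
        post = self.classify(x)   # uses the latest parameter values
        self.update(x, post)      # the next input sees the updated parameters
        return post

model = OnlineMixture(means=[0.0, 5.0])
for x in [0.2, 4.8, -0.1, 5.3]:
    posterior = model.step(x)
```

Only the per-cluster statistics are stored, so memory use does not grow with the number of inputs, which is in keeping with the on-line operation the abstract asks for.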

52 Claims

1-26. (canceled)

27. A speech classification apparatus that classifies speech signals into clusters based on vocal similarity, comprising:
- a speech classification probability calculation means that calculates a probability that a latest input one of the speech signals sequentially input belongs to each of the clusters, based on a probability model for probabilistically determining to which cluster a certain speech signal belongs; and
  
  a parameter updating means that successively estimates values of parameters that define the probability model using each probability calculated by the speech classification probability calculation means;
  
  the speech classification probability calculation means calculating each probability based on the probability model defined by latest values of the parameters successively estimated by the parameter updating means.
  - 28. The speech classification apparatus according to claim 27, comprising:
    - a speech classification probability updating means that recalculates probabilities that the speech signals which have been input within predetermined times in the past belong to the respective clusters, based on the probability model defined by the latest values of the parameters successively estimated by the parameter updating means;
      
      the parameter updating means estimating the parameters that define the probability model using each of the probabilities calculated by the speech classification probability updating means.
  - 29. The speech classification apparatus according to claim 27, comprising:
    - a new cluster registration means that generates a probability model that defines a new cluster to which the input speech signal belongs, assuming a case where the input speech signal does not belong to any cluster; and
      
      a cluster number determination means that determines whether or not to add the new cluster from a result of parameter estimation by the parameter updating means based on a result of calculation using the probability model generated by the new cluster registration means.
  - 30. The speech classification apparatus according to claim 27, wherein the probability model is a hidden Markov model in which states are in a one-to-one correspondence with the clusters.
  - 31. The speech classification apparatus according to claim 30, wherein the probability model is a hidden Markov model associated with a Gaussian mixture distribution having a number of mixtures corresponding to a number of types of phonemes.
  - 32. The speech classification apparatus according to claim 28, comprising:
    - an update target speech selection means that determines whether or not to cause the speech classification probability updating means to recalculate a probability that each of the speech signals input within the predetermined times in the past belongs to each cluster.
  - 33. The speech classification apparatus according to claim 32, wherein the update target speech selection means determines whether or not recalculation of the probability that each of the speech signals belongs to each cluster is needed, based on an entropy of the calculated probability that each of the speech signals belongs to each cluster at a time of the determination as to the recalculation.
  - 34. The speech classification apparatus according to claim 29, wherein, in a case where a speech signal for which the cluster to which the speech signal should belong is known is provided in advance, the new cluster registration means generates a probability model that defines the cluster to which the speech signal should belong.
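The entropy-based selection of claims 32 and 33 can be illustrated with a short sketch. The claims state only that the decision is based on the entropy of the calculated probabilities; the direction of the test (recalculate when the entropy is high, that is, when the classification is still uncertain) and the threshold value are assumptions made here for illustration, and the function names are hypothetical.

```python
import math

def entropy(post):
    """Shannon entropy (in nats) of a cluster-membership distribution."""
    return -sum(p * math.log(p) for p in post if p > 0.0)

def needs_recalculation(post, threshold):
    """Assumed rule: recalculate while the classification is still uncertain."""
    return entropy(post) > threshold

def select_update_targets(recent_posts, threshold=0.3):
    """Return the indices of recent inputs whose probabilities should be recomputed."""
    return [i for i, post in enumerate(recent_posts)
            if needs_recalculation(post, threshold)]

recent = [[0.99, 0.01], [0.5, 0.5], [0.95, 0.05]]
targets = select_update_targets(recent)
```

Skipping inputs whose membership is already nearly certain keeps the per-input cost low, since only ambiguous inputs are revisited.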

35. A speech classification method that classifies speech signals into clusters based on vocal similarity, comprising:
- calculating a probability that a latest input one of the speech signals sequentially input belongs to each cluster, based on a probability model for probabilistically determining to which cluster a certain speech signal belongs;
  
  successively estimating parameters that define the probability model using the probability; and
  
  calculating a probability that at least a next input speech signal belongs to each cluster, based on the probability model defined by the successively estimated parameters.
  - 36. The speech classification method according to claim 35, comprising:
    - recalculating probabilities that the speech signals which have been input within predetermined times in the past belong to the respective clusters, based on the probability model defined by the latest values of the successively estimated parameters; and
      
      estimating the parameters that define the probability model using each of the recalculated probabilities.
  - 37. The speech classification method according to claim 35, comprising:
    - generating a probability model that defines a new cluster to which the input speech signal belongs, assuming a case where the input speech signal does not belong to any cluster; and
      
      determining whether or not to add the new cluster from a result of parameter estimation based on a result of calculation using the generated probability model.
  - 38. The speech classification method according to claim 35, wherein the probability model is a hidden Markov model in which states are in a one-to-one correspondence with the clusters.
  - 39. The speech classification method according to claim 38, wherein the probability model is a hidden Markov model associated with a Gaussian mixture distribution having a number of mixtures corresponding to a number of types of phonemes.
  - 40. The speech classification method according to claim 36, comprising:
    - determining whether or not recalculation of a probability that each of the speech signals input within the predetermined times in the past belongs to each cluster is needed.
  - 41. The speech classification method according to claim 40, comprising:
    - determining whether or not recalculation of the probability that each of the speech signals belongs to each cluster is needed, based on an entropy of the calculated probability that each of the speech signals belongs to each cluster at a time of the determination as to the recalculation.
  - 42. The speech classification method according to claim 36, comprising:
    - generating a probability model that defines the cluster to which the speech signal should belong when the speech signal where the cluster to which the speech data should belong is known is provided in advance.
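The recalculation step of claim 36 can be sketched as a pass over a bounded buffer of recent inputs: each buffered input is re-scored under the latest parameters, and its old contribution to the running statistics is swapped for the new one. This is an illustrative sketch under stated assumptions (scalar inputs, unit-variance and equally weighted Gaussians, statistics kept as soft counts and soft sums); the names `posterior` and `refresh_window` are hypothetical.

```python
import math
from collections import deque

def posterior(x, means):
    """Membership probabilities for unit-variance, equal-weight Gaussians (toy model)."""
    logs = [-0.5 * (x - m) ** 2 for m in means]
    top = max(logs)
    unnorm = [math.exp(v - top) for v in logs]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def refresh_window(window, count, total):
    """Recompute posteriors of buffered inputs and swap their contributions in place."""
    for item in window:
        means = [t / c for t, c in zip(total, count)]   # latest parameter values
        new = posterior(item["x"], means)
        for k, (p_new, p_old) in enumerate(zip(new, item["post"])):
            count[k] += p_new - p_old                   # replace old soft count
            total[k] += (p_new - p_old) * item["x"]     # replace old soft sum
        item["post"] = new

# A buffer holding only the most recent inputs (the "predetermined times in the past").
window = deque([{"x": 0.1, "post": [0.5, 0.5]}], maxlen=3)
count = [1.5, 1.5]       # pseudo-count of 1 per cluster plus the input's old soft counts
total = [0.05, 5.05]     # prior means 0 and 5 plus the input's old soft sums
refresh_window(window, count, total)
```

Because each posterior sums to one, the swap leaves the total count unchanged and only redistributes mass between clusters.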

43. A speech classification program that classifies speech signals into clusters based on vocal similarity, the program causing a computer to execute:
- a probability calculation processing that calculates a probability that a latest input one of the speech signals sequentially input belongs to each cluster, based on a probability model for probabilistically determining to which cluster a certain speech signal belongs; and
  
  a parameter update processing that successively estimates parameters that define the probability model using each probability calculated by the speech classification probability calculation processing;
  
  the probability calculation processing calculating each probability based on the probability model defined by latest values of the successively estimated parameters.
  - 44. The speech classification program according to claim 43, causing the computer to execute:
    - a probability recalculation processing that recalculates probabilities that the speech signals which have been input within predetermined times in the past belong to the respective clusters, based on the probability model defined by the latest values of the successively estimated parameters;
      
      the parameter updating processing successively estimating the parameters that define the probability model using each of the probabilities calculated by the probability recalculation process.
  - 45. The speech classification program according to claim 43, causing the computer to execute the processing comprising:
    - generating a probability model that defines a new cluster to which the input speech signal belongs, assuming a case where the input speech signal does not belong to any cluster; and
      
      determining whether or not to add the new cluster from a result of parameter estimation based on a result of calculation using the probability model that defines the new cluster.
  - 46. The speech classification program according to claim 43, wherein the probability model is a hidden Markov model in which states are in a one-to-one correspondence with the clusters.
  - 47. The speech classification program according to claim 46, wherein the probability model is a hidden Markov model associated with a Gaussian mixture distribution having a number of mixtures corresponding to a number of types of phonemes.
  - 48. The speech classification program according to claim 44, causing the computer to execute the processing comprising:
    - determining whether or not recalculation of a probability that each of the speech signals input within the predetermined times in the past belongs to each cluster is needed.
  - 49. The speech classification program according to claim 48, causing the computer to execute the processing comprising:
    - determining whether or not recalculation of the probability that each of the speech signals belongs to each cluster is needed in the determination process, based on an entropy of the calculated probability that each of the speech signals belongs to each cluster at a time of the determination as to the recalculation.
  - 50. The speech classification program according to claim 43, causing the computer to execute the processing comprising:
    - generating a probability model that defines the cluster to which the speech signal should belong when the speech signal where the cluster to which the speech signal should belong is known is provided in advance.
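Claims 30, 38, and 46 recite a hidden Markov model whose states correspond one-to-one with the clusters. Under that model, the probability that the latest input belongs to each cluster can be computed with one step of the standard forward (filtering) recursion. The sketch below assumes the per-cluster likelihoods of the latest input are already available; the transition matrix and likelihood values are made up for illustration, and the claims do not state that this particular recursion is used.

```python
def forward_step(prev, transition, likelihood):
    """One filtering step: predict with the transition matrix, then weight by likelihood.

    prev[i]          probability that the previous input belonged to cluster i
    transition[i][j] probability of moving from cluster i to cluster j
    likelihood[j]    likelihood of the latest input under cluster j
    """
    k = len(prev)
    predicted = [sum(prev[i] * transition[i][j] for i in range(k)) for j in range(k)]
    joint = [predicted[j] * likelihood[j] for j in range(k)]
    z = sum(joint)
    return [v / z for v in joint]

transition = [[0.9, 0.1],
              [0.1, 0.9]]          # hypothetical: speakers tend to keep talking
belief = [0.5, 0.5]                # no prior knowledge about the first input
for likelihood in ([0.8, 0.2], [0.8, 0.2]):
    belief = forward_step(belief, transition, likelihood)
```

With a "sticky" transition matrix such as this one, consecutive inputs that favor the same cluster reinforce each other, so the second step is more confident than the first.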

51. A speech clustering system that performs a clustering process which generates a cluster in response to each of sequentially input speech data on-line, the system comprising:
- a speech classification probability calculation means that derives a probability that a latest one of the sequentially input speech data belongs to each cluster, using a generation model which is defined by parameter values stored in parameter storage means, assuming a speech data distribution, and that stores the probability in speech classification probability storage means;
  
  an update target speech selection means that determines whether or not recalculation of the probability that each of the speech data belongs to each cluster is needed according to a magnitude relation between a predetermined threshold value and an indicator, the indicator being obtained by reversing a sign of an entropy of the probability that each of the speech data belongs to each cluster;
  
  a speech classification probability updating means that derives a probability that speech data, for which recalculation of the probability is determined to be needed by the update target speech selection means, out of predetermined items of the sequentially input speech data except the latest speech data, belongs to each cluster and that updates the speech classification probability storage means; and
  
  a parameter updating means that calculates sufficient statistics necessary for calculating the generation model on each of the numbers of clusters to estimate parameter values of the generation model, assuming a current number of clusters and some numbers of clusters in the vicinity of the current number of clusters, based on results of calculations by the speech classification probability calculation means and the speech classification probability updating means, and that successively updates the parameter values in the parameter storage means with the estimated parameter values.
  - 52. The speech classification system according to claim 51, comprising:
    - a new speaker registration means that reads parameters and sufficient statistics of the generation model stored in the parameter storage means and generates a generation model with the number of clusters being incremented by one; and
      
      a cluster number determination means that determines an optimal number of clusters among the some numbers of clusters assumed by the parameter updating means from a result of estimation of the parameter values of the generation model by the parameter updating means, and stores sufficient statistics and parameter values corresponding to the determined number of clusters in the parameter storage means.
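Claims 51 and 52 describe estimating the generation model for the current number of clusters and for nearby numbers of clusters, then keeping the best one. The claims do not name the selection criterion. The sketch below assumes a penalized log-likelihood score (the Bayesian information criterion) purely for illustration; the function names and all numbers are hypothetical.

```python
import math

def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion; lower is better (assumed criterion)."""
    return -2.0 * log_likelihood + n_params * math.log(n_samples)

def choose_cluster_count(candidates, n_samples):
    """Pick the cluster count with the lowest score.

    candidates maps a cluster count to (log_likelihood, n_params) for the
    model estimated under that count.
    """
    return min(candidates, key=lambda k: bic(*candidates[k], n_samples))

# Current count 3, plus its neighbors 2 and 4 (made-up scores).
candidates = {2: (-120.0, 5), 3: (-100.0, 8), 4: (-99.0, 11)}
best = choose_cluster_count(candidates, n_samples=50)
```

Restricting the comparison to the current count and its neighbors keeps the cost of each update bounded, at the price of changing the number of clusters only gradually.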

Current Assignee
NEC Corporation
Original Assignee
NEC Corporation
Inventors
Koshinaka, Takafumi

Granted Patent

US 8,630,853 B2
Field of Search
US Class Current

704/245
CPC Class Codes

G10L 15/08 Speech classification or se...

G10L 25/78 Detection of presence or ab...
