Clustering system, clustering method, clustering program and attribute estimation system using clustering system

US 7,707,028 B2
Filed: 06/22/2006
Issued: 04/27/2010
Est. Priority Date: 03/20/2006
Status: Active Grant

Abstract

A clustering system that clusters a language model group includes a union language model preparation unit, which prepares a union language model for each language model so that it contains the union of the vocabularies in the language model group as entries, and a clustering unit, which clusters the resulting union language model group into a plurality of clusters. When the union language model preparation unit prepares a union language model for a given language model, it records, for each vocabulary included in that language model, the occurrence frequency of the corresponding entry in that language model, and records, for each vocabulary not included in that language model, data showing that the occurrence frequency is 0. This provides a clustering system capable of clustering language models built from speech uttered or text written by a plurality of speakers.
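As a rough illustration of the union step described in the abstract, the following Python sketch represents each language model as a plain dictionary mapping vocabulary to occurrence frequency. The dictionary representation, the function names and the attribute labels such as `age_20s` are hypothetical; the patent text quoted here does not prescribe a data structure.

```python
from typing import Dict, List

# Hypothetical representation: vocabulary entry -> occurrence frequency.
LanguageModel = Dict[str, float]


def build_union_vocabulary(models: Dict[str, LanguageModel]) -> List[str]:
    """Union data: every vocabulary entry that appears in any model, sorted."""
    vocab = set()
    for model in models.values():
        vocab.update(model)
    return sorted(vocab)


def prepare_union_models(models: Dict[str, LanguageModel]) -> Dict[str, LanguageModel]:
    """Prepare one union language model per attribute value.

    Entries present in the source model keep their frequency; entries
    missing from it are recorded with frequency 0.
    """
    union_vocab = build_union_vocabulary(models)
    return {
        attr: {word: model.get(word, 0.0) for word in union_vocab}
        for attr, model in models.items()
    }


# Hypothetical attribute values and frequencies.
models = {
    "age_20s": {"cool": 5.0, "game": 3.0},
    "age_60s": {"garden": 4.0, "game": 1.0},
}
union = prepare_union_models(models)
print(union["age_20s"])  # {'cool': 5.0, 'game': 3.0, 'garden': 0.0}
print(union["age_60s"])  # {'cool': 0.0, 'game': 1.0, 'garden': 4.0}
```

Because every union model ends up with the same key set, the models can be compared entry by entry, which is what makes clustering over them straightforward.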

10 Claims

1. A clustering system that clusters a language model group including language models that correspond to a plurality of attribute values, each language model being associated with an attribute value showing a predetermined attribute of humans and having a plurality of entries including vocabularies appearing as speech uttered by or text written by one or more humans having attributes represented with the attribute values and data representing occurrence frequencies of the vocabularies, the clustering system comprising:
- a union language model preparation unit that generates union data representing a union of vocabularies included in the language model group and prepares a union language model including the union of the vocabularies and occurrence frequencies of the vocabularies using the union data, the union language model being prepared for each language model included in the language model group, so as to prepare a union language model group; and
- a clustering unit that performs clustering, which classifies the union language model group into a plurality of clusters in which a difference in similarities between union language models belonging to one cluster is minimized, and generates cluster data representing one or more of the union language models included in each cluster,

wherein when the union language model preparation unit prepares a union language model for a certain language model, the union language model preparation unit records vocabularies included in the certain language model among the vocabularies included in the union data associated with occurrence frequencies of the vocabularies in the certain language model as entries in the union language model, and records vocabularies not included in the certain language model among the vocabularies included in the union data associated with data showing that an occurrence frequency is 0 as entries in the union language model.
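Claim 1 requires that the union language models be classified so that dissimilarity within each cluster is minimized, but it does not name a clustering algorithm. The Python sketch below swaps in plain k-means over relative-frequency vectors with squared Euclidean distance; that choice, the deterministic seeding and the attribute labels are all assumptions made for illustration, not the patented method.

```python
from typing import Dict, List

Vector = List[float]


def to_vector(union_model: Dict[str, float], vocab: List[str]) -> Vector:
    """Relative-frequency vector of one union language model over a shared vocabulary."""
    total = sum(union_model[w] for w in vocab) or 1.0
    return [union_model[w] / total for w in vocab]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(v: Vector, centroids: List[Vector]) -> int:
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda c: squared_distance(v, centroids[c]))


def kmeans(vectors: Dict[str, Vector], k: int, max_iter: int = 20) -> Dict[int, List[str]]:
    """Plain k-means; returns cluster data as {cluster id: [attribute values]}."""
    names = sorted(vectors)
    # Deterministic seeding (an assumption): the first k models in sorted order.
    centroids = [list(vectors[n]) for n in names[:k]]
    assignment: Dict[str, int] = {}
    for _ in range(max_iter):
        new_assignment = {n: nearest(vectors[n], centroids) for n in names}
        if new_assignment == assignment:
            break  # no model changed cluster, so the partition has converged
        assignment = new_assignment
        for c in range(k):
            members = [vectors[n] for n in names if assignment[n] == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    clusters: Dict[int, List[str]] = {c: [] for c in range(k)}
    for n in names:
        clusters[assignment[n]].append(n)
    return clusters


# Hypothetical union language models (already zero-filled over the union vocabulary).
union_models = {
    "f_20s": {"cool": 6.0, "game": 4.0, "garden": 0.0, "tea": 0.0},
    "m_20s": {"cool": 5.0, "game": 5.0, "garden": 0.0, "tea": 0.0},
    "f_60s": {"cool": 0.0, "game": 0.0, "garden": 3.0, "tea": 7.0},
    "m_60s": {"cool": 0.0, "game": 1.0, "garden": 5.0, "tea": 4.0},
}
vocab = sorted(next(iter(union_models.values())))
vectors = {name: to_vector(m, vocab) for name, m in union_models.items()}
print(kmeans(vectors, k=2))  # {0: ['f_20s', 'm_20s'], 1: ['f_60s', 'm_60s']}
```

Normalizing to relative frequencies keeps a speaker group with a large corpus from dominating the distance purely through volume; any other similarity measure could be substituted without changing the overall structure.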
2. The clustering system according to claim 1, wherein the clustering unit further generates a clustered language model corresponding to each cluster represented by the cluster data, based on a union language model included in that cluster.
3. The clustering system according to claim 2, further comprising an entry deletion unit that, among entries included in the union language models or the clustered language models, deletes entries having occurrence frequencies less than a predetermined threshold value.
4. The clustering system according to claim 2, further comprising an entry deletion unit that, among entries included in the union language models or the clustered language models, keeps the N highest-ranked entries in decreasing order of occurrence frequency and deletes the remaining entries.
5. The clustering system according to claim 1, further comprising a linking recording unit that records audio models associated with the union language models having the corresponding attribute values as linking models for the respective attribute values, the audio models corresponding to a plurality of attribute values, each audio model being associated with an attribute value showing a predetermined attribute of humans and having a plurality of entries including audios included in speech of humans having attributes represented with the attribute values and data representing occurrence frequencies of the audios, wherein the clustering unit performs clustering with respect to the linking models having respective attribute values recorded by the linking recording unit so as to classify the linking models into a plurality of clusters, and generates cluster data representing each cluster.
6. The clustering system according to claim 5, further comprising a weighting unit that multiplies at least one of data representing occurrence frequencies included in the entries of the audio models and data representing occurrence frequencies included in the entries of the union language models by a weighting factor so as to adjust at least one of a dispersion of the occurrence frequencies in the audio models and a dispersion of the occurrence frequencies in the union language models.
7. An attribute estimation system that estimates an attribute of a human using cluster data and union language models generated and prepared by the clustering system according to claim 1, the attribute estimation system comprising:
- an input unit by which language information on the human is input;
- a score calculation unit that calculates a score of the language information input by the input unit using the union language models, the score being calculated for each cluster represented by the cluster data; and
- an attribute estimation unit that generates data showing an attribute of the human based on the scores for the respective clusters, so as to estimate the attribute.
8. An attribute estimation system that estimates an attribute of a human using cluster data and union language models generated and prepared by the clustering system according to claim 5, the attribute estimation system comprising:
- an input unit by which data representing speech of the human is input;
- a language score calculation unit that calculates a language score of the speech input by the input unit using the union language models, the language score being calculated for each cluster represented by the cluster data;
- an audio score calculation unit that calculates an audio score of the speech input by the input unit, the audio score being calculated for each cluster represented by the cluster data; and
- an attribute estimation unit that generates data showing an attribute of the human based on the audio scores for the respective clusters and the language scores for the respective clusters, so as to estimate the attribute.
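The dependent claims add entry deletion (claims 3 and 4) and score-based attribute estimation (claims 7 and 8) without fixing the formulas. The Python sketch below is one possible reading under stated assumptions: the language score is a unigram log-likelihood with a probability floor, the audio scores are taken as precomputed inputs, and the two are combined additively with a hypothetical `audio_weight`. None of these choices comes from the claims, and `audio_weight` is unrelated to the dispersion weighting of claim 6.

```python
import math
from typing import Dict, List, Optional

LanguageModel = Dict[str, float]


def prune_by_threshold(model: LanguageModel, threshold: float) -> LanguageModel:
    """Delete entries whose occurrence frequency is below the threshold (claim 3)."""
    return {w: f for w, f in model.items() if f >= threshold}


def prune_top_n(model: LanguageModel, n: int) -> LanguageModel:
    """Keep the n most frequent entries and delete the rest (claim 4)."""
    ranked = sorted(model.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:n])


def language_score(words: List[str], model: LanguageModel, floor: float = 1e-6) -> float:
    """Unigram log-likelihood of the input words under one cluster's model.

    The probability floor is an assumption: without it, a single word with
    zero frequency would drive the score to minus infinity.
    """
    total = sum(model.values()) or 1.0
    return sum(math.log(max(model.get(w, 0.0) / total, floor)) for w in words)


def estimate_attribute(
    language_scores: Dict[str, float],
    audio_scores: Optional[Dict[str, float]] = None,
    audio_weight: float = 1.0,
) -> str:
    """Pick the cluster with the best score; audio scores are added if given (claim 8)."""

    def combined(cluster: str) -> float:
        score = language_scores[cluster]
        if audio_scores is not None:
            score += audio_weight * audio_scores[cluster]
        return score

    return max(language_scores, key=combined)


# Hypothetical per-cluster models and input words.
cluster_models = {
    "young": {"cool": 6.0, "game": 4.0, "tea": 0.0},
    "senior": {"cool": 0.0, "game": 1.0, "tea": 9.0},
}
print(prune_top_n(cluster_models["young"], 2))  # {'cool': 6.0, 'game': 4.0}
words = ["game", "tea", "tea"]
scores = {c: language_score(words, m) for c, m in cluster_models.items()}
print(estimate_attribute(scores))  # senior
# Hypothetical audio scores; how they are computed is outside this sketch.
print(estimate_attribute(scores, {"young": -10.0, "senior": -40.0}))  # young
```

The last call shows how, under this additive assumption, strong audio evidence can override the language score; the relative scale of the two scores therefore matters as much as the models themselves.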

9. A clustering method for clustering a language model group using a computer, the language model group including language models that correspond to a plurality of attribute values, each language model being associated with an attribute value showing a predetermined attribute of humans and having a plurality of entries including vocabularies appearing as speech uttered by or text written by one or more humans having attributes represented with the attribute values and data representing occurrence frequencies of the vocabularies, the method comprising the steps of:
- a union preparation step in which a union language model preparation unit provided in the computer generates union data representing a union of vocabularies included in the language model group and prepares a union language model including the union of the vocabularies and occurrence frequencies of the vocabularies using the union data, the union language model being prepared for each language model included in the language model group, so as to prepare a union language model group; and
- a cluster data generation step in which a clustering unit provided in the computer performs clustering, which classifies the union language model group into a plurality of clusters in which a difference in similarities between union language models belonging to one cluster is minimized, and generates cluster data representing one or more of the union language models included in each cluster,

wherein in the union preparation step, when the union language model preparation unit prepares a union language model for a certain language model, the union language model preparation unit records vocabularies included in the certain language model among the vocabularies included in the union data associated with occurrence frequencies of the vocabularies in the certain language model as entries in the union language model, and records vocabularies not included in the certain language model among the vocabularies included in the union data associated with data showing that an occurrence frequency is 0 as entries in the union language model.

10. A recording medium storing a clustering program that makes a computer execute a clustering process of a language model group including language models that correspond to a plurality of attribute values, each language model being associated with an attribute value showing a predetermined attribute of humans and having a plurality of entries including vocabularies appearing as speech uttered by or text written by one or more humans having attributes represented with the attribute values and data representing occurrence frequencies of the vocabularies, the program making the computer execute the following processes of:
- a union language model preparation process of generating union data representing a union of vocabularies included in the language model group and preparing a union language model including the union of the vocabularies and occurrence frequencies of the vocabularies using the union data, the union language model being prepared for each language model included in the language model group, so as to prepare a union language model group; and
- a clustering process of performing clustering, which classifies the union language model group into a plurality of clusters in which a difference in similarities between union language models belonging to one cluster is minimized, and generating cluster data representing one or more of the union language models included in each cluster,

wherein in the union language model preparation process, when a union language model is prepared for a certain language model, the program makes the computer execute the process of recording vocabularies included in the certain language model among the vocabularies included in the union data associated with occurrence frequencies of the vocabularies in the certain language model as entries in the union language model, and recording vocabularies not included in the certain language model among the vocabularies included in the union data associated with data showing that an occurrence frequency is 0 as entries in the union language model.

Current Assignee: Fujitsu Limited
Original Assignee: Fujitsu Limited
Inventors: Kojima, Hideki
Primary Examiner(s): Hudspeth, David R
Assistant Examiner(s): Albertalli, Brian Louis
Application Number: US11/472,311
Publication Number: US 20070219779A1
Time in Patent Office: 1,405 Days
Field of Search: None
US Class Current: 704/9
CPC Class Codes:
- G06F 40/30 Semantic analysis
- G10L 15/197 Probabilistic grammars, e.g...
