Clustering system, clustering method, clustering program and attribute estimation system using clustering system
Abstract
A clustering system that clusters a language model group includes a union language model preparation unit that prepares a union language model for each language model so as to include a union of vocabularies in the language model group as entries, and a clustering unit that clusters the union language model group so as to classify it into a plurality of clusters. When the union language model preparation unit prepares a union language model for a certain language model, it records, for vocabularies included in the certain language model, the occurrence frequencies of the corresponding entries in the certain language model, and records, for vocabularies not included in the certain language model, data showing that the occurrence frequency is 0. Thereby, a clustering system capable of clustering language models that include speech uttered by or text written by a plurality of speakers can be provided.
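The union preparation described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: each language model is assumed to be a plain mapping from vocabulary to occurrence frequency, and the function name `prepare_union_models` and the sample attribute values (`age_5`, `age_22`) are hypothetical.

```python
from typing import Dict

# Assumed representation: vocabulary -> occurrence frequency.
LanguageModel = Dict[str, int]


def prepare_union_models(models: Dict[str, LanguageModel]) -> Dict[str, LanguageModel]:
    """Build one union language model per input language model.

    `models` maps an attribute value (hypothetical labels here) to its
    language model.
    """
    # Union data: every vocabulary that appears in at least one model.
    union_vocab = sorted(set().union(*(m.keys() for m in models.values())))
    # For each model, keep its own frequency for vocabularies it contains
    # and record 0 for vocabularies it does not contain.
    return {
        attr: {word: model.get(word, 0) for word in union_vocab}
        for attr, model in models.items()
    }


# Hypothetical sample data, for illustration only.
models = {
    "age_5": {"mommy": 12, "play": 7},
    "age_22": {"play": 3, "exam": 9},
}
union_models = prepare_union_models(models)
print(union_models["age_5"])  # {'exam': 0, 'mommy': 12, 'play': 7}
```

After this step every union language model has the same set of entries, so the models can be compared entry by entry during clustering.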
10 Claims
1. A clustering system that clusters a language model group including language models that correspond to a plurality of attribute values, each language model being associated with an attribute value showing a predetermined attribute of humans and having a plurality of entries including vocabularies appearing as speech uttered by or text written by one or more humans having attributes represented with the attribute values and data representing occurrence frequencies of the vocabularies, the clustering system comprising:
a union language model preparation unit that generates union data representing a union of vocabularies included in the language model group and prepares a union language model including the union of the vocabularies and occurrence frequencies of the vocabularies using the union data, the union language model being prepared for each language model included in the language model group, so as to prepare a union language model group; and
a clustering unit that performs clustering, which classifies the union language model group into a plurality of clusters in which a difference in similarities between union language models belonging to one cluster is minimized, and generates cluster data representing one or more of the union language models included in each cluster,
wherein when the union language model preparation unit prepares a union language model for a certain language model, the union language model preparation unit records vocabularies included in the certain language model among the vocabularies included in the union data associated with occurrence frequencies of the vocabularies in the certain language model as entries in the union language model, and records vocabularies not included in the certain language model among the vocabularies included in the union data associated with data showing that an occurrence frequency is 0 as entries in the union language model.
Dependent claims: 2, 3, 4, 5, 6, 7, 8.
9. A clustering method for clustering a language model group using a computer, the language model group including language models that correspond to a plurality of attribute values, each language model being associated with an attribute value showing a predetermined attribute of humans and having a plurality of entries including vocabularies appearing as speech uttered by or text written by one or more humans having attributes represented with the attribute values and data representing occurrence frequencies of the vocabularies, the method comprising the steps of:
a union preparation step in which a union language model preparation unit provided in the computer generates union data representing a union of vocabularies included in the language model group and prepares a union language model including the union of the vocabularies and occurrence frequencies of the vocabularies using the union data, the union language model being prepared for each language model included in the language model group, so as to prepare a union language model group; and
a cluster data generation step in which a clustering unit provided in the computer performs clustering, which classifies the union language model group into a plurality of clusters in which a difference in similarities between union language models belonging to one cluster is minimized, and generates cluster data representing one or more of the union language models included in each cluster,
wherein in the union preparation step, when the union language model preparation unit prepares a union language model for a certain language model, the union language model preparation unit records vocabularies included in the certain language model among the vocabularies included in the union data associated with occurrence frequencies of the vocabularies in the certain language model as entries in the union language model, and records vocabularies not included in the certain language model among the vocabularies included in the union data associated with data showing that an occurrence frequency is 0 as entries in the union language model.
10. A recording medium storing a clustering program that makes a computer execute a clustering process of a language model group including language models that correspond to a plurality of attribute values, each language model being associated with an attribute value showing a predetermined attribute of humans and having a plurality of entries including vocabularies appearing as speech uttered by or text written by one or more humans having attributes represented with the attribute values and data representing occurrence frequencies of the vocabularies, the program making the computer execute the following processes of:
a union language model preparation process of generating union data representing a union of vocabularies included in the language model group and preparing a union language model including the union of the vocabularies and occurrence frequencies of the vocabularies using the union data, the union language model being prepared for each language model included in the language model group, so as to prepare a union language model group; and
a clustering process of performing clustering, which classifies the union language model group into a plurality of clusters in which a difference in similarities between union language models belonging to one cluster is minimized, and generating cluster data representing one or more of the union language models included in each cluster,
wherein in the union language model preparation process, when a union language model is prepared for a certain language model, the program makes the computer execute the process of recording vocabularies included in the certain language model among the vocabularies included in the union data associated with occurrence frequencies of the vocabularies in the certain language model as entries in the union language model, and recording vocabularies not included in the certain language model among the vocabularies included in the union data associated with data showing that an occurrence frequency is 0 as entries in the union language model.
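The clustering step recited in the independent claims can be illustrated with a small sketch. The claims do not name a clustering algorithm or a similarity measure; the code below assumes plain k-means with squared Euclidean distance over the frequency vectors, purely as a stand-in. The function names, the attribute labels, and the sample frequencies are hypothetical, and the input models are assumed to be union language models already (same vocabularies, zeros filled in).

```python
from typing import Dict, List


def kmeans(vectors: List[List[float]], k: int, iters: int = 20) -> List[int]:
    """Tiny deterministic k-means; the first k vectors seed the centroids."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_union_models(union_models: Dict[str, Dict[str, int]], k: int) -> Dict[int, List[str]]:
    """Return cluster data: cluster index -> attribute values of its members."""
    attrs = list(union_models)
    # All union models share the same vocabularies, so any one gives the axes.
    vocab = sorted(next(iter(union_models.values())))
    vectors = [[float(union_models[a][w]) for w in vocab] for a in attrs]
    clusters: Dict[int, List[str]] = {}
    for attr, label in zip(attrs, kmeans(vectors, k)):
        clusters.setdefault(label, []).append(attr)
    return clusters


# Hypothetical union language models (absent vocabularies already set to 0).
union_models = {
    "age_5": {"exam": 0, "mommy": 12, "play": 7},
    "age_22": {"exam": 9, "mommy": 0, "play": 3},
    "age_6": {"exam": 0, "mommy": 10, "play": 8},
    "age_23": {"exam": 8, "mommy": 0, "play": 2},
}
clusters = cluster_union_models(union_models, k=2)
print(clusters)  # {0: ['age_5', 'age_6'], 1: ['age_22', 'age_23']}
```

Because every union language model carries an entry for every vocabulary, each model maps to a vector of the same length, which is what makes a distance-based grouping like this possible in the first place.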
Specification