Information processing apparatus, clustering method, and recording medium storing clustering program

US 9,519,660 B2
Filed: 10/28/2013
Issued: 12/13/2016
Est. Priority Date: 11/26/2012
Status: Active Grant

Abstract

An information processing apparatus, a clustering method, and a clustering program stored on a recording medium, each of which determines an initial value of a model parameter of an input data set based on a model parameter of a reference data set that is similar to the input data set and was previously clustered, modifies the initial value so as to match the input data set, and obtains a clustering result of the input data set using the updated initial value of the model parameter.
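The first step of the abstract, deriving an initial value from similar reference data sets, might be sketched as follows. This is only an illustration under stated assumptions: the record does not specify the similarity measure or the combination rule, so cosine similarity, similarity-weighted averaging, the `top_n` cutoff, and the names `cosine_similarity` and `initial_centroids` are all hypothetical. The sketch also assumes that every reference set has the same number of clusters, listed in a consistent order, and that all similarities are positive.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two 1-D feature vectors (assumed measure)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def initial_centroids(input_feature, ref_features, ref_centroids, top_n=2):
    """Combine the centroids of the top_n most similar reference sets.

    ref_features:  (num_refs, f) array, one search feature per reference set.
    ref_centroids: (num_refs, k, d) array of stored cluster centroids.
    Assumes positive similarities so the weights are well defined.
    """
    sims = np.array([cosine_similarity(input_feature, f) for f in ref_features])
    top = np.argsort(sims)[::-1][:top_n]      # indices of the most similar sets
    weights = sims[top] / sims[top].sum()     # similarity weights summing to 1
    # Weighted average over the selected reference sets, result shape (k, d)
    return np.tensordot(weights, ref_centroids[top], axes=1)
```

With `top_n=1` the function simply returns the stored centroids of the single most similar reference set; larger values blend several reference sets in proportion to their similarity.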

20 Claims

1. An information processing apparatus that clusters an input data set, comprising:
- a memory having stored thereon (i) a plurality of reference data sets that were previously clustered, the plurality of reference data sets correspond to a plurality of reference images, and (ii) a first reference parameter for clustering a first reference data set of the plurality of reference data sets, the first reference data set of the plurality of reference data sets including a plurality of clusters corresponding to a region of a respective one of the plurality of reference images, the first reference parameter being a model parameter of mixture distribution, the first reference parameter corresponding to at least one of a number of clusters in the first reference data set, and a centroid of data points in each of the plurality of clusters; and
  
  at least one processor configured to execute computer readable instructions to,
  
  search the memory to obtain the first reference parameter of at least one of the plurality of reference data sets that is similar to the input data set,
  
  determine an initial value of the model parameter of mixture distribution of the input data set by combining (a) the first reference parameter of the first reference data set with (b) a second reference parameter of a second reference data set based on similarity between the input data set and the combined first and second reference data sets, at least one of the first reference parameter and the second reference parameter are obtained from the plurality of reference images,
  
  modify the initial value of the model parameter of mixture distribution of the input data set to match a probability density distribution of the input data set to generate an updated initial value, and
  
  cluster the probability density distribution of the input data set on a feature space of the input data set using the updated initial value of the model parameter of mixture distribution.
  - 2. The information processing apparatus of claim 1, wherein the first reference parameter and the second reference parameter included in a plurality of reference parameters, and the at least one processor is further configured to execute the computer readable instructions to, obtain the plurality of reference parameters based on a degree of similarity between the input data set and the plurality of reference data sets as the at least one of the plurality of reference data sets is similar to the input data set, and combine the plurality of reference parameters based on the degree of similarity to determine the initial value of the model parameter of mixture distribution of the input data set.
  - 3. The information processing apparatus of claim 2, wherein the at least one processor is further configured to execute the computer readable instructions to, calculate the degree of similarity based on a feature of the input data set and a feature of the plurality of reference data sets, and search the at least one of the plurality of reference data sets based on the degree of similarity.
  - 4. The information processing apparatus of claim 3, wherein the feature of the input data set and the feature of the plurality of reference data sets differs from a feature for clustering the input data set.
  - 5. The information processing apparatus of claim 4, wherein the feature for calculating the input data set and the feature of the plurality of reference data sets includes at least one of (i) a feature indicating a shape information of at least one data point in the plurality of reference data sets and (ii) a feature indicating a texture information of the at least one data point in the plurality of reference data sets.
  - 6. The information processing apparatus of claim 4, wherein the feature for clustering the input data set includes a feature indicating a color information of at least one data point in the plurality of reference data sets.
  - 7. The information processing apparatus of claim 1, wherein the model of mixture distribution is a Gaussian mixture distribution model.
  - 8. The information processing apparatus of claim 1, wherein the input data set is an image, and each data point in the input data set is a pixel or a set of a plurality of pixels in the image.
  - 18. The information processing apparatus of claim 1, wherein the at least one processor is further configured to execute the computer readable instructions to use an EM algorithm to modify the initial value.

9. A method of clustering an input data set, comprising:
- storing, in a memory, for each of a plurality of reference data sets corresponding to a respective one of the plurality of reference images that were previously clustered, a first reference parameter for clustering a first reference data set of the plurality of reference data sets, the reference data set of the plurality of reference data sets including a plurality of clusters corresponding to a region of the respective one of the plurality of reference images, the first reference parameter being a model parameter of mixture distribution, the first reference parameter corresponding to at least one of a number of clusters in the first reference data set, and a centroid of data points in each of the plurality of clusters;
  
  searching, using at least one processor, the memory to obtain the first reference parameter of at least one of the plurality of reference data sets that is similar to the input data set;
  
  determining, using the at least one processor, an initial value of the model parameter of mixture distribution of the input data set by combining the first reference parameter of the first reference data set with a second reference parameter of a second reference data set based on similarity between the input data set and the combined first and second reference data sets, the first reference parameter and the second reference parameter included in a plurality of reference parameters, at least one of the first reference parameter and the second reference parameter are obtained from the plurality of reference images;
  
  modifying, using the at least one processor, the initial value of the model parameter of mixture distribution of the input data set to match a probability density distribution of the input data set to generate an updated initial value; and
  
  clustering, using the at least one processor, the probability density distribution of the input data set on a feature space of the input data set using the updated initial value of the model parameter of mixture distribution.
  - 10. The method of claim 9, further comprising:
    - obtaining, using the at least one processor, the plurality of reference parameters based on a degree of similarity between the input data set and the plurality of reference data sets as the at least one of the plurality of reference data sets is similar to the input data set; and
      
      combining, using the at least one processor, the plurality of reference parameters based on the degree of similarity to determine the initial value of the model parameter of mixture distribution of the input data set.
  - 11. The method of claim 10, further comprising:
    - calculating, using the at least one processor, the degree of similarity based on a feature of the input data set and a feature of the plurality of reference data sets; and
      
      searching, using the at least one processor, the at least one of the plurality of reference data sets based on the degree of similarity.
  - 12. The method of claim 11, wherein the calculating, using the at least one processor, the feature of the input data set and the feature of the plurality of reference data sets differs from a feature for clustering the input data set.
  - 13. The method of claim 12, wherein the calculating, using the at least one processor, the feature of the input data set and the feature of the plurality of reference data sets includes at least one of (i) a feature indicating a shape information of at least one data point in the plurality of reference data sets and (ii) a feature indicating a texture information of the at least one data point in the plurality of reference data sets.
  - 14. The method of claim 12, wherein the calculating, using the at least one processor, the feature for clustering the input data set includes a feature indicating a color information of at least one data point in the plurality of reference data sets.
  - 15. The method of claim 9, wherein the storing in the memory of the first reference parameter is the model of mixture distribution which is a Gaussian mixture distribution model.
  - 16. The method of claim 9, wherein the input data set is an image, and each data point in the input data set is a pixel or a set of a plurality of pixels in the image.
  - 19. The method of claim 9, wherein the modifying, using the at least one processor, of the initial value is based on an EM algorithm.

17. A non-transitory computer readable recording medium having computer readable instructions stored thereon, that when executed by at least one processor, configure the at least one processor to:
- store, in a memory, for each of a plurality of reference data sets corresponding to a respective one of the plurality of reference images that were previously clustered, a first reference parameter for clustering a first reference data set of the plurality of reference data sets, the first reference data set of the plurality of reference data sets including a plurality of clusters corresponding to a region of the respective one of the plurality of reference images, the first reference parameter being a model parameter of mixture distribution, the first reference parameter corresponding to at least one of a number of clusters in the first reference data set, and a centroid of data points in each of the plurality of clusters;
  
  search the memory to obtain the first reference parameter of at least one of the plurality of reference data sets that is similar to an input data set;
  
  determine an initial value of the model parameter of mixture distribution of the input data set by combining the first reference parameter of the first reference data set with a second reference parameter of a second reference data set based on similarity between the input data set and the combined first and second reference data sets, at least one of the first reference parameter and the second reference parameter are obtained from the plurality of reference images;
  
  modify the initial value of the model parameter of mixture distribution of the input data set to match a probability density distribution of the input data set to generate an updated initial value; and
  
  cluster the probability density distribution of the input data set on a feature space of the input data set using the updated initial value of the model parameter of mixture distribution.
  - 20. The non-transitory computer readable recording medium of claim 17, wherein the at least one processor is further configured to execute the computer readable instructions to use an EM algorithm to modify the initial value.
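
Claims 7, 18, 19 and 20 recite a Gaussian mixture model and an EM algorithm for modifying the initial value. A minimal sketch of that refinement step is shown below; it is not taken from the patent specification. The function name `em_gmm`, the diagonal covariance model, the equal starting weights, the fixed iteration count, and the `eps` regulariser are all assumptions made for illustration.

```python
import numpy as np

def em_gmm(X, means, n_iter=50, eps=1e-6):
    """Refine initial Gaussian-mixture means with EM (diagonal covariances).

    X:     (n, d) data points, e.g. per-pixel colour features.
    means: (k, d) initial centroids, e.g. derived from reference data sets.
    Returns the refined means, the mixture weights, and a hard label per
    point taken from the responsibilities of the last E-step.
    """
    X = np.asarray(X, dtype=float)
    means = np.array(means, dtype=float)
    k = means.shape[0]
    weights = np.full(k, 1.0 / k)                      # assumed equal start
    variances = np.tile(X.var(axis=0) + eps, (k, 1))   # assumed shared start
    for _ in range(n_iter):
        # E-step: log(weight * Gaussian density) per point and component
        diff = X[:, None, :] - means[None, :, :]       # shape (n, k, d)
        log_p = -0.5 * np.sum(
            diff**2 / variances + np.log(2 * np.pi * variances), axis=2
        )
        log_p += np.log(weights)
        log_p -= log_p.max(axis=1, keepdims=True)      # numerical stability
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)        # responsibilities
        # M-step: re-estimate weights, means and variances
        nk = resp.sum(axis=0) + eps
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        diff = X[:, None, :] - means[None, :, :]
        variances = np.einsum("nk,nkd->kd", resp, diff**2) / nk[:, None] + eps
    return means, weights, resp.argmax(axis=1)
```

Starting EM from centroids borrowed from similar, already clustered data is intended to place the first E-step near a sensible solution, which is the motivation the abstract gives for determining the initial value from reference data sets.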

Current Assignee: Ricoh Company Limited
Original Assignee: Ricoh Company Limited
Inventors: Nakamura, Satoshi
Primary Examiner(s): Perveen, Rehana
Assistant Examiner(s): Wong, Huen
Application Number: US14/064,484
Publication Number: US 20140149412A1
Time in Patent Office: 1,142 Days
Field of Search
US Class Current: 1/1
CPC Class Codes
G06F 16/355   Class or cluster creation o...
G06F 16/583   using metadata automaticall...
G06F 18/23213   with fixed number of cluste...
G06F 18/24137   Distances to cluster centroïds
G06V 10/763   Non-hierarchical techniques...
G06V 10/764   using classification, e.g. ...
G06V 20/35   Categorising the entire sce...
