Similarity calculation system, method of calculating similarity, and program

US 10,140,342 B2
Filed: 06/30/2014
Issued: 11/27/2018
Est. Priority Date: 06/30/2014
Status: Active Grant



Abstract

Provided is a similarity calculation system for equalizing the time for calculating a similarity between target vectors and a query vector. The similarity calculation system includes a target vector acquisition part for acquiring a plurality of target vectors, and a clustering part for clustering the plurality of target vectors based on a calculation amount estimated for each of the plurality of target vectors. The calculation amount is the amount estimated when calculating a similarity between each of the plurality of target vectors and a given reference query vector. The clustering is performed so that the difference, among the plurality of clusters, in the total calculation amount for the similarity between all of the target vectors belonging to each cluster and the given reference query vector decreases.
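The balancing goal described in the abstract can be illustrated with a minimal sketch. It assumes sparse vectors stored as dicts mapping an element index to a value, and it uses the number of non-zero elements as the estimated calculation amount, as the claims state. Note that it swaps in a simple greedy heuristic (heaviest vector first, placed into the currently lightest cluster) in place of the graph division recited in the claims. The names `estimate_cost`, `balance_clusters`, and `cluster_totals` are hypothetical.

```python
import heapq


def estimate_cost(vector):
    """Estimated calculation amount: the number of non-zero elements."""
    return sum(1 for value in vector.values() if value != 0)


def balance_clusters(vectors, k):
    """Greedy stand-in for the claimed clustering (an assumption, not the
    patented graph division): assign the costliest vectors first, each to
    the cluster whose running total cost is currently smallest."""
    heap = [(0, cluster_id) for cluster_id in range(k)]  # (total cost, id)
    heapq.heapify(heap)
    clusters = [[] for _ in range(k)]
    order = sorted(range(len(vectors)),
                   key=lambda i: estimate_cost(vectors[i]),
                   reverse=True)
    for i in order:
        total, cluster_id = heapq.heappop(heap)
        clusters[cluster_id].append(i)
        heapq.heappush(heap, (total + estimate_cost(vectors[i]), cluster_id))
    return clusters


def cluster_totals(vectors, clusters):
    """Total estimated calculation amount per cluster."""
    return [sum(estimate_cost(vectors[i]) for i in members)
            for members in clusters]
```

With costs of 5, 4, 3, 3, 2, and 1 non-zero elements and two clusters, this heuristic yields two clusters whose totals are equal, which is the kind of equalized search time the abstract is after.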


6 Claims

1. A similarity calculation system for increasing the efficiency of a computer when performing searching, comprising:
- at least one processor; and
- at least one memory device that stores a plurality of instructions, which when executed by the at least one processor, causes the at least one processor to operate to:
  - acquire a query vector;
  - acquire a plurality of target vectors;
  - calculate a similarity between each of the plurality of target vectors belonging to any one of the plurality of clusters and the query vector,
  - calculate, for each of the plurality of target vectors, a calculation amount to be estimated when calculating the similarity between the each of the plurality of target vectors and the query vector,
  - cluster the plurality of target vectors based on the calculation amount to be estimated for each of the plurality of target vectors,
  - wherein, in the calculation, the processor calculates a number of non-zero elements of each of the plurality of target vectors as the estimated calculation amount,
  - wherein, in the clustering, the processor clusters the plurality of target vectors so that a difference in a total sum of the calculated calculation amounts for all of the plurality of target vectors belonging to each of the plurality of clusters among the plurality of clusters decreases,
  - wherein, in the clustering, the processor clusters the plurality of target vectors by generating a graph comprising:
    - a plurality of first nodes that correspond to each of the plurality of target vectors and that has the calculation amount estimated for a corresponding one of the plurality of target vectors as a weight,
    - a plurality of second nodes corresponding to an element type of the plurality of target vectors, and
    - a plurality of edges connecting each of the plurality of first nodes to any one of the plurality of second nodes,
  - and by dividing the generated graph based on the weight of each of the plurality of first nodes.
- 2. The similarity calculation system according to claim 1, wherein the processor clusters the plurality of target vectors so that a difference in a total calculation amount among a plurality of clusters decreases, wherein the total calculation amount being estimated for each of the plurality of clusters is based on a calculation amount estimated for each of the plurality of target vectors belonging to the each of the plurality of clusters.
- 3. The similarity calculation system according to claim 1, wherein each of the plurality of edges comprises a cost that is based on a value of an element of the target vector corresponding to a corresponding one of the plurality of edges, and wherein the processor clusters the plurality of target vectors by dividing the generated graph based further on the cost of each of the plurality of edges.
- 4. The similarity calculation system according to claim 1, further comprising the processor being caused to:
  - select, based on the element type corresponding to the second node classified into the plurality of clusters by the processor and on the query vector including a plurality of elements, the cluster for which the similarity between the query vector and each of the plurality of target vectors is to be calculated,
  - wherein the processor calculates the similarity between each of the plurality of target vectors belonging to the cluster selected by the processor and the query vector.
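The graph recited in claim 1, the edge cost of claim 3, and the cluster selection of claim 4 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: vectors are dicts mapping an element type to a value, one edge is created per non-zero element, the edge cost is taken to be the absolute value of the element (the claim only says the cost is "based on" the value), and the graph division step itself is left out. The names `build_bipartite_graph`, `select_clusters`, and the `element_to_cluster` mapping are hypothetical.

```python
def build_bipartite_graph(vectors):
    """Build the two node sets and the edges of the bipartite graph.

    Returns (weights, element_nodes, edges) where
      weights[i]     is the weight of first node i (non-zero element count),
      element_nodes  is the sorted list of element types (second nodes),
      edges          is a list of (vector index, element type, cost).
    """
    weights = {}
    element_nodes = set()
    edges = []
    for i, vector in enumerate(vectors):
        nonzero = {e: v for e, v in vector.items() if v != 0}
        weights[i] = len(nonzero)  # estimated calculation amount
        for element, value in nonzero.items():
            element_nodes.add(element)
            # Assumption: cost is the magnitude of the element value.
            edges.append((i, element, abs(value)))
    return weights, sorted(element_nodes), edges


def select_clusters(query, element_to_cluster):
    """Pick the clusters whose element-type nodes overlap the query's
    non-zero elements. element_to_cluster is assumed to come from a
    graph division step that is not shown here."""
    return sorted({element_to_cluster[e]
                   for e, v in query.items()
                   if v != 0 and e in element_to_cluster})
```

A graph partitioner would then divide this graph so that the summed first-node weights per part are close to equal, optionally keeping the total cost of cut edges low; only the clusters returned by `select_clusters` would need to be searched for a given query.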

5. A method of calculating a similarity among target vectors for increasing the efficiency of a computer when performing searching, comprising:
- acquiring a query vector;
- acquiring, with at least one processor operating with a memory device in a server, a plurality of target vectors;
- calculating a similarity between each of the plurality of target vectors belonging to any one of the plurality of clusters and the query vector,
- calculating, for each of the plurality of target vectors, a calculation amount to be estimated when calculating the similarity between the each of the plurality of target vectors and the query vector, by calculating a number of non-zero elements of each of the plurality of target vectors as the estimated calculation amount;
- clustering, with the at least one processor operating with the memory device in the server, the plurality of target vectors based on the calculation amount to be estimated for each of the plurality of target vectors such that the processor clusters the plurality of target vectors so that a difference in a total sum of the calculated calculation amounts for all of the plurality of target vectors belonging to each of the plurality of clusters among the plurality of clusters decreases,
- clustering the plurality of target vectors by generating a graph, the graph comprising:
  - a plurality of first nodes that correspond to each of the plurality of target vectors and that has the calculation amount estimated for a corresponding one of the plurality of target vectors as a weight,
  - a plurality of second nodes corresponding to an element type of the plurality of target vectors, and
  - a plurality of edges connecting each of the plurality of first nodes to any one of the plurality of second nodes,
- and by dividing the generated graph based on the weight of each of the plurality of first nodes.
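The reason the number of non-zero elements works as an estimated calculation amount can be seen in a small sketch of a sparse similarity calculation. The claims do not name a similarity measure, so the inner product used here is an assumption, as are the dict-based vector representation and the hypothetical name `sparse_similarity`.

```python
def sparse_similarity(target, query):
    """Inner product of two sparse vectors stored as dicts.

    Returns (similarity, steps), where steps counts the multiply-add
    operations performed. Only non-zero elements of the target are
    visited, so steps equals the target's non-zero element count.
    """
    similarity = 0.0
    steps = 0
    for element, value in target.items():
        if value == 0:
            continue  # zero elements contribute nothing and cost nothing
        similarity += value * query.get(element, 0.0)
        steps += 1
    return similarity, steps
```

Because the work per target vector grows with its non-zero count, summing those counts per cluster gives a rough estimate of how long a search over that cluster would take.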

6. A computer-readable non-transitory storage medium storing a plurality of instructions for calculating a similarity among target vectors for increasing the efficiency of a computer when performing searching, wherein when executed by at least one processor, the plurality of instructions cause the at least one processor to:
- acquire a query vector;
- acquire a plurality of target vectors;
- calculate a similarity between each of the plurality of target vectors belonging to any one of the plurality of clusters and the query vector,
- calculate, for each of the plurality of target vectors, a calculation amount to be estimated when calculating the similarity between the each of the plurality of target vectors and the query vector,
- cluster the plurality of target vectors based on the calculation amount to be estimated for each of the plurality of target vectors,
- wherein, in the calculation, the processor calculates a number of non-zero elements of each of the plurality of target vectors as the estimated calculation amount,
- wherein, in the clustering, the processor clusters the plurality of target vectors so that a difference in a total sum of the calculated calculation amounts for all of the plurality of target vectors belonging to each of the plurality of clusters among the plurality of clusters decreases,
- wherein, in the clustering, the processor clusters the plurality of target vectors by generating a graph comprising:
  - a plurality of first nodes that correspond to each of the plurality of target vectors and that has the calculation amount estimated for a corresponding one of the plurality of target vectors as a weight,
  - a plurality of second nodes corresponding to an element type of the plurality of target vectors, and
  - a plurality of edges connecting each of the plurality of first nodes to any one of the plurality of second nodes,
- and by dividing the generated graph based on the weight of each of the plurality of first nodes.

Current Assignee: Rakuten Group, Inc.
Original Assignee: Rakuten, Inc. (Rakuten Group, Inc.)
Inventors: Cevahir, Ali
Primary Examiner(s): Trujillo, James
Assistant Examiner(s): Le, Jessica N
Application Number: US15/028,439
Publication Number: US 20160321265A1
Time in Patent Office: 1,611 Days
Field of Search: 707737
CPC Class Codes:
- G06F 16/24578 using ranking
- G06F 16/285 Clustering or classification
