System, method, and recording medium for efficient cohesive subgraph identification in entity collections for inlier and outlier detection
First Claim
Patent Images
1. A similarity detection system receiving a plurality of input entities, the system comprising:
- a face detection and verification device configured to extract a face region from the plurality of input entities and to extract shift features from the face region as images, the face detection and verification device excludes items in the input entities that have facial features but are not a real face of a person;
a cohesive subgraph identification device configured to calculate, based on attributes of the plurality of input entities, a first parameter and a second parameter based on the first parameter, and further configured to identify a plurality of subgraphs from the second parameter; and
a subgraph correlation tracking and clustering device configured to determine a relationship between different subgraphs based on a similarity factor between the second parameter and the plurality of subgraphs,wherein the subgraph correlation tracking and clustering device is further configured to output outliers that include the subgraphs not combined by the subgraph correlation tracking and clustering device,wherein the subgraph correlation tracking and clustering device uses source nodes and target nodes of nodes in a post network such that on each iteration, each source node propagates edge similarities to a neighborhood of the source node along paths, and, each target node aggregates the similarities that are propagated in from their neighbors, andwherein the subgraph correlation tracking and clustering device measures the correlations between subgraphs as an amount of average path similarities that are propagated and aggregated to detect outliers within the images and based on the correlation between the subgraphs and similarities, the subgraph correlation tracking and clustering device combines subgraphs to output inliers,wherein the first parameter includes k-cores and the second parameter includes (k, d)-cores,wherein the (k, d)-cores include a cohesive internal structure adjusted by at least d common neighbors as witnesses of commonality between the nodes in the post network connected by each edge,wherein d is an integer greater than zero,wherein given the post network as G(V,E) defined as k-cores from all the nodes represented as G(V,E), the cohesive subgraph identification device recursively removes the nodes with a degree less than k from G(V,E), until all the remaining nodes have a degree of at least k with a result being a set of k-cores that form the basis for the (k, d)-cores generation,wherein the cohesive subgraph identification device uses k from the (k, d)-cores to adjust an edge density, and uses d from the (k, d) cores to control a strength of the similarities between the subgraphs generated and the k-cores, thereby the cohesive subgraph identification device generates a maximal (k, d)-core which is a subgraph of a maximal k-core,wherein the cohesive subgraph identification device generates the (k, d)-core by removing n nodes with a degree less than k recursively by only verifying the degree of n nodes,wherein given a maximal path length, the average path similarities aggregated by each of the target nodes are computed by multiplying all the edge similarities along the path, dampened by a dampening factor on each hop.
1 Assignment
0 Petitions
Accused Products
Abstract
A similarity detection system receiving a plurality of input entities, the system including a cohesive subgraph identification device configured to calculate, based on attributes of the plurality of input entities, a first parameter and a second parameter based on the first parameter, and further configured to identify a plurality of subgraphs from the second parameter and a subgraph correlation tracking and clustering device configured to determine a relationship between different subgraphs based on a similarity factor between the second parameter and the plurality of subgraphs.
10 Citations
20 Claims
-
1. A similarity detection system receiving a plurality of input entities, the system comprising:
-
a face detection and verification device configured to extract a face region from the plurality of input entities and to extract shift features from the face region as images, the face detection and verification device excludes items in the input entities that have facial features but are not a real face of a person; a cohesive subgraph identification device configured to calculate, based on attributes of the plurality of input entities, a first parameter and a second parameter based on the first parameter, and further configured to identify a plurality of subgraphs from the second parameter; and a subgraph correlation tracking and clustering device configured to determine a relationship between different subgraphs based on a similarity factor between the second parameter and the plurality of subgraphs, wherein the subgraph correlation tracking and clustering device is further configured to output outliers that include the subgraphs not combined by the subgraph correlation tracking and clustering device, wherein the subgraph correlation tracking and clustering device uses source nodes and target nodes of nodes in a post network such that on each iteration, each source node propagates edge similarities to a neighborhood of the source node along paths, and, each target node aggregates the similarities that are propagated in from their neighbors, and wherein the subgraph correlation tracking and clustering device measures the correlations between subgraphs as an amount of average path similarities that are propagated and aggregated to detect outliers within the images and based on the correlation between the subgraphs and similarities, the subgraph correlation tracking and clustering device combines subgraphs to output inliers, wherein the first parameter includes k-cores and the second parameter includes (k, d)-cores, wherein the (k, d)-cores include a cohesive internal structure adjusted by at least d common neighbors as witnesses of commonality between the nodes in the post network connected by each edge, wherein d is an integer greater than zero, wherein given the post network as G(V,E) defined as k-cores from all the nodes represented as G(V,E), the cohesive subgraph identification device recursively removes the nodes with a degree less than k from G(V,E), until all the remaining nodes have a degree of at least k with a result being a set of k-cores that form the basis for the (k, d)-cores generation, wherein the cohesive subgraph identification device uses k from the (k, d)-cores to adjust an edge density, and uses d from the (k, d) cores to control a strength of the similarities between the subgraphs generated and the k-cores, thereby the cohesive subgraph identification device generates a maximal (k, d)-core which is a subgraph of a maximal k-core, wherein the cohesive subgraph identification device generates the (k, d)-core by removing n nodes with a degree less than k recursively by only verifying the degree of n nodes, wherein given a maximal path length, the average path similarities aggregated by each of the target nodes are computed by multiplying all the edge similarities along the path, dampened by a dampening factor on each hop. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)
-
-
17. A similarity detection method, comprising:
-
receiving a plurality of input entities; extracting a face region from the plurality of input entities and shift features from the face region as images; excluding items in the input entities that have facial features but are not a real face of a person; calculating, based on attributes of the plurality of input entities, a first parameter and a second parameter based on the first parameter; identifying a plurality of subgraphs from the second parameter; determining a relationship between different subgraphs based on a similarity factor between the second parameter and the plurality of subgraphs; outputting outliers that include the subgraphs not combined by the combining; using source nodes and target nodes of nodes in a post network such that on each iteration, each source node propagates edge similarities to a neighborhood of the source node along paths, and each target node aggregates the similarities that are propagated in from their neighbors; and measuring the correlations between subgraphs as an amount of average path similarities that are propagated and aggregated to detect outliers within the images and based on the correlation between the subgraphs and similarities, and combining subgraphs to output inliers, wherein the first parameter includes k-cores and the second parameter includes (k, d)-cores, wherein the (k, d)-cores include a cohesive internal structure adjusted by at least d common neighbors as witnesses of commonality between the nodes in the post network connected by each edge, wherein d is an integer greater than zero, wherein given the post network as G(V,E) defined as k-cores from all the nodes represented as G(V,E), the cohesive subgraph identification device recursively removes the nodes with a degree less than k from G(V,E), until all the remaining nodes have a degree of at least k with a result being a set of k-cores that form the basis for the (k, d)-cores generation, wherein the calculating uses k from the (k, d)-cores to adjust an edge density, and uses d from the (k, d)-cores to control a strength of the similarities between the subgraphs generated and the k-cores, thereby generating a maximal (k, d)-core which is a subgraph of a maximal k-core, wherein the cohesive subgraph identification device generates the (k, d)-core by removing n nodes with a degree less than k recursively by only verifying the degree of n nodes, wherein given a maximal path length, the average path similarities aggregated by each of the target nodes are computed by multiplying all the edge similarities along the path, dampened by a dampening factor on each hop. - View Dependent Claims (18)
-
-
19. A non-transitory computer-readable recording medium recording a similarity detection program, the program causing a computer to perform:
-
receiving a plurality of input entities; extracting a face region from the plurality of input entities and shift features from the face region as images; excluding items in the input entities that have facial features but are not a real face of a person; calculating, based on attributes of the plurality of input entities, a first parameter and a second parameter based on the first parameter; identifying a plurality of subgraphs from the second parameter; determining a relationship between different subgraphs based on a similarity factor between the second parameter and the plurality of subgraphs; outputting outliers that include the subgraphs not combined by the combining; using source nodes and target nodes of nodes in a post network such that on each iteration, each source node propagates edge similarities to a neighborhood of the source node along paths, and each target node aggregates the similarities that are propagated in from their neighbors; and measuring the correlations between subgraphs as an amount of average path similarities that are propagated and aggregated to detect outliers within the images and based on the correlation between the subgraphs and similarities, and combining subgraphs to output inliers, wherein the first parameter includes k-cores and the second parameter includes (k, d)-cores, wherein the (k, d) -cores include a cohesive internal structure adjusted by at least d common neighbors as witnesses of commonality between the nodes in the post network connected by each edge, wherein d is an integer greater than zero, wherein given the post network as G(V,E) defined as k-cores from all the nodes represented as G(V,E), the cohesive subgraph identification device recursively removes the nodes with a degree less than k from G(V,E), until all the remaining nodes have a degree of at least k with a result being a set of k-cores that form the basis for the (k, d)-cores generation, wherein the calculating uses k from the (k, d)-cores to adjust an edge density, and uses d from the (k, d)-cores to control a strength of the similarities between the subgraphs generated and the k cores, thereby generating a maximal (k, d)-core which is a subgraph of a maximal k-core, wherein the cohesive subgraph identification device generates the (k, d)-core by removing a nodes with a degree less than k recursively by only verifying the degree of n nodes, wherein given a maximal path length, the average path similarities aggregated by each of the target nodes are computed by multiplying all the edge similarities along the path, dampened by a dampening, factor on each hop. - View Dependent Claims (20)
-
Specification