LARGE GRAPH MEASUREMENT
First Claim
1. A method for efficiently computing a pairwise distance between nodes in a large graph, comprising:
- generating a URL-sketch of respective web-pages in an index of web-pages, comprising;
extracting labels from the respective web-pages; and
storing sets of labels for respective web-pages in the index for efficient storage and retrieval;
generating a neighborhood-sketch for the respective web-pages in the index using the URL-sketches, comprising;
determining a neighborhood for a web-page comprising neighboring nodes; and
generating a neighborhood sketch for the web-page using labels associated with the respective neighboring nodes; and
determining a distance between two nodes in the index, comprising;
computing an approximate number of paths between the two nodes, using the neighborhood sketches of the two nodes; and
computing an approximate path length between the two nodes, using the neighborhood sketches of the two nodes.
2 Assignments
0 Petitions
Accused Products
Abstract
As provided herein, a pairwise distance between nodes in a large graph can be determined efficiently. URL-sketches are generated for respective nodes in an index by extracting labels from respective nodes, which provide a reference to a link between the nodes, aggregating the labels into sets for respective nodes, and storing the sets of labels as URL-sketches. Neighborhood-sketches are generated for the respective nodes in the index using the URL-sketches, by determining a neighborhood for a node and generating a sketch using labels that are associated with the respective neighboring nodes. A distance between two nodes is determined by computing an approximate number of paths and an approximate path length between the two nodes, using the neighborhood sketches for the two nodes.
43 Citations
20 Claims
-
1. A method for efficiently computing a pairwise distance between nodes in a large graph, comprising:
-
generating a URL-sketch of respective web-pages in an index of web-pages, comprising; extracting labels from the respective web-pages; and storing sets of labels for respective web-pages in the index for efficient storage and retrieval; generating a neighborhood-sketch for the respective web-pages in the index using the URL-sketches, comprising; determining a neighborhood for a web-page comprising neighboring nodes; and generating a neighborhood sketch for the web-page using labels associated with the respective neighboring nodes; and determining a distance between two nodes in the index, comprising; computing an approximate number of paths between the two nodes, using the neighborhood sketches of the two nodes; and computing an approximate path length between the two nodes, using the neighborhood sketches of the two nodes. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
-
-
13. A system for efficiently computing a pairwise distance between nodes in a large graph, comprising:
-
a URL-sketch generator configured to create URL-sketches of web-pages in an index of web-pages, comprising; a web-page label extractor configured to extract a set of labels that reference a first web-page from a set of second web-pages in an index for respective web-pages in the index; and a label set store configured to store sets of labels for respective web-pages in for efficient storage and retrieval; a neighborhood generator configured to generate a neighborhood-sketch for the respective web-pages in the index, comprising; a neighborhood determiner configured to compute a neighborhood for a web-page comprising neighboring nodes, using the URL-sketches; and a neighborhood sketcher configured to generate a neighborhood sketch for the web-page using labels associated with the respective neighboring nodes; and a node distance measuring component configured to determining a distance between two nodes in the index. - View Dependent Claims (14, 15, 16, 17, 18, 19)
-
-
20. A method for efficiently computing a pairwise distance between nodes in a large graph, comprising:
-
generating a URL-sketch of respective web-pages in an index of web-pages, comprising; extracting sets of keywords from the respective web-pages, comprising identifying proxy-strings in a first web-page that reference a link to a second web-page; and associating a set of proxy-strings from the web-pages in the index to the second web-page; and storing sets of keywords for respective web-pages for efficient storage and retrieval, comprising; mapping the set of keywords for a web-page into a bit-vector in hamming space; and storing the URL-sketch for respective web-pages in a hash table; generating a neighborhood-sketch for the respective web-pages in the index using the URL-sketches, comprising; determining a neighborhood for a web-page comprising one of; computing neighboring nodes using a random-surfer model starting from the web-page; computing neighboring nodes using a node similarity bias walk starting from the web-page; and computing neighboring nodes using a breadth-first type graph exploration starting from the web-page; generating a neighborhood sketch for the web-page using labels associated with the respective neighboring nodes, comprising; identifying a set of labels that are common to the respective nodes in the neighborhood; and storing sets of labels for respective nodes for efficient storage and retrieval; and determining a distance between two nodes in the index, comprising one of; computing an effective resistance between the two nodes using a hamming distance between the two nodes; and computing an approximate path length between the two nodes, comprising; determining an approximate number of paths between the two nodes; and determining a path length between the neighborhood-sketches for the two nodes using the approximate number of paths between the two nodes.
-
Specification