LARGE GRAPH MEASUREMENT
First Claim
1. A method for efficiently computing a pairwise distance between nodes in a large graph, comprising:
- generating a URL-sketch of respective web-pages in an index of web-pages, comprising;
extracting labels from the respective web-pages; and
storing sets of labels for respective web-pages in the index for efficient storage and retrieval;
generating a neighborhood-sketch for the respective web-pages in the index using the URL-sketches, comprising;
determining a neighborhood for a web-page comprising neighboring nodes; and
generating a neighborhood sketch for the web-page using labels associated with the respective neighboring nodes; and
determining a distance between two nodes in the index, comprising;
computing an approximate number of paths between the two nodes, using the neighborhood sketches of the two nodes; and
computing an approximate path length between the two nodes, using the neighborhood sketches of the two nodes.
2 Assignments
0 Petitions
Accused Products
Abstract
As provided herein, a pairwise distance between nodes in a large graph can be determined efficiently. URL-sketches are generated for respective nodes in an index by extracting labels from respective nodes, which provide a reference to a link between the nodes, aggregating the labels into sets for respective nodes, and storing the sets of labels as URL-sketches. Neighborhood-sketches are generated for the respective nodes in the index using the URL-sketches, by determining a neighborhood for a node and generating a sketch using labels that are associated with the respective neighboring nodes. A distance between two nodes is determined by computing an approximate number of paths and an approximate path length between the two nodes, using the neighborhood sketches for the two nodes.
-
Citations
20 Claims
-
1. A method for efficiently computing a pairwise distance between nodes in a large graph, comprising:
-
generating a URL-sketch of respective web-pages in an index of web-pages, comprising; extracting labels from the respective web-pages; and storing sets of labels for respective web-pages in the index for efficient storage and retrieval; generating a neighborhood-sketch for the respective web-pages in the index using the URL-sketches, comprising; determining a neighborhood for a web-page comprising neighboring nodes; and generating a neighborhood sketch for the web-page using labels associated with the respective neighboring nodes; and determining a distance between two nodes in the index, comprising; computing an approximate number of paths between the two nodes, using the neighborhood sketches of the two nodes; and computing an approximate path length between the two nodes, using the neighborhood sketches of the two nodes. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
-
-
13. A system for efficiently computing a pairwise distance between nodes in a large graph, comprising:
-
a URL-sketch generator configured to create URL-sketches of web-pages in an index of web-pages, comprising; a web-page label extractor configured to extract a set of labels that reference a first web-page from a set of second web-pages in an index for respective web-pages in the index; and a label set store configured to store sets of labels for respective web-pages in for efficient storage and retrieval; a neighborhood generator configured to generate a neighborhood-sketch for the respective web-pages in the index, comprising; a neighborhood determiner configured to compute a neighborhood for a web-page comprising neighboring nodes, using the URL-sketches; and a neighborhood sketcher configured to generate a neighborhood sketch for the web-page using labels associated with the respective neighboring nodes; and a node distance measuring component configured to determining a distance between two nodes in the index. - View Dependent Claims (14, 15, 16, 17, 18, 19)
-
-
20. A method for efficiently computing a pairwise distance between nodes in a large graph, comprising:
-
generating a URL-sketch of respective web-pages in an index of web-pages, comprising; extracting sets of keywords from the respective web-pages, comprising identifying proxy-strings in a first web-page that reference a link to a second web-page; and associating a set of proxy-strings from the web-pages in the index to the second web-page; and storing sets of keywords for respective web-pages for efficient storage and retrieval, comprising; mapping the set of keywords for a web-page into a bit-vector in hamming space; and storing the URL-sketch for respective web-pages in a hash table; generating a neighborhood-sketch for the respective web-pages in the index using the URL-sketches, comprising; determining a neighborhood for a web-page comprising one of; computing neighboring nodes using a random-surfer model starting from the web-page; computing neighboring nodes using a node similarity bias walk starting from the web-page; and computing neighboring nodes using a breadth-first type graph exploration starting from the web-page; generating a neighborhood sketch for the web-page using labels associated with the respective neighboring nodes, comprising; identifying a set of labels that are common to the respective nodes in the neighborhood; and storing sets of labels for respective nodes for efficient storage and retrieval; and determining a distance between two nodes in the index, comprising one of; computing an effective resistance between the two nodes using a hamming distance between the two nodes; and computing an approximate path length between the two nodes, comprising; determining an approximate number of paths between the two nodes; and determining a path length between the neighborhood-sketches for the two nodes using the approximate number of paths between the two nodes.
-
Specification