SYSTEM AND METHOD FOR DISTRIBUTED INDEX SEARCHING OF ELECTRONIC CONTENT
First Claim
Patent Images
1. A system comprising:
- a peer network node;
a provided peer-to-peer network connected to the peer network node and configured to interoperate with the peer network node; and
wherein the peer network node includes logic for executing software to;
parse a document into keywords in a search term list;
rank order the keywords within the search term list;
for each of the rank-ordered keywords in the search term list;
identify the rank-ordered keyword as a primary keyword;
determine a unique node identifier corresponding to a hosting node in the peer network, the hosting node configured to;
store an inverted index entry including the primary keyword and an identifier corresponding to the document; and
store a string in a Bloom filter data structure stored on the hosting node;
identify one or more secondary keywords in the search term list;
store the primary keyword and the document identifier in the inverted index stored in the hosting node; and
store the one or more secondary keywords in the Bloom filter data structure.
0 Assignments
0 Petitions
Accused Products
Abstract
There are provided methods and systems for efficient search in a peer-to-peer network topology. In various embodiments, search methods and systems provide for response times and network traffic that are independent from the number of query terms, thereby producing constant run-time searches and bandwidth hits in a P2P network search implementation. By distributing inverse indexes between peers, and storing with each inverse index a Bloom filter populated with selected keywords, multi-term search and analysis can be conducted on one network node without requiring exchange of posting lists between various network nodes.
-
Citations
68 Claims
-
1. A system comprising:
-
a peer network node; a provided peer-to-peer network connected to the peer network node and configured to interoperate with the peer network node; and wherein the peer network node includes logic for executing software to; parse a document into keywords in a search term list; rank order the keywords within the search term list; for each of the rank-ordered keywords in the search term list; identify the rank-ordered keyword as a primary keyword; determine a unique node identifier corresponding to a hosting node in the peer network, the hosting node configured to; store an inverted index entry including the primary keyword and an identifier corresponding to the document; and store a string in a Bloom filter data structure stored on the hosting node; identify one or more secondary keywords in the search term list; store the primary keyword and the document identifier in the inverted index stored in the hosting node; and store the one or more secondary keywords in the Bloom filter data structure. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14)
-
-
12. The system of claim 12 wherein the software stores the distance of the one or more secondary key strings in the Bloom filter data structure.
-
15. A method for indexing a document to be searched within a peer-to-peer network architecture, the method comprising:
-
parsing a document into keywords in a search term list; ranking order the keywords within the search term list; for each of the rank-ordered keywords in the search term list; identifying the rank-ordered keyword as a primary keyword; determining a unique node identifier corresponding to a hosting node in the peer network, whereby the hosting node; stores an inverted index entry including the primary keyword and an identifier corresponding to the document; and stores a string in a Bloom filter data structure stored on the hosting node; identifying one or more secondary keywords in the search term list; storing the primary keyword and the document identifier in the inverted index stored in the hosting node; and storing the one or more secondary keywords in the Bloom filter data structure. - View Dependent Claims (16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27)
-
-
28. A system comprising:
-
a peer network node; a provided peer-to-peer network connected to the peer network node and configured to interoperate with the peer network node; wherein the peer network node includes logic for executing software to; obtain a primary keyword from a search string; obtain one or more secondary keywords from the search string; determine a unique node identifier corresponding to a hosting node in the peer network, wherein the hosting node stores; an inverted index including the primary keyword and a reference identifier to a document that contains the primary keyword; and a bloom function data structure corresponding to one or more related strings within the document; and wherein the software determines whether the one or more secondary keywords are present within the document by determining whether the one or more secondary keywords have been stored within the Bloom function data structure. - View Dependent Claims (29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 42)
-
- 40. The system of claim 40 wherein the software further determines that the distance is within a predetermined keyword distance.
-
43. A method for searching for one or more documents indexed in a peer-to-peer network architecture, the method comprising:
-
obtaining a primary keyword from a search string; obtaining one or more secondary keywords from the search string; determining a unique node identifier corresponding to a hosting node in the peer network, wherein the hosting node stores; an inverted index including the primary keyword and a reference identifier to a document that contains the primary keyword; and a bloom function data structure corresponding to one or more related strings within the document; and wherein the software determines whether the one or more secondary keywords are present within the document by determining whether the one or more secondary keywords have been stored within the Bloom function data structure. - View Dependent Claims (44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56)
-
-
57. A system comprising:
-
a peer network node; a provided peer-to-peer network connected to the peer network node and configured to interoperate with the peer network node; a means for indexing documents for searching, the indexing performed on keyword combinations and partitioned between multiple nodes in the network; and a means for searching for the indexed documents by multiple keyword combinations indexed across multiple nodes in the network.
-
-
58. A system comprising:
-
a peer network node; a provided peer-to-peer network connected to the peer network node and configured to interoperate with the peer network node; and wherein the peer network node includes logic for executing software to; parse a document into separate keywords in a search term list; rank order the keywords within the search term list; for each of the rank-ordered keywords in the search term list; (i) create a list of addresses referring to one or more web pages that include at least one instance of the rank ordered keyword; (ii) rank order the list of addresses by relevance; and (iii) reduce the list of addresses is to k-most relevant addresses, where k is a predetermined number; create a set of query index terms from the search term list, the set of index query terms comprising at least one of a keyword from the search term list and a combination of keywords from the search term list; remove from the set of query index terms at least one combination of keywords that represents a shorter keyword combination; and for each of the remaining query index terms in the set; (i) identify the query index term as a primary query index term; determine a unique node identifier corresponding to a hosting node in the peer network, the hosting node configured to; store an inverted index entry including the a primary query index term and identifiers corresponding to the to k-most relevant addresses for that query index term; and store a string in a Bloom filter data structure stored on the hosting node; (ii) identify one or more secondary query index terms; (iii) store the a primary query index term and identifiers corresponding to the to k-most relevant addresses for that query index term in the inverted index of the hosting node; and (iv) store the one or more secondary query index terms and their respectively associated k-most relevant addresses in the Bloom filter data structure. - View Dependent Claims (59, 60, 61, 62, 63, 64, 65, 66, 67, 68)
-
Specification