SYSTEM AND METHOD FOR DISTRIBUTED INDEX SEARCHING OF ELECTRONIC CONTENT

US 20100094877A1
Filed: 10/13/2009
Published: 04/15/2010
Est. Priority Date: 10/13/2008
Status: Active Grant

Abstract

There are provided methods and systems for efficient search in a peer-to-peer network topology. In various embodiments, search methods and systems provide for response times and network traffic that are independent from the number of query terms, thereby producing constant run-time searches and bandwidth hits in a P2P network search implementation. By distributing inverse indexes between peers, and storing with each inverse index a Bloom filter populated with selected keywords, multi-term search and analysis can be conducted on one network node without requiring exchange of posting lists between various network nodes.
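
For readers unfamiliar with the data structure named in the abstract, the following is a minimal Bloom filter sketch. It is illustrative only: the bit-array size, the number of hash probes, the use of salted SHA-256, and the class name `BloomFilter` are assumptions made here and are not taken from the patent.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array probed at several hashed positions per string."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 4) -> None:
        # Sizes are arbitrary illustrative defaults, not values from the patent.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive one bit position per probe by salting SHA-256 with the probe index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True may be a false positive; False is always correct.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


bf = BloomFilter()
for word in ("network", "peer", "search"):
    bf.add(word)
print("peer" in bf)  # True: an added string is always reported present
```

Membership tests never miss a string that was added, but they can occasionally report a string that was not, which is the trade-off that keeps the structure compact enough to store alongside each inverted index.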

68 Claims

1. A system comprising:
- a peer network node;
- a provided peer-to-peer network connected to the peer network node and configured to interoperate with the peer network node; and
- wherein the peer network node includes logic for executing software to:
  - parse a document into keywords in a search term list;
  - rank order the keywords within the search term list;
  - for each of the rank-ordered keywords in the search term list:
    - identify the rank-ordered keyword as a primary keyword;
    - determine a unique node identifier corresponding to a hosting node in the peer network, the hosting node configured to:
      - store an inverted index entry including the primary keyword and an identifier corresponding to the document; and
      - store a string in a Bloom filter data structure stored on the hosting node;
    - identify one or more secondary keywords in the search term list;
    - store the primary keyword and the document identifier in the inverted index stored in the hosting node; and
    - store the one or more secondary keywords in the Bloom filter data structure.
- Dependent claims:
  - 2. The system of claim 1 wherein the logic is a computer configured with a processor coupled to a memory, a display, a user interface, and a network interface.
  - 3. The system of claim 1 wherein all terms from the document are rank ordered and used as separate primary keywords within the search term list.
  - 4. The system of claim 3 wherein the rank ordering is performed by alphanumeric order.
  - 5. The system of claim 1 further comprising changing case of keywords within the search term list to lower case.
  - 6. The system of claim 1 further including removing duplicate keywords from the search term list.
  - 7. The system of claim 1 further including removing stop words from the search term list.
  - 8. The system of claim 1 wherein the one or more secondary keywords are stored in the Bloom filter data structure if they are of lower rank order than the primary keyword.
  - 9. The system of claim 1 wherein determining a unique node identifier corresponding to a hosting node in the peer network further includes obtaining a hash value of the primary keyword and determining a closest unique node identifier to that hash value.
  - 10. The system of claim 9 wherein the primary keyword contains multiple keywords.
  - 11. The system of claim 1 wherein the software further determines a distance of one or more secondary keywords from the primary keyword.
  - 13. The system of claim 1 wherein the primary keyword stored in the inverted index further includes multiple keywords.
  - 14. The system of claim 1 wherein determining a unique node identifier corresponding to a hosting node in the peer network further includes determining which node in the peer-to-peer network stores an inverted index containing a primary keyword that has a plurality of keywords within the primary keyword.

12. The system of claim 11 wherein the software stores the distance of the one or more secondary key strings in the Bloom filter data structure.
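
The indexing steps recited in claims 1 and 5 to 8 can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the stop-word list is made up, alphanumeric order is used as the rank order (claim 4), "lower rank order" is read as "later in that order", one filter is kept per (primary keyword, document) pair, a plain Python `set` stands in for each Bloom filter, and a single dictionary stands in for all hosting nodes.

```python
from collections import defaultdict

STOP_WORDS = {"a", "an", "and", "of", "the", "to"}  # illustrative list only


def search_term_list(text: str) -> list[str]:
    """Lower-case, drop stop words and duplicates, then rank alphanumerically."""
    words = (w.strip(".,;:!?()\"'").lower() for w in text.split())
    return sorted({w for w in words if w and w not in STOP_WORDS})


def index_document(doc_id, text, inverted_index, filters):
    """Treat every ranked keyword in turn as the primary keyword."""
    terms = search_term_list(text)
    for rank, primary in enumerate(terms):
        inverted_index[primary].add(doc_id)
        # Secondary keywords: the terms ranked after the primary keyword.
        filters[(primary, doc_id)].update(terms[rank + 1:])


inverted_index = defaultdict(set)
filters = defaultdict(set)  # sets stand in for Bloom filters in this sketch
index_document("doc-1", "The peer network and the Peer search", inverted_index, filters)
print(sorted(inverted_index))                 # ['network', 'peer', 'search']
print(sorted(filters[("network", "doc-1")]))  # ['peer', 'search']
```

Storing only the lower-ranked terms as secondary keywords means each keyword pair is recorded once, under whichever keyword ranks first.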

15. A method for indexing a document to be searched within a peer-to-peer network architecture, the method comprising:
- parsing a document into keywords in a search term list;
- rank ordering the keywords within the search term list;
- for each of the rank-ordered keywords in the search term list:
  - identifying the rank-ordered keyword as a primary keyword;
  - determining a unique node identifier corresponding to a hosting node in the peer network, whereby the hosting node:
    - stores an inverted index entry including the primary keyword and an identifier corresponding to the document; and
    - stores a string in a Bloom filter data structure stored on the hosting node;
  - identifying one or more secondary keywords in the search term list;
  - storing the primary keyword and the document identifier in the inverted index stored in the hosting node; and
  - storing the one or more secondary keywords in the Bloom filter data structure.
- Dependent claims:
  - 16. The method of claim 15 further comprising rank ordering all terms from the document and using all terms as separate primary keywords within the search term list.
  - 17. The method of claim 16 wherein the rank ordering is performed by alphanumeric order.
  - 18. The method of claim 15 further comprising changing case of keywords within the search term list to lower case.
  - 19. The method of claim 15 further including removing duplicate keywords from the search term list.
  - 20. The method of claim 15 further including removing stop words from the search term list.
  - 21. The method of claim 15 wherein the one or more secondary keywords are stored in the Bloom filter data structure if they are of lower rank order than the primary keyword.
  - 22. The method of claim 15 wherein determining a unique node identifier corresponding to a hosting node in the peer network further includes obtaining a hash value of the primary keyword and determining a closest unique node identifier to that hash value.
  - 23. The method of claim 22 wherein the primary keyword contains multiple keywords.
  - 24. The method of claim 15 further comprising determining, by the software, a distance of one or more secondary keywords from the primary keyword.
  - 25. The method of claim 24 wherein the software stores the distance of the one or more secondary key strings in the Bloom filter data structure.
  - 26. The method of claim 15 wherein the primary keyword stored in the inverted index further includes multiple keywords.
  - 27. The method of claim 15 wherein determining a unique node identifier corresponding to a hosting node in the peer network further includes determining which node in the peer-to-peer network stores an inverted index containing a primary keyword that has a plurality of keywords within the primary keyword.
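
Claims 9 and 22 describe locating the hosting node by hashing the primary keyword and picking the closest node identifier. A minimal sketch of that lookup is shown here. The claims do not name a hash function or a distance metric, so SHA-1 and XOR distance are assumptions chosen for illustration, as are the helper names and the made-up node names.

```python
import hashlib


def key_id(text: str) -> int:
    """Map a string to a 160-bit integer identifier via SHA-1 (assumed hash)."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest(), "big")


def closest_node(primary_keyword: str, node_ids: list[int]) -> int:
    """Return the node ID with the smallest XOR distance to the keyword hash."""
    target = key_id(primary_keyword)
    return min(node_ids, key=lambda node_id: node_id ^ target)


# Hypothetical node IDs, derived here by hashing made-up node names.
node_ids = [key_id(name) for name in ("node-a", "node-b", "node-c")]
host = closest_node("peer", node_ids)
print(host in node_ids)  # True
```

Because every peer computes the same hash, indexing and searching peers arrive at the same hosting node for a given primary keyword without consulting a central directory.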

28. A system comprising:
- a peer network node;
- a provided peer-to-peer network connected to the peer network node and configured to interoperate with the peer network node;
- wherein the peer network node includes logic for executing software to:
  - obtain a primary keyword from a search string;
  - obtain one or more secondary keywords from the search string;
  - determine a unique node identifier corresponding to a hosting node in the peer network, wherein the hosting node stores:
    - an inverted index including the primary keyword and a reference identifier to a document that contains the primary keyword; and
    - a Bloom function data structure corresponding to one or more related strings within the document; and
- wherein the software determines whether the one or more secondary keywords are present within the document by determining whether the one or more secondary keywords have been stored within the Bloom function data structure.
- Dependent claims:
  - 29. The system of claim 28 wherein the logic is a computer configured with a processor coupled to a memory, a display, a user interface, and a network interface.
  - 30. The system of claim 28 wherein the logic is further configured to format a report to a user, the report comprising a list of addresses containing documents corresponding to at least one of:
    - the primary keyword; and
    - the one or more secondary keywords stored within the Bloom function data structure.
  - 31. The system of claim 28 wherein the primary keyword is obtained by rank ordering search terms from the search string by a predetermined rank order criterion and selecting the highest rank-ordered search term as the primary keyword.
  - 32. The system of claim 31 wherein the rank order criterion is alphanumeric order.
  - 33. The system of claim 28 further comprising changing case of keywords within the search string to lower case.
  - 34. The system of claim 28 further including removing duplicate keywords from the search string.
  - 35. The system of claim 28 further including removing stop words from the search string.
  - 36. The system of claim 28 wherein secondary keywords are searched in the Bloom filter data structure if they are of lower rank order than the primary keyword.
  - 37. The system of claim 28 wherein determining a unique node identifier corresponding to a hosting node in the peer network further includes obtaining a hash value of the primary keyword and determining the closest unique node identifier to that hash value.
  - 38. The system of claim 28 wherein the software further determines the existence of a plurality of all secondary keywords in the Bloom function data structure.
  - 39. The system of claim 38 wherein the software further determines the existence of all distance indicators for all secondary keywords from the primary keywords in the Bloom function data structure.
  - 42. The system of claim 28 wherein determining a unique node identifier corresponding to a hosting node in the peer network further includes determining which node in the peer-to-peer network stores an inverted index containing a primary keyword that has predetermined multiple keywords within the primary keyword.

40. The system of claim 39 wherein the software further determines that the distance is within a predetermined keyword distance.
- Dependent claims:
  - 41. The system of claim 40 wherein the primary keyword stored in the inverted index further includes multiple keywords.
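
The search flow of claims 28 and 31 to 36 might look roughly like the sketch below. It assumes alphanumeric rank order with the first-ranked term taken as the primary keyword, one filter per (primary keyword, document) pair, and plain Python sets standing in for the Bloom function data structures; the function names and toy data are hypothetical.

```python
def split_query(query: str) -> tuple[str, list[str]]:
    """Rank terms alphanumerically; the first is primary, the rest secondary."""
    terms = sorted(set(query.lower().split()))
    return terms[0], terms[1:]


def search(query, inverted_index, filters):
    """Resolve a multi-term query using only one hosting node's local state."""
    primary, secondary = split_query(query)
    hits = []
    for doc_id in sorted(inverted_index.get(primary, ())):
        doc_filter = filters.get((primary, doc_id), set())
        # A real Bloom filter may report false positives, never false negatives.
        if all(term in doc_filter for term in secondary):
            hits.append(doc_id)
    return hits


# Toy state as one hosting node might hold it (sets stand in for Bloom filters).
inverted_index = {"network": {"doc-1", "doc-2"}}
filters = {
    ("network", "doc-1"): {"peer", "search"},
    ("network", "doc-2"): {"routing"},
}
print(search("peer network", inverted_index, filters))  # ['doc-1']
```

All secondary terms are checked on the node that holds the primary keyword, so no posting lists need to be exchanged between nodes regardless of how many terms the query contains.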

43. A method for searching for one or more documents indexed in a peer-to-peer network architecture, the method comprising:
- obtaining a primary keyword from a search string;
- obtaining one or more secondary keywords from the search string;
- determining a unique node identifier corresponding to a hosting node in the peer network, wherein the hosting node stores:
  - an inverted index including the primary keyword and a reference identifier to a document that contains the primary keyword; and
  - a Bloom function data structure corresponding to one or more related strings within the document; and
- wherein the software determines whether the one or more secondary keywords are present within the document by determining whether the one or more secondary keywords have been stored within the Bloom function data structure.
- Dependent claims:
  - 44. The method of claim 43 further comprising formatting a report to a user, the report comprising a list of addresses containing documents corresponding to at least one of:
    - the primary keyword; and
    - the one or more secondary keywords stored within the Bloom function data structure.
  - 45. The method of claim 43 wherein the primary keyword is obtained by rank ordering search terms from the search string by a predetermined rank order criterion and selecting the highest rank-ordered search term as the primary keyword.
  - 46. The method of claim 45 wherein the rank order criterion is alphanumeric order.
  - 47. The method of claim 43 further comprising changing case of keywords within the search string to lower case.
  - 48. The method of claim 43 further including removing duplicate keywords from the search string.
  - 49. The method of claim 43 further including removing stop words from the search string.
  - 50. The method of claim 43 wherein secondary keywords are searched in the Bloom filter data structure if they are of lower rank order than the primary keyword.
  - 51. The method of claim 43 wherein determining a unique node identifier corresponding to a hosting node in the peer network further includes obtaining a hash value of the primary keyword and determining the closest unique node identifier to that hash value.
  - 52. The method of claim 51 wherein the software further determines the existence of a plurality of all secondary keywords in the Bloom function data structure.
  - 53. The method of claim 43 wherein the software further determines the existence of all distance indicators for all secondary keywords from the primary keywords in the Bloom function data structure.
  - 54. The method of claim 53 wherein the software further determines that the distance is within a predetermined keyword distance.
  - 55. The method of claim 53 wherein the primary keyword stored in the inverted index further includes multiple keywords.
  - 56. The method of claim 43 wherein determining a unique node identifier corresponding to a hosting node in the peer network further includes determining which node in the peer-to-peer network stores an inverted index containing a primary keyword that has predetermined multiple keywords within the primary keyword.
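
Claims 53 and 54 refer to distance indicators stored alongside secondary keywords and to a predetermined keyword distance. The claims do not say how a distance is encoded, so the sketch below makes up an encoding purely for illustration: each secondary keyword is stored as the string `keyword#offset`, where the offset is its signed token distance from the primary keyword, and a Python set again stands in for the Bloom function data structure.

```python
def distance_strings(tokens: list[str], primary: str) -> set[str]:
    """Encode every other token with its signed offset from each primary occurrence."""
    entries = set()
    for i, tok in enumerate(tokens):
        if tok != primary:
            continue
        for j, other in enumerate(tokens):
            if other != primary:
                entries.add(f"{other}#{j - i}")  # hypothetical encoding
    return entries


def within_distance(entries, secondary: str, max_distance: int) -> bool:
    """Probe every offset in the allowed window; any hit counts as a match."""
    return any(
        f"{secondary}#{d}" in entries
        for d in range(-max_distance, max_distance + 1)
        if d != 0
    )


tokens = "distributed peer network search index".split()
entries = distance_strings(tokens, "network")  # a set stands in for the filter
print(within_distance(entries, "peer", 1))         # True  (offset -1)
print(within_distance(entries, "distributed", 1))  # False (offset -2)
```

With this kind of encoding a proximity query costs a small, fixed number of membership probes per secondary keyword, one for each offset in the permitted window.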

57. A system comprising:
- a peer network node;
  
  a provided peer-to-peer network connected to the peer network node and configured to interoperate with the peer network node;
  
  a means for indexing documents for searching, the indexing performed on keyword combinations and partitioned between multiple nodes in the network; and
  
  a means for searching for the indexed documents by multiple keyword combinations indexed across multiple nodes in the network.

58. A system comprising:
- a peer network node;
- a provided peer-to-peer network connected to the peer network node and configured to interoperate with the peer network node; and
- wherein the peer network node includes logic for executing software to:
  - parse a document into separate keywords in a search term list;
  - rank order the keywords within the search term list;
  - for each of the rank-ordered keywords in the search term list:
    - (i) create a list of addresses referring to one or more web pages that include at least one instance of the rank-ordered keyword;
    - (ii) rank order the list of addresses by relevance; and
    - (iii) reduce the list of addresses to the k-most relevant addresses, where k is a predetermined number;
  - create a set of query index terms from the search term list, the set of query index terms comprising at least one of a keyword from the search term list and a combination of keywords from the search term list;
  - remove from the set of query index terms at least one combination of keywords that represents a shorter keyword combination; and
  - for each of the remaining query index terms in the set:
    - (i) identify the query index term as a primary query index term;
    - determine a unique node identifier corresponding to a hosting node in the peer network, the hosting node configured to:
      - store an inverted index entry including the primary query index term and identifiers corresponding to the k-most relevant addresses for that query index term; and
      - store a string in a Bloom filter data structure stored on the hosting node;
    - (ii) identify one or more secondary query index terms;
    - (iii) store the primary query index term and identifiers corresponding to the k-most relevant addresses for that query index term in the inverted index of the hosting node; and
    - (iv) store the one or more secondary query index terms and their respectively associated k-most relevant addresses in the Bloom filter data structure.
- Dependent claims:
  - 59. The system of claim 58 wherein the logic is a computer configured with a processor coupled to a memory, a display, a user interface, and a network interface.
  - 60. The system of claim 58 wherein all terms from the document are rank ordered in alphanumeric order; and all terms are used as separate primary keywords within the search term list.
  - 61. The system of claim 58 further comprising changing case of keywords within the search term list to lower case.
  - 62. The system of claim 58 further comprising forming keywords in the search term list to root stem words.
  - 63. The system of claim 58 further including removing duplicate keywords from the search term list.
  - 64. The system of claim 58 further including removing stop words from the search term list.
  - 65. The system of claim 58 wherein the one or more secondary keywords are stored in the Bloom filter data structure if they are of lower rank order than the primary keyword.
  - 66. The system of claim 58 wherein determining a unique node identifier corresponding to a hosting node in the peer network further includes obtaining a hash value of the primary keyword and determining a closest unique node identifier to that hash value.
  - 67. The system of claim 58 wherein combinations of keywords are created which do not include shorter keyword combinations with less than the top-k most relevant pages stored in the index.
  - 68. The system of claim 58 wherein the system is further configured to:
    - obtain a primary keyword from a search string;
    - obtain one or more secondary keywords from the search string, the secondary keywords comprising at least one of a single word or a combination of words from the search string;
    - create a limited search set comprising one or more keyword combinations from the primary and secondary keywords, wherein each of the respective primary and secondary keyword elements has the top-k most relevant pages stored in the inverted index;
    - identify one or more hosting nodes of the peer-to-peer network that store in the inverted index at least one keyword from the limited search set;
    - for each of the identified hosting nodes:
      - (i) if the keyword stored in the inverted index of the hosting node is a single word, format a report for a user containing addresses of all documents that are referenced by that keyword in the inverted index;
      - (ii) if the keyword in the inverted index of the hosting node comprises a plurality of words, format a report for a user containing addresses of all documents that are referenced by the Bloom filter data structure entry for that keyword.
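
The top-k truncation of claim 58 and the combination rule of claim 67 can be illustrated with the sketch below. It reflects one possible reading only: a keyword pair is created when each member keyword has at least k addresses, on the assumption that a shorter list already fits in the index in full and needs no longer combination. The relevance scores, example URLs, and function names are invented for the example.

```python
import heapq
from itertools import combinations


def top_k(scored_addresses: dict[str, float], k: int) -> list[str]:
    """Keep the k addresses with the highest relevance score."""
    return heapq.nlargest(k, scored_addresses, key=scored_addresses.get)


def query_index_terms(postings: dict[str, dict[str, float]], k: int) -> list[tuple[str, ...]]:
    """Single keywords, plus pairs whose members each have at least k addresses."""
    keywords = sorted(postings)
    terms = [(kw,) for kw in keywords]
    full = [kw for kw in keywords if len(postings[kw]) >= k]
    terms.extend(combinations(full, 2))
    return terms


# Made-up relevance scores for illustration.
postings = {
    "peer": {"http://a.example": 0.9, "http://b.example": 0.4, "http://c.example": 0.7},
    "network": {"http://a.example": 0.8, "http://d.example": 0.6, "http://e.example": 0.5},
    "bloom": {"http://a.example": 0.3},
}
print(top_k(postings["peer"], 2))  # ['http://a.example', 'http://c.example']
print(query_index_terms(postings, 2))
# [('bloom',), ('network',), ('peer',), ('network', 'peer')]
```

Under this reading, "bloom" never appears in a pair because its single address is already stored in full under the one-word entry.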

Current Assignee: Wolf Garbe
Original Assignee: Wolf Garbe
Inventors: Garbe, Wolf
Granted Patent: US 8,359,318 B2
US Class Current: 707/742
CPC Class Codes:
- G06F 16/14   Details of searching files ...
- G06F 16/1834   implemented based on peer-t...
- G06F 16/93   Document management systems
- G06F 16/95   Retrieval from the web
