Method for semantic based storage and retrieval of information

US 20090182730A1
Filed: 03/06/2008
Published: 07/16/2009
Est. Priority Date: 01/14/2008
Status: Active Grant

Abstract

A method of storing semantically similar documents on proximally located peers in a structured peer to peer overlay network, where each peer is assigned a unique identifier and each document includes one or more words belonging to at least one hierarchical structured collection of words. A method of searching and retrieving documents, corresponding to a search query, from a structured peer to peer overlay network is also provided.

25 Claims

1. A method of storing semantically similar documents on proximally located peers in a structured peer to peer overlay network, each peer being assigned a unique identifier, each document comprising one or more words belonging to at least one hierarchical structured collection of words, the hierarchical structure comprising a plurality of branches, the branches being sequentially numbered, the method comprising:
- a. extracting a predetermined number of words from a document, belonging to a set of documents;
  
  b. computing a concept similarity (CS) metrics between at least one pair of the extracted words;
  
  c. computing a score S(T) for the extracted set of words by using the computed CS metrics, branch numbers to which the extracted words belong and the total number of branches in the hierarchical structured collection of words;
  
  d. computing a hash value hash(T) for the document by using the computed S(T);
  
  e. routing the computed hash(T) over the structured peer to peer overlay network;
  
  f. storing hash(T) at a first successor peer node whose unique identifier is greater than hash(T); and
  
  g. repeating steps a to f for each document belonging to the set of documents.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
  - 2. The method as claimed in claim 1, wherein the unique identifiers assigned to each peer range from 0 to (2ⁿ − 1), where the structured peer to peer overlay network is an n bit Chord ring.
  - 3. The method as claimed in claim 1, wherein each peer maintains at least one of a successor table, a predecessor table and a finger table.
  - 4. The method as claimed in claim 1, wherein each document comprises one or more words belonging to at least one hierarchical tree structured taxonomy of words.
  - 5. The method as claimed in claim 1, wherein the number of words extracted from the document is greater than one.
  - 6. The method as claimed in claim 1, wherein the extracted words are arranged in a descending order of their importance.
  - 7. The method as claimed in claim 1, wherein the CS metrics computed for a pair of words is based on the location of the words in the hierarchical structured collection of words.
  - 8. The method as claimed in claim 1, wherein the CS metrics is computed with respect to every pair of the extracted words.
  - 9. The method as claimed in claim 1, wherein the value of S(T) ranges between 0 and 1.
  - 10. The method as claimed in claim 1, wherein hash(T) is computed by multiplying S(T) with (2ⁿ − 1), where the structured peer to peer overlay network is an n bit Chord ring.
  - 11. The method as claimed in claim 1, wherein the value of hash(T) ranges from 0 to (2ⁿ − 1), where the structured peer to peer overlay network is an n bit Chord ring.
  - 12. The method as claimed in claim 1, further comprising the step of storing a pointer pointing to the address of the document in the structured peer to peer overlay network and the hash(T) at a first successor peer node whose unique identifier is greater than hash(T).
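
The storage steps of claims 1 and 9 to 12 can be sketched roughly as follows. This is an illustrative sketch, not the patented implementation: the claims do not give the formula for S(T), so the score is taken as an input, and the names `hash_from_score`, `first_successor`, and `store` are hypothetical. Truncating the product to an integer, wrapping around the ring when no larger identifier exists, and looking the successor up in a sorted list (instead of routing through finger tables) are all assumptions made for brevity.

```python
from bisect import bisect_right


def hash_from_score(score: float, n_bits: int) -> int:
    """Map S(T) in [0, 1] onto 0 .. 2**n_bits - 1 (claims 9 to 11)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("S(T) is expected to lie between 0 and 1")
    # Claim 10: hash(T) = S(T) * (2^n - 1). Truncation is an assumption.
    return int(score * (2 ** n_bits - 1))


def first_successor(peer_ids, key: int) -> int:
    """First peer whose identifier is greater than key (claim 1, step f)."""
    ring = sorted(peer_ids)
    idx = bisect_right(ring, key)  # index of the first identifier > key
    # Wrap-around to the smallest identifier is assumed Chord behaviour.
    return ring[idx % len(ring)]


def store(index, peer_ids, score: float, n_bits: int, pointer: str):
    """Record (hash(T), pointer) at the successor peer (claims 1 and 12)."""
    key = hash_from_score(score, n_bits)
    peer = first_successor(peer_ids, key)
    index.setdefault(peer, []).append((key, pointer))
    return key, peer
```

With a 6 bit ring and hypothetical peers `[1, 8, 14, 21, 32, 38, 42, 48, 51, 56]`, scores of 0.48 and 0.50 give hash values 30 and 31, and both entries are stored at peer 32. This is the sense in which documents with close scores end up on the same or neighbouring peers.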

13. A method of searching and retrieving documents corresponding to a search query, from a structured peer to peer overlay network, each peer being assigned a unique identifier, each document being assigned a document identifier key, each search query comprising one or more words belonging to at least one hierarchical structured collection of words, the hierarchical structure comprising a plurality of branches, the branches being sequentially numbered, the method comprising:
- a. entering a query comprising a predetermined number of keywords;
  
  b. computing a concept similarity (CS) metrics between at least one pair of the keywords;
  
  c. computing a first score S(T₁) for a first set of the keywords by using the computed CS metrics, branch numbers to which the first set of keywords belong and the total number of branches in the hierarchical structured collection of words;
  
  d. computing a second score S(T₂) for a second set of the keywords by using the computed CS metrics, branch numbers to which the second set of keywords belong and the total number of branches in the hierarchical structured collection of words;
  
  e. computing a first hash value hash(T₁) by using the computed S(T₁);
  
  f. computing a second hash value hash(T₂) by using the computed S(T₂);
  
  g. computing a radius of search by obtaining the difference between the values of hash(T₁) and hash(T₂);
  
  h. computing a distributed hash table (DHT) key by using the computed radius of search and at least one of the computed hash values;
  
  i. routing the computed DHT key over the structured peer to peer overlay network; and
  
  j. extracting all document identification keys stored at a first peer whose unique identifier is greater than the DHT key; and
  
  k. retrieving the documents corresponding to the extracted document identification keys.
- Dependent Claims (14, 15, 16, 17, 18, 19, 20, 21, 22, 25)
  - 14. The method as claimed in claim 13, further comprising:
    - a. routing the DHT key to an immediate successor peer;
      
      b. determining if the unique identifier of the peer is less than or equal to a value obtained by summing the computed radius of search and at least one of the computed hash values;
      
      c. extracting all document identification keys stored at the peer if the unique identifier of the peer is less than or equal to the value obtained by summing the computed radius of search and at least one of the computed hash values;
      
      d. retrieving the documents corresponding to the extracted document identification keys; and
      
      e. ranking the retrieved documents if the unique identifier of the peer is greater than the value obtained by summing the computed radius of search and at least one of the computed hash values.
  - 15. The method as claimed in claim 13, wherein each search query comprises one or more keywords belonging to at least one hierarchical tree structured taxonomy of words.
  - 16. The method as claimed in claim 13, wherein the number of keywords in a search query is greater than one.
  - 17. The method as claimed in claim 13, wherein the keywords are arranged in a descending order of their importance.
  - 18. The method as claimed in claim 13, wherein the CS metrics computed for a pair of keywords is based on the location of the words in the hierarchical structured collection of words.
  - 19. The method as claimed in claim 13, wherein the CS metrics is computed with respect to every pair of the keywords.
  - 20. The method as claimed in claim 13, wherein the values of S(T₁) and S(T₂) range between 0 and 1.
  - 21. The method as claimed in claim 13, wherein hash(T₁) and hash(T₂) are computed by multiplying S(T₁) and S(T₂) with (2ⁿ − 1) respectively, where the structured peer to peer overlay network is an n bit Chord ring.
  - 22. The method as claimed in claim 13, wherein the values of hash(T₁) and hash(T₂) range from 0 to (2ⁿ − 1), where the structured peer to peer overlay network is an n bit Chord ring.
  - 25. The computer readable storage medium as claimed in claim 13, wherein the program code further comprises instructions for:
    - a. routing the DHT key to an immediate successor peer;
      
      b. determining if the unique identifier of the peer is less than or equal to a value obtained by summing the computed radius of search and at least one of the computed hash values;
      
      c. extracting all document identification keys stored at the peer if the unique identifier of the peer is less than or equal to the value obtained by summing the computed radius of search and at least one of the computed hash values;
      
      d. retrieving the documents corresponding to the extracted document identification keys; and
      
      e. ranking the retrieved documents if the unique identifier of the peer is greater than the value obtained by summing the computed radius of search and at least one of the computed hash values.
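
The search walk of claims 13 and 14 might look roughly like the sketch below. Several details are assumptions, because the claims leave them open: the radius is taken as the absolute difference of the two hash values, the DHT key is assumed to be hash(T₁) minus the radius (clamped at 0), and the upper bound is hash(T₁) plus the radius. The function name `search` and the dictionary-based `index` are hypothetical, successors are read from a sorted list instead of being reached by routing, and the ranking step is omitted.

```python
from bisect import bisect_right


def search(index, peer_ids, hash_t1: int, hash_t2: int):
    """Collect document keys from peers covering the search radius."""
    ring = sorted(peer_ids)
    radius = abs(hash_t1 - hash_t2)      # claim 13, step g (abs is assumed)
    dht_key = max(hash_t1 - radius, 0)   # step h; this form is an assumption
    upper = hash_t1 + radius             # bound tested in claim 14

    pos = bisect_right(ring, dht_key)    # first peer with identifier > key
    if pos == len(ring):
        # No larger identifier: wrap to the first peer and stop there.
        return radius, dht_key, list(index.get(ring[0], []))

    # Claim 13, step j: keys held by the first peer past the DHT key.
    doc_keys = list(index.get(ring[pos], []))

    # Claim 14: visit immediate successors while identifier <= upper.
    for peer in ring[pos + 1:]:
        if peer > upper:
            break                        # the claim ranks results here
        doc_keys.extend(index.get(peer, []))
    return radius, dht_key, doc_keys
```

For example, with hypothetical peers `[1, 8, 14, 21, 32, 38, 42, 48, 51, 56]` and hash values 34 and 30, the radius is 4, the assumed DHT key is 30, and the walk visits peers 32 and 38 before stopping at peer 42, whose identifier exceeds 38.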

23. A computer program readable storage medium having a computer readable program code embodied therein for storing semantically similar documents on proximally located peers in a structured peer to peer overlay network, each peer being assigned a unique identifier, each document comprising one or more words belonging to at least one hierarchical structured collection of words, the hierarchical structure comprising a plurality of branches, the branches being sequentially numbered, the computer readable program code containing instructions for:
- a. extracting a predetermined number of words from a document, belonging to a set of documents;
  
  b. computing a concept similarity (CS) metrics between at least one pair of the extracted words;
  
  c. computing a score S(T) for the extracted set of words by using the computed CS metrics, branch numbers to which the extracted words belong and the total number of branches in the hierarchical structured collection of words;
  
  d. computing a hash value hash(T) for the document by using the computed S(T);
  
  e. routing the computed hash(T) over the structured peer to peer overlay network;
  
  f. storing hash(T) at a first successor peer node whose unique identifier is greater than hash(T); and
  
  g. repeating steps a to f for each document belonging to the set of documents.

24. A computer readable storage medium having a computer readable program code embodied therein for searching and retrieving documents corresponding to a search query, from a structured peer to peer overlay network, each peer being assigned a unique identifier, each document being assigned a document identifier key, each search query comprising one or more words belonging to at least one hierarchical structured collection of words, the hierarchical structure comprising a plurality of branches, the branches being sequentially numbered, the computer readable program code containing instructions for:
- a. entering a query comprising a predetermined number of keywords;
  
  b. computing a concept similarity (CS) metrics between at least one pair of the keywords;
  
  c. computing a first score S(T₁) for a first set of the keywords by using the computed CS metrics, branch numbers to which the first set of keywords belong and the total number of branches in the hierarchical structured collection of words;
  
  d. computing a second score S(T₂) for a second set of the keywords by using the computed CS metrics, branch numbers to which the second set of keywords belong and the total number of branches in the hierarchical structured collection of words;
  
  e. computing a first hash value hash(T₁) by using the computed S(T₁);
  
  f. computing a second hash value hash(T₂) by using the computed S(T₂);
  
  g. computing a radius of search by obtaining the difference between the values of hash(T₁) and hash(T₂);
  
  h. computing a distributed hash table (DHT) key by using the computed radius of search and at least one of the computed hash values;
  
  i. routing the computed DHT key over the structured peer to peer overlay network; and
  
  j. extracting all document identification keys stored at a first peer whose unique identifier is greater than the DHT key; and
  
  k. retrieving the documents corresponding to the extracted document identification keys.

Current Assignee: Infosys Limited
Original Assignee: Infosys Technologies Limited (Infosys Limited)
Inventors: Mondal, Abdul Sakib; Krishnamoorthy, Srikumar
Granted Patent: US 7,870,133 B2
US Class Current: 1/1
CPC Class Codes:
G06F 16/355 Class or cluster creation o...
G06F 16/93 Document management systems
