Method for semantic based storage and retrieval of information
Abstract
A method of storing semantically similar documents on proximally located peers in a structured peer to peer overlay network, where each peer is assigned a unique identifier and each document includes one or more words belonging to at least one hierarchical structured collection of words. A method of searching and retrieving documents, corresponding to a search query, from a structured peer to peer overlay network is also provided.
25 Claims
1. A method of storing semantically similar documents on proximally located peers in a structured peer to peer overlay network, each peer being assigned a unique identifier, each document comprising one or more words belonging to at least one hierarchical structured collection of words, the hierarchical structure comprising a plurality of branches, the branches being sequentially numbered, the method comprising:
a. extracting a predetermined number of words from a document, belonging to a set of documents;
b. computing a concept similarity (CS) metrics between at least one pair of the extracted words;
c. computing a score S(T) for the extracted set of words by using the computed CS metrics, branch numbers to which the extracted words belong and the total number of branches in the hierarchical structured collection of words;
d. computing a hash value hash(T) for the document by using the computed S(T);
e. routing the computed hash(T) over the structured peer to peer overlay network;
f. storing hash(T) at a first successor peer node whose unique identifier is greater than hash(T); and
g. repeating steps a to f for each document belonging to the set of documents.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
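Steps a to g of claim 1 can be sketched in Python as follows. The claim does not give formulas for the CS metric, the score S(T), or hash(T), so the ones below are placeholders chosen only for illustration. The toy `HIERARCHY`, the constants `TOTAL_BRANCHES` and `ID_SPACE`, every function name, and the wrap-around to the lowest peer identifier when no larger identifier exists (a ring-style overlay) are all assumptions, not part of the source.

```python
from bisect import bisect_right
from itertools import combinations

# Hypothetical toy hierarchy: word -> (branch number, path from the root).
HIERARCHY = {
    "dog": (1, ("animal", "mammal", "dog")),
    "cat": (1, ("animal", "mammal", "cat")),
    "car": (2, ("artifact", "vehicle", "car")),
    "bus": (2, ("artifact", "vehicle", "bus")),
}
TOTAL_BRANCHES = 2   # assumed number of sequentially numbered branches
ID_SPACE = 2 ** 16   # assumed size of the peer identifier space


def extract_words(text, k=2):
    """Step a: take the first k words of the document found in the hierarchy."""
    return [w for w in text.lower().split() if w in HIERARCHY][:k]


def concept_similarity(a, b):
    """Step b (placeholder formula): share of common ancestors on both paths."""
    path_a, path_b = HIERARCHY[a][1], HIERARCHY[b][1]
    common = 0
    for x, y in zip(path_a, path_b):
        if x != y:
            break
        common += 1
    return 2 * common / (len(path_a) + len(path_b))


def score(words):
    """Step c (placeholder formula): combine CS values, branch numbers and
    the total number of branches into a value S(T) in [0, 1]."""
    pairs = list(combinations(words, 2))
    mean_cs = sum(concept_similarity(a, b) for a, b in pairs) / len(pairs)
    mean_branch = sum(HIERARCHY[w][0] for w in words) / len(words)
    return (mean_branch - 1 + mean_cs) / TOTAL_BRANCHES


def doc_hash(words):
    """Step d (placeholder): scale S(T) onto the identifier space."""
    return int(score(words) * (ID_SPACE - 1))


def successor(sorted_peer_ids, key):
    """Steps e and f: first peer whose identifier is greater than the key,
    wrapping to the lowest identifier if there is none (assumption)."""
    i = bisect_right(sorted_peer_ids, key)
    return sorted_peer_ids[i % len(sorted_peer_ids)]


def store_documents(docs, peer_ids):
    """Step g: repeat steps a to f for every document in the set."""
    peer_ids = sorted(peer_ids)
    storage = {p: [] for p in peer_ids}
    for doc_id, text in docs.items():
        words = extract_words(text)
        if len(words) < 2:   # the CS metric needs at least one pair of words
            continue
        h = doc_hash(words)
        storage[successor(peer_ids, h)].append((h, doc_id))
    return storage


docs = {"d1": "dog chases cat", "d2": "cat and dog", "d3": "car passes bus"}
storage = store_documents(docs, [10000, 30000, 50000, 60000])
```

In this toy run the two documents about animals receive the same hash and are stored on the same peer, while the document about vehicles is stored on a different peer. That is the locality property the claim is aiming for, shown here only under the assumed formulas.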
13. A method of searching and retrieving documents corresponding to a search query, from a structured peer to peer overlay network, each peer being assigned a unique identifier, each document being assigned a document identifier key, each search query comprising one or more words belonging to at least one hierarchical structured collection of words, the hierarchical structure comprising a plurality of branches, the branches being sequentially numbered, the method comprising:
a. entering a query comprising a predetermined number of keywords;
b. computing a concept similarity (CS) metrics between at least one pair of the keywords;
c. computing a first score S(T1) for a first set of the keywords by using the computed CS metrics, branch numbers to which the first set of keywords belong and the total number of branches in the hierarchical structured collection of words;
d. computing a second score S(T2) for a second set of the keywords by using the computed CS metrics, branch numbers to which the second set of keywords belong and the total number of branches in the hierarchical structured collection of words;
e. computing a first hash value hash(T1) by using the computed S(T1);
f. computing a second hash value hash(T2) by using the computed S(T2);
g. computing a radius of search by obtaining the difference between the values of hash(T1) and hash(T2);
h. computing a distributed hash table (DHT) key by using the computed radius of search and at least one of the computed hash values;
i. routing the computed DHT key over the structured peer to peer overlay network;
j. extracting all document identification keys stored at a first peer whose unique identifier is greater than the DHT key; and
k. retrieving the documents corresponding to the extracted document identification keys.
Dependent claims: 14, 15, 16, 17, 18, 19, 20, 21, 22, 25
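The search steps of claim 13 can be sketched in the same spirit. The claim says only that the DHT key is computed from the radius of search and at least one of the hash values, so the midpoint rule used below is an assumption. So are the names `search`, `to_hash` and `successor`, the `score_fn` argument (a stand-in for whatever computes S(T) in steps b to d), the `ID_SPACE` constant, and the toy scores, peers and index.

```python
from bisect import bisect_right

ID_SPACE = 2 ** 16   # assumed size of the peer identifier space


def to_hash(score):
    """Steps e and f (placeholder): scale a score in [0, 1] onto the ID space."""
    return int(score * (ID_SPACE - 1))


def successor(sorted_peer_ids, key):
    """First peer whose identifier is greater than the key, wrapping to the
    lowest identifier if there is none (assumption)."""
    i = bisect_right(sorted_peer_ids, key)
    return sorted_peer_ids[i % len(sorted_peer_ids)]


def search(set1, set2, score_fn, peer_ids, index, documents):
    """Steps c to k for two keyword sets taken from one query."""
    h1 = to_hash(score_fn(set1))                 # steps c and e
    h2 = to_hash(score_fn(set2))                 # steps d and f
    radius = abs(h1 - h2)                        # step g: difference of hashes
    # Step h (assumed rule): midpoint of the interval spanned by the hashes.
    dht_key = (min(h1, h2) + radius // 2) % ID_SPACE
    peer = successor(sorted(peer_ids), dht_key)  # step i: route the key
    doc_keys = index.get(peer, [])               # step j: keys held by the peer
    return [documents[k] for k in doc_keys]      # step k: fetch the documents


# Toy data, all hypothetical: precomputed scores for two keyword sets,
# four peers, and an index mapping a peer to the document keys it holds.
toy_scores = {("dog", "cat"): 0.30, ("cat", "pet"): 0.36}
peers = [10000, 30000, 50000, 60000]
index = {30000: ["d1", "d2"], 60000: ["d3"]}
documents = {"d1": "dog chases cat", "d2": "cat and dog", "d3": "car passes bus"}

results = search(("dog", "cat"), ("cat", "pet"), toy_scores.__getitem__,
                 peers, index, documents)
```

With these toy values both keyword sets hash into the interval served by peer 30000, so the query returns the two documents indexed there. A fuller sketch might also visit neighbouring peers that fall within the radius, which the claim text shown here does not spell out.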
23. A computer program readable storage medium having a computer readable program code embodied therein for storing semantically similar documents on proximally located peers in a structured peer to peer overlay network, each peer being assigned a unique identifier, each document comprising one or more words belonging to at least one hierarchical structured collection of words, the hierarchical structure comprising a plurality of branches, the branches being sequentially numbered, the computer readable program code containing instructions for:
a. extracting a predetermined number of words from a document, belonging to a set of documents;
b. computing a concept similarity (CS) metrics between at least one pair of the extracted words;
c. computing a score S(T) for the extracted set of words by using the computed CS metrics, branch numbers to which the extracted words belong and the total number of branches in the hierarchical structured collection of words;
d. computing a hash value hash(T) for the document by using the computed S(T);
e. routing the computed hash(T) over the structured peer to peer overlay network;
f. storing hash(T) at a first successor peer node whose unique identifier is greater than hash(T); and
g. repeating steps a to f for each document belonging to the set of documents.
24. A computer readable storage medium having a computer readable program code embodied therein for searching and retrieving documents corresponding to a search query, from a structured peer to peer overlay network, each peer being assigned a unique identifier, each document being assigned a document identifier key, each search query comprising one or more words belonging to at least one hierarchical structured collection of words, the hierarchical structure comprising a plurality of branches, the branches being sequentially numbered, the computer readable program code containing instructions for:
a. entering a query comprising a predetermined number of keywords;
b. computing a concept similarity (CS) metrics between at least one pair of the keywords;
c. computing a first score S(T1) for a first set of the keywords by using the computed CS metrics, branch numbers to which the first set of keywords belong and the total number of branches in the hierarchical structured collection of words;
d. computing a second score S(T2) for a second set of the keywords by using the computed CS metrics, branch numbers to which the second set of keywords belong and the total number of branches in the hierarchical structured collection of words;
e. computing a first hash value hash(T1) by using the computed S(T1);
f. computing a second hash value hash(T2) by using the computed S(T2);
g. computing a radius of search by obtaining the difference between the values of hash(T1) and hash(T2);
h. computing a distributed hash table (DHT) key by using the computed radius of search and at least one of the computed hash values;
i. routing the computed DHT key over the structured peer to peer overlay network;
j. extracting all document identification keys stored at a first peer whose unique identifier is greater than the DHT key; and
k. retrieving the documents corresponding to the extracted document identification keys.
Specification