Process for determination of text relevancy
First Claim
1. A computer implemented method for ranking documents being searched in a database by a word query according to text relevancy comprising the steps of:
(a) inputting a word query to a computer database of documents;
(b) selecting each document by the word query;
(c) determining a real value number for each document, comprising the steps of:
(i) calculating a first importance value for each word in the selected document;
(ii) calculating a second importance value for each word in the query that matches a word in the document;
(iii) determining a probability value for each word in the query matching a semantic category;
(iv) determining a probability value for each word in the document matching a semantic category;
(v) adjusting for each word in the query that does not exist in the database of the document;
(vi) repeating steps (i) to (iv) for each adjusted word;
(vii) calculating weights of a semantic component in the query based on the importance value, the probability value and frequency of the word in the document;
(viii) calculating weights of a semantic component in the document based on the importance value, the probability value and frequency of word in the query;
(ix) multiplying query component weights by document component weights into products; and
(x) adding the products together to represent the real-value number for the selected document; and
(d) repeating step (c) for each additional document selected by the query; and
(e) sorting the documents of the database according to their respective real value numbers.
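The scoring in steps (c)(vii) to (x) and the sort in step (e) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented method: the claim does not give formulas, so the importance value is assumed here to be an idf-style logarithm, the semantic-category probabilities come from a small hypothetical `LEXICON`, each text's own word frequency is used, and the adjustment of steps (v) and (vi) is not modeled.

```python
import math
from collections import Counter, defaultdict

# Hypothetical lexicon (assumption): word -> {semantic category: probability}.
LEXICON = {
    "court": {"law": 0.8, "sport": 0.2},
    "judge": {"law": 0.9, "sport": 0.1},
    "tennis": {"sport": 1.0},
    "ruling": {"law": 1.0},
}

def importance(word, docs):
    """Assumed idf-style importance: log(N / number of documents containing word)."""
    df = sum(1 for d in docs if word in d)
    return math.log(len(docs) / df) if df else 0.0

def component_weights(words, docs):
    """Accumulate importance * probability * frequency per semantic category."""
    weights = defaultdict(float)
    for word, freq in Counter(words).items():
        imp = importance(word, docs)
        for category, prob in LEXICON.get(word, {}).items():
            weights[category] += imp * prob * freq
    return weights

def score(query, doc, docs):
    """Steps (ix) and (x): multiply matching component weights and add the products."""
    q = component_weights(query, docs)
    d = component_weights(doc, docs)
    return sum(q[c] * d[c] for c in q if c in d)

def rank(query, docs):
    """Steps (b), (d) and (e): select documents sharing a query word, then sort."""
    selected = [d for d in docs if set(query) & set(d)]
    return sorted(selected, key=lambda d: score(query, d, docs), reverse=True)

docs = [["court", "ruling", "judge"], ["tennis", "court"], ["weather"]]
ranked = rank(["court", "judge"], docs)
```

With these toy inputs the legal document ranks ahead of the tennis document, and the weather document is never selected because it shares no word with the query.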
Abstract
This is a procedure for determining text relevancy and can be used to enhance the retrieval of text documents by search queries. This system helps a user intelligently and rapidly locate information found in large textual databases. A first embodiment determines the common meanings between each word in the query and each word in the document. Then an adjustment is made for words in the query that are not in the documents. Further, weights are calculated for both the semantic components in the query and the semantic components in the documents. These weights are multiplied together, and their products are subsequently added to one another to determine a real value number (similarity coefficient) for each document. Finally, the documents are sorted in sequential order according to their real value numbers, from largest to smallest. Another embodiment routes documents to topics/headings (sometimes referred to as filtering). Here, the importance of each word in both topics and documents is calculated. Then, the real value number (similarity coefficient) for each document is determined. Then the documents are routed one at a time, each according to its real value number, to one or more topics. Finally, once the documents are located at their topics, the documents can be sorted. This system can be used to search and route all kinds of document collections, such as collections of legal documents, medical documents, news stories, and patents.
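In symbols, the similarity coefficient described in the abstract can be written as a sum of products over semantic components. The notation is an assumption introduced for illustration (Q is the query, D a document, c a semantic component, and w the corresponding component weights); the source does not state it in this form.

```latex
S(Q, D) = \sum_{c} w_{Q,c} \, w_{D,c}
```

Documents are then ordered by decreasing S(Q, D).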
9 Claims
1. Independent claim, recited in full under First Claim above. Dependent claims: 2, 3, 4, 5
6. A computer implemented method of routing and filtering documents to topics comprising the steps of:
breaking down each document for routing into small portions of up to approximately 250 words in length;
calculating importance values of each word in both topics and the small portions of the documents;
determining real value numbers for each of the small portions of document to each topic based on the importance values;
calculating the real value number for the selected document based on adding the real value numbers of the small portions of the selected document;
routing each document according to their respective real value numbers to one or more topics; and
sorting the routed documents at each topic.
Dependent claims: 7, 8, 9
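The routing steps of claim 6 can be sketched along the following lines. This is a hedged illustration only: the claim does not say how importance values or portion scores are computed, so the sketch assumes a precomputed `importance` dictionary, scores a portion by summing importance times the word counts shared with the topic, and uses a hypothetical `threshold` parameter to decide whether a document is routed to a topic.

```python
from collections import Counter

def split_portions(words, size=250):
    """Break a document into consecutive portions of at most `size` words."""
    return [words[i:i + size] for i in range(0, len(words), size)]

def portion_score(portion, topic_words, importance):
    """Assumed scoring: sum of importance * portion count * topic count over shared words."""
    p, t = Counter(portion), Counter(topic_words)
    return sum(importance.get(w, 1.0) * p[w] * t[w] for w in p if w in t)

def document_score(words, topic_words, importance, size=250):
    """Add the portion scores to obtain the document's real value number for a topic."""
    return sum(portion_score(p, topic_words, importance)
               for p in split_portions(words, size))

def route(docs, topics, importance, threshold=0.0, size=250):
    """Route each document to every topic it scores above `threshold`, then sort per topic."""
    routed = {name: [] for name in topics}
    for doc_id, words in docs.items():
        for name, topic_words in topics.items():
            s = document_score(words, topic_words, importance, size)
            if s > threshold:
                routed[name].append((s, doc_id))
    for name in routed:
        routed[name].sort(reverse=True)  # highest real value number first
    return routed

topics = {"law": ["court", "judge"], "sport": ["tennis"]}
docs = {"d1": ["court", "judge", "court"], "d2": ["tennis", "court"]}
importance = {"court": 1.0, "judge": 2.0, "tennis": 2.0}
routed = route(docs, topics, importance)
```

A document can land under more than one topic, which matches the claim's "one or more topics"; in the toy data, `d2` is routed to both `law` and `sport`.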
Specification