Process for determination of text relevancy
First Claim
1. A computer implemented method of creating similarity coefficients between sequences of words in documents that are being searched in a database by a natural word query without parsing the query words nor the words in the documents, and without removing any of the query words and any of the words in the documents, the method comprising the steps of:
(a) branching out the meanings of each and every word in a natural word query into respective probabilities of occurrence for each of the meanings in the natural word query;
(b) branching out the meanings of words in a document searched by the natural word query into respective probabilities of occurrence for each of the meanings of the words in each of the documents;
(c) determining a similarity coefficient between the probabilities of occurrence of words in the natural language query and the probabilities of occurrence of the words in the document;
(d) repeating steps (a) to (c) for each additional document searched by the natural language query; and
(e) ranking the documents being searched in order of their similarity coefficients without parsing of the natural language query and the documents, and without removing any words from the natural language query nor from the documents.
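Steps (a) to (e) can be illustrated with a minimal Python sketch. It is not the patented implementation: the `LEXICON` table, its probabilities, the per-text normalization, and the function names are all assumptions invented for illustration. Whitespace splitting stands in for reading the word sequence, with no syntactic parsing and no stop-word removal.

```python
from collections import defaultdict

# Hypothetical lexicon (an assumption, not from the patent): each word maps
# to its meanings with probabilities of occurrence that sum to 1.
LEXICON = {
    "bank": {"finance": 0.7, "river": 0.3},
    "loan": {"finance": 1.0},
    "water": {"river": 0.6, "drink": 0.4},
    "the": {"function_word": 1.0},
}


def meaning_probabilities(text):
    """Steps (a)/(b): spread every word over its meanings; no word is dropped."""
    totals = defaultdict(float)
    words = text.lower().split()
    for word in words:
        # A word missing from the lexicon is treated as its own single meaning.
        for meaning, prob in LEXICON.get(word, {word: 1.0}).items():
            totals[meaning] += prob
    count = len(words) or 1  # avoid dividing by zero on empty text
    return {meaning: value / count for meaning, value in totals.items()}


def similarity(query_probs, doc_probs):
    """Step (c): add up the products of probabilities for shared meanings."""
    return sum(p * doc_probs.get(m, 0.0) for m, p in query_probs.items())


def rank(query, documents):
    """Steps (d)/(e): score every document, then sort by coefficient."""
    query_probs = meaning_probabilities(query)
    scored = [(similarity(query_probs, meaning_probabilities(d)), d) for d in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

With this toy lexicon, `rank("bank loan", ["the water", "the bank loan"])` places "the bank loan" first, because it shares the "finance" meaning with the query, while "the water" overlaps only through the less probable "river" meaning of "bank".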
Abstract
This is a procedure for determining text relevancy that can be used to enhance the retrieval of text documents by search queries. The system helps a user intelligently and rapidly locate information found in large textual databases. A first embodiment determines the common meanings between each word in the query and each word in the document. Then an adjustment is made for words in the query that are not in the documents. Further, weights are calculated for both the semantic components in the query and the semantic components in the documents. These weights are multiplied together, and their products are subsequently added to one another to determine a real value number (similarity coefficient) for each document. Finally, the documents are sorted in sequential order according to their real value numbers, from largest to smallest value. Another embodiment is for routing documents to topics/headings (sometimes referred to as filtering). Here, the importance of each word in both topics and documents is calculated. Then, the real value number (similarity coefficient) for each document is determined. Then the documents are routed one at a time, according to their respective real value numbers, to one or more topics. Finally, once the documents are located with their topics, the documents can be sorted. This system can be used to search and route all kinds of document collections, such as collections of legal documents, medical documents, news stories, and patents.
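The routing embodiment described in the abstract can be sketched in Python as follows. This is a hedged illustration only: the weight tables `TOPICS` and `DOCUMENTS`, the `threshold` parameter, and the function names are assumptions, and the abstract does not say how the weights themselves are calculated.

```python
# Hypothetical, hand-picked weights for semantic components (assumptions).
TOPICS = {
    "law": {"court": 0.6, "contract": 0.4},
    "medicine": {"patient": 0.7, "dose": 0.3},
}
DOCUMENTS = {
    "d1": {"court": 0.5, "patient": 0.5},
    "d2": {"contract": 1.0},
    "d3": {"dose": 1.0},
}


def coefficient(topic_weights, doc_weights):
    """Multiply matching weights and add the products into one real value."""
    return sum(w * doc_weights.get(c, 0.0) for c, w in topic_weights.items())


def route(topics, documents, threshold=0.0):
    """Route each document, one at a time, to every topic scoring above threshold."""
    routed = {name: [] for name in topics}
    for doc_id, doc_weights in documents.items():
        for name, topic_weights in topics.items():
            score = coefficient(topic_weights, doc_weights)
            if score > threshold:  # the cutoff is an assumed design choice
                routed[name].append((score, doc_id))
    # Sort the documents under each topic from largest to smallest value.
    for name in routed:
        routed[name].sort(key=lambda pair: pair[0], reverse=True)
    return routed
```

Calling `route(TOPICS, DOCUMENTS)` sends document `d1` to both topics, since it carries weight for "court" and for "patient", which matches the abstract's statement that a document may be routed to one or more topics.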
9 Claims
1. A computer implemented method of creating similarity coefficients between sequences of words in documents that are being searched in a database by a natural word query without parsing the query words nor the words in the documents, and without removing any of the query words and any of the words in the documents, the method comprising the steps of:
(a) branching out the meanings of each and every word in a natural word query into respective probabilities of occurrence for each of the meanings in the natural word query;
(b) branching out the meanings of words in a document searched by the natural word query into respective probabilities of occurrence for each of the meanings of the words in each of the documents;
(c) determining a similarity coefficient between the probabilities of occurrence of words in the natural language query and the probabilities of occurrence of the words in the document;
(d) repeating steps (a) to (c) for each additional document searched by the natural language query; and
(e) ranking the documents being searched in order of their similarity coefficients without parsing of the natural language query and the documents, and without removing any words from the natural language query nor from the documents.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A computer implemented method of creating similarity coefficients between sequences of words in documents that are being searched in a database by a natural word query without parsing the query words nor the words in the documents, and without removing any of the query words and any of the words in the documents, the method comprising the steps of:
(a) branching out the meanings of each and every word in a natural word query into respective probabilities of occurrence for each of the meanings in the natural word query, wherein the query includes at least one word;
(b) branching out the meanings of each and every word in a document searched by the natural word query into respective probabilities of occurrence for each of the meanings of the words in each of the documents, wherein the document includes at least one word;
(c) determining a similarity coefficient between the probabilities of occurrence of words in the natural language query and the probabilities of occurrence of the words in the document;
(d) repeating steps (a) to (c) for each additional document searched by the natural language query; and
(e) ranking all the documents being searched in order of their similarity coefficients without parsing of the natural language query and the documents, and without removing any words from the natural language query nor from the documents.
Specification