Method and system for searching for relevant documents from a text database collection, using statistical ranking, relevancy feedback and small pieces of text
Abstract
Search system and method for retrieving relevant documents from a text database collection comprising patents, medical and legal documents, journals, news stories and the like. Each small piece of text within the documents, such as a sentence, phrase or semantic unit in the database, is treated as a document. Natural language queries are used to search for relevant documents in the database. A first search query creates a selected group of documents. Each word in both the search query and the documents is given a weighted value. Combining the weighted values creates a similarity value for each document, and the documents are then ranked according to their relevance to the search query. A user reading through this ranked list checks off which documents are relevant and which are not. The system then automatically updates the original search query into a second search query, which can include the same words, fewer words or different words than the first search query. Words in the second search query can have the same or different weights compared to the first search query. The system automatically searches the text database and creates a second group of documents, which at a minimum omits at least one of the documents found in the first group. The second group can also include additional documents not found in the first group. The ranking of documents in the second group differs from the first ranking such that the more relevant documents are found closer to the top of the list.
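As a rough illustration of the ranking flow the abstract describes, the Python sketch below splits a document into sentence-sized pieces, weights the words, and ranks the pieces against a natural language query. The function names, the regex sentence splitter and the smoothed inverse-document-frequency weight are assumptions made for this example. The abstract does not say how the weighted values are computed, and the semantic weighting recited in the claims is not modeled here.

```python
import math
import re
from collections import Counter


def split_into_pieces(text):
    """Split a document into sentence-sized pieces (a naive, assumed splitter)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    """Lowercase the text and return its alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_pieces(query, pieces):
    """Return (score, piece) pairs sorted from most to least similar."""
    tokenized = [tokenize(p) for p in pieces]
    n = len(pieces)
    # Document frequency: the number of pieces that contain each word.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    # Assumed weighting: smoothed inverse document frequency.
    idf = {w: math.log((n + 1) / (c + 1)) + 1.0 for w, c in df.items()}

    query_weights = Counter(tokenize(query))
    scored = []
    for piece, tokens in zip(pieces, tokenized):
        piece_weights = Counter(tokens)
        # Similarity: sum over query words of query weight * piece weight.
        score = sum(qw * piece_weights[w] * idf.get(w, 0.0)
                    for w, qw in query_weights.items())
        scored.append((score, piece))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


text = ("The valve regulates fuel flow. The court denied the appeal. "
        "Fuel flow is measured by a sensor.")
ranked = rank_pieces("fuel flow sensor", split_into_pieces(text))
print(ranked[0][1])  # Fuel flow is measured by a sensor.
```

Each sentence is scored on its own, so a long document can surface through its single most relevant sentence rather than through its overall word statistics.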
14 Claims
1. A method for retrieving relevant text data from a text database collection in a computer without annotating, parsing or pruning the text database collection, comprising the steps of:
(a) searching a text database collection in a computer using a first search query of natural language to retrieve a first group of selected small pieces of text, where each of the selected small pieces of text corresponds to a document;
(b) weighting each word of the selected small pieces of text with semantics to form document weighted values for each of the selected small pieces of text in the first group;
(c) weighting each word in the first search query with semantics to form query weighted values;
(d) combining the query weighted values and the document weighted values to form similarity values for each of the selected small pieces of text;
(e) ranking the similarity values for each of the selected small pieces of text to form a first ranked list;
(f) applying feedback information based on a manual determination of the relevancy of each of the selected small pieces of text in the first ranked list to automatically create a second search query;
(g) repeating steps (a) to (e) to form a second ranked list, wherein the second ranked list includes a second group of selected small pieces of text, wherein the second group is missing at least one of the selected small pieces of text in the first group.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
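Steps (f) and (g) of claim 1 can be sketched in Python as follows. The sketch uses a Rocchio-style update, a standard relevance-feedback technique swapped in here for illustration. The claim does not specify the update rule, and this is not presented as the patented method. The function names, the constants alpha, beta and gamma, and the top_k cutoff are all assumptions of the example.

```python
from collections import Counter


def update_query(query_weights, judged, alpha=1.0, beta=0.75, gamma=0.5):
    """Build second-query weights from manual relevance judgments.

    query_weights: dict of word -> weight for the first query.
    judged: list of (words, is_relevant) pairs, one per piece the user checked.
    alpha, beta, gamma: assumed tuning constants, not taken from the claim.
    """
    relevant = [Counter(words) for words, ok in judged if ok]
    non_relevant = [Counter(words) for words, ok in judged if not ok]
    updated = Counter({w: alpha * v for w, v in query_weights.items()})
    for group, sign, coeff in ((relevant, 1, beta), (non_relevant, -1, gamma)):
        for counts in group:
            for word, count in counts.items():
                # Move toward relevant pieces and away from non-relevant ones.
                updated[word] += sign * coeff * count / len(group)
    # Keep only words with a positive weight in the second query.
    return {w: v for w, v in updated.items() if v > 0}


def rank(query_weights, pieces, top_k=2):
    """Return the ids of the top_k pieces by dot-product similarity."""
    scores = {}
    for piece_id, words in pieces.items():
        counts = Counter(words)
        score = sum(weight * counts[w] for w, weight in query_weights.items())
        if score > 0:
            scores[piece_id] = score
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


pieces = {
    "p1": ["jaguar", "speed", "car"],
    "p2": ["jaguar", "habitat", "jungle"],
    "p3": ["jungle", "cat", "habitat"],
}
first_query = {"jaguar": 1.0}
first = rank(first_query, pieces)                  # ['p1', 'p2']
judged = [(pieces["p1"], False), (pieces["p2"], True)]
second_query = update_query(first_query, judged)
second = rank(second_query, pieces)                # ['p2', 'p3']
```

In this toy run the user marks p1 as not relevant and p2 as relevant. The second query gains the words "habitat" and "jungle", so p3 enters the second group and p1 drops out of it.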

11. A method for retrieving relevant text data from a text database collection in a computer without annotating, parsing or pruning the text database collection, comprising the steps of:
(a) searching a text database collection in a computer using a first non-boolean search query to retrieve a first group of selected small pieces of text, where each of the selected small pieces of text corresponds to a document;
(b) weighting each word of the selected small pieces of text with semantics to form document weighted values for each of the selected small pieces of text in the first group;
(c) weighting each word in the first search query with semantics to form query weighted values;
(d) combining the query weighted values and the document weighted values to form similarity values for each of the selected small pieces of text;
(e) ranking the similarity values for each of the selected small pieces of text to form a first ranked list;
(f) applying feedback information based on a manual relevancy determination of each of the selected small pieces of text in the first ranked list to automatically create a second non-boolean search query;
(g) repeating steps (a) to (e) to form a second ranked list, wherein the second ranked list includes at least one additional document not found in the first ranked list.
Dependent claims: 12

13. A method for retrieving relevant text data from a text database collection in a computer without annotating, parsing or pruning, comprising the steps of:
(a) searching a text database collection in a computer using a first search query of natural language to retrieve a first group of selected small pieces of text, where each of the selected small pieces of text corresponds to a document;
(b) weighting each word of the selected small pieces of text by semantics to form document weighted values for each of the selected small pieces of text in the first group;
(c) weighting each word in the first search query by semantics to form query weighted values;
(d) combining the query weighted values and the document weighted values to form similarity values for each of the selected small pieces of text;
(e) ranking the similarity values for each of the selected small pieces of text to form a first ranked list;
(f) automatically updating the first search query into a second search query based on feedback information from a manual determination on whether documents in the first ranked list are relevant;
(g) repeating steps (a) to (e) to form a second ranked list, wherein the second ranked list includes a second group of selected small pieces of text, wherein the second group is missing at least one of the selected small pieces of text found in the first group.
Dependent claims: 14
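The independent claims differ mainly in how step (g) relates the second ranked list to the first. Claims 1 and 13 recite a second group that is missing at least one piece from the first group, while claim 11 recites at least one additional document not found in the first list. The short Python sketch below only makes that set relationship concrete. The function name and the piece identifiers are hypothetical, and the code is not a test of claim scope.

```python
def compare_ranked_lists(first, second):
    """Report which piece ids left or entered the list after feedback."""
    first_set, second_set = set(first), set(second)
    return {
        # In the first list but absent from the second (claims 1 and 13).
        "dropped": [p for p in first if p not in second_set],
        # In the second list but absent from the first (claim 11).
        "added": [p for p in second if p not in first_set],
    }


diff = compare_ranked_lists(["p1", "p2"], ["p2", "p3"])
print(diff)  # {'dropped': ['p1'], 'added': ['p3']}
```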
Specification