Method and system for searching for relevant documents from a text database collection, using statistical ranking, relevancy feedback and small pieces of text

US 5,642,502 A
Filed: 12/06/1994
Issued: 06/24/1997
Est. Priority Date: 12/06/1994
Status: Expired due to Term

Abstract

Search system and method for retrieving relevant documents from a text database collection comprising patents, medical and legal documents, journals, news stories and the like. Each small piece of text within the documents, such as a sentence, phrase or semantic unit, is treated as a document. Natural language queries are used to search the database for relevant documents. A first search query produces a selected group of documents. Each word in both the search query and the documents is given a weighted value. Combining the weighted values yields a similarity value for each document, and the documents are then ranked by their relevance to the search query. A user reading through this ranked list checks off which documents are relevant and which are not. The system then automatically updates the original search query into a second search query, which can include the same words, fewer words or different words than the first search query. Words in the second search query can have the same or different weights compared to the first search query. The system automatically searches the text database and creates a second group of documents, which at a minimum omits at least one of the documents found in the first group. The second group can also include additional documents not found in the first group. The ranking of documents in the second group differs from the first ranking such that the more relevant documents are found closer to the top of the list.
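The first pass described in the abstract (treat each small piece of text as a document, weight the words, combine the weights into similarity values, rank) can be illustrated with a minimal Python sketch. This is not the patented weighting: the abstract does not give the formula, so a plain TF-IDF weight stands in for it as an assumption. The names `split_into_pieces`, `tokenize` and `rank_pieces`, the sentence-splitting rule and the sample texts are all hypothetical.

```python
import math
import re
from collections import Counter


def split_into_pieces(doc_id, text):
    """Split a document into sentences; each sentence is then treated as a document."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [(doc_id, s) for s in sentences if s]


def tokenize(text):
    """Lowercase the text and keep alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_pieces(query, pieces):
    """Rank pieces by the sum of (query weight * piece weight) over shared words.

    Assumption: TF-IDF is used as a stand-in for the weighting in the source.
    """
    n = len(pieces)
    tokens = [tokenize(text) for _, text in pieces]
    # Document frequency: in how many pieces each word occurs.
    df = Counter(w for toks in tokens for w in set(toks))
    idf = {w: math.log((n + 1) / (c + 1)) + 1.0 for w, c in df.items()}
    # Query weighted values; words unseen in the collection get weight 0.
    q_weights = {w: tf * idf.get(w, 0.0) for w, tf in Counter(tokenize(query)).items()}
    scored = []
    for (doc_id, text), toks in zip(pieces, tokens):
        # Document weighted values for this piece.
        d_weights = {w: tf * idf[w] for w, tf in Counter(toks).items()}
        similarity = sum(qw * d_weights.get(w, 0.0) for w, qw in q_weights.items())
        if similarity > 0:
            scored.append((similarity, doc_id, text))
    # Highest similarity first; ties keep their original order.
    scored.sort(key=lambda item: -item[0])
    return scored


pieces = (
    split_into_pieces("doc1", "The court granted the patent appeal. The weather was mild.")
    + split_into_pieces("doc2", "A patent protects an invention. Appeals take time.")
)
for similarity, doc_id, text in rank_pieces("patent appeal", pieces):
    print(f"{similarity:.2f}  {doc_id}  {text}")
```

Because the sketch matches exact tokens only, "Appeals" does not match "appeal"; a real system would likely need stemming or the semantic weighting the source refers to.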

287 Citations

14 Claims

1. A method for retrieving relevant text data from a text database collection in a computer without annotating, parsing or pruning the text database collection, comprising the steps of:
- (a) searching a text database collection in a computer using a first search query of natural language to retrieve a first group of selected small pieces of text, where each of the selected small pieces of text corresponds to a document;
  
  (b) weighting each word of the selected small pieces of text with semantics to form document weighted values for each of the selected small pieces of text in the first group;
  
  (c) weighting each word in the first search query with semantics to form query weighted values;
  
  (d) combining the query weighted values and the document weighted values to form similarity values for each of the selected small pieces of text;
  
  (e) ranking the similarity values for each of the selected small pieces of text to form a first ranked list;
  
  (f) applying feedback information based on a manual determination of the relevancy of each of the selected small pieces of text in the first ranked list to automatically create a second search query;
  
  (g) repeating steps (a) to (e) to form a second ranked list, wherein the second ranked list includes a second group of selected small pieces of text, wherein the second group is missing at least one of the selected small pieces of text in the first group.
- - 2. The method for retrieving relevant text data of claim 1, wherein each of the small pieces of text includes at least one of:
    - a sentence, a phrase, and a semantic unit.
  - 3. The method for retrieving relevant text data of claim 1, wherein the steps of weighting includes:
    - statistical weighting.
  - 4. The method for retrieving relevant text data of claim 1, wherein the second search query includes:
    - at least one less word from the first search query.
  - 5. The method for retrieving relevant text data of claim 1, wherein the second search query includes:
    - at least one additional word to the first search query.
  - 6. The method for retrieving relevant text data of claim 1, further including:
    - at least one identical word in the second search query has a weighted value different to its weighted value in the first search query.
  - 7. The method for retrieving relevant text data of claim 1, further including:
    - at least one document in the second group has a different similarity value to the same document in the first group.
  - 8. The method for retrieving relevant text data of claim 1, wherein the second group includes:
    - at least one less document that had been listed in the first group.
  - 9. The method for retrieving relevant text data of claim 1, wherein the second group includes:
    - at least one additional document that was not found in the first group.
  - 10. The method for retrieving relevant text data of claim 1, wherein the second ranked list includes:
    - a different ranked order of documents than the first ranked list.
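Steps (f) and (g) of claim 1, together with the variations in claims 4 to 10, can be illustrated with a small Python sketch. The claims do not state how the second query is computed, so a Rocchio-style update is swapped in here as an assumption; it is not presented as the patented rule. The function names, the `alpha`, `beta` and `gamma` parameters, the starting weights and the sample pieces are all hypothetical.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (enough for this toy example)."""
    return text.lower().split()


def rank(query_weights, pieces):
    """Score each piece as the sum of query weight * term frequency; drop zero scores."""
    scored = []
    for text in pieces:
        tf = Counter(tokenize(text))
        score = sum(weight * tf[word] for word, weight in query_weights.items())
        if score > 0:
            scored.append((score, text))
    scored.sort(key=lambda item: -item[0])
    return scored


def update_query(query_weights, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.5):
    """Rocchio-style update (an assumption, not the claimed method).

    Words from pieces marked relevant are boosted, words from pieces marked
    non-relevant are penalized, and words whose weight falls to zero or below
    are dropped from the second query.
    """
    new_weights = Counter({w: alpha * v for w, v in query_weights.items()})
    for text in relevant:
        for word, tf in Counter(tokenize(text)).items():
            new_weights[word] += beta * tf / len(relevant)
    for text in non_relevant:
        for word, tf in Counter(tokenize(text)).items():
            new_weights[word] -= gamma * tf / len(non_relevant)
    return {w: v for w, v in new_weights.items() if v > 0}


pieces = ["jaguar cat habitat", "fast car", "fast jaguar car review", "cat food"]
first_query = {"jaguar": 1.0, "fast": 0.4}  # hypothetical starting weights

first_list = rank(first_query, pieces)

# Manual relevancy determination: the user checks off the first ranked list.
second_query = update_query(
    first_query,
    relevant=["jaguar cat habitat"],
    non_relevant=["fast jaguar car review", "fast car"],
)
second_list = rank(second_query, pieces)

print([text for _, text in first_list])
print(sorted(second_query))
print([text for _, text in second_list])
```

In this toy run the second query drops "fast" (claim 4), gains "cat" and "habitat" (claim 5) and raises the weight of "jaguar" (claim 6). The second ranked list loses "fast car" (claim 8), gains "cat food" (claim 9) and moves "jaguar cat habitat" to the top (claim 10).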

11. A method for retrieving relevant text data from a text database collection in a computer without annotating, parsing or pruning the text database collection, comprising the steps of:
- (a) searching a text database collection in a computer using a first non-boolean search query to retrieve a first group of selected small pieces of text, where each of the selected small pieces of text corresponds to a document;
  
  (b) weighting each word of the selected small pieces of text with semantics to form document weighted values for each of the selected small pieces of text in the first group;
  
  (c) weighting each word in the first search query with semantics to form query weighted values;
  
  (d) combining the query weighted values and the document weighted values to form similarity values for each of the selected small pieces of text;
  
  (e) ranking the similarity values for each of the selected small pieces of text to form a first ranked list;
  
  (f) applying feedback information based on a manual relevancy determination of each of the selected small pieces of text in the first ranked list to automatically create a second non-boolean search query;
  
  (g) repeating steps (a) to (e) to form a second ranked list, wherein the second ranked list includes at least one additional document not found in the first ranked list.
- - 12. The method for retrieving relevant text data of claim 11, wherein each of the small pieces of text includes at least one of:
    - a sentence, a phrase, and a semantic unit.
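Claim 11 differs from claim 1 mainly in requiring a non-boolean search query. As a rough illustration of that distinction, and not of the claimed method itself, the Python sketch below contrasts an all-or-nothing boolean AND match with a graded score that still ranks partial matches. The function names, the query weights and the sample pieces are hypothetical.

```python
def boolean_and_match(query_words, piece_words):
    """Boolean AND: a piece matches only if it contains every query word."""
    return all(word in piece_words for word in query_words)


def graded_score(query_weights, piece_words):
    """Non-boolean scoring: add up the weights of the query words that are present."""
    return sum(weight for word, weight in query_weights.items() if word in piece_words)


pieces = [
    "aspirin is a common heart attack treatment",
    "treatment of a heart condition",
    "stock market report",
]
# Hypothetical weights for the query "heart attack treatment".
query_weights = {"heart": 1.0, "attack": 2.0, "treatment": 1.0}

for text in pieces:
    words = set(text.split())
    matched = boolean_and_match(query_weights, words)
    score = graded_score(query_weights, words)
    print(f"boolean={matched!s:5}  score={score:.1f}  {text}")
```

The boolean test keeps only the first piece, while the graded score also surfaces the second piece at a lower rank, which is what makes a ranked list and later relevancy feedback possible.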

13. A method for retrieving relevant text data from a text database collection in a computer without annotating, parsing or pruning, comprising the steps of:
- (a) searching a text database collection in a computer using a first search query of natural language to retrieve a first group of selected small pieces of text, where each of the selected small pieces of text corresponds to a document;
  
  (b) weighting each word of the selected small pieces of text by semantics to form document weighted values for each of the selected small pieces of text in the first group;
  
  (c) weighting each word in the first search query by semantics to form query weighted values;
  
  (d) combining the query weighted values and the document weighted values to form similarity values for each of the selected small pieces of text;
  
  (e) ranking the similarity values for each of the selected small pieces of text to form a first ranked list;
  
  (f) automatically updating the first search query into a second search query based on feedback information from a manual determination on whether documents in the first ranked list are relevant;
  
  (g) repeating steps (a) to (e) to form a second ranked list, wherein the second ranked list includes a second group of selected small pieces of text, wherein the second group is missing at least one of the selected small pieces of text found in the first group.
- - 14. The method for retrieving relevant text data of claim 13, wherein each of the small pieces of text includes at least one of:
    - a sentence, a phrase, and a semantic unit.

Current Assignee: University of Central Florida Research Foundation Inc. (State University System of Florida)
Original Assignee: University of Central Florida (State University System of Florida)
Inventors: Driscoll, James R.
Primary Examiner(s): Black, Thomas G.
Assistant Examiner(s): Homere, Jean Raymond
Application Number: US08/350,334
Time in Patent Office: 931 Days
Field of Search: 395/600, 395/605, 395/616, 395/606, 395/761, 364/419.05, 364/DIG. 1
US Class Current: 1/1
CPC Class Codes:
G06F 16/3322   using system suggestions G0...
G06F 16/3329   Natural language query form...
G06F 16/3334   Selection or weighting of t...
G06F 16/3344   using natural language anal...
G06F 16/3346   using probabilistic model
Y10S 707/99933   Query processing, i.e. sear...
Y10S 707/99935   Query augmenting and refini...
Y10S 707/99936   Pattern matching access
Y10S 707/99939   Privileged access
