Process for determination of text relevancy

US 5,576,954 A
Filed: 11/05/1993
Issued: 11/19/1996
Est. Priority Date: 11/05/1993
Status: Expired due to Term

Abstract

This is a procedure for determining text relevancy that can be used to enhance the retrieval of text documents by search queries. The system helps a user intelligently and rapidly locate information in large textual databases. A first embodiment determines the common meanings between each word in the query and each word in the document. An adjustment is then made for words in the query that are not in the documents. Next, weights are calculated for the semantic components in both the query and the documents. These weights are multiplied together, and the products are added to determine a real value number (similarity coefficient) for each document. Finally, the documents are sorted by their real value numbers, from largest to smallest. Another embodiment routes documents to topics/headings (sometimes referred to as filtering). Here, the importance of each word in both topics and documents is calculated. Then the real value number (similarity coefficient) for each document is determined, and each document is routed, one at a time, according to its real value number to one or more topics. Finally, once the documents are located with their topics, the documents can be sorted. The system can be used to search and route all kinds of document collections, such as collections of legal documents, medical documents, news stories, and patents.
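The multiply, add, and sort steps of the first embodiment can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented formula: the abstract does not give the weight equation, so the sketch assumes each semantic component weight is the sum of importance × frequency × category probability over the words in the text. The `LEXICON` table, its category names and probabilities, and the function names are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy lexicon: P(category | word). These values are invented
# for illustration and are not taken from the patent.
LEXICON = {
    "ship": {"TRAVEL": 0.6, "SENDING": 0.4},
    "cargo": {"SENDING": 1.0},
    "sail": {"TRAVEL": 1.0},
}

def component_weights(words, importance):
    """Assumed form: weight[c] = sum of importance * frequency * P(c | word)."""
    weights = defaultdict(float)
    for word, freq in Counter(words).items():
        for category, prob in LEXICON.get(word, {}).items():
            weights[category] += importance.get(word, 0.0) * freq * prob
    return dict(weights)

def similarity(query_words, doc_words, importance):
    """Multiply matching query/document component weights and add the products."""
    q = component_weights(query_words, importance)
    d = component_weights(doc_words, importance)
    return sum(q[c] * d[c] for c in q if c in d)

def rank(query_words, docs, importance):
    """Return (doc_id, score) pairs sorted from largest to smallest score."""
    scores = {doc_id: similarity(query_words, words, importance)
              for doc_id, words in docs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Because the comparison happens at the level of semantic components rather than literal words, a document can receive a nonzero score even when it shares no word with the query, as long as their words fall into a common category.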

9 Claims

1. A computer implemented method for ranking documents being searched in a database by a word query according to text relevancy comprising the steps of:
- (a) inputting a word query to a computer database of documents;
  
  (b) selecting each document by the word query;
  
  (c) determining a real value number for each document, comprising the steps of:
  
  (i) calculating a first importance value for each word in the selected document;
  
  (ii) calculating a second importance value for each word in the query that matches a word in the document;
  
  (iii) determining a probability value for each word in the query matching a semantic category;
  
  (iv) determining a probability value for each word in the document matching a semantic category;
  
  (v) adjusting for each word in the query that does not exist in the database of the document;
  
  (vi) repeating steps (i) to (iv) for each adjusted word;
  
  (vii) calculating weights of a semantic component in the query based on the importance value, the probability value and frequency of the word in the document;
  
  (viii) calculating weights of a semantic component in the document based on the importance value, the probability value and frequency of word in the query;
  
  (ix) multiplying query component weights by document component weights into products; and
  
  (x) adding the products together to represent the real-value number for the selected document; and
  
  (d) repeating step (c) for each additional document selected by the query; and
  
  (e) sorting the documents of the database according to their respective real value numbers.
- - 2. The computer implemented method for ranking documents of claim 1, wherein the inputting step further includes:
    - inputting a natural language word query.
  - 3. The computer implemented method for ranking documents of claim 1, wherein the calculating the first and the second importance values is based on Log₁₀ (N/df), wherein N=total number of documents, and df=number of documents each word is located within.
  - 4. The computer implemented method for ranking documents of claim 1, wherein the semantic category further includes:
    - correlating a semantic lexicon of approximately 36 semantic categories between the word query and each document.
  - 5. The computer implemented method for ranking documents of claim 1, wherein the size of each document is chosen from at least one of:
    - a word, a sentence, a line, a phrase and a paragraph.
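The importance value in claim 3, Log₁₀ (N/df), can be computed along these lines. This is a minimal sketch that assumes documents arrive already tokenized into lists of words; the function name `importance_values` is hypothetical, and tokenization itself is outside what the claim specifies.

```python
import math

def importance_values(documents):
    """Map each word to log10(N / df) over a list of tokenized documents."""
    n_docs = len(documents)
    doc_freq = {}
    for words in documents:
        for word in set(words):  # count each document once per word
            doc_freq[word] = doc_freq.get(word, 0) + 1
    return {word: math.log10(n_docs / df) for word, df in doc_freq.items()}
```

A word that occurs in every document gets an importance of 0, while a word found in only one of ten documents gets an importance of 1, so rare words dominate the ranking.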

6. A computer implemented method of routing and filtering documents to topics comprising the steps of:
- breaking down each document for routing into small portions of up to approximately 250 words in length;
  
  calculating importance values of each word in both topics and the small portions of the documents;
  
  determining real value numbers for each of the small portions of document to each topic based on the importance values;
  
  calculating the real value number for the selected document based on adding the real value numbers of the small portions of the selected document;
  
  routing each document according to their respective real value numbers to one or more topics; and
  
  sorting the routed documents at each topic.
- - 7. A computer implemented method of routing and filtering documents to topics of claim 6, wherein the calculating step is based on Log₁₀ (NT/dft), where NT is the total number of topics and dft is the number of topics each word is located within.
  - 8. A computer implemented method of routing and filtering documents to topics of claim 6, wherein the size of each of the small portions is chosen from at least one of:
    - a word, a line, a sentence, and a paragraph.
  - 9. A computer implemented method of routing and filtering documents to topics of claim 6, wherein the determining a real value number step further includes the steps of:
    - (i) calculating a first importance value for each word in the selected portion;
      
      (ii) calculating a second importance value for each word in the query that matches a word in the selected portion;
      
      (iii) determining a probability value for each word in the query matching a semantic category;
      
      (iv) determining a probability value for each word in the selected portion matching a semantic category;
      
      (v) adjusting for each word in the query that does not exist in the selected portion;
      
      (vi) repeating steps (i) to (iv) for each adjusted word;
      
      (vii) calculating weights of a semantic component in the query based on the importance value, the probability value and frequency of the word in the selected portion;
      
      (viii) calculating weights of a semantic component in the selected portion based on the importance value, the probability value and frequency of word in the query;
      
      (ix) multiplying query component weights by selected portion component weights into products; and
      
      (x) adding the products together to represent the real-value number for the selected document; and
      
      repeating steps (i) to (x) for each additional document selected.
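The routing flow of claims 6 and 7 can be sketched roughly as below. Several parts are assumptions made for illustration: `portion_score` is a simplified word-overlap stand-in for the semantic scoring of claim 9, the `threshold` routing rule is hypothetical (the claims do not say how a document is matched to "one or more topics"), and all function names are invented. Only the 250-word portion size, the Log₁₀ (NT/dft) importance value, the summing of portion scores, and the final sort come from the claims.

```python
import math
from collections import Counter

def split_portions(words, size=250):
    """Break a tokenized document into portions of at most `size` words."""
    return [words[i:i + size] for i in range(0, len(words), size)]

def topic_importance(topics):
    """log10(NT / dft) for every word that appears in at least one topic."""
    nt = len(topics)
    dft = Counter(word for words in topics.values() for word in set(words))
    return {word: math.log10(nt / count) for word, count in dft.items()}

def portion_score(portion, topic_words, importance):
    """Simplified stand-in score: importance-weighted overlap of word counts."""
    p, t = Counter(portion), Counter(topic_words)
    return sum((importance[w] * p[w]) * (importance[w] * t[w])
               for w in p if w in t)

def route(documents, topics, threshold=0.0, size=250):
    """Return {topic: [(doc_id, score), ...]} sorted by descending score."""
    importance = topic_importance(topics)
    routed = {name: [] for name in topics}
    for doc_id, words in documents.items():
        for name, topic_words in topics.items():
            # The document score is the sum of its portion scores.
            score = sum(portion_score(part, topic_words, importance)
                        for part in split_portions(words, size))
            if score > threshold:  # hypothetical routing rule
                routed[name].append((doc_id, score))
    for hits in routed.values():
        hits.sort(key=lambda hit: hit[1], reverse=True)
    return routed
```

Note that a word appearing in every topic has an importance of 0 under Log₁₀ (NT/dft), so it contributes nothing to any portion score.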

Current Assignee
University of Central Florida Research Foundation Inc. (State University System of Florida)
Original Assignee
University of Central Florida (State University System of Florida)
Inventors
Driscoll, Jim
Primary Examiner(s)
Weinhardt, Robert A.
Assistant Examiner(s)
Dixon, Jennifer L.

Application Number

US08/148,688
Time in Patent Office

1,110 Days
Field of Search

364/419.13, 364/419.19, 364/419.1, 364/419.11
US Class Current

1/1
CPC Class Codes

G06F 16/3334   Selection or weighting of t...

G06F 16/3344   using natural language anal...

G06F 16/3346   using probabilistic model

G06F 16/353   into predefined classes

Y10S 707/99933   Query processing, i.e. sear...

Y10S 707/99934   Query formulation, input pr...

Y10S 707/99935   Query augmenting and refini...
