Process for determination of text relevancy

US 5,694,592 A
Filed: 08/28/1995
Issued: 12/02/1997
Est. Priority Date: 11/05/1993
Status: Expired due to Term

Abstract

This is a procedure for determining text relevancy that can be used to enhance the retrieval of text documents by search queries. The system helps a user intelligently and rapidly locate information in large textual databases. A first embodiment determines the common meanings between each word in the query and each word in the document. An adjustment is then made for words in the query that are not in the documents. Weights are calculated for the semantic components in both the query and the documents. These weights are multiplied together, and the products are added to one another to determine a real value number (similarity coefficient) for each document. Finally, the documents are sorted according to their real value numbers, from largest to smallest. Another embodiment routes documents to topics/headings (sometimes referred to as filtering). Here, the importance of each word in both the topics and the documents is calculated. Then the real value number (similarity coefficient) for each document is determined. Each document is then routed, one at a time, to one or more topics according to its real value number. Finally, once the documents are located with their topics, the documents can be sorted. The system can be used to search and route all kinds of document collections, such as collections of legal documents, medical documents, news stories, and patents.
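
The routing embodiment can be illustrated with a short Python sketch. It is not the patented implementation. It assumes that word-importance weights are already available as plain dictionaries and that the coefficient takes the same sum-of-products form described for the first embodiment. The `threshold` parameter, which decides whether a document is routed to a topic, is hypothetical: the abstract does not say how that cut-off is chosen.

```python
def coefficient(topic_weights, doc_weights):
    """Sum of products of matching weights (assumed form of the
    real value number / similarity coefficient)."""
    return sum(w * doc_weights.get(term, 0.0)
               for term, w in topic_weights.items())


def route(documents, topics, threshold=0.5):
    """Route each document, one at a time, to every topic whose
    coefficient reaches the (hypothetical) threshold, then sort the
    documents under each topic from largest to smallest coefficient."""
    routed = {name: [] for name in topics}
    for doc_id, doc_weights in documents.items():
        for name, topic_weights in topics.items():
            score = coefficient(topic_weights, doc_weights)
            if score >= threshold:
                routed[name].append((score, doc_id))
    for name in routed:
        routed[name].sort(reverse=True)
    return routed


# Invented example data: term -> importance weight.
topics = {
    "law": {"court": 1.0, "statute": 0.5},
    "medicine": {"patient": 1.0},
}
documents = {
    "d1": {"court": 2.0},
    "d2": {"patient": 1.0, "court": 0.5},
    "d3": {"weather": 3.0},
}
routed = route(documents, topics)
```

In this sketch a document may land under several topics (here `d2` reaches both) or under none (`d3`), which matches the abstract's "one or more topics" only loosely, since the handling of unmatched documents is not described there.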

9 Claims

1. A computer implemented method of creating similarity coefficients between sequences of words in documents that are being searched in a database by a natural word query without parsing the query words nor the words in the documents, and without removing any of the query words and any of the words in the documents, the method comprising the steps of:
- (a) branching out the meanings of each and every word in a natural word query into respective probabilities of occurrence for each of the meanings in the natural word query;
  
  (b) branching out the meanings of words in a document searched by the natural word query into respective probabilities of occurrence for each of the meanings of the words in each of the documents;
  
  (c) determining a similarity coefficient between the probabilities of occurrence of words in the natural language query and the probabilities of occurrence of the words in the document;
  
  (d) repeating steps (a) to (c) for each additional document searched by the natural language query; and
  
  (e) ranking the documents being searched in order of their similarity coefficients without parsing of the natural language query and the documents, and without removing any words from the natural language query nor from the documents.
- Dependent claims (2, 3, 4, 5, 6, 7, 8):
  - 2. The computer implemented method of creating similarity coefficients between sequences of words of claim 1, wherein the meanings of a word further include definitions of words.
  - 3. A computer implemented method of creating similarity coefficients between sequences of words of claim 1, wherein the meanings of a word further include senses.
  - 4. A computer implemented method of creating similarity coefficients between sequences of words of claim 1, wherein the meanings of a word further include categories.
  - 5. The computer implemented method of creating similarity coefficients between sequences of words of claim 4, wherein the categories further includes:
    - a semantic lexicon of categories.
  - 6. The computer implemented method of creating similarity coefficients between sequences of words of claim 5, wherein step(a) further includes:
    - determining a probability value for each word in the query matching the semantic categories; and
      
      wherein step(b) further includes;
      
      determining a probability value for each word in the document matching the semantic categories.
  - 7. The computer implemented method of creating similarity coefficients between sequences of word of claim 6, wherein step(c) of determining similarity coefficients further includes:
    - (i) calculating weights of a semantic component in the query based on the probability values of the words in the query;
      
      (ii) calculating weights of a semantic component in the document based on the probability values of the words in the document;
      
      (iii) multiplying query component weights by document component weights into products; and
      
      (iv) adding the products together to represent the similarity coefficient as a real-value number for the document.
  - 8. The computer implemented method of creating similarity coefficients between sequences of word of claim 1, wherein each document is chosen from at least one of:
    - a word, a sentence, a line, a phrase and a paragraph.
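
Claims 5 to 7, together with the ranking step (e) of claim 1, can be read as a small computation. The following is a minimal Python sketch and not the patented implementation. The lexicon contents and category names are invented, whitespace splitting stands in for whatever word segmentation the method uses, and summing per-word probabilities to obtain a component weight is an assumption: the claims do not fix how the weights are derived from the probability values.

```python
from collections import defaultdict

# Hypothetical semantic lexicon: word -> {category: probability that the
# word is used in that category}. All values are invented for illustration.
LEXICON = {
    "bank": {"finance": 0.7, "geography": 0.3},
    "river": {"geography": 1.0},
    "loan": {"finance": 1.0},
    "the": {"function": 1.0},
}


def category_weights(text):
    """Branch every word into its category probabilities and accumulate
    them per category (assumed weighting). No word is removed."""
    weights = defaultdict(float)
    for word in text.lower().split():
        for category, prob in LEXICON.get(word, {}).items():
            weights[category] += prob
    return dict(weights)


def similarity(query, document):
    """Multiply query and document component weights per category and
    add the products into one real-value number (claim 7, steps i-iv)."""
    q = category_weights(query)
    d = category_weights(document)
    return sum(w * d.get(category, 0.0) for category, w in q.items())


def rank(query, documents):
    """Score each document and order by similarity coefficient,
    largest first (claim 1, steps d-e)."""
    scored = [(similarity(query, doc), doc) for doc in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


ranked = rank("bank loan", ["the river", "the loan", "river bank"])
```

With this toy lexicon the query "bank loan" still gives "the river" a small non-zero coefficient, because "bank" carries some probability in the geography category. That is the effect of branching a word into several meanings instead of matching words literally.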

9. A computer implemented method of creating similarity coefficients between sequences of words in documents that are being searched in a database by a natural word query without parsing the query words nor the words in the documents, and without removing any of the query words and any of the words in the documents, the method comprising the steps of:
- (a) branching out the meanings of each and every word in a natural word query into respective probabilities of occurrence for each of the meanings in the natural word query, wherein the query includes at least one word;
  
  (b) branching out the meanings of each and every word in a document searched by the natural word query into respective probabilities of occurrence for each of the meanings of the words in each of the documents, wherein the document includes at least one word;
  
  (c) determining a similarity coefficient between the probabilities of occurrence of words in the natural language query and the probabilities of occurrence of the words in the document;
  
  (d) repeating steps (a) to (c) for each additional document searched by the natural language query; and
  
  (e) ranking all the documents being searched in order of their similarity coefficients without parsing of the natural language query and the documents, and without removing any words from the natural language query nor from the documents.

Current Assignee: University of Central Florida Research Foundation Inc. (State University System of Florida)
Original Assignee: University of Central Florida (State University System of Florida)
Inventors: Driscoll, Jim
Primary Examiner(s): Black, Thomas G.
Assistant Examiner(s): Lewis, C.
Application Number: US08/520,027
Time in Patent Office: 827 Days
Field of Search: 395/600, 395/604, 395/603, 395/605, 395/759, 395/793, 364/300
US Class Current: 1/1

CPC Class Codes
G06F 16/3334   Selection or weighting of t...
G06F 16/3344   using natural language anal...
G06F 16/3346   using probabilistic model
G06F 16/353   into predefined classes
Y10S 707/99933   Query processing, i.e. sear...
Y10S 707/99934   Query formulation, input pr...
Y10S 707/99935   Query augmenting and refini...
