Process and system for determining relevance

US 5,713,016 A
Filed: 09/05/1995
Issued: 01/27/1998
Est. Priority Date: 09/05/1995
Status: Expired due to Term

Abstract

A process is provided for determining relevance using an electronic system. The process includes providing a first feature vector, providing a second feature vector, and providing an indexing parameter. A parametric family of sampling distributions is provided for the first feature vector using the indexing parameter. A parametric family of sampling distributions is also provided for the second feature vector using the indexing parameter. The process further includes providing a prior distribution of the indexing parameter. A distribution of the indexing parameter, given the second feature vector and an event that the first feature vector is not relevant to the second feature vector, is assigned the value of the prior distribution of the indexing parameter. A distribution of the indexing parameter, given the second feature vector and an event that the first feature vector is relevant to the second feature vector, is assigned the value of the posterior distribution of the indexing parameter given the second feature vector. A log likelihood ratio that the first feature vector is relevant to the second feature vector is then generated using the two assigned distributions of the indexing parameter. The log likelihood ratio is stored as representing relevance between the first feature vector and the second feature vector.
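
The abstract does not write the ratio out, so the following is only one plausible reading of it, not the patent's own notation. The symbols are assumptions introduced here: x is the first feature vector, y the second, θ the indexing parameter, and R the event that the first is relevant to the second. The sketch also assumes that x depends on y only through θ.

```latex
% Assumed notation (not from the source): x, y, \theta, R as described above.
\begin{align}
p(\theta \mid y, \bar{R}) &= p(\theta)
  && \text{(not relevant: prior)} \\
p(\theta \mid y, R) &= p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta)
  && \text{(relevant: posterior given } y\text{)} \\
\mathrm{LLR}(x, y) &= \log
  \frac{\int p(x \mid \theta)\, p(\theta \mid y)\, d\theta}
       {\int p(x \mid \theta)\, p(\theta)\, d\theta}
\end{align}
```

Under this reading, a positive value means the first feature vector is better predicted once the second has been observed, and a value of zero means the second vector carried no information about the first.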

22 Claims

1. A process for determining relevance between two documents implemented using an electronic system, the process comprising:
- providing a first feature vector representing a first document;
  
  providing a second feature vector representing a second document;
  
  providing an indexing parameter;
  
  providing a parametric family of sampling distributions for the first feature vector using the indexing parameter;
  
  providing a parametric family of sampling distributions for the second feature vector using the indexing parameter;
  
  providing a prior distribution of the indexing parameter;
  
  assigning a distribution of the indexing parameter, given the second feature vector and an event that the first document is not relevant to the second document, the value of the prior distribution of the indexing parameter;
  
  assigning a distribution of the indexing parameter, given the second feature vector and an event that the first document is relevant to the second document, the value of the posterior distribution of the indexing parameter given the second feature vector;
  
  generating a log likelihood ratio that the first document is relevant to the second document using the two assigned distributions of the indexing parameter; and
  
  storing the log likelihood ratio as representing relevance between the first document and the second document.

2. The process of claim 1, wherein providing a prior distribution of the indexing parameter comprises generating a posterior distribution of the indexing parameter given a set of training vectors using a hyperparameter.

3. The process of claim 1, wherein providing a first feature vector representing a first document comprises providing a first feature vector representing properties of the first document, the first document representing text information.

4. The process of claim 3, wherein providing a first feature vector representing properties comprises providing a first feature vector representing properties that comprise a frequency of occurrence of selected words with respect to the first document.

5. The process of claim 3, wherein providing a second feature vector representing a second document comprises providing a second feature vector representing properties of the second document, the second document representing text information.

6. The process of claim 5, wherein providing a second feature vector representing properties comprises providing a second feature vector representing properties that comprise a frequency of occurrence of selected words with respect to the second document.

7. The process of claim 1, wherein the process is accomplished using a computer system having a processor and a memory operating under control of program instructions stored in the memory.

8. The process of claim 1, wherein the process is accomplished using an electronic hardware system.
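
The steps of claim 1 can be sketched in code under strong assumptions that the claims themselves do not state. The sketch below assumes the feature vectors are word counts, the sampling distributions are multinomial, and the prior on the indexing parameter is Dirichlet, so that the posterior given the second vector is again Dirichlet. The function names and the choice of model are illustrative only.

```python
from math import lgamma


def log_beta(a):
    """Log of the multivariate Beta function for parameter vector a."""
    return sum(lgamma(v) for v in a) - lgamma(sum(a))


def log_marginal(x, a):
    """Log probability of counts x under a Dirichlet(a) mixing distribution.

    The multinomial coefficient is omitted because it cancels in the ratio.
    """
    return log_beta([ai + xi for ai, xi in zip(a, x)]) - log_beta(a)


def log_likelihood_ratio(x, y, alpha):
    """Log likelihood ratio that x is relevant to y (assumed Dirichlet model).

    Relevant:     indexing parameter follows the posterior given y.
    Not relevant: indexing parameter follows the prior, Dirichlet(alpha).
    """
    posterior = [ai + yi for ai, yi in zip(alpha, y)]
    return log_marginal(x, posterior) - log_marginal(x, alpha)


if __name__ == "__main__":
    alpha = [1.0, 1.0, 1.0]  # hypothetical uniform prior over three words
    stored = {}              # stands in for "storing the log likelihood ratio"
    stored[("doc_a", "doc_b")] = log_likelihood_ratio([5, 0, 1], [4, 1, 0], alpha)
    stored[("doc_c", "doc_b")] = log_likelihood_ratio([0, 6, 0], [4, 1, 0], alpha)
    print(stored)
```

With these inputs the first pair, whose counts concentrate on the same word, gets a positive ratio, and the second pair gets a negative one. A second vector of all zeros leaves the posterior equal to the prior, so the ratio is zero.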

9. A computer system operable to determine relevance between two documents, comprising:
- a memory operable to store program instructions and data;
  
  a first feature vector representing a first document, the first feature vector stored in the memory;
  
  a second feature vector representing a second document, the second feature vector stored in the memory;
  
  an indexing parameter, the indexing parameter stored in the memory;
  
  a parametric family of sampling distributions for the first feature vector using the indexing parameter, the parametric family stored in the memory;
  
  a parametric family of sampling distributions for the second feature vector using the indexing parameter, the parametric family stored in the memory;
  
  a prior distribution for the indexing parameter, the prior distribution stored in memory; and
  
  a processor coupled to the memory and operable to access the program instructions and data, the processor operable to perform a process under control of the program instructions for;
  
  assigning a distribution of the indexing parameter, given the second feature vector and an event that the first document is not relevant to the second document, the value of the prior distribution for the indexing parameter;
  
  assigning a distribution of the indexing parameter, given the second feature vector and an event that the first document is relevant to the second document, the value of a distribution of the indexing parameter given the second document;
  
  determining a log likelihood ratio that the first document is relevant to the second document using the two assigned distributions of the indexing parameter; and
  
  storing the log likelihood ratio in the memory as representing a relevance between the first document and the second document.

10. The computer system of claim 9, wherein the prior distribution for the indexing parameter comprises a posterior distribution of the indexing parameter given a set of training vectors.

11. The computer system of claim 9, wherein the first feature vector representing the first document comprises a first feature vector representing properties of a first document, the first document representing text information.

12. The computer system of claim 11, wherein the properties of the first document comprise a frequency of occurrence of selected words with respect to the first document.

13. The computer system of claim 11, wherein the second feature vector representing the second document comprises a second feature vector representing properties of a second document, the second document representing text information.

14. The computer system of claim 13, wherein the properties of the second document comprise a frequency of occurrence of selected words with respect to the second document.
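
The dependent claims describe feature vectors whose properties comprise the frequency of occurrence of selected words in a text document. A minimal sketch of such a vector follows. The tokenizer, the word list, and the function name are assumptions made for illustration; the claims do not specify how words are selected or counted.

```python
import re
from collections import Counter


def feature_vector(text, selected_words):
    """Return how often each selected word occurs in text (case-insensitive)."""
    # Assumed tokenizer: runs of letters and apostrophes, lowercased.
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    return [counts[word] for word in selected_words]


if __name__ == "__main__":
    selected_words = ["relevance", "document", "vector"]  # hypothetical vocabulary
    first = feature_vector(
        "Relevance between one document and another document.", selected_words
    )
    print(first)
```

Both documents would be mapped through the same word list so that their vectors are comparable position by position.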

15. A relevance generation system operable to determine relevance between two documents, comprising:
- a first feature vector representing a first document;
  
  a second feature vector representing a second document;
  
  an indexing parameter;
  
  a parametric family of sampling distributions for the first feature vector using the indexing parameter;
  
  a parametric family of sampling distributions for the second feature vector using the indexing parameter;
  
  a prior distribution for the indexing parameter; and
  
  a relevance generator operable to access the first feature vector, the second feature vector, the parametric families and the prior distribution, the relevance generator operable to;
  
  assign a distribution of the indexing parameter, given the second feature vector and an event that the first document is not relevant to the second document, the value of the prior distribution for the indexing parameter;
  
  assign a distribution of the indexing parameter, given the second feature vector and an event that the first document is relevant to the second document, the value of a distribution of the indexing parameter given the second document;
  
  generate a log likelihood ratio that the first document is relevant to the second document using the two assigned distributions of the indexing parameter; and
  
  store the log likelihood ratio as representing a relevance between the first document and the second document.

16. The relevance generation system of claim 15, wherein the prior distribution for the indexing parameter comprises a posterior distribution of the indexing parameter given a set of training vectors.

17. The relevance generation system of claim 15, wherein the first feature vector representing the first document comprises a first feature vector representing properties of a first document, the first document representing text information.

18. The relevance generation system of claim 17, wherein the properties of the first document comprise a frequency of occurrence of selected words with respect to the first document.

19. The relevance generation system of claim 15, wherein the second feature vector representing the second document comprises a second feature vector representing properties of a second document, the second document representing text information.

20. The relevance generation system of claim 19, wherein the properties of the second document comprise a frequency of occurrence of selected words with respect to the second document.
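
Several dependent claims take the prior distribution to be a posterior distribution of the indexing parameter given a set of training vectors. The sketch below shows what that could look like if, purely as an assumption, the prior is Dirichlet and the training vectors are word counts: the hyperparameter supplies starting pseudo-counts and each training vector is added to them. The function name and the model are hypothetical.

```python
def prior_from_training(training_vectors, hyper):
    """Dirichlet parameters after observing training count vectors.

    hyper is the assumed hyperparameter (initial pseudo-counts); the returned
    parameters would then serve as the prior for new documents.
    """
    alpha = list(hyper)
    for vector in training_vectors:
        if len(vector) != len(alpha):
            raise ValueError("training vector length must match hyperparameter")
        alpha = [a + count for a, count in zip(alpha, vector)]
    return alpha


if __name__ == "__main__":
    training = [[1, 0, 2], [0, 3, 1]]  # hypothetical training word counts
    print(prior_from_training(training, [0.5, 0.5, 0.5]))
```

A prior built this way reflects how common each selected word is across the training set, so a match on a rare word would move the ratio more than a match on a common one.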

21. A process for determining relevance implemented using an electronic system, the process comprising:
- providing a first feature vector;
  
  providing a second feature vector;
  
  providing an indexing parameter;
  
  providing a parametric family of sampling distributions for the first feature vector using the indexing parameter;
  
  providing a parametric family of sampling distributions for the second feature vector using the indexing parameter;
  
  providing a prior distribution of the indexing parameter;
  
  assigning a distribution of the indexing parameter, given the second feature vector and an event that the first feature vector is not relevant to the second feature vector, the value of the prior distribution of the indexing parameter;
  
  assigning a distribution of the indexing parameter, given the second feature vector and an event that the first feature vector is relevant to the second feature vector, the value of the posterior distribution of the indexing parameter given the second feature vector;
  
  generating a log likelihood ratio that the first feature vector is relevant to the second feature vector using the two assigned distributions of the indexing parameter; and
  
  storing the log likelihood ratio as representing relevance between the first feature vector and the second feature vector.

22. The process of claim 21, wherein:
- the first feature vector represents a first document; and

  the second feature vector represents a second document;

  such that the log likelihood ratio represents the relevance between the first document and the second document.

Current Assignee: Hewlett-Packard Development Company, L.P. (HP Inc.)
Original Assignee: Electronic Data Systems Corporation (Perspecta, Inc.)
Inventors: Hill, Joe R.
Primary Examiner(s): Black, Thomas G.
Assistant Examiner(s): Coby, Frantz
Application Number: US08/523,233
Time in Patent Office: 875 Days
Field of Search: 382/38, 382/15, 364/419.19, 395/605, 395/606
US Class Current: 1/1
CPC Class Codes:
G06F 16/3346   using probabilistic model
Y10S 707/99935   Query augmenting and refini...
Y10S 707/99936   Pattern matching access