Inverse inference engine for high performance web search

US 20030217047A1
Filed: 12/09/2002
Published: 11/20/2003
Est. Priority Date: 03/23/1999
Status: Active Grant

7 Assignments

0 Petitions

Abstract

An information retrieval system that deals with the problems of synonymy, polysemy, and retrieval by concept by allowing for a wide margin of uncertainty in the initial choice of keywords in a query. For each input query vector and an information matrix, the disclosed system solves an optimization problem which maximizes the stability of a solution at a given level of misfit. The disclosed system may include a decomposition of the information matrix in terms of orthogonal basis functions. Each basis encodes groups of conceptually related keywords. The bases are arranged in order of decreasing statistical relevance to a query. The disclosed search engine approximates the input query with a weighted sum of the first few bases. Other commercial applications than the disclosed search engine can also be built on the disclosed techniques.
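The idea of approximating a query with a weighted sum of a few orthogonal bases can be illustrated with a small sketch. This is not the patented decomposition: it assumes a singular value decomposition as a stand-in for the orthogonal basis functions, and it ranks the bases by the magnitude of the query's projection onto each one as a hypothetical proxy for "statistical relevance to a query". The function name `approximate_query` and the toy matrix are illustrative only.

```python
import numpy as np

def approximate_query(D, q, k):
    """Approximate q with the k orthonormal bases it projects onto most strongly.

    D is a term-by-document matrix and q is a query vector over the same terms.
    Assumption: the left singular vectors of D play the role of the bases.
    """
    U, s, _ = np.linalg.svd(D, full_matrices=False)
    U = U[:, s > 1e-12]                      # drop directions with no support in D
    coeffs = U.T @ q                         # projection of the query on each basis
    keep = np.argsort(-np.abs(coeffs))[:k]   # strongest projections first
    return U[:, keep] @ coeffs[keep]

# Toy term-document matrix: rows are terms, columns are documents.
# Terms: car, engine, repair, automobile, banana
D = np.array([
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
q = np.array([1.0, 0.0, 0.0, 0.0, 0.0])     # query containing only "car"

# Residual of the approximation as more bases are included.
residuals = [float(np.linalg.norm(q - approximate_query(D, q, k))) for k in range(4)]
```

Because the bases are orthonormal, each added basis can only reduce (or leave unchanged) the residual, which is why a short prefix of the ranked bases can already capture most of the query.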

22 Claims

1. An information retrieval method comprising the steps of:
- generating a term-document matrix to represent electronic information files stored in a computer system, each element in said term-document matrix indicating a number of occurrences of a term within a respective one of said electronic information files;
- generating, responsive to said term-document matrix, a term-spread matrix, wherein said term spread matrix is a weighted autocorrelation of said term-document matrix, said term-spread matrix indicating an amount of variation in term usage in the information files and, also, the extent to which terms are correlated;
- receiving a user query from a user, said user query consisting of at least one term;
- in response to said user query, generating a user query vector, wherein said user query vector has as many elements as the rows of the term-spread matrix;
- generating, responsive to said user query vector, an error-covariance matrix, wherein said error-covariance matrix reflects an expected degree of uncertainty in the initial choice of keywords of said user;
- formulating, responsive to said term-spread matrix, error-covariance matrix, and user query vector, a constrained optimization problem, wherein the choice of a lambda value equal to a Lagrange multiplier value in said constrained optimization problem determines the extent of a trade-off between a degree of fit and the stability of all solutions to said constrained optimization problem;
- generating, responsive to said constrained optimization problem, a solution vector including a plurality of document weights, each one of said plurality of document weights corresponding to one of each said information files, wherein each of said document weights reflects a degree of correlation between said user query and the corresponding one of said information files; and
- providing an information response to said user reflecting said document weights.
- Dependent claims (2 to 22):
  - 2. The information retrieval method of claim 1, further comprising:
    - parsing electronic text contained within said information files, wherein said parsing includes recognizing acronyms.
  - 3. The information retrieval method of claim 2, wherein said parsing further includes recording term positions.
  - 4. The information retrieval method of claim 3, wherein said parsing further includes processing tag information within said information files.
  - 5. The information retrieval method of claim 4, wherein said tag information includes one or more HTML tags.
  - 6. The information retrieval method of claim 5, wherein said tag information includes one or more XML tags.
  - 7. The information retrieval method of claim 6, wherein said parsing further includes extracting word roots.
  - 8. The information retrieval method of claim 7, wherein said parsing further includes generating concept identification numbers.
  - 9. The information retrieval method of claim 1, further comprising:
    - generating an auxiliary data structure, said auxiliary data structure being indexed by said concept identification numbers, and said data structure storing the positions of all terms contained within the information files.
  - 10. The information retrieval method of claim 9, wherein said auxiliary data structure further stores tag information associated with respective ones of said information files, wherein said tag information reflects at least one characteristic of said respective ones of said information files.
  - 11. The information retrieval method of claim 10, wherein said tag information reflects at least one date associated with each respective one of said information files.
  - 12. The information retrieval method of claim 2, wherein said parsing includes counting term occurrences in each information file.
  - 13. The information retrieval method of claim 1, wherein said step of generating said term-document matrix includes generating elements in said matrix reflecting the number of occurrences of each one of said terms in each one of said information files.
  - 14. The information retrieval method of claim 1, further comprising:
    - determining that said user query includes at least one phrase; and
      
      responsive to said determining that said user query includes a phrase, adding a new row to said term-document matrix, each element in said new row containing the number of occurrences of said phrase in the respective one of said information files.
  - 15. The information retrieval method of claim 14, further comprising determining said number of occurrences of said phrase in each said respective one of said information files by the number of occurrences of the individual terms composing said phrase and the proximity of said terms as indicated by the relative positions of said individual terms contained in said auxiliary data structure.
  - 16. The information retrieval method of claim 1, wherein said step of generating said term-document matrix includes generating each element in said term-document matrix as a binary weight denoting the presence or absence of a respective one of said terms.
  - 17. The information retrieval method of claim 1, wherein said step of generating said term-document matrix includes weighting each element in said term-document matrix by a number of occurrence of a respective one of said terms within a respective one of said information files and by distribution of said respective one of said terms across the complete set of said information files.
  - 18. The information retrieval method of claim 1, further comprising sorting said document weights based on a predetermined ordering.
  - 19. The information retrieval method of claim 18, wherein said predetermined ordering is decreasing order.
  - 20. The information retrieval method of claim 1, further comprising automatically building a lexical knowledge base responsive to the solution of said constrained optimization problem, wherein said building includes cross-multiplying said term-document matrix, rather than said term-spread matrix, by said document weights to generate a plurality of term weights, one for each one of said terms.
  - 21. The information retrieval method of claim 20, further comprising sorting said term weights based on a predetermined ordering.
  - 22. The information retrieval method of claim 21, wherein said predetermined ordering is decreasing order.
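
The steps of claim 1, together with the sorting of claims 18 and 19 and the term weights of claim 20, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method as specified: it assumes an unweighted autocorrelation `D @ D.T` for the term-spread matrix, an identity error-covariance matrix by default, and a Tikhonov-style regularized least-squares problem standing in for the constrained optimization, with `lam` playing the role of the lambda value. All function and variable names are hypothetical.

```python
import numpy as np

def term_document_matrix(docs, vocab):
    """Rows are terms, columns are documents; entries are raw occurrence counts."""
    index = {term: i for i, term in enumerate(vocab)}
    D = np.zeros((len(vocab), len(docs)))
    for j, text in enumerate(docs):
        for token in text.lower().split():
            if token in index:
                D[index[token], j] += 1.0
    return D

def document_weights(D, q, lam=0.1, C=None):
    """Solve min (D x - q)^T C^-1 (D x - q) + lam * x^T x for document weights x.

    Assumption: the closed form x = D^T (S + lam * C)^-1 q, with S = D D^T,
    is used here as a stand-in for the claimed optimization problem.
    """
    n_terms = D.shape[0]
    C = np.eye(n_terms) if C is None else C   # assumed error-covariance matrix
    S = D @ D.T                               # unweighted term-spread (terms x terms)
    return D.T @ np.linalg.solve(S + lam * C, q)

docs = ["car engine repair", "automobile engine", "banana bread recipe"]
vocab = sorted({token for text in docs for token in text.split()})
D = term_document_matrix(docs, vocab)

# Query vector with one element per row of the term-spread matrix.
q = np.zeros(len(vocab))
q[vocab.index("car")] = 1.0

x = document_weights(D, q, lam=0.1)   # one weight per document
ranking = np.argsort(-x)              # documents in decreasing order of weight
term_w = D @ x                        # term weights from the term-document matrix
```

A larger `lam` pulls the weights toward plain term matching (more stable, looser fit), while a smaller `lam` fits the query vector more closely at the cost of stability. The weights can be negative for documents whose terms the fit needs to cancel, so the ranking rather than the raw values is what the sketch is meant to show.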

Current Assignee: Fiver LLC
Original Assignee: Insightful Corporation (Cloud Software Group)
Inventors: Marchisio, Giovanni B.
Granted Patent: US 7,051,017 B2
US Class Current: 707/3
CPC Class Codes

G06F 16/334   Query execution G06F16/335 ...

G06F 16/954   Navigation, e.g. using cate...

G06F 40/216   using statistical methods

G06F 40/268   Morphological analysis

Y10S 707/99931   Database or file accessing

Y10S 707/99933   Query processing, i.e. sear...
