Inverse inference engine for high performance web search
Abstract
An information retrieval system that deals with the problems of synonymy, polysemy, and retrieval by concept by allowing for a wide margin of uncertainty in the initial choice of keywords in a query. Given an input query vector and an information matrix, the disclosed system solves an optimization problem that maximizes the stability of a solution at a given level of misfit. The disclosed system may include a decomposition of the information matrix in terms of orthogonal basis functions. Each basis encodes a group of conceptually related keywords. The bases are arranged in order of decreasing statistical relevance to a query. The disclosed search engine approximates the input query with a weighted sum of the first few bases. Commercial applications other than the disclosed search engine can also be built on the disclosed techniques.
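The basis decomposition described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method. It assumes the orthogonal bases are the eigenvectors of a symmetric term-spread matrix. It also assumes a basis's "statistical relevance to a query" is the magnitude of the query's projection onto that basis. The function name `query_basis_approximation`, the parameter `k` (number of bases kept), and the toy matrix are hypothetical.

```python
import numpy as np


def query_basis_approximation(term_spread, query, k):
    """Approximate `query` by a weighted sum of the k most query-relevant bases.

    Assumptions (illustrative, not from the patent text):
      * the orthogonal bases are the eigenvectors of the symmetric
        term-spread matrix;
      * a basis's relevance to the query is |basis . query|.
    Returns (approximation, selected bases as columns, their weights).
    """
    # eigh: orthonormal eigenvectors (as columns) of a symmetric matrix.
    _, bases = np.linalg.eigh(term_spread)
    # Weight of the query along every basis vector.
    coeffs = bases.T @ query
    # Order bases by decreasing |projection| and keep the first k.
    top = np.argsort(-np.abs(coeffs))[:k]
    approx = bases[:, top] @ coeffs[top]
    return approx, bases[:, top], coeffs[top]


# Toy term-document counts: 4 terms (rows) x 3 documents (columns).
A = np.array([[2, 0, 1],
              [1, 0, 1],
              [0, 3, 0],
              [0, 1, 1]], dtype=float)
T = A @ A.T                          # unweighted term-spread matrix (assumption)
q = np.array([1.0, 0.0, 0.0, 0.0])   # query containing only term 0
for k in range(1, 5):
    approx, _, _ = query_basis_approximation(T, q, k)
    print(k, np.round(np.linalg.norm(q - approx), 4))
```

Because the bases are orthonormal, keeping more of them can only shrink the residual between the query and its approximation. Keeping all of them reproduces the query exactly.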
67 Claims
1. An information retrieval system comprising:
an information file processing component that is structured to generate a term-document matrix to represent electronic information files stored in a computer system, each element in the term-document matrix indicating a number of occurrences of a term within a respective one of the electronic information files; and
generate a term-spread matrix to produce a weighted autocorrelation of the generated term-document matrix, the term-spread matrix indicating an amount of variation in term usage in the information files and the extent to which terms are correlated;
a query mechanism that is structured to receive a query consisting of at least one term and generate a query vector based upon the received query, wherein the query vector has as many elements as the rows of the term-spread matrix; and
an optimization engine that is structured to formulate, based upon the term-spread matrix and query vector, a constrained optimization problem description, wherein the choice of a stabilization parameter in the problem description determines the extent of a trade-off between a degree of fit and the stability of all solutions to the constrained optimization problem description;
generate a solution vector to the constrained optimization problem description, the vector including a plurality of document weights, each one of the plurality of document weights corresponding to one of the information files, wherein each of the document weights reflects a degree of correlation between the query and the corresponding one of the information files; and
provide an information response reflecting the document weights.
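The matrix-building steps of claim 1 can be sketched as follows. This is a minimal illustration, not the claimed implementation. It assumes whitespace tokenization and raw occurrence counts. It also assumes a term-spread matrix of the form A·W·Aᵀ, where W is a diagonal matrix of per-document weights (identity by default). The weighting scheme, tokenizer, toy documents, and function names are illustrative assumptions, not taken from the claim.

```python
import numpy as np


def term_document_matrix(docs):
    """Rows are terms, columns are documents; each entry is an occurrence count.

    Tokenization is a plain lowercase whitespace split (an assumption).
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    index = {term: i for i, term in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, tokens in enumerate(tokenized):
        for term in tokens:
            A[index[term], j] += 1
    return A, index


def term_spread_matrix(A, doc_weights=None):
    """Weighted autocorrelation A @ diag(w) @ A.T (terms x terms).

    The diagonal per-document weights w are a hypothetical choice;
    by default every document gets weight 1.
    """
    w = np.ones(A.shape[1]) if doc_weights is None else np.asarray(doc_weights, dtype=float)
    # A * w scales column j of A by w[j], i.e. A @ diag(w).
    return (A * w) @ A.T


def query_vector(query, index):
    """One element per term, i.e. per row of the term-spread matrix."""
    q = np.zeros(len(index))
    for term in query.lower().split():
        if term in index:  # terms unseen in the collection are dropped (assumption)
            q[index[term]] += 1
    return q


docs = ["car engine repair", "automobile engine", "fruit salad recipe"]
A, index = term_document_matrix(docs)
T = term_spread_matrix(A)
q = query_vector("car engine", index)
```

In this reading, a diagonal entry of T grows with how heavily a term is used across the collection. An off-diagonal entry is nonzero only when two terms co-occur in some document. That is one simple interpretation of "variation in term usage" and "the extent to which terms are correlated".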
12. A computer-readable memory medium containing instructions for controlling a computer processor to retrieve information by:
generating a term-document matrix to represent electronic information files stored in a computer system, each element in the term-document matrix indicating a number of occurrences of a term within a respective one of the electronic information files;
generating a term-spread matrix, wherein the term-spread matrix is a weighted autocorrelation of the generated term-document matrix, the term-spread matrix indicating an amount of variation in term usage in the information files and the extent to which terms are correlated;
receiving a query consisting of at least one term;
in response to receiving the query, generating a query vector, wherein the query vector has as many elements as the rows of the term-spread matrix;
formulating, based upon the term-spread matrix and query vector, a constrained optimization problem description, wherein the choice of a stabilization parameter in the problem description determines the extent of a trade-off between a degree of fit and the stability of all solutions to the constrained optimization problem description;
generating a solution vector to the constrained optimization problem description, the vector including a plurality of document weights, each one of the plurality of document weights corresponding to one of the information files, wherein each of the document weights reflects a degree of correlation between the query and the corresponding one of the information files; and
providing an information response reflecting the document weights.
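One way to read the constrained optimization in claim 12 is as Tikhonov regularization, where the stabilization parameter `lam` trades fit against the size of the solution. The sketch below assumes that formulation, which the claim does not spell out. It minimizes ‖Ax − q‖² + lam·xᵀW⁻¹x, with W a diagonal matrix of positive per-document weights. The minimizer can be written through the term-spread matrix T = A·W·Aᵀ as x = W·Aᵀ(T + lam·I)⁻¹q. The linear system is therefore solved in term space, which fits a query vector with one element per row of T. The names `document_weights` and `ranked_response` and the toy data are hypothetical.

```python
import numpy as np


def document_weights(A, q, lam, doc_weights=None):
    """Document weights from an assumed Tikhonov-regularized formulation.

    Minimizes ||A x - q||^2 + lam * x^T W^{-1} x with W = diag(w), w > 0.
    Its minimizer, written via the term-spread matrix T = A W A^T, is
        x = W A^T (T + lam I)^{-1} q.
    """
    w = np.ones(A.shape[1]) if doc_weights is None else np.asarray(doc_weights, dtype=float)
    T = (A * w) @ A.T                                     # term-spread matrix
    y = np.linalg.solve(T + lam * np.eye(T.shape[0]), q)  # solve in term space
    return w * (A.T @ y)                                  # one weight per document


def ranked_response(x, names, top=3):
    """Information response: documents sorted by decreasing weight."""
    order = np.argsort(-x)[:top]
    return [(names[i], float(x[i])) for i in order]


# Term rows: automobile, car, engine, fruit, recipe, repair, salad.
A = np.array([[0, 1, 0],
              [1, 0, 0],
              [1, 1, 0],
              [0, 0, 1],
              [0, 0, 1],
              [1, 0, 0],
              [0, 0, 1]], dtype=float)
q = np.array([0, 1, 1, 0, 0, 0, 0], dtype=float)   # query "car engine"
x = document_weights(A, q, lam=0.1)
print(ranked_response(x, ["car engine repair", "automobile engine", "fruit salad recipe"]))
```

Here "automobile engine" receives a positive weight even though it lacks the term "car", because it shares "engine" with the query. Solving in term space is cheapest when there are fewer terms than documents. When the reverse holds, the equivalent document-space system (AᵀA + lam·W⁻¹)x = Aᵀq is smaller.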
34. An information retrieval method comprising:
generating a term-document matrix to represent electronic information files stored in a computer system, each element in the term-document matrix indicating a number of occurrences of a term within a respective one of the electronic information files;
generating a term-spread matrix, wherein the term-spread matrix is a weighted autocorrelation of the generated term-document matrix, the term-spread matrix indicating an amount of variation in term usage in the information files and the extent to which terms are correlated;
receiving a user-query consisting of at least one term;
in response to receiving the query, generating a query vector, wherein the query vector has as many elements as the rows of the term-spread matrix;
formulating, based upon the term-spread matrix and query vector, a constrained optimization problem description, wherein the choice of a stabilization parameter in the problem description determines the extent of a trade-off between a degree of fit and the stability of all solutions to the constrained optimization problem description;
generating a solution vector to the constrained optimization problem description, the vector including a plurality of document weights, each one of the plurality of document weights corresponding to one of the information files, wherein each of the document weights reflects a degree of correlation between the query and the corresponding one of the information files; and
providing an information response reflecting the document weights.
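Claim 34 attributes a trade-off to the stabilization parameter, and that trade-off can be checked numerically. The sketch below assumes a Tikhonov formulation with unit document weights, x = Aᵀ(A·Aᵀ + lam·I)⁻¹q, which is an illustrative assumption. It sweeps hypothetical values of `lam` over random toy term counts. For each value it reports the misfit ‖Ax − q‖ and the solution norm ‖x‖, a common proxy for stability. The function name `tradeoff_curve` is hypothetical.

```python
import numpy as np


def tradeoff_curve(A, q, lams):
    """(lam, misfit ||A x - q||, stability proxy ||x||) for each lam.

    Assumes x = A^T (A A^T + lam I)^{-1} q, i.e. Tikhonov regularization
    with unit document weights; lam must be positive.
    """
    T = A @ A.T                      # unweighted term-spread matrix
    I = np.eye(T.shape[0])
    rows = []
    for lam in lams:
        x = A.T @ np.linalg.solve(T + lam * I, q)
        rows.append((lam, np.linalg.norm(A @ x - q), np.linalg.norm(x)))
    return rows


rng = np.random.default_rng(0)
A = rng.poisson(1.0, size=(8, 5)).astype(float)  # toy term counts
q = np.zeros(8)
q[[0, 3]] = 1.0                                   # two-term query
for lam, misfit, norm in tradeoff_curve(A, q, [0.01, 0.1, 1.0, 10.0, 100.0]):
    print(f"lam={lam:>6}: misfit={misfit:.3f}  ||x||={norm:.3f}")
```

Under this formulation, the misfit never decreases and the solution norm never increases as `lam` grows. Choosing `lam` selects a point on that curve.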
Specification