Extended functionality for an inverse inference engine based web search
Abstract
An extension of an inverse inference search engine is disclosed which provides cross-language document retrieval, in which the information matrix used as input to the inverse inference engine is organized into rows of blocks corresponding to languages within a predetermined set of natural languages. The information matrix is further organized into two column-wise partitions. The first partition consists of blocks of entries representing fully translated documents, while the second partition is a matrix of blocks of entries representing documents for which translations are not available in all of the predetermined languages. Further, in the second partition, entries in blocks outside the main diagonal of blocks are zero. Another disclosed extension to the inverse inference document retrieval system supports automatic, knowledge-based training. This approach applies the idea of using a training set to the problem of searching databases where information is diluted or not reliable enough to allow the creation of robust semantic links. To address this situation, the disclosed system loads the left-hand partition of the input matrix for the inverse inference engine with information from reliable sources.
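The block layout described in the abstract can be illustrated with a small sketch. It assumes whitespace tokenization and raw term counts as entries; the function names, the sample vocabularies, and the sample texts are hypothetical and are not taken from the disclosure.

```python
from collections import Counter

def count_vector(text, vocab):
    """Raw occurrence counts of each vocabulary term in one text."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocab]

def build_matrix(vocabs, translated, monolingual):
    """Assemble a term-by-document matrix with one row block per language.

    vocabs:      dict mapping language -> ordered list of terms
    translated:  list of dicts mapping language -> text (all languages present)
    monolingual: list of (language, text) pairs
    """
    langs = list(vocabs)
    row_labels = [(lang, term) for lang in langs for term in vocabs[lang]]
    columns = []
    # First partition: fully translated documents fill every row block.
    for doc in translated:
        col = []
        for lang in langs:
            col.extend(count_vector(doc[lang], vocabs[lang]))
        columns.append(col)
    # Second partition: group columns by language so that only the
    # diagonal blocks are non-zero.
    for lang in langs:
        for doc_lang, text in monolingual:
            if doc_lang != lang:
                continue
            col = []
            for row_lang in langs:
                if row_lang == doc_lang:
                    col.extend(count_vector(text, vocabs[row_lang]))
                else:
                    col.extend([0] * len(vocabs[row_lang]))
            columns.append(col)
    matrix = [list(row) for row in zip(*columns)]  # columns -> rows
    return matrix, row_labels

vocabs = {"en": ["cat", "dog"], "fr": ["chat", "chien"]}
translated = [{"en": "cat dog cat", "fr": "chat chien chat"}]
monolingual = [("fr", "chien chien"), ("en", "dog")]
matrix, row_labels = build_matrix(vocabs, translated, monolingual)
```

In this toy case the first column (the translated document) has entries in both the English and the French row blocks, while the remaining columns are zero outside the row block of their own language.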
60 Claims
-
1-12. (Canceled)
-
13. A multi-language information retrieval method for retrieving information from a plurality of target documents using at least one reference document, the target documents and at least one reference document stored as electronic information files in a computer system, comprising:
-
generating a term-document matrix to represent the electronic information files, each element in the term-document matrix indicating a measure of a number of occurrences of a term within a respective one of the electronic information files, the term-document matrix including a first partition of entries that represent a first version of the at least one reference document comprising content in a first natural language and a second version of the at least one reference document comprising content in a second natural language such that the first and second versions of the reference document can be used to semantically link documents between the first and second natural languages, the term-document matrix including a second partition of entries that represent the target documents, the target documents comprising content in the first natural language or the second natural language;
generating a term-spread matrix that is a weighted autocorrelation of the generated term-document matrix, the term-spread matrix indicating an amount of variation in term usage in the information files and the extent to which terms are correlated;
receiving a query consisting of at least one term;
in response to receiving the query, generating a query vector having as many elements as rows of the generated term-spread matrix;
formulating, based upon the generated term-spread matrix and query vector, a constrained optimization problem description for determining a degree of correlation between the query vector and the target documents, wherein the choice of a stabilization parameter determines the extent of a trade-off between a degree of fit and stability of all solutions to the constrained optimization problem description;
determining a solution vector to the constrained optimization problem description, the vector including a plurality of document weights, each weight corresponding to one of the target documents and reflecting a degree of correlation between the query and the corresponding target document; and
providing a response to the received query that reflects the document weights. - View Dependent Claims (14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24)
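The steps recited in claim 13 can be sketched end to end as follows. The claim does not fix the form of the constrained optimization problem, so this sketch assumes a Tikhonov-regularized least-squares formulation, minimizing ||D x - q||^2 + lam * ||x||^2, whose solution can be written as x = D^T (S + lam I)^(-1) q with S = D D^T. Uniform weights are assumed for the autocorrelation, and the function names, the parameter `lam`, and the sample data are hypothetical.

```python
def term_spread(D, weights=None):
    """Weighted autocorrelation S = D W D^T (W diagonal; uniform by default)."""
    n_docs = len(D[0])
    w = weights or [1.0] * n_docs
    return [[sum(ri[k] * w[k] * rj[k] for k in range(n_docs)) for rj in D]
            for ri in D]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def document_weights(D, query_terms, vocab, lam=0.1):
    """Return one weight per document (column of D) for the given query."""
    S = term_spread(D)
    # Query vector: one element per row of the term-spread matrix.
    q = [1.0 if term in query_terms else 0.0 for term in vocab]
    # lam is the stabilization parameter: larger values favor stability,
    # smaller values favor a closer fit to the query.
    A = [[S[i][j] + (lam if i == j else 0.0) for j in range(len(S))]
         for i in range(len(S))]
    y = solve(A, q)
    n_terms, n_docs = len(D), len(D[0])
    return [sum(D[i][k] * y[i] for i in range(n_terms)) for k in range(n_docs)]

vocab = ["cat", "dog", "chat", "chien"]
# Column 0: bilingual reference document; columns 1-2: monolingual targets.
D = [[2, 0, 0],
     [1, 1, 0],
     [2, 0, 0],
     [1, 0, 2]]
weights = document_weights(D, {"cat"}, vocab, lam=0.1)
ranking = sorted(range(len(weights)), key=lambda k: weights[k], reverse=True)
```

A response to the query would then be built from `ranking`, restricted to the columns that represent target documents. Real term-document matrices are large and sparse, so a practical system would likely use an iterative solver rather than the dense elimination used here.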
-
-
25. A computer-readable memory medium containing instructions that control a computer processor to retrieve information from a plurality of target documents using at least one reference document, the target documents and at least one reference document stored as electronic information files in a computer system, by:
-
generating a term-document matrix to represent the electronic information files, each element in the term-document matrix indicating a measure of a number of occurrences of a term within a respective one of the electronic information files, the term-document matrix including a first partition of entries that represent a first version of the at least one reference document comprising content in a first natural language and a second version of the at least one reference document comprising content in a second natural language such that the first and second versions of the reference document can be used to semantically link documents between the first and second natural languages, the term-document matrix including a second partition of entries that represent the target documents, the target documents comprising content in the first natural language or the second natural language;
generating a term-spread matrix that is a weighted autocorrelation of the generated term-document matrix, the term-spread matrix indicating an amount of variation in term usage in the information files and the extent to which terms are correlated;
receiving a query consisting of at least one term;
in response to receiving the query, generating a query vector having as many elements as rows of the generated term-spread matrix;
formulating, based upon the generated term-spread matrix and query vector, a constrained optimization problem description for determining a degree of correlation between the query vector and the target documents, wherein the choice of a stabilization parameter determines the extent of a trade-off between a degree of fit and stability of all solutions to the constrained optimization problem description;
determining a solution vector to the constrained optimization problem description, the vector including a plurality of document weights, each weight corresponding to one of the target documents and reflecting a degree of correlation between the query and the corresponding target document; and
providing a response to the received query that reflects the document weights. - View Dependent Claims (26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36)
-
-
37. An information retrieval system having a plurality of target documents and at least one reference document stored as electronic information files, comprising:
-
an information file processing component that is structured to generate a term-document matrix to represent the electronic information files, each element in the term-document matrix indicating a measure of a number of occurrences of a term within a respective one of the electronic information files, the term-document matrix including a first partition of entries that represent a first version of the at least one reference document comprising content in a first natural language and a second version of the at least one reference document comprising content in a second natural language such that the first and second versions of the reference document can be used to semantically link documents between the first and second natural languages, the term-document matrix including a second partition of entries that represent the target documents, the target documents comprising content in the first natural language or the second natural language; and
generate a term-spread matrix that is a weighted autocorrelation of the generated term-document matrix, the term-spread matrix indicating an amount of variation in term usage in the information files and the extent to which terms are correlated;
a query mechanism that is structured to receive a query of at least one term and to generate a query vector having as many elements as the rows of the generated term-spread matrix; and
an inverse inference engine that is structured to formulate, based upon the generated term-spread matrix and the query vector, a constrained optimization problem description for determining a degree of correlation between the query vector and the target documents, wherein the choice of a stabilization parameter determines the extent of a trade-off between a degree of fit and stability of all solutions to the constrained optimization problem description;
determine a solution vector to the constrained optimization problem description, the solution vector including a plurality of document weights, each weight corresponding to one of the target documents and reflecting a degree of correlation between the query and the corresponding target document; and
provide a response to the received query that reflects the document weights. - View Dependent Claims (38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48)
-
-
49. A computer-implemented method for retrieving information from a plurality of search documents using at least one reference document, the search documents and the at least one reference document stored as electronic information files in a computer system, comprising:
-
generating a term-document matrix to represent the files, each element in the term-document matrix indicating a measure of a number of occurrences of a term within a respective one of the files, the term-document matrix including a first partition of entries that represent the at least one reference document, wherein the reference document is predetermined to contain reliable information, the term-document matrix further including a second partition of entries that represent the plurality of search documents, wherein the search documents contain potentially insufficient information for establishing semantic links between terms of the search documents;
generating a term-spread matrix that is a weighted autocorrelation of the term-document matrix, the term-spread matrix indicating an amount of variation in term usage in the files and the extent to which terms are correlated;
receiving a query of at least one term;
in response to receiving the query, generating a query vector having as many elements as the rows of the generated term-spread matrix;
formulating, based upon the generated term-spread matrix and query vector, a constrained optimization problem description for determining a degree of correlation between the query vector and the search documents, wherein the choice of a stabilization parameter determines the extent of a trade-off between a degree of fit and stability of all solutions to the constrained optimization problem description;
determining a solution vector to the constrained optimization problem description, the vector including a plurality of document weights, each weight corresponding to one of the search documents and reflecting a degree of correlation between the query and the corresponding search document; and
providing a response to the received query that reflects the document weights. - View Dependent Claims (50, 51, 52)
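The role of the reference partition in claim 49 can be illustrated with a toy comparison. The sketch below assumes raw term counts and an unweighted autocorrelation; the vocabulary, the document texts, and the function names are hypothetical. It shows that two terms which never co-occur in the search documents acquire a non-zero entry in the term-spread matrix once a reliable reference document containing both is placed in the first partition.

```python
def term_document_matrix(docs, vocab):
    """Rows are terms, columns are documents, entries are raw counts."""
    return [[doc.lower().split().count(term) for doc in docs] for term in vocab]

def term_spread(D):
    """Unweighted autocorrelation D D^T of the term-document matrix."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in D] for ri in D]

vocab = ["aspirin", "headache", "salicylate"]
reference_docs = ["aspirin is a salicylate used for headache"]  # assumed reliable
search_docs = ["cheap aspirin", "headache tips", "salicylate"]  # sparse content

# Second partition alone: no document links "aspirin" to "salicylate".
D_search = term_document_matrix(search_docs, vocab)
link_without = term_spread(D_search)[0][2]

# First partition (reference) followed by second partition (search documents).
D_full = term_document_matrix(reference_docs + search_docs, vocab)
link_with = term_spread(D_full)[0][2]
```

Here `link_without` is 0 and `link_with` is 1, so a query for one term can reach search documents that mention only the other.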
-
-
53. A computer-readable memory medium containing instructions for controlling a computer processor to retrieve information from a plurality of search documents using at least one reference document, the search documents and the at least one reference document stored as electronic information files in a computer system, by:
-
generating a term-document matrix to represent the files, each element in the term-document matrix indicating a measure of a number of occurrences of a term within a respective one of the files, the term-document matrix including a first partition of entries that represent the at least one reference document, wherein the reference document is predetermined to contain reliable information, the term-document matrix further including a second partition of entries that represent the plurality of search documents, wherein the search documents contain potentially insufficient information for establishing semantic links between terms of the search documents;
generating a term-spread matrix that is a weighted autocorrelation of the term-document matrix, the term-spread matrix indicating an amount of variation in term usage in the files and the extent to which terms are correlated;
receiving a query of at least one term;
in response to receiving the query, generating a query vector having as many elements as the rows of the generated term-spread matrix;
formulating, based upon the generated term-spread matrix and query vector, a constrained optimization problem description for determining a degree of correlation between the query vector and the search documents, wherein the choice of a stabilization parameter determines the extent of a trade-off between a degree of fit and stability of all solutions to the constrained optimization problem description;
determining a solution vector to the constrained optimization problem description, the vector including a plurality of document weights, each weight corresponding to one of the search documents and reflecting a degree of correlation between the query and the corresponding search document; and
providing a response to the received query that reflects the document weights. - View Dependent Claims (54, 55, 56)
-
-
57. An information retrieval system having a plurality of search documents and at least one reference document stored as electronic information files, comprising:
-
an information file processing component that is structured to generate a term-document matrix to represent the files, each element in the term-document matrix indicating a number of occurrences of a term within a respective one of the files, the term-document matrix including a first partition of entries that represent the at least one reference document, wherein the reference document is predetermined to contain reliable information, the term-document matrix further including a second partition of entries that represent the plurality of search documents, wherein the search documents contain potentially insufficient information for establishing semantic links between terms of the search documents; and
generate a term-spread matrix that is a weighted autocorrelation of the term-document matrix, the term-spread matrix indicating an amount of variation in term usage in the information files and the extent to which terms are correlated;
a query mechanism that is structured to receive a query of at least one term and to generate a query vector having as many elements as the rows of the generated term-spread matrix; and
an inverse inference engine that is structured to formulate, based upon the generated term-spread matrix and query vector, a constrained optimization problem description for determining a degree of correlation between the query vector and the search documents, wherein the choice of a stabilization parameter determines the extent of a trade-off between a degree of fit and stability of all solutions to the constrained optimization problem description;
determine a solution vector to the constrained optimization problem description, the vector including a plurality of document weights, each weight corresponding to one of the search documents and reflecting a degree of correlation between the query and the corresponding search document; and
provide a response to the received query that reflects the document weights. - View Dependent Claims (58, 59, 60)
-
Specification