Extended functionality for an inverse inference engine based web search

US 6,757,646 B2
Filed: 09/25/2001
Issued: 06/29/2004
Est. Priority Date: 03/22/2000
Status: Expired due to Term

6 Assignments

0 Petitions

Abstract

An extension of an inverse inference search engine is disclosed that provides cross-language document retrieval. The information matrix used as input to the inverse inference engine is organized into rows of blocks corresponding to languages within a predetermined set of natural languages, and into two column-wise partitions. The first partition consists of blocks of entries representing fully translated documents, while the second partition is a matrix of blocks of entries representing documents for which translations are not available in all of the predetermined languages. In the second partition, entries in blocks outside the main diagonal of blocks are zero. Another disclosed extension to the inverse inference document retrieval system supports automatic, knowledge-based training. This approach applies the idea of using a training set to the problem of searching databases where information is diluted or not reliable enough to allow the creation of robust semantic links. To address this situation, the disclosed system loads the left-hand partition of the input matrix for the inverse inference engine with information from reliable sources.
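The block layout described in the abstract can be illustrated with a small NumPy sketch. This is only an assumed reading of the text, not the patent's implementation: the function name `build_cross_language_matrix` and its arguments are hypothetical, each language is assumed to own a contiguous block of term rows, and the left partition is assumed to hold one column per fully translated document.

```python
import numpy as np

def build_cross_language_matrix(ref_blocks, target_blocks):
    """Assemble an information matrix with two column-wise partitions.

    ref_blocks[i]    : term counts (terms_i x n_ref) for language i's version
                       of the fully translated documents (hypothetical input).
    target_blocks[i] : term counts (terms_i x n_i) for documents available
                       only in language i (hypothetical input).
    """
    # Left partition: language blocks stacked as rows, so each column holds
    # the same document expressed in every language.
    left = np.vstack(ref_blocks)

    # Right partition: blocks on the main diagonal only; every block that
    # pairs a language's terms with another language's documents stays zero.
    total_cols = sum(b.shape[1] for b in target_blocks)
    right = np.zeros((left.shape[0], total_cols), dtype=left.dtype)
    r = c = 0
    for block in target_blocks:
        rows, cols = block.shape
        right[r:r + rows, c:c + cols] = block
        r += rows
        c += cols

    return np.hstack([left, right])
```

With two languages, the result has the shape `[[R_1, T_1, 0], [R_2, 0, T_2]]`, where `R_i` are the translated-document blocks and `T_i` the single-language blocks.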

462 Citations

12 Claims

1. An information retrieval method comprising the steps of:
- generating a term-document matrix to represent electronic information files stored in a computer system, each element in said term-document matrix indicating a number of occurrences of a term within a respective one of said electronic information files, wherein said term-document matrix includes a first partition, said first partition including entries representing at least a first version and a second version of at least one reference document within said electronic information files, wherein said first version of said reference document is in a first natural language and said second version of said reference document is a translation of said first version of said reference document into a second natural language, and wherein said term-document matrix further includes a second partition, elements in said second partition representing at least one target document within said electronic information files, wherein said target document is in one of the set of natural languages consisting of said first natural language and said second natural language;
  
  generating, responsive to said term-document matrix, a term-spread matrix, wherein said term spread matrix is a weighted autocorrelation of said term-document matrix, said term-spread matrix indicating an amount of variation in term usage in the information files and, also, the extent to which terms are correlated;
  
  receiving a user query from a user, said user query consisting of at least one term;
  
  in response to said user query, generating a user query vector, wherein said user query vector has as many elements as the rows of the term-spread matrix;
  
  generating, responsive to said user query vector, an error-covariance matrix, wherein said error-covariance matrix reflects an expected degree of uncertainty in the initial choice of keywords of said user;
  
  formulating, responsive to said term-spread matrix, error-covariance matrix, and user query vector, a constrained optimization problem, wherein the choice of a lambda value equal to a LaGrange multiplier value in said constrained optimization problem determines the extent of a trade-off between a degree of fit and the stability of all solutions to said constrained optimization problem;
  
  generating, responsive to said constrained optimization problem, a solution vector including a plurality of document weights, each one of said plurality of document weights corresponding to one of each said target documents, wherein each of said document weights reflects a degree of correlation between said user query and the corresponding one of said target documents; and
  
  providing an information response to said user reflecting said document weights, wherein at least one of said document weights is positive and at least one of said document weights is negative, wherein said positive document weights represent the relevance of selected ones of said target documents in said first natural language to said user query, and wherein absolute values of said negative document weights represent the relevance of selected ones of said target documents in said second natural language to said user query.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 11, 12)
  - 2. The method of claim 1, wherein said providing said information response further comprises organizing display objects representing said target documents associated with said document weights according to the sign of each said of said document weights, whereby said documents in said first natural language are displayed in proximity to each other and documents in said second natural language are displayed in proximity to each other.
  - 3. The method of claim 2, wherein said providing said information response further comprises organizing said display objects representing documents associated with said document weights according to the absolute value of each said of said document weights, such that said display object are displayed in decreasing absolute value of associated document weight.
  - 4. The method of claim 1, wherein said step of generating said term-document matrix includes generating elements in said matrix reflecting the number of occurrences of each one of said terms in each one of said information files.
  - 5. The method of claim 1, wherein rows of said term-document matrix are each associated with a respective term, and wherein a first set of said rows are associated with terms in said first natural language, and a second set of said rows are associated with terms in said second natural language.
  - 6. The method of claim 5, wherein said first partition including entries representing at least a first version, and a second version of said at least one reference document, wherein said first version of said reference document is in said first natural language, and said second version of said reference document is a translation of said first version of said reference document into said second natural language.
  - 7. The method of claim 1, wherein said second version of said reference document is another document that is topically related to said first version of said reference document.
  - 8. The method of claim 1, wherein said term-document matrix is one of a plurality of term document matrices, each of said plurality of term document matrices associated with a translation from a source language to a target foreign language, and wherein said first natural language comprises said source language and said second natural language comprises said target natural language.
  - 11. The method of claim 8, wherein said reference document comprises an encyclopedia.
  - 12. The method of claim 8, wherein said reference document comprises a collection of news reports.
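The claims name the ingredients of the retrieval step (a term-spread matrix, a query vector, an error-covariance matrix, and a lambda trade-off) but do not give formulas here. The sketch below is therefore a stand-in, not the patented method: it assumes the term-spread matrix is the unweighted autocorrelation `D @ D.T`, assumes an identity error covariance by default, and solves a Tikhonov-regularized least-squares problem. It also does not show how weight signs would come to encode the document language. The function names and parameters are hypothetical.

```python
import numpy as np

def solve_document_weights(D, q, lam=1.0, err_cov=None, n_ref=0):
    """Regularized solve for document weights (illustrative assumption).

    D       : term-document matrix (n_terms x n_docs)
    q       : query vector with one element per row of the term-spread matrix
    lam     : trade-off between degree of fit and stability of the solution
    err_cov : assumed error-covariance matrix (identity if None)
    n_ref   : number of leading reference columns to drop from the result
    """
    n_terms = D.shape[0]
    S = D @ D.T                      # term-spread matrix, identity weighting assumed
    C = np.eye(n_terms) if err_cov is None else err_cov
    y = np.linalg.solve(S + lam * C, q)
    w = D.T @ y                      # one weight per document column
    return w[n_ref:]                 # keep weights for target documents only

def organize_by_sign(weights):
    """Group document indices by weight sign, each group ordered by
    decreasing absolute weight, as in dependent claims 2 and 3."""
    idx = np.arange(len(weights))
    pos = [int(i) for i in idx[weights > 0]]
    neg = [int(i) for i in idx[weights < 0]]
    pos.sort(key=lambda i: -abs(weights[i]))
    neg.sort(key=lambda i: -abs(weights[i]))
    return pos, neg
```

Under these assumptions, larger `lam` values shrink the weights toward zero and make the solution less sensitive to the exact query terms, while smaller values fit the query more closely.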

9. An information retrieval method comprising the steps of:
- generating a term-document matrix to represent electronic information files stored in a computer system, each element in said term-document matrix indicating a number of occurrences of a term within a respective one of said electronic information files, wherein said term-document matrix includes a first partition, said first partition including entries representing at least one reference document within said electronic information files, wherein said reference document is predetermined to contain reliable information, and wherein said term-document matrix further includes a second partition, elements in said second partition representing a plurality of search documents within said electronic information files, wherein said search documents are predetermined to contain insufficient information for establishing semantic links;
  
  generating, responsive to said term-document matrix, a term-spread matrix, wherein said term spread matrix is a weighted autocorrelation of said term-document matrix, said term-spread matrix indicating an amount of variation in term usage in the information files and, also, the extent to which terms are correlated;
  
  receiving a user query from a user, said user query consisting of at least one term;
  
  in response to said user query, generating a user query vector, wherein said user query vector has as many elements as the rows of the term-spread matrix;
  
  generating, responsive to said user query vector, an error-covariance matrix, wherein said error-covariance matrix reflects an expected degree of uncertainty in the initial choice of keywords of said user;
  
  formulating, responsive to said term-spread matrix, error-covariance matrix, and user query vector, a constrained optimization problem, wherein the choice of a lambda value equal to a LaGrange multiplier value in said constrained optimization problem determines the extent of a trade-off between a degree of fit and the stability of all solutions to said constrained optimization problem;
  
  generating, responsive to said constrained optimization problem, a solution vector including a plurality of document weights, each one of said plurality of document weights corresponding to one of said plurality of search documents, wherein each of said document weights reflects a degree of correlation between said user query and the corresponding one of said plurality of search documents; and
  
  providing an information response to said user reflecting said document weights.
- Dependent Claims (10)
  - 10. The method of claim 9, further comprising periodically accumulating information from multiple sources, and adding said information to said search documents.
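The partitioning in claim 9 and the accumulation step in claim 10 can be pictured with a small sketch. It assumes whitespace tokenization and raw occurrence counts, and it simply rebuilds the matrix after new search documents arrive. The function name `term_document_matrix` and the sample strings are hypothetical and chosen only for illustration.

```python
from collections import Counter
import numpy as np

def term_document_matrix(reference_docs, search_docs):
    """Count-based term-document matrix with reference columns first.

    Returns the sorted vocabulary, the matrix, and the number of reference
    columns (the width of the first partition).
    """
    docs = list(reference_docs) + list(search_docs)
    tokens = [d.lower().split() for d in docs]   # naive tokenizer (assumption)
    vocab = sorted({t for doc in tokens for t in doc})
    row = {t: i for i, t in enumerate(vocab)}

    D = np.zeros((len(vocab), len(docs)), dtype=int)
    for j, doc in enumerate(tokens):
        for term, count in Counter(doc).items():
            D[row[term], j] = count              # occurrences of term in file j
    return vocab, D, len(reference_docs)

# Reliable reference material fills the first partition; sparse search
# documents fill the second and can grow as new information accumulates.
reference = ["gene expression regulates protein"]
search = ["protein protein binding"]
vocab, D, n_ref = term_document_matrix(reference, search)

search.append("gene binding")                    # accumulated later
vocab2, D2, n_ref2 = term_document_matrix(reference, search)
```

Rebuilding from scratch keeps the sketch short; a system handling frequent updates would more likely append columns incrementally.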

Current Assignee: Fiver LLC
Original Assignee: Insightful Corporation (Cloud Software Group)
Inventors: Marchisio, Giovanni B.
Primary Examiner(s): Breene, John
Assistant Examiner(s): Lu, Kuen S.
Application Number: US09/962,798
Publication Number: US 20020156763A1
Time in Patent Office: 1,008 Days
Field of Search: 704/9, 704/8, 704/2, 707/3, 707/4, 707/9, 707/10
US Class Current: 704/8
CPC Class Codes:
G06F 16/30   of unstructured textual dat...
G06F 16/334   Query execution G06F16/335 ...
G06F 16/954   Navigation, e.g. using cate...
G06F 40/169   Annotation, e.g. comment da...
G06F 40/216   using statistical methods
G06F 40/268   Morphological analysis
G06F 40/279   Recognition of textual enti...
G06F 40/30   Semantic analysis
G06F 40/58   Use of machine translation,...
Y10S 707/99933   Query processing, i.e. sear...
Y10S 707/99939   Privileged access
Y10S 707/99943   Generating database or data...
