Extended functionality for an inverse inference engine based web search

US 7,269,598 B2
Filed: 05/26/2004
Issued: 09/11/2007
Est. Priority Date: 03/22/2000
Status: Expired due to Term

Abstract

An extension of an inverse inference search engine is disclosed which provides cross-language document retrieval, in which the information matrix used as input to the inverse inference engine is organized into rows of blocks corresponding to languages within a predetermined set of natural languages. The information matrix is further organized into two column-wise partitions. The first partition consists of blocks of entries representing fully translated documents, while the second partition is a matrix of blocks of entries representing documents for which translations are not available in all of the predetermined languages. Further, in the second partition, entries in blocks outside the main diagonal of blocks are zero. Another disclosed extension to the inverse inference document retrieval system supports automatic, knowledge-based training. This approach applies the idea of using a training set to the problem of searching databases where information is diluted or not reliable enough to allow the creation of robust semantic links. To address this situation, the disclosed system loads the left-hand partition of the input matrix for the inverse inference engine with information from reliable sources.
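
The block layout described in the abstract can be illustrated with a small sketch. It is not taken from the patent specification: the function names (`term_counts`, `build_information_matrix`), the whitespace tokenizer, and the choice to group untranslated documents by language so that the right-hand partition becomes block diagonal are all assumptions made for illustration.

```python
from collections import Counter


def term_counts(text, vocabulary):
    """Count occurrences of each vocabulary term in a whitespace-tokenized text."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]


def build_information_matrix(vocab_by_lang, translated_docs, monolingual_docs):
    """Return a term-by-document matrix as a list of rows (hypothetical helper).

    vocab_by_lang:    {lang: [terms]}; each language contributes one block of rows.
    translated_docs:  list of {lang: text} with a version in every language
                      (left-hand partition).
    monolingual_docs: list of (lang, text) pairs with no translation available
                      (right-hand partition).
    """
    languages = list(vocab_by_lang)
    columns = []

    # Left-hand partition: every language block of the column is populated.
    for versions in translated_docs:
        column = []
        for lang in languages:
            column.extend(term_counts(versions[lang], vocab_by_lang[lang]))
        columns.append(column)

    # Right-hand partition: group documents by language (an assumption) so that
    # only the diagonal blocks are non-zero.
    ordered = sorted(monolingual_docs, key=lambda doc: languages.index(doc[0]))
    for doc_lang, text in ordered:
        column = []
        for lang in languages:
            if lang == doc_lang:
                column.extend(term_counts(text, vocab_by_lang[lang]))
            else:
                column.extend([0] * len(vocab_by_lang[lang]))
        columns.append(column)

    # Transpose the list of columns into a list of rows (terms x documents).
    return [list(row) for row in zip(*columns)]
```

With two languages, one translated document, and one untranslated document per language, the first column carries counts in both row blocks, while each remaining column is non-zero only in the row block of its own language.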

36 Claims

1. A multi-language information retrieval method for retrieving information from a plurality of target documents using at least one reference document, the target documents and at least one reference document stored as electronic information files in a computer system, comprising:
- generating a term-document matrix to represent the electronic information files, each element in the term-document matrix indicating a measure of a number of occurrences of a term within a respective one of the electronic information files, the term-document matrix including a first partition of entries that represent a first version of the at least one reference document comprising content in a first natural language and a second version of the at least one reference document comprising content in a second natural language such that the first and second versions of the reference document can be used to semantically link documents between the first and second natural languages, the term-document matrix including a second partition of entries that represent the target documents, the target documents comprising content in the first natural language or the second natural language;
  
  generating a term-spread matrix that is a weighted autocorrelation of the generated term-document matrix, the term-spread matrix indicating an amount of variation in term usage in the information files and an extent to which terms are correlated;
  
  receiving a query consisting of at least one term;
  
  in response to receiving the query, generating a query vector having as many elements as rows of the generated term-spread matrix;
  
  formulating, based upon the generated term-spread matrix and query vector, a constrained optimization problem description for determining a degree of correlation between the query vector and the target documents, wherein the choice of a stabilization parameter determines the extent of a trade-off between a degree of fit and stability of all solutions to the constrained optimization problem description;
  
  determining a solution vector to the constrained optimization problem description, the vector including a plurality of document weights, each weight corresponding to one of the target documents and reflecting a degree of correlation between the query and the corresponding target document; and
  
  providing a response to the received query that reflects the document weights.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
  - 2. The method of claim 1 wherein at least one of the document weights in the determined solution vector is positive and at least one of the document weights in the determined solution vector is negative, wherein the positive document weights represent the relevance of the corresponding target documents in the first natural language to the query, and wherein absolute values of the negative document weights represent the relevance of the corresponding target documents in the second natural language to the query.
  - 3. The method of claim 1, the providing the response further comprising:
    - organizing, according to the sign of each document weight, display objects that represent the target documents that correspond to the document weights, thereby displaying the objects that represent documents comprising content in the first natural language in proximity to each other and displaying the objects that represent documents comprising content in the second natural language in proximity to each other.
  - 4. The method of claim 3, the providing the response further comprising:
    - organizing the display objects according to the absolute value of each document weight, such that the display objects are displayed in decreasing absolute value of the corresponding document weights.
  - 5. The method of claim 1 wherein each row of the term-document matrix is associated with a respective term, and wherein a first set of the rows are associated with terms in the first natural language and a second set of the rows are associated with terms in the second natural language.
  - 6. The method of claim 1 wherein the second version of the reference document comprises terms that are a translation into the second natural language of terms of the first version of the reference document.
  - 7. The method of claim 1 wherein the second version of the reference document is topically related to the first version of the reference document.
  - 8. The method of claim 7 wherein the second version of the reference document is a translation into the second natural language of the first version of the reference document comprising content in the first natural language.
  - 9. The method of claim 1 wherein the first version and the second version of the reference document are used to find semantic links from terms in the first natural language to terms in the second natural language.
  - 10. The method of claim 1, wherein the term-document matrix is one of a plurality of term-document matrices, each term-document matrix having a first partition similar to the first partition of the term-document matrix and having entries that represent content in a first natural language and content in a second natural language, each term-document matrix associated with a translation from a source language to a different target foreign language, wherein, in each term-document matrix, the first natural language comprises the source language and the second natural language comprises the target foreign natural language.
  - 11. The method of claim 1, the first partition further comprising entries that represent a third version of the at least one reference document comprising content in a third natural language, such that the first, second, and third versions of the at least one reference document can be used to semantically link documents between the first, second, and third natural languages.
  - 12. The method of claim 11 wherein the first and second versions of the at least one reference document are used to translate terms between the first and second natural language and the first and third versions of the at least one reference document are used to translate terms between the first and third natural language.
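
The steps recited in claim 1 can be sketched end to end on a toy matrix. The claim does not state the form of the constrained optimization problem, so this sketch assumes a Tikhonov (ridge) formulation, minimizing ||D w - q||^2 + lambda ||w||^2, whose solution is w = D^T (S + lambda I)^(-1) q with S = D D^T. It also assumes uniform weights in the autocorrelation. All function names and the parameter `stabilization` are hypothetical.

```python
def term_spread(term_doc):
    """Term-by-term autocorrelation S = D D^T (uniform weights assumed)."""
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in term_doc]
            for row_i in term_doc]


def query_vector(query_terms, vocabulary):
    """One element per vocabulary term, i.e. per row of the term-spread matrix."""
    wanted = set(query_terms)
    return [1.0 if term in wanted else 0.0 for term in vocabulary]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def document_weights(term_doc, query, stabilization=0.1):
    """Return one weight per document under the assumed ridge formulation.

    A larger `stabilization` favors stable solutions; a smaller one favors
    a closer fit to the query vector.
    """
    spread = term_spread(term_doc)
    for i in range(len(spread)):
        spread[i][i] += stabilization      # S + lambda * I
    z = solve(spread, query)               # z = (S + lambda I)^(-1) q
    n_docs = len(term_doc[0])
    # w = D^T z: project the term-space solution back onto documents.
    return [sum(term_doc[t][d] * z[t] for t in range(len(term_doc)))
            for d in range(n_docs)]
```

For a positive `stabilization` value the shifted matrix is symmetric positive definite, so the elimination never meets a zero pivot. A response to the query would then present documents according to the returned weights.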

13. A computer-readable memory medium containing instructions that control a computer processor to retrieve information from a plurality of target documents using at least one reference document, the target documents and at least one reference document stored as electronic information files in a computer system, by:
- generating a term-document matrix to represent the electronic information files, each element in the term-document matrix indicating a measure of a number of occurrences of a term within a respective one of the electronic information files, the term-document matrix including a first partition of entries that represent a first version of the at least one reference document comprising content in a first natural language and a second version of the at least one reference document comprising content in a second natural language such that the first and second versions of the reference document can be used to semantically link documents between the first and second natural languages, the term-document matrix including a second partition of entries that represent the target documents, the target documents comprising content in the first natural language or the second natural language;
  
  generating a term-spread matrix that is a weighted autocorrelation of the generated term-document matrix, the term-spread matrix indicating an amount of variation in term usage in the information files and an extent to which terms are correlated;
  
  receiving a query consisting of at least one term;
  
  in response to receiving the query, generating a query vector having as many elements as rows of the generated term-spread matrix;
  
  formulating, based upon the generated term-spread matrix and query vector, a constrained optimization problem description for determining a degree of correlation between the query vector and the target documents, wherein the choice of a stabilization parameter determines the extent of a trade-off between a degree of fit and stability of all solutions to the constrained optimization problem description;
  
  determining a solution vector to the constrained optimization problem description, the vector including a plurality of document weights, each weight corresponding to one of the target documents and reflecting a degree of correlation between the query and the corresponding target document; and
  
  providing a response to the received query that reflects the document weights.
- Dependent claims (14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24)
  - 14. The memory medium of claim 13 wherein at least one of the document weights in the determined solution vector is positive and at least one of the document weights in the determined solution vector is negative, wherein the positive document weights represent the relevance of the corresponding target documents in the first natural language to the query, and wherein absolute values of the negative document weights represent the relevance of the corresponding target documents in the second natural language to the query.
  - 15. The memory medium of claim 13, the response further comprising:
    - organizing, according to the sign of each document weight, display objects that represent the target documents that correspond to the document weights, thereby displaying the objects that represent documents comprising content in the first natural language in proximity to each other and displaying the objects that represent documents comprising content in the second natural language in proximity to each other.
  - 16. The memory medium of claim 15, the response further comprising:
    - organizing the display objects according to the absolute value of each document weight, such that the display objects are displayed in decreasing absolute value of the corresponding document weights.
  - 17. The memory medium of claim 13 wherein each row of the term-document matrix is associated with a respective term, and wherein a first set of the rows are associated with terms in the first natural language and a second set of the rows are associated with terms in the second natural language.
  - 18. The memory medium of claim 13 wherein the second version of the reference document comprises terms that are a translation into the second natural language of terms of the first version of the reference document.
  - 19. The memory medium of claim 13 wherein the second version of the reference document is topically related to the first version of the reference document.
  - 20. The memory medium of claim 19 wherein the second version of the reference document is a translation into the second natural language of the first version of the reference document comprising content in the first natural language.
  - 21. The memory medium of claim 13 wherein the first version and the second version of the reference document are used to find semantic links from terms in the first natural language to terms in the second natural language.
  - 22. The memory medium of claim 13 wherein the term-document matrix is one of a plurality of term-document matrices, each term-document matrix having a first partition similar to the first partition of the term-document matrix and having entries that represent content in a first natural language and content in a second natural language, each term-document matrix associated with a translation from a source language to a different target foreign language, wherein, in each term-document matrix, the first natural language comprises the source language and the second natural language comprises the target foreign natural language.
  - 23. The memory medium of claim 13, the first partition further comprising entries that represent a third version of the at least one reference document comprising content in a third natural language, such that the first, second, and third versions of the at least one reference document can be used to semantically link documents between the first, second, and third natural languages.
  - 24. The memory medium of claim 23 wherein the first and second versions of the at least one reference document are used to translate terms between the first and second natural language and the first and third versions of the at least one reference document are used to translate terms between the first and third natural language.
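
The display organization recited in claims 15 and 16 (and in claims 3 and 4) can be sketched as a grouping by sign followed by a sort on absolute value. The function name `organize_results` and the decision to drop documents with a weight of exactly zero are assumptions made for illustration, not details from the claims.

```python
def organize_results(doc_ids, weights):
    """Group documents by the sign of their weight, then rank each group.

    Positive weights are taken to mark documents in the first natural language
    and negative weights documents in the second, as in claim 14. Each group is
    ordered by decreasing absolute value of the weight.
    """
    pairs = list(zip(doc_ids, weights))
    positive = [(doc, w) for doc, w in pairs if w > 0]
    negative = [(doc, w) for doc, w in pairs if w < 0]

    def relevance(pair):
        # The magnitude of the weight is treated as the relevance score.
        return abs(pair[1])

    first_language = sorted(positive, key=relevance, reverse=True)
    second_language = sorted(negative, key=relevance, reverse=True)
    return first_language, second_language
```

Returning two separate lists keeps documents of the same language adjacent, so a caller can render them in proximity to each other.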

25. An information retrieval system having a plurality of target documents and at least one reference document stored as electronic information files, comprising:
- a memory;
  
  an information file processing component stored on the memory that is configured to, when executed, generate a term-document matrix to represent the electronic information files, each element in the term-document matrix indicating a measure of a number of occurrences of a term within a respective one of the electronic information files, the term-document matrix including a first partition of entries that represent a first version of the at least one reference document comprising content in a first natural language and a second version of the at least one reference document comprising content in a second natural language such that the first and second versions of the reference document can be used to semantically link documents between the first and second natural languages, the term-document matrix including a second partition of entries that represent the target documents, the target documents comprising content in the first natural language or the second natural language; and
  
  generate a term-spread matrix that is a weighted autocorrelation of the generated term-document matrix, the term-spread matrix indicating an amount of variation in term usage in the information files and an extent to which terms are correlated;
  
  a query mechanism stored on the memory that is configured to, when executed, receive a query of at least one term and to generate a query vector having as many elements as the rows of the generated term-spread matrix; and
  
  an inverse inference engine stored on the memory that is configured to, when executed, formulate, based upon the generated term-spread matrix and the query vector, a constrained optimization problem description for determining a degree of correlation between the query vector and the target documents, wherein the choice of a stabilization parameter determines the extent of a trade-off between a degree of fit and stability of all solutions to the constrained optimization problem description;
  
  determine a solution vector to the constrained optimization problem description, the solution vector including a plurality of document weights, each weight corresponding to one of the target documents and reflecting a degree of correlation between the query and the corresponding target document; and
  
  provide a response to the received query that reflects the document weights.
- Dependent claims (26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36)
  - 26. The information retrieval system of claim 25 wherein at least one of the document weights in the determined solution vector is positive and at least one of the document weights in the determined solution vector is negative, wherein the positive document weights represent the relevance of the corresponding target documents in the first natural language to the query, and wherein absolute values of the negative document weights represent the relevance of the corresponding target documents in the second natural language to the query.
  - 27. The information retrieval system of claim 25, the response further comprising:
    - display objects that each represent a target documents that correspond to one of the document weights and are organized according to the sign of each document weight, thereby causing the objects that represent documents comprising content in the first natural language to be displayed in proximity to each other and the objects that represent documents comprising content in the second natural language to be displayed in proximity to each other.
  - 28. The information retrieval system of claim 26, the objects further structured to be organized according to the absolute value of each document weight, thereby causing the objects to be displayed in decreasing absolute value of the corresponding document weights.
  - 29. The information retrieval system of claim 25 wherein each row of the term-document matrix is associated with a respective term, and wherein a first set of the rows are associated with terms in the first natural language and a second set of the rows are associated with terms in the second natural language.
  - 30. The information retrieval system of claim 25 wherein the second version of the reference document comprises terms that are a translation into the second natural language of terms of the first version of the reference document.
  - 31. The information retrieval system of claim 25 wherein the second version of the reference document is topically related to the first version of the reference document.
  - 32. The information retrieval system of claim 31 wherein the second version of the reference document is a translation into the second natural language of the first version of the reference document comprising content in the first natural language.
  - 33. The information retrieval system of claim 25 wherein the first version and the second version of the reference document are used to find semantic links from terms in the first natural language to terms in the second natural language.
  - 34. The information retrieval system of claim 25 wherein the term-document matrix is one of a plurality of term-document matrices, each term-document matrix having a first partition similar to the first partition of the term-document matrix and having entries that represent content in a first natural language and content in a second natural language, each term-document matrix associated with a translation from a source language to a different target foreign language, wherein, in each term-document matrix, the first natural language comprises the source language and the second natural language comprises the target foreign natural language.
  - 35. The information retrieval system of claim 25, the first partition further comprising entries that represent a third version of the at least one reference document comprising content in a third natural language, such that the first, second, and third versions of the at least one reference document can be used to semantically link documents between the first, second, and third natural languages.
  - 36. The information retrieval system of claim 35 wherein the first and second versions of the at least one reference document are used to translate terms between the first and second natural language and the first and third versions of the at least one reference document are used to translate terms between the first and third natural language.

Current Assignee: Fiver LLC
Original Assignee: Insightful Corporation (Cloud Software Group)
Inventors: Marchisio, Giovanni B.
Primary Examiner(s): Cottingham, John
Assistant Examiner(s): Lu, Kuen S
Application Number: US10/855,786
Publication Number: US 20050021517A1
Time in Patent Office: 1,203 Days
Field of Search: 707/104.1, 707/102, 707/101, 707/2, 707/205, 704/8
US Class Current: 1/1

CPC Class Codes
G06F 16/30   of unstructured textual dat...
G06F 16/334   Query execution G06F16/335 ...
G06F 16/954   Navigation, e.g. using cate...
G06F 40/169   Annotation, e.g. comment da...
G06F 40/216   using statistical methods
G06F 40/268   Morphological analysis
G06F 40/279   Recognition of textual enti...
G06F 40/30   Semantic analysis
G06F 40/58   Use of machine translation,...
Y10S 707/99933   Query processing, i.e. sear...
Y10S 707/99939   Privileged access
Y10S 707/99943   Generating database or data...
