Wide-spectrum information search engine
Abstract
A method and computer program product for comparing documents include segmenting a judgment matrix into a plurality of information submatrices, where each submatrix has a plurality of classifications and a plurality of terms relevant to each classification; evaluating a relevance of each term of the plurality of terms with respect to each classification of each information submatrix of the information submatrices; calculating an information spectrum for a first document based upon at least some of the plurality of terms; calculating an information spectrum for a second document based upon at least some of the plurality of terms; and identifying the second document as relevant to the first document based upon a comparison of the calculated information spectrums.
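The structure described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the example terms, classifications, and ratings are hypothetical, the names `BLOCKS`, `build_segmented_matrix`, and `information_spectrum` are invented for illustration, terms are limited to single words, and summing the ratings of the terms found is an assumed combining rule (the abstract does not specify one).

```python
# Hypothetical example data: two information submatrices (blocks).
# Each block maps term -> {classification: relevance rating}.
BLOCKS = [
    {   # medicine-related block
        "aspirin": {"pharmacology": 0.9, "cardiology": 0.6},
        "artery":  {"pharmacology": 0.2, "cardiology": 0.9},
    },
    {   # computing-related block
        "compiler": {"software": 0.9, "hardware": 0.1},
        "cache":    {"software": 0.5, "hardware": 0.8},
    },
]


def build_segmented_matrix(blocks, no_relevance=0.0):
    """Assemble a full term x classification matrix from disjoint blocks."""
    terms, classes = [], []
    for block in blocks:
        block_classes = sorted({c for row in block.values() for c in row})
        # Disjointness: no term (row) or classification (column) may repeat.
        if set(block) & set(terms) or set(block_classes) & set(classes):
            raise ValueError("information submatrices must not share rows or columns")
        terms.extend(block)
        classes.extend(block_classes)
    # Every element outside a block gets the "absence of relevance" rating.
    matrix = {t: {c: no_relevance for c in classes} for t in terms}
    for block in blocks:
        for term, row in block.items():
            matrix[term].update(row)
    return matrix, classes


def information_spectrum(text, matrix, classes):
    """Sum the ratings of matrix terms found in the text, per classification.

    Assumption: single-word terms and a plain sum as the combining rule.
    """
    words = set(text.lower().split())
    found = [t for t in matrix if t in words]
    return [sum(matrix[t][c] for t in found) for c in classes]


matrix, classes = build_segmented_matrix(BLOCKS)
print(classes)
print(information_spectrum("Aspirin and the artery", matrix, classes))
```

Because the blocks share no rows or columns, a text that mentions only terms from the first block has a spectrum that is non-zero only in that block's classifications.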
23 Claims
1. A method for processing information, comprising:
receiving a segmented judgment matrix, the segmented judgment matrix being a numerical matrix pairing each of a set of terms to each of a set of subject matter classifications, each term being a word or phrase, the segmented judgment matrix having a plurality of information submatrices, each element of each information submatrix representing a rating of a relevance of the term of the element to the subject matter classification of the element, each information submatrix being a numerical matrix representing the relevance of each of a subset of the set of terms to each of a subset of the set of subject matter classifications, the information submatrices forming a disjoint set in which no two information submatrices have any row or any column of the segmented judgment matrix in common, and each element of the segmented judgment matrix not contained in one of the information submatrices having a rating indicating an absence of relevance of the corresponding term to the corresponding subject matter classification; and
using the segmented judgment matrix to calculate an information spectrum.
receiving a search request;
using the segmented judgment matrix to calculate an information spectrum of the search request;
using the segmented judgment matrix to calculate an information spectrum for each of a plurality of documents; and
identifying at least some documents of the plurality of documents as relevant to the search request based upon a comparison of the calculated information spectrums.
5. The method of claim 4 wherein:
each information submatrix has a plurality of subject matter classifications and a plurality of terms relevant to each subject matter classification; and
using the segmented judgment matrix to calculate an information spectrum for each of a plurality of documents comprises calculating an information spectrum for each of the plurality of documents based upon at least some of the plurality of terms;
the method further comprising:
selecting the plurality of terms based upon a relevance of each term of the plurality of terms to at least some of the subject matter classifications of the information submatrices.
6. The method of claim 4 wherein the step of calculating an information spectrum for each document and for the search request further comprises determining a log average among the ratings of relevance of the terms for each subject matter classification.
7. The method of claim 4 wherein the step of identifying at least some documents further comprises determining a distance between the information spectrum of the at least some documents and the information spectrum of the search request.
8. The method of claim 4 further comprising:
selecting a document of the identified documents as definitely relevant to the search request including calculating an information spectrum of the selected document; and
using the calculated information spectrum of the selected document as a new search request.
9. The method of claim 4 further comprising:
zooming in on a portion of a document information spectrum; and
determining that a document and request have a wide spectrum with significant content in a field F of a term and measuring the request and document using a subengine for field F.
10. The method of claim 1, wherein using the segmented judgment matrix to calculate an information spectrum comprises:
using the segmented judgment matrix and a collection of documents to create an augmented judgment matrix, the augmented judgment matrix having a matrix element for a new term and an existing subject matter classification, the matrix element value being a relevance value calculated from the number of occurrences of the new term in all documents in the collection considered definitely relevant to the subject matter classification.
11. The method of claim 4, wherein the step of calculating an information spectrum for each document and for the search request further comprises determining a value for each subject matter classification by mathematically combining the ratings of the terms found in the document or request.
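The augmented judgment matrix of claim 10 can be sketched as follows. The claim says only that the new element is calculated from the number of occurrences of the new term in the documents considered definitely relevant to the classification; the specific formula below (occurrences divided by the total word count of those documents) is an assumption, and `augment` and the sample data are hypothetical names chosen for illustration.

```python
def augment(matrix, new_term, classification, relevant_docs):
    """Return a copy of matrix with an element for (new_term, classification).

    matrix maps term -> {classification: rating}. relevant_docs are the
    texts considered definitely relevant to the given classification.
    Assumed formula: occurrences of new_term / total words in those texts.
    """
    occurrences = sum(doc.lower().split().count(new_term) for doc in relevant_docs)
    total_words = sum(len(doc.split()) for doc in relevant_docs)
    rating = occurrences / total_words if total_words else 0.0

    # Copy the rows so the original judgment matrix is left untouched.
    augmented = {term: dict(row) for term, row in matrix.items()}
    classes = {c for row in matrix.values() for c in row}
    # A brand-new term starts with "no relevance" for every classification.
    new_row = augmented.setdefault(new_term, {c: 0.0 for c in classes})
    new_row[classification] = rating
    return augmented


# Hypothetical usage: learn a rating for "stent" under "cardiology".
base = {"artery": {"cardiology": 0.9, "software": 0.0}}
cardiology_docs = ["stent placed in artery", "stent stent graft"]
bigger = augment(base, "stent", "cardiology", cardiology_docs)
print(bigger["stent"])
```

Returning a copy rather than mutating the input keeps the original segmented matrix available for comparison, which is a design choice of this sketch and not something the claim requires.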
12. A computer program product comprising instructions operable to cause data processing apparatus to:
receive a segmented judgment matrix, the segmented judgment matrix being a numerical matrix pairing each of a set of terms to each of a set of subject matter classifications, each term being a word or phrase, the segmented judgment matrix having a plurality of information submatrices, each element of each information submatrix representing a rating of a relevance of the term of the element to the subject matter classification of the element, each information submatrix being a numerical matrix representing the relevance of each of a subset of the set of terms to each of a subset of the set of subject matter classifications, the information submatrices forming a disjoint set in which no two information submatrices have any row or any column of the segmented judgment matrix in common, and each element of the segmented judgment matrix not contained in one of the information submatrices having a rating indicating an absence of relevance of the corresponding term to the corresponding subject matter classification; and
use the segmented judgment matrix to calculate an information spectrum.
receive a search request;
use the segmented judgment matrix to calculate an information spectrum of the search request;
use the segmented judgment matrix to calculate an information spectrum for each of a plurality of documents; and
identify at least some documents of the plurality of documents as relevant to the search request based upon a comparison of the calculated information spectrums.
16. The product of claim 15 wherein:
each information submatrix has a plurality of subject matter classifications and a plurality of terms relevant to each subject matter classification; and
the instructions to use the segmented judgment matrix to calculate an information spectrum for each of a plurality of documents comprise instructions to calculate an information spectrum for each of the plurality of documents based upon at least some of the plurality of terms;
the product further comprising instructions to:
select the plurality of terms based upon a relevance of each term of the plurality of terms to at least some of the subject matter classifications of the information submatrices.
17. The product of claim 15 wherein the instructions to calculate an information spectrum for each document and for the search request further comprise instructions to determine a log average among the ratings of relevance of the terms for each subject matter classification.
18. The product of claim 15 wherein the instructions to identify at least some documents further comprise instructions to determine a distance between the information spectrum of the at least some documents and the information spectrum of the search request.
19. The product of claim 15 further comprising instructions to:
select a document of the identified documents as definitely relevant to the search request including instructions to calculate an information spectrum of the selected document; and
use the calculated information spectrum of the selected document as a new search request.
20. The product of claim 15 further comprising instructions to:
zoom in on a portion of a document information spectrum; and
determine that a document and request have a wide spectrum with significant content in a field F of a term and measure the request and document using a subengine for field F.
21. The product of claim 12, wherein the instructions to use the segmented judgment matrix to calculate an information spectrum comprise instructions to:
use the segmented judgment matrix and a collection of documents to create an augmented judgment matrix, the augmented judgment matrix having a matrix element for a new term and an existing subject matter classification, the matrix element value being a relevance value calculated from the number of occurrences of the new term in all documents in the collection considered definitely relevant to the subject matter classification.
22. The product of claim 15, wherein instructions to calculate an information spectrum for each document and for the search request further comprise instructions to determine a value for each subject matter classification by mathematically combining the ratings of the terms found in the document or request.
23. A computer program product for processing text information, the product comprising instructions operable to cause data processing apparatus to perform the operations of:
receiving a judgment matrix that is segmented into a plurality of information submatrices where each submatrix has a plurality of subject matter classifications and a plurality of terms relevant to each subject matter classification;
evaluating a relevance of each term of the plurality of terms with respect to each subject matter classification of each information submatrix of the information submatrices;
calculating an information spectrum for each of a plurality of documents based upon at least some of the plurality of terms;
receiving a search request;
calculating an information spectrum of the search request based upon at least some of the plurality of terms; and
identifying at least some documents of the plurality of documents as relevant to the request based upon a comparison of the calculated information spectrums.
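The search flow recited in the claims (spectrum of the request, spectrum of each document, comparison by distance, and reuse of a selected document as a new request) can be sketched in a few lines of Python. Several points are assumptions rather than statements of the claimed method: "log average" is read here as a geometric mean over strictly positive ratings, the distance is Euclidean, terms are single words, and all names and ratings (`MATRIX`, `CLASSES`, `DOCS`, `rank`) are hypothetical.

```python
import math

# Hypothetical ratings: term -> {classification: rating}; 0.0 means "not relevant".
MATRIX = {
    "aspirin":  {"cardiology": 0.5, "software": 0.0},
    "artery":   {"cardiology": 2.0, "software": 0.0},
    "compiler": {"cardiology": 0.0, "software": 4.0},
}
CLASSES = ["cardiology", "software"]

DOCS = {
    "d1": "aspirin thins blood in the artery",
    "d2": "the compiler emits code",
    "d3": "artery disease",
}


def log_average(values):
    """Geometric mean of the strictly positive values, or 0.0 if there are none."""
    positive = [v for v in values if v > 0]
    if not positive:
        return 0.0
    return math.exp(sum(math.log(v) for v in positive) / len(positive))


def spectrum(text):
    """One value per classification, from the ratings of the terms found."""
    words = set(text.lower().split())
    found = [t for t in MATRIX if t in words]
    return [log_average(MATRIX[t][c] for t in found) for c in CLASSES]


def distance(a, b):
    """Euclidean distance between two spectra (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank(target, documents, limit=2):
    """Return the ids of the documents whose spectra are closest to target."""
    ordered = sorted(documents, key=lambda d: distance(spectrum(documents[d]), target))
    return ordered[:limit]


# Initial search from a text request.
hits = rank(spectrum("aspirin artery"), DOCS)
print(hits)

# Feedback: treat a selected document's spectrum as the new request.
print(rank(spectrum(DOCS["d3"]), DOCS))
```

Since requests and documents are both reduced to spectra over the same classifications, the feedback step needs no special handling: a document's spectrum is passed to `rank` exactly as a request's spectrum would be.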
Specification