Query construction for semantic topic indexes derived by non-negative matrix factorization
Abstract
A method, apparatus and machine-readable medium analyze documents processed by non-negative matrix factorization in accordance with semantic topics. Users construct queries by assigning weights to semantic topics to order documents within a set. The query may be refined in accordance with the user's evaluation of the efficacy of the query. Any document that does not result in data indicative of significant correlation with at least one semantic topic is flagged so that a user may make a manual review. The collection of semantic topics may be continually or periodically updated in response to new documents. Additionally, the collection may also be “downdated” to drop semantic factors no longer appearing in new documents received after an initial set has been analyzed. Different sets of semantic topics may be generated and each document evaluated using each set. Reports may be prepared showing results for a body of documents for each of a plurality of sets of semantic topics.
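The flagging step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how "significant correlation" is measured, so the cosine similarity used here, the function names `cosine` and `flag_for_review`, and the `threshold` value are all assumptions made for illustration. `A` is taken to be a term-document matrix (rows are terms, columns are documents) and `W` a term-by-topic matrix whose columns are the semantic topics.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def flag_for_review(A, W, threshold=0.3):
    """Return indices of documents whose best topic similarity is below threshold.

    A: list of rows, A[i][j] = weight of term i in document j.
    W: list of rows, W[i][r] = weight of term i in topic r.
    threshold: hypothetical cut-off, not taken from the source.
    """
    documents = list(zip(*A))  # one term vector per document (columns of A)
    topics = list(zip(*W))     # one term vector per topic (columns of W)
    flagged = []
    for j, doc in enumerate(documents):
        best = max(cosine(doc, topic) for topic in topics)
        if best < threshold:
            flagged.append(j)
    return flagged
```

A document whose terms overlap with no topic gets a best similarity of zero and is returned for manual review; the threshold would need tuning for any real collection.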
53 Citations
24 Claims
1. A method of evaluating a body of documents, comprising:
- parsing the body of documents into a term-document matrix A of values aij, where aij=a function of the number of times the term i appears in document j;
- factoring the matrix A into a product W*H using non-negative matrix factorization, where W represents semantic topics contained in the body of documents and wherein each column of H contains an encoding of a linear combination of the semantic topics that approximates a corresponding column of A; and
- constructing queries by weighting semantic topics to order the documents in accordance with relevance to the queries.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
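The three steps recited in claim 1 (parse into A, factor A into W*H, rank documents by a weighted combination of topics) can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: raw term counts stand in for "a function of the number of times the term i appears in document j", the factorization uses the well-known Lee-Seung multiplicative updates (the claim does not name a specific NMF algorithm), and a document's score is taken to be the dot product of the query's topic weights with that document's column of H. All function names and parameters are hypothetical.

```python
import random
from collections import Counter


def term_document_matrix(docs):
    """Return (terms, A) with A[i][j] = count of term i in document j (assumed weighting)."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    terms = sorted(set().union(*counts))
    A = [[float(c[t]) for c in counts] for t in terms]
    return terms, A


def transpose(X):
    return [list(col) for col in zip(*X)]


def matmul(X, Y):
    cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]


def reconstruction_error(A, W, H):
    """Squared Frobenius norm of A - W*H."""
    WH = matmul(W, H)
    return sum((a - b) ** 2 for ra, rb in zip(A, WH) for a, b in zip(ra, rb))


def nmf(A, k, iters=200, seed=0, eps=1e-9):
    """Factor A (m x n) into non-negative W (m x k) and H (k x n)."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T A) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[H[r][j] * num[r][j] / (den[r][j] + eps) for j in range(n)]
             for r in range(k)]
        # W <- W * (A H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][r] * num[i][r] / (den[i][r] + eps) for r in range(k)]
             for i in range(m)]
    return W, H


def rank_documents(H, topic_weights):
    """Order document indices by the weighted sum of their topic encodings."""
    n = len(H[0])
    scores = [sum(w * H[r][j] for r, w in enumerate(topic_weights))
              for j in range(n)]
    order = sorted(range(n), key=lambda j: scores[j], reverse=True)
    return order, scores


if __name__ == "__main__":
    docs = ["apple banana apple", "banana apple fruit",
            "engine wheel engine", "wheel engine car"]
    terms, A = term_document_matrix(docs)
    W, H = nmf(A, k=2)
    order, scores = rank_documents(H, [1.0, 0.0])  # query: topic 0 only
    print(order, [round(s, 3) for s in scores])
```

Which topic index corresponds to which subject depends on the random initialization, so a user would normally inspect the largest entries in each column of W before choosing weights.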
9. A machine-readable medium that provides instructions, which when executed by a processor, causes said processor to perform operations comprising:
- parsing a body of documents into a term-document matrix A of values aij, where aij=a function of the number of times the term i appears in document j;
- factoring the matrix A into a product W*H using non-negative matrix factorization, where W represents semantic topics contained in the body of documents and wherein each column of H contains an encoding of a linear combination of the semantic topics that approximates a corresponding column of A; and
- constructing queries by weighting semantic topics to order the documents in accordance with relevance to the queries.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
17. A system to evaluate a body of documents, comprising:
- a reader and processor parsing the body of documents into a term-document matrix A of values aij, where aij=a function of the number of times the term i appears in document j;
- said processor factoring the matrix A into a product W*H using non-negative matrix factorization, where W represents semantic topics contained in the body of documents and wherein each column of H contains an encoding of a linear combination of the semantic topics that approximates a corresponding column of A; and
- said processor constructing queries by weighting semantic topics to order the documents in accordance with relevance to the queries.
Dependent claims: 18, 19, 20, 21, 22, 23, 24
Specification