Knowledge discovery from citation networks
First Claim
1. A method for processing a corpus of documents having a multi-level transitive linkage structure with other documents, comprising:
- providing a computer-implemented generative model based on at least parameters estimated by probabilistic inference, which models a content of each document in the document database based on at least (i) an intrinsic content of each respective document; and
(ii) a content of related documents linked to each respective document through the multi-level transitive linkage structure, the computer-implemented generative model representing the content of each respective document as at least a mixture over latent topics having topic distributions which are a mixture of distributions associated with the related documents comprising at least a mixture weighting of the intrinsic content of each respective document and the content of related documents linked to each respective document through the multi-level transitive linkage structure;
at least one of representing, characterizing, clustering, summarizing, indexing, ranking, and searching the documents in the corpus of documents with at least one automated processor based on at least the computer-implemented generative model; and
controlling a human machine interface of an information processing system to receive a user input, and selectively dependent on the user input and the at least one of representing, characterizing, clustering, summarizing, indexing, ranking, and searching of the documents in the corpus of documents, to output a representation of a relationship of a plurality of respective documents.
Abstract
In a corpus of scientific articles such as a digital library, documents are connected by citations, and each document plays two different roles in the corpus: as a document itself and as a citation of other documents. A Bernoulli Process Topic (BPT) model is provided which models the corpus at two levels: the document level and the citation level. In the BPT model, each document has two different representations in the latent topic space, one for each of its roles. Moreover, the multi-level hierarchical structure of the citation network is captured by a generative process involving a Bernoulli process. The distribution parameters of the BPT model are estimated by a variational approximation approach.
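The idea of mixing a document's own topics with those of documents reached through several levels of citations can be illustrated with a small sketch. This is not the patented formulation: the function names `transitive_weights` and `mix_topics`, the uniform choice among citations, the fixed stop probability `1 - beta`, the depth truncation, and the hand-picked topic vectors are all assumptions made for illustration. In the BPT model the parameters are estimated by variational approximation rather than set by hand.

```python
def transitive_weights(citations, beta, depth):
    """Weight each document by how likely a citation walk is to stop on it.

    citations[d] lists the documents cited by d. At every step the walk
    stops with probability 1 - beta (a Bernoulli trial); otherwise it moves
    to one of the current document's citations, chosen uniformly (an
    assumption of this sketch). Documents without citations keep the walk
    in place. The walk is truncated after `depth` steps.
    """
    n = len(citations)
    weights = []
    for start in range(n):
        reach = [0.0] * n   # probability the walk is currently at each document
        reach[start] = 1.0
        total = [0.0] * n   # probability the walk has stopped at each document
        for _ in range(depth):
            nxt = [0.0] * n
            for doc, p in enumerate(reach):
                if p == 0.0:
                    continue
                total[doc] += (1.0 - beta) * p
                targets = citations[doc] or [doc]
                for t in targets:
                    nxt[t] += beta * p / len(targets)
            reach = nxt
        # Leftover mass from truncation is assigned to the walk's current position.
        for doc, p in enumerate(reach):
            total[doc] += p
        weights.append(total)
    return weights


def mix_topics(weights, theta):
    """Combine per-document topic vectors using the transitive weights."""
    n_topics = len(theta[0])
    return [
        [sum(w[j] * theta[j][t] for j in range(len(theta))) for t in range(n_topics)]
        for w in weights
    ]


# Hypothetical corpus: document 0 cites 1, document 1 cites 2, document 2 cites nothing.
citations = [[1], [2], []]
# Hypothetical citation-level topic vectors over two latent topics.
theta = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

weights = transitive_weights(citations, beta=0.5, depth=10)
doc_topics = mix_topics(weights, theta)
```

In this toy corpus, document 0 keeps half of the weight for itself, and the remainder is spread over documents 1 and 2, which it reaches directly and transitively. A larger `beta` shifts more weight toward documents deeper in the citation chain.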
20 Claims
1. A method for processing a corpus of documents having a multi-level transitive linkage structure with other documents, comprising:
providing a computer-implemented generative model based on at least parameters estimated by probabilistic inference, which models a content of each document in the document database based on at least (i) an intrinsic content of each respective document; and (ii) a content of related documents linked to each respective document through the multi-level transitive linkage structure, the computer-implemented generative model representing the content of each respective document as at least a mixture over latent topics having topic distributions which are a mixture of distributions associated with the related documents comprising at least a mixture weighting of the intrinsic content of each respective document and the content of related documents linked to each respective document through the multi-level transitive linkage structure;
at least one of representing, characterizing, clustering, summarizing, indexing, ranking, and searching the documents in the corpus of documents with at least one automated processor based on at least the computer-implemented generative model; and
controlling a human machine interface of an information processing system to receive a user input, and selectively dependent on the user input and the at least one of representing, characterizing, clustering, summarizing, indexing, ranking, and searching of the documents in the corpus of documents, to output a representation of a relationship of a plurality of respective documents.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A method for processing a set of documents having a multi-level transitive linkage structure with other documents within the set of documents, comprising:
generating a probabilistic inference model of a content of each document of the set of documents, based on at least: (i) an intrinsic content of each document, (ii) a content of related documents linked to each document through the multi-level transitive linkage structure; and (iii) a mixture weighting of the intrinsic content of each document and the content of the related documents linked to each document through the multi-level transitive linkage structure, to represent the content of each document as at least a mixture over latent topics having topic distributions which are a mixture of distributions associated with the related documents linked to each document through the multi-level transitive linkage structure;
at least one of representing, characterizing, clustering, summarizing, indexing, ranking, and searching the documents in the corpus of documents with at least one automated processor based on at least the computer-implemented generative model; and
controlling a human machine interface of an information processing system to receive a user input, and selectively dependent on the user input and the at least one of representing, characterizing, clustering, summarizing, indexing, ranking, and searching of the documents in the corpus of documents, to output a representation of a relationship of a plurality of respective documents.
13. A system for processing a corpus of documents having a multi-level transitive linkage structure with other documents, comprising:
a memory configured to store a computer-implemented generative model dependent on at least parameters estimated by probabilistic inference, which models a content of each respective document in the document database as at least a mixture weighting of an intrinsic content of each respective document and a content of documents related to the respective document through multi-level transitive linkage structure, to represent the content of each respective document as at least a mixture over latent topics having topic distributions which are a mixture of distributions associated with documents related to the respective document through multi-level transitive linkage structure;
a human-machine interface; and
at least one automated processor configured to:
at least one of represent, characterize, cluster, summarize, index, rank, and search the documents in the corpus of documents with at least one automated processor based on at least the computer-implemented generative model;
control the human machine interface to receive a user input; and
selectively dependent on the user input and the at least one of represent, characterize, cluster, summarize, index, rank, and search of the documents in the corpus of documents, output a representation of a relationship of a plurality of respective documents.
Dependent claims: 14, 15, 16, 17, 18, 19, 20
Specification