Knowledge discovery from citation networks

  • US 8,630,975 B1
  • Filed: 12/02/2011
  • Issued: 01/14/2014
  • Est. Priority Date: 12/06/2010
  • Status: Active Grant
First Claim

1. A method for characterizing a corpus of documents each having one or more references, comprising:

    identifying a network of multilevel hierarchically related documents having direct references, and indirect references, wherein the references are associated with content relationships;

    for each respective document, determining a first set of latent topic characteristics based on an intrinsic content of the respective document;

    for each document, determining a second set of latent topic characteristics based on a respective content of other documents which are referenced directly and indirectly through at least one other document to the respective document, the indirectly referenced documents contributing transitively to the latent topic characteristics of the respective document;

    representing a set of latent topics for the respective document based on a joint probability distribution of at least the first and second sets of latent topic characteristics, dependent on the identified network, wherein the contributions of at least the second set of latent topic characteristics are determined by an iterative process, wherein the represented set of latent topics is modeled at both a document level and a reference level, by differentiating the two different levels and the multilevel hierarchical network which is captured by a Bernoulli random process; and

    storing, in a memory, the represented set of latent topics for the respective document.
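The transitive contribution of indirectly referenced documents can be illustrated with a small sketch. This is not the patented method or its generative model; it is a simplified illustration under stated assumptions: each document already has an intrinsic topic distribution, and a single hypothetical parameter `follow_prob` stands in for the Bernoulli choice between a document's own content and the content of the documents it cites. The function name `propagate_topics` and all data below are invented for illustration.

```python
def propagate_topics(intrinsic, cites, follow_prob=0.5, iterations=50):
    """Iteratively mix each document's own topics with those of its references.

    intrinsic:   dict mapping document id -> list of topic probabilities
    cites:       dict mapping document id -> list of cited document ids
    follow_prob: assumed Bernoulli parameter; probability of drawing a topic
                 from the cited documents rather than the document itself
    """
    mixed = {doc: list(dist) for doc, dist in intrinsic.items()}
    for _ in range(iterations):
        updated = {}
        for doc, own in intrinsic.items():
            refs = [r for r in cites.get(doc, []) if r in mixed]
            if not refs:
                # A document with no references keeps its intrinsic topics.
                updated[doc] = list(own)
                continue
            n_topics = len(own)
            # Average the current mixed distributions of the cited documents.
            avg = [sum(mixed[r][t] for r in refs) / len(refs)
                   for t in range(n_topics)]
            updated[doc] = [(1 - follow_prob) * own[t] + follow_prob * avg[t]
                            for t in range(n_topics)]
        mixed = updated
    return mixed


# Hypothetical three-document chain: A cites B, and B cites C.
intrinsic = {"A": [1.0, 0.0, 0.0], "B": [0.0, 1.0, 0.0], "C": [0.0, 0.0, 1.0]}
cites = {"A": ["B"], "B": ["C"]}
result = propagate_topics(intrinsic, cites)
```

In this toy chain, document A never cites C directly, yet C's topic acquires nonzero weight in A's distribution because it reaches A through B. That is the kind of indirect contribution the claim describes, although the claim itself specifies a joint probability distribution over latent topics rather than this simple averaging.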
