Information mining using domain specific conceptual structures

  • US 8,805,843 B2
  • Filed: 06/03/2008
  • Issued: 08/12/2014
  • Est. Priority Date: 02/13/2007
  • Status: Active Grant
First Claim

1. A method executed by a computer processor and stored on a computer readable medium for use with a first set of documents related to a first topic of interest and a second set of documents related to a second topic of interest, the method comprising the steps of:

  • automatically generating a first taxonomy through a feature space derived from the first set of documents, wherein the feature space includes at least one of unstructured data, structured data, and annotations derived from text of the first set of documents, and the first taxonomy provides a first partition of the set of documents according to the taxonomy;

    using domain-specific knowledge to re-partition the first set of documents to provide a second partition of the first set of documents;

    creating a refined taxonomy for the first set of documents according to the second partition so that the refined taxonomy incorporates the domain specific knowledge;

    using the refined taxonomy to categorize the first set of documents into a first set of categories;

    creating a second set of categories of the first set of documents, wherein the second set of categories are independent of the second partition based on at least one of unstructured data, structured data, and annotations derived from text in the first set of documents;

    constructing a contingency table having the first set of categories along a first axis and the second set of categories along a second axis, wherein the contingency table includes cells having respective actual values and for which respective expected values are computed, and the contingency table includes a cell having trending information;

    displaying the first set of categories along a first axis and the second set of categories along a second axis on a display device;

    comparing the expected value against the actual value of a cell to identify a category of interest;

    computing a degree of significance for the actual value of the cell;

    identifying a relationship between at least two different categories using the contingency table;

    using the contingency table and trending information to identify a recent category with respect to some pre-determined date;

    using an element of domain knowledge to re-categorize the first set of documents;

    categorizing the second set of documents according to the first set of categories of the first set of documents, further including categorizing the second set of documents according to a criterion chosen from the group consisting of:

    text within the second set of documents, structure within the second set of documents, and annotations derived from text within the second set of documents;

    examining the first set of categories to identify correlations between categories; and

    examining a category of the first set of categories to identify a document of interest, the document of interest being a representative document within the category.
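
The contingency-table steps recited above (actual versus expected cell values, and a degree of significance for a cell) might be sketched as follows. This is an illustration only, not the patented implementation. The claim does not name a statistic, so the sketch assumes the expected value of a cell is the one implied by independence of the two category sets (row total times column total, divided by the number of documents) and uses a standardized residual as the significance score. The function names `contingency_table` and `cells_of_interest`, and the default threshold, are hypothetical.

```python
from collections import Counter
from math import sqrt


def contingency_table(labels_a, labels_b):
    """Build a contingency table from two parallel label lists.

    labels_a[i] and labels_b[i] are the categories assigned to document i
    under the first and second categorization respectively.
    Returns {(a, b): (actual, expected, score)}.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both label lists must cover the same documents")

    n = len(labels_a)
    actual = Counter(zip(labels_a, labels_b))  # observed count per cell
    row_totals = Counter(labels_a)             # documents per first-axis category
    col_totals = Counter(labels_b)             # documents per second-axis category

    table = {}
    for a, row_total in row_totals.items():
        for b, col_total in col_totals.items():
            observed = actual.get((a, b), 0)
            # Assumption: expected count if the two categorizations were independent.
            expected = row_total * col_total / n
            # Assumption: standardized residual as the "degree of significance".
            score = (observed - expected) / sqrt(expected)
            table[(a, b)] = (observed, expected, score)
    return table


def cells_of_interest(table, threshold=2.0):
    """Return cells whose actual count exceeds the expected count by at least
    `threshold` standardized residuals, sorted by cell key."""
    return sorted(cell for cell, (_, _, score) in table.items() if score >= threshold)
```

Under these assumptions, a cell with a large positive score marks a pair of categories that co-occur more often than chance would suggest, which is one way to read "identifying a relationship between at least two different categories". If the second axis holds date buckets, the same table can also surface categories that are over-represented after a chosen date.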
