Methods and systems for classifying data using a hierarchical taxonomy

  • US 9,367,814 B1
  • Filed: 06/22/2012
  • Issued: 06/14/2016
  • Est. Priority Date: 12/27/2011
  • Status: Expired due to Fees
First Claim

1. A computer-implemented method including executing instructions stored on a computer-readable medium, the method comprising:

  • generating a set of document classifiers by applying a classification algorithm to a trusted corpus, wherein the trusted corpus includes a set of training documents representing a hierarchical taxonomy, the hierarchical taxonomy including a hierarchical tree structure of domain specific issues that includes multiple levels of issue categories, subcategories, and sub issues of each issue, the trusted corpus further includes previously classified documents associated with a classification confidence level above a predetermined confidence level threshold;

    executing one or more of the generated document classifiers against a first plurality of input documents to create a first plurality of classified documents, wherein each classified document is associated with a classification within the taxonomy and a classification confidence level;

    selecting one or more classified documents that are associated with a classification confidence level below the predetermined confidence level threshold to create a set of low-confidence documents;

    disassociating the low-confidence documents from each of the associated classifications;

    prompting a user to enter a new classification within the hierarchical taxonomy for at least one low-confidence document, wherein the low-confidence document is associated with the entered classification and with a predetermined confidence level to create a newly classified document in at least one of the multiple levels of issue categories, subcategories, and sub issues of each issue of the hierarchical taxonomy;

    applying a highest classification confidence level to the newly classified document;

    including the newly classified document in the trusted corpus to create an updated trusted corpus; and

    executing one or more of the generated document classifiers, by applying the classification algorithm to the updated trusted corpus against a second plurality of input documents to create a second plurality of classified documents, wherein each classified document is associated with a classification within the taxonomy and a classification confidence level.
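
The steps recited in the claim can be sketched as a small review loop. This is only an illustrative sketch under stated assumptions, not the patented implementation: the claim does not name a classification algorithm, so a toy word-overlap scorer stands in for it, and the names `Doc`, `train`, `classify`, `review_round`, `ask_user`, the example taxonomy paths, and the threshold value of 0.5 are all hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical representation: a classification is a path in the taxonomy tree,
# e.g. ("Billing", "Refunds") for category -> subcategory.
Label = Tuple[str, ...]
HIGHEST_CONFIDENCE = 1.0  # assumed scale: confidence levels lie in [0, 1]


@dataclass
class Doc:
    text: str
    label: Optional[Label] = None
    confidence: float = 0.0


def train(trusted: List[Doc]) -> Dict[Label, Counter]:
    """Toy stand-in for the classification algorithm: one word profile per label."""
    profiles: Dict[Label, Counter] = defaultdict(Counter)
    for doc in trusted:
        profiles[doc.label].update(doc.text.lower().split())
    return dict(profiles)


def classify(profiles: Dict[Label, Counter], docs: List[Doc]) -> List[Doc]:
    """Attach a classification and a confidence level to each input document."""
    for doc in docs:
        words = doc.text.lower().split()
        # Confidence = fraction of the document's words seen under that label.
        scores = {
            label: sum(w in profile for w in words) / max(len(words), 1)
            for label, profile in profiles.items()
        }
        best = max(scores, key=scores.get)
        doc.label, doc.confidence = best, scores[best]
    return docs


def review_round(
    trusted: List[Doc],
    batch: List[Doc],
    threshold: float,
    ask_user: Callable[[Doc], Label],
) -> List[Doc]:
    """One pass of the claimed loop; returns the updated trusted corpus."""
    classified = classify(train(trusted), batch)
    low_confidence = [d for d in classified if d.confidence < threshold]
    for doc in low_confidence:
        doc.label, doc.confidence = None, 0.0   # disassociate the classification
        doc.label = ask_user(doc)               # user enters a new classification
        doc.confidence = HIGHEST_CONFIDENCE     # apply the highest confidence level
    return trusted + low_confidence


trusted_corpus = [
    Doc("refund charge invoice", ("Billing", "Refunds"), HIGHEST_CONFIDENCE),
    Doc("password login reset", ("Account", "Login"), HIGHEST_CONFIDENCE),
]
first_batch = [Doc("refund invoice charge"), Doc("router keeps dropping wifi")]

# The lambda stands in for an interactive prompt to a human reviewer.
updated_corpus = review_round(
    trusted_corpus, first_batch, threshold=0.5,
    ask_user=lambda doc: ("Network", "Wifi"),
)

# Re-apply the algorithm to the updated corpus and classify a second batch.
second_batch = classify(train(updated_corpus), [Doc("wifi router offline")])
```

In this sketch the second batch is classified by classifiers rebuilt from the updated corpus, so a document about a topic that was unknown in the first pass can land in the category the reviewer supplied.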
