System and method for text categorization based on ontologies

  • US 8,782,051 B2
  • Filed: 04/26/2013
  • Issued: 07/15/2014
  • Est. Priority Date: 02/07/2012
  • Status: Expired due to Fees
First Claim

1. A system for text categorization based on ontologies, the system comprising:

    a plurality of data collector software modules stored and operating on a plurality of network-attached computers;

    a categorizer software module stored and operating on a network-attached server computer; and

    a database server comprising an indexed database of documents and their categorizations, and further comprising a plurality of ontologies, each ontology comprising a plurality of hierarchical taxonomies and each hierarchical taxonomy comprising a plurality of taxons;

    wherein the data collector software modules receive a text to be classified and submit the received text to the categorizer software module; and

    further wherein the categorizer performs the following steps to categorize the received text:

    splitting the text into sentences;

    selecting words or phrases from the sentences of the received text that are present in one or more of the plurality of ontologies stored in the database server;

    determining one or more specific subcategories that the sentence corresponds to in view of pattern analysis of the selected words or phrases;

    selecting a plurality of subtrees from the plurality of ontologies based on the determination of the selected words or phrases belonging to one or more specific subcategories;

    determining a weight for each subcategory within the one or more of the specific subcategories;

    using the selected plurality of subtrees to create at least one modified subtree by eliminating from the selected plurality of subtrees subcategories having a category weight below a threshold;

    for each of the at least one modified subtree, computing a conditionality coefficient to make a Boolean determination of whether to consider the respective modified subtree or not in categorization of the text; and

    using any modified subtree that has been determined to be considered in categorization of the text to categorize the text;

    wherein the conditionality coefficient is determined at least in part by user-defined rules or the presence, or absence, of diagnostics in nodes of at least one neighboring subtree.
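
The categorizer steps recited in the claim can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the toy `ONTOLOGY`, the function names, the use of simple term overlap as a stand-in for "pattern analysis", the weight defined as a count of matched terms, and the conditionality coefficient defined as a subtree's share of weight relative to its strongest neighboring subtree, compared against a user-defined minimum.

```python
import re
from collections import defaultdict

# Hypothetical toy ontology: taxonomy root -> subcategory (taxon) -> terms.
ONTOLOGY = {
    "sports": {
        "football": {"goal", "striker", "penalty"},
        "tennis": {"serve", "racket"},
    },
    "finance": {
        "banking": {"loan", "penalty"},
        "markets": {"stock", "bond"},
    },
}


def split_sentences(text):
    """Split the text into sentences on terminal punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def select_terms(sentence):
    """Keep only the words of a sentence that occur in the ontology."""
    known = set().union(
        *(vocab for subs in ONTOLOGY.values() for vocab in subs.values())
    )
    return set(re.findall(r"[a-z]+", sentence.lower())) & known


def weigh_subcategories(sentences):
    """Assumed weighting: count matched terms per (root, subcategory)."""
    weights = defaultdict(int)
    for sentence in sentences:
        terms = select_terms(sentence)
        for root, subs in ONTOLOGY.items():
            for sub, vocab in subs.items():
                weights[(root, sub)] += len(terms & vocab)
    return weights


def prune(weights, threshold):
    """Build modified subtrees by dropping subcategories below the threshold."""
    subtrees = defaultdict(dict)
    for (root, sub), weight in weights.items():
        if weight >= threshold:
            subtrees[root][sub] = weight
    return dict(subtrees)


def conditionality(root, subtrees):
    """Assumed coefficient: own weight relative to the strongest neighbor."""
    own = sum(subtrees[root].values())
    neighbors = [sum(s.values()) for name, s in subtrees.items() if name != root]
    strongest = max(neighbors, default=0)
    return own / (own + strongest) if own else 0.0


def categorize(text, threshold=1, min_coefficient=0.5):
    """Return the (root, subcategory) pairs of every subtree that is kept."""
    weights = weigh_subcategories(split_sentences(text))
    subtrees = prune(weights, threshold)
    # Boolean determination: min_coefficient plays the role of a user-defined rule.
    kept = [r for r in subtrees if conditionality(r, subtrees) >= min_coefficient]
    return sorted((root, sub) for root in kept for sub in subtrees[root])


if __name__ == "__main__":
    sample = "The striker scored a goal. He took a penalty after the foul. The bank stock fell."
    print(categorize(sample))  # [('sports', 'football')]
```

In this sketch the word "penalty" matches both the football and banking taxons, so both subtrees survive pruning. The finance subtree is then excluded because its coefficient (0.4) falls below the assumed minimum of 0.5, which loosely mirrors how the claim lets neighboring subtrees influence the Boolean decision.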
