
Method and system for maximum-informativeness information extraction using a domain-specific ontology

  • US 8,176,048 B2
  • Filed: 11/10/2009
  • Issued: 05/08/2012
  • Est. Priority Date: 11/10/2009
  • Status: Active Grant
First Claim

1. A method for transforming unstructured text into structured data using a domain-specific ontology, the method comprising:

    recording the unstructured text using an information extraction module (IEM);

    discovering text phrases contained in the recorded unstructured text via the IEM;

    retrieving lexical data from a knowledge source using the IEM and the discovered text phrases;

    processing each of the discovered text phrases and the lexical data using the IEM to thereby generate a plurality of nodes in the domain-specific ontology, wherein each of the plurality of nodes represents a corresponding single concept as a cluster of synonyms;

    using the plurality of generated nodes to classify the discovered text phrases by corresponding objects of interest, thereby transforming the unstructured text into structured data;

    generating a list of sub-phrases of the discovered text phrases;

    mapping each sub-phrase in the generated list of sub-phrases into the domain-specific ontology via the IEM;

    using an informativeness function to quantify each of the sub-phrases of the discovered text phrases by a normalized relative importance informativeness score of between 0 and 1; and

    eliminating all sub-phrases from the domain-specific ontology that have an informativeness score that is less than a calibrated threshold.
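The claimed steps can be illustrated with a minimal Python sketch. It makes several assumptions that are not in the claim. The knowledge source is a tiny hypothetical in-memory synonym dictionary (`KNOWLEDGE_SOURCE`). Phrase discovery is a naive split at punctuation. The objects of interest are hypothetical seed words. The claim only requires a score normalized to between 0 and 1, so the informativeness function here is a swapped-in inverse-concept-frequency score divided by its maximum. The threshold of 0.25 is arbitrary rather than calibrated. All function names are illustrative, not the patentee's implementation.

```python
import math
import re
from collections import defaultdict

# Hypothetical knowledge source (word -> synonym set). A real IEM would query a
# lexical resource; these entries are illustrative assumptions only.
KNOWLEDGE_SOURCE = {
    "car": {"car", "automobile", "auto"},
    "automobile": {"car", "automobile", "auto"},
    "engine": {"engine", "motor"},
    "motor": {"engine", "motor"},
    "leak": {"leak", "leakage"},
    "oil": {"oil"},
    "red": {"red"},
}


def discover_phrases(text):
    """Naive phrase discovery: split at punctuation, keep lowercase word tokens."""
    phrases = []
    for chunk in re.split(r"[.,;:!?]", text.lower()):
        tokens = tuple(re.findall(r"[a-z]+", chunk))
        if tokens:
            phrases.append(tokens)
    return phrases


def build_nodes(phrases, source):
    """Map each known word to its node: one concept stored as a synonym cluster."""
    return {w: frozenset(source[w]) for p in phrases for w in p if w in source}


def classify(phrases, word_to_node, objects_of_interest):
    """Label each phrase with every object of interest whose concept node it mentions."""
    labels = {}
    for phrase in phrases:
        nodes = {word_to_node[w] for w in phrase if w in word_to_node}
        labels[phrase] = sorted(
            name for name, seed in objects_of_interest.items()
            if any(seed in node for node in nodes)
        )
    return labels


def sub_phrases(phrase):
    """All contiguous token sub-sequences of a phrase, including the phrase itself."""
    n = len(phrase)
    return [phrase[i:j] for i in range(n) for j in range(i + 1, n + 1)]


def map_sub_phrases(phrases, word_to_node):
    """Map each sub-phrase to the nodes of its known tokens; skip unmappable ones."""
    mapping = {}
    for phrase in phrases:
        for sp in sub_phrases(phrase):
            nodes = tuple(word_to_node[w] for w in sp if w in word_to_node)
            if nodes:
                mapping[sp] = nodes
    return mapping


def informativeness(mapping, phrases, word_to_node):
    """Assumed score: summed inverse concept frequency, divided by the max -> [0, 1]."""
    df = defaultdict(int)  # number of phrases mentioning each concept node
    for phrase in phrases:
        for node in {word_to_node[w] for w in phrase if w in word_to_node}:
            df[node] += 1
    n = len(phrases)
    raw = {
        sp: sum(math.log(1 + n / df[node]) for node in dict.fromkeys(nodes))
        for sp, nodes in mapping.items()
    }
    top = max(raw.values(), default=0.0)
    return {sp: (r / top if top > 0 else 0.0) for sp, r in raw.items()}


def prune(mapping, scores, threshold):
    """Eliminate sub-phrases scoring below the (hypothetical) calibrated threshold."""
    return {sp: nodes for sp, nodes in mapping.items() if scores[sp] >= threshold}


text = "The red car had an oil leak. The automobile engine failed."
phrases = discover_phrases(text)
word_to_node = build_nodes(phrases, KNOWLEDGE_SOURCE)
labels = classify(phrases, word_to_node,
                  {"vehicle": "car", "component": "engine", "fault": "leak"})
mapping = map_sub_phrases(phrases, word_to_node)
scores = informativeness(mapping, phrases, word_to_node)
kept = prune(mapping, scores, threshold=0.25)

for phrase, names in labels.items():
    print(" ".join(phrase), "->", names)
# the red car had an oil leak -> ['fault', 'vehicle']
# the automobile engine failed -> ['component', 'vehicle']
```

Because "car" and "automobile" resolve to the same synonym-cluster node, both phrases are classified under "vehicle". Under this assumed score, that shared concept is also the least informative. Sub-phrases that map only to it, such as ("car",), fall below the threshold and are eliminated, while ("oil", "leak") is retained.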

  • 8 Assignments