Method and system for maximum-informativeness information extraction using a domain-specific ontology
Abstract
A method transforms unstructured text into structured data in a domain-specific ontology. The method includes recording an input block of text using an information extraction module (IEM), accessing a domain-specific ontology and supplemental data in a knowledge source(s) via the IEM, processing the input text block, and using the IEM to generate a plurality of nodes in the domain-specific ontology. Each node classifies the unstructured text to corresponding objects of interest, thereby transforming the unstructured text into the structured data. An IEM is also provided having a computer device and an algorithm executable thereby to transform unstructured text into structured data in a domain-specific ontology. The IEM is adapted for recording a text phrase using the computer device, accessing and retrieving the domain-specific ontology and supplemental data from a knowledge source(s), and processing the text block using the computer device to generate a plurality of nodes in the domain-specific ontology.
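The node-generation and classification steps summarized above can be illustrated with a minimal sketch. It assumes, following the claims, that each node stores one concept as a cluster of synonyms. The names `OntologyNode`, `build_nodes`, `classify`, and the sample lexical data are hypothetical and do not come from the patent text.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class OntologyNode:
    """One concept stored as a cluster of synonymous phrases (assumed structure)."""
    concept: str
    synonyms: set[str] = field(default_factory=set)


def normalize(phrase: str) -> str:
    """Lowercase and collapse whitespace so surface variants compare equal."""
    return " ".join(phrase.lower().split())


def build_nodes(lexical_data: dict[str, list[str]]) -> list[OntologyNode]:
    """Create one node per concept from a concept -> synonyms mapping."""
    nodes = []
    for concept, synonyms in lexical_data.items():
        cluster = {normalize(concept)} | {normalize(s) for s in synonyms}
        nodes.append(OntologyNode(concept, cluster))
    return nodes


def classify(phrase: str, nodes: list[OntologyNode]) -> str | None:
    """Return the concept whose synonym cluster contains the phrase, else None."""
    key = normalize(phrase)
    for node in nodes:
        if key in node.synonyms:
            return node.concept
    return None


# Illustrative lexical data; the entries are invented for this example.
LEXICAL_DATA = {
    "battery": ["accumulator", "storage cell"],
    "engine": ["motor", "power plant"],
}

nodes = build_nodes(LEXICAL_DATA)
# Structured output: each discovered phrase mapped to an object of interest.
structured = {p: classify(p, nodes) for p in ["Storage  Cell", "motor", "windshield"]}
```

Exact set membership after normalization is the simplest possible matching rule. A real system would likely need fuzzier matching, which this sketch does not attempt.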
19 Citations
16 Claims
1. A method for transforming unstructured text into structured data using a domain-specific ontology, the method comprising:
recording the unstructured text using an information extraction module (IEM);
discovering text phrases contained in the recorded unstructured text via the IEM;
retrieving lexical data from a knowledge source using the IEM and the discovered text phrases;
processing each of the discovered text phrases and the lexical data using the IEM to thereby generate a plurality of nodes in the domain-specific ontology, wherein each of the plurality of nodes represents a corresponding single concept as a cluster of synonyms;
using the plurality of generated nodes to classify the discovered text phrases by corresponding objects of interest, thereby transforming the unstructured text into structured data;
generating a list of sub-phrases of the discovered text phrases;
mapping each sub-phrase in the generated list of sub-phrases into the domain-specific ontology via the IEM;
using an informativeness function to quantify each of the sub-phrases of the discovered text phrases by a normalized relative importance informativeness score of between 0 and 1; and
eliminating all sub-phrases from the domain-specific ontology that have an informativeness score that is less than a calibrated threshold.
Dependent claims: 2, 3, 4, 5, 6.
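The last three steps of claim 1 (sub-phrase generation, scoring, and elimination) can be sketched as follows. The claim does not define the informativeness function, so the scoring rule here is an assumption: a mean inverse document frequency divided by its maximum, which keeps every score between 0 and 1. The function names, the document-frequency table, and the threshold value are all hypothetical.

```python
import math


def sub_phrases(phrase: str) -> list[str]:
    """List every contiguous word sequence of the phrase."""
    words = phrase.lower().split()
    return [
        " ".join(words[i:j])
        for i in range(len(words))
        for j in range(i + 1, len(words) + 1)
    ]


def informativeness(sub_phrase: str, doc_freq: dict[str, int], n_docs: int) -> float:
    """Assumed score in [0, 1]: mean normalized inverse document frequency.

    A word found in every document contributes 0; an unseen word contributes 1.
    Requires n_docs >= 1 and doc_freq values no larger than n_docs.
    """
    words = sub_phrase.split()
    max_idf = math.log(n_docs + 1)
    idfs = [math.log((n_docs + 1) / (doc_freq.get(w, 0) + 1)) for w in words]
    return sum(idfs) / (len(idfs) * max_idf)


def prune(phrase: str, doc_freq: dict[str, int], n_docs: int,
          threshold: float) -> dict[str, float]:
    """Score all sub-phrases and drop those scoring below the threshold."""
    scores = {sp: informativeness(sp, doc_freq, n_docs) for sp in sub_phrases(phrase)}
    return {sp: s for sp, s in scores.items() if s >= threshold}


# Invented corpus statistics: number of documents containing each word.
DOC_FREQ = {"the": 9, "is": 9, "battery": 1, "dead": 2}
N_DOCS = 9
THRESHOLD = 0.4  # stands in for the "calibrated threshold"; the value is arbitrary

kept = prune("the battery is dead", DOC_FREQ, N_DOCS, THRESHOLD)
```

With these invented statistics, sub-phrases dominated by common words such as "the" and "is" score near zero and are dropped, while sub-phrases built around rarer words survive.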
7. A method for transforming unstructured text into structured data using a domain-specific ontology, the method comprising:
inputting the unstructured text into an information extraction module (IEM);
embedding the unstructured text in a domain-specific text archive;
retrieving data from a plurality of different knowledge sources, including the domain-specific text archive, via the IEM using the unstructured text;
processing the unstructured text using the IEM based on the retrieved data to thereby generate a plurality of nodes in the domain-specific ontology, wherein each of the nodes represents a corresponding single concept as a cluster of synonyms for the unstructured text;
transforming the unstructured text into the structured data via the plurality of nodes, including classifying the unstructured text to predetermined corresponding objects of interest;
quantifying all sub-phrases of the classified unstructured text by relative informativeness as a normalized value between 0 and 1 using an informativeness function; and
eliminating all quantified sub-phrases from the domain-specific ontology having a normalized value that is less than a calibrated threshold.
Dependent claims: 8, 9, 10, 11.
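Claim 7 differs from claim 1 mainly in embedding the input text in a domain-specific text archive and then retrieving data from several knowledge sources, the archive among them. The sketch below shows one way that retrieval step might be wired together. It assumes "embedding" simply means storing the text in the archive, and the `KnowledgeSource` interface, the `TextArchive` and `Thesaurus` classes, and the sample data are all hypothetical.

```python
from __future__ import annotations

from typing import Protocol


class KnowledgeSource(Protocol):
    """Assumed common interface: any source that answers a text query."""
    def lookup(self, text: str) -> list[str]: ...


class TextArchive:
    """Domain-specific archive; returns stored texts that share a word with the query."""
    def __init__(self) -> None:
        self.texts: list[str] = []

    def embed(self, text: str) -> None:
        self.texts.append(text)

    def lookup(self, text: str) -> list[str]:
        query = set(text.lower().split())
        return [t for t in self.texts if query & set(t.lower().split())]


class Thesaurus:
    """Lexical source; returns synonyms for each word of the query."""
    def __init__(self, entries: dict[str, list[str]]) -> None:
        self.entries = entries

    def lookup(self, text: str) -> list[str]:
        found: list[str] = []
        for word in text.lower().split():
            found.extend(self.entries.get(word, []))
        return found


def retrieve(text: str, sources: list[KnowledgeSource]) -> list[str]:
    """Query every source in order and merge results, skipping duplicates."""
    results: list[str] = []
    for source in sources:
        for item in source.lookup(text):
            if item not in results:
                results.append(item)
    return results


archive = TextArchive()
archive.embed("engine stalls at idle")   # previously archived text (invented)
text = "engine will not start"
archive.embed(text)                      # embed the new input before retrieval
thesaurus = Thesaurus({"engine": ["motor"]})

retrieved = retrieve(text, [archive, thesaurus])
```

Because the input is embedded before retrieval, the archive returns the input itself alongside related earlier texts. Whether the claimed method intends that behavior is not stated in the claim.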
12. An information extraction module (IEM) comprising:
a computer device; and
an algorithm recorded in memory of the computer device and executable by the computer device, wherein the computer device executes the algorithm to thereby:
receive and record an input block of unstructured text;
discover text phrases contained in the received input block of unstructured text;
access and retrieve lexical data from a knowledge source using the discovered text phrases;
process the discovered text phrases and the retrieved lexical data to thereby generate a plurality of nodes in the domain-specific ontology;
map sub-phrases of the discovered text phrases into the domain-specific ontology;
use an informativeness function to quantify the sub-phrases by their relative informativeness as a normalized value between 0 and 1; and
eliminate all quantified sub-phrases from the domain-specific ontology having a normalized value that is less than a calibrated threshold;
wherein each of the plurality of nodes classifies the unstructured text to predetermined corresponding objects of interest, thereby transforming the unstructured text into the structured data.
Dependent claims: 13, 14, 15, 16.
Specification