System and method for automating the generation of an ontology from unstructured documents
Abstract
Systems and methods for the substantially automatic creation of ontologies from unstructured documents identify phrases and core noun phrases from the respective documents. Links can be extracted from the documents. Concepts can be identified from the documents. Ontologies can be automatically created for the documents. The processing is domain independent.
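The processing stages named here and in the claims (phrase extraction at barrier characters, core noun phrase identification by absence from adjective, verb, and barrier word lists, link extraction, and ontology generation) can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration only: the word lists and barrier characters are hypothetical and tiny, the function names are invented, and the link rule (consecutive core noun phrases within one extracted phrase) and the dictionary-shaped ontology are simplifications rather than the patented method.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny lists; the source does not enumerate them.
BARRIER_CHARS = ".,;:!?()"
ADJECTIVES = {"large", "new"}
VERBS = {"is", "are", "contains", "drives"}
BARRIER_WORDS = {"the", "a", "an", "of", "and", "in", "that"}


def extract_phrases(text):
    """Split text into lowercase phrases at barrier characters."""
    parts = re.split("[" + re.escape(BARRIER_CHARS) + "]", text)
    return [p.strip().lower() for p in parts if p.strip()]


def core_noun_phrases(phrase):
    """Return runs of words that are absent from all three word lists."""
    excluded = ADJECTIVES | VERBS | BARRIER_WORDS
    runs, current = [], []
    for word in phrase.split():
        if word in excluded:
            # An excluded word closes the run collected so far.
            if current:
                runs.append(" ".join(current))
                current = []
        else:
            current.append(word)
    if current:
        runs.append(" ".join(current))
    return runs


def extract_links(phrases):
    """Assumed link rule: pair consecutive core noun phrases in a phrase."""
    links = []
    for phrase in phrases:
        cores = core_noun_phrases(phrase)
        links.extend(zip(cores, cores[1:]))
    return links


def build_ontology(text):
    """Map each source concept to the sorted concepts it links to."""
    ontology = defaultdict(set)
    for source, target in extract_links(extract_phrases(text)):
        ontology[source].add(target)
    return {concept: sorted(related) for concept, related in ontology.items()}


sample = ("The engine drives the pump. A large pump contains "
          "a steel valve, and the valve is new.")
print(build_ontology(sample))
```

Because no step consults domain vocabulary, only generic word lists, the same code runs unchanged on text from any subject area, which is the sense of "domain independent" that the sketch tries to convey.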
25 Claims
1. A domain independent method of creating an ontology comprising:
a programmable processor automatically extracting phrases from one or more documents independent of a domain of the one or more documents, wherein said extracting of the phrases further comprises separating a portion of a content of at least some of the extracted phrases based upon barrier characters;
the programmable processor extracting core noun phrases from the one or more documents, wherein a plurality of core noun phrases is identified at least in part based on an absence of each of the plurality of core noun phrases from each of an adjective word list, a verb word list, and a barrier word list;
the programmable processor extracting links from the one or more documents based at least in part on the plurality of core noun phrases; and
the programmable processor generating an ontology in accordance with at least the extracted phrases.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17
18. An apparatus comprising:
a processor including first software, executed by the processor, that analyzes a document and forms an extracted phrases file based on barrier characters identified in the document;
the processor including second software, executed by the processor, that analyzes the document and forms a core noun phrases file comprising a plurality of core noun phrases independent of a domain of the document wherein the processor identifies the plurality of core noun phrases at least in part based on an absence of each of the plurality of core noun phrases from each of an adjective word list, a verb word list, and a barrier word list;
the processor including third software, executed by the processor, that analyzes the document and forms a link file based at least in part on the plurality of core noun phrases; and
the processor including fourth software, executed by the processor, that forms an ontology in accordance with selected phrases in the extracted phrases file.
Dependent claims: 19, 20, 21
22. An ontology generating system comprising:
first software, recorded on a non-transitory computer readable medium, that extracts and stores phrases from at least one text source independent of a domain of the text source;
second software, recorded on a non-transitory computer readable medium, that extracts and stores a plurality of core noun phrases from the extracted phrases, wherein the plurality of core noun phrases is identified at least in part based on an absence of each of the plurality of core noun phrases from each of an adjective word list, a verb word list, and a barrier word list;
third software, recorded on a computer readable medium, that extracts and stores links from the extracted phrases based at least in part on the plurality of core noun phrases; and
fourth software, recorded on a computer readable medium, that generates an ontology for the at least one text source.
Dependent claims: 23, 24, 25
Specification