Conceptual world representation natural language understanding system and method
Abstract
A Natural Language Understanding system is provided for indexing free text documents. The system according to the invention uses typographical and functional segmentation of text to identify those portions of free text that carry meaning. The system then uses the words, multi-word terms, and phrases identified in the free text to identify the concepts it contains. To extract those concepts, the system uses a lexicon of terms linked to a formal ontology that is independent of any specific language. The formal ontology contains both language-independent domain knowledge concepts and language-dependent linguistic concepts that govern the relationships between concepts and contain the rules about how language works. The system according to the current invention may preferably be used to index medical documents and assign codes from independent coding systems such as SNOMED, ICD-9, and ICD-10. The system may also preferably make use of syntactic parsing to improve the efficiency of the method.
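The lexicon-to-ontology lookup described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the `LEXICON` entries, the concept identifiers, the whitespace tokenizer, and the greedy longest-match strategy are hypothetical and are not taken from the patent. The sketch only shows how terms in several languages might resolve to the same language-independent concept.

```python
# Hypothetical multilingual lexicon: a tuple of lowercase tokens maps to a
# language-independent concept ID. Entries and IDs are invented for illustration.
LEXICON = {
    ("heart", "attack"): "C_MYOCARDIAL_INFARCTION",
    ("myocardial", "infarction"): "C_MYOCARDIAL_INFARCTION",
    ("crise", "cardiaque"): "C_MYOCARDIAL_INFARCTION",  # French
    ("fever",): "C_FEVER",
    ("fièvre",): "C_FEVER",  # French
}
MAX_TERM_LEN = max(len(term) for term in LEXICON)


def match_concepts(text):
    """Return concept IDs for words and multi-word terms found in the text.

    Uses greedy longest-match: at each position the longest lexicon term
    is preferred over any shorter term starting at the same token.
    """
    tokens = text.lower().split()  # naive tokenizer; punctuation is not handled
    concepts = []
    i = 0
    while i < len(tokens):
        for n in range(min(MAX_TERM_LEN, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in LEXICON:
                concepts.append(LEXICON[candidate])
                i += n  # consume the whole matched term
                break
        else:
            i += 1  # no lexicon term starts here; skip this token
    return concepts
```

Under these assumptions, an English sentence and its French counterpart yield the same concept list, which is the sense in which the ontology is independent of a specific language.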
12 Claims
1. A method for indexing a free text document, the method comprising:
typographically and functionally segmenting, by a computing device, said free text document;
identifying, by the computing device, words and multi-word terms in said free text document,
matching, by the computing device, said words and multi-word terms to a first plurality of concepts, said first plurality of concepts being contained in a formal ontology, wherein the words and multi-word terms are matched to the first plurality of concepts by first matching the words and multi-word terms to a lexicon of terms, the lexicon of terms containing terms in a plurality of languages, the terms in a plurality of languages being linked to the concepts in the formal ontology,
adding, by the computing device, said first plurality of concepts to a conceptual graph,
identifying, by the computing device, a second plurality of concepts, said second plurality of concepts being related to said first plurality of concepts, said second plurality of concepts being contained in said formal ontology,
adding, by the computing device, said second plurality of concepts to said conceptual graph,
finding, by a spreading activation algorithm executed by the computing device, a list of relevant concepts associated with said first and second plurality of concepts using links in the formal ontology, said list of relevant concepts representing a meaning contained in said free text document, and
adding, by the computing device, said list of relevant concepts to an index for said free text document.
Dependent claims: 2, 3, 4
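The spreading activation step recited in claim 1 can be sketched as follows. This is a generic illustration of the technique, not the patented implementation: the claim text does not specify link weights, decay, number of propagation steps, or a relevance threshold, so `ONTOLOGY_LINKS`, `decay`, `steps`, and `threshold` are all hypothetical. Seed concepts stand in for the first plurality of concepts matched from the text; concepts reached through ontology links stand in for the related second plurality.

```python
from collections import defaultdict

# Hypothetical ontology links: concept -> list of (related concept, link weight).
ONTOLOGY_LINKS = {
    "C_CHEST_PAIN": [("C_MYOCARDIAL_INFARCTION", 0.5), ("C_ANGINA", 0.5)],
    "C_ELEVATED_TROPONIN": [("C_MYOCARDIAL_INFARCTION", 0.5)],
    "C_MYOCARDIAL_INFARCTION": [("C_HEART_DISEASE", 0.5)],
    "C_ANGINA": [("C_HEART_DISEASE", 0.5)],
}


def spread_activation(seeds, links, decay=0.5, steps=2, threshold=0.2):
    """Propagate activation from seed concepts along ontology links.

    Each seed starts with activation 1.0. On every step, each concept
    activated in the previous step passes energy * weight * decay to its
    neighbors. Concepts whose accumulated activation reaches the threshold
    are returned, most activated first (ties broken alphabetically).
    """
    activation = defaultdict(float)
    for concept in seeds:
        activation[concept] = 1.0
    frontier = dict(activation)
    for _ in range(steps):
        next_frontier = defaultdict(float)
        for concept, energy in frontier.items():
            for neighbor, weight in links.get(concept, []):
                next_frontier[neighbor] += energy * weight * decay
        for concept, energy in next_frontier.items():
            activation[concept] += energy
        frontier = next_frontier
    relevant = [c for c, a in activation.items() if a >= threshold]
    return sorted(relevant, key=lambda c: (-activation[c], c))


# The returned list plays the role of the "list of relevant concepts"
# that would be added to the document's index.
index_entry = spread_activation(["C_CHEST_PAIN", "C_ELEVATED_TROPONIN"], ONTOLOGY_LINKS)
```

In this toy graph, a concept reachable from two seeds accumulates more activation than one reachable from a single seed, while a concept two hops away falls below the assumed threshold and is left out of the index.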
5. A method for indexing a free text document, comprising:
typographically segmenting, by a computing device, the free text document;
functionally segmenting, by the computing device, the free text document, wherein functionally segmenting includes identifying modalized words in the free text document and tagging clauses or phrases containing the modalized words as modalized text;
extracting, by the computing device, concepts from the segmented free text document by matching words and multi-word terms in the segmented free text document to a plurality of concepts contained in a formal ontology, wherein the extracting is based at least in part on the clauses or phrases tagged as modalized text;
finding, by a spreading activation algorithm executed by the computing device, a list of relevant concepts associated with the plurality of concepts using links in the formal ontology; and
adding, by the computing device, the list of relevant concepts to an index for the free text document.
Dependent claims: 6, 7, 8, 9, 10, 11, 12
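The functional segmentation step of claim 5, which tags clauses or phrases containing modalized words, might look roughly like the sketch below. The claim text does not define which words count as modalized, so this example assumes they are cues for negation or uncertainty; the `MODAL_CUES` set and the punctuation-based clause splitter are hypothetical simplifications.

```python
import re

# Hypothetical modal cue words (negation and uncertainty markers).
# The claim text does not enumerate the words actually used.
MODAL_CUES = {"no", "not", "denies", "possible", "probable", "suspected", "without"}


def tag_modalized(text):
    """Split text into clauses and flag those containing a modal cue word.

    Returns a list of (clause, is_modalized) pairs. Clauses are split
    naively on periods, semicolons, and commas.
    """
    clauses = [c.strip() for c in re.split(r"[.;,]", text) if c.strip()]
    tagged = []
    for clause in clauses:
        words = set(re.findall(r"[a-z]+", clause.lower()))
        tagged.append((clause, bool(words & MODAL_CUES)))
    return tagged
```

A downstream extractor could then treat concepts found in flagged clauses differently, for example by recording them as negated or uncertain rather than asserted, which is one plausible reading of extraction being "based at least in part on" the modalized tags.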
Specification