Conceptual world representation natural language understanding system and method
First Claim
1. A method of processing free text documents for indexing, said method comprising:
typographically segmenting a free text document using a computer-based ontology management system, said typographically segmenting comprising:
delimiting said free text document into words, sentences, titles, list items and paragraphs based on character patterns in said free text document;
functionally segmenting said free text document using the computer-based ontology management system, said functionally segmenting comprising:
grouping words into multi-word terms, segmenting said sentences into clause-phrase segments, and grouping words into noun phrases, wherein said grouping words into multi-word terms is accomplished by identifying at least two adjacent words;
re-writing at least one of said at least two adjacent words to generate a pairing of at least two adjacent words containing at least one re-written word;
searching a lexicon of terms for said pairing of at least two adjacent words containing at least one re-written word;
if said pairing of at least two adjacent words containing at least one re-written word is found in said lexicon, replacing said pairing of at least two adjacent words with said pairing of at least two adjacent words containing at least one re-written word; and
tagging said pairing of at least two adjacent words containing at least one re-written word as a multi-word term.
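The claimed processing steps can be illustrated with a minimal Python sketch. It covers only typographic segmentation and the multi-word term step (identify adjacent words, rewrite at least one, look the pairing up in a lexicon, replace, tag). Clause-phrase segmentation and noun-phrase grouping are omitted. The character patterns (blank-line blocks, bullet-prefixed list items, a lone short unpunctuated line as a title), the `rewrite` normalizer, and the `LEXICON` and `VARIANTS` contents are all hypothetical assumptions, not the rules used by the patented system.

```python
import re

# Hypothetical lexicon of multi-word terms, stored as tuples of normalized words.
LEXICON = {("heart", "attack"), ("blood", "pressure"), ("lung", "cancer")}

# Hypothetical spelling-variant table used by the rewrite step (an assumption).
VARIANTS = {"tumour": "tumor", "haemorrhage": "hemorrhage"}

# Character patterns assumed for typographic segmentation (illustrative only).
BULLET = re.compile(r"^(?:[-*\u2022]|\d+[.)])\s+")
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
WORD = re.compile(r"[A-Za-z0-9]+(?:[-'][A-Za-z0-9]+)*")


def split_sentences(text):
    """Split text at ., ! or ? followed by whitespace; return one word list per sentence."""
    return [WORD.findall(s) for s in SENTENCE_END.split(text) if s.strip()]


def classify_prose(lines):
    """Treat a lone short line without final punctuation as a title, else a paragraph."""
    joined = " ".join(lines)
    is_title = (len(lines) == 1 and not joined.endswith((".", "!", "?"))
                and len(joined.split()) <= 8)
    return ("title" if is_title else "paragraph", split_sentences(joined))


def typographic_segments(text):
    """Delimit text into titles, list items and paragraphs, each holding sentences of words."""
    segments = []
    for block in re.split(r"\n\s*\n", text.strip()):  # blank lines separate blocks
        prose = []
        for line in block.splitlines():
            line = line.strip()
            bullet = BULLET.match(line)
            if bullet:
                if prose:  # flush prose that precedes the list item, preserving order
                    segments.append(classify_prose(prose))
                    prose = []
                segments.append(("list_item", split_sentences(line[bullet.end():])))
            elif line:
                prose.append(line)
        if prose:
            segments.append(classify_prose(prose))
    return segments


def rewrite(word):
    """Hypothetical normalizer: lowercase, drop a plural 's', map spelling variants."""
    w = word.lower()
    if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
        w = w[:-1]
    return VARIANTS.get(w, w)


def group_multiword_terms(words):
    """Pair adjacent words, rewrite at least one, and tag lexicon hits as multi-word terms."""
    tokens, i = [], 0
    while i < len(words):
        if i + 1 < len(words):
            a, b = words[i], words[i + 1]
            # Each candidate pairing contains at least one rewritten word.
            candidates = [(rewrite(a), b), (a, rewrite(b)), (rewrite(a), rewrite(b))]
            match = next((c for c in candidates if c in LEXICON), None)
            if match:
                # Replace the original pair with the rewritten pairing and tag it.
                tokens.append((" ".join(match), "MWT"))
                i += 2
                continue
        tokens.append((words[i], "W"))
        i += 1
    return tokens


if __name__ == "__main__":
    doc = "Discharge Summary\n\nPatient had Heart attacks. Blood pressure was high.\n- Lung cancer ruled out"
    for kind, sentences in typographic_segments(doc):
        for sentence in sentences:
            print(kind, group_multiword_terms(sentence))
```

In this sketch, "Heart attacks" is emitted as the tagged term "heart attack", because rewriting both words yields a pairing that is present in the lexicon.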
3 Assignments
0 Petitions
Abstract
A Natural Language Understanding system is provided for indexing free text documents. The system according to the invention uses typographical and functional segmentation of text to identify those portions of free text that carry meaning. The system then uses the words, multi-word terms and phrases identified in the free text to identify concepts in that text. The system uses a lexicon of terms linked to a formal ontology that is independent of any specific language to extract concepts from the free text based on the words and multi-word terms it contains. The formal ontology contains both language-independent domain knowledge concepts and language-dependent linguistic concepts that govern the relationships between concepts and contain the rules about how language works. The system according to the current invention may preferably be used to index medical documents and assign codes from independent coding systems, such as SNOMED, ICD-9 and ICD-10. The system may also preferably make use of syntactic parsing to improve the efficiency of the method.
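The lexicon-to-ontology lookup and code assignment described in the abstract can be illustrated with a minimal sketch. It assumes a lexicon keyed by (language, term) that points to language-independent concept IDs, each carrying codes for external coding systems. The concept IDs `C001` and `C002`, the French entry, and every code value are hypothetical placeholders, not real SNOMED or ICD entries, and the sketch ignores the linguistic concepts and rules that the ontology also holds.

```python
from dataclasses import dataclass


@dataclass
class Concept:
    """A language-independent concept node in a (hypothetical) formal ontology."""
    concept_id: str
    codes: dict  # coding-system name -> code; the values below are placeholders


# Hypothetical ontology: concept IDs and codes are illustrative placeholders only.
ONTOLOGY = {
    "C001": Concept("C001", {"SNOMED": "SNOMED-PLACEHOLDER-1", "ICD-10": "ICD10-PLACEHOLDER-1"}),
    "C002": Concept("C002", {"SNOMED": "SNOMED-PLACEHOLDER-2", "ICD-9": "ICD9-PLACEHOLDER-2"}),
}

# Language-dependent lexicon: (language, normalized term) -> concept ID.
LEXICON = {
    ("en", "heart attack"): "C001",
    ("en", "myocardial infarction"): "C001",
    ("fr", "infarctus du myocarde"): "C001",
    ("en", "high blood pressure"): "C002",
    ("en", "hypertension"): "C002",
}


def extract_concepts(terms, language):
    """Map words and multi-word terms to concept IDs, in first-seen order, without duplicates."""
    found = []
    for term in terms:
        concept_id = LEXICON.get((language, term.lower()))
        if concept_id and concept_id not in found:
            found.append(concept_id)
    return found


def assign_codes(concept_ids, system):
    """Return each concept's code in one coding system, skipping concepts it does not cover."""
    return [ONTOLOGY[c].codes[system] for c in concept_ids if system in ONTOLOGY[c].codes]


if __name__ == "__main__":
    concepts = extract_concepts(["patient", "heart attack", "Hypertension"], "en")
    print(concepts, assign_codes(concepts, "SNOMED"))
```

Because terms from different languages resolve to the same concept ID, "heart attack" and "infarctus du myocarde" receive the same codes; the language dependence is confined to the lexicon.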
76 Citations
16 Claims
Specification