CONCEPTUAL WORLD REPRESENTATION NATURAL LANGUAGE UNDERSTANDING SYSTEM AND METHOD
Abstract
A Natural Language Understanding system is provided for indexing free text documents. The system according to the invention uses typographical and functional segmentation of text to identify those portions of free text that carry meaning. The system then uses words, multi-word terms and phrases identified in the free text to identify concepts in the free text. The system uses a lexicon of terms linked to a formal ontology that is independent of a specific language to extract concepts from the free text based on the words and multi-word terms in the free text. The formal ontology contains both language-independent domain knowledge concepts and language-dependent linguistic concepts that govern the relationships between concepts and contain the rules about how language works. The system according to the current invention may preferably be used to index medical documents and assign codes from independent coding systems such as SNOMED, ICD-9 and ICD-10. The system according to the current invention may also preferably make use of syntactic parsing to improve the efficiency of the method.
96 Citations
35 Claims
1. A method for indexing a free text document, the method comprising:
- typographically and functionally segmenting said free text document;
- identifying words and multi-word terms in said free text document,
- matching said words and multi-word terms to a first plurality of concepts, said first plurality of concepts being contained in a formal ontology,
- adding said first plurality of concepts to a conceptual graph,
- identifying a second plurality of concepts, said second plurality of concepts being related to said first plurality of concepts, said second plurality of concepts being contained in said formal ontology,
- adding said second plurality of concepts to said conceptual graph,
- ranking the relevance of said first and second plurality of concepts to a meaning contained in said free text to create a list of relevant concepts, said list of relevant concepts representing said meaning contained in said free text, and
- adding said list of relevant concepts to an index for said free text document.
Dependent claims: 2
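The concept-matching, graph-building, ranking and indexing steps of claim 1 can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the toy `LEXICON` and `ONTOLOGY` tables, the concept identifiers, the greedy longest-match strategy, and in particular the scoring rule (2 points per direct mention, 1 point per incoming relation). The claim does not specify how relevance is ranked, so this is one possible reading, not the patented method.

```python
from collections import Counter

# Hypothetical toy lexicon: surface term -> ontology concept ID (illustrative only).
LEXICON = {
    "chest pain": "C_CHEST_PAIN",
    "pain": "C_PAIN",
    "heart attack": "C_MYOCARDIAL_INFARCTION",
}

# Hypothetical toy ontology: concept -> related concepts.
ONTOLOGY = {
    "C_CHEST_PAIN": ["C_PAIN", "C_CHEST"],
    "C_MYOCARDIAL_INFARCTION": ["C_HEART", "C_CHEST_PAIN"],
    "C_PAIN": [],
}

def match_terms(words, max_len=3):
    """Greedy longest match of words and multi-word terms against the lexicon."""
    concepts, i = [], 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            term = " ".join(words[i:i + n])
            if term in LEXICON:
                concepts.append(LEXICON[term])
                i += n
                break
        else:
            i += 1  # no term starts at this word
    return concepts

def index_document(text, index, doc_id, top_k=3):
    """Build a small conceptual graph, rank its concepts, and store them in the index."""
    words = text.lower().replace(".", " ").replace(",", " ").split()
    first = match_terms(words)  # first plurality: concepts matched in the text
    graph = {c: set() for c in first}  # conceptual graph as adjacency sets
    for c in first:
        for related in ONTOLOGY.get(c, []):  # second plurality: related concepts
            graph.setdefault(related, set())
            graph[c].add(related)
    # Assumed ranking heuristic: 2 points per mention, 1 point per incoming relation.
    score = Counter()
    for c in first:
        score[c] += 2
    for c, related in graph.items():
        for r in related:
            score[r] += 1
    ranked = sorted(score, key=lambda c: (-score[c], c))[:top_k]
    index[doc_id] = ranked
    return ranked
```

Calling `index_document("Patient reports chest pain. Possible heart attack.", index, "doc-1")` with an empty dict as `index` stores the top-ranked concept IDs under `"doc-1"`. A production system would replace the dictionaries with a real lexicon and ontology and would need a better-motivated relevance measure.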
3. A method of processing free text documents for indexing, said method comprising:
- typographically segmenting a free text document, said typographically segmenting comprising;
  - delimiting said free text document into words, sentences, titles, list items and paragraphs based on character patterns in said free text document, and
- functionally segmenting said free text document, said functionally segmenting comprising;
  - grouping words into multi-word terms, segmenting said sentences into clause-phrase segments, and grouping words into noun phrases.
Dependent claims: 4-19
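A rough sketch of the segmentation in claim 3, assuming simple regular-expression heuristics. The function names, the rule that a short line without closing punctuation is a title, and the list of clause-breaking conjunctions are all illustrative assumptions; the claim only requires that delimiting be based on character patterns. Grouping words into noun phrases would normally need part-of-speech information and is not shown.

```python
import re

LIST_ITEM = re.compile(r"^(?:[-*•]|\d+[.)])\s+")       # "- item", "* item", "1. item"
WORD = re.compile(r"[A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*")  # words, allowing inner ' or -

def typographic_segments(text):
    """Delimit text into paragraphs, titles, list items, sentences and words."""
    segments = []
    paragraphs = re.split(r"\n\s*\n", text.strip())  # blank line = paragraph break
    for p_index, paragraph in enumerate(paragraphs):
        for line in paragraph.splitlines():
            line = line.strip()
            if not line:
                continue
            if LIST_ITEM.match(line):
                kind = "list_item"
            elif not re.search(r"[.!?;:,]$", line) and len(line.split()) <= 8:
                kind = "title"  # assumed heuristic: short line, no closing punctuation
            else:
                kind = "text"
            sentences = re.split(r"(?<=[.!?])\s+", line)
            segments.append({
                "paragraph": p_index,
                "kind": kind,
                "sentences": [WORD.findall(s) for s in sentences],
            })
    return segments

def clause_segments(sentence):
    """Assumed heuristic: break at commas, semicolons and a few conjunctions."""
    pattern = r"\s*[,;]\s*|\s+(?:and|but|because|which)\s+"
    return [part for part in re.split(pattern, sentence.strip(" .")) if part]
```

For example, `clause_segments("The patient was admitted, treated with aspirin and discharged.")` splits the sentence at the comma and at "and". Real clinical text has abbreviations and decimal numbers that would defeat the naive sentence splitter used here.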
20. A method of deriving the degree of association between words and human-applied labels for a body of text, the method comprising:
a) collecting a set of documents representative of the kind needed for an application,
b) providing for each paragraph and title in the said documents a label which is considered appropriate for that paragraph or title,
c) counting the number of occurrences of a first word within a first paragraph of text designated with a first label,
d) counting the number of occurrences of said first word within paragraphs of text designated with a label other than first said label,
e) computing the ratio of the occurrences in acts (c) and (d), this ratio being taken as the degree of association between said first word and said section, a ratio greater than 1 signifying a greater than normal association, a ratio less than 1 signifying a weaker than normal association,
f) repeating acts (c) through (e) for each word within said first paragraph of text.
Dependent claims: 21-24
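Acts (c) through (f) of claim 20 amount to a count ratio per word and label, which can be sketched as follows. Several choices here are assumptions not stated in the claim: counts are pooled over all paragraphs sharing a label rather than taken from a single paragraph, words are obtained by lower-casing and whitespace splitting, and a word that never occurs under any other label is given a ratio of infinity to avoid dividing by zero. The function name `association_ratios` is hypothetical.

```python
from collections import Counter, defaultdict

def association_ratios(labeled_paragraphs):
    """Map (word, label) to count-under-label divided by count-under-other-labels.

    labeled_paragraphs is an iterable of (label, text) pairs, i.e. the
    human-labeled paragraphs and titles from acts (a) and (b).
    """
    per_label = defaultdict(Counter)  # label -> word counts under that label
    total = Counter()                 # word counts over the whole collection
    for label, text in labeled_paragraphs:
        words = text.lower().split()
        per_label[label].update(words)
        total.update(words)

    ratios = {}
    for label, counts in per_label.items():
        for word, inside in counts.items():       # act (c)
            outside = total[word] - inside         # act (d)
            # act (e); assumed convention: infinity when the word is exclusive
            ratios[(word, label)] = inside / outside if outside else float("inf")
    return ratios
```

With three labeled paragraphs such as `("MEDICATIONS", "aspirin daily aspirin")`, `("HISTORY", "pain daily")` and `("HISTORY", "aspirin allergy pain")`, the word "aspirin" occurs twice under MEDICATIONS and once elsewhere, so its ratio for that label is greater than 1, signalling a stronger than normal association as the claim describes.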
25. A method for indexing a free text document, comprising:
- typographically segmenting, by a computing device, the free text document;
- functionally segmenting, by the computing device, the free text document;
- extracting, by the computing device, concepts from the segmented free text document by matching words and multi-word terms in the segmented free text document to a plurality of concepts contained in a formal ontology; and
- indexing, by the computing device, the free text document based on the extracted concepts.
Dependent claims: 26-35
Specification