Conceptual world representation natural language understanding system and method
Abstract
A Natural Language Understanding system is provided for indexing free text documents. The system according to the invention uses typographical and functional segmentation of text to identify the portions of free text that carry meaning. The system then uses words and multi-word terms and phrases identified in the free text to identify concepts in that text. The system uses a lexicon of terms, linked to a formal ontology that is independent of any specific language, to extract concepts from the free text based on the words and multi-word terms it contains. The formal ontology contains both language-independent domain knowledge concepts and language-dependent linguistic concepts that govern the relationships between concepts and contain the rules about how language works. The system according to the current invention may preferably be used to index medical documents and to assign codes from independent coding systems such as SNOMED, ICD-9 and ICD-10. The system may also preferably make use of syntactic parsing to improve the efficiency of the method.
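The lexicon step described in the abstract, in which words and multi-word terms found in the text are mapped to language-independent concepts, can be illustrated with a small Python sketch. The lexicon entries, the `C:` concept identifiers, the function name `extract_concepts` and the greedy longest-match strategy are all hypothetical and are not taken from the patent. The sketch only shows how terms in two languages can resolve to the same concept; it does not model the ontology's relationships, syntactic parsing, or the assignment of SNOMED or ICD codes.

```python
import re

# Hypothetical lexicon (illustrative only): surface terms, possibly multi-word
# and in more than one language, mapped to language-independent concept IDs.
LEXICON = {
    ("heart", "attack"): "C:myocardial_infarction",
    ("myocardial", "infarction"): "C:myocardial_infarction",
    ("infarctus", "du", "myocarde"): "C:myocardial_infarction",
    ("chest", "pain"): "C:chest_pain",
    ("pain",): "C:pain",
}
MAX_TERM_LEN = max(len(term) for term in LEXICON)


def extract_concepts(text, lexicon=LEXICON, max_len=MAX_TERM_LEN):
    """Greedy longest-match lookup; returns a list of (term, concept_id) pairs."""
    tokens = re.findall(r"\w+", text.lower())
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate term starting at token i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            term = tuple(tokens[i:i + n])
            if term in lexicon:
                found.append((" ".join(term), lexicon[term]))
                i += n
                break
        else:
            i += 1  # no lexicon term starts here; move to the next token
    return found
```

Preferring the longest match is one simple way to let a multi-word term such as "chest pain" win over its single-word part "pain", so that the more specific concept is extracted.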
3 Claims
1. A method of segmenting a free text document into functional sections, wherein said document comprises a plurality of functional sections, each of said plurality of functional sections representing a sub-topic, the method being performed by a processing device and a memory encoded with instructions that are executed by the processing device, the method comprising:
a) dividing the document into a plurality of paragraphs,
b) determining for each paragraph of said plurality of paragraphs a probability that each label of a plurality of human-applied labels is appropriate for the paragraph by:
  b1) collecting a set of documents representative of an application,
  b2) providing for each paragraph and title in the set of documents a label which is considered appropriate for that paragraph or title,
  b3) counting a first number of occurrences of a first word within a first paragraph of text designated with a first label,
  b4) counting a second number of occurrences of said first word within paragraphs of text designated with a second label other than said first label,
  b5) computing a ratio of the first number of occurrences to the second number of occurrences, this ratio being taken as a degree of association between said first word and said first label, a ratio greater than 1 signifying a greater than normal degree of association, a ratio less than 1 signifying a weaker than normal degree of association,
  b6) repeating acts (b3) through (b5) for each word within said first paragraph to determine the probability,
c) assigning to each paragraph the label determined in act (b) to have the highest probability,
d) grouping any sequence of one or more sequential paragraphs with the same assigned label as a single functional section,
e) either assigning or not assigning each paragraph of said plurality of paragraphs to said single functional section based on said probability,
each of acts (a) through (e) being performed on each paragraph of said plurality of paragraphs for each of said plurality of functional sections to provide a segmented free text document, and
storing the segmented free text document in a computer based storage system.
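The segmentation acts of claim 1 can be sketched in Python as follows. This is a minimal, non-authoritative illustration under several assumptions that the claim does not state: paragraphs are split on blank lines; words are lowercase alphanumeric tokens; add-one smoothing (the hypothetical `alpha` parameter) avoids division by zero in the ratio of act (b5); the per-word ratios of act (b6) are combined by summing their logarithms and normalizing with a softmax to obtain a per-label probability; and act (e) is modeled as a hypothetical `min_prob` threshold below which a paragraph is left unassigned (label `None`). All function names are illustrative.

```python
import math
import re
from collections import Counter
from itertools import groupby


def tokenize(text):
    # Lowercase alphanumeric tokens (a simplifying assumption).
    return re.findall(r"[a-z0-9]+", text.lower())


def train_ratios(labeled_paragraphs, alpha=1.0):
    """Acts (b1)-(b5): labeled_paragraphs is a list of (label, text) pairs.

    Returns {label: {word: ratio}}, where ratio is the smoothed number of
    occurrences of the word in paragraphs with that label divided by its
    number of occurrences in paragraphs with any other label.
    """
    counts = {}
    for label, text in labeled_paragraphs:
        counts.setdefault(label, Counter()).update(tokenize(text))
    total = Counter()
    for c in counts.values():
        total.update(c)
    return {
        label: {w: (c[w] + alpha) / (total[w] - c[w] + alpha) for w in total}
        for label, c in counts.items()
    }


def label_probabilities(text, ratios):
    """Act (b6): combine per-word ratios into a probability per label.

    Sums log-ratios (unseen words count as ratio 1, i.e. neutral), then
    applies a softmax. This combination rule is an assumption.
    """
    words = tokenize(text)
    scores = {
        label: sum(math.log(r.get(w, 1.0)) for w in words)
        for label, r in ratios.items()
    }
    top = max(scores.values())
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    z = sum(exps.values())
    return {label: e / z for label, e in exps.items()}


def segment(document, ratios, min_prob=0.0):
    """Acts (a), (c), (d), (e): returns a list of (label, [paragraphs])."""
    # (a) split into paragraphs on blank lines
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]
    assigned = []
    for p in paragraphs:
        probs = label_probabilities(p, ratios)
        best = max(probs, key=probs.get)
        # (c) take the most probable label; (e) leave unassigned below min_prob
        assigned.append((best if probs[best] >= min_prob else None, p))
    # (d) merge runs of consecutive paragraphs that share a label
    return [
        (label, [p for _, p in run])
        for label, run in groupby(assigned, key=lambda pair: pair[0])
    ]
```

Training on a handful of paragraphs labeled, say, `history` and `medications` and then calling `segment` on a note returns a list of `(label, paragraphs)` sections in document order, with consecutive same-label paragraphs merged into one section; raising `min_prob` leaves low-confidence paragraphs as unassigned `None` sections.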
Specification