Conceptual world representation natural language understanding system and method

US 9,292,494 B2
Filed: 03/06/2013
Issued: 03/22/2016
Est. Priority Date: 07/12/2002
Status: Expired due to Fees

Abstract

A Natural Language Understanding system is provided for indexing of free text documents. The system according to the invention utilizes typographical and functional segmentation of text to identify those portions of free text that carry meaning. The system then uses words and multi-word terms and phrases identified in the free text to identify concepts in the free text. The system uses a lexicon of terms linked to a formal ontology that is independent of a specific language to extract concepts from the free text based on the words and multi-word terms in the free text. The formal ontology contains both language independent domain knowledge concepts and language dependent linguistic concepts that govern the relationships between concepts and contain the rules about how language works. The system according to the current invention may preferably be used to index medical documents and assign codes from independent coding systems, such as SNOMED, ICD-9 and ICD-10. The system according to the current invention may also preferably make use of syntactic parsing to improve the efficiency of the method.
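
As a rough illustration of the lexicon lookup described in the abstract, the following Python sketch maps words and multi-word terms to concept identifiers using a longest-match scan. It is not taken from the patent: the `LEXICON` contents, the `CONCEPT_*` identifiers, the `extract_concepts` function, and the longest-match strategy are all assumptions made for illustration, and the identifiers are placeholders rather than real SNOMED or ICD codes.

```python
# Hypothetical lexicon: surface terms in any language map to
# language-independent concept identifiers (placeholder names, not real codes).
LEXICON = {
    ("chest", "pain"): "CONCEPT_CHEST_PAIN",
    ("pain",): "CONCEPT_PAIN",
    ("douleur", "thoracique"): "CONCEPT_CHEST_PAIN",
    ("asthma",): "CONCEPT_ASTHMA",
}
MAX_TERM_LENGTH = max(len(term) for term in LEXICON)


def extract_concepts(text):
    """Return concept identifiers found in text, preferring the longest term."""
    tokens = text.lower().split()
    concepts = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate term first, then shorter ones.
        for n in range(min(MAX_TERM_LENGTH, len(tokens) - i), 0, -1):
            concept = LEXICON.get(tuple(tokens[i:i + n]))
            if concept is not None:
                concepts.append(concept)
                i += n
                break
        else:
            # No term starts at this token; move on.
            i += 1
    return concepts


print(extract_concepts("patient reports chest pain and asthma"))
print(extract_concepts("douleur thoracique"))
```

Because the English and French terms point at the same identifier, both inputs resolve to the same concept, which is the sense in which the ontology is language independent. The sketch splits on whitespace only and does no punctuation handling or syntactic parsing.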

3 Claims

1. A method of segmenting a free text document into functional sections, wherein said document comprises a plurality of functional sections, each of said plurality of functional sections representing a sub-topic, the method being performed by a processing device and a memory encoded with instructions that are executed by the processing device, the method comprising:
- a) dividing the document into a plurality of paragraphs,
- b) determining for each paragraph of said plurality of paragraphs a probability that each label of a plurality of human-applied labels is appropriate for the paragraph by;
  - b1) collecting a set of documents representative of an application,
  - b2) providing for each paragraph and title in the set of documents a label which is considered appropriate for that paragraph or title,
  - b3) counting a first number of occurrences of a first word within a first paragraph of text designated with a first label,
  - b4) counting a second number of occurrences of said first word within paragraphs of text designated with a second label other than said first label,
  - b5) computing a ratio of the first number of occurrences to the second number of occurrences, this ratio being taken as a degree of association between said first word and said first label, a ratio greater than 1 signifying a greater than normal degree of association, a ratio less than 1 signifying a weaker than normal degree of association,
  - b6) repeating acts (b3) through (b5) for each word within said first paragraph to determine the probability,
- c) assigning to each paragraph the label determined in act (b) to have the highest probability,
- d) grouping any sequence of one or more sequential paragraphs with the same assigned label as a single functional section,
- e) either assigning or not assigning each paragraph of said plurality of paragraphs to said single functional section based on said probability,
- each of acts (a) through (e) being performed on each paragraph of said plurality of paragraphs for each of said plurality of functional sections to provide a segmented free text document, and
- storing the segmented free text document in a computer based storage system.

2. The method according to claim 1, wherein at least one paragraph of said plurality of paragraphs is preceded by a title and wherein said probability is a first probability, the method further comprising:
- calculating a second probability that said at least one paragraph belongs to said functional section based on said title, and
- either assigning or not assigning said at least one paragraph to said functional section based on a combination of said first probability and said second probability.

3. The method according to claim 1, further comprising:
- calculating the probability that at least one paragraph of said plurality of paragraphs belongs to said functional section based on the location of said at least one paragraph in said free text document relative to other paragraphs of said plurality of paragraphs.
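
The counting, ratio, labelling and grouping acts of claim 1 can be sketched in Python as follows. This is an illustrative reading, not the patented implementation. Claim 1 does not say how the per-word ratios are combined into a probability, so the sum of log ratios normalized across labels is an assumption, as is the add-one `smoothing` term that avoids division by zero. All function names and the toy training data are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import groupby


def train(labeled_paragraphs):
    """Count word occurrences per human-applied label (acts b1 to b4)."""
    counts = defaultdict(Counter)
    for label, text in labeled_paragraphs:
        counts[label].update(text.lower().split())
    return counts


def association(word, label, counts, smoothing=1.0):
    """Ratio of in-label to out-of-label occurrences of a word (act b5).

    The smoothing term is an assumption, added to avoid dividing by zero.
    """
    inside = counts[label][word]
    outside = sum(c[word] for other, c in counts.items() if other != label)
    return (inside + smoothing) / (outside + smoothing)


def label_probabilities(text, counts):
    """Combine per-word ratios into one probability per label (act b6).

    Assumed combination rule: sum of log ratios, normalized over labels.
    """
    words = text.lower().split()
    scores = {
        label: sum(math.log(association(w, label, counts)) for w in words)
        for label in counts
    }
    top = max(scores.values())
    weights = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


def segment(paragraphs, counts):
    """Label each paragraph (act c) and group equal neighbours (act d)."""
    assigned = []
    for text in paragraphs:
        probs = label_probabilities(text, counts)
        assigned.append((max(probs, key=probs.get), text))
    return [
        (label, [text for _, text in group])
        for label, group in groupby(assigned, key=lambda pair: pair[0])
    ]


# Toy training set; labels and sentences are invented for illustration.
training = [
    ("history", "patient reports chest pain since monday"),
    ("history", "patient has history of asthma"),
    ("medication", "aspirin 100 mg daily"),
    ("medication", "ibuprofen 200 mg twice daily"),
]
document = ["patient reports pain", "history of asthma", "aspirin 200 mg daily"]
segments = segment(document, train(training))
for label, texts in segments:
    print(label, texts)
```

With this toy data the first two paragraphs fall into one "history" section and the third forms a "medication" section. The title-based second probability of claim 2 and the position-based probability of claim 3 are not modelled here; under the same assumptions they could be folded in as extra terms in `scores`.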

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Ceusters, Werner; O'Donnell, Mick; Montyne, Frank; Coppens, Frederik; Van Mol, Maarten
Primary Examiner(s): He, Jialong
Application Number: US13/786,830
Publication Number: US 20130185303A1
Time in Patent Office: 1,112 Days

Field of Search
US Class Current: 1/1
CPC Class Codes:
G06F 16/35   Clustering; Classification
G06F 40/289   Phrasal analysis, e.g. fini...
G06F 40/30   Semantic analysis
G06F 40/40   Processing or translation o...
G06N 5/02   Knowledge representation; S...
