Conceptual world representation natural language understanding system and method

US 8,812,292 B2
Filed: 03/06/2013
Issued: 08/19/2014
Est. Priority Date: 07/12/2002
Status: Expired due to Fees



Abstract

A Natural Language Understanding system is provided for indexing of free text documents. The system according to the invention utilizes typographical and functional segmentation of text to identify those portions of free text that carry meaning. The system then uses words and multi-word terms and phrases identified in the free text to identify concepts in the free text. The system uses a lexicon of terms linked to a formal ontology that is independent of a specific language to extract concepts from the free text based on the words and multi-word terms in the free text. The formal ontology contains both language independent domain knowledge concepts and language dependent linguistic concepts that govern the relationships between concepts and contain the rules about how language works. The system according to the current invention may preferably be used to index medical documents and assign codes from independent coding systems, such as SNOMED, ICD-9 and ICD-10. The system according to the current invention may also preferably make use of syntactic parsing to improve the efficiency of the method.
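The lexicon-to-ontology lookup described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the lexicon entries, the concept identifiers (`C_...`), the function name `match_concepts`, and the greedy longest-match strategy are all assumptions chosen for illustration. The sketch only shows how terms in several languages might resolve to the same language-independent concept.

```python
import re

# Hypothetical lexicon: surface terms in several languages, each linked to a
# language-independent concept identifier. All entries are invented examples.
LEXICON = {
    ("heart", "attack"): "C_MYOCARDIAL_INFARCTION",     # English, multi-word
    ("hartaanval",): "C_MYOCARDIAL_INFARCTION",         # Dutch
    ("crise", "cardiaque"): "C_MYOCARDIAL_INFARCTION",  # French, multi-word
    ("heart",): "C_HEART",
    ("fever",): "C_FEVER",
}
MAX_TERM_LEN = max(len(term) for term in LEXICON)


def match_concepts(text):
    """Return concept IDs for words and multi-word terms found in text.

    Assumption: longer terms win over shorter ones (greedy longest match).
    """
    tokens = re.findall(r"\w+", text.lower())
    concepts = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate term first, then shorter ones.
        for n in range(min(MAX_TERM_LEN, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in LEXICON:
                concepts.append(LEXICON[key])
                i += n
                break
        else:
            i += 1  # no lexicon entry starts at this token
    return concepts
```

Under these assumptions, `match_concepts("Patient had a heart attack and fever")` and `match_concepts("hartaanval")` both yield the same infarction concept, which is the point of linking a multilingual lexicon to a single ontology.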


12 Claims

1. A method for indexing a free text document, the method comprising:
- typographically and functionally segmenting, by a computing device, said free text document;
  
  identifying, by the computing device, words and multi-word terms in said free text document,
  
  matching, by the computing device, said words and multi-word terms to a first plurality of concepts, said first plurality of concepts being contained in a formal ontology, wherein the words and multi-word terms are matched to the first plurality of concepts by first matching the words and multi-word terms to a lexicon of terms, the lexicon of terms containing terms in a plurality of languages, the terms in a plurality of languages being linked to the concepts in the formal ontology,
  
  adding, by the computing device, said first plurality of concepts to a conceptual graph,
  
  identifying, by the computing device, a second plurality of concepts, said second plurality of concepts being related to said first plurality of concepts, said second plurality of concepts being contained in said formal ontology,
  
  adding, by the computing device, said second plurality of concepts to said conceptual graph,
  
  finding, by a spreading activation algorithm executed by the computing device, a list of relevant concepts associated with said first and second plurality of concepts using links in the formal ontology, said list of relevant concepts representing a meaning contained in said free text document, and
  
  adding, by the computing device, said list of relevant concepts to an index for said free text document.
- - 2. The method according to claim 1, wherein:
    - said typographically segmenting said free text document comprises;
      
      delimiting said free text document into words, sentences, titles, list items and paragraph based character patterns in said free text document, and
      
      said functionally segmenting said free text document comprises;
      
      grouping words into multi-word terms, segmenting said sentences into clause-phrase segments, and grouping words into noun phrases.
  - 3. A method as defined in claim 1, wherein the second plurality of concepts are related to the first plurality of concepts by parent/child relationships, the second plurality of concepts being parent concepts.
  - 4. A method as defined in claim 1, wherein the second plurality of concepts are related to the first plurality of concepts by a plurality of link types, wherein a link type defines a relationship between a first concept and a second concept.
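Claims 1, 3 and 4 refer to a spreading activation algorithm that follows typed links in the formal ontology from the matched concepts to related ones. The claims do not state decay factors, step counts or thresholds, so the following Python sketch is only one plausible reading: the ontology fragment, the link types, the function name `spread_activation`, and the parameters `decay`, `steps` and `threshold` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical ontology fragment: concept -> list of (link_type, related concept).
# "is_a" stands in for the parent/child relationship of claim 3; other link
# types illustrate claim 4. All names are invented for this example.
ONTOLOGY = {
    "C_MYOCARDIAL_INFARCTION": [("is_a", "C_HEART_DISEASE"),
                                ("has_location", "C_HEART")],
    "C_ANGINA": [("is_a", "C_HEART_DISEASE")],
    "C_HEART_DISEASE": [("is_a", "C_DISEASE")],
    "C_HEART": [("is_a", "C_ORGAN")],
}


def spread_activation(seeds, ontology, decay=0.5, steps=2, threshold=0.4):
    """Propagate activation from seed concepts along ontology links.

    Each seed starts with activation 1.0. At every step, each active concept
    passes `decay` times its incoming energy to its linked concepts. Concepts
    whose accumulated activation reaches `threshold` are returned, ordered by
    activation (highest first), then by name.
    """
    activation = defaultdict(float)
    for concept in seeds:
        activation[concept] = 1.0
    frontier = dict(activation)
    for _ in range(steps):
        next_frontier = defaultdict(float)
        for concept, energy in frontier.items():
            for _link_type, neighbour in ontology.get(concept, []):
                next_frontier[neighbour] += energy * decay
        for concept, energy in next_frontier.items():
            activation[concept] += energy
        frontier = next_frontier
    ranked = sorted(activation.items(), key=lambda kv: (-kv[1], kv[0]))
    return [concept for concept, energy in ranked if energy >= threshold]


relevant = spread_activation(["C_MYOCARDIAL_INFARCTION", "C_ANGINA"], ONTOLOGY)
```

In this toy setting the shared parent `C_HEART_DISEASE` receives energy from both seeds and ranks alongside them, while the more distant `C_ORGAN` falls below the threshold and is left out of the list that would be added to the index.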

5. A method for indexing a free text document, comprising:
- typographically segmenting, by a computing device, the free text document;
  
  functionally segmenting, by the computing device, the free text document, wherein functionally segmenting includes identifying modalized words in the free text document and tagging clauses or phrases containing the modalized words as modalized text;
  
  extracting, by the computing device, concepts from the segmented free text document by matching words and multi-word terms in the segmented free text document to a plurality of concepts contained in a formal ontology, wherein the extracting is based at least in part on the clauses or phrases tagged as modalized text;
  
  finding, by a spreading activation algorithm executed by the computing device, a list of relevant concepts associated with the plurality of concepts using links in the formal ontology; and
  
  adding, by the computing device, the list of relevant concepts to an index for the free text document.
- - 6. A method as defined in claim 5, further comprising syntactic parsing, by the computing device, of the free text document.
  - 7. A method as defined in claim 5, wherein the plurality of concepts contained in the formal ontology include concepts that are independent of a specific language and concepts that explain the relationships between the language-independent concepts and language.
  - 8. A method as defined in claim 5, wherein the formal ontology comprises:
    - the plurality of concepts arranged in a hierarchy, the hierarchy having a primary node, wherein a primary concept occupies the primary node, the primary concept being the most general concept in the formal ontology, wherein the concepts become more specific at lower levels of the hierarchy;
      
      the plurality of concepts representing real world objects;
      
      each of the plurality of concepts having at least one definition;
      
      wherein a definition of a first concept comprises a first link to the first concept from a second concept, the link representing a relationship between the first concept and the second concept.
  - 9. A method as defined in claim 8, wherein each of the plurality of concepts is independently selected from the group consisting of domain concept, linguistic concept and domain/linguistic concept.
  - 10. A method as defined in claim 5, wherein typographically segmenting the free text document comprises delimiting the free text document into words, sentences, titles, list items and paragraph based character patterns in the free text document.
  - 11. A method as defined in claim 10, wherein functionally segmenting the free text document comprises grouping words into multi-word terms, segmenting the sentences into clause-phrase segments, and grouping words into noun phrases.
  - 12. A method as defined in claim 5, further comprising identifying modalizing words adjacent to negating words and tagging clauses or phrases containing the modalizing words adjacent to negating words as modalized text.
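The modalized-text tagging of claims 5 and 12 can be sketched in Python as follows. The claims do not enumerate modalizing or negating words, nor how clauses are delimited, so the word lists, the comma/semicolon/"but" clause splitter, and the function name `tag_clauses` are assumptions made purely for illustration.

```python
import re

# Hypothetical word lists; the patent claims do not enumerate them.
MODALIZERS = {"possible", "possibly", "probable", "suspected", "may", "might"}
NEGATORS = {"no", "not", "without"}


def tag_clauses(sentence):
    """Split a sentence into rough clause segments and tag modalized ones.

    Assumption: clauses are delimited by commas, semicolons, or "but".
    A clause is "modalized" if it contains a modalizing word, and
    "negated_modal" if a modalizing word sits next to a negating word.
    """
    parts = re.split(r"[,;]|\bbut\b", sentence)
    clauses = [part.strip() for part in parts if part.strip()]
    tagged = []
    for clause in clauses:
        tokens = re.findall(r"[a-z']+", clause.lower())
        modalized = any(tok in MODALIZERS for tok in tokens)
        negated_modal = any(
            (a in NEGATORS and b in MODALIZERS)
            or (a in MODALIZERS and b in NEGATORS)
            for a, b in zip(tokens, tokens[1:])
        )
        tagged.append(
            {"text": clause, "modalized": modalized, "negated_modal": negated_modal}
        )
    return tagged


result = tag_clauses("Patient has fever, possible pneumonia, but not suspected sepsis")
```

A downstream concept extractor could then treat concepts found in tagged clauses differently, for example by not indexing "sepsis" as an asserted finding; how the patented method actually uses the tags is not specified beyond the claim language.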


Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Ceusters, Werner; O'Donnell, Mick; Montyne, Frank; Coppens, Frederik; Van Mol, Maarten
Primary Examiner(s): He, Jialong
Application Number: US13/786,771
Publication Number: US 20130211823A1
Time in Patent Office: 531 Days
Field of Search: 704/4, 704/9, 707/2, 707/3
US Class Current: 704/4
CPC Class Codes:
G06F 16/35   Clustering; Classification
G06F 40/289   Phrasal analysis, e.g. fini...
G06F 40/30   Semantic analysis
G06F 40/40   Processing or translation o...
G06N 5/02   Knowledge representation; S...
