Probabilistic system for natural language processing

US 6,556,964 B2
Filed: 07/23/2001
Issued: 04/29/2003
Est. Priority Date: 09/30/1997
Status: Expired due to Term

Abstract

A natural language understanding system is described that generates concept codes from free-text medical data. A probabilistic model of lexical semantics, implemented by means of a Bayesian network, is used to determine the most probable concept or meaning associated with a sentence or phrase. The inventive method and system include the steps of checking for synonyms, checking spelling, performing syntactic parsing, transforming text to its “deep” or semantic form, and performing a semantic analysis based on a probabilistic model of lexical semantics.

171 Citations

20 Claims

1. A method for encoding free-text data, comprising:
  (A) receiving free-text data, wherein said free-text data includes;
    words, a grammar, a syntax and a semantic relationship between said words;
  (B) checking for synonyms of said words within said received free-text data;
  (C) checking spelling of said words within said received free-text data;
  (D) parsing said syntax of said received free-text data;
  (E) transforming said grammar of said received free-text data;
  (F) inferring concepts from said received free-text data, using a probabilistic system, wherein said probabilistic system further comprises a Bayesian network for managing one or more probabilistic calculations for use in slotting said words of said free-text data for translation to said inferred concept, and wherein said inferring concepts further comprises;
    (1) identifying possible sets of word level network assignments for low level phrases in a parse tree;
    (2) combining said identified low level phrase assignments to generated assignments for high level phrases;
    (3) binding null states to nodes representing concepts apparently unexpressed; and
    (4) selecting a highest probability state to provide an interpretation of said free-text data;
  (G) creating an encoded representation of said received free-text data; and
  (H) writing said encoded representation into a database.
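
Steps (F)(1) through (F)(4) of claim 1 can be illustrated with a small Python sketch. It is not the patented implementation: it assumes a toy network in which a single concept node is the parent of two word-level nodes, and the slot names (`finding`, `location`), the vocabulary, the probabilities, and all function names are hypothetical values chosen for illustration. The score is an unnormalized joint probability.

```python
from itertools import product

NULL = "<null>"  # state bound to a slot that no word expresses

# Hypothetical toy network: one concept node with two word-level child nodes.
# Every name and number below is invented for illustration.
PRIOR = {"pneumonia": 0.5, "no_disease": 0.5}
CPT = {
    "finding": {
        "pneumonia": {"infiltrate": 0.7, "clear": 0.1, NULL: 0.2},
        "no_disease": {"infiltrate": 0.05, "clear": 0.75, NULL: 0.2},
    },
    "location": {
        "pneumonia": {"lobe": 0.6, NULL: 0.4},
        "no_disease": {"lobe": 0.2, NULL: 0.8},
    },
}


def merge(parts):
    """Merge slot->word dicts; return None if two parts fill the same slot."""
    merged = {}
    for part in parts:
        for slot, word in part.items():
            if slot in merged:
                return None
            merged[slot] = word
    return merged


def combine(option_lists):
    """Cartesian product of partial assignments, dropping slot clashes."""
    merged = (merge(combo) for combo in product(*option_lists))
    return [m for m in merged if m is not None]


def phrase_assignments(phrase):
    """Step 1: possible slot assignments for the known words of one phrase."""
    options = []
    for word in phrase:
        slots = [s for s, table in CPT.items()
                 if any(word in dist for dist in table.values())]
        if slots:  # words unknown to the network are skipped
            options.append([{s: word} for s in slots])
    return combine(options)


def interpret(phrases):
    """Return (concept, slot assignment, score) with the highest score, or None."""
    # Step 2: combine low-level phrase assignments into higher-level ones.
    candidates = combine([phrase_assignments(p) for p in phrases])
    best = None
    for partial in candidates:
        # Step 3: bind the null state to every slot left unfilled.
        full = {slot: partial.get(slot, NULL) for slot in CPT}
        for concept, prior in PRIOR.items():
            score = prior
            for slot, word in full.items():
                score *= CPT[slot][concept].get(word, 0.0)
            # Step 4: keep the highest-probability state.
            if best is None or score > best[2]:
                best = (concept, full, score)
    return best


concept, slots, score = interpret([["infiltrate"], ["lobe"]])
```

With the toy tables above, `interpret([["clear"]])` leaves the `location` slot unexpressed, so that slot is bound to `NULL` before scoring.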
2. A method as recited in claim 1, wherein said receiving free-text data further comprises:
3. A method as recited in claim 1, wherein said checking for synonyms further comprises:
  (1) looking for words within said free-text data which qualify as variations of more standard terms, and (2) replacing said qualified words within said free-text data with said more standard terms.
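
The two steps of claim 3 amount to a lookup-and-replace pass over the text. A minimal Python sketch follows, assuming a hand-built table of variants; the table entries and the function name `normalize_synonyms` are hypothetical and do not come from the patent.

```python
import re

# Hypothetical variant -> standard-term table; keys must be lowercase.
STANDARD_TERMS = {
    "cxr": "chest x-ray",
    "sob": "shortness of breath",
}


def normalize_synonyms(text, table=STANDARD_TERMS):
    """Replace whole-word variants with their standard terms, ignoring case."""
    if not table:
        return text
    # Step 1: find words that qualify as variations of a standard term.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(variant) for variant in table) + r")\b",
        re.IGNORECASE,
    )
    # Step 2: replace each match with the standard term.
    return pattern.sub(lambda m: table[m.group(0).lower()], text)


normalized = normalize_synonyms("CXR ordered for SOB")
```

The word-boundary anchors keep a variant such as `sob` from matching inside a longer word such as `sober`.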
4. A method as recited in claim 1, wherein said checking spelling further comprises checking the spelling of said words based on the probabilistic measure of semantic coherence of said proposed corrections.
5. A method as recited in claim 1, wherein said checking spelling further comprises:
  (1) identifying a target word within said free-text data for spell checking;
  (2) identifying a set of words known to a Bayesian network, wherein said identified set of words are those which said target word could be a misspelling, wherein said identification is based on transformations of said target word from said free-text data to create a list of candidate spellings of said target word;
  (3) binding said set of candidate spellings to said words known to said Bayesian network;
  (4) identifying a best candidate spelling from said list of candidate spellings based on probability values produced by said Bayesian network; and
  (5) replacing said target word with said best candidate spelling.
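
The spell-checking steps of claim 5 can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented method: a small table of P(word | neighbouring word) stands in for the Bayesian network, the transformations are limited to single-character edits, and the names `WORD_GIVEN_CONTEXT`, `single_edits`, and `correct`, along with all words and probabilities, are hypothetical.

```python
import string

# Hypothetical stand-in for the Bayesian network: P(word | neighbouring word).
# Words and numbers are invented for illustration.
WORD_GIVEN_CONTEXT = {
    "lung": {"infiltrate": 0.6, "infiltrated": 0.1},
    "tissue": {"infiltrate": 0.2, "infiltrated": 0.5},
}


def single_edits(word):
    """All strings one delete, transpose, replace, or insert away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def correct(target, context):
    """Return the most probable known spelling of target, or target itself."""
    known = WORD_GIVEN_CONTEXT.get(context, {})
    if target in known:
        return target
    # Steps 2-3: transform the target and keep candidates the table knows.
    candidates = single_edits(target) & set(known)
    if not candidates:
        return target
    # Step 4: pick the candidate with the highest probability in this context.
    return max(candidates, key=known.get)


# Step 5: the caller replaces the target word with the returned spelling.
fixed = correct("infiltratd", "lung")
```

Because both `infiltrate` and `infiltrated` are one edit away from `infiltratd`, the choice between them depends only on the context probabilities, which is the role the claim assigns to the network.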
6. A method as recited in claim 5, wherein said identifying a best candidate spelling employs a probabilistic analysis of said candidate spellings.
7. A method as recited in claim 1, wherein said parsing of said syntax is performed in a manner constrained by ongoing analysis of semantic coherence of proposed syntactic relations within parse, and of word-sense assignments to words within said parse.
8. A method as recited in claim 1, wherein said parsing of said syntax further comprises a context-free grammar parser.
9. A method as recited in claim 1, wherein said transforming said grammar is performed in a manner constrained by ongoing analysis of semantic coherence of said parsed syntactic relations within said transformation.
10. A method as recited in claim 1, wherein said transforming said grammar further comprises placing syntactic parsing into a form suitable for semantic analysis.
11. A method as recited in claim 1, wherein said analyzing said semantic relationships further comprises said Bayesian network having one or more nodes each having probabilistic values.
12. A method as recited in claim 1, wherein said creating an encoded representation further comprises selecting the appropriate ICD9 code resulting from said checking of synonyms, checking of spelling, parsing syntax, transforming grammar, and analyzing grammar of said free-text data.

13. A method for providing encoded medical information from free-text data, operating on a computer system, including:
  a digital computer processor executing the steps of the method;
  a mass storage device connected to said digital computer processor for storing the data being worked on by the method;
  an input device, electrically connected to said digital computer processor, for receiving data to be worked on by the method;
  a preservation storage device electrically connected to said digital computer processor, to store resulting coded data;
  the method comprising;
  (A) receiving free-text data, wherein said free-text data includes;
    words, a grammar, a syntax, and a semantic relationship between said words;
  (B) checking for synonyms of said words within said received free-text data;
  (C) checking spelling of said words within said received free-text data;
  (D) parsing said syntax of said received free-text data;
  (E) transforming said grammar of said received free-text data;
  (F) analyzing said semantic relationship of said received free-text data, wherein said analysis is based on a probabilistic model of lexical semantics, wherein said probabilistic model relates said words to one or more concepts, wherein said words are appropriate for translation into a concept, and wherein said analyzing said semantic relationship further comprises;
    (1) identifying possible sets of word level network assignments for low level phrases in a parse tree;
    (2) combining said identified low level phrase assignments to generated assignments for high level phrases;
    (3) binding null states to nodes representing concepts apparently unexpressed; and
    (4) selecting a highest probability state to provide an interpretation of said free-text data;
  (G) creating an encoded representation of said received free-text data; and
  (H) writing said encoded representation into a database.
14. A system as recited in claim 13, wherein said checking spelling of said words further comprises checking the spelling of words based on a probabilistic measure of semantic coherence of said proposed correction.
15. A system as recited in claim 13, wherein said parsing of said syntax is performed in a manner constrained by ongoing analysis of semantic coherence of proposed syntactic relations within said parse, and of word-sense assignments to words within said parse.
16. A system as recited in claim 13, wherein said parsing of said syntax further comprises a context-free grammar parser.
17. A system as recited in claim 13, wherein said transforming said grammar is performed in a manner constrained by ongoing analysis of semantic coherence of proposed syntactic relations with said transformation.
18. A system as recited in claim 13, wherein said transforming said grammar further comprises placing syntactic parsing into a form suitable for semantic analysis.
19. A system as recited in claim 13, wherein said analyzing said semantic relationships further comprises a Bayesian network of nodes having probabilistic values.

20. A system for encoding free-text information, comprising:
  (A) an input device for receiving free-text information;
  (B) a processor electrically connected to said input device for processing said received free-text information, wherein said processing further comprises probabilistically calculating a relationship between said received free-text information and one or more concepts and wherein said probabilistic calculation further comprises a Bayesian network, and wherein said processing of said free-text information further comprises;
    (1) identifying possible sets of word level network assignments for low level phrases in a parse tree;
    (2) combining said identified low level phrase assignments to generated assignments for high level phrases;
    (3) binding null states to nodes representing concepts apparently unexpressed; and
    (4) selecting a highest probability state to provide an interpretation of said free-text data;
  (C) a digital storage device electrically connected to said processor;
  (D) a means for encoding said received free-text information employing said processor; and
  (E) a means for storing said encoded free-text information on said digital storage device.

Current Assignee: Intermountain Intellectual Asset Management LLC
Original Assignee: IHC Health Services Incorporated (IHC Hospitals, Inc.)
Inventors: Haug, Peter J.; Van Bree, Rudy E.; Christensen, Lee M.; Gundersen, Michael L.; Koehler, Spencer B.
Primary Examiner(s): EDOUARD, PATRICK NESTOR
Application Number: US09/911,976
Publication Number: US 20020128816A1
Time in Patent Office: 645 Days
Field of Search: 704/9, 704/1, 704/10, 704/251, 704/255, 704/256, 704/257, 705/2, 705/3, 705/4
US Class Current: 704/9
CPC Class Codes:
G06F 40/10   Text processing natural lan...
G06F 40/211   Syntactic parsing, e.g. bas...
G06F 40/216   using statistical methods
G06F 40/232   Orthographic correction, e....
G06F 40/30   Semantic analysis
