Probabilistic system for natural language processing
Abstract
A natural language understanding system is described that generates concept codes from free-text medical data. A probabilistic model of lexical semantics, implemented by means of a Bayesian network, is used to determine the most probable concept or meaning associated with a sentence or phrase. The inventive method and system include the steps of checking for synonyms, checking spelling, performing syntactic parsing, transforming text to its “deep” or semantic form, and performing a semantic analysis based on a probabilistic model of lexical semantics.
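To make "most probable concept" concrete, the following is a minimal sketch only, not the patented model. It assumes the simplest possible Bayesian network, a single concept node whose children are the observed words (a naive Bayes structure). The concept names, the probability tables, the `UNSEEN` fallback value, and the function `most_probable_concept` are all hypothetical.

```python
import math

# Hypothetical toy tables; every number here is invented for illustration.
PRIOR = {"pneumonia": 0.5, "heart_failure": 0.5}          # P(concept)
LIKELIHOOD = {                                             # P(word | concept)
    "pneumonia": {"infiltrate": 0.30, "lung": 0.30, "edema": 0.05},
    "heart_failure": {"infiltrate": 0.05, "lung": 0.20, "edema": 0.40},
}
UNSEEN = 0.01  # assumed probability for a word missing from a table


def most_probable_concept(words):
    """Return (concept, posterior) maximizing P(concept | words)."""
    log_scores = {}
    for concept, prior in PRIOR.items():
        score = math.log(prior)
        for word in words:
            score += math.log(LIKELIHOOD[concept].get(word, UNSEEN))
        log_scores[concept] = score
    # Normalize in log space so the returned value is a posterior probability.
    top = max(log_scores.values())
    total = sum(math.exp(s - top) for s in log_scores.values())
    best = max(log_scores, key=log_scores.get)
    return best, math.exp(log_scores[best] - top) / total


concept, posterior = most_probable_concept(["lung", "infiltrate"])
```

With these invented tables, the words "lung" and "infiltrate" favor the hypothetical `pneumonia` concept. A network of the kind the abstract describes would have many interdependent nodes rather than one parent node.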
20 Claims
1. A method for encoding free-text data, comprising:
(A) receiving free-text data, wherein said free-text data includes;
words, a grammar, a syntax and a semantic relationship between said words;
(B) checking for synonyms of said words within said received free-text data;
(C) checking spelling of said words within said received free-text data;
(D) parsing said syntax of said received free-text data;
(E) transforming said grammar of said received free-text data;
(F) inferring concepts from said received free-text data, using a probabilistic system, wherein said probabilistic system further comprises a Bayesian network for managing one or more probabilistic calculations for use in slotting said words of said free-text data for translation to said inferred concept, and wherein said inferring concepts further comprises;
(1) identifying possible sets of word level network assignments for low level phrases in a parse tree;
(2) combining said identified low level phrase assignments to generated assignments for high level phrases;
(3) binding null states to nodes representing concepts apparently unexpressed; and
(4) selecting a highest probability state to provide an interpretation of said free-text data;
(G) creating an encoded representation of said received free-text data; and
(H) writing said encoded representation into a database.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
(1) entering free-text data into a computer system;
(2) entering identification information into a computer system; and
(3) storing said entered free-text data and said identification information into a computer memory.
3. A method as recited in claim 1, wherein said checking for synonyms further comprises:
(1) looking for words within said free-text data which qualify as variations of more standard terms, and (2) replacing said qualified words within said free-text data with said more standard terms.
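The two steps of claim 3 can be sketched as a table lookup. This is an illustrative assumption, not the patent's implementation: the `STANDARD_TERMS` table, its entries, and the function `normalize_synonyms` are hypothetical, and a real system would need a far larger curated vocabulary.

```python
import re

# Hypothetical table mapping variant forms to more standard terms.
STANDARD_TERMS = {
    "cxr": "chest x-ray",
    "sob": "shortness of breath",
    "chf": "congestive heart failure",
}


def normalize_synonyms(text):
    """Replace each word found in STANDARD_TERMS with its standard term."""
    def swap(match):
        word = match.group(0)
        # Step (1): does the word qualify as a variant? Step (2): replace it.
        return STANDARD_TERMS.get(word.lower(), word)

    return re.sub(r"[A-Za-z]+", swap, text)


normalized = normalize_synonyms("Patient has SOB and CHF")
```

Words that are not in the table pass through unchanged, so the function is safe to run on arbitrary text.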
4. A method as recited in claim 1, wherein said checking spelling further comprises checking the spelling of said words based on the probabilistic measure of semantic coherence of said proposed corrections.
5. A method as recited in claim 1, wherein said checking spelling further comprises:
(1) identifying a target word within said free-text data for spell checking;
(2) identifying a set of words known to a Bayesian network, wherein said identified set of words are those which said target word could be a misspelling, wherein said identification is based on transformations of said target word from said free-text data to create a list of candidate spellings of said target word;
(3) binding said set of candidate spellings to said words known to said Bayesian network;
(4) identifying a best candidate spelling from said list of candidate spellings based on probability values produced by said Bayesian network; and
(5) replacing said target word with said best candidate spelling.
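Steps (1) through (5) of claim 5 can be sketched as follows. This is a hedged illustration under stated assumptions: the candidate spellings come from single-character transformations (delete, transpose, replace, insert), and the fixed `WORD_PROB` table stands in for the probability values that the claim says a Bayesian network would produce. The vocabulary, the numbers, and the function names are hypothetical.

```python
import string

# Hypothetical vocabulary "known to the network", each with an invented
# probability. A real system would obtain these values from the network,
# conditioned on the surrounding text, rather than from a fixed table.
WORD_PROB = {"pneumonia": 0.4, "infiltrate": 0.3, "effusion": 0.2, "edema": 0.1}


def edits1(word):
    """All strings one transformation away from word (step 2)."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def correct(target):
    """Return the best known spelling for target, or target itself."""
    if target in WORD_PROB:
        return target  # already a known word, nothing to correct
    # Step (3): keep only candidates that are words known to the model.
    candidates = {c for c in edits1(target) if c in WORD_PROB}
    if not candidates:
        return target  # no plausible correction found
    # Steps (4) and (5): pick the most probable candidate as the replacement.
    return max(candidates, key=WORD_PROB.get)


fixed = correct("pnuemonia")
```

The sketch only explores candidates one transformation away. Words with two or more errors are returned unchanged.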
6. A method as recited in claim 5, wherein said identifying a best candidate spelling employs a probabilistic analysis of said candidate spellings.
7. A method as recited in claim 1, wherein said parsing of said syntax is performed in a manner constrained by ongoing analysis of semantic coherence of proposed syntactic relations within parse, and of word-sense assignments to words within said parse.
8. A method as recited in claim 1, wherein said parsing of said syntax further comprises a context-free grammar parser.
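Claim 8 names a context-free grammar parser without specifying an algorithm. As one possible illustration, the sketch below uses the CYK algorithm, a standard technique chosen here as an assumption, to test whether a word sequence is derivable from a toy grammar in Chomsky normal form. The grammar, the lexicon, and the function `recognizes` are hypothetical.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form.
# BINARY maps a pair of child categories to their parent category.
BINARY = {("NP", "VP"): "S", ("Det", "N"): "NP", ("V", "NP"): "VP"}
LEXICON = {
    "the": {"Det"}, "an": {"Det"},
    "film": {"N"}, "infiltrate": {"N"},
    "shows": {"V"},
}


def recognizes(words, start="S"):
    """Return True if the grammar derives the word list from start."""
    n = len(words)
    # table[i][j] holds the categories that derive words[i : j + 1].
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, word in enumerate(words):
        table[i][i] = set(LEXICON.get(word, ()))
    for span in range(2, n + 1):             # length of the phrase
        for i in range(n - span + 1):        # start of the phrase
            j = i + span - 1                 # end of the phrase
            for k in range(i, j):            # split point
                for left, right in product(table[i][k], table[k + 1][j]):
                    parent = BINARY.get((left, right))
                    if parent:
                        table[i][j].add(parent)
    return n > 0 and start in table[0][n - 1]


ok = recognizes("the film shows an infiltrate".split())
```

This sketch only answers whether a parse exists. Producing the parse tree that the later steps operate on would require storing back-pointers in each table cell.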
9. A method as recited in claim 1, wherein said transforming said grammar is performed in a manner constrained by ongoing analysis of semantic coherence of said parsed syntactic relations within said transformation.
10. A method as recited in claim 1, wherein said transforming said grammar further comprises placing syntactic parsing into a form suitable for semantic analysis.
11. A method as recited in claim 1, wherein said analyzing said semantic relationships further comprises said Bayesian network having one or more nodes each having probabilistic values.
12. A method as recited in claim 1, wherein said creating an encoded representation further comprises selecting the appropriate ICD9 code resulting from said checking of synonyms, checking of spelling, parsing syntax, transforming grammar, and analyzing grammar of said free-text data.
13. A method for providing encoded medical information from free-text data, operating on a computer system, including:
a digital computer processor executing the steps of the method;
a mass storage device connected to said digital computer processor for storing the data being worked on by the method;
an input device, electrically connected to said digital computer processor, for receiving data to be worked on by the method;
a preservation storage device electrically connected to said digital computer processor, to store resulting coded data;
the method comprising;
(A) receiving free-text data, wherein said free-text data includes;
words, a grammar, a syntax, and a semantic relationship between said words;
(B) checking for synonyms of said words within said received free-text data;
(C) checking spelling of said words within said received free-text data;
(D) parsing said syntax of said received free-text data;
(E) transforming said grammar of said received free-text data;
(F) analyzing said semantic relationship of said received free-text data, wherein said analysis is based on a probabilistic model of lexical semantics, wherein said probabilistic model relates said words to one or more concepts, wherein said words are appropriate for translation into a concept, and wherein said analyzing said semantic relationship further comprises;
(1) identifying possible sets of word level network assignments for low level phrases in a parse tree;
(2) combining said identified low level phrase assignments to generated assignments for high level phrases;
(3) binding null states to nodes representing concepts apparently unexpressed; and
(4) selecting a highest probability state to provide an interpretation of said free-text data;
(G) creating an encoded representation of said received free-text data; and
(H) writing said encoded representation into a database.
Dependent claims: 14, 15, 16, 17, 18, 19
20. A system for encoding free-text information, comprising:
(A) an input device for receiving free-text information;
(B) a processor electrically connected to said input device for processing said received free-text information, wherein said processing further comprises probabilistically calculating a relationship between said received free-text information and one or more concepts and wherein said probabilistic calculation further comprises a Bayesian network, and wherein said processing of said free-text information further comprises;
(1) identifying possible sets of word level network assignments for low level phrases in a parse tree;
(2) combining said identified low level phrase assignments to generated assignments for high level phrases;
(3) binding null states to nodes representing concepts apparently unexpressed; and
(4) selecting a highest probability state to provide an interpretation of said free-text data;
(C) a digital storage device electrically connected to said processor;
(D) a means for encoding said received free-text information employing said processor; and
(E) a means for storing said encoded free-text information on said digital storage device.
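The four inference sub-steps that recur in claims 1, 13, and 20 can be sketched as follows. This is a simplified illustration under loud assumptions, not the patented procedure: network nodes are modeled as named slots, each word carries invented independent scores for the slots it might fill, and these scores stand in for probabilities that a real Bayesian network would compute jointly. All slot names, tables, and functions are hypothetical.

```python
from itertools import product

# Hypothetical per-word candidate assignments: (slot, probability).
WORD_STATES = {
    "dense": [("severity", 0.6), ("finding", 0.1)],
    "infiltrate": [("finding", 0.9)],
    "left": [("side", 0.8)],
    "lung": [("location", 0.9), ("finding", 0.05)],
}
# Hypothetical probability that a slot is left unexpressed (null state).
NULL_PROB = {"finding": 0.01, "severity": 0.5, "side": 0.4,
             "location": 0.2, "change": 0.7}


def phrase_assignments(words):
    """Step 1: enumerate slot assignments for one low-level phrase."""
    results = []
    for combo in product(*(WORD_STATES[w] for w in words)):
        slots = [slot for slot, _ in combo]
        if len(set(slots)) == len(slots):    # each slot filled at most once
            prob = 1.0
            for _, p in combo:
                prob *= p
            results.append(({s: w for w, (s, _) in zip(words, combo)}, prob))
    return results


def combine(left, right):
    """Step 2: merge sub-phrase assignments whose slots do not clash."""
    return [({**a, **b}, pa * pb)
            for (a, pa), (b, pb) in product(left, right)
            if not a.keys() & b.keys()]


def bind_nulls(assignment, prob):
    """Step 3: bind a null state (None) to every slot left unfilled."""
    full = dict(assignment)
    for slot, p_null in NULL_PROB.items():
        if slot not in full:
            full[slot] = None
            prob *= p_null
    return full, prob


def interpret(phrases):
    """Step 4: return the highest-probability complete assignment."""
    combined = phrase_assignments(phrases[0])
    for words in phrases[1:]:
        combined = combine(combined, phrase_assignments(words))
    return max((bind_nulls(a, p) for a, p in combined), key=lambda r: r[1])


best, score = interpret([["dense", "infiltrate"], ["left", "lung"]])
```

In this toy run the `change` slot is expressed by no word, so it receives a null state. The sketch raises an error for unknown words or when no consistent assignment exists, cases a real system would have to handle.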
Specification