Probabilistic method for natural language processing and for encoding free-text data into a medical database by utilizing a Bayesian network to perform spell checking of words
Abstract
A natural language understanding system is described which provides for the generation of concept codes from free-text medical data. A probabilistic model of lexical semantics, in the preferred embodiment of the invention implemented by means of a Bayesian network, is used to determine the most probable concept or meaning associated with a sentence or phrase. The inventive method and system include the steps of checking for synonyms, checking spelling, performing syntactic parsing, transforming text to its “deep” or semantic form, and performing a semantic analysis based on a probabilistic model of lexical semantics. In the preferred embodiment of the invention, spell checking and transformational processing as well as semantic analysis make use of semantic probabilistic determinations.
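As an illustration of how a probabilistic model can select the most probable concept for a phrase, the following is a minimal sketch using naive Bayes, the simplest form of Bayesian network (one concept node whose word children are assumed conditionally independent). The network structure actually used by the invention is not given in this section, so the concept names, word probabilities, and the UNSEEN floor value below are all hypothetical.

```python
import math

# Hypothetical toy model: P(concept) and P(word | concept). All values are made up.
PRIOR = {"pneumonia": 0.5, "normal_chest": 0.5}
LIKELIHOOD = {
    "pneumonia": {"infiltrate": 0.4, "lobe": 0.3, "clear": 0.05},
    "normal_chest": {"infiltrate": 0.05, "lobe": 0.1, "clear": 0.5},
}
UNSEEN = 0.01  # assumed floor probability for words the model has no entry for


def concept_posteriors(words):
    """Return P(concept | words) under the naive Bayes independence assumption."""
    log_scores = {}
    for concept, prior in PRIOR.items():
        score = math.log(prior)
        for word in words:
            score += math.log(LIKELIHOOD[concept].get(word, UNSEEN))
        log_scores[concept] = score
    # Normalise relative to the largest log score for numerical stability.
    top = max(log_scores.values())
    unnormalised = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(unnormalised.values())
    return {c: v / total for c, v in unnormalised.items()}


def most_probable_concept(words):
    """Pick the concept with the highest posterior probability."""
    posteriors = concept_posteriors(words)
    return max(posteriors, key=posteriors.get)
```

Calling most_probable_concept(["infiltrate", "lobe"]) selects "pneumonia" with these made-up numbers, because the product of prior and word likelihoods is larger for that concept than for "normal_chest".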
2 Claims
1. A method for encoding free-text data, comprising:
(A) receiving free-text data, wherein said free-text data includes words, a grammar, a syntax and a semantic relationship between said words;
(B) checking for synonyms of said words within said received free-text data;
(C) checking spelling of said words within said received free-text data, wherein said checking spelling further comprises:
(1) identifying a target word within said free-text data for spell checking;
(2) identifying a set of words known to a Bayesian network, wherein said identified set of words are those which said target word could be a misspelling, wherein said identification is based on transformations of said target word from said free-text data to create a list of candidate spellings of said target word;
(3) binding said set of candidate spellings to said words known to said Bayesian network;
(4) identifying a best candidate spelling from said list of candidate spellings based on probability values produced by said Bayesian network; and
(5) replacing said target word with said best candidate spelling;
(D) parsing said syntax of said received free-text data;
(E) transforming said grammar of said received free-text data;
(F) analyzing said semantic relationship within said received free-text data on the basis of a probabilistic model of lexical semantics which relates said words to one or more concepts;
(G) creating an encoded representation of said received free-text data; and
(H) writing said encoded representation into a medical database.
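Sub-steps (C)(1) through (C)(5) can be sketched as follows. This is a minimal illustration, not the patented implementation: the transformations are assumed to be single-character edits (delete, transpose, replace, insert), and the table KNOWN_WORD_PROB is a hypothetical stand-in for the probability values a Bayesian network would produce for each known word in context.

```python
import string

# Hypothetical stand-in for the network: known words with made-up probabilities.
KNOWN_WORD_PROB = {"infiltrate": 0.30, "pneumonia": 0.25, "lobe": 0.20, "love": 0.01}


def transformations(word):
    """Step (2): every string one edit away from the target word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in letters}
    inserts = {a + c + b for a, b in splits for c in letters}
    return deletes | transposes | replaces | inserts


def best_spelling(target):
    """Steps (3) and (4): keep candidates that are known words, pick the most probable."""
    if target in KNOWN_WORD_PROB:
        return target
    candidates = {c for c in transformations(target) if c in KNOWN_WORD_PROB}
    if not candidates:
        return target  # nothing known is close enough; leave the word unchanged
    return max(candidates, key=KNOWN_WORD_PROB.get)


def spell_check(text):
    """Steps (1) and (5): visit each word and replace it with its best spelling."""
    return " ".join(best_spelling(word) for word in text.lower().split())
```

With this table, "loce" is one edit away from both "lobe" and "love", and the higher probability assigned to "lobe" decides between them. In a real system that probability would depend on the surrounding words rather than on a fixed table.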
2. A method for encoding free-text data, comprising:
(A) receiving free-text data, wherein said free-text data includes words, a grammar, a syntax and a semantic relationship between said words;
(B) checking for synonyms of said words within said received free-text data;
(C) checking spelling of said words within said received free-text data, wherein said checking spelling further comprises:
(1) identifying a target word within said free-text data for spell checking;
(2) identifying a set of words known to a Bayesian network, wherein said identified set of words are those which said target word could be a misspelling, wherein said identification is based on transformations of said target word from said free-text data to create a list of candidate spellings of said target word;
(3) binding said set of candidate spellings to said words known to said Bayesian network;
(4) identifying a best candidate spelling from said list of candidate spellings based on probability values produced by said Bayesian network, wherein said identifying a best candidate spelling employs a probabilistic analysis of said candidate spellings; and
(5) replacing said target word with said best candidate spelling;
(D) parsing said syntax of said received free-text data;
(E) transforming said grammar of said received free-text data;
(F) analyzing said semantic relationship within said received free-text data on the basis of a probabilistic model of lexical semantics which relates said words to one or more concepts;
(G) creating an encoded representation of said received free-text data; and
(H) writing said encoded representation into a medical database.
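Steps (G) and (H) of the claims, creating an encoded representation and writing it into a medical database, might look like the sketch below. The claims do not specify a database engine, schema, or code format, so SQLite, the encoded_findings table, its columns, and the example concept codes are all assumptions made for illustration.

```python
import sqlite3


def write_encoding(conn, report_id, concept_codes):
    """Store one row per concept code extracted from a report (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS encoded_findings ("
        "report_id TEXT NOT NULL, concept_code TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO encoded_findings (report_id, concept_code) VALUES (?, ?)",
        [(report_id, code) for code in concept_codes],
    )
    conn.commit()


def read_encoding(conn, report_id):
    """Return the concept codes stored for a report, in insertion order."""
    rows = conn.execute(
        "SELECT concept_code FROM encoded_findings WHERE report_id = ? ORDER BY rowid",
        (report_id,),
    )
    return [row[0] for row in rows]


# Usage with an in-memory database and made-up concept codes.
conn = sqlite3.connect(":memory:")
write_encoding(conn, "report-001", ["C_PNEUMONIA", "C_RIGHT_LOWER_LOBE"])
codes = read_encoding(conn, "report-001")
```

Parameterised queries are used here so that text derived from the free-text report is never interpolated directly into SQL.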
Specification