System and method for language extraction and encoding
First Claim
1. A method for extracting information from medical or natural-language input text, comprising:
receiving, by a computing system, medical or natural-language input text, wherein one or more words or portions of said medical or natural-language input text includes an identification tag;
selecting, by the computing system, the input text using the identification tag to determine a relevant text input and an irrelevant text input, wherein the identification tag includes a string value and/or a nested structure value, wherein the identification tag is configured to be customized and recognized by a processor;
utilizing, by the computing system, a lexicon knowledge base to identify and categorize multi-word and single word phrases within sentences of the relevant text input, wherein said lexicon knowledge base is configured to be dynamically customized by a user;
receiving, from a user, filenames having new lexical entries, and modifying the lexicon knowledge based on the filenames;
disambiguating, by the computing system, one or more ambiguous words in the relevant text input using a contextual disambiguation rule, wherein the contextual disambiguation rule is configured to analyze words following or preceding each ambiguous word, words in the same sentence, words in a certain section, and/or words in a certain domain, and wherein the contextual disambiguation rule is configured to be dynamically loaded without compiling the entire computing system;
parsing, by the computing system, said relevant text input to determine a grammatical structure of the relevant text input, said parsing step comprising the step of referring to a domain parameter having a value indicative of a domain from which the text data originated, the domain parameter corresponding to one or more rules of grammar within a knowledge base related to the domain to be applied for parsing the relevant text input;
regularizing, by the computing system, the parsed text data to form a canonical output form;
converting, by the computing system, the canonical output form into controlled vocabulary terms using a table of codes, wherein the table of codes is configured to be dynamically customized without compiling the entire computing system;
tagging, by the computing system, the input text with a structured data component derived from the controlled vocabulary terms; and
outputting the tagged text data to be stored in a database.
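The selecting, lexicon-lookup, regularizing and code-conversion steps recited above can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions, not the patented implementation: the lexicon entries, the tag names, the placeholder codes and every function name are hypothetical, and the disambiguation and grammar-driven parsing steps are left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lexicon: phrase (tuple of lowercase words) -> (category, canonical form).
LEXICON = {
    ("shortness", "of", "breath"): ("finding", "dyspnea"),
    ("chest", "pain"): ("finding", "chest pain"),
    ("cough",): ("finding", "cough"),
}

# Hypothetical table of codes: canonical form -> controlled-vocabulary code.
# The codes are placeholders, not entries from any real vocabulary.
CODE_TABLE = {"dyspnea": "X001", "chest pain": "X002", "cough": "X003"}

# Hypothetical identification tags that mark a section as relevant.
RELEVANT_TAGS = {"history", "findings"}


@dataclass
class Annotation:
    phrase: str            # text as matched in the input
    category: str          # lexicon category
    canonical: str         # regularized (canonical) form
    code: Optional[str]    # controlled-vocabulary code, if the table has one


def select_relevant(sections):
    """Keep text whose identification tag is relevant; drop the rest."""
    return [text for tag, text in sections if tag in RELEVANT_TAGS]


def lookup_phrases(sentence):
    """Greedy longest-match lookup of multi-word and single-word phrases."""
    words = sentence.lower().replace(",", " ").replace(".", " ").split()
    max_len = max(len(key) for key in LEXICON)
    found, i = [], 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(words) - i), 0, -1):
            key = tuple(words[i:i + n])
            if key in LEXICON:
                category, canonical = LEXICON[key]
                found.append(Annotation(" ".join(key), category, canonical,
                                        CODE_TABLE.get(canonical)))
                i += n
                break
        else:
            i += 1  # no phrase starts at this word
    return found


def extract(sections):
    """Run selection, lookup, regularization and coding over tagged sections."""
    annotations = []
    for text in select_relevant(sections):
        annotations.extend(lookup_phrases(text))
    return annotations


sections = [
    ("history", "Patient reports shortness of breath and cough."),
    ("billing", "Chest pain code was entered."),
]
for a in extract(sections):
    print(a.phrase, "->", a.canonical, a.code)
```

In this sketch the "billing" section is discarded because its tag is not in the relevant set, so "chest pain" is never coded even though the lexicon contains it.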
Abstract
Improved systems and methods for extracting information from medical and natural-language text data.
9 Claims
1. A method for extracting information from medical or natural-language input text, as recited in full under "First Claim" above. Dependent claims: 2, 3, 4.
5. A system for extracting information from medical or natural-language input text, comprising:
a lexicon knowledge base to identify and categorize multi-word and single word phrases within sentences of a relevant input text, wherein said lexicon knowledge base is configured to be dynamically customized by a user by receiving, from the user, filenames having new lexical entries, and the lexicon knowledge is modified based on the filenames;
a processor, coupled to said lexicon knowledge base and receiving said medical or natural-language input text, tagging one or more words or portions of said medical or natural-language input text with an identification tag, and selecting the input text using the identification tag to determine the relevant text input and an irrelevant text input, wherein the identification tag is configured to be customized by a user;
a boundary identifier, coupled to said processor and said lexicon knowledge base and receiving said medical or natural-language input text and dropping the irrelevant text input;
a parser, coupled to said boundary identifier and receiving said relevant input text to determine the grammatical structure of the relevant text input and generating a parsed text wherein one or more ambiguous words in the parsed data are disambiguated using a contextual disambiguation rule, wherein the disambiguation rule is configured to be dynamically loaded without compiling the entire computing system;
a phrase regulator, coupled to said parser and replacing the parsed text with a canonical output form; and
an encoder, coupled to said phrase regulator and receiving the canonical output form, converting the canonical output form into a controlled vocabulary term using a table of codes, tagging the input text with a structured data component derived from controlled vocabulary terms, and outputting the tagged text data to be stored in a database, wherein the table of codes is configured to be dynamically customized without compiling the entire computing system.
Dependent claims: 6, 7, 8, 9.
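One way to picture a contextual disambiguation rule that is "dynamically loaded without compiling the entire computing system" is to keep the rules as data that the program reads at run time. The Python sketch below assumes a JSON rule format, a `window` field and the example senses, all of which are hypothetical and not taken from the patent. An inline string stands in for a user-editable rule file so that the example is self-contained.

```python
import json

# Hypothetical rule file content; in practice this would be a file a user edits.
RULES_JSON = """
[
  {"word": "discharge", "context": ["hospital", "home"],
   "window": "sentence", "sense": "release from care"},
  {"word": "discharge", "context": ["wound", "nasal"],
   "window": "adjacent", "sense": "body fluid"}
]
"""


def load_rules(text):
    """Parse rules at run time, so changing them needs no rebuild of the program."""
    return json.loads(text)


def disambiguate(words, index, rules):
    """Return the sense of words[index] chosen by the first matching rule, or None."""
    target = words[index]
    for rule in rules:
        if rule["word"] != target:
            continue
        if rule["window"] == "adjacent":
            # Only the word immediately before and the word immediately after.
            candidates = words[max(0, index - 1):index] + words[index + 1:index + 2]
        else:
            # "sentence": every other word in the same sentence.
            candidates = words[:index] + words[index + 1:]
        if any(c in rule["context"] for c in candidates):
            return rule["sense"]
    return None


rules = load_rules(RULES_JSON)
print(disambiguate("patient was ready for discharge home".split(), 4, rules))
print(disambiguate("wound discharge noted".split(), 1, rules))
```

The same approach would apply to the table of codes: storing it as data rather than as compiled logic is what lets a user customize it without rebuilding the system. Section-level and domain-level context, which the claims also mention, are not modeled here.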
Specification