System and method for language extraction and encoding

  • US 10,275,424 B2
  • Filed: 01/28/2014
  • Issued: 04/30/2019
  • Est. Priority Date: 07/29/2011
  • Status: Active Grant
First Claim

1. A method for extracting information from medical or natural-language input text, comprising:

    receiving, by a computing system, medical or natural-language input text, wherein one or more words or portions of said medical or natural-language input text includes an identification tag;

    selecting, by the computing system, the input text using the identification tag to determine a relevant text input and an irrelevant text input, wherein the identification tag includes a string value and/or a nested structure value, wherein the identification tag is configured to be customized and recognized by a processor;

    utilizing, by the computing system, a lexicon knowledge base to identify and categorize multi-word and single word phrases within sentences of the relevant text input, wherein said lexicon knowledge base is configured to be dynamically customized by a user;

    receiving, from a user, filenames having new lexical entries, and modifying the lexicon knowledge base based on the filenames;

    disambiguating, by the computing system, one or more ambiguous words in the relevant text input using a contextual disambiguation rule, wherein the contextual disambiguation rule is configured to analyze words following or preceding each ambiguous word, words in the same sentence, words in a certain section, and/or words in a certain domain, and wherein the contextual disambiguation rule is configured to be dynamically loaded without compiling the entire computing system;

    parsing, by the computing system, said relevant text input to determine a grammatical structure of the relevant text input, said parsing step comprising the step of referring to a domain parameter having a value indicative of a domain from which the text data originated, the domain parameter corresponding to one or more rules of grammar within a knowledge base related to the domain to be applied for parsing the relevant text input;

    regularizing, by the computing system, the parsed text data to form a canonical output form;

    converting, by the computing system, the canonical output form into controlled vocabulary terms using a table of codes, wherein the table of codes is configured to be dynamically customized without compiling the entire computing system;

    tagging, by the computing system, the input text with a structured data component derived from the controlled vocabulary terms; and

    outputting the tagged text data to be stored in a database.
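
The steps recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name, lexicon entry, tag, and code value below is hypothetical and invented for illustration, the disambiguation rule is a toy, and the domain-specific grammatical parsing step is omitted entirely. The sketch only shows how tag-based selection, lexicon lookup of multi-word and single-word phrases, contextual disambiguation, and conversion through a table of codes might chain together.

```python
import re

# Hypothetical lexicon: phrase (as a token tuple) -> category.
LEXICON = {
    ("chest", "pain"): "problem",
    ("fever",): "problem",
    ("cold",): "ambiguous",
}

# Hypothetical table of codes: canonical term -> made-up vocabulary code.
CODE_TABLE = {"chest pain": "C0001", "fever": "C0002", "common cold": "C0003"}

# Hypothetical identification tags that mark text as relevant.
RELEVANT_TAGS = {"history"}


def select_relevant(sections):
    """Keep only text whose identification tag marks it as relevant."""
    return [text for tag, text in sections if tag in RELEVANT_TAGS]


def lookup_phrases(tokens):
    """Greedy longest-match lookup of single- and multi-word phrases."""
    max_len = max(len(key) for key in LEXICON)
    found, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in LEXICON:
                found.append((" ".join(key), LEXICON[key]))
                i += n
                break
        else:
            i += 1  # token is not in the lexicon; skip it
    return found


def disambiguate(phrases):
    """Toy contextual rule: an ambiguous phrase adjacent to a known
    problem is regularized to the illness sense ('common cold')."""
    resolved = []
    for idx, (phrase, category) in enumerate(phrases):
        if category == "ambiguous":
            neighbours = phrases[max(0, idx - 1):idx] + phrases[idx + 1:idx + 2]
            if any(cat == "problem" for _, cat in neighbours):
                phrase, category = "common cold", "problem"
        resolved.append((phrase, category))
    return resolved


def encode(text):
    """Tokenize, look up, disambiguate, and map terms to codes."""
    tokens = re.findall(r"[a-z]+", text.lower())
    phrases = disambiguate(lookup_phrases(tokens))
    return [
        {"term": phrase, "code": CODE_TABLE[phrase]}
        for phrase, category in phrases
        if category == "problem" and phrase in CODE_TABLE
    ]


if __name__ == "__main__":
    sections = [
        ("history", "Patient reports chest pain, fever and cold."),
        ("billing", "Invoice for cold storage."),
    ]
    for text in select_relevant(sections):
        print(encode(text))
```

Because the lexicon and the table of codes are plain data in this sketch, they could be reloaded from user-supplied files at run time without rebuilding the program, which loosely mirrors the claim's "dynamically customized" language.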
