
ELECTRONIC DOCUMENT SOURCE INGESTION FOR NATURAL LANGUAGE PROCESSING SYSTEMS

  • US 20140164407A1
  • Filed: 12/10/2012
  • Published: 06/12/2014
  • Est. Priority Date: 12/10/2012
  • Status: Active Grant
First Claim

1. A system, comprising:

  • a computer processor; and

    a memory containing a program that, when executed on the computer processor, performs an operation for processing data, comprising:

    receiving a plurality of electronic documents, wherein the electronic documents are arranged according to different, respective formats;

    identifying a properties file associated with one of the electronic documents, the properties file defining a formatting element of the respective format in the one electronic document and an action corresponding to a text portion associated with the formatting element;

    parsing the one electronic document to identify the formatting element;

    upon identifying the text portion associated with the formatting element, performing the action to the text portion by assigning the text portion to a formatting element of a normalized format; and

    storing the text portion into a NLP object based on the formatting element of the normalized format, wherein text in the NLP object is arranged based on the normalized format.
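
The operation recited in the claim can be illustrated with a minimal sketch. It is not the patented implementation. The `PROPERTIES` mapping stands in for per-format properties files, the regular expressions stand in for a real parser, and the names `NLPObject`, `PROPERTIES`, and `ingest` are hypothetical. The only action modeled is assigning a text portion to a formatting element of the normalized format.

```python
import re
from dataclasses import dataclass, field


@dataclass
class NLPObject:
    """Hypothetical container: text grouped by normalized formatting element."""
    sections: dict = field(default_factory=dict)

    def store(self, normalized_element, text):
        self.sections.setdefault(normalized_element, []).append(text)


# Assumed stand-in for properties files. Each source format lists
# (pattern capturing the text portion of a formatting element,
#  normalized element the text portion is assigned to).
PROPERTIES = {
    "html": [(r"<h1>(.*?)</h1>", "title"), (r"<p>(.*?)</p>", "paragraph")],
    "markdown": [(r"^# (.+)$", "title"), (r"^(?!#)(\S.*)$", "paragraph")],
}


def ingest(documents):
    """Normalize (format_name, raw_text) pairs into a single NLPObject."""
    nlp_object = NLPObject()
    for fmt, raw in documents:
        # Identify the properties associated with this document's format.
        for pattern, normalized in PROPERTIES[fmt]:
            # Parse the document to find each occurrence of the formatting element.
            for match in re.finditer(pattern, raw, flags=re.MULTILINE):
                # Perform the action: assign the text to the normalized element.
                nlp_object.store(normalized, match.group(1).strip())
    return nlp_object


docs = [
    ("html", "<h1>Overview</h1><p>First paragraph.</p>"),
    ("markdown", "# Intro\nSome body text."),
]
result = ingest(docs)
```

Here two documents in different formats end up in one object whose text is arranged by the normalized elements `title` and `paragraph`, regardless of the markup each source used.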
