
ELECTRONIC DOCUMENT SOURCE INGESTION FOR NATURAL LANGUAGE PROCESSING SYSTEMS

  • US 20140164408A1
  • Filed: 12/12/2012
  • Published: 06/12/2014
  • Est. Priority Date: 12/10/2012
  • Status: Active Grant
First Claim

1. A method, comprising:

    receiving a plurality of electronic documents, wherein the electronic documents are arranged according to different, respective formats;

    identifying a properties file associated with one of the electronic documents, the properties file defining a formatting element of the respective format in the one electronic document and an action corresponding to a text portion associated with the formatting element;

    parsing the one electronic document to identify the formatting element using one or more processors;

    upon identifying the text portion associated with the identified formatting element, performing the action to the text portion by assigning the text portion to a formatting element of a normalized format; and

    storing the text portion into a natural language processing (NLP) object based on the formatting element of the normalized format, wherein text in the NLP object is arranged based on the normalized format.
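The claimed steps can be illustrated with a minimal Python sketch. Everything specific here is an assumption and does not come from the patent: the properties file uses hypothetical `source_tag = NORMALIZED_ELEMENT` lines, HTML is the only source format handled, the only action implemented is "assign the text inside a mapped tag to its normalized element", and `NLPObject`, `NormalizingParser` and `ingest_html` are hypothetical names for illustration.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


def load_properties(text: str) -> dict[str, str]:
    """Parse hypothetical 'source_tag = NORMALIZED_ELEMENT' lines; '#' starts a comment."""
    mapping: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        mapping[key.strip().lower()] = value.strip()
    return mapping


@dataclass
class NLPObject:
    """Hypothetical NLP object: text stored in order, keyed by normalized element."""
    sections: list[tuple[str, str]] = field(default_factory=list)

    def add(self, element: str, text: str) -> None:
        self.sections.append((element, text))


class NormalizingParser(HTMLParser):
    """Finds the formatting elements named in the properties file and
    assigns their text to the corresponding normalized element."""

    def __init__(self, mapping: dict[str, str], nlp: NLPObject) -> None:
        super().__init__()
        self.mapping = mapping
        self.nlp = nlp
        self.current_tag: str | None = None  # mapped tag currently open, if any
        self.buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, matching the lowercased mapping keys.
        if tag in self.mapping:
            self.current_tag = tag
            self.buffer = []

    def handle_data(self, data):
        # Text outside any mapped element is ignored in this sketch.
        if self.current_tag is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self.current_tag:
            text = "".join(self.buffer).strip()
            if text:
                # The "action": assign the text portion to the normalized element.
                self.nlp.add(self.mapping[tag], text)
            self.current_tag = None


def ingest_html(html: str, properties_text: str) -> NLPObject:
    """Parse one HTML document and store its mapped text in a new NLPObject."""
    nlp = NLPObject()
    parser = NormalizingParser(load_properties(properties_text), nlp)
    parser.feed(html)
    parser.close()
    return nlp


if __name__ == "__main__":
    props = "# hypothetical properties file for HTML sources\nh1 = TITLE\np = BODY\n"
    doc = "<html><h1>Ingestion</h1><p>First paragraph.</p><div>ignored</div><p>Second.</p></html>"
    print(ingest_html(doc, props).sections)
    # [('TITLE', 'Ingestion'), ('BODY', 'First paragraph.'), ('BODY', 'Second.')]
```

In this sketch, documents in other formats would each need their own parser and properties file; keeping the per-format knowledge in the properties file lets the parser stay generic, while every format ends up in the same normalized `NLPObject` layout.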

  • 1 Assignment