Method for preserving conceptual distance within unstructured documents

  • US 9,424,299 B2
  • Filed: 03/09/2015
  • Issued: 08/23/2016
  • Est. Priority Date: 10/07/2014
  • Status: Expired due to Fees
First Claim

1. A computer-implemented method for characterizing content of documents by conceptual relationships, comprising:

    applying natural language processing (NLP) to content in a plurality of documents to identify topics and subjects;

    applying analytic analysis to the topics and subjects to identify conceptual relationships of the content in the plurality of documents;

    partitioning the content in each of the plurality of documents into a first structured hierarchy, preserving at least one structure in each document inherent in the each document; and

    providing access to content through a first index based upon utilizing the first structured hierarchy and through a second index utilizing a second structured hierarchy; and

    wherein the content is characterized by optimizing a vector space model representation of the documents, the optimization performed by a system capable of answering questions, where:

    the content from the plurality of documents is ingested by the system;

    natural language processing is applied to the content in the plurality of documents to identify terms, topics, subjects and concepts;

    the content is partitioned according to a semantic parse distance to identify a context for partitioned content;

    the content and context is represented, by the system, utilizing a vector space model;

    entries in the vector space model are eliminated based on a difference criteria; and

    an iterative genetic algorithm is applied to optimize features of the vector space model.
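
The final three steps of the claim (vector space representation, elimination of entries, genetic optimization) can be illustrated with a minimal sketch. It is not the patented implementation; the claim does not specify weighting, the difference criterion, or the fitness function, so everything below is an assumption: TF-IDF weights for the vector space model, "spread across passages below a threshold" as the difference criterion, and a fitness that rewards high cosine similarity within a context and low similarity across contexts. The function names (`build_vsm`, `prune`, `fitness`, `evolve`), the sample passages, and the context labels are hypothetical.

```python
import math
import random
from collections import Counter

def build_vsm(passages):
    """Vector space model (assumed TF-IDF): one row per tokenized passage."""
    vocab = sorted({t for p in passages for t in p})
    n = len(passages)
    df = Counter(t for p in passages for t in set(p))
    matrix = []
    for p in passages:
        tf = Counter(p)
        matrix.append([tf[t] / len(p) * math.log(n / df[t]) for t in vocab])
    return vocab, matrix

def prune(vocab, matrix, min_spread=1e-9):
    """Assumed difference criterion: drop terms whose weight barely varies."""
    keep = [j for j in range(len(vocab))
            if max(r[j] for r in matrix) - min(r[j] for r in matrix) > min_spread]
    return [vocab[j] for j in keep], [[r[j] for j in keep] for r in matrix]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def fitness(mask, matrix, contexts):
    """Assumed objective: same-context pairs similar, other pairs dissimilar."""
    rows = [[v for v, bit in zip(r, mask) if bit] for r in matrix]
    same, diff = [], []
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            bucket = same if contexts[i] == contexts[j] else diff
            bucket.append(cosine(rows[i], rows[j]))
    mean = lambda xs: sum(xs) / len(xs) if xs else 0.0
    # Small penalty per selected feature favors compact models.
    return mean(same) - mean(diff) - 0.01 * sum(mask)

def evolve(matrix, contexts, generations=30, pop_size=20, mutation=0.1, seed=0):
    """Iterative genetic algorithm over binary feature masks."""
    rng = random.Random(seed)
    n = len(matrix[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    score = lambda m: fitness(m, matrix, contexts)
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[:pop_size // 2]          # elitist selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover followed by bit-flip mutation.
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            child = [1 - bit if rng.random() < mutation else bit for bit in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=score)

# Hypothetical partitioned content with a context label per passage.
passages = [
    ["the", "index", "stores", "vectors"],
    ["the", "index", "ranks", "vectors"],
    ["the", "parser", "reads", "sentences"],
    ["the", "parser", "splits", "sentences"],
]
contexts = [0, 0, 1, 1]

vocab, matrix = build_vsm(passages)
vocab, matrix = prune(vocab, matrix)
mask = evolve(matrix, contexts)
print([term for term, bit in zip(vocab, mask) if bit])
```

In this toy data the term "the" occurs in every passage, so its TF-IDF weight is zero everywhere and the pruning step removes it before the genetic search runs. A fixed seed keeps the search reproducible; a real system would presumably use a task-specific fitness, such as question-answering accuracy, in place of the similarity heuristic assumed here.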
