Method For Preserving Conceptual Distance Within Unstructured Documents
First Claim
1. A computer-implemented method for characterizing content of documents by conceptual relationships, comprising:
applying natural language processing (NLP) to content in a plurality of documents to identify topics and subjects;
applying analytic analysis to the topics and subjects to identify conceptual relationships of the content in the plurality of documents;
partitioning the content in each of the plurality of documents into a first structured hierarchy, preserving at least one structure inherent in each document; and
providing access to content through a first index based upon utilizing the first structured hierarchy and through a second index utilizing a second structured hierarchy.
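The partitioning and dual-index steps of the claim can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration, not taken from the patent: the `CONCEPTS` lookup table is a hypothetical stand-in for the NLP and analytic steps, lines beginning with `# ` are assumed to be each document's inherent section headings, and the second structured hierarchy is simplified to a flat concept-to-passage mapping.

```python
from collections import defaultdict

# Hypothetical word -> concept table standing in for the NLP/analytics steps.
CONCEPTS = {
    "engine": "propulsion",
    "fuel": "propulsion",
    "brake": "safety",
    "airbag": "safety",
}


def partition(doc_id, text):
    """Split a document into (path, passage) pairs, keeping its own headings."""
    section = "(preamble)"
    position = 0
    parts = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("# "):   # assumed heading convention
            section = line[2:]
            position = 0
        else:
            position += 1
            parts.append(((doc_id, section, position), line))
    return parts


def build_indexes(docs):
    """Return (structural index, conceptual index) over all documents."""
    structural = {}                 # first index: structural path -> passage
    conceptual = defaultdict(list)  # second index: concept -> structural paths
    for doc_id, text in docs.items():
        for path, passage in partition(doc_id, text):
            structural[path] = passage
            words = {w.strip(".,;").lower() for w in passage.split()}
            for concept in sorted({CONCEPTS[w] for w in words if w in CONCEPTS}):
                conceptual[concept].append(path)
    return structural, dict(conceptual)


docs = {
    "manual": "# Powertrain\nThe engine burns fuel.\n"
              "# Safety\nEach brake is checked. The airbag deploys.",
}
structural, conceptual = build_indexes(docs)
```

In this sketch the same passage is reachable two ways: by its position in the document's own structure (`structural`) or by the concept it mentions (`conceptual`), which is one plausible reading of providing access through two indexes.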
Abstract
A method, system and computer-usable medium are disclosed for preserving conceptual distance within unstructured documents by characterizing conceptual relationships. Natural language processing is applied to content in a plurality of documents to identify topics and subjects. Analytic analysis is then applied to the identified topics and subjects to identify concepts. The content in each of the plurality of documents is partitioned into a first structured hierarchy, preserving at least one structure inherent in each document. Access is then provided to the content through a first index based upon utilizing the first structured hierarchy and through a second index utilizing a second structured hierarchy. The conceptual relationship criteria are based upon a directed graph with weights based upon a similarity and a distance based upon concepts.
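The directed graph mentioned at the end of the abstract can be sketched as follows. The abstract does not define the weights, so the choices here are assumptions for illustration only: similarity from passage u to passage v is taken as the share of u's topics that also appear in v (which makes the graph directed), the edge distance is 1 minus that similarity, and the conceptual distance between two passages is the shortest path over edge distances. The function names and sample topics are hypothetical.

```python
import heapq


def build_graph(topics):
    """Directed edge u -> v weighted by the share of u's topics also found in v."""
    graph = {u: {} for u in topics}
    for u, tu in topics.items():
        for v, tv in topics.items():
            shared = tu & tv
            if u != v and shared:
                similarity = len(shared) / len(tu)
                graph[u][v] = {
                    "similarity": similarity,
                    "distance": 1.0 - similarity,  # assumed definition
                }
    return graph


def conceptual_distance(graph, source, target):
    """Shortest-path distance (Dijkstra) over the per-edge distances."""
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, edge in graph[node].items():
            candidate = dist + edge["distance"]
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return float("inf")  # target not reachable from source


topics = {
    "A": {"engine", "fuel"},
    "B": {"engine", "fuel", "emissions", "safety"},
    "C": {"safety"},
}
graph = build_graph(topics)
```

With these sample topics, A and C share nothing directly, yet `conceptual_distance(graph, "A", "C")` is finite because both relate to B. The value differs in the reverse direction, which is the property a directed graph adds over a symmetric similarity measure.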
7 Claims
1. A computer-implemented method for characterizing content of documents by conceptual relationships, comprising:
applying natural language processing (NLP) to content in a plurality of documents to identify topics and subjects;
applying analytic analysis to the topics and subjects to identify conceptual relationships of the content in the plurality of documents;
partitioning the content in each of the plurality of documents into a first structured hierarchy, preserving at least one structure inherent in each document; and
providing access to content through a first index based upon utilizing the first structured hierarchy and through a second index utilizing a second structured hierarchy.
Dependent claims: 2, 3, 4, 5, 6
7-20. (canceled)
Specification