Preserving Conceptual Distance Within Unstructured Documents
Abstract
A method, system and computer-usable medium are disclosed for preserving conceptual distance within unstructured documents by characterizing conceptual relationships. Natural language processing is applied to content in a plurality of documents to identify topics and subjects. Analytic analysis is then applied to the identified topics and subjects to identify concepts. The content in each of the plurality of documents is partitioned into a first structured hierarchy, preserving at least one structure in each document inherent in the each document. Access is then provided to the content through a first index based upon utilizing the first structured hierarchy and through a second index utilizing a second structured hierarchy. The conceptual relationship criteria are based upon a directed graph with weights based upon a similarity and a distance based upon concepts.
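The directed graph mentioned at the end of the abstract can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the patented method: the names `jaccard` and `build_concept_graph` are hypothetical, Jaccard overlap of concept sets stands in for the similarity weight, and distance is assumed to be one minus that similarity.

```python
from itertools import permutations


def jaccard(a, b):
    """Similarity of two concept sets, in the range [0, 1]."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def build_concept_graph(concepts_by_section, min_similarity=0.0):
    """Return {source: {target: {"similarity": s, "distance": d}}}.

    Assumptions (not taken from the source): distance is 1 - similarity,
    and an edge is stored only when similarity exceeds min_similarity.
    Edges are stored per direction, so an asymmetric measure could be
    substituted for Jaccard without changing the structure.
    """
    graph = {name: {} for name in concepts_by_section}
    for src, dst in permutations(concepts_by_section, 2):
        s = jaccard(concepts_by_section[src], concepts_by_section[dst])
        if s > min_similarity:
            graph[src][dst] = {"similarity": s, "distance": 1.0 - s}
    return graph


# Hypothetical sections with the concepts identified in each.
sections = {
    "intro": {"index", "hierarchy"},
    "methods": {"index", "hierarchy", "graph"},
    "appendix": {"licensing"},
}
graph = build_concept_graph(sections)
```

In this toy input, "intro" and "methods" share two of three concepts and are therefore linked, while "appendix" shares none and stays isolated.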
38 Citations
20 Claims
1-6. (canceled)
7. A system comprising:
a processor;
a data bus coupled to the processor; and
a computer-usable medium embodying computer program code, the computer-usable medium being coupled to the data bus, the computer program code used for characterizing content of documents by conceptual relationships and comprising instructions executable by the processor and configured for:
applying natural language processing (NLP) to content in a plurality of documents to identify topics and subjects;
applying analytic analysis to the topics and subjects to identify a conceptual relationship of the content in the plurality of documents;
partitioning the content in each of the plurality of documents into a first structured hierarchy, preserving at least one structure in each document inherent in the each document; and
providing access to content through a first index based upon utilizing the first structured hierarchy and through a second index utilizing a second structured hierarchy.
Dependent claims: 8, 9, 10, 11, 12
13. A non-transitory, computer-readable storage medium embodying computer program code, the computer program code comprising computer executable instructions configured for:
applying natural language processing (NLP) to content in a plurality of documents to identify topics and subjects;
applying analytic analysis to the topics and subjects to identify a conceptual relationship of the content in the plurality of documents;
partitioning the content in each of the plurality of documents into a first structured hierarchy, preserving at least one structure in each document inherent in the each document; and
providing access to content through a first index based upon utilizing the first structured hierarchy and through a second index utilizing a second structured hierarchy.
Dependent claims: 14, 15, 16, 17, 18, 19, 20
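The partitioning and two-index steps recited in claims 7 and 13 can be pictured with a short Python sketch. It is a simplified illustration, not the claimed implementation: `partition` and `build_indexes` are hypothetical names, lines starting with `#` are assumed to mark the structure inherent in a document, and plain keyword matching against a fixed vocabulary stands in for the NLP and analytic steps.

```python
from collections import defaultdict


def partition(lines):
    """Split plain text into (path, text) sections.

    Assumption: a line starting with '#' is a heading and the number of
    '#' characters gives its depth, standing in for whatever structure
    is inherent in a real document.
    """
    sections = []
    path = []
    for line in lines:
        if line.startswith("#"):
            depth = len(line) - len(line.lstrip("#"))
            title = line.lstrip("#").strip()
            # Keep the ancestors above this depth, then add the heading.
            path = path[:depth - 1] + [title]
            sections.append((tuple(path), []))
        elif sections and line.strip():
            sections[-1][1].append(line.strip())
    return [(p, " ".join(body)) for p, body in sections]


def build_indexes(doc_id, sections, vocabulary):
    """Build a structural index and a concept index over the sections.

    First index: (doc_id, heading path) -> section text.
    Second index: concept -> list of structural keys (an assumed,
    one-level stand-in for the second structured hierarchy).
    """
    by_structure = {}
    by_concept = defaultdict(list)
    for path, text in sections:
        key = (doc_id,) + path
        by_structure[key] = text
        words = set(text.lower().replace(".", " ").split())
        for concept in sorted(vocabulary & words):
            by_concept[concept].append(key)
    return by_structure, dict(by_concept)


# Hypothetical document and concept vocabulary.
doc = ["# Guide", "Overview text.", "## Setup", "Install the index.",
       "## Usage", "Query the graph."]
by_structure, by_concept = build_indexes("doc1", partition(doc),
                                         {"index", "graph"})
```

The same section is then reachable either by its place in the document (`by_structure`) or by a concept it mentions (`by_concept`), which is the dual access the claims describe.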
Specification