System, method and computer program product for performing unstructured information management and automatic text analysis, and providing multiple document views derived from different document tokenizations
Abstract
Disclosed are a system architecture, components and a searching technique for an Unstructured Information Management System (UIMS). The UIMS may be provided as middleware for the effective management and interchange of unstructured information over a wide array of information sources. The architecture generally includes a search engine, data storage, analysis engines containing pipelined document annotators, and various adapters. The search is performed using a two-level searching technique. Also disclosed are a system, method and computer program product to process document data. The method includes inputting a document and operating at least one text analysis engine that comprises a plurality of coupled annotators for tokenizing document data for identifying and annotating a particular type of semantic content. Operating the at least one text analysis engine generates a plurality of views of a document, where each of the plurality of views is derived from a different tokenization of the document. The method further includes storing the plurality of views in a common data structure associated with the document.
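For illustration only, the idea of several views derived from different tokenizations and held in one data structure per document might be sketched as follows. This is a minimal sketch and is not taken from the specification: the names `Document`, `View`, `analyze`, `whitespace_tokenizer`, `word_tokenizer` and `year_annotator` are all hypothetical, and the regular-expression tokenizers simply stand in for whatever tokenizing annotators an actual engine would use.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# A token is recorded as (start, end) character offsets into the document text.
Span = Tuple[int, int]


@dataclass
class View:
    """One view of a document: a tokenization plus annotations made over it."""
    name: str
    tokens: List[Span]
    annotations: List[Tuple[str, Span]] = field(default_factory=list)


@dataclass
class Document:
    """Hypothetical common data structure: the text and every view of it."""
    text: str
    views: Dict[str, View] = field(default_factory=dict)


def whitespace_tokenizer(text: str) -> List[Span]:
    # Tokens are maximal runs of non-whitespace, so punctuation stays attached.
    return [m.span() for m in re.finditer(r"\S+", text)]


def word_tokenizer(text: str) -> List[Span]:
    # Tokens are maximal runs of word characters, so punctuation is dropped.
    return [m.span() for m in re.finditer(r"\w+", text)]


def year_annotator(doc: Document, view: View) -> None:
    # Annotates one assumed semantic type ("Year") on tokens that are
    # exactly a four-digit year starting with 19 or 20.
    for start, end in view.tokens:
        if re.fullmatch(r"(19|20)\d{2}", doc.text[start:end]):
            view.annotations.append(("Year", (start, end)))


def analyze(
    text: str,
    tokenizers: Dict[str, Callable[[str], List[Span]]],
    annotators: List[Callable[[Document, View], None]],
) -> Document:
    # Build one view per tokenization, run the annotators over each view in
    # order, and store all views in the same Document.
    doc = Document(text)
    for name, tokenize in tokenizers.items():
        view = View(name, tokenize(text))
        for annotate in annotators:
            annotate(doc, view)
        doc.views[name] = view
    return doc


doc = analyze(
    "Built in 1999, fixed later.",
    {"whitespace": whitespace_tokenizer, "word": word_tokenizer},
    [year_annotator],
)
for view in doc.views.values():
    print(view.name, view.annotations)
```

In this sketch the two views disagree: the whitespace tokenization keeps the trailing comma on `1999,`, so the year annotator finds nothing there, while the word tokenization isolates `1999` and yields one annotation. Both results remain available side by side in the same `Document`.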
27 Claims
1. A data processing system for processing document data, comprising:
data storage for storing a collection of document data that comprises unstructured document data; and
at least one text analysis engine that comprises a plurality of coupled annotators at least some of which are operable for tokenizing document data for identifying and annotating a particular type of semantic content;
where said at least one text analysis engine operates to generate a plurality of views of a document, each of said plurality of views being derived from a different tokenization of the document.
7. A computer-implemented method to process document data, comprising:
inputting a document; and
operating at least one text analysis engine that comprises a plurality of coupled annotators for tokenizing document data for identifying and annotating a particular type of semantic content;
where operating said at least one text analysis engine generates a plurality of views of a document, each of said plurality of views being derived from a different tokenization of the document;
further comprising storing said plurality of views in a common data structure associated with the document.
13. A computer program product embodied on a computer-readable medium and comprising program code for directing at least one computer to process document data, comprising:
a program code segment for inputting a document; and
a program code segment for implementing at least one text analysis engine that comprises a plurality of coupled annotators for tokenizing document data for identifying and annotating a particular type of semantic content;
where operation of said at least one text analysis engine generates a plurality of views of a document, each of said plurality of views being derived from a different tokenization of the document;
further comprising a program code segment for storing said plurality of views in a common abstract data structure associated with the document.
Specification