System, method and computer program product for performing unstructured information management and automatic text analysis
Abstract
Disclosed are a system architecture, components and a searching technique for an Unstructured Information Management System (UIMS). The UIMS may be provided as middleware for the effective management and interchange of unstructured information over a wide array of information sources. The architecture generally includes a search engine, data storage, analysis engines containing pipelined document annotators and various adapters. The search uses a two-level technique.
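As an illustration of what "analysis engines containing pipelined document annotators" could look like, here is a minimal Python sketch. It is not taken from the patent: the names `Annotation`, `Document`, `tokenizer`, `year_annotator` and `run_pipeline` are hypothetical, and the regular expressions are arbitrary choices made for the example. The only point it tries to show is that each annotator reads the shared document and may build on annotations left by earlier annotators.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Annotation:
    type: str   # hypothetical label, e.g. "Token" or "Year"
    begin: int  # character offset where the span starts
    end: int    # character offset just past the span
    text: str   # the covered text


@dataclass
class Document:
    text: str
    annotations: List[Annotation] = field(default_factory=list)


# An annotator is assumed here to be any callable that adds annotations in place.
Annotator = Callable[[Document], None]


def tokenizer(doc: Document) -> None:
    """First stage: mark every run of word characters as a Token."""
    for m in re.finditer(r"\w+", doc.text):
        doc.annotations.append(Annotation("Token", m.start(), m.end(), m.group()))


def year_annotator(doc: Document) -> None:
    """Second stage: relies on Token annotations produced by the tokenizer."""
    for a in list(doc.annotations):
        if a.type == "Token" and re.fullmatch(r"(19|20)\d{2}", a.text):
            doc.annotations.append(Annotation("Year", a.begin, a.end, a.text))


def run_pipeline(doc: Document, annotators: List[Annotator]) -> Document:
    """Apply the annotators in order, each seeing the output of the previous ones."""
    for annotate in annotators:
        annotate(doc)
    return doc


doc = run_pipeline(Document("The report was filed in 2003."), [tokenizer, year_annotator])
years = [a.text for a in doc.annotations if a.type == "Year"]
token_count = sum(1 for a in doc.annotations if a.type == "Token")
```

Reversing the order of the two annotators would yield no `Year` annotations, which is the sense in which the stages are coupled.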
48 Claims
1. A data processing system for processing document data, comprising:
data storage for storing a collection of document data that comprises unstructured document data;
coupled to the data storage, a semantic search engine for retrieving document data from said data storage; and
at least one analysis engine that comprises a plurality of coupled annotators at least some of which are operable for processing document data for tokenizing document data and for identifying and annotating a particular type of semantic content;
where said data processing system comprises an inverted file system for storing said annotations, a list comprising occurrences of respective annotations and, for each listed occurrence of a respective annotation, a set comprised of a plurality of token locations spanned by said respected annotation.
Dependent claims: 2-15.

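The inverted file described in claim 1 can be pictured as a mapping from each annotation type to a list of occurrences, where each occurrence records the set of token locations it spans. The Python sketch below is one possible reading under stated assumptions, not the patented implementation: the class `AnnotationIndex`, its method names, the document id `"d1"` and the annotation types `"Organization"` and `"Person"` are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

# Hypothetical posting: (document id, token positions spanned by one occurrence).
Posting = Tuple[str, FrozenSet[int]]


class AnnotationIndex:
    """Toy inverted file keyed by annotation type."""

    def __init__(self) -> None:
        self._postings: Dict[str, List[Posting]] = defaultdict(list)

    def add(self, annotation_type: str, doc_id: str, token_positions: Iterable[int]) -> None:
        """Record one occurrence together with the token locations it spans."""
        self._postings[annotation_type].append((doc_id, frozenset(token_positions)))

    def occurrences(self, annotation_type: str) -> List[Posting]:
        """List every recorded occurrence of the given annotation type."""
        return list(self._postings.get(annotation_type, []))

    def docs_with(self, annotation_type: str) -> Set[str]:
        """Documents containing at least one occurrence of the annotation type."""
        return {doc_id for doc_id, _ in self._postings.get(annotation_type, [])}

    def covering(self, doc_id: str, token_position: int) -> List[str]:
        """Annotation types whose span includes the given token in the given document."""
        return sorted(
            a_type
            for a_type, postings in self._postings.items()
            if any(d == doc_id and token_position in span for d, span in postings)
        )


# Assumed tokenization of document "d1": positions 0..4 for
# ["Acme", "Corp", "hired", "Jane", "Doe"].
index = AnnotationIndex()
index.add("Organization", "d1", [0, 1])  # spans "Acme Corp"
index.add("Person", "d1", [3, 4])        # spans "Jane Doe"

person_hits = index.occurrences("Person")
types_at_token_0 = index.covering("d1", 0)
```

Storing token positions rather than character offsets lets a search engine compare annotation spans and ordinary token postings in the same coordinate system.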
16. A data processing system for processing document data, comprising:
at least one application data storage interface for coupling to at least one database comprised of unstructured document data, said data storage interface for receiving at least database specification parameters, data source specification parameters and query command specification parameters; and
at least one application text analysis engine interface for coupling to at least one text analysis engine that comprises a plurality of coupled annotators, at least some of which are operable for processing document data for identifying and annotating a particular type of semantic content, said text analysis interface for receiving at least text analysis engine flow parameters, document specification parameters and annotator specification parameters and producing analysis results;
where an application is interoperable with said data storage and text analysis interfaces for specifying how to populate said at least one database, for specifying document selection and processing parameters for processing specified document data and analysis results, and for specifying at least one user interface, where at least one of the parameters sent through said application text analysis engine interface specifies a common abstract data format for specifying the operation of said at least one text analysis engine.
Dependent claims: 17-22.

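To make the two interfaces of claim 16 more concrete, the following Python sketch models the parameters each interface receives as plain data classes and wires them to toy in-memory implementations. Everything here is an assumption made for illustration: the type names, field names, the `"file://corpus"` source string, the `"common-abstract-format"` placeholder and the `long_words` annotator do not come from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Protocol


@dataclass
class StorageRequest:
    database_spec: str     # which database to address
    data_source_spec: str  # where the documents come from (unused by the toy store)
    query_command: str     # query to run against the database


@dataclass
class AnalysisRequest:
    flow: List[str]                    # engine flow: annotator names in execution order
    document_ids: List[str]            # document specification
    annotator_params: Dict[str, dict]  # per-annotator specification
    data_format: str = "common-abstract-format"  # placeholder name only


class DataStorageInterface(Protocol):
    def query(self, request: StorageRequest) -> List[str]: ...


class TextAnalysisInterface(Protocol):
    def analyze(self, request: AnalysisRequest) -> Dict[str, List[str]]: ...


class InMemoryStore:
    """Toy store: a query matches documents whose text contains the query string."""

    def __init__(self, databases: Dict[str, Dict[str, str]]) -> None:
        self.databases = databases  # {database name: {document id: text}}

    def query(self, request: StorageRequest) -> List[str]:
        docs = self.databases[request.database_spec]
        needle = request.query_command.lower()
        return sorted(d for d, text in docs.items() if needle in text.lower())


class ToyEngine:
    """Toy engine: runs the named annotators in flow order over each document."""

    def __init__(self, texts: Dict[str, str],
                 annotators: Dict[str, Callable[[str, dict], List[str]]]) -> None:
        self.texts = texts
        self.annotators = annotators

    def analyze(self, request: AnalysisRequest) -> Dict[str, List[str]]:
        results: Dict[str, List[str]] = {}
        for doc_id in request.document_ids:
            found: List[str] = []
            for name in request.flow:
                params = request.annotator_params.get(name, {})
                found.extend(self.annotators[name](self.texts[doc_id], params))
            results[doc_id] = found
        return results


def long_words(text: str, params: dict) -> List[str]:
    """Hypothetical annotator: report words of at least `min_len` characters."""
    min_len = params.get("min_len", 6)
    return [w for w in text.split() if len(w) >= min_len]


docs = {"d1": "search engines index annotations", "d2": "plain text"}
store: DataStorageInterface = InMemoryStore({"main": docs})
engine: TextAnalysisInterface = ToyEngine(docs, {"long_words": long_words})

hits = store.query(StorageRequest("main", "file://corpus", "index"))
results = engine.analyze(
    AnalysisRequest(flow=["long_words"], document_ids=hits,
                    annotator_params={"long_words": {"min_len": 8}})
)
```

In this sketch the application never touches the store or the engine internals; it only fills in the two request objects, which is one way to read "interoperable with said data storage and text analysis interfaces".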
23. A modular text intelligence system, comprising:
at least one document store interface coupled to at least one document store, the document store interface receiving at least one database specification and at least one data source and providing at least one database query command;
at least one analysis engine interface coupled to at least one text analysis engine, the analysis engine interface receiving at least one document set specification of at least one document set and providing text analysis engine analysis results;
an application interface for coupling to an application through which the application specifies;
how to populate said at least one document store;
an application logic for selecting at least one document set;
processing of said selected document set by said at least one text analysis engine;
processing of said analysis results; and
at least one user interface, where the application specification occurs by setting at least one parameter, said at least one parameter comprising a specification of a common abstract data format for use by said at least one text analysis engine.
Dependent claims: 24, 25.

26. A computer program product embodied on a computer-readable medium and comprising program code for directing operation of a text intelligence system in cooperation with at least one application, comprising:
a program code segment for managing a collection of document data that comprises unstructured document data;
a program code segment for implementing a semantic search engine;
a program code segment for implementing at least one analysis engine comprising a plurality of annotators at least some of which are operable for processing document data for tokenizing document data and for identifying and annotating a particular type of semantic content; and
a program code segment for creating and managing an inverted file system for storing, for each processed document, annotations, a list comprising occurrences of respective annotations and, for each listed occurrence of a respective annotation, a set comprised of token locations spanned by said respected annotation.
Dependent claims: 27-41.

42. A method to process document data, comprising:
providing at least one application data storage interface for coupling to at least one database comprised of unstructured document data, and receiving at least database specification parameters, data source specification parameters and query command specification parameters through said data storage interface; and
providing at least one application text analysis engine interface for coupling to at least one text analysis engine that comprises a plurality of coupled annotators, at least some of which are operable for processing document data for identifying and annotating a particular type of semantic content, and receiving at least text analysis engine flow parameters, document specification parameters and annotator specification parameters and producing analysis results through said text analysis interface;
where an application is interoperable with said data storage and text analysis interfaces for specifying how to populate said at least one database, for specifying document selection and processing parameters for processing specified document data and analysis results, and for specifying at least one user interface, where at least one of the parameters sent through said application text analysis engine interface specifies a common abstract data format for specifying the operation of said at least one text analysis engine.
Dependent claims: 43-48.

Specification