Method and apparatus for performing data collection, interpretation and analysis, in an information platform
First Claim
1. An information platform for performing data collection, interpretation and analysis, comprising:
- a data retrieval module comprising;
a catalog including a data store for collecting internal and external information from relevant sources;
a geometry recognition module for analyzing multiple sources and recognizing particular patterns within each source; and
a page analyzer for scanning a source document, breaking said source document into blocks and sub-blocks of information, and returning granular pieces for aggregation in said data store;
a data classification and storage module;
an information browsing, query, analysis, and report creation module said information browsing, query, analysis, and report creation module comprising;
a classification subsystem for classifying data according to a specific language, wherein said classification allows said data to be archived and tracked in an object store, and wherein said classification allows said object store to manage complex relationships between a plurality of items whereby, once classified, an item is associated to several other data types by any of one or more characteristics; and
a desktop integration module.
Abstract
An information platform automates the collection of data, provides a method for organizing the library of information and provides analysis using multiple content-types, thereby providing a user with a market understanding necessary to execute rapid and knowledgeable decision making. The information platform collects and integrates data, observations and intelligence; provides controls for multiple methods of information navigation and analysis; and allows details to be digested in the context of other data, regardless of its type. The information platform is a client/server implementation that is subdivided into four major sections, including: (1) Data Retrieval, which provides a sophisticated catalog for finding internal and external information and collection agents which retrieve specific information without user intervention; (2) Data Classification and Storage which handles the storage of the information once it has been gathered from a source; (3) Information Browsing, Query, Analysis, and Report Creation which provides information browsing, reporting, and analysis tools; and (4) Desktop Integration where the information platform takes information from a wide variety of formats (HTML, text, spreadsheet) and combines them all into a single format (HTML, text, spreadsheet).
11 Claims
1. An information platform for performing data collection, interpretation and analysis, comprising:
a data retrieval module comprising;
a catalog including a data store for collecting internal and external information from relevant sources;
a geometry recognition module for analyzing multiple sources and recognizing particular patterns within each source; and
a page analyzer for scanning a source document, breaking said source document into blocks and sub-blocks of information, and returning granular pieces for aggregation in said data store;
a data classification and storage module;
an information browsing, query, analysis, and report creation module said information browsing, query, analysis, and report creation module comprising;
a classification subsystem for classifying data according to a specific language, wherein said classification allows said data to be archived and tracked in an object store, and wherein said classification allows said object store to manage complex relationships between a plurality of items whereby, once classified, an item is associated to several other data types by any of one or more characteristics; and
a desktop integration module.
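As an illustration of the page analyzer recited in claim 1, the following is a minimal sketch of breaking an HTML source document into blocks and sub-blocks and returning granular pieces. It is not the patented implementation: the choice of HTML as the source format, the `BLOCK_TAGS` set, and the names `Block`, `PageAnalyzer`, and `granular_pieces` are all assumptions made for the example.

```python
from html.parser import HTMLParser

# Assumption: which tags count as block boundaries is not stated in the claim.
BLOCK_TAGS = {"div", "table", "tr", "td", "ul", "li", "p", "h1", "h2", "h3"}


class Block:
    """A block of the source document; children are its sub-blocks."""

    def __init__(self, tag):
        self.tag = tag
        self.text = ""
        self.children = []


class PageAnalyzer(HTMLParser):
    """Hypothetical analyzer that builds a tree of blocks and sub-blocks."""

    def __init__(self):
        super().__init__()
        self.root = Block("document")
        self._stack = [self.root]

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            block = Block(tag)
            self._stack[-1].children.append(block)
            self._stack.append(block)

    def handle_endtag(self, tag):
        # Pop back to the matching open block, tolerating unclosed inner tags.
        for i in range(len(self._stack) - 1, 0, -1):
            if self._stack[i].tag == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            current = self._stack[-1]
            current.text = (current.text + " " + data.strip()).strip()


def granular_pieces(block):
    """Yield (tag, text) pairs depth-first for every block that holds text."""
    if block.text:
        yield (block.tag, block.text)
    for child in block.children:
        yield from granular_pieces(child)


html = "<div><h1>Prices</h1><table><tr><td>Widget</td><td>4.00</td></tr></table></div>"
analyzer = PageAnalyzer()
analyzer.feed(html)
analyzer.close()
pieces = list(granular_pieces(analyzer.root))
```

Here `pieces` holds the text-bearing leaves of the block tree, which a data store could then aggregate. A tree is used so that each piece keeps its position relative to its enclosing blocks.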
4. An information platform for performing data collection, interpretation and analysis, comprising:
a data retrieval module comprising;
a catalog including a data store for collecting internal and external information from relevant sources; and
a parsing engine for interpreting the format of a stream of information and then returning requested elements to a user by reading a source document and determining said source document page geometry, wherein said parsing engine locates an element by finding a specific string or pattern within a source document, where regular expressions are character strings in which plain text indicates that that text must exist in a target string, and special characters are used to indicate what variability is allowed in said target strings, and wherein said parsing engine performs any of;
a simple content match which looks for a sub-string on regular expression in said source document and returns a primary document element containing the match;
a bounded content match in which the search scope is limited to one contiguous part of said source document; and
a simple/bounded content match in element type which is a bounded or document wide search that look for a sub-string within a specific element type; and
a data classification and storage module; and
an information browsing, query, analysis, and report creation module.
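The three match modes recited in claim 4 can be sketched as follows. This is an illustrative reading only, assuming a document already split into a flat list of typed elements; the `Element` class, the function names, and the index-based way of bounding the search scope are hypothetical and not taken from the patent.

```python
import re
from dataclasses import dataclass


@dataclass
class Element:
    kind: str  # hypothetical element type label, e.g. "heading" or "table"
    text: str


def simple_match(elements, pattern):
    """Simple content match: first element whose text contains the pattern."""
    rx = re.compile(pattern)
    for el in elements:
        if rx.search(el.text):
            return el
    return None


def bounded_match(elements, pattern, start, end):
    """Bounded content match: search only the contiguous slice [start:end]."""
    return simple_match(elements[start:end], pattern)


def match_in_type(elements, pattern, kind, start=0, end=None):
    """Bounded or document-wide match restricted to one element type."""
    scoped = [el for el in elements[start:end] if el.kind == kind]
    return simple_match(scoped, pattern)


doc = [
    Element("heading", "Quarterly Results"),
    Element("paragraph", "Revenue rose to $12.5M in Q3."),
    Element("table", "Q3 | $12.5M"),
    Element("paragraph", "Outlook for Q4 remains stable."),
]

hit_simple = simple_match(doc, r"\$\d+\.\d+M")
hit_bounded = bounded_match(doc, r"Q\d", 2, 4)
hit_typed = match_in_type(doc, r"Q\d", "paragraph", start=2)
```

In the pattern `\$\d+\.\d+M`, the escaped `$`, the `.` and the `M` are plain text that must appear in the target string, while `\d+` is the special-character part that states what variability is allowed.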
10. A method for performing data collection, interpretation and analysis, in an information platform, said method comprising the steps of:
finding tags starting and ending points in a source document;
identifying tags that have structural information;
providing a text parser for a text segment that has raw text or text with tags that have no structural information;
dividing said segment into paragraphs with said text parser;
parsing each paragraph using a paragraph parser, wherein said paragraph parser studies the lines of a paragraph using a line parser and decides if said paragraph is a regular text paragraph, header, page number, or table, wherein said line parser divides a line into phrases and calls a phrase parser to get information about each phrase, wherein said paragraph parser then uses line and phrase calculators to identify the structure of said paragraph;
wherein if said paragraph is a table, said paragraph parser creates a table;
wherein said paragraph parser generates virtual tags and returns a list of such tags to said text parser;
combining all of said virtual tags from all of said parsed paragraphs with said text parser and then passing said virtual tags back to an HTML parser;
creating one container for all the virtual and non-virtual tags with said HTML parser and enumerating all the tags properly in said source document; and
creating a list of high level blocks from said virtual and non-virtual tags;
wherein a user can retrieve a generated, fully structured document of said source document; and
wherein said user can retrieve information about any block in said source document using regular expressions.
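The text-parsing steps of claim 10 (dividing a segment into paragraphs, splitting lines into phrases, and deciding whether a paragraph is regular text, a header, a page number, or a table) might look roughly like the sketch below. The claim does not disclose the decision rules, so every heuristic here (phrases separated by tabs or runs of two or more spaces, a table being several lines with the same phrase count, the header word limit) is an assumption, as are all function names and the dictionary shape used for virtual tags.

```python
import re


def split_paragraphs(segment):
    """Text parser step: divide a raw text segment at blank lines."""
    return [p for p in re.split(r"\n\s*\n", segment) if p.strip()]


def split_phrases(line):
    """Line parser step: phrases are separated by tabs or 2+ spaces (assumed)."""
    return [p for p in re.split(r"\t+|\s{2,}", line.strip()) if p]


def classify_paragraph(paragraph):
    """Paragraph parser step: return 'page_number', 'table', 'header' or 'text'."""
    lines = [ln for ln in paragraph.splitlines() if ln.strip()]
    if not lines:
        return "text"
    first = lines[0].strip()
    if len(lines) == 1 and re.fullmatch(
        r"(?:page\s+)?-?\s*\d+\s*-?", first, re.IGNORECASE
    ):
        return "page_number"
    counts = [len(split_phrases(ln)) for ln in lines]
    # Assumed rule: aligned columns show up as an equal phrase count per line.
    if len(lines) >= 2 and min(counts) >= 2 and len(set(counts)) == 1:
        return "table"
    # Assumed rule: a short single line without closing punctuation is a header.
    if len(lines) == 1 and len(first.split()) <= 8 and not first.endswith(
        (".", ",", ";", ":")
    ):
        return "header"
    return "text"


def virtual_tags(segment):
    """Return one hypothetical virtual tag per paragraph, in document order."""
    return [
        {"tag": classify_paragraph(p), "content": p}
        for p in split_paragraphs(segment)
    ]


segment = (
    "Market Overview\n\n"
    "Sales grew steadily across all regions during the year.\n"
    "Margins were flat.\n\n"
    "Region    Units    Price\n"
    "North     120      9.50\n"
    "South     95       9.75\n\n"
    "- 12 -"
)
tags = [t["tag"] for t in virtual_tags(segment)]
```

The resulting list of virtual tags is what a text parser would hand back to an HTML parser for merging with the document's real tags.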
11. An apparatus for performing data collection, interpretation and analysis, in an information platform, comprising:
a parse engine for finding tags starting and ending points in a source document and for identifying tags that have structural information;
a text parser for a text segment that has raw text or text with tags that have no structural information;
said text parser dividing said segment into paragraphs; and
a paragraph parser for parsing each paragraph;
wherein said paragraph parser studies the lines of a paragraph using a line parser and decides if said paragraph is a regular text paragraph, header, page number, or table, wherein said line parser divides a line into phrases;
wherein said paragraph parser calls a phrase parser to get information about each phrase;
wherein said paragraph parser then uses line and phrase calculators to identify the structure of said paragraph;
wherein if said paragraph is a table, said paragraph parser creates a table;
wherein said paragraph parser generates virtual tags and returns a list of such tags to said text parser;
wherein said text parser combines all of said virtual tags from all of said parsed paragraphs and then passes said virtual tags back to an HTML parser;
wherein said HTML parser creates one container for all the virtual and non-virtual tags, enumerates all the tags properly in said source document, and creates a list of high level blocks from said virtual and non-virtual tags;
wherein a user can retrieve a generated, fully structured document of said source document; and
wherein said user can retrieve information about any block in said source document using regular expressions.
Specification