Method and system for processing structured data and unstructured data
First Claim
1. A computer implemented method for processing data, the method comprising:
(a) executing, on a data processing unit, programming for providing a plurality of data processing modules, said data processing modules comprising:
an e-mail capture and parsing engine for intercepting, copying, and processing e-mails transmitted from a gateway to an email server, said processing comprising dividing a copy of an email into sections including at least a header section and a body section, and dividing the header section into sections comprising one or more of sender email address, sender name, recipient email address, recipient name, summary of email contents, date the email was sent, or time the email was sent;
an email reload engine for loading data associated with archived or stored emails;
a web crawler engine for Internet crawling and capturing Internet web pages;
a document gathering engine for capturing data from external sources comprising one or more of an application data warehouse, application server, or file system;
one or more data staging areas for temporary storage of data associated with said email capture and parsing engine, said web crawler engine, and said document gathering engine;
a text extraction and parsing engine for receiving data from said data staging areas, extracting structured data from unstructured data, and correlating extracted structured data and associated unstructured data to define a link between said structured data and said associated unstructured data;
a data holding area for temporary storage of said structured data and said unstructured data from said text extraction and parsing engine;
a data loading engine for loading and storing into a database management system said structured data and said unstructured data from said data holding area based on said link;
an email account management engine for bypassing said text extraction and parsing engine and directly copying structured data from said email server to said database management system;
(b) using said plurality of data processing modules, carrying out operations comprising:
capturing unstructured data from an unstructured data source and structured data from a structured data source, the unstructured data being associated with the structured data, wherein the unstructured data source and the structured data source are each associated with at least an email, the email including a header;
parsing the header into at least a sending email address, a receiving email address, a date and time of transmission associated with the email, and a carbon copy email address;
evaluating the email using the email capture and parsing engine, wherein the email capture and parsing engine generates a summary of content associated with the email, the sending email address, the receiving email address, the date and the time of transmission associated with the email, the carbon copy email address, and a summary used to classify the email;
correlating the unstructured data and the structured data to establish a link between the unstructured data and the structured data, wherein the link integrates the unstructured data and the structured data; and
storing the unstructured data and the structured data in a first data structure, wherein the unstructured data is stored in an unstructured portion of the first data structure, wherein the structured data is stored in a structured portion of the first data structure, wherein the link is preserved in the first data structure, and wherein the storing of the unstructured data and the structured data enables access of the unstructured data and the structured data from the first data structure.
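The header-parsing step recited above can be illustrated with a minimal Python sketch built on the standard-library `email` package. The sample message, the `parse_email` function, and the returned field names are assumptions made for illustration; the claim does not prescribe any particular library or layout. The sketch also assumes the subject line can stand in for the "summary of email contents".

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime

# Hypothetical sample message; not taken from the patent.
RAW_EMAIL = """\
From: Alice Example <alice@example.com>
To: Bob Example <bob@example.com>
Cc: Carol Example <carol@example.com>
Subject: Q3 invoice
Date: Mon, 02 Jan 2006 15:04:05 +0000

Please find the Q3 invoice attached.
"""


def parse_email(raw: str) -> dict:
    """Divide a copy of an email into header fields and a body section."""
    msg = message_from_string(raw)
    senders = getaddresses(msg.get_all("From", []))
    recipients = getaddresses(msg.get_all("To", []))
    carbon_copies = getaddresses(msg.get_all("Cc", []))
    sent_at = parsedate_to_datetime(msg["Date"])
    return {
        "sender_name": senders[0][0],
        "sender_address": senders[0][1],
        "recipient_addresses": [addr for _, addr in recipients],
        "cc_addresses": [addr for _, addr in carbon_copies],
        "date_sent": sent_at.date().isoformat(),
        "time_sent": sent_at.time().isoformat(),
        # Assumption: the subject line serves as the summary of contents.
        "summary": msg["Subject"],
        # The body is the unstructured portion of the message.
        "body": msg.get_payload(),
    }


parsed = parse_email(RAW_EMAIL)
print(parsed["sender_address"], parsed["cc_addresses"], parsed["date_sent"])
```

The header fields here play the role of the structured data, and the body plays the role of the associated unstructured data.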
Abstract
A method for processing data is provided. In this method, unstructured data and structured data are captured and the unstructured data is associated with the structured data. After capture, the unstructured data and the structured data are correlated to define a link between the unstructured data and the structured data. The unstructured data and the structured data then are stored in a data structure based on the link. A system for processing data also is described.
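One way to picture the capture, correlate, and store sequence in the abstract is a pair of tables that share a key. The sketch below is only an illustration under stated assumptions: it uses an in-memory SQLite database as the data structure, and the table names `structured` and `unstructured`, the `link_id` column, and the helper functions `store_linked` and `fetch_linked` are all hypothetical rather than taken from the patent.

```python
import sqlite3
import uuid


def store_linked(conn: sqlite3.Connection, structured: dict, unstructured: str) -> str:
    """Store both kinds of data under a shared key so the link survives storage."""
    # Assumption: a generated identifier serves as the link between the two portions.
    link_id = uuid.uuid4().hex
    conn.execute(
        "INSERT INTO structured (link_id, sender, recipient, sent_at) VALUES (?, ?, ?, ?)",
        (link_id, structured["sender"], structured["recipient"], structured["sent_at"]),
    )
    conn.execute(
        "INSERT INTO unstructured (link_id, body) VALUES (?, ?)",
        (link_id, unstructured),
    )
    conn.commit()
    return link_id


def fetch_linked(conn: sqlite3.Connection, link_id: str):
    """Follow the link to read the structured and unstructured portions together."""
    return conn.execute(
        "SELECT s.sender, s.recipient, s.sent_at, u.body "
        "FROM structured AS s JOIN unstructured AS u ON u.link_id = s.link_id "
        "WHERE s.link_id = ?",
        (link_id,),
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE structured (
        link_id TEXT PRIMARY KEY, sender TEXT, recipient TEXT, sent_at TEXT
    );
    CREATE TABLE unstructured (
        link_id TEXT REFERENCES structured(link_id), body TEXT
    );
    """
)

link = store_linked(
    conn,
    {"sender": "alice@example.com", "recipient": "bob@example.com",
     "sent_at": "2006-01-02T15:04:05"},
    "Please ship order 1234 by Friday.",
)
row = fetch_linked(conn, link)
print(row)
```

Keeping the two portions in separate tables joined by one key is a design choice of this sketch; a single table with a free-text column would preserve the link equally well.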
82 Citations
20 Claims
1. A computer implemented method for processing data, the method comprising elements (a) and (b) as recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5
6. A computer readable storage medium having computer instructions executable on a computer for carrying out operations, comprising:
(a) providing a plurality of data processing modules, said data processing modules comprising:
an e-mail capture and parsing engine for intercepting, copying, and processing e-mails transmitted from a gateway to an email server, said processing comprising dividing a copy of an email into sections including at least a header section and a body section, and dividing the header section into sections comprising one or more of sender email address, sender name, recipient email address, recipient name, summary of email contents, date the email was sent, or time the email was sent;
an email reload engine for loading data associated with archived or stored emails;
a web crawler engine for Internet crawling and capturing Internet web pages;
a document gathering engine for capturing data from external sources comprising one or more of an application data warehouse, application server, or file system;
one or more data staging areas for temporary storage of data associated with said email capture and parsing engine, said web crawler engine, and said document gathering engine;
a text extraction and parsing engine configured for receiving data from said data staging areas, extracting structured data from unstructured data, and correlating extracted structured data and associated unstructured data to define a link between said structured data and said associated unstructured data;
a data holding area for temporary storage of said structured data and said unstructured data from said text extraction and parsing engine;
a data loading engine for loading and storing into a database management system said structured data and said unstructured data from said data holding area based on said link;
an email account management engine configured for bypassing said text extraction and parsing engine and directly copying structured data from said email server to said database management system; and
(b) using said plurality of data processing modules, carrying out operations comprising:
capturing unstructured data from an unstructured data source and a first structured data from a structured data source, a first portion of the unstructured data being associated with the first structured data, wherein the unstructured data source and the structured data source are each associated with at least an email, the email including a header;
parsing the header into at least a sending email address, a receiving email address, a date and time of transmission associated with the email, and a carbon copy email address;
evaluating the email using the email capture and parsing engine, wherein the email capture and parsing engine generates a summary of content associated with the email, the sending email address, the receiving email address, the date and the time of transmission associated with the email, the carbon copy email address, and a summary used to classify the email;
extracting a second structured data from the first portion of the unstructured data;
correlating the first and second structured data to establish a first link between the first and second structured data, wherein the first link integrates the first and second structured data; and
storing the first and second structured data in a structured portion of a data structure, wherein the first link is preserved in the data structure.
Dependent claims: 7, 8, 9, 10
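Claim 6 differs from claim 1 in that a second set of structured data is extracted from the unstructured portion and then correlated with the first. A minimal sketch of that idea follows. The `PO-` order-number pattern, the `StructuredRecord` class, and the `extract_and_correlate` function are hypothetical; the claim does not say what is extracted or how.

```python
import re
from dataclasses import dataclass, field

# Assumption: order numbers look like "PO-" followed by digits.
# The patent names no specific pattern; this one is for illustration only.
ORDER_PATTERN = re.compile(r"\bPO-(\d+)\b")


@dataclass
class StructuredRecord:
    """Structured portion holding header fields and fields extracted from the body."""
    sender: str
    recipient: str
    order_numbers: list = field(default_factory=list)


def extract_and_correlate(header: dict, body: str) -> StructuredRecord:
    # First structured data: fields that were already structured (the header).
    record = StructuredRecord(sender=header["from"], recipient=header["to"])
    # Second structured data: values pulled out of the unstructured body.
    # Placing both in one record is what links them in this sketch.
    record.order_numbers = ORDER_PATTERN.findall(body)
    return record


record = extract_and_correlate(
    {"from": "alice@example.com", "to": "bob@example.com"},
    "Please confirm PO-1001 and PO-1002 before Friday.",
)
print(record)
```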
11. A computer implemented system for processing data associated with an e-mail, the system comprising:
(a) a data processing unit;
(b) programming, executable by said data processing unit, for providing a plurality of data processing modules, said data processing modules comprising:
an e-mail capture and parsing engine for intercepting, copying, and processing e-mails transmitted from a gateway to an email server, said processing comprising dividing a copy of an email into sections including at least a header section and a body section, and dividing the header section into sections comprising one or more of sender email address, sender name, recipient email address, recipient name, summary of email contents, date the email was sent, or time the email was sent;
an email reload engine for loading data associated with archived or stored emails;
a web crawler engine for Internet crawling and capturing Internet web pages;
a document gathering engine for capturing data from external sources comprising one or more of an application data warehouse, application server, or file system;
one or more data staging areas configured for temporary storage of data associated with said email capture and parsing engine, said web crawler engine, and said document gathering engine;
a text extraction and parsing engine for receiving data from the data staging areas, extracting structured data from unstructured data, and correlating extracted structured data and associated unstructured data to define a link between said structured data and said associated unstructured data;
a data holding area configured for temporary storage of said structured and said unstructured data from said text extraction and parsing engine;
a data loading engine for loading and storing into a database management system said structured data and said unstructured data from said data holding area based on said link;
an email account management engine for bypassing said text extraction and parsing engine and directly copying structured data from said email server to said database management system;
(c) wherein said plurality of data processing modules are configured for carrying out operations comprising:
receiving unstructured data and structured data from a plurality of sources comprising an unstructured data source and a structured data source, the unstructured data being associated with the structured data, wherein the unstructured data source and the structured data source are each associated with at least an email, the email including a header;
parsing the header into at least a sending email address, a receiving email address, a date and time of transmission associated with the email, and a carbon copy email address;
evaluating the email using an email capture and parsing engine, wherein the email capture and parsing engine is configured to generate a summary of content associated with the email, the sending email address, the receiving email address, the date and the time of transmission associated with the email, the carbon copy email address, and a summary used to classify the email;
correlating the unstructured data and the structured data to establish a link between the unstructured data and the structured data, wherein the link integrates the unstructured data and the structured data; and
storing the unstructured data and the structured data in a data structure based on the link, wherein the unstructured data is stored in an unstructured portion of the data structure and wherein the structured data is stored in a structured portion of the data structure.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
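The module arrangement shared by the independent claims (capture engines feeding a staging area, an extraction engine feeding a holding area, a loader feeding the database, and an account-management path that skips extraction) can be sketched as a small in-memory pipeline. Everything named here is a hypothetical stand-in: the queues, the `database` list, the function names, and the word count used as the "extracted" structured data are assumptions, not the patented implementation.

```python
from collections import deque

staging_area = deque()  # temporary storage for captured items
holding_area = deque()  # temporary storage for extracted, linked items
database = []           # stand-in for the database management system


def capture(source: str, text: str) -> None:
    """Capture engines (email, web crawler, document gatherer) write to staging."""
    staging_area.append({"source": source, "text": text})


def extract_and_link() -> None:
    """Extraction engine: drain staging, derive structured data, keep the link."""
    while staging_area:
        item = staging_area.popleft()
        # Assumption: a word count stands in for real extracted fields.
        structured = {"source": item["source"], "word_count": len(item["text"].split())}
        # Storing both halves in one entry preserves the link between them.
        holding_area.append({"structured": structured, "unstructured": item["text"]})


def load() -> None:
    """Loading engine: move linked entries from the holding area to the database."""
    while holding_area:
        database.append(holding_area.popleft())


def copy_account_data(account: dict) -> None:
    """Account management path: already-structured data bypasses extraction."""
    database.append({"structured": account, "unstructured": None})


capture("email", "Meeting moved to Tuesday")
capture("web", "Quarterly results published today")
extract_and_link()
load()
copy_account_data({"source": "email_server", "address": "alice@example.com"})
print(len(database), database[0]["structured"])
```

The two temporary areas decouple the stages, so capture can run ahead of extraction and extraction ahead of loading; the bypass path writes straight to the database because its input needs no extraction.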
Specification