Enterprise web mining system and method
Abstract
An enterprise-wide web data mining system, computer program product, and method of operation thereof, that uses Internet-based data sources, and which operates in an automated and cost-effective manner. The enterprise web mining system comprises: a database coupled to a plurality of data sources, the database operable to store data collected from the data sources; a data mining engine coupled to the web server and the database, the data mining engine operable to generate a plurality of data mining models using the collected data; a server coupled to a network, the server operable to: receive a request for a prediction or recommendation over the network, generate a prediction or recommendation using the data mining models, and transmit the generated prediction or recommendation.
26 Claims
-
1. A computer-implemented method of enterprise web mining comprising the steps of:
-
collecting data from a plurality of data sources, including proprietary corporate data comprising proprietary account or user-based data, external data comprising data acquired from sources external to the system, Web data comprising Web traffic data, web server application program interface data and Web server log data, and Web transaction data comprising data relating to transactions completed over the Web;
selecting data that is relevant to a desired output from among the collected data by mapping between general attributes and particular features, the selected data having reduced dimensionality relative to the collected data;
pre-processing the selected data by removing redundant or irrelevant information from Web server log data, by identifying a visitor to a web site from the Web traffic data, reconstructing a session from the Web traffic data, by reconstructing a path followed by a visitor in a session from the Web server log data, by analyzing a path a whole Website from the Web server log data, by converting filenames from the Web server log data to page titles, and by converting IP addresses from the Web traffic data to domain names;
building a plurality of database tables from the pre-processed selected data, wherein the acquired data comprises a plurality of different types of data;
integrating the collected data by forming an integrated database comprising collected data in a coherent format using generated taxonomies to group attributes of the data and using generated profiles of the data;
generating a plurality of data mining models using the collected data; and
generating a prediction or recommendation using at least one of the plurality of generated data mining models, in response to a received request for a recommendation or prediction.
selecting an algorithm to be used to generate a model;
generating at least one model using the selected algorithm and data included in the integrated database; and
deploying the at least one model.
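The pre-processing recited in claim 1 (removing irrelevant log entries, identifying a visitor, and reconstructing sessions and paths) can be illustrated with a minimal Python sketch. This is not the patented implementation. The record layout, the use of an IP address plus user agent as the visitor key, the list of irrelevant file suffixes, and the 30-minute inactivity timeout are all assumptions made for illustration only.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical, already-parsed log records: (ip, user_agent, timestamp, requested file).
LOG = [
    ("10.0.0.1", "UA-1", "2001-03-01 10:00:00", "/index.html"),
    ("10.0.0.1", "UA-1", "2001-03-01 10:00:01", "/logo.gif"),
    ("10.0.0.1", "UA-1", "2001-03-01 10:02:00", "/products.html"),
    ("10.0.0.2", "UA-2", "2001-03-01 10:05:00", "/index.html"),
    ("10.0.0.1", "UA-1", "2001-03-01 11:30:00", "/cart.html"),
]

# Assumed filter for "redundant or irrelevant" entries (embedded images, styles, scripts).
IRRELEVANT_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")
# Assumed inactivity threshold that separates two sessions of the same visitor.
SESSION_TIMEOUT = timedelta(minutes=30)


def reconstruct_sessions(records):
    """Return {visitor: [path, ...]}; each path is the ordered page list of one session."""
    hits = defaultdict(list)
    for ip, agent, stamp, url in records:
        if url.lower().endswith(IRRELEVANT_SUFFIXES):
            continue  # drop irrelevant log entries
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        hits[(ip, agent)].append((when, url))  # (ip, agent) identifies the visitor

    sessions = {}
    for visitor, visits in hits.items():
        visits.sort()  # chronological order
        paths, last_seen = [], None
        for when, url in visits:
            if last_seen is None or when - last_seen > SESSION_TIMEOUT:
                paths.append([])  # a long gap starts a new session
            paths[-1].append(url)
            last_seen = when
        sessions[visitor] = paths
    return sessions


sessions = reconstruct_sessions(LOG)
```

In this sketch each inner list is one reconstructed path, so the first visitor ends up with two sessions because of the long gap before the request for `/cart.html`.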
-
-
3. The method of claim 2, wherein the step of deploying the at least one model comprises the step of:
generating program code implementing the model.
-
4. The method of claim 3, wherein the step of generating an online prediction or recommendation comprises the steps of:
-
receiving a request for a prediction or recommendation;
scoring a model using data included in the integrated database;
generating a prediction or recommendation based on the generated score; and
transmitting the prediction or recommendation.
-
-
5. The method of claim 4, wherein the step of pre-processing the selected data further comprises the step of:
collecting pre-defined items of data passed by a web server.
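The request, score, and transmit sequence of claim 4 can be sketched as follows. The claims do not name a particular algorithm, so a simple co-occurrence count is swapped in here purely as a stand-in model; `build_model`, `recommend`, `handle_request`, and the sample baskets are hypothetical names and data.

```python
from collections import Counter
from itertools import permutations


def build_model(baskets):
    """Stand-in model: count how often item b occurs in the same basket as item a."""
    model = {}
    for basket in baskets:
        for a, b in permutations(set(basket), 2):
            model.setdefault(a, Counter())[b] += 1
    return model


def recommend(model, seen, top_n=1):
    """Score every unseen item against the items already seen and return the best ones."""
    scores = Counter()
    for item in seen:
        for other, count in model.get(item, {}).items():
            if other not in seen:
                scores[other] += count
    # Highest score first; ties broken alphabetically so the result is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical transaction data standing in for the integrated database.
BASKETS = [["book", "lamp"], ["book", "lamp", "pen"], ["book", "pen"], ["lamp", "desk"]]
MODEL = build_model(BASKETS)


def handle_request(request):
    """Receive a request, score the model, and return the response to transmit."""
    return {
        "visitor": request["visitor"],
        "recommendation": recommend(MODEL, request["seen"]),
    }


response = handle_request({"visitor": "v1", "seen": ["lamp"]})
```

The model is built once and reused for every request, which mirrors the separation between generating models and generating a recommendation on request.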
-
6. A computer program product for performing an enterprise web mining process in an electronic data processing system, comprising:
-
a computer readable medium;
computer program instructions, recorded on the computer readable medium, executable by a processor, for performing the steps of:
collecting data from a plurality of data sources, including proprietary corporate data comprising proprietary account or user-based data, external data comprising data acquired from sources external to the system, Web data comprising Web traffic data, web server application program interface data and Web server log data, and Web transaction data comprising data relating to transactions completed over the Web;
selecting data that is relevant to a desired output from among the collected data by mapping between general attributes and particular features, the selected data having reduced dimensionality relative to the collected data;
pre-processing the selected data by removing redundant or irrelevant information from Web server log data, by identifying a visitor to a web site from the Web traffic data, reconstructing a session from the Web traffic data, by reconstructing a path followed by a visitor in a session from the Web server log data, by analyzing a path a whole Website from the Web server log data, by converting filenames from the Web server log data to page titles, and by converting IP addresses from the Web traffic data to domain names;
building a plurality of database tables from the pre-processed selected data, wherein the acquired data comprises a plurality of different types of data;
integrating the collected data by forming an integrated database comprising collected data in a coherent format using generated taxonomies to group attributes of the data and using generated profiles of the data;
generating a plurality of data mining models using the collected data; and
generating a prediction or recommendation using at least one of the plurality of generated data mining models, in response to a received request for a recommendation or prediction.
selecting an algorithm to be used to generate a model;
generating at least one model using the selected algorithm and data included in the integrated database; and
deploying the at least one model.
-
-
8. The computer program product of claim 7, wherein the step of deploying the at least one model comprises the step of:
generating program code implementing the model.
-
9. The computer program product of claim 8, wherein the step of generating an online prediction or recommendation comprises the steps of:
-
receiving a request for a prediction or recommendation;
scoring a model using data included in the integrated database;
generating a prediction or recommendation based on the generated score; and
transmitting the prediction or recommendation.
-
-
10. The computer program product of claim 9, wherein the step of pre-processing the selected data further comprises the step of:
collecting pre-defined items of data passed by a web server.
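Two of the pre-processing conversions recited in claim 6, filenames to page titles and IP addresses to domain names, might look like the sketch below. The lookup tables `PAGE_TITLES` and `KNOWN_HOSTS` and the function names are hypothetical. A static table is used as the default resolver so the example runs without network access; `socket.gethostbyaddr` from the Python standard library is shown as one possible live alternative.

```python
import socket

# Hypothetical lookup tables standing in for a site map and a DNS resolver.
PAGE_TITLES = {"/index.html": "Home", "/products.html": "Product Catalog"}
KNOWN_HOSTS = {"192.0.2.10": "proxy.example.com"}


def resolve_offline(ip):
    """Look the address up in a static table; raises KeyError if it is unknown."""
    return KNOWN_HOSTS[ip]


def resolve_dns(ip):
    """Reverse DNS lookup through the standard library (requires network access)."""
    return socket.gethostbyaddr(ip)[0]


def enrich(record, resolver=resolve_offline):
    """Convert an (ip, filename) pair into a (domain name, page title) pair."""
    ip, filename = record
    try:
        host = resolver(ip)
    except (KeyError, OSError):
        host = ip  # keep the raw address when no name can be found
    title = PAGE_TITLES.get(filename, filename)  # keep the filename when no title is known
    return host, title


converted = enrich(("192.0.2.10", "/index.html"))
```

Falling back to the original value when a lookup fails keeps every record usable downstream, at the cost of mixing raw and converted values in the same column.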
-
11. A system for performing an enterprise web mining process, comprising:
-
a processor operable to execute computer program instructions; and
a memory operable to store computer program instructions executable by the processor, for performing the steps of:
collecting data from a plurality of data sources, including proprietary corporate data comprising proprietary account or user-based data, external data comprising data acquired from sources external to the system, Web data comprising Web traffic data, web server application program interface data and Web server log data, and Web transaction data comprising data relating to transactions completed over the Web;
selecting data that is relevant to a desired output from among the collected data by mapping between general attributes and particular features, the selected data having reduced dimensionality relative to the collected data;
pre-processing the selected data by removing redundant or irrelevant information from Web server log data, by identifying a visitor to a web site from the Web traffic data, reconstructing a session from the Web traffic data, by reconstructing a path followed by a visitor in a session from the Web server log data, by analyzing a path a whole Website from the Web server log data, by converting filenames from the Web server log data to page titles, and by converting IP addresses from the Web traffic data to domain names;
building a plurality of database tables from the pre-processed selected data, wherein the acquired data comprises a plurality of different types of data;
integrating the collected data by forming an integrated database comprising collected data in a coherent format using generated taxonomies to group attributes of the data and using generated profiles of the data;
generating a plurality of data mining models using the collected data; and
generating a prediction or recommendation using at least one of the plurality of generated data mining models, in response to a received request for a recommendation or prediction.
selecting an algorithm to be used to generate a model;
generating at least one model using the selected algorithm and data included in the integrated database; and
deploying the at least one model.
-
-
13. The system of claim 12, wherein the step of deploying the at least one model comprises the step of:
generating program code implementing the model.
-
14. The system of claim 13, wherein the step of generating an online prediction or recommendation comprises the steps of:
-
receiving a request for a prediction or recommendation;
scoring a model using data included in the integrated database;
generating a prediction or recommendation based on the generated score; and
transmitting the prediction or recommendation.
-
-
15. The system of claim 14, wherein the step of pre-processing the selected data further comprises the step of:
collecting pre-defined items of data passed by a web server.
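The selecting step of claim 11, which maps between general attributes and particular features so that the selected data has reduced dimensionality, could be approximated as below. The attribute names, feature columns, the `select_features` helper, and the sample rows are hypothetical; the claims do not specify how the mapping is stored.

```python
# Hypothetical mapping from general attributes to the particular features
# (columns) that express them in the collected data.
ATTRIBUTE_FEATURES = {
    "demographics": ["age", "region"],
    "purchase_history": ["orders_last_year", "avg_order_value"],
    "browsing": ["pages_per_session"],
}

# Hypothetical collected data: one wide row per customer.
CUSTOMERS = [
    {"id": 1, "age": 34, "region": "EU", "orders_last_year": 5,
     "avg_order_value": 42.0, "pages_per_session": 7.5},
    {"id": 2, "age": 51, "region": "US", "orders_last_year": 1,
     "avg_order_value": 18.5, "pages_per_session": 2.0},
]


def select_features(rows, wanted_attributes, key="id"):
    """Keep the key column plus only the features mapped to the wanted attributes."""
    columns = [key] + [f for a in wanted_attributes for f in ATTRIBUTE_FEATURES[a]]
    return [{c: row[c] for c in columns} for row in rows]


selected = select_features(CUSTOMERS, ["purchase_history"])
```

Here a six-column row is reduced to three columns, which is the sense in which the selected data has lower dimensionality than the collected data.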
-
16. An enterprise web mining system comprising:
-
a database system coupled to a plurality of data sources, the database system operable to store data collected from the data sources, the data sources including proprietary corporate data comprising proprietary account or user-based data, external data comprising data acquired from sources external to the system, Web data comprising Web traffic data, web server application program interface data and Web server log data, and Web transaction data comprising data relating to transactions completed over the Web, the database further operable to select data that is relevant to a desired output from among the collected data by mapping between general attributes and particular features, the selected data having reduced dimensionality relative to the collected data, the database further operable to pre-process the selected data by removing redundant or irrelevant information from Web server log data, by identifying a visitor to a web site from the Web traffic data, reconstructing a session from the Web traffic data, by reconstructing a path followed by a visitor in a session from the Web server log data, by analyzing a path a whole Website from the Web server log data, by converting filenames from the Web server log data to page titles, and by converting IP addresses from the Web traffic data to domain names, the database further operable to build a plurality of database tables from the pre-processed selected data, wherein the acquired data comprises a plurality of different types of data, and the database further operable to integrate the collected data by forming an integrated database comprising collected data in a coherent format using generated taxonomies to group attributes of the data and using generated profiles of the data;
a data mining engine coupled to the database, the data mining engine operable to generate a plurality of data mining models using the integrated database;
a server coupled to a network, the server operable to receive a request for a prediction or recommendation over the network, generate a prediction or recommendation using at least one of the data mining models, and transmit the generated prediction or recommendation.
select an algorithm to be used to generate a model;
generate at least one model using the selected algorithm and data included in the integrated database; and
deploy the at least one model.
-
-
18. The system of claim 17, wherein the deployed model comprises program code implementing the model.
-
19. The system of claim 18, wherein the server is operable to generate a prediction or recommendation by scoring a model using data included in the integrated database and generating a prediction or recommendation based on the generated score.
-
20. The system of claim 16, further comprising a data pre-processing engine pre-processing the selected data.
-
21. The system of claim 20, wherein the database comprises:
a plurality of database tables built from the pre-processed selected data.
-
22. The system of claim 21, wherein the plurality of database tables forms an integrated database comprising collected data in a coherent format.
-
23. The system of claim 22, wherein the data mining engine is further operable to:
-
select an algorithm to be used to generate a model;
generate at least one model using the selected algorithm and data included in the integrated database; and
deploy the at least one model.
-
-
24. The system of claim 23, wherein the deployed model comprises program code implementing the model.
-
25. The system of claim 24, wherein the server is operable to generate a prediction or recommendation by scoring a model using data included in the integrated database and generating a prediction or recommendation based on the generated score.
-
26. The system of claim 25, wherein the data pre-processing engine pre-processes the selected data by collecting pre-defined items of data passed by a web server.
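Collecting pre-defined items of data passed by a web server, as recited in claim 26, can be illustrated under the assumption that the server passes each request as a line in the Common Log Format. The claim does not state the format, so this is only one possible reading; the `parse_line` helper, the chosen set of kept items, and the sample line are illustrative.

```python
import re

# Pattern for one Common Log Format line:
# host ident authuser [date] "method path protocol" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?:\d+|-)$'
)

# Assumed set of pre-defined items to keep from each request.
KEPT_ITEMS = ("host", "time", "path", "status")


def parse_line(line):
    """Return the pre-defined items from one log line, or None if it does not parse."""
    match = CLF_PATTERN.match(line.strip())
    if match is None:
        return None
    items = {name: match.group(name) for name in KEPT_ITEMS}
    items["status"] = int(items["status"])  # store the status code as a number
    return items


SAMPLE = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
collected = parse_line(SAMPLE)
```

Lines that do not match the pattern yield `None` rather than raising, so a caller can count or discard malformed entries without stopping the collection run.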
Specification