Method and Apparatus for Building Sales Tools by Mining Data from Websites
First Claim
1. A method for extracting elements of information from a website comprising a plurality of Web pages, said method comprising:
identifying a structure associated with said plurality of web pages;
extracting a plurality of blocks of information from said web pages as a function of said structure;
assigning each block of information in said plurality of blocks of information to a category in a plurality of categories; and
identifying a plurality of elements of information in each block of information.
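The final step of the claim, identifying elements of information within a block, can be illustrated with a minimal sketch. The claim does not say what an element is or how it is detected; the sketch assumes, purely for illustration, that elements are contact details found by pattern matching. The names `ELEMENT_PATTERNS` and `identify_elements` are hypothetical and do not come from the patent.

```python
import re

# Hypothetical element types with deliberately simple patterns.
# The claim does not define these; they are assumptions for illustration.
ELEMENT_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{6,}\d"),
}


def identify_elements(block):
    """Return (element_type, matched_text) pairs found in one block of text."""
    found = []
    for name, pattern in ELEMENT_PATTERNS.items():
        for match in pattern.finditer(block):
            found.append((name, match.group()))
    return found


if __name__ == "__main__":
    block = "Contact: sales@example.com or 555-010-0199"
    for element_type, text in identify_elements(block):
        print(element_type, text)
```

Results are grouped by element type, in the order the patterns are declared, rather than by position in the block.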
Abstract
A website mining tool is disclosed that extracts information from, for example, a company's website and presents the extracted information in a graphical user interface (GUI). In one embodiment, web pages from a website are stored in, for example, computer memory and a structure of the web pages is identified. A plurality of blocks of information is then extracted as a function of this structure and a category is assigned to each block of information. The elements in the blocks of information are then displayed, for example to a salesperson, as a function of these categories. In another embodiment, Document Object Modeling parsing is used to identify the structure of the web pages. In yet another embodiment, a support vector machine is used to categorize each block of information.
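The flow described in the abstract (identify page structure, extract blocks as a function of that structure, assign each block a category) can be sketched roughly as follows. This is an illustration under stated assumptions, not the disclosed implementation. It uses Python's standard `html.parser` as a simple stand-in for Document Object Model parsing, and it replaces the support vector machine with plain keyword counting so the example stays self-contained. The tag set, category names, keywords, and function names are all hypothetical, and the markup is assumed to be well formed.

```python
from html.parser import HTMLParser

# Hypothetical choice: treat these tags as block boundaries.
BLOCK_TAGS = {"div", "section", "p", "li", "td"}

# Hypothetical categories and cue words. The disclosure uses a trained
# support vector machine here; keyword counting is only a stand-in.
CATEGORY_KEYWORDS = {
    "contact": {"phone", "email", "address", "fax"},
    "product": {"product", "price", "features", "model"},
    "management": {"ceo", "president", "founder", "director"},
}


class BlockExtractor(HTMLParser):
    """Collect text per block element, attributing text to the nearest open block."""

    def __init__(self):
        super().__init__()
        self.blocks = []   # finished blocks, in the order their tags close
        self._stack = []   # one text buffer per currently open block element

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._stack.append([])

    def handle_data(self, data):
        if self._stack and data.strip():
            self._stack[-1].append(data.strip())

    def handle_endtag(self, tag):
        # Assumes well-formed markup: every block tag is explicitly closed.
        if tag in BLOCK_TAGS and self._stack:
            text = " ".join(self._stack.pop())
            if text:
                self.blocks.append(text)


def extract_blocks(html):
    """Return the text blocks found in one page of HTML."""
    parser = BlockExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


def categorize(block):
    """Pick the category whose cue words overlap most with the block, else 'other'."""
    words = {w.strip(".,:;()").lower() for w in block.split()}
    scores = {cat: len(words & cues) for cat, cues in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "other"


if __name__ == "__main__":
    page = (
        "<div><p>Phone: 555-0100 Email: info@example.com</p>"
        "<p>Our product features a low price.</p></div>"
    )
    for block in extract_blocks(page):
        print(categorize(block), "|", block)
```

A real classifier would be trained on labeled blocks rather than relying on a fixed keyword table; the sketch only shows where that step sits relative to structure-driven extraction.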
30 Claims
1. A method for extracting elements of information from a website comprising a plurality of Web pages, said method comprising:
identifying a structure associated with said plurality of web pages;
extracting a plurality of blocks of information from said web pages as a function of said structure;
assigning each block of information in said plurality of blocks of information to a category in a plurality of categories; and
identifying a plurality of elements of information in each block of information.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. An apparatus for extracting elements of information from a website comprising a plurality of Web pages, said apparatus comprising:
means for identifying a structure associated with said plurality of web pages;
means for extracting a plurality of blocks of information from said web pages as a function of said structure;
means for assigning each block of information in said plurality of blocks of information to a category in a plurality of categories; and
means for identifying a plurality of elements of information in each block of information.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
21. A computer readable medium storing computer program instructions which, when executed on a processor, define the steps of:
identifying a structure associated with a plurality of web pages on a Web site;
extracting a plurality of blocks of information from said web pages as a function of said structure;
assigning each block of information in said plurality of blocks of information to a category in a plurality of categories; and
identifying a plurality of elements of information in each block of information.
Dependent claims: 22, 23, 24, 25, 26, 27, 28, 29, 30
Specification