Method and apparatus for building sales tools by mining data from websites
Abstract
A website mining tool is disclosed that extracts information from, for example, a company's website and presents the extracted information in a graphical user interface (GUI). In one embodiment, web pages from a website are stored in, for example, computer memory and a structure of the web pages is identified. A plurality of blocks of information is then extracted as a function of this structure and a category is assigned to each block of information. The elements in the blocks of information are then displayed, for example to a salesperson, as a function of these categories. In another embodiment, Document Object Modeling parsing is used to identify the structure of the web pages. In yet another embodiment, a support vector machine is used to categorize each block of information.
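The structure-identification and block-extraction steps described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes well-formed markup, uses the standard-library `xml.dom.minidom` parser as a stand-in for the Document Object Modeling parsing mentioned above, and treats a hypothetical set of element names (`BLOCK_TAGS`) as block boundaries. The names `extract_blocks`, `text_of`, and the sample `PAGE` are illustrative only, and the support vector machine categorization step is not shown.

```python
from xml.dom import minidom

# Assumption: these element names mark block boundaries. The abstract does
# not say which elements delimit a "block of information".
BLOCK_TAGS = {"div", "section", "table", "ul"}


def text_of(node):
    """Concatenate all text beneath a DOM node, collapsing whitespace."""
    parts = []

    def walk(n):
        if n.nodeType == n.TEXT_NODE:
            parts.append(n.data)
        for child in n.childNodes:
            walk(child)

    walk(node)
    return " ".join(" ".join(parts).split())


def extract_blocks(markup):
    """Return one dict per outermost block element of a well-formed page."""
    dom = minidom.parseString(markup)
    blocks = []

    def visit(node, depth):
        if node.nodeType != node.ELEMENT_NODE:
            return
        if node.tagName in BLOCK_TAGS:
            links = [a.getAttribute("href")
                     for a in node.getElementsByTagName("a")]
            blocks.append({
                "tag": node.tagName,
                "depth": depth,          # position in the page hierarchy
                "text": text_of(node),
                "links": links,
            })
            return  # nested blocks stay inside their parent block
        for child in node.childNodes:
            visit(child, depth + 1)

    visit(dom.documentElement, 0)
    return blocks


# Hypothetical sample page, used only to exercise the sketch.
PAGE = """<html><body>
<div id="about"><p>Acme makes widgets.</p><a href="/products">Products</a></div>
<ul><li><a href="/contact">Contact</a></li><li><a href="/jobs">Jobs</a></li></ul>
</body></html>"""

blocks = extract_blocks(PAGE)
for block in blocks:
    print(block["tag"], block["depth"], block["links"])
```

Each extracted block keeps its text, its links, and its depth in the page hierarchy, which is the kind of input a downstream categorizer would need.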
20 Claims
1. A method for characterizing a plurality of extensible markup language documents, the method comprising:
extracting a plurality of blocks of information from the plurality of extensible markup language documents;
assigning a block of information in the plurality of blocks of information to a task complexity category in a plurality of categories, the block of information comprising a value indicative of a number of extensible markup language documents in the plurality of extensible markup language documents and a value indicative of a number of links associated with the plurality of extensible markup language documents;
associating a task complexity with the plurality of extensible markup language documents based on the block of information and a structural hierarchy of the plurality of extensible markup language documents; and
characterizing the plurality of extensible markup language documents based on the task complexity.
Dependent claims: 2, 3, 4, 5, 6, 7
8. An apparatus for characterizing a plurality of extensible markup language documents, the apparatus comprising:
a processor; and
a memory communicatively coupled to the processor, the memory to store computer program instructions, the computer program instructions when executed on the processor cause the processor to perform operations comprising:
extracting a plurality of blocks of information from the plurality of extensible markup language documents;
assigning a block of information in the plurality of blocks of information to a task complexity category in a plurality of categories, the block of information comprising a value indicative of a number of extensible markup language documents in the plurality of extensible markup language documents and a value indicative of a number of links associated with the plurality of extensible markup language documents;
associating a task complexity with the plurality of extensible markup language documents based on the block of information and a structural hierarchy of the plurality of extensible markup language documents; and
characterizing the plurality of extensible markup language documents based on the task complexity.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A computer readable storage device storing computer program instructions for characterizing a plurality of extensible markup language documents, the computer program instructions when executed on a processor, cause the processor to perform operations comprising:
extracting a plurality of blocks of information from the plurality of extensible markup language documents;
assigning a block of information in the plurality of blocks of information to a task complexity category in a plurality of categories, the block of information comprising a value indicative of a number of extensible markup language documents in the plurality of extensible markup language documents and a value indicative of a number of links associated with the plurality of extensible markup language documents;
associating a task complexity with the plurality of extensible markup language documents based on the block of information and a structural hierarchy of the plurality of extensible markup language documents; and
characterizing the plurality of extensible markup language documents based on the task complexity.
Dependent claims: 16, 17, 18, 19, 20
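The operations recited in independent claims 1, 8 and 15 can be pictured with a short Python sketch. It is only one possible reading: the claims do not say how the document count, link count, and structural hierarchy are combined, so the additive score, the thresholds, and the category names ("low", "medium", "high") below are assumptions made for illustration. The names `SiteBlock`, `max_depth`, `summarize`, and `complexity_category` are likewise hypothetical, and the standard-library `xml.etree.ElementTree` parser is assumed to be an acceptable way to read well-formed documents.

```python
from dataclasses import dataclass
from xml.etree import ElementTree as ET


@dataclass
class SiteBlock:
    """A block of information holding the two counts named in the claims."""
    document_count: int
    link_count: int


def max_depth(element, depth=1):
    """Depth of the deepest element beneath (and including) `element`."""
    return max([depth] + [max_depth(child, depth + 1) for child in element])


def summarize(documents):
    """Extract the count block and the hierarchy depth from the documents."""
    roots = [ET.fromstring(doc) for doc in documents]
    link_count = sum(1 for root in roots for _ in root.iter("a"))
    block = SiteBlock(document_count=len(roots), link_count=link_count)
    depth = max(max_depth(root) for root in roots)
    return block, depth


def complexity_category(block, depth):
    """Map the block and hierarchy depth to a task complexity category.

    Assumption: a simple additive score with arbitrary thresholds; the
    claims do not specify a formula.
    """
    score = block.document_count + block.link_count + depth
    if score < 10:
        return "low"
    if score < 50:
        return "medium"
    return "high"


# Hypothetical two-page site, used only to exercise the sketch.
DOCS = [
    "<html><body><a href='/a'>A</a><div><a href='/b'>B</a></div></body></html>",
    "<html><body><p>No links here.</p></body></html>",
]

block, depth = summarize(DOCS)
category = complexity_category(block, depth)
print(block, depth, category)
```

In this reading, the returned category serves as the characterization of the document set; a real system could replace the threshold rule with a trained classifier.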
Specification