Grouping of documents that contain markup language code
Abstract
In one embodiment, a fingerprint is generated for each document (e.g., e-mail, web page) containing markup language (e.g., HTML) code. The fingerprint is indicative of the structure of the markup language code in the document. The fingerprint may be formed by extracting markup language tags from the document and then linking together the extracted tags to form a single string. The fingerprint may be hashed through a hashing function to generate a signature key that may be used to create a directory for the document and other documents having the same fingerprint. The grouping of documents with the same fingerprint facilitates creation of anti-spam rules or identification of web pages from particular websites, for example.
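The steps described in the abstract (extract tags, link them into one string, hash the string, group by the result) can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `TagExtractor`, `fingerprint`, `signature_key`, and `group_documents` are hypothetical, the choice of SHA-256 is an assumption (the abstract does not name a hashing function), and an in-memory dictionary stands in for the per-signature directories the abstract mentions.

```python
import hashlib
from collections import defaultdict
from html.parser import HTMLParser


class TagExtractor(HTMLParser):
    """Collects tag names in document order, ignoring text and attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # Attributes are dropped on purpose: only the structure is kept.
        self.tags.append(f"<{tag}>")

    def handle_endtag(self, tag):
        self.tags.append(f"</{tag}>")


def fingerprint(document: str) -> str:
    """Link the extracted tags together into a single string."""
    extractor = TagExtractor()
    extractor.feed(document)
    extractor.close()
    return "".join(extractor.tags)


def signature_key(fp: str) -> str:
    """Hash the fingerprint (SHA-256 is an assumed choice)."""
    return hashlib.sha256(fp.encode("utf-8")).hexdigest()


def group_documents(documents):
    """Group documents whose fingerprints hash to the same signature key."""
    groups = defaultdict(list)
    for doc in documents:
        groups[signature_key(fingerprint(doc))].append(doc)
    return dict(groups)
```

Two messages that differ only in their text content share a fingerprint and land in the same group, while a message with a different tag structure gets its own group. In a setup like the one the abstract describes, each signature key could serve as a directory name rather than a dictionary key. Note that `HTMLParser` lowercases tag names, so this sketch treats `<P>` and `<p>` as the same tag; whether the original method does so is not stated.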
20 Claims
1. A method of grouping e-mails that contain HTML (HyperText Markup Language) code, the method to be performed by a computer and comprising:
(a) extracting HTML tags from an e-mail in a plurality of e-mails;
(b) forming a fingerprint of the e-mail by linking together the extracted HTML tags to form a single string;
(c) hashing the fingerprint to form a signature key; and
(d) grouping the e-mail with other e-mails having the same fingerprint.
Dependent claims: 2, 3, 4, 5, 6.
7. A computer comprising:
a tag extractor configured to extract markup language tags from a document containing markup language code; and
a fingerprint generator configured to generate a fingerprint of the document containing markup language code, the fingerprint being indicative of a structure of the markup language code in the document, the fingerprint generator being configured to generate the fingerprint by forming together markup language tags extracted by the tag extractor from the document.
Dependent claims: 8, 9, 10.
11. A method to be performed by a computer, the method comprising:
extracting markup language tags from a document; and
forming a structure identifier indicative of a structure of markup language code in the document, the structure identifier being formed from the markup language tags extracted from the document.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20.
Specification