Message rendering for identification of content features
First Claim
1. A system, comprising:
- a computer-readable storage device, comprising:
pre-rendering logic that receives a message containing unrendered text and a non-text element intended to thwart a junk filter by varying the unrendered text and that renders the message in a user-perceivable format;
converting logic that converts the rendered message into a text-only message by removing the non-text element of the rendered message and rendering words and phrases of the rendered message into text, the converting logic mitigating an effect of the non-text element; and
filtering logic that filters the text-only message based upon predetermined content related to a degree of visibility of user-perceivable content of the rendered message, the filtering logic including weighting logic that adjusts a weighting parameter associated with the text of the text-only message to determine if the text should be removed, the filtering logic further making a determination of whether to handle the message as a junk message or a legitimate message based at least in part on a number of images associated with the text-only message and a total area of the images associated with the text-only message.
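The weighting and image-based determination recited in claim 1 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the patented implementation: the `TextRun` and `Image` types, the contrast-times-size formula in `visibility_weight`, and the `min_weight` threshold are all hypothetical. The sketch only shows the general idea of down-weighting text a reader could barely perceive and of deriving the image count and total image area as inputs to the junk decision.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TextRun:
    """A run of rendered text with hypothetical style attributes."""
    text: str
    fg: Tuple[int, int, int]   # foreground colour as RGB
    bg: Tuple[int, int, int]   # background colour as RGB
    font_px: float             # rendered font size in pixels


@dataclass
class Image:
    """Rendered dimensions of an image in the message (hypothetical type)."""
    width: int
    height: int


def visibility_weight(run: TextRun) -> float:
    """Return a weight in [0, 1]; an assumed formula, not taken from the patent."""
    # Mean per-channel colour difference, scaled to [0, 1].
    contrast = sum(abs(f - b) for f, b in zip(run.fg, run.bg)) / (3 * 255)
    # Text below 10 px is weighted down proportionally (assumed cut-off).
    size_factor = min(run.font_px / 10.0, 1.0)
    return contrast * size_factor


def visible_text(runs: List[TextRun], min_weight: float = 0.1) -> str:
    """Keep only runs whose weight reaches the (assumed) threshold."""
    return " ".join(r.text for r in runs if visibility_weight(r) >= min_weight)


def image_features(images: List[Image]) -> Tuple[int, int]:
    """Return (number of images, total image area in square pixels)."""
    return len(images), sum(i.width * i.height for i in images)
```

A downstream classifier could take the output of `visible_text` as its text input and the pair returned by `image_features` as two additional numeric features; how those features are weighed against each other is left open here.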
Abstract
Architecture for detecting and removing obfuscating clutter from the subject and/or body of a message, e.g., e-mail, prior to filtering of the message, to identify junk messages commonly referred to as spam. The technique utilizes the powerful features built into an HTML rendering engine to strip the HTML instructions for all non-substantive aspects of the message. Pre-processing includes pre-rendering of the message into a final format, which final format is that which is displayed by the rendering engine to the user. The final format message is then converted to a text-only format to remove graphics, color, non-text decoration, and spacing that cannot be rendered as ASCII-style or Unicode-style characters. The result is essentially to reduce each message to its common denominator essentials so that the junk mail filter can view each message on an equal basis.
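The reduction to a text-only message can be sketched as follows. This is a simplified stand-in, not the approach the abstract describes: the abstract relies on a full HTML rendering engine, whereas this sketch substitutes Python's standard `html.parser` module and merely strips markup, comments, and non-displayed elements. The class name `VisibleTextExtractor` and the function `to_text_only` are hypothetical names chosen for the example.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect character data while dropping tags, comments, and hidden blocks."""

    # Elements whose contents are never displayed as body text.
    SKIPPED = {"script", "style", "head", "title"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Comments are ignored by the base class, so text split by a
        # comment or by inline tags is rejoined here.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Collapse all runs of whitespace to single spaces.
        return " ".join("".join(self._chunks).split())


def to_text_only(html_message: str) -> str:
    """Return the character-only form of an HTML message body."""
    parser = VisibleTextExtractor()
    parser.feed(html_message)
    parser.close()
    return parser.text()
```

For example, a word broken up by inline tags and a comment, such as `Buy <b>V</b><!-- xyz --><i>iagra</i> now`, comes out as a single token sequence that a text filter can match. A real rendering engine would additionally account for styling, layout, and visibility, which this sketch does not attempt.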
28 Claims
1. A system, comprising:
- a computer-readable storage device, comprising:
pre-rendering logic that receives a message containing unrendered text and a non-text element intended to thwart a junk filter by varying the unrendered text and that renders the message in a user-perceivable format;
converting logic that converts the rendered message into a text-only message by removing the non-text element of the rendered message and rendering words and phrases of the rendered message into text, the converting logic mitigating an effect of the non-text element; and
filtering logic that filters the text-only message based upon predetermined content related to a degree of visibility of user-perceivable content of the rendered message, the filtering logic including weighting logic that adjusts a weighting parameter associated with the text of the text-only message to determine if the text should be removed, the filtering logic further making a determination of whether to handle the message as a junk message or a legitimate message based at least in part on a number of images associated with the text-only message and a total area of the images associated with the text-only message.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A system, comprising:
- a processor; and
- a memory communicatively coupled to the processor for storing computer-executable instructions that, when executed by the processor, perform operations comprising:
receiving a message containing unrendered text and a non-text element intended to thwart a junk filter by varying the unrendered text;
pre-rendering the message into a first format corresponding to content intended to be user perceived;
adjusting a weighting parameter associated with text of the message to determine if the text should be removed related to a degree of visibility of the user-perceivable content of the rendered message;
converting the message of the first format into a character-only message; and
determining whether to handle the message as a junk message or a legitimate message based on the character-only message, the determining being based at least in part on a number of images associated with the message and a total area of the images associated with the message.
Dependent claims: 11, 12, 13, 14, 15
16. A method, comprising:
- receiving a message containing unrendered text and a non-text element intended to thwart a junk filter by varying the unrendered text;
- pre-rendering the message into a first format corresponding to content intended to be user perceived, the pre-rendering mitigating an effect of the non-text element;
- adjusting a weighting parameter associated with text of the message to determine if the text should be removed relative to a degree of visibility of the user-perceivable content of the rendered message;
- converting the message of the first format into a character-only message;
- creating a hash of the message and images within the message to determine whether the message and future messages are junk messages; and
- determining whether to handle the message as a junk message or a legitimate message based on the character-only message and the hash, the determining being based at least in part on a number of images associated with the message or whether images associated with the message link to another source.
Dependent claims: 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28
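The hashing step recited in claim 16 can be illustrated with a short Python sketch. The claim does not name a hash function or say how hashes are compared, so the choices here are assumptions: SHA-256 from the standard `hashlib` module, lower-casing and whitespace normalization before hashing, and a plain set of previously seen junk hashes. All function names are hypothetical.

```python
import hashlib
from typing import Iterable, Set


def fingerprint_text(text: str) -> str:
    """Hash the character-only message after an assumed normalization."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def fingerprint_image(data: bytes) -> str:
    """Hash the raw bytes of one image."""
    return hashlib.sha256(data).hexdigest()


def message_fingerprints(text: str, images: Iterable[bytes]) -> Set[str]:
    """Return the hashes of the message text and of each image in it."""
    return {fingerprint_text(text)} | {fingerprint_image(i) for i in images}


def matches_known_junk(fingerprints: Set[str], known_junk: Set[str]) -> bool:
    """True if any hash of this message was already seen in a junk message."""
    return not fingerprints.isdisjoint(known_junk)
```

Hashing text and images separately means that a later message reusing the same image with different wording would still match a stored junk hash. An exact hash such as SHA-256 does not survive even a one-byte change to an image, so a practical system might prefer a similarity-preserving hash; that choice is outside what the claim text specifies.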
Specification