Language independent probabilistic content matching
First Claim
1. A computing system comprising:
- at least one processor; and
memory storing instructions executable by the at least one processor, wherein the instructions configure the computing system to:
access a rule that defines patterns that are used to identify content as sensitive content, the rule defining
a segmented pattern to be matched to textual content written in a segmented language, and corroborating data associated with the segmented pattern, and
an un-segmented pattern to be matched to textual content written in an un-segmented language, and corroborating data associated with the un-segmented pattern;
identify an electronic source document having electronic document content;
determine whether the electronic document content is sensitive content by matching the electronic document content against the patterns in the rule and generating a confidence score corresponding to the determination as to whether the electronic document content is sensitive content, wherein generation of the confidence score is based on whether the electronic document content matched the segmented pattern or the un-segmented pattern, and based on the corroborating data associated with the matched pattern, the generation of the confidence score being regardless of a language of the electronic document content;
identify a data dissemination policy based on the determination as to whether the electronic document content is sensitive content and the corresponding confidence score; and
automatically process the electronic document by identifying an action defined by the data dissemination policy and automatically performing the identified action to control electronic dissemination of the electronic document content over a computer network by at least one of:
automatically blocking the document content from being sent to a potential recipient;
automatically displaying a message indicating that the document content contains sensitive material and that the document content will be blocked from being sent to a potential recipient;
or automatically displaying a message indicating that the document content contains sensitive material and instructing the user how to proceed based on the data dissemination policy.
Abstract
Content is received and compared against rules for identifying a type of content. Each rule has both segmented and unsegmented patterns. The content is matched against the patterns and assigned a confidence score that is higher if the content matches a segmented pattern and lower if the content matches an unsegmented pattern.
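The scoring idea in the abstract can be sketched roughly as follows. This is a minimal illustration only, not the patented implementation: the `Pattern` and `Rule` classes, the `score` function, the 16-digit regular expressions, the keyword lists, and every confidence value are hypothetical. It assumes that a segmented pattern relies on word boundaries, that an unsegmented pattern does not, and that corroborating data takes the form of nearby keywords.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    regex: str               # hypothetical pattern text
    base_confidence: int     # score when only the pattern matches
    keywords: tuple          # hypothetical corroborating terms
    boosted_confidence: int  # score when a corroborating term also appears


@dataclass(frozen=True)
class Rule:
    segmented: Pattern       # expects word boundaries around the match
    unsegmented: Pattern     # matches even with no delimiters around it


def score(rule: Rule, text: str) -> int:
    """Return a confidence in [0, 100]; 0 means neither pattern matched.

    The language of `text` is never detected: both patterns are always tried,
    and the score depends only on which one matched and on corroboration.
    Taking the highest score is an assumption of this sketch.
    """
    best = 0
    for pattern in (rule.segmented, rule.unsegmented):
        if re.search(pattern.regex, text):
            corroborated = any(k in text for k in pattern.keywords)
            conf = pattern.boosted_confidence if corroborated else pattern.base_confidence
            best = max(best, conf)
    return best


# Illustrative values only: a segmented match scores higher than an
# unsegmented one, as the abstract describes.
KEYWORDS = ("credit card", "クレジットカード")
CARD_RULE = Rule(
    segmented=Pattern(r"\b\d{16}\b", 75, KEYWORDS, 95),
    unsegmented=Pattern(r"\d{16}", 55, KEYWORDS, 85),
)

print(score(CARD_RULE, "My credit card is 4111111111111111."))
print(score(CARD_RULE, "クレジットカード番号は4111111111111111です"))
```

In the second call the digits sit directly between Japanese characters, so the word-boundary pattern does not match and only the unsegmented pattern contributes, which yields the lower score.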
10 Claims
1. A computing system comprising:
at least one processor; and
memory storing instructions executable by the at least one processor, wherein the instructions configure the computing system to:
access a rule that defines patterns that are used to identify content as sensitive content, the rule defining
a segmented pattern to be matched to textual content written in a segmented language, and corroborating data associated with the segmented pattern, and
an un-segmented pattern to be matched to textual content written in an un-segmented language, and corroborating data associated with the un-segmented pattern;
identify an electronic source document having electronic document content;
determine whether the electronic document content is sensitive content by matching the electronic document content against the patterns in the rule and generating a confidence score corresponding to the determination as to whether the electronic document content is sensitive content, wherein generation of the confidence score is based on whether the electronic document content matched the segmented pattern or the un-segmented pattern, and based on the corroborating data associated with the matched pattern, the generation of the confidence score being regardless of a language of the electronic document content;
identify a data dissemination policy based on the determination as to whether the electronic document content is sensitive content and the corresponding confidence score; and
automatically process the electronic document by identifying an action defined by the data dissemination policy and automatically performing the identified action to control electronic dissemination of the electronic document content over a computer network by at least one of:
automatically blocking the document content from being sent to a potential recipient;
automatically displaying a message indicating that the document content contains sensitive material and that the document content will be blocked from being sent to a potential recipient;
or automatically displaying a message indicating that the document content contains sensitive material and instructing the user how to proceed based on the data dissemination policy.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A computing system comprising:
a processor; and
memory storing instructions which, when executed by the processor, configure the computing system to:
access a rule for identifying a particular content type, the rule comprising:
a segmented pattern that, when matched to textual content written in a segmented language, defines the textual content as the particular content type; and
an unsegmented pattern that, when matched to textual content written in an unsegmented language, defines the textual content as the particular content type;
identify an electronic message having message content;
generate a confidence score indicative of whether the message content is of the particular content type, the confidence score being generated by applying the rule to the message content and determining whether the message content matched the segmented pattern or the un-segmented pattern; and
based on the confidence score, process the electronic message by at least one of:
blocking the message content from being sent to a potential recipient;
rendering a user notification indicating that the message content contains sensitive material and will be blocked from being sent to a potential recipient; and
rendering a user notification indicating that the message content contains sensitive material and instructing the user how to proceed to comply with a data dissemination policy.
Dependent claims: 10
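The final step of claim 9, processing the message based on the confidence score, might look something like the sketch below. The claims do not state thresholds or how the listed actions map to score ranges, so the `Action` names, the `Policy` class, the tier boundaries, and the `ALLOW` fallback are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALLOW = "allow"                                  # assumption: not recited in the claim
    NOTIFY_HOW_TO_PROCEED = "notify how to proceed"  # notification with instructions
    NOTIFY_AND_BLOCK = "notify and block"            # notification that sending is blocked
    BLOCK = "block"                                  # block without further interaction


@dataclass(frozen=True)
class Policy:
    """Hypothetical data dissemination policy: (minimum confidence, action) tiers."""
    tiers: tuple = (
        (90, Action.BLOCK),
        (75, Action.NOTIFY_AND_BLOCK),
        (50, Action.NOTIFY_HOW_TO_PROCEED),
    )

    def action_for(self, confidence: int) -> Action:
        # Tiers are checked from the highest threshold down; the first one
        # whose minimum the confidence reaches decides the action.
        for minimum, action in sorted(self.tiers, key=lambda t: t[0], reverse=True):
            if confidence >= minimum:
                return action
        return Action.ALLOW


policy = Policy()
for confidence in (95, 85, 55, 0):
    print(confidence, policy.action_for(confidence).value)
```

Keeping the thresholds in the policy rather than in the matching rule mirrors the separation in the claims between generating the confidence score and choosing how to process the message.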
Specification