Language independent probabilistic content matching
Abstract
Content is received and compared against rules for identifying a type of content. Each rule has both segmented and unsegmented patterns. The content is matched against the patterns and assigned a confidence score that is higher if the content matches a segmented pattern and lower if the content matches an unsegmented pattern.
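As an illustration of the scoring idea in the abstract, here is a minimal Python sketch. The `Rule` class, the example regular expressions, and the numeric values 0.85 and 0.65 are assumptions made for this example; the source states only that a segmented match yields a higher confidence score than an unsegmented one.

```python
import re
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule holding both kinds of pattern for one content type."""
    name: str
    segmented: Tuple[str, ...]    # patterns that rely on word boundaries
    unsegmented: Tuple[str, ...]  # patterns that match anywhere in the text


# Assumed values: only their ordering (segmented > unsegmented) comes from the source.
SEGMENTED_CONFIDENCE = 0.85
UNSEGMENTED_CONFIDENCE = 0.65


def confidence(text: str, rule: Rule) -> float:
    """Return a confidence score for `text` under `rule`, or 0.0 if nothing matches."""
    if any(re.search(p, text, re.IGNORECASE) for p in rule.segmented):
        return SEGMENTED_CONFIDENCE
    if any(re.search(p, text, re.IGNORECASE) for p in rule.unsegmented):
        return UNSEGMENTED_CONFIDENCE
    return 0.0


credit_card = Rule(
    name="credit card",
    segmented=(r"\bcredit card\b",),
    unsegmented=(r"credit card", r"信用卡"),
)

english_score = confidence("My credit card number is on file.", credit_card)
chinese_score = confidence("我的信用卡号码", credit_card)
no_match_score = confidence("Lunch at noon?", credit_card)
```

In Python's `re` module, `\b` treats CJK characters as word characters, so a boundary-anchored pattern would not fire inside running Chinese text. The sketch therefore falls back to the boundary-free pattern and reports the lower score.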
19 Claims
1. A computer-implemented method comprising:
receiving an electronic message from a content source system, the electronic message including textual content;
accessing, from a rules store by a content determination component in a content determination system, a set of rules that define patterns that are used to identify content as sensitive content, each rule having segmented and un-segmented patterns that can be matched to textual content written in a segmented language and textual content written in an un-segmented language, respectively;
matching, by the content determination component, at least one textual content portion in the electronic message against the patterns in each rule, regardless of whether the at least one textual content portion in the electronic message is written in a segmented language or in an un-segmented language, to determine whether the at least one textual content portion in the electronic message is sensitive content;
generating, by the content determination component, a confidence score corresponding to the determination as to whether the at least one textual content portion in the electronic message is sensitive content, based on whether the at least one textual content portion in the electronic message matched a segmented pattern or an un-segmented pattern; and
applying, by a content processing system, a data dissemination rule to the at least one textual content portion in the electronic message based on the determination as to whether the at least one textual content portion in the electronic message is sensitive content and the corresponding confidence score, wherein applying the data dissemination rule comprises at least one of:
blocking the at least one textual content portion in the electronic message from being sent to a potential recipient;
displaying a message indicating that the at least one textual content portion contains sensitive material and that the at least one textual content portion will be blocked from being sent to a potential recipient; or
displaying a message indicating that the at least one textual content portion contains sensitive material and instructing the user how to proceed based on the data dissemination rule.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
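The final step of claim 1 can be sketched as a small policy function. This is only one possible reading: the `Action` names, the `Outcome` type, the 0.8 threshold, and the message wording are all hypothetical. The claim itself requires only that the action depend on the sensitivity determination and the confidence score.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    """Hypothetical action names for the outcomes listed in claim 1."""
    ALLOW = "allow"
    BLOCK_WITH_NOTICE = "block_with_notice"
    WARN_WITH_INSTRUCTIONS = "warn_with_instructions"


@dataclass(frozen=True)
class Outcome:
    action: Action
    message: str


# Assumed cut-off; the source does not specify any numeric threshold.
BLOCK_THRESHOLD = 0.8


def apply_dissemination_rule(is_sensitive: bool, confidence: float) -> Outcome:
    """Choose an action from the sensitivity determination and its confidence."""
    if not is_sensitive:
        return Outcome(Action.ALLOW, "")
    if confidence >= BLOCK_THRESHOLD:
        # High confidence: block the content and tell the user why.
        return Outcome(
            Action.BLOCK_WITH_NOTICE,
            "This content contains sensitive material and will be blocked "
            "from being sent.",
        )
    # Lower confidence: flag the content and tell the user how to proceed.
    return Outcome(
        Action.WARN_WITH_INSTRUCTIONS,
        "This content may contain sensitive material. Review it before sending.",
    )


high = apply_dissemination_rule(True, 0.85)
low = apply_dissemination_rule(True, 0.65)
clean = apply_dissemination_rule(False, 0.0)
```

Tying the stricter action to the higher score is a design choice of this sketch, not something the claim language mandates.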
10. A content processing system, comprising:
a rules data store that includes a plurality of rules, each rule corresponding to a type of information and having a set of segmented patterns and a set of un-segmented patterns;
a content determination component receiving an electronic message having content and matching the content against the segmented patterns and the un-segmented patterns in each rule to determine whether the content includes the type of information corresponding to each rule, the content determination component assigning a confidence level to the determination of whether the content includes the type of information corresponding to a given rule based on whether the content matched a segmented pattern or an un-segmented pattern in the given rule;
a content processing system processing the content based on the determination of whether the content includes the type of information corresponding to the given rule, and applying a data dissemination rule comprises at least one of blocking the at least one textual content portion in the electronic message from being sent to a potential recipient, displaying a message indicating that the at least one textual content portion contains sensitive material and that the at least one textual content portion will be blocked from being sent to a potential recipient, or displaying a message indicating that the at least one textual content portion contains sensitive material and instructing the user how to proceed in accordance with the data dissemination rule; and
a computer processor being a functional component of the system and being activated by the content determination component to facilitate matching and assigning a confidence level.
Dependent claims: 11, 12, 13, 14, 15
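One way to picture the rules data store of claim 10 is as a mapping from information type to two pattern sets. In this minimal sketch the `PatternSets` type, the `RULES_STORE` contents, and the "high"/"low" level labels are assumptions chosen for illustration, not details taken from the patent.

```python
import re
from typing import Dict, List, NamedTuple


class PatternSets(NamedTuple):
    """Hypothetical per-rule container: one set of patterns of each kind."""
    segmented: List[str]
    unsegmented: List[str]


# Hypothetical in-memory rules store keyed by information type.
RULES_STORE: Dict[str, PatternSets] = {
    "passport": PatternSets([r"\bpassport\b"], [r"passport", r"护照"]),
    "bank_account": PatternSets([r"\bbank account\b"], [r"bank account", r"银行账户"]),
}


def evaluate(content: str, store: Dict[str, PatternSets]) -> Dict[str, str]:
    """Return a confidence level for every information type found in `content`."""
    results: Dict[str, str] = {}
    for info_type, patterns in store.items():
        if any(re.search(p, content, re.IGNORECASE) for p in patterns.segmented):
            results[info_type] = "high"  # matched a segmented pattern
        elif any(re.search(p, content, re.IGNORECASE) for p in patterns.unsegmented):
            results[info_type] = "low"   # matched only an un-segmented pattern
    return results


english_result = evaluate("Send my passport scan today.", RULES_STORE)
chinese_result = evaluate("请发送护照和银行账户", RULES_STORE)
```

Each rule is evaluated independently, so a single message can be flagged for several information types at different confidence levels.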
16. A non-volatile memory device having computer readable instructions which, when executed, perform a method, comprising:
receiving an electronic message from a content source system, the electronic message including textual content;
accessing, from a rules store by a content determination component in a content determination system, a set of rules that define patterns that are used to identify content as sensitive content, each rule having segmented and un-segmented patterns that can be matched to textual content written in a segmented language and textual content written in an un-segmented language, respectively;
matching, by the content determination component, at least one textual content portion in the electronic message against the patterns in each rule, regardless of whether the at least one textual content portion in the electronic message is written in a segmented language or in an un-segmented language, to determine whether the at least one textual content portion in the electronic message is sensitive content;
generating, by the content determination component, a confidence score corresponding to the determination as to whether the at least one textual content portion in the electronic message is sensitive content, based on whether the at least one textual content portion in the electronic message matched a segmented pattern or an un-segmented pattern by generating a higher confidence score if the at least one textual content portion in the electronic message matched a segmented pattern and generating a lower confidence score if the at least one textual content portion in the electronic message matched an un-segmented pattern; and
applying, by a content processing system, a data dissemination rule to the at least one textual content portion in the electronic message based on the determination as to whether the at least one textual content portion in the electronic message is sensitive content and the corresponding confidence score, wherein applying the data dissemination rule comprises at least one of:
blocking the at least one textual content portion in the electronic message from being sent to a potential recipient;
displaying a message indicating that the at least one textual content portion contains sensitive material and that the at least one textual content portion will be blocked from being sent to a potential recipient; or
displaying a message indicating that the at least one textual content portion contains sensitive material and instructing the user how to comply with the data dissemination rule.
Dependent claims: 17, 18, 19
Specification