Language independent probabilistic content matching

  • US 9,087,039 B2
  • Filed: 02/07/2012
  • Issued: 07/21/2015
  • Est. Priority Date: 02/07/2012
  • Status: Active Grant
First Claim

1. A computer-implemented method comprising:

  • receiving an electronic message from a content source system, the electronic message including textual content;

    accessing, from a rules store by a content determination component in a content determination system, a set of rules that define patterns that are used to identify content as sensitive content, each rule having segmented and un-segmented patterns that can be matched to textual content written in a segmented language and textual content written in an un-segmented language, respectively;

    matching, by the content determination component, at least one textual content portion in the electronic message against the patterns in each rule, regardless of whether the at least one textual content portion in the electronic message is written in a segmented language or in an un-segmented language, to determine whether the at least one textual content portion in the electronic message is sensitive content;

    generating, by the content determination component, a confidence score corresponding to the determination as to whether the at least one textual content portion in the electronic message is sensitive content, based on whether the at least one textual content portion in the electronic message matched a segmented pattern or an un-segmented pattern; and

    applying, by a content processing system, a data dissemination rule to the at least one textual content portion in the electronic message based on the determination as to whether the at least one textual content portion in the electronic message is sensitive content and the corresponding confidence score, wherein applying the data dissemination rule comprises at least one of:

    blocking the at least one textual content portion in the electronic message from being sent to a potential recipient;

    displaying a message indicating that the at least one textual content portion contains sensitive material and that the at least one textual content portion will be blocked from being sent to a potential recipient;

    or displaying a message indicating that the at least one textual content portion contains sensitive material and instructing the user how to proceed based on the data dissemination rule.
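
The flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The names (Rule, match_rule, apply_dissemination_rule), the two confidence values, the blocking threshold, and the 16-digit example pattern are all hypothetical and chosen only for illustration. The sketch assumes the segmented pattern requires word boundaries around the match, while the un-segmented pattern does not, so that text written without spaces between words can still match, at a lower confidence.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    """One rule carrying both pattern variants (hypothetical structure)."""
    name: str
    segmented: str    # regex that expects word boundaries around the match
    unsegmented: str  # regex that matches without relying on boundaries


# Assumed values: a boundary-delimited match is treated as stronger evidence.
SEGMENTED_CONFIDENCE = 0.9
UNSEGMENTED_CONFIDENCE = 0.6


def match_rule(rule: Rule, text: str) -> Optional[float]:
    """Return a confidence score if the text matches either pattern, else None."""
    if re.search(rule.segmented, text):
        return SEGMENTED_CONFIDENCE
    if re.search(rule.unsegmented, text):
        return UNSEGMENTED_CONFIDENCE
    return None


def apply_dissemination_rule(text: str, rules: List[Rule],
                             block_threshold: float = 0.8) -> str:
    """Map the best confidence score to an action: allow, warn, or block."""
    scores = []
    for rule in rules:
        score = match_rule(rule, text)
        if score is not None:
            scores.append(score)
    if not scores:
        return "allow"
    return "block" if max(scores) >= block_threshold else "warn"


# Hypothetical rule: a 16-digit number, with and without word boundaries.
RULES = [Rule("card-number", r"\b\d{16}\b", r"\d{16}")]

print(apply_dissemination_rule("card 4111111111111111 attached", RULES))
print(apply_dissemination_rule("カード番号4111111111111111です", RULES))
print(apply_dissemination_rule("hello world", RULES))
```

In the second call the digits are directly adjacent to Japanese characters, so the boundary-based pattern does not match and only the un-segmented pattern fires. Under the assumed threshold this yields a warning to the user instead of a block, mirroring the claim's alternatives of blocking or displaying a message.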
