Identifying and Preventing Leaks of Sensitive Information
First Claim
1. A method comprising:
extracting, by a computer system, a plurality of terms from a plurality of documents for a plurality of profiles, wherein each profile in the plurality of profiles is associated with a particular user in a plurality of users using a network of electronics devices;
generating, by the computer system, for the plurality of terms in each of the plurality of profiles, a plurality of associated inferred meanings based on a plurality of usages of the plurality of terms in the plurality of documents;
generating for each profile in the plurality of profiles, by the computer system, a plurality of categorical terms based on the plurality of inferred meanings, wherein the plurality of categorical terms categorize the plurality of terms based on the associated inferred meanings;
generating, by the computer system, a plurality of associated categorical term frequencies based on a plurality of associated frequencies of term occurrences of terms associated with the categorical terms in each of the plurality of profiles, wherein each of the plurality of associated categorical term frequencies is associated with one of the plurality of categorical terms;
determining, by the computer system, a plurality of sensitivity level values for the plurality of categorical terms based on the plurality of associated categorical term frequencies; and
storing, by the computer system, the plurality of sensitivity level values for the plurality of categorical terms, wherein the plurality of sensitivity level values are used to analyze whether an information transaction comprising at least one of the plurality of terms is permitted.
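The claim names the steps but not the formulas behind them. The following is a minimal sketch, not the patented implementation. It makes three assumptions. First, a hypothetical fixed `CATEGORY_OF` lookup stands in for meanings inferred from usage. Second, the term extraction is a naive word tokenizer. Third, a category's sensitivity level is taken as the share of its total usage held by the single heaviest profile. Under that third assumption, usage concentrated in one profile scores near 1.0, and usage spread evenly over N profiles scores 1/N.

```python
import re
from collections import Counter

# ASSUMPTION: hypothetical stand-in for "inferred meanings". The claim infers
# meanings from how terms are used; here a fixed term -> category map is used.
CATEGORY_OF = {
    "merger": "corporate_finance",
    "acquisition": "corporate_finance",
    "valuation": "corporate_finance",
    "lunch": "social",
    "party": "social",
}


def extract_terms(document: str) -> list[str]:
    """Naive term extraction: lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", document.lower())


def categorical_frequencies(documents: list[str]) -> Counter:
    """Count, for one profile, how often terms of each category occur."""
    counts: Counter = Counter()
    for doc in documents:
        for term in extract_terms(doc):
            category = CATEGORY_OF.get(term)
            if category is not None:
                counts[category] += 1
    return counts


def sensitivity_levels(profiles: dict[str, list[str]]) -> dict[str, float]:
    """ASSUMED scoring: max per-profile count of a category / its total count.

    1.0 means a single profile uses the category; 1/N means N profiles use it equally.
    The returned dict plays the role of the stored sensitivity level values.
    """
    per_profile = {user: categorical_frequencies(docs) for user, docs in profiles.items()}
    totals: Counter = Counter()
    for counts in per_profile.values():
        totals.update(counts)  # Counter.update adds counts rather than replacing them
    return {
        category: max(counts[category] for counts in per_profile.values()) / total
        for category, total in totals.items()
    }
```

In the following example, only one profile talks about mergers and valuations, while "lunch"/"party" terms are spread across three profiles. The finance category therefore scores 1.0 and the social category scores 0.5:

```python
profiles = {
    "alice": ["Merger valuation draft", "Lunch on Friday?"],
    "bob": ["Party tonight", "lunch plans"],
    "carol": ["Team lunch photos"],
}
print(sensitivity_levels(profiles))  # {'corporate_finance': 1.0, 'social': 0.5}
```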
11 Assignments
0 Petitions
Abstract
Determining sensitive information and preventing the unauthorized or unintended dissemination of such information are disclosed. Terms are determined from documents associated with users in a network. Distributions among users and relative frequencies with which the terms are used are determined. Link strengths between users are calculated. Based on the distribution of the terms, the relative frequencies of use among the user profiles and link strengths between users conducting information transactions that include the terms, a sensitivity level for each term can be determined. To determine whether a particular information transaction with particular terms may be conducted between two users in the network, a combination of link strength between the users and sensitivity level of the terms with respect to the users or users' profiles is considered. If the information transaction includes terms that are unknown to one of the users, then a warning or alarm can be raised.
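The abstract does not say how link strength and sensitivity are combined. The following sketch assumes one simple rule for illustration: both values lie on a 0 to 1 scale, and a term may be shared only if its sensitivity does not exceed the link strength between sender and recipient. Terms with no stored sensitivity are treated as 0.0, which is also an assumption. Terms the recipient has never used are reported separately, so that the caller can raise the warning the abstract describes. `check_transaction` and `Decision` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    permitted: bool
    blocked_terms: list[str] = field(default_factory=list)   # too sensitive for this link
    unknown_terms: list[str] = field(default_factory=list)   # never used by the recipient


def check_transaction(
    terms: list[str],
    sensitivity: dict[str, float],
    link_strength: float,
    recipient_vocabulary: set[str],
) -> Decision:
    """ASSUMED rule: block any term whose sensitivity exceeds the link strength.

    Terms missing from `sensitivity` default to 0.0 (assumption). Terms absent
    from the recipient's vocabulary are flagged so a warning can be raised.
    """
    blocked = [t for t in terms if sensitivity.get(t, 0.0) > link_strength]
    unknown = [t for t in terms if t not in recipient_vocabulary]
    return Decision(permitted=not blocked, blocked_terms=blocked, unknown_terms=unknown)
```

With a weak link (0.5), a highly sensitive term such as "merger" (0.9) is blocked. It is also flagged as unknown because the recipient has only ever used "lunch":

```python
sens = {"merger": 0.9, "lunch": 0.2}
print(check_transaction(["merger", "lunch"], sens, 0.5, {"lunch"}))
```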
20 Claims
12. A method comprising:
receiving, by a computer system, a plurality of terms from a plurality of documents for a plurality of profiles, wherein each profile in the plurality of profiles is associated with a particular user in a plurality of users using a network of electronics devices and with one or more electronic devices in the network of electronic devices;
generating, by the computer system, a plurality of categorical terms that categorize the plurality of terms based on syntactic meanings of the plurality of terms;
generating, by the computer system, a plurality of associated categorical term frequencies based on a plurality of frequencies of usage of terms associated with the plurality of categorical terms, wherein each of the plurality of associated categorical term frequencies is associated with one of the plurality of categorical terms;
determining, by the computer system, a plurality of link strength values for a plurality of pairs of users in the plurality of users based on an organization chart of the users, wherein each of the plurality of link strength values describe a relationship between an associated pair of users in the plurality of pairs of users; and
determining, by the computer system, a plurality of sensitivity level values for the plurality of categorical terms based on the plurality of associated categorical term frequencies and the plurality of link strength values for the plurality of pairs of users of the system, wherein the sensitivity level values are used to analyze whether an information transaction comprising at least one of the plurality of terms is allowable.
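Claim 12 derives link strength values from an organization chart without giving a formula. A minimal sketch follows. It assumes the chart is a tree given as a hypothetical `manager_of` mapping (employee to direct manager), with no cycles. The distance between two users is the number of reporting-line hops through their lowest common manager. The link strength is then assumed to decay as 1 / (1 + hops), so the same user scores 1.0 and closer colleagues score higher.

```python
def chain_to_root(user: str, manager_of: dict[str, str]) -> list[str]:
    """Return [user, manager, manager's manager, ..., root]. Assumes the chart is acyclic."""
    chain = [user]
    while chain[-1] in manager_of:
        chain.append(manager_of[chain[-1]])
    return chain


def org_distance(a: str, b: str, manager_of: dict[str, str]) -> int:
    """Reporting-line hops from a to b via their lowest common manager."""
    chain_a = chain_to_root(a, manager_of)
    depth_in_b = {person: i for i, person in enumerate(chain_to_root(b, manager_of))}
    for i, person in enumerate(chain_a):
        if person in depth_in_b:  # first shared ancestor is the lowest common manager
            return i + depth_in_b[person]
    raise ValueError(f"{a} and {b} are not in the same organization chart")


def link_strength(a: str, b: str, manager_of: dict[str, str]) -> float:
    """ASSUMED scoring: 1.0 for the same user, decaying as 1 / (1 + hops)."""
    return 1.0 / (1 + org_distance(a, b, manager_of))
```

For a small chart in which alice and bob report to vp_eng and carol reports to vp_sales, peers on the same team are two hops apart, while alice and carol meet only at the CEO:

```python
manager_of = {"vp_eng": "ceo", "vp_sales": "ceo",
              "alice": "vp_eng", "bob": "vp_eng", "carol": "vp_sales"}
print(link_strength("alice", "vp_eng", manager_of))  # 0.5
print(link_strength("alice", "carol", manager_of))   # 0.2
```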
20. A system comprising:
one or more computer processors; and
a non-transitory computer-readable storage medium containing instructions, that when executed, control the one or more computer processors to be configured for:
extracting a plurality of terms from a plurality of documents for a plurality of profiles, wherein each profile in the plurality of profiles is associated with a particular user in a plurality of users using a network of electronics devices;
generating for the plurality of terms in each of the plurality of profiles, a plurality of associated inferred meanings based on a plurality of usages of the plurality of terms in the plurality of documents;
generating for each profile in the plurality of profiles a plurality of categorical terms based on the plurality of inferred meanings, wherein the plurality of categorical terms categorize the plurality of terms based on the associated inferred meanings;
generating a plurality of associated categorical term frequencies based on a plurality of associated frequencies of term occurrences of terms associated with the categorical terms in each of the plurality of profiles, wherein each of the plurality of associated categorical term frequencies is associated with one of the plurality of categorical terms; and
determining a plurality of sensitivity level values for the plurality of categorical terms based on the plurality of associated categorical term frequencies.
Specification