Estimating data topics of computers using external text content and usage information of the users
First Claim
1. A method to automatically estimate content topics of inaccessible content in a computer system comprising:
gathering accessible content; and
analyzing, by a processor, the accessible content to estimate one or more topics of the inaccessible content without inspecting the inaccessible content, the inaccessible content comprising privileged data protected from access due to one or more of data privacy and computer security, wherein the one or more topics of the inaccessible content is estimated while preserving the one or more of data privacy and computer security, the analyzing further comprising:
identifying users of the computer system and access counts of the users accessing the computer system, retrieving the accessible content generated by the users of the computer system, analyzing user information and external text content associated with the users that are available in an organization's online space outside of the computer system;
for each of the users, generating a document comprising a bag-of-words representation for the inaccessible content generated by the user, the bag-of-words representation comprising words occurring in the accessible content and counts of the words, the counts of the words scaled as a function of a number of occurrences of a word in the accessible content and a computer system access count associated with the user;
generating an asset document associated with the computer system by aggregating the document associated with each user for all users; and
executing a topic modeling algorithm on the asset document that estimates the one or more topics, wherein based on the one or more topics, the module automatically determines security level of information stored in the computer system.
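Claim 1 leaves both the scaling function and the topic modeling algorithm unspecified. The Python sketch below illustrates the claimed pipeline under stated assumptions and is not the patented implementation: the per-user scaling `count * log(1 + access_count)`, the lowercase whitespace tokenizer, rounding weighted counts to integer token counts, and the use of latent Dirichlet allocation (LDA) fitted by collapsed Gibbs sampling are all choices made here for illustration. The names `scaled_bag_of_words`, `asset_document`, and `lda_gibbs` and the sample data are hypothetical. Because LDA needs a corpus, the sketch fits it over asset documents from several computer systems rather than a single one.

```python
import math
import random
from collections import Counter


def scaled_bag_of_words(text: str, access_count: int) -> dict[str, float]:
    """Per-user document: word counts scaled by the user's access count.

    ASSUMPTION: count * log(1 + access_count) is one possible scaling function;
    tokenization is a naive lowercase whitespace split.
    """
    weight = math.log1p(access_count)
    return {w: c * weight for w, c in Counter(text.lower().split()).items()}


def asset_document(users: list[tuple[str, int]]) -> dict[str, float]:
    """Sum each user's scaled bag-of-words, given (accessible_text, access_count) pairs."""
    asset: dict[str, float] = {}
    for text, access_count in users:
        for word, value in scaled_bag_of_words(text, access_count).items():
            asset[word] = asset.get(word, 0.0) + value
    return asset


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Minimal LDA via collapsed Gibbs sampling over asset documents.

    ASSUMPTION: LDA stands in for the unnamed topic modeling algorithm, and
    real-valued weights are rounded to integer token counts before sampling.
    Returns per-document topic mixtures and the top three words per topic.
    """
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    index = {w: i for i, w in enumerate(vocab)}
    # Expand each weighted asset document into a list of word ids.
    tokens = [
        [index[w] for w, x in sorted(d.items()) for _ in range(round(x))]
        for d in docs
    ]
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]      # topic counts per document
    nkw = [[0] * V for _ in range(K)]  # word counts per topic
    nk = [0] * K                       # total tokens per topic

    # Random initial topic assignment for every token.
    z = []
    for d, toks in enumerate(tokens):
        z.append([])
        for w in toks:
            k = rng.randrange(K)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1

    for _ in range(iterations):
        for d, toks in enumerate(tokens):
            for i, w in enumerate(toks):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Unnormalized full conditional p(z = t | all other assignments).
                probs = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=probs)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    # Smoothed topic mixture of each asset document.
    theta = [
        [(ndk[d][k] + alpha) / (len(tokens[d]) + K * alpha) for k in range(K)]
        for d in range(len(docs))
    ]
    top_words = [
        [vocab[i] for i in sorted(range(V), key=lambda i: -nkw[k][i])[:3]]
        for k in range(K)
    ]
    return theta, top_words


if __name__ == "__main__":
    # Hypothetical data: two computer systems and their users' external text.
    finance = asset_document([("budget invoice payroll budget", 9), ("invoice audit", 3)])
    research = asset_document([("genome protein assay", 12), ("protein model", 1)])
    theta, topics = lda_gibbs([finance, research], num_topics=2)
    print(topics)
    print([[round(p, 2) for p in row] for row in theta])
```

The logarithmic scaling is one way to keep a single very frequent user from dominating the asset document, and it gives zero weight to users who never accessed the system. Mapping the resulting topics to a security level, the claim's final step, is not modeled here.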
1 Assignment
0 Petitions
Abstract
Automatically estimating content topics of inaccessible content in a computer system, in one aspect, may comprise gathering accessible content and analyzing the accessible content to estimate one or more topics of the inaccessible content.
42 Citations
10 Claims
Claims 2 through 10 are dependent claims.
Specification