Aggregation and classification of secure data

US 9,779,260 B1
Filed: 05/30/2013
Issued: 10/03/2017
Est. Priority Date: 06/11/2012
Status: Active Grant

Abstract

In one embodiment, a method includes managing and controlling a plurality of data-access credentials. The method further includes accessing data from a plurality of sources in a plurality of data formats. The accessing includes using one or more data-access credentials of the plurality of data-access credentials. The one or more data-access credentials are associated with at least a portion of the plurality of data sources. The method also includes abstracting the data into a standardized format for further analysis. The abstracting includes selecting the standardized format based on a type of the data. In addition, the method includes applying a security policy to the data. The applying includes identifying at least a portion of the data for exclusion from storage based on the security policy. Additionally, the method includes filtering from storage any data identified for exclusion. Further, the method includes storing the data in the standardized format.
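The flow described in the abstract (credentialed access, abstraction into a standardized format chosen by data type, policy-based exclusion, then storage) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the patent: the `Record` fields, the `CREDENTIALS` table, the "message"/"document" formats, and the author-based exclusion rule are assumptions chosen only to make the sequence concrete.

```python
from dataclasses import dataclass

@dataclass
class Record:
    """Hypothetical standardized record; field names are illustrative only."""
    kind: str        # standardized format selected from the data type
    metadata: dict
    body: str

# Hypothetical credential store: source name -> credential token.
CREDENTIALS = {"mail-server": "token-a", "cloud-drive": "token-b"}

def access(source, raw_items):
    """Stand-in for fetching data; refuses sources with no managed credential."""
    if source not in CREDENTIALS:
        raise PermissionError(f"no credential for {source}")
    return raw_items

def abstract(item):
    """Select a standardized format based on the type of the data."""
    kind = "message" if item.get("type") in {"email", "chat"} else "document"
    metadata = {"source_type": item.get("type"), "author": item.get("author")}
    return Record(kind=kind, metadata=metadata, body=item.get("text", ""))

def excluded_by_policy(record, blocked_authors):
    """Assumed example policy: exclude data written by blocked authors."""
    return record.metadata.get("author") in blocked_authors

def ingest(source, raw_items, blocked_authors):
    """Access, abstract, filter by policy, and return what would be stored."""
    store = []
    for item in access(source, raw_items):
        record = abstract(item)
        if excluded_by_policy(record, blocked_authors):
            continue  # filtered from storage
        store.append(record)
    return store
```

Calling `ingest("mail-server", items, {"bob"})` on a list of item dictionaries would keep only the records whose author is not `bob`, each already converted to a `Record`.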

272 Citations

18 Claims

1. A method comprising:
- on a computer system comprising at least one server computer and a plurality of distinct data classification engines, managing and controlling a plurality of data-access credentials;
  
  wherein the plurality of distinct data classification engines comprise an a priori classification engine, a posteriori classification engine, and a heuristics engine;
  
  accessing, by the computer system, data from a plurality of sources in a plurality of data formats, the plurality of sources comprising sources that are internal to the computer system and sources that are external to the computer system;
  
  wherein the accessing comprises using one or more data-access credentials of the plurality of data-access credentials, the one or more data-access credentials being associated with at least a portion of the plurality of data sources;
  
  abstracting, by the computer system, the data into a standardized format for further analysis, the abstracting comprising selecting the standardized format based on a type of the data;
  
  applying, by the computer system, a security policy to the data;
  
  wherein the applying comprises identifying at least a portion of the data for exclusion from storage based on the security policy;
  
  the computer system filtering from storage any data identified for exclusion;
  
  storing, by the computer system, the filtered data in the standardized format;
  
  prior to storing, classifying, using at least one of the plurality of distinct data classification engines, the data based on one or more characteristics of metadata associated with the data;
  
  wherein the a posteriori classification engine is operable to perform an a posteriori classification, the a posteriori classification comprises utilization of one or more probabilistic algorithms, wherein the one or more probabilistic algorithms determine a probability that a set of data comprises a particular classification based on a combination of probabilistic determinations associated with subsets of the set of data, parameters, and metadata associated with the set of data; and
  
  wherein the posteriori classification engine is configured to reclassify, at predefined time intervals, the previously classified data in response to user feedback, wherein the user feedback comprises indications by users of an accuracy of the previously classified data and update the one or more probabilistic algorithms based on the user feedback.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9):
  - 2. The method of claim 1, wherein the using comprises using the one or more data-access credentials to access an account for each of the at least a portion of the plurality of data sources.
  - 3. The method of claim 1, wherein the accessing comprises accessing metadata associated with the data.
  - 4. The method of claim 1, wherein the abstracting comprises extracting metadata from the data.
  - 5. The method of claim 1, comprising, prior to the storing, formatting the data for storage.
  - 6. The method of claim 1, wherein the storing comprises dynamically expanding a database responsive to the data including a new type of data.
  - 7. The method of claim 1, wherein the plurality of sources are selected from the group consisting of:
    - cloud services, social-media services, hosted applications, and email servers.
  - 8. The method of claim 1, wherein:
    - the data comprises messaging data; and
      
      the standardized format comprises a standardized format for messages.
  - 9. The method of claim 8, wherein the accessing comprises accessing email from an email server to obtain metadata.
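Claim 1 recites an a posteriori engine that combines probabilistic determinations over subsets of the data and metadata, and that is updated from user feedback on the accuracy of earlier classifications. The sketch below is one possible reading, not the patented method: it assumes a single binary class ("sensitive"), treats each feature as an independent piece of evidence (a naive-Bayes-style simplification that only scores features that are present), and uses add-one smoothing. The class name `FeedbackClassifier`, the feature strings, and the `reclassify` helper are all hypothetical.

```python
import math

class FeedbackClassifier:
    """Combines per-feature evidence into one probability (assumed model)."""

    def __init__(self, prior=0.5):
        self.prior = prior
        self.counts = {}      # feature -> [seen in confirmed items, seen in rejected items]
        self.totals = [0, 0]  # [confirmed items, rejected items]

    def _likelihood_ratio(self, feature):
        pos, neg = self.counts.get(feature, (0, 0))
        # Add-one smoothing keeps both estimates strictly between 0 and 1.
        p_pos = (pos + 1) / (self.totals[0] + 2)
        p_neg = (neg + 1) / (self.totals[1] + 2)
        return p_pos / p_neg

    def probability(self, features):
        """Sum log-odds contributions from each feature, then map to [0, 1]."""
        log_odds = math.log(self.prior / (1 - self.prior))
        for feature in set(features):
            log_odds += math.log(self._likelihood_ratio(feature))
        return 1 / (1 + math.exp(-log_odds))

    def feedback(self, features, was_accurate):
        """Record a user's indication that a 'sensitive' label was or was not accurate."""
        idx = 0 if was_accurate else 1
        self.totals[idx] += 1
        for feature in set(features):
            self.counts.setdefault(feature, [0, 0])[idx] += 1

def reclassify(stored, clf, threshold=0.5):
    """Relabel every stored item; a scheduler would call this at each interval."""
    for item in stored:
        item["sensitive"] = clf.probability(item["features"]) >= threshold
```

With no feedback recorded, every feature contributes a ratio of 1 and `probability` returns the prior. Each call to `feedback` shifts the ratios, so a later `reclassify` pass can change labels that were assigned earlier.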

10. An information handling system comprising:
- a processing unit, wherein the processing unit is operable to implement a method comprising:
  
  managing and controlling a plurality of data-access credentials;
  
  accessing data from a plurality of sources in a plurality of data formats, the plurality of sources comprising sources that are internal to the information handling system and sources that are external to the information handling system;
  
  wherein the accessing comprises using one or more data-access credentials of the plurality of data-access credentials, the one or more data-access credentials being associated with at least a portion of the plurality of data sources;
  
  abstracting the data into a standardized format for further analysis, the abstracting comprising selecting the standardized format based on a type of the data;
  
  applying a security policy to the data;
  
  wherein the applying comprises identifying at least a portion of the data for exclusion from storage based on the security policy;
  
  filtering from storage any data identified for exclusion;
  
  storing the filtered data in the standardized format;
  
  prior to storing, classifying, using a plurality of distinct data classification engines, the data based on one or more characteristics of metadata associated with the data;
  
  wherein the plurality of distinct data classification engines comprise an a priori classification engine, a posteriori classification engine, and a heuristics engine;
  
  wherein the a posteriori classification engine is operable to perform an a posteriori classification, the a posteriori classification comprises utilization of one or more probabilistic algorithms, wherein the one or more probabilistic algorithms determine a probability that a set of data comprises a particular classification based on a combination of probabilistic determinations associated with subsets of the set of data, parameters, and metadata associated with the set of data; and
  
  wherein the posteriori classification engine is configured to reclassify, at predefined time intervals, the previously classified data in response to user feedback, wherein the user feedback comprises indications by users of an accuracy of the previously classified data and update the one or more probabilistic algorithms based on the user feedback.
- Dependent claims (11, 12, 13, 14, 15, 16, 17):
  - 11. The information handling system of claim 10, wherein the using comprises using the one or more data-access credentials to access an account for each of the at least a portion of the plurality of data sources.
  - 12. The information handling system of claim 10, wherein the accessing comprises accessing metadata associated with the data.
  - 13. The information handling system of claim 10, wherein the abstracting comprises extracting metadata from the data.
  - 14. The information handling system of claim 10, comprising, prior to the storing, formatting the data for storage.
  - 15. The information handling system of claim 10, wherein the storing comprises dynamically expanding a database responsive to the data including a new type of data.
  - 16. The information handling system of claim 10, wherein the plurality of sources are selected from the group consisting of:
    - cloud services, social-media services, hosted applications, and email servers.
  - 17. The information handling system of claim 10, wherein:
    - the data comprises messaging data; and
      
      the standardized format comprises a standardized format for messages.
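Claims 6 and 15 recite dynamically expanding a database when the data includes a new type of data. As a rough illustration only, the sketch below uses Python's built-in `sqlite3` module and assumes one table per data type, created on first sight, with columns added when a record carries a field the table lacks. The `data_` table prefix, the all-`TEXT` columns, and the `store_record` helper are assumptions for the example, and `fields` is assumed to be non-empty.

```python
import sqlite3

def _ident(name):
    """Reduce an arbitrary name to a safe SQL identifier (letters, digits, underscore)."""
    return "".join(ch if ch.isalnum() else "_" for ch in name)

def store_record(conn, data_type, fields):
    """Insert a record, creating the table or adding columns as needed."""
    table = "data_" + _ident(data_type)
    cols = sorted(_ident(c) for c in fields)
    values = [str(fields[c]) for c in sorted(fields, key=_ident)]

    # PRAGMA table_info returns no rows if the table does not exist yet.
    existing = {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}
    if not existing:
        col_defs = ", ".join(f'"{c}" TEXT' for c in cols)
        conn.execute(f'CREATE TABLE "{table}" ({col_defs})')
    else:
        for c in cols:
            if c not in existing:
                conn.execute(f'ALTER TABLE "{table}" ADD COLUMN "{c}" TEXT')

    col_list = ", ".join(f'"{c}"' for c in cols)
    placeholders = ", ".join("?" for _ in cols)
    conn.execute(f'INSERT INTO "{table}" ({col_list}) VALUES ({placeholders})', values)
    return table
```

Storing a record of a type the database has not seen before creates a new table, while a known type with an unseen field only gains a column; existing rows read back `NULL` for the new column.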

18. A non-transitory computer-program product comprising a computer-usable medium having computer-readable program code embodied therein, the computer-readable program code adapted to be executed to implement a method comprising:
- on a computer system comprising at least one server computer and a plurality of distinct data classification engines, managing and controlling a plurality of data-access credentials;
  
  wherein the plurality of distinct data classification engines comprise an a priori classification engine, an a posteriori classification engine, and a heuristics engine;
  
  accessing, by the computer system, data from a plurality of sources in a plurality of data formats, the plurality of sources comprising sources that are internal to the computer system and sources that are external to the computer system;
  
  wherein the accessing comprises using one or more data-access credentials of the plurality of data-access credentials, the one or more data-access credentials being associated with at least a portion of the plurality of data sources;
  
  abstracting, by the computer system, the data into a standardized format for further analysis, the abstracting comprising selecting the standardized format based on a type of the data;
  
  applying, by the computer system, a security policy to the data;
  
  wherein the applying comprises identifying at least a portion of the data for exclusion from storage based on the security policy;
  
  the computer system filtering from storage any data identified for exclusion;
  
  storing, by the computer system, the filtered data in the standardized format;
  
  prior to storing, classifying, using at least one of the plurality of distinct data classification engines, the data based on one or more characteristics of metadata associated with the data;
  
  wherein the a posteriori classification engine is operable to perform an a posteriori classification, the a posteriori classification comprises utilization of one or more probabilistic algorithms, wherein the one or more probabilistic algorithms determine a probability that a set of data comprises a particular classification based on a combination of probabilistic determinations associated with subsets of the set of data, parameters, and metadata associated with the set of data; and
  
  wherein the posteriori classification engine is configured to reclassify, at predefined time intervals, the previously classified data in response to user feedback, wherein the user feedback comprises indications by users of an accuracy of the previously classified data and update the one or more probabilistic algorithms based on the user feedback.

Current Assignee: Quest Software, Inc.
Original Assignee: Dell Software, Inc. (Dell Technologies Inc.)
Inventors: Brisebois, Michel; Aylesworth, Jason; Johnstone, Curtis; Leach, Andrew John; Vinogradov, Elena; Blaiberg, Joel Stacy; Pope, Stephen; Holmesdale, Shawn Donald; Hu, Guangning
Primary Examiner(s): Parsons, Theodore C

Application Number: US13/906,241
Time in Patent Office: 1,587 Days

CPC Class Codes

G06F 16/254   Extract, transform and load...

G06F 16/258   Data format conversion from...

G06F 16/951   Indexing; Web crawling tech...

G06F 21/62   Protecting access to data v...

G06F 21/6218   to a system of files or obj...

G06F 21/6227   where protection concerns t...

G06F 2216/01   Automatic library building

G06F 2216/03   Data mining

G06Q 30/02   Marketing; Price estimation...
