Identifying and tracking sensitive data

  • US 9,940,479 B2
  • Filed: 10/20/2015
  • Issued: 04/10/2018
  • Est. Priority Date: 10/20/2015
  • Status: Active Grant
First Claim

1. A computer-implemented method of classifying privacy relevance of an application programming interface (API), the computer-implemented method comprising:

  • in response to receiving a set of input applications, analyzing, by a processor of a computer system, the set of input applications to identify a plurality of custom APIs, via one or more abstract syntax trees (ASTs), wherein representative code of the set of input applications is stored in the one or more ASTs;

    generating, by the processor of the computer system, a respective taint specification for each identified custom API, each respective taint specification relating one or more sources of data to one or more data sinks;

    generating, by the processor of the computer system, one or more taint flows based on the each respective taint specification, the one or more taint flows being a data path and associated data values between a source of data and a data sink, via data recorded from instrumenting the set of input applications based on the each respective taint specification;

    matching, by the processor of the computer system, one or more features and associated feature values from the one or more taint flows to a set of feature templates, via a representative code of each application of the set of input applications, wherein the representative code is searched to find one or more occurrences of each identified custom API;

    correlating, by the processor of the computer system, the matched one or more features and associated feature values with respective privacy relevance of the plurality of custom APIs to identify a set of privacy relevant features;

    clustering, by the processor of the computer system, the custom APIs from the set of input applications into separate groups based on similarity between the matched one or more features and associated feature values of each identified custom API, wherein the clustering is unsupervised;

    detecting, by the processor of the computer system, a candidate API;

    extracting, by the processor of the computer system, one or more features from the candidate API;

    comparing, by the processor of the computer system, the one or more features extracted from the candidate API to the set of privacy relevant features;

    assigning, by the processor of the computer system, a label to the candidate API indicating privacy relevance of the candidate API; and

    outputting an indication of the privacy relevancy of the candidate API via a user output device.
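
The later steps of the claim (matching taint flows to feature templates, correlating features with privacy relevance, unsupervised clustering, and labeling a candidate API) can be illustrated with a minimal Python sketch. This is not the patented implementation. The claim does not specify the feature templates, the similarity measure, the clustering algorithm, or any thresholds, so everything named below (`TaintFlow`, `FEATURE_TEMPLATES`, Jaccard similarity, greedy grouping, the 0.5 cut-offs, and the sample API names and data) is a hypothetical choice made only for illustration. AST analysis and instrumentation are assumed to have already produced the recorded flows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaintFlow:
    api: str     # custom API the flow passes through
    source: str  # where the data originates
    sink: str    # where the data ends up
    value: str   # data value recorded by instrumentation


# Hypothetical feature templates: feature name -> predicate over a taint flow.
FEATURE_TEMPLATES = {
    "source_is_device_id": lambda f: f.source == "device_id",
    "source_is_location": lambda f: f.source == "location",
    "sink_is_network": lambda f: f.sink == "network",
    "value_looks_like_email": lambda f: "@" in f.value,
}


def extract_features(flows, api):
    """Return the template names matched by any recorded flow through `api`."""
    return frozenset(
        name
        for flow in flows if flow.api == api
        for name, matches in FEATURE_TEMPLATES.items() if matches(flow)
    )


def privacy_relevant_features(features_by_api, known_labels, min_ratio=0.5):
    """Keep features seen mostly on APIs already known to be privacy relevant."""
    relevant = set()
    for feat in set().union(*features_by_api.values()):
        apis = [a for a, fs in features_by_api.items()
                if feat in fs and a in known_labels]
        if apis and sum(known_labels[a] for a in apis) / len(apis) > min_ratio:
            relevant.add(feat)
    return relevant


def jaccard(a, b):
    """Jaccard similarity of two feature sets (assumed similarity measure)."""
    return len(a & b) / len(a | b) if a | b else 1.0


def cluster(features_by_api, threshold=0.5):
    """Greedy unsupervised grouping: join the first group whose seed is similar."""
    groups = []
    for api in sorted(features_by_api):
        for group in groups:
            if jaccard(features_by_api[api], features_by_api[group[0]]) >= threshold:
                group.append(api)
                break
        else:
            groups.append([api])
    return groups


def label_candidate(candidate_features, relevant):
    """Label a candidate API by overlap with the privacy-relevant feature set."""
    if candidate_features & relevant:
        return "privacy-relevant"
    return "not privacy-relevant"


# Made-up flows standing in for data recorded from instrumented applications.
flows = [
    TaintFlow("getUserId", "device_id", "network", "a1b2"),
    TaintFlow("fetchProfile", "device_id", "network", "bob@example.com"),
    TaintFlow("formatTitle", "ui_text", "screen", "Hello"),
    TaintFlow("trackPosition", "location", "network", "48.1,11.5"),
]
known_labels = {"getUserId": True, "fetchProfile": True, "formatTitle": False}

features_by_api = {api: extract_features(flows, api) for api in known_labels}
relevant = privacy_relevant_features(features_by_api, known_labels)
groups = cluster(features_by_api)

candidate_label = label_candidate(extract_features(flows, "trackPosition"), relevant)
print(groups)
print("trackPosition:", candidate_label)
```

In this toy data set the candidate `trackPosition` shares the `sink_is_network` feature with the APIs already marked as privacy relevant, so it receives the privacy-relevant label. A real system would presumably use richer features and a more robust clustering method than this greedy grouping.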
