Generating anonymous data from web data

  • US 9,866,454 B2
  • Filed: 03/25/2014
  • Issued: 01/09/2018
  • Est. Priority Date: 03/25/2014
  • Status: Active Grant
First Claim

1. A method to provide synthetic data, when statistical properties and privacy information, associated with web data, are preserved in the synthetic data, comprising:

  • receiving, by a device and from user devices, the web data,
      the web data being associated with the user devices,
      the web data being generated based on interactions of the user devices with one or more content provider devices via a network, and
      the web data including one or more of:
        clickstream data that includes information associated with portions of content, provided by the one or more content provider devices, that are selected via the user devices,
        location data that includes information associated with locations of the user devices when the content is accessed by the user devices,
        time data that includes information associated with times when the user devices access the content, or
        network data that includes information associated with network resources utilized by the user devices to access the content;

    removing, by the device, erroneous or objectionable web data from the web data to generate a subset of the web data;

    categorizing, by the device, the subset of the web data by assigning categories to the subset of the web data;

    performing, by the device, an empirical estimation of the categorized subset of the web data to generate empirical estimations that include information that provides a representation of behaviors associated with users of the user devices;

    receiving, by the device, a selection of an anonymity level associated with generating the synthetic data;

    performing, by the device, a simulation of the empirical estimations to generate the synthetic data,
      the synthetic data including information associated with the empirical estimations, and
      the synthetic data removing private information, relating to the user devices and the users of the user devices, in accordance to the anonymity level;

    determining, by the device, whether the statistical properties and the privacy information, associated with the web data, are preserved in the synthetic data; and

    selectively:
      storing, by the device, the synthetic data in a storage device and providing the synthetic data when the statistical properties and the privacy information, associated with the web data, are preserved in the synthetic data, or
      re-performing, by the device, the simulation of the empirical estimations to generate other synthetic data when the statistical properties or the privacy information, associated with the web data, is not preserved in the synthetic data.
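
The steps recited in the claim (remove, categorize, estimate, simulate, check, then store or re-perform) can be illustrated with a minimal sketch. It is not the patent's implementation, and every concrete choice in it is an assumption made for illustration: the record fields (`user`, `url`, `hour`), categorizing by the first URL path segment, reading the anonymity level as a threshold `k` on distinct users per cell, and checking statistical preservation with total variation distance against a `tolerance`.

```python
import random
from collections import Counter, defaultdict


def clean(records):
    """Drop records with missing fields or an out-of-range hour (assumed meaning of 'erroneous')."""
    return [r for r in records
            if r.get("user") and r.get("url")
            and isinstance(r.get("hour"), int) and 0 <= r["hour"] < 24]


def categorize(record):
    """Assign a category from the first URL path segment (hypothetical rule)."""
    parts = record["url"].split("://", 1)[-1].split("/", 1)
    segment = parts[1].split("/", 1)[0] if len(parts) > 1 else ""
    return segment or "home"


def estimate(records, k):
    """Empirical distribution over (category, hour) cells seen from at least k distinct users."""
    users, counts = defaultdict(set), Counter()
    for r in records:
        cell = (categorize(r), r["hour"])
        users[cell].add(r["user"])
        counts[cell] += 1
    kept = {c: n for c, n in counts.items() if len(users[c]) >= k}
    total = sum(kept.values())
    return {c: n / total for c, n in kept.items()} if total else {}


def simulate(dist, n, rng):
    """Sample n synthetic records from the estimated distribution, with made-up identifiers."""
    cells = list(dist)
    draws = rng.choices(cells, weights=[dist[c] for c in cells], k=n)
    return [{"user": f"synthetic-{i}", "category": c, "hour": h}
            for i, (c, h) in enumerate(draws)]


def total_variation(dist, synthetic):
    """Total variation distance between the estimate and the synthetic sample."""
    counts = Counter((s["category"], s["hour"]) for s in synthetic)
    n = len(synthetic)
    cells = set(dist) | set(counts)
    return 0.5 * sum(abs(dist.get(c, 0.0) - counts.get(c, 0) / n) for c in cells)


def generate(records, k=2, n=1000, tolerance=0.1, max_attempts=5, seed=0):
    """Run the pipeline; re-simulate until both checks pass or attempts run out."""
    rng = random.Random(seed)
    subset = clean(records)
    dist = estimate(subset, k)
    if not dist:
        return None  # nothing survives the anonymity threshold
    real_users = {r["user"] for r in subset}
    for _ in range(max_attempts):
        synthetic = simulate(dist, n, rng)
        stats_ok = total_variation(dist, synthetic) <= tolerance
        private_ok = all(s["user"] not in real_users for s in synthetic)
        if stats_ok and private_ok:
            return synthetic  # the caller would store and provide this
    return None


records = [
    {"user": "a", "url": "https://example.com/news/1", "hour": 9},
    {"user": "b", "url": "https://example.com/news/2", "hour": 9},
    {"user": "c", "url": "https://example.com/sports/3", "hour": 20},
    {"user": "d", "url": "https://example.com/news/4", "hour": 99},  # erroneous hour
]
strict = generate(records, k=2, n=50)
loose = generate(records, k=1, n=1000)
```

In this sketch a higher `k` suppresses cells that only a few users contributed to, so the `sports` cell (one user) disappears from `strict` but can appear in `loose`. The retry loop mirrors the claim's "selectively" branch: the simulation is re-performed when either check fails.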
