Systems and methods for the distributed categorization of source data

  • US 10,157,217 B2
  • Filed: 05/27/2016
  • Issued: 12/18/2018
  • Est. Priority Date: 05/18/2012
  • Status: Active Grant
First Claim

1. A method for labeling a set of source data, comprising:

    obtaining a set of source data using a distributed data categorization server system comprising a processor and a memory connected to the processor;

    determining a plurality of subsets of the source data using the distributed data categorization server system, where a subset of the source data comprises a plurality of pieces of source data in the set of source data;

    obtaining sets of pairwise annotations for each subset of source data using the data categorization server system, where a pairwise annotation indicates when a first piece of source data in a subset of source data is similar to a second piece of source data in the subset of source data;

    identifying a category for each subset of source data based on the obtained pairwise annotations for the subset of source data using the distributed data categorization server system;

    locating pieces of source data located in at least two of the subsets of source data using the data categorization server system;

    generating source data metadata describing attributes for at least one of the located pieces of source data based on the categories assigned to each of the subsets of source data in which the located pieces of content are contained using the data categorization server system, where the source data metadata for a piece of source data describes attributes of the piece of source data; and

    generating a taxonomy based on the identified categories and the set of source data using the distributed data categorization server system, where the taxonomy comprises relationships between the identified categories and the pieces of source data in the set of source data.
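
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the claim does not say how categories are derived from pairwise annotations, so this sketch assumes that items annotated as similar are merged into connected components with a union-find structure. The function names (`cluster_subset`, `label_source_data`), the `(subset_id, index)` category identifiers, and the dictionary layouts for metadata and taxonomy are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def cluster_subset(items, similar_pairs):
    """Group a subset's items into clusters from pairwise 'similar' annotations.

    Assumption: clusters are connected components of the similarity relation.
    Every item named in similar_pairs must also appear in items.
    """
    parent = {item: item for item in items}

    def find(x):
        # Walk to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:
        parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for item in items:
        clusters[find(item)].add(item)
    # Sort members and clusters so category indices are deterministic.
    return sorted(sorted(members) for members in clusters.values())


def label_source_data(subsets, annotations):
    """subsets: {subset_id: [items]}; annotations: {subset_id: [(a, b), ...]}."""
    # Identify categories within each subset from its pairwise annotations.
    categories = {}
    for sid, items in subsets.items():
        pairs = annotations.get(sid, [])
        for idx, members in enumerate(cluster_subset(items, pairs)):
            categories[(sid, idx)] = members

    # Record which categories each piece of source data falls into.
    membership = defaultdict(list)
    for category, members in categories.items():
        for item in members:
            membership[item].append(category)

    # Locate pieces of source data that appear in at least two subsets.
    shared = {
        item for item, cats in membership.items()
        if len({sid for sid, _ in cats}) >= 2
    }

    # Metadata for located pieces: the categories assigned across subsets.
    metadata = {item: {"categories": sorted(membership[item])} for item in shared}

    # Taxonomy: relationships between categories and pieces of source data.
    taxonomy = dict(categories)
    return metadata, taxonomy
```

As a usage example, calling `label_source_data({"s1": ["a", "b", "c"], "s2": ["c", "d"]}, {"s1": [("a", "b")], "s2": [("c", "d")]})` treats `"c"` as the only piece of source data located in two subsets, so it is the only item that receives metadata, while the taxonomy maps every category to its members.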
