
Multi-label content recategorization

  • US 10,691,739 B2
  • Filed: 12/22/2015
  • Issued: 06/23/2020
  • Est. Priority Date: 12/22/2015
  • Status: Active Grant
First Claim

1. A computing apparatus, comprising:

  • a hardware platform comprising a processor and a memory; and

    one or more tangible, non-transitory computer-readable mediums having instructions to provide a two-phase classification engine to:

    in a first phase, receive a clean multi-labeled dataset comprising a plurality of documents, each assigned to one or more categories from a set of fixed categories;

    receive an unclean multi-labeled dataset, wherein at least some objects of the unclean multi-labeled dataset belong to overlapping classes, wherein the probability that a document belongs to the overlapping classes is approximately equal;

    produce a recategorized and cleansed dataset from the unclean multi-labeled dataset, comprising predicting a number of labels $\hat{l}$ for a document $j$, and comparing $\hat{l}$ to an existing number of labels $l$; and

    in a second phase, compute from the recategorized and cleansed dataset a probability difference between $l$ and $\hat{l}$ for $j$, and take $l$ to be correct if the difference is less than or equal to a threshold.
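
The threshold test in the second phase can be illustrated with a minimal Python sketch. It is not the patented implementation, and the claim does not say how the probabilities are obtained. The sketch assumes a hypothetical list `count_probs`, where `count_probs[k]` is a model's estimated probability that document j carries exactly k labels. It also assumes that the predicted count is the most probable k and that the "probability difference" is the gap between the probabilities of the predicted and existing counts. The function names are invented for illustration.

```python
from typing import Sequence


def predict_label_count(count_probs: Sequence[float]) -> int:
    """Return the most probable label count (assumed reading of 'predicting l-hat').

    count_probs[k] is the assumed probability that the document has k labels.
    Ties resolve to the smallest k.
    """
    return max(range(len(count_probs)), key=lambda k: count_probs[k])


def resolve_label_count(
    count_probs: Sequence[float],
    existing_count: int,
    threshold: float,
) -> int:
    """Decide between the existing count l and the predicted count l-hat.

    The existing count is kept when the probability gap between the two
    counts is less than or equal to the threshold; otherwise the predicted
    count is used. How the gap is defined is an assumption of this sketch.
    """
    if not 0 <= existing_count < len(count_probs):
        raise ValueError("existing_count is outside the range covered by count_probs")

    predicted_count = predict_label_count(count_probs)
    if predicted_count == existing_count:
        return existing_count

    # Gap between the model's confidence in l-hat and in the existing l.
    difference = count_probs[predicted_count] - count_probs[existing_count]
    return existing_count if difference <= threshold else predicted_count


# Hypothetical document: the existing annotation has 1 label,
# while the model slightly prefers 2 labels.
probs = [0.05, 0.40, 0.45, 0.10]
kept = resolve_label_count(probs, existing_count=1, threshold=0.10)
replaced = resolve_label_count(probs, existing_count=1, threshold=0.01)
```

With a loose threshold the existing count is kept because the model's preference for two labels is marginal. With a strict threshold the predicted count replaces it.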

  • 13 Assignments