
Automated data classification

  • US 9,483,740 B1
  • Filed: 12/16/2013
  • Issued: 11/01/2016
  • Est. Priority Date: 09/06/2012
  • Status: Active Grant
First Claim

1. A method, comprising:

  • identifying, by at least one server communicatively coupled to a network, a plurality of training tokens, each training token including a token retrieved from a content source and a classification of the token;

    for each training token in the plurality of training tokens:

      identifying, by the at least one server, a plurality of n-gram sequences,

      generating, by the at least one server, a plurality of features for the plurality of n-gram sequences, and

      generating, by the at least one server, first training data using the token retrieved from the content source, the plurality of features, and the classification of the token;

    training a first classifier with the first training data;

    storing, by the at least one server, the first classifier into a storage system in communication with the at least one server;

    for each training token in the plurality of training tokens:

      identifying a plurality of related tokens in the content source,

      for each of the related tokens in the content source:

        identifying a second plurality of n-gram sequences, and

        generating a second plurality of features using the second plurality of n-gram sequences and by executing the first classifier on the related token to generate a probable classification of the related token;

    generating second training data using the second plurality of features;

    training a second classifier with the second training data; and

    storing, by the at least one server, the second classifier into the storage system in communication with the at least one server.
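The claim describes a two-stage (stacked) training scheme. A first classifier learns from each token's own n-gram features. A second classifier then learns from features of related tokens, including the first classifier's probable classification of each related token. The Python sketch below is one possible reading of that scheme, not the patentee's implementation. It rests on several assumptions that the claim does not state:

- the n-grams are character n-grams (n = 1 to 3);
- the classifier is a simple multinomial naive Bayes stand-in;
- the "content source" is a list of columns, and a token's "related tokens" are the other tokens in its column;
- each training token yields one second-stage example that aggregates its related tokens' features and carries the training token's own classification as the label;
- the "storage system" is a plain dict.

All function names (`char_ngrams`, `related_tokens`, `second_stage_features`, `train_two_stage`) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(token, n_values=(1, 2, 3)):
    """Character n-gram counts for a token, padded with ^/$ boundary markers."""
    padded = f"^{token.lower()}$"
    grams = Counter()
    for n in n_values:
        for i in range(len(padded) - n + 1):
            grams[f"ng:{padded[i:i + n]}"] += 1
    return grams


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (stand-in for any classifier)."""

    def fit(self, examples):
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        vocab = set()
        for features, label in examples:
            self.class_counts[label] += 1
            self.feature_counts[label].update(features)
            vocab.update(features)
        self.vocab_size = len(vocab) + 1  # +1 slot for features unseen in training
        self.totals = {c: sum(fc.values()) for c, fc in self.feature_counts.items()}
        return self

    def predict(self, features):
        """Return the label with the highest smoothed log-posterior."""
        n = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n)
            denom = self.totals[label] + self.vocab_size
            for f, k in features.items():
                score += k * math.log((self.feature_counts[label][f] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best


def related_tokens(token, columns):
    """Hypothetical relatedness: other tokens that share a column with `token`."""
    return [t for col in columns if token in col for t in col if t != token]


def second_stage_features(token, columns, first):
    """Related tokens' n-grams plus the first classifier's probable classification."""
    features = Counter()
    for rel in related_tokens(token, columns):
        rel_grams = char_ngrams(rel)
        features.update({f"rel_{g}": k for g, k in rel_grams.items()})
        features[f"rel_pred:{first.predict(rel_grams)}"] += 1
    return features


def train_two_stage(training_tokens, columns, storage):
    """Train the first and second classifiers and store both in a dict-like store."""
    first_data = [(char_ngrams(tok), label) for tok, label in training_tokens]
    first = NaiveBayes().fit(first_data)
    storage["first_classifier"] = first

    second_data = [(second_stage_features(tok, columns, first), label)
                   for tok, label in training_tokens]
    second = NaiveBayes().fit(second_data)
    storage["second_classifier"] = second
    return first, second


# Usage with a toy content source of two columns (illustrative data only).
columns = [["alice", "bob", "carol", "dave"],
           ["2016-11-01", "2013-12-16", "2012-09-06", "2020-01-01"]]
training = [("alice", "name"), ("bob", "name"), ("carol", "name"),
            ("2016-11-01", "date"), ("2013-12-16", "date"), ("2012-09-06", "date")]
store = {}
first, second = train_two_stage(training, columns, store)
print(second.predict(second_stage_features("dave", columns, first)))
```

Feeding the first classifier's guesses about neighboring tokens into the second classifier lets context, such as what the rest of a column looks like, influence the final label. That appears to be the point of the two training passes. A real system would likely substitute a stronger model and a persistent store.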

  • 3 Assignments