Classification-based redaction in natural language text

  • US 8,938,386 B2
  • Filed: 03/15/2011
  • Issued: 01/20/2015
  • Est. Priority Date: 03/15/2011
  • Status: Active Grant
First Claim

1. A method for redacting natural language text, the method comprising:

  • receiving, by a processing device and via a user input device operatively connected to the processing device, one or more user inputs indicating sensitive concepts and utility concepts based on a user interface that includes a visual representation of a plurality of concepts in the natural language text, the plurality of concepts including the sensitive concepts and the utility concepts, and the natural language text being in an electronic format;

    determining, by the processing device, the sensitive concepts based on the one or more user inputs;

    determining, by the processing device, the utility concepts based on the one or more user inputs;

    determining, by the processing device and for at least one feature in the natural language text, a sensitive concepts implication factor based on class-conditional probabilities of the at least one feature according to the sensitive concepts;

    determining, by the processing device and for the at least one feature, a utility concepts implication factor based on class-conditional probabilities of the at least one feature according to the utility concepts;

    determining, by the processing device and for the at least one feature, a feature score based on a difference between the sensitive concepts implication factor and the utility concepts implication factor;

    identifying, by the processing device and to obtain identified features, the at least one feature based on the feature score satisfying a threshold, the at least one feature implicating at least one identified sensitive concept, of the sensitive concepts, more than at least one identified utility concept of the utility concepts; and

    perturbing, by the processing device, at least some of the identified features in at least a portion of the natural language text.
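
The scoring and perturbing steps recited above can be illustrated with a minimal sketch. It is not the patented implementation, and the claim does not fix any of the following choices, all of which are assumptions made for illustration: features are lowercase whitespace-separated words, class-conditional probabilities are Laplace-smoothed word frequencies per concept, an implication factor is the largest log-probability of the feature over a group of concepts, the threshold is satisfied when the score is at least the threshold, and perturbing means replacing the word with a mask. The function names, the concept labels, and the sample texts are hypothetical.

```python
import math
from collections import Counter


def class_conditional(docs_by_concept, vocab, alpha=1.0):
    """Estimate P(feature | concept) with Laplace smoothing (an assumed estimator)."""
    probs = {}
    for concept, docs in docs_by_concept.items():
        counts = Counter(tok for doc in docs for tok in doc.lower().split())
        total = sum(counts.values()) + alpha * len(vocab)
        probs[concept] = {f: (counts[f] + alpha) / total for f in vocab}
    return probs


def implication_factor(feature, probs, concepts):
    """Assumed definition: the largest log P(feature | concept) within the group."""
    return max(math.log(probs[c][feature]) for c in concepts)


def feature_scores(probs, vocab, sensitive, utility):
    """Score = sensitive implication factor minus utility implication factor."""
    return {
        f: implication_factor(f, probs, sensitive) - implication_factor(f, probs, utility)
        for f in vocab
    }


def redact(text, scores, threshold, mask="[REDACTED]"):
    """Perturb (here: mask) every word whose score is at least the threshold."""
    return " ".join(
        mask if scores.get(tok.lower(), float("-inf")) >= threshold else tok
        for tok in text.split()
    )


# Hypothetical labelled examples: one sensitive concept, one utility concept.
docs_by_concept = {
    "diagnosis": ["patient has diabetes", "diabetes diagnosis confirmed"],
    "billing": ["patient owes payment", "payment invoice sent"],
}
vocab = sorted({tok for docs in docs_by_concept.values() for d in docs for tok in d.split()})

probs = class_conditional(docs_by_concept, vocab)
scores = feature_scores(probs, vocab, sensitive=["diagnosis"], utility=["billing"])
result = redact("Patient has diabetes and owes payment", scores, threshold=1.0)
print(result)  # Patient has [REDACTED] and owes payment
```

In this toy data "diabetes" scores log 3 (about 1.10) because it is three times as likely under the sensitive concept as under the utility concept after smoothing, while "patient" scores 0 because it is equally likely under both, so only "diabetes" clears the threshold of 1.0.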
