Systems, methods and computer program products for building a database associating n-grams with cognitive motivation orientations

  • US 9,269,273 B1
  • Filed: 04/18/2013
  • Issued: 02/23/2016
  • Est. Priority Date: 07/30/2012
  • Status: Active Grant
First Claim

1. A computer-implemented method for building an analysis database associating each of a plurality of n-grams with corresponding respective cognitive motivation orientations, comprising:

  • receiving a training corpus of training documents in electronic form;

    wherein the receiving of the training corpus of training documents comprises scanning at least one training document using OCR technology, and thereby transforming the at least one training document into electronic form;

    each training document comprising a plurality of meaningfully arranged words;

    each training document having at least one annotated word sequence therein;

    wherein within each training document, each particular annotated word sequence is annotated with a corresponding word-sequence-level annotation identifying at least one cognitive motivation orientation that is associated with that particular annotated word sequence;

    for each training document:

    for each annotated word sequence in that particular training document:

    extracting n-grams overlapping that particular annotated word sequence; and

    associating each extracted n-gram with the at least one cognitive motivation orientation associated with that particular annotated word sequence;

    generating a set of indicator candidate n-grams wherein:

    each indicator candidate n-gram represents all instances of a particular n-gram in the training corpus for which at least one instance of that particular n-gram was extracted from any annotated word sequence in any training document;

    each indicator candidate n-gram being associated with every cognitive motivation orientation that is associated with at least one instance of the particular n-gram represented by that particular indicator candidate n-gram;

    applying at least one relevance filter to each indicator candidate n-gram in the set of indicator candidate n-grams to obtain a set of indicator n-grams, wherein:

    the set of indicator n-grams is a subset of the set of indicator candidate n-grams, so that each indicator n-gram corresponds to only one indicator candidate n-gram and thereby each indicator n-gram represents all instances of a corresponding particular n-gram in the training corpus for which at least one instance of that particular n-gram was extracted from any annotated word sequence in any training document;

    each indicator n-gram is associated with only a single cognitive motivation orientation; and

    each indicator n-gram has, as its associated single cognitive motivation orientation, that single cognitive motivation orientation with which the instances of the particular n-gram represented by that particular indicator n-gram are most frequently associated.
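The steps recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. Everything in it is an assumption made for illustration: the corpus layout (a list of dicts with `tokens` and `annotations`, the latter holding token-index spans), the function names `iter_ngrams` and `build_indicator_ngrams`, and in particular the relevance filter, which the claim leaves unspecified and which is stood in for here by a minimum count (`min_count`) and a minimum share of the dominant orientation (`min_share`). OCR intake is omitted, and only annotated instances are counted.

```python
from collections import Counter, defaultdict


def iter_ngrams(tokens, n_max):
    """Yield (start, end, ngram) for every contiguous n-gram of 1..n_max tokens."""
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield i, i + n, tuple(tokens[i:i + n])


def build_indicator_ngrams(corpus, n_max=2, min_count=2, min_share=0.6):
    """Map each retained n-gram to its single most frequent orientation.

    corpus: list of {"tokens": [...], "annotations": [(start, end, [orientations])]}
    where start/end are token indices (end exclusive). This layout is assumed.
    """
    # votes[ngram][orientation] = number of annotated instances with that orientation
    votes = defaultdict(Counter)
    for doc in corpus:
        tokens = doc["tokens"]
        for span_start, span_end, orientations in doc["annotations"]:
            for start, end, gram in iter_ngrams(tokens, n_max):
                # Keep n-grams whose token range overlaps the annotated sequence.
                if start < span_end and end > span_start:
                    for orientation in orientations:
                        votes[gram][orientation] += 1

    indicators = {}
    for gram, counter in votes.items():  # each key is one indicator candidate
        total = sum(counter.values())
        orientation, top = counter.most_common(1)[0]
        # Hypothetical relevance filter: enough evidence and a clear majority.
        if total >= min_count and top / total >= min_share:
            indicators[gram] = orientation  # exactly one orientation per n-gram
    return indicators


if __name__ == "__main__":
    corpus = [
        {"tokens": "i want to win the prize".split(),
         "annotations": [(1, 4, ["toward"])]},
        {"tokens": "we want to avoid losing".split(),
         "annotations": [(1, 4, ["away"])]},
        {"tokens": "they want to win now".split(),
         "annotations": [(1, 4, ["toward"])]},
    ]
    for gram, orientation in sorted(build_indicator_ngrams(corpus).items()):
        print(" ".join(gram), "->", orientation)
```

In this sketch a tie between orientations is broken by insertion order, which is a property of `Counter.most_common` and not something the claim addresses. The orientation labels `toward` and `away` are placeholders.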
