Methods and apparatus for identification and analysis of temporally differing corpora

  • US 9,135,243 B1
  • Filed: 03/15/2013
  • Issued: 09/15/2015
  • Est. Priority Date: 03/15/2013
  • Status: Active Grant
First Claim

1. A method for identifying n-grams about an object, comprising:

  • identifying, as a result of computing hardware and programmable memory, an object-specific corpus, that is a subset of a first corpus, where approximately all statements of the object-specific corpus are about a same first object;

    identifying, as a result of computing hardware and programmable memory, statements of the object-specific corpus, for inclusion in a corpus of interest, upon a basis of a statement relating to a time period of interest;

    identifying, as a result of computing hardware and programmable memory, statements of the object-specific corpus, for inclusion in a reference corpus, upon a basis of a statement relating to a reference time period that is different from the time period of interest;

    identifying, as a result of computing hardware and programmable memory, n-grams of the corpus of interest, for inclusion in a corpus-of-interest list of n-grams;

    identifying, as a result of computing hardware and programmable memory, n-grams of the reference corpus, for inclusion in a reference-corpus list of n-grams;

    identifying, as a result of computing hardware and programmable memory, for each n-gram of the corpus-of-interest list of n-grams, for subsequent access in conjunction with an n-gram, a number of occurrences of the n-gram in the corpus of interest;

    identifying, as a result of computing hardware and programmable memory, for each n-gram of the reference-corpus list of n-grams, for subsequent access in conjunction with an n-gram, a number of occurrences of the n-gram in the reference corpus;

    determining, as a result of computing hardware and programmable memory, a selected list of n-grams from the corpus-of-interest list of n-grams or the reference-corpus list of n-grams;

    determining, as a result of computing hardware and programmable memory, for each n-gram of the selected list of n-grams, for subsequent access in conjunction with an n-gram, an average number of occurrences of the n-gram, in the reference-corpus;

    determining, as a result of computing hardware and programmable memory, for each n-gram of the selected list of n-grams, for subsequent access in conjunction with an n-gram, a difference value, between a number of occurrences of the n-gram in the corpus of interest and an average number of occurrences of the n-gram in the reference-corpus;

    normalizing, as a result of computing hardware and programmable memory, for each n-gram of the selected list of n-grams, for subsequent access in conjunction with an n-gram, the difference value to produce a normalized difference value; and

    determining, as a result of computing hardware and programmable memory, from the selected list of n-grams, a second selected list of n-grams, on a basis of the normalized difference value.
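The steps recited in claim 1 can be illustrated with a minimal Python sketch. This is not the patentee's implementation, and several details are assumptions because the claim leaves them open: the `Statement` record and its `about` and `when` fields are hypothetical, tokenization is plain whitespace splitting, the "average number of occurrences" is taken as the mean count across several equal-weight reference windows, the normalization divides the difference by the average plus one, and the second selected list keeps n-grams whose normalized difference meets a hypothetical `threshold`.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Statement:
    """Hypothetical record: a statement, the object it is about, and its date."""
    text: str
    about: str
    when: date


def ngrams(text, n=2):
    """Return the word n-grams of a statement (assumed whitespace tokenization)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(statements, n=2):
    """Count occurrences of every n-gram across a list of statements."""
    counts = Counter()
    for s in statements:
        counts.update(ngrams(s.text, n))
    return counts


def in_period(statement, period):
    """True if the statement's date falls inside the inclusive (start, end) period."""
    start, end = period
    return start <= statement.when <= end


def salient_ngrams(corpus, obj, interest, reference_windows, n=2, threshold=1.0):
    """Rank n-grams that occur more often in the period of interest than in
    the reference windows, for statements about a single object."""
    # Object-specific corpus: statements about the same object.
    object_corpus = [s for s in corpus if s.about == obj]

    # Corpus of interest and its n-gram counts.
    interest_counts = count_ngrams(
        [s for s in object_corpus if in_period(s, interest)], n)

    # Reference corpus, counted per window so an average can be taken (assumption).
    window_counts = [
        count_ngrams([s for s in object_corpus if in_period(s, w)], n)
        for w in reference_windows
    ]

    # Selected list: here, the union of both n-gram lists.
    selected = set(interest_counts)
    for counts in window_counts:
        selected |= set(counts)

    scores = {}
    for gram in selected:
        average = sum(c[gram] for c in window_counts) / len(reference_windows)
        difference = interest_counts[gram] - average
        # Assumed normalization: scale by (average + 1) to avoid division by zero.
        scores[gram] = difference / (average + 1.0)

    # Second selected list: n-grams whose normalized difference meets the threshold.
    kept = [(g, s) for g, s in scores.items() if s >= threshold]
    return sorted(kept, key=lambda item: (-item[1], item[0]))
```

Calling `salient_ngrams(corpus, "phone", march, [january, february])` with hypothetical date ranges would return the bigrams that became unusually frequent in March relative to the two earlier months. Other normalizations (for example, dividing by a standard deviation across reference windows) would fit the claim language equally well; the one used here was chosen only for simplicity.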

  • 11 Assignments