Accessing documents using predictive word sequences

  • US 9,069,842 B2
  • Filed: 09/28/2010
  • Issued: 06/30/2015
  • Est. Priority Date: 09/28/2010
  • Status: Active Grant
First Claim

1. A method for accessing documents related to a subject from a document corpus, comprising:

    categorizing documents from the document corpus based on one or more subjects;

    creating a candidate list of word sequences, wherein respective ones of the word sequences comprise one or more elements derived from the document corpus;

    expanding the candidate list by adding one or more new word patterns, wherein each new pattern comprises a gapped sequence created by combining one or more elements derived from the document corpus with one of said word sequences;

    determining a predictive power with respect to the subject for respective ones of entries of the candidate list, wherein the entries comprise said word sequences and said new word patterns;

    pruning from the candidate list ones of said entries with the determined predictive power less than a predetermined threshold, wherein the predictive power comprises a measure of information gain, and wherein the pruning further comprises pruning from the candidate list ones of said entries with a frequency of occurrence less than a predetermined frequency threshold;

    accessing documents from the document corpus based on the pruned candidate list;

    updating the categorization of documents based on the accessing; and

    iteratively performing the expanding, the determining the predictive power, and the pruning, for increasing entry lengths until at least one of the entries is of a predetermined length.
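
The expanding, determining, and pruning steps recited above can be illustrated with a minimal sketch. It is not the patented implementation. It assumes binary subject labels per document, treats a gapped sequence as words matched in order with at most `max_gap` intervening tokens, and grows patterns by appending one vocabulary word at a time. The names `matches`, `information_gain`, `mine_patterns`, `access_documents`, and all parameter names and threshold values are hypothetical. The step of updating the categorization is left out.

```python
import math


def matches(pattern, tokens, max_gap=2):
    """True if the pattern's words occur in order in tokens, with at most
    max_gap intervening tokens between consecutive pattern words (assumed
    interpretation of a gapped sequence)."""
    def search(p_idx, start, limit):
        if p_idx == len(pattern):
            return True
        end = len(tokens) if limit is None else min(len(tokens), start + limit + 1)
        for i in range(start, end):
            if tokens[i] == pattern[p_idx] and search(p_idx + 1, i + 1, max_gap):
                return True
        return False
    # The first word may appear anywhere; later words are gap-limited.
    return search(0, 0, None)


def entropy(pos, neg):
    """Binary entropy (in bits) of a pos/neg split."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(pattern, docs, labels, max_gap=2):
    """Return (information gain of 'pattern present' w.r.t. labels,
    number of documents containing the pattern)."""
    hits = [matches(pattern, d, max_gap) for d in docs]
    n, pos = len(docs), sum(labels)
    with_n = sum(hits)
    with_pos = sum(1 for h, y in zip(hits, labels) if h and y)
    without_n, without_pos = n - with_n, pos - with_pos
    base = entropy(pos, n - pos)
    cond = (with_n / n) * entropy(with_pos, with_n - with_pos) \
         + (without_n / n) * entropy(without_pos, without_n - without_pos)
    return base - cond, with_n


def mine_patterns(docs, labels, min_gain=0.1, min_freq=2, max_len=3, max_gap=2):
    """Iteratively expand and prune candidates for increasing lengths."""
    vocab = sorted({w for d in docs for w in d})

    def prune(candidates):
        kept = []
        for p in candidates:
            gain, freq = information_gain(p, docs, labels, max_gap)
            # Drop entries below either the gain or the frequency threshold.
            if gain >= min_gain and freq >= min_freq:
                kept.append(p)
        return kept

    current = prune([(w,) for w in vocab])
    kept, length = list(current), 1
    while current and length < max_len:
        # Expand: combine each surviving sequence with one more corpus word.
        current = prune([p + (w,) for p in current for w in vocab])
        kept.extend(current)
        length += 1
    return kept


def access_documents(patterns, docs, max_gap=2):
    """Indices of documents matching at least one retained pattern."""
    return [i for i, d in enumerate(docs)
            if any(matches(p, d, max_gap) for p in patterns)]
```

With tokenized documents in `docs` and 0/1 labels in `labels`, calling `mine_patterns(docs, labels)` returns the retained patterns as tuples of words, which can then be passed to `access_documents` to select matching documents. Appending every vocabulary word to every survivor is quadratic in the worst case; the sketch favors readability over efficiency.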
