Method and system for document indexing and data querying

  • US 9,275,128 B2
  • Filed: 07/20/2010
  • Issued: 03/01/2016
  • Est. Priority Date: 07/23/2009
  • Status: Expired due to Fees
First Claim

1. A method for generating a document index, comprising:

  • generating a preset filter character list, wherein generating includes:

    determining monadic partitions from a sample set of documents, wherein monadic partitions comprise character text;

    determining an appearance frequency for each of at least a subset of the monadic partitions among the sample set of documents; and

    including a subset of the monadic partitions into the preset filter character list based at least in part on appearance frequencies corresponding to respective ones of the monadic partitions;

    obtaining a document to be indexed;

    performing a monadic partition operation on the document to obtain a plurality of monadic partitions associated with the document;

    for a first monadic partition in the plurality of monadic partitions associated with the document:

    determining that the first monadic partition is a first filter character monadic partition based at least in part on matching the first monadic partition with the first filter character monadic partition of the preset filter characters list; and

    in response to the determination that the first monadic partition is the first filter character monadic partition:

    not adding a first entry in the document index corresponding to the first filter character monadic partition;

    forming a polynary partition by combining the first filter character monadic partition with at least one other monadic partition in the plurality of monadic partitions associated with the document, wherein the polynary partition comprises a binary partition, wherein the at least one other monadic partition is adjacent to the first filter character monadic partition in the document; and

    adding the first entry in the document index corresponding to the polynary partition; and

    for a second monadic partition in the plurality of monadic partitions associated with the document:

    determining that the second monadic partition is not a second filter character monadic partition based at least in part on not matching the second monadic partition with the second filter character monadic partition of the preset filter characters list; and

    in response to the determination that the second monadic partition is not the second filter character monadic partition, adding a second entry in the document index corresponding to the second monadic partition.
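
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, only one possible reading under stated assumptions: a monadic partition is taken to be a single non-whitespace character, "appearance frequency" is interpreted as the fraction of sample documents containing the character, and a filter character is combined with the partition that follows it (or the one before it, at the end of the document) to form a binary partition. The names `monadic_partitions`, `build_filter_list`, `index_document`, and the `min_doc_fraction` threshold are hypothetical and do not appear in the patent.

```python
from collections import Counter, defaultdict


def monadic_partitions(text):
    """Split text into single-character partitions.

    Assumption: whitespace is skipped; the claim does not specify this.
    """
    return [ch for ch in text if not ch.isspace()]


def build_filter_list(sample_docs, min_doc_fraction=0.75):
    """Build the preset filter character list from a sample set of documents.

    Assumption: a partition is a filter character when it appears in at
    least `min_doc_fraction` of the sample documents.
    """
    doc_freq = Counter()
    for doc in sample_docs:
        # Count each partition at most once per document.
        doc_freq.update(set(monadic_partitions(doc)))
    threshold = min_doc_fraction * len(sample_docs)
    return {part for part, count in doc_freq.items() if count >= threshold}


def index_document(doc_id, text, filter_list, index=None):
    """Add index entries for one document and return the index.

    The index maps a partition (monadic or binary) to the set of
    document ids that contain it.
    """
    index = defaultdict(set) if index is None else index
    parts = monadic_partitions(text)
    for i, part in enumerate(parts):
        if part in filter_list:
            # No entry for the filter character on its own. Combine it
            # with an adjacent partition to form a binary partition.
            if i + 1 < len(parts):
                index[part + parts[i + 1]].add(doc_id)
            elif i > 0:
                index[parts[i - 1] + part].add(doc_id)
            # A document consisting of a single filter character
            # produces no entry in this sketch.
        else:
            # Ordinary partition: index it directly.
            index[part].add(doc_id)
    return index


# Usage: "t" occurs in 3 of 4 sample documents, so it becomes a filter character.
filter_list = build_filter_list(["ta", "tb", "tc", "xd"])
index = index_document("doc1", "atb", filter_list)
```

In this sketch the document "atb" yields entries for "a", "tb", and "b", while the high-frequency character "t" gets no entry of its own, which is the index-size reduction the claim is aiming at.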
