
Two-Pass Hash Extraction of Text Strings

  • US 20090089048A1
  • Filed: 09/28/2007
  • Published: 04/02/2009
  • Est. Priority Date: 09/28/2007
  • Status: Active Grant
First Claim

1. A method for recognizing text, the method comprising:

  • generating a plurality of generated terms used in a text string;
  • calculating a plurality of hash values from the plurality of generated terms;
  • creating a plurality of hash buckets respectively corresponding to the plurality of hash values;
  • maintaining a plurality of occurrence count values respectively corresponding to the plurality of hash buckets, each of the plurality of occurrence count values respectively indicating a number of times ones of the plurality of generated terms occur in the text string having a hash value that respectively correspond to the plurality of occurrence count values' respective hash bucket;
  • discarding ones of the plurality of hash buckets having respective occurrence count values less than a first predetermined value; and
  • adding dictionary terms to a dictionary, the dictionary terms comprising ones of the plurality of generated terms having respective hash values corresponding to any of the plurality of hash values respectively corresponding to the remaining plurality of hash buckets, the dictionary including a plurality of frequency count values respectively indicating the number of times each of the dictionary terms occurred in the text string.
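Read as an algorithm, the claim is a two-pass filter. The first pass counts hashed terms per bucket, and buckets whose count falls below a threshold are discarded. The second pass then builds exact per-term frequency counts, but only for terms that hash into a surviving bucket. The following minimal Python sketch illustrates that reading under assumptions the claim does not fix. Here, "terms" are taken to be lowercase word n-grams, and the hash is CRC-32 reduced modulo a hypothetical `num_buckets`. The parameter `min_count` stands in for the "first predetermined value". All function and parameter names are hypothetical, and this is not the applicant's implementation.

```python
from __future__ import annotations

import re
import zlib
from collections import Counter


def generate_terms(text: str, max_n: int = 2) -> list[str]:
    """Hypothetical term generator: every word n-gram of length 1..max_n."""
    words = re.findall(r"\w+", text.lower())
    terms = []
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            terms.append(" ".join(words[i:i + n]))
    return terms


def bucket_of(term: str, num_buckets: int) -> int:
    """Map a term to a hash bucket (CRC-32 is an assumed, deterministic hash)."""
    return zlib.crc32(term.encode("utf-8")) % num_buckets


def two_pass_extract(text: str, min_count: int = 2,
                     num_buckets: int = 1024, max_n: int = 2) -> dict[str, int]:
    """Sketch of two-pass hash extraction; returns term -> exact frequency."""
    terms = generate_terms(text, max_n)

    # Pass 1: keep one occurrence counter per hash bucket, not per distinct term.
    bucket_counts = Counter(bucket_of(t, num_buckets) for t in terms)

    # Discard buckets whose occurrence count is below the threshold.
    surviving = {b for b, c in bucket_counts.items() if c >= min_count}

    # Pass 2: only terms that hash into a surviving bucket enter the
    # dictionary, which records an exact frequency count for each term.
    dictionary = Counter()
    for t in terms:
        if bucket_of(t, num_buckets) in surviving:
            dictionary[t] += 1
    return dict(dictionary)
```

A bucket's count is the sum of the counts of every term hashed into it, so it is never smaller than the count of any single term in that bucket. As a result, no term occurring at least `min_count` times is discarded. Collisions can let a rarer term through, which is why the dictionary stores exact per-term frequencies. In this sketch the first pass holds at most `num_buckets` counters, regardless of how many distinct terms the text contains.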

  • 2 Assignments