Providing capitalization correction for unstructured excerpts

  • US 7,451,398 B1
  • Filed: 11/18/2003
  • Issued: 11/11/2008
  • Est. Priority Date: 11/18/2003
  • Status: Active Grant
First Claim

1. A computer system for building a lexicon for use in capitalization correction for unstructured excerpts, comprising:

    at least one processing unit;

    at least one storage device being coupled with the at least one processing unit and storing program code;

    a ripper adapted to assemble a list of word sets from unstructured content, at least one of the word sets comprising a word and at least two non-standard capitalization variations for the word; and

    an aggregator adapted to aggregate at least one of the word sets, the aggregator including an analyzer adapted to identify non-standard capitalization variations based on at least one criteria; and

    a non-standard capitalization selector adapted to select at least one of the identified non-standard capitalization variations within one of the at least one word sets, and adding the selected at least one of the identified non-standard capitalization variations to the lexicon, wherein the lexicon includes records, each record including a word, wherein the lexicon is indexed by the words included in the records, and wherein at least one of the records includes more than one non-standard capitalization variation.
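
The components named in claim 1 can be illustrated with a minimal Python sketch. This is not the patented implementation. The function names (`rip`, `analyze`, `select`, `build_lexicon`), the definition of "standard" capitalization as lower, Title, or UPPER case, and the frequency threshold `min_count` are all assumptions made for illustration, since the claim only requires "at least one criteria".

```python
import re
from collections import defaultdict


def standard_forms(word):
    """Forms treated as standard in this sketch (an assumption): lower, Title, UPPER."""
    return {word.lower(), word.capitalize(), word.upper()}


def rip(texts):
    """Ripper: group capitalization variants found in raw text by lowercased word."""
    word_sets = defaultdict(lambda: defaultdict(int))
    for text in texts:
        for token in re.findall(r"[A-Za-z]+", text):
            word_sets[token.lower()][token] += 1
    return word_sets


def analyze(word, variants):
    """Analyzer: identify the variants that are not a standard form of the word."""
    std = standard_forms(word)
    return {v: count for v, count in variants.items() if v not in std}


def select(candidates, min_count=2):
    """Selector: keep variants seen at least min_count times, most frequent first."""
    kept = [v for v, count in candidates.items() if count >= min_count]
    return sorted(kept, key=lambda v: (-candidates[v], v))


def build_lexicon(texts, min_count=2):
    """Aggregator: build a lexicon indexed by word; a record may hold several variants."""
    lexicon = {}
    for word, variants in rip(texts).items():
        selected = select(analyze(word, variants), min_count)
        if selected:
            lexicon[word] = selected
    return lexicon


if __name__ == "__main__":
    sample = [
        "I sold my iPod on eBay.",
        "eBay and EBay listings for the iPod",
        "ebay EBay IPod",
    ]
    print(build_lexicon(sample))
```

In this sketch the lexicon is a dictionary keyed by the lowercased word, which mirrors the claim's requirement that the lexicon be indexed by the words in its records, and a single record can carry more than one non-standard variation.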
