Method and system for document classification or search using discrete words

  • US 8,838,614 B2
  • Filed: 11/20/2012
  • Issued: 09/16/2014
  • Est. Priority Date: 11/05/2010
  • Status: Active Grant
First Claim

1. A method of operating a computerized document classification system, comprising the steps of:

  • generating at least one classification important word for distinguishing between two or more document classifications, wherein each classification important word has a classification importance value;

    selecting a reference document having information content comprising a plurality of words related to at least one of the two or more document classifications;

    automatically detecting at least one classification important word from the plurality of words within the reference document, wherein the at least one classification important word from the reference document has been processed using at least two dictionary functions selected from a group of dictionary functions consisting of:

    Derived Words;

    Acronym;

    Word Capitalization; and

    Hyphenation;

    generating a word score value for the at least one detected classification important word from the reference document using a WordRatio, which comprises a normalization factor for the detected classification important word that is based upon the relative occurrences of a predetermined plurality of base words, and at least one value selected from a group of values consisting of:

    a value defined for the at least one detected classification important word related to a document section that occurs in the reference document;

    a classification importance value for the at least one detected classification important word;

    a value defined for the at least one detected classification important word in a document type that applies to the document;

    a value defined for the at least one detected classification important word across multiple document classifications; and

    a value based on the statistical occurrence of the at least one detected classification important word in at least two different documents comparing the number of occurrences of the at least one classification important word in a first document having a first classification and a second document having a second classification; and

    generating a classification score for the reference document that is related to the word score value for the at least one detected classification important word.
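
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim gives no formulas, so the sketch assumes that WordRatio is the count of a detected word divided by the total count of the base words in the same document, that the word score is WordRatio multiplied by the classification importance value, and that the classification score is the sum of word scores per classification. The word lists, the importance values, and every function name (`normalize`, `word_scores`, `classification_scores`) are hypothetical.

```python
import re
from collections import Counter

# Hypothetical data: the claim names no concrete words or values.
BASE_WORDS = {"the", "of", "and", "a", "to"}
IMPORTANT_WORDS = {
    # word -> (classification, classification importance value)
    "invoice": ("finance", 2.0),
    "email": ("correspondence", 1.5),
}
DERIVED_WORDS = {"invoices": "invoice", "invoiced": "invoice", "emails": "email"}


def normalize(token):
    """Apply three of the four named dictionary functions (simplified)."""
    word = token.lower()                  # Word Capitalization: fold case
    word = word.replace("-", "")          # Hyphenation: "e-mail" -> "email"
    return DERIVED_WORDS.get(word, word)  # Derived Words: map to a root form


def tokenize(text):
    """Split text into words (keeping internal hyphens) and normalize each."""
    return [normalize(t) for t in re.findall(r"[A-Za-z]+(?:-[A-Za-z]+)*", text)]


def word_scores(text):
    """Score each detected important word (assumed formula, see lead-in)."""
    counts = Counter(tokenize(text))
    base_total = sum(counts[w] for w in BASE_WORDS)
    if base_total == 0:
        return {}  # no base words, so no normalization factor is available
    scores = {}
    for word, (_label, importance) in IMPORTANT_WORDS.items():
        if counts[word]:
            word_ratio = counts[word] / base_total  # assumed WordRatio
            scores[word] = word_ratio * importance
    return scores


def classification_scores(text):
    """Sum word scores per classification (assumed aggregation)."""
    totals = {}
    for word, score in word_scores(text).items():
        label = IMPORTANT_WORDS[word][0]
        totals[label] = totals.get(label, 0.0) + score
    return totals


if __name__ == "__main__":
    doc = "The Invoices of the client and an E-mail to the bank"
    print(classification_scores(doc))
```

Dividing by the base-word count is one way to make scores comparable across documents of different lengths, which appears to be the purpose of the normalization factor in the claim. The other values in the claimed group (document section, document type, cross-classification, and statistical occurrence values) could be applied as additional multipliers in `word_scores`.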
