
Method and system for document classification based on document structure and written style

  • US 20090300046A1
  • Filed: 05/29/2008
  • Published: 12/03/2009
  • Est. Priority Date: 05/29/2008
  • Status: Active Grant
First Claim

1. A method of determining the classification of a document for the purpose of searching, comprising:

    a) receiving the textual content of said document in the form of linguistic sentences and alphabetical words;

    b) receiving meta-data on images of said document including each image size, title or description;

    c) categorizing said linguistic sentences into subjective and non-subjective sentences;

    d) categorizing said alphabetical words into complex and non-complex words;

    e) categorizing said images into descriptive and non-descriptive images;

    f) calculating the document subjectivity classification as the count of said subjective sentences or the ratio of subjective sentences to non-subjective sentences or total sentences in said document;

    g) calculating the document complexity classification as the count of complex alphabetical words or the ratio of complex alphabetical words to non-complex words or total words; and

    h) calculating the document descriptive-images classification as the count of descriptive-images.
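
The calculations in steps (f) through (h) can be illustrated with a minimal sketch. The claim does not say how a sentence is judged subjective, a word complex, or an image descriptive, so the three predicates below (`is_subjective`, `is_complex`, `is_descriptive`), the cue-word list, and the length and pixel thresholds are hypothetical stand-ins chosen only for illustration. Only the counting and ratio arithmetic follows the claim language, and the ratios use the "total sentences" and "total words" denominators, which are one of the alternatives the claim allows.

```python
import re
from dataclasses import dataclass

# Hypothetical cue words; the claim does not define how subjectivity is detected.
SUBJECTIVE_CUES = {"i", "we", "think", "believe", "feel", "best", "worst", "love", "hate"}


@dataclass
class ImageMeta:
    """Image meta-data as in step (b): size plus title or description."""
    width: int
    height: int
    title: str = ""
    description: str = ""


def is_subjective(sentence: str) -> bool:
    # Assumption: a sentence is subjective if it contains any cue word.
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return bool(words & SUBJECTIVE_CUES)


def is_complex(word: str, min_len: int = 9) -> bool:
    # Assumption: a word is complex if it has at least min_len letters.
    return len(word) >= min_len


def is_descriptive(img: ImageMeta, min_pixels: int = 10_000) -> bool:
    # Assumption: a descriptive image is reasonably large and carries
    # a title or a description.
    has_text = bool(img.title or img.description)
    return img.width * img.height >= min_pixels and has_text


def classify_document(text: str, images: list[ImageMeta]) -> dict:
    # Step (a): split the text into sentences and alphabetical words.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)

    # Steps (c)-(e): categorize sentences, words, and images.
    subjective = sum(is_subjective(s) for s in sentences)
    complex_words = sum(is_complex(w) for w in words)
    descriptive = sum(is_descriptive(i) for i in images)

    # Steps (f)-(h): counts, and ratios against the totals.
    return {
        "subjective_count": subjective,
        "subjectivity_ratio": subjective / len(sentences) if sentences else 0.0,
        "complex_count": complex_words,
        "complexity_ratio": complex_words / len(words) if words else 0.0,
        "descriptive_images": descriptive,
    }


sample_text = (
    "I think this camera is great. "
    "The sensor measures 24 megapixels. "
    "Photographers appreciate reliability."
)
sample_images = [
    ImageMeta(640, 480, title="Front view"),
    ImageMeta(16, 16),
    ImageMeta(800, 600),
]
result = classify_document(sample_text, sample_images)
```

A search system could then index `result` alongside the document and filter or rank on these values. Swapping in a trained subjectivity classifier or a readability-based word measure would change only the three predicates, not the counting logic.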
