
High accuracy document information-element vector encoding server

  • US 7,725,466 B2
  • Filed: 10/23/2007
  • Issued: 05/25/2010
  • Est. Priority Date: 10/24/2006
  • Status: Active Grant
First Claim

1. A computer-implemented method comprising:

  • applying finite state automaton (FSA) to parse a document to identify one or more information elements (IEs) in the document;

  • deriving a unique symbolic sequence particular to the document based on the one or more IEs contained in the document, such unique symbolic sequence being analogous to the DeoxyriboNucleic Acid (DNA) sequence in animals and/or plants;

    wherein deriving the unique symbolic sequence particular to the document comprises:

      • if an IE of the one or more IEs includes a section of free text, determining a term frequency inverted document frequency (tfidf) of each of a plurality of words in the section of free text; and

      • using the tfidf to generate a portion of the DNA sequence; and

  • applying reduced concept space (RCS) to the one or more IEs, wherein the RCS includes polysemic analysis and synomemic analysis.
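
The tfidf step of the claim can be illustrated with a minimal sketch. The claim does not say how tfidf values are mapped to a portion of the sequence, so everything below is an assumption made for illustration: the function names `tfidf_scores` and `sequence_portion`, the whitespace tokenizer, the smoothed idf variant, and the choice to join the top-scoring words into a symbol string are all hypothetical and are not taken from the patent.

```python
import math
from collections import Counter


def tfidf_scores(section, corpus):
    """Return a tf-idf score for each distinct word in `section`.

    `corpus` is a list of reference documents (plain strings).
    Assumption: whitespace tokenization and a smoothed idf,
    idf = ln((1 + N) / (1 + df)) + 1, chosen only for illustration.
    """
    words = section.lower().split()
    term_counts = Counter(words)
    n_docs = len(corpus)
    doc_word_sets = [set(doc.lower().split()) for doc in corpus]

    scores = {}
    for word, count in term_counts.items():
        # Document frequency: number of corpus documents containing the word.
        df = sum(1 for doc_words in doc_word_sets if word in doc_words)
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        scores[word] = (count / len(words)) * idf
    return scores


def sequence_portion(section, corpus, k=3):
    """Hypothetical mapping from tf-idf scores to a symbolic string.

    Takes the k highest-scoring words (ties broken alphabetically)
    and joins them in upper case with '-'.
    """
    scores = tfidf_scores(section, corpus)
    top = sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]
    return "-".join(word.upper() for word, _ in top)


corpus = ["the cat sat", "the dog barked", "the invoice total due"]
section = "the invoice total invoice"
print(sequence_portion(section, corpus, k=2))  # prints INVOICE-TOTAL
```

In this sketch a word that appears in every corpus document, such as "the", gets the lowest idf and drops out of the generated portion, while words that are frequent in the free-text section but rare in the corpus dominate it.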
