
System and method for comparative analysis of textual documents

  • US 8,868,405 B2
  • Filed: 01/27/2004
  • Issued: 10/21/2014
  • Est. Priority Date: 01/27/2004
  • Status: Active Grant
First Claim

1. A computer-implemented method of comparing the semantic content of two or more documents, comprising:

  • accessing a plurality of documents;

    performing a linguistic analysis on each document;

    defining a semantic vector for each document based on the linguistic analysis, said semantic vector having multiple components, wherein each component of said semantic vector has at least:

    a weighting factor relating to an importance, based on characteristics of the document, of said term; and

    a frequency value relating to a number of occurrences of said term;

    processing the semantic vector by a digital computer; and

    comparing a semantic vector of an identified document to the semantic vector for each document in the plurality of documents to determine at least one document semantically similar to the identified document, and wherein the comparing of the semantic vectors includes using a defined metric, wherein said defined metric is related to:

    Sqrt(f1^2 + f2^2 + f3^2 + f4^2 + ... + f(N−1)^2 + fN^2) * 100/n^2, wherein f is a difference in frequency of a common term between documents and n is the number of terms those documents have in common if the component has a weighting factor;

    or Sqrt(sum((w−Delta)^2 * w−Avg)) / (Log(n)^3 * 1000), wherein w−Delta is the difference in weight between two common terms, w−Avg is the average weight between two common terms, and n is the number of common terms if the component has a frequency value.
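
The two metrics recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `Component`, `frequency_metric`, `weight_metric`, and `most_similar` are hypothetical, and the dictionary representation of a semantic vector is an assumption. The sketch also assumes that the first metric's scaling factor reads 100/n^2, that Log is the natural logarithm, that weights are non-negative, and that a lower metric value means greater similarity; the claim text does not settle any of these points.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One semantic-vector component for a term (hypothetical layout)."""
    weight: float    # weighting factor: importance of the term in the document
    frequency: int   # number of occurrences of the term


# Assumed representation: a semantic vector maps each term to its component.
SemanticVector = dict[str, Component]


def frequency_metric(a: SemanticVector, b: SemanticVector) -> float:
    """Sqrt(f1^2 + ... + fN^2) * 100/n^2 over the n terms common to a and b."""
    common = a.keys() & b.keys()
    n = len(common)
    if n == 0:
        return math.inf  # assumption: no common terms means "not comparable"
    squares = sum((a[t].frequency - b[t].frequency) ** 2 for t in common)
    return math.sqrt(squares) * 100 / n ** 2


def weight_metric(a: SemanticVector, b: SemanticVector) -> float:
    """Sqrt(sum((w-Delta)^2 * w-Avg)) / (Log(n)^3 * 1000) over common terms."""
    common = a.keys() & b.keys()
    n = len(common)
    if n < 2:
        return math.inf  # log(1) == 0 would divide by zero
    total = sum(
        (a[t].weight - b[t].weight) ** 2 * (a[t].weight + b[t].weight) / 2
        for t in common
    )
    return math.sqrt(total) / (math.log(n) ** 3 * 1000)


def most_similar(identified: SemanticVector,
                 corpus: dict[str, SemanticVector],
                 metric=frequency_metric) -> str:
    """Return the name of the corpus document with the smallest metric value."""
    return min(corpus, key=lambda name: metric(identified, corpus[name]))


# Usage: compare one identified document against a two-document corpus.
query = {"engine": Component(0.5, 3), "valve": Component(0.2, 1)}
corpus = {
    "doc_a": {"engine": Component(0.5, 3), "valve": Component(0.2, 4)},
    "doc_b": {"engine": Component(0.3, 3), "valve": Component(0.2, 1)},
}
best_by_frequency = most_similar(query, corpus, frequency_metric)
```

Both functions restrict the comparison to terms the two documents share, which follows the claim's definition of n as the number of common terms.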

  • 7 Assignments