System and method for handling the confounding effect of document length on vector-based similarity scores

  • US 9,311,390 B2
  • Filed: 01/29/2009
  • Issued: 04/12/2016
  • Est. Priority Date: 01/29/2008
  • Status: Active Grant
First Claim

1. A computer-implemented method of generating vector-based similarity scores in text document comparisons considering document length, comprising:

    computing a mean of a number of word types of two text documents to be compared;

    determining a similarity score with a vector-based similarity model, wherein the vector-based similarity model is a Random Indexing model and wherein a normalization slope parameter has a value of 10;

    performing pivoted document length normalization on the similarity score using the mean of the number of word types of the two text documents as a normalization affected by both text documents and using the normalization slope parameter; and

    outputting a normalized similarity score.
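
The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and several parts are assumptions: the claim does not state the normalization formula, so the sketch uses the commonly cited pivoted form, score / ((1 - slope) * pivot + slope * norm); the `pivot` value is hypothetical, since the claim names only the slope (10); tokenization is a plain whitespace split; and the Random Indexing model is reduced to summing sparse ternary index vectors per word, whereas a full model accumulates context vectors. All function names and the constants `DIM` and `NONZERO` are illustrative.

```python
import math
import random
import zlib

DIM = 512      # assumed dimensionality of the random index vectors
NONZERO = 8    # assumed number of +1/-1 entries per index vector


def index_vector(word):
    """Sparse ternary index vector, seeded by the word so it is repeatable."""
    rng = random.Random(zlib.crc32(word.encode("utf-8")))
    vec = [0] * DIM
    for i, pos in enumerate(rng.sample(range(DIM), NONZERO)):
        vec[pos] = 1 if i % 2 == 0 else -1
    return vec


def document_vector(tokens):
    """Simplified document vector: the sum of its tokens' index vectors."""
    total = [0] * DIM
    for tok in tokens:
        for i, v in enumerate(index_vector(tok)):
            total[i] += v
    return total


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def mean_word_types(tokens_a, tokens_b):
    """Mean number of distinct word types across the two documents."""
    return (len(set(tokens_a)) + len(set(tokens_b))) / 2


def pivoted_normalize(score, norm, slope, pivot):
    """Assumed pivoted form: score / ((1 - slope) * pivot + slope * norm)."""
    denom = (1.0 - slope) * pivot + slope * norm
    if denom <= 0:
        # With a slope above 1, short documents can push the factor below zero.
        raise ValueError("normalization factor is not positive")
    return score / denom


def normalized_similarity(text_a, text_b, slope=10.0, pivot=5.0):
    """Raw similarity, then pivoted normalization by mean word types.

    slope=10 follows the claim; pivot=5.0 is a hypothetical placeholder.
    """
    tokens_a = text_a.lower().split()   # assumed tokenization
    tokens_b = text_b.lower().split()
    raw = cosine(document_vector(tokens_a), document_vector(tokens_b))
    norm = mean_word_types(tokens_a, tokens_b)
    return pivoted_normalize(raw, norm, slope, pivot)
```

Under the assumed formula, the normalization factor equals the pivot whenever the mean number of word types equals the pivot, regardless of slope. With a slope of 10, the factor becomes non-positive once the mean falls to 90% of the pivot or below, which is why the sketch raises an error in that case rather than returning a score.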
