System, method, and apparatus for pairing a short document to another short document from a plurality of short documents

  • US 10,083,229 B2
  • Filed: 10/09/2009
  • Issued: 09/25/2018
  • Est. Priority Date: 10/09/2009
  • Status: Active Grant
First Claim

1. A computer-implemented method for pairing a new document to a document from a plurality of documents in a document repository, comprising:

  • for each of the new document and the plurality of documents in the document repository, generating a vector uniquely associated with a document of the new document and the plurality of documents, wherein;

    the vector comprises a number of elements equal to a number of terms of interest; and

    for each term of interest, an associated element value of the vector is assigned as zero if the term of interest does not occur in the document and one if the term does occur in the document;

    for each document from the plurality of documents, determining a similarity between the vector for the new document and the vector for the document from the plurality of documents comprising calculating a cosine measurement of similarity between the vector for the new document and the vector for the document from the plurality of documents;

    if it is determined that the similarity between the vector for the new document and the vector for a document from the plurality of documents is greater than or equal to a threshold value then;

    selecting the document from the plurality of documents;

    generating a merged document by merging the new document with the document from the plurality of documents in response to the document from the plurality of documents being selected, wherein the merging comprises combining at least a portion of the new document with at least a portion of the selected document into the merged document;

    removing the selected document from the document repository and adding the merged document to the document repository; and

    generating a new vector for the merged document; and

    if it is determined that the similarity is less than the threshold value then adding the new document to the document repository without merging the new document.
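
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patentee's implementation; it assumes whitespace tokenization, that the best-scoring repository document is the one selected when several meet the threshold, and that merging is plain concatenation. The names `binary_vector`, `cosine`, and `pair_document` are hypothetical and do not appear in the patent.

```python
import math


def binary_vector(text, terms):
    """Return one element per term of interest: 1 if the term occurs, else 0."""
    # Assumption: terms are single words and documents are split on whitespace.
    words = set(text.lower().split())
    return [1 if term in words else 0 for term in terms]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def pair_document(new_doc, repository, terms, threshold):
    """Merge new_doc into its closest match, or add it to the repository unmerged.

    Returns the document that was stored and its vector.
    """
    new_vec = binary_vector(new_doc, terms)

    # Compare the new document against every document in the repository.
    best_idx, best_sim = None, -1.0
    for i, doc in enumerate(repository):
        sim = cosine(new_vec, binary_vector(doc, terms))
        if sim > best_sim:
            best_idx, best_sim = i, sim

    if best_idx is not None and best_sim >= threshold:
        # Remove the selected document and store the merged document instead.
        selected = repository.pop(best_idx)
        merged = selected + " " + new_doc  # assumption: merge by concatenation
        repository.append(merged)
        # Generate a new vector for the merged document.
        return merged, binary_vector(merged, terms)

    # Similarity below the threshold: add the new document without merging.
    repository.append(new_doc)
    return new_doc, new_vec


terms = ["printer", "jam", "paper", "login"]
repo = ["printer jam again", "login fails"]
stored, vec = pair_document("paper jam in printer", repo, terms, threshold=0.5)
```

In this usage example the new document shares two of its three matching terms with the first repository entry, so the cosine similarity (about 0.82) meets the 0.5 threshold and the two are merged. A document containing none of the terms of interest would score 0.0 against every entry and be added unmerged.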
