
Automated determination of document utility for a document corpus

  • US 10,372,714 B2
  • Filed: 02/05/2016
  • Issued: 08/06/2019
  • Est. Priority Date: 02/05/2016
  • Status: Active Grant
First Claim

1. A computer implemented method for determining whether to add a document to an electronic document corpus maintained in at least one electronic data store, the method comprising:

    creating, based at least in part on the content of the document corpus, a corpus vector including corpus cells, each corpus cell indicating a word in the document corpus and a corpus word count indicating a number of instances of the word in the document corpus;

    receiving, via one or more processors, an electronic candidate document;

    creating, based at least in part on the content of the candidate document, a document vector including document cells, each document cell indicating a word in the candidate document and a document word count indicating a number of instances of the word in the candidate document;

    determining, by at least one of the processors, a relevance value indicating relevance of the candidate document to the document corpus, wherein determining the relevance value includes:

        for each corpus cell in the corpus vector, determining a product by multiplying the corpus word count by the document word count of a corresponding document cell in the document vector;

        determining a numerator by summing the products;

        determining a first sum by summing a square of the corpus word count in each corpus cell in the corpus vector;

        determining a second sum by summing a square of the document word count in each document cell in the document vector;

        determining a first square root of the first sum and a second square root of the second sum;

        determining a denominator by multiplying the first square root by the second square root; and

        dividing the numerator by the denominator;

    determining, by the one or more processors, that the candidate document is novel with respect to the document corpus based on the relevance value; and

    in response to determining that the candidate document is relevant to the document corpus and novel with respect to the document corpus, adding the candidate document to the document corpus to make at least a portion of the content of the candidate document available for a response to a search query.
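
Restated as a single formula, the relevance value computed in the steps above is the cosine similarity between the corpus and document word-count vectors. The symbols are introduced here for illustration only: C is the set of words in the corpus vector, D the set of words in the document vector, c_w the corpus word count and d_w the document word count of word w, with d_w taken as 0 when w has no corresponding document cell (an assumption about how a missing cell is treated).

$$
r = \frac{\sum_{w \in C} c_w \, d_w}{\sqrt{\sum_{w \in C} c_w^{2}} \; \sqrt{\sum_{w \in D} d_w^{2}}}
$$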

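A minimal Python sketch of the claimed steps follows, assuming a simple tokenizer (lowercased runs of letters, digits and apostrophes). The function names `word_counts`, `relevance` and `maybe_add` and the thresholds `RELEVANCE_MIN` and `NOVELTY_MAX` are hypothetical. The claim does not say how novelty is derived from the relevance value, so this sketch assumes a candidate is novel when its relevance stays below a near-duplicate cutoff.

```python
from collections import Counter
import math
import re

# Hypothetical thresholds; the claim does not specify numeric values.
RELEVANCE_MIN = 0.2   # below this, the candidate is treated as not relevant
NOVELTY_MAX = 0.95    # above this, the candidate is treated as a near-duplicate (not novel)


def word_counts(text: str) -> Counter:
    """Build a word-count vector: each key is a word, each value its number of instances."""
    # Hypothetical tokenizer: lowercase runs of letters, digits and apostrophes.
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def relevance(corpus_vec: Counter, doc_vec: Counter) -> float:
    """Relevance value as described in claim 1 (cosine similarity of the two vectors)."""
    # One product per corpus cell; a word missing from the document contributes 0.
    numerator = sum(count * doc_vec.get(word, 0) for word, count in corpus_vec.items())
    first_sum = sum(c * c for c in corpus_vec.values())   # squared corpus word counts
    second_sum = sum(d * d for d in doc_vec.values())     # squared document word counts
    denominator = math.sqrt(first_sum) * math.sqrt(second_sum)
    # Guard against an empty corpus or document (the claim does not address this case).
    return numerator / denominator if denominator else 0.0


def maybe_add(corpus_docs: list[str], candidate: str) -> bool:
    """Append candidate to corpus_docs if it is relevant and (by assumption) novel."""
    corpus_vec = Counter()
    for doc in corpus_docs:
        corpus_vec.update(word_counts(doc))  # aggregate counts over the whole corpus
    r = relevance(corpus_vec, word_counts(candidate))
    relevant = r >= RELEVANCE_MIN
    novel = r <= NOVELTY_MAX  # assumption: novelty inferred from the same relevance value
    if relevant and novel:
        corpus_docs.append(candidate)  # content becomes available for search
        return True
    return False
```

With these hypothetical thresholds, a candidate that shares some vocabulary with the corpus is added, while one with no shared words (relevance 0) or one whose word counts nearly match the corpus (relevance near 1) is rejected. Because the comparison is against a single aggregate corpus vector, a near-copy of one document in a large corpus would not necessarily score near 1.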