
Identifying and processing a number of features identified in a document to determine a type of the document

  • US 9,516,089 B1
  • Filed: 12/16/2013
  • Issued: 12/06/2016
  • Est. Priority Date: 09/06/2012
  • Status: Active Grant
First Claim

1. A method, comprising:

  • receiving, by at least one server communicatively coupled to a network, an input document;

    identifying, by the at least one server, a plurality of features in the input document, the plurality of features including sequences of text extracted from the input document;

    generating, by the at least one server, a feature vector of the input document based upon the sequences of text;

    identifying, by the at least one server, a plurality of signature vectors based upon an input training dataset and at least one cross-type frequency vector;

    comparing, by the at least one server, the feature vector of the input document to each of a plurality of signature vectors to determine a primary type of the input document, wherein comparing the feature vector of the input document to each of the plurality of signature vectors to determine the primary type of the input document includes identifying a signature vector that maximizes the expression V·(Ct/D), where V is the feature vector, Ct is a signature vector t in the plurality of signature vectors, and D is the at least one cross-type frequency vector; and

    storing, by the at least one server, the primary type of the input document into a storage system in communication with the at least one server.
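
The comparison step of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. Several points are assumptions made only for illustration: word bigrams stand in for the "sequences of text", vectors are stored as sparse dictionaries, the division Ct/D is read as element-wise, and the function names (ngram_features, score, primary_type) are hypothetical. The claim does not say how the signature vectors or D are built, so the sketch takes them as given inputs.

```python
from collections import Counter


def ngram_features(text, n=2):
    """Count word n-grams in the text.

    Assumption: word n-grams stand in for the claim's "sequences of text".
    """
    words = text.lower().split()
    return Counter(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))


def score(v, c_t, d):
    """Evaluate V . (Ct / D) over sparse dict vectors.

    Assumption: Ct / D is element-wise division; features that are missing
    from D (or have a zero entry there) contribute nothing to the sum.
    """
    return sum(
        count * c_t.get(feature, 0.0) / d[feature]
        for feature, count in v.items()
        if d.get(feature, 0.0) > 0
    )


def primary_type(v, signatures, d):
    """Return the type t whose signature vector Ct maximizes V . (Ct / D)."""
    return max(signatures, key=lambda t: score(v, signatures[t], d))


# Hypothetical signature vectors and cross-type frequency vector.
signatures = {
    "invoice": {"amount due": 5.0, "thank you": 1.0},
    "letter": {"dear sir": 4.0, "thank you": 3.0},
}
d = {"amount due": 5.0, "thank you": 4.0, "dear sir": 4.0}

v = ngram_features("Thank you the amount due is 40")
result = primary_type(v, signatures, d)
```

With these made-up values, the bigram "thank you" appears in both types and is discounted by its larger entry in D, while "amount due" appears only in the invoice signature, so result is "invoice". Dividing by D in this way reduces the weight of features that are common across document types.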

  • 3 Assignments