Electronic document classification using composite hyperspace distances

  • US 8,051,139 B1
  • Filed: 04/27/2011
  • Issued: 11/01/2011
  • Est. Priority Date: 09/28/2006
  • Status: Active Grant
First Claim

1. A non-transitory computer-readable medium encoding instructions which, when executed by a computer system, cause the computer system to:

  • parse an electronic text document to generate a document vector for the electronic text document, wherein the document vector includes a feature count component and a feature position component, wherein the feature count component includes a plurality of feature count indicators for the electronic text document, wherein the feature position component includes a data structure selected from a group consisting of an ordered list and a tree of document substructure indicators, each document substructure indicator denoting a type of substructure in the electronic text document, and wherein a position of said each document substructure indicator in the data structure characterizes a position of a corresponding substructure in the electronic text document;

  • determine a plurality of composite hyperspace distances between the document vector and a plurality of reference vectors, each composite hyperspace distance being defined between the document vector and a reference vector of the plurality of reference vectors, wherein each composite hyperspace distance is a function of a Euclidean-space distance dependent on the feature count component of the document vector and of an edit distance dependent on the feature position component of the document vector; and

  • classify the electronic text document according to at least one of the plurality of composite hyperspace distances.
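
The three claimed steps can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim only says the composite distance is "a function of" a Euclidean distance and an edit distance, so the weighted sum used here, the weights `alpha` and `beta`, the nearest-reference rule, and all names (`DocVector`, `composite_distance`, `classify`, the class labels, the sample feature values) are assumptions chosen for illustration. The feature position component is modeled as an ordered list of substructure types, and the edit distance is a plain Levenshtein distance over that list.

```python
import math
from dataclasses import dataclass


@dataclass
class DocVector:
    counts: list      # feature count component (one count per feature)
    structure: list   # feature position component: ordered substructure types


def euclidean(a, b):
    """Euclidean distance between two equal-length count vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def edit_distance(s, t):
    """Levenshtein distance between two sequences of substructure indicators."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        curr = [i]
        for j, b in enumerate(t, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (a != b)))  # substitution
        prev = curr
    return prev[-1]


def composite_distance(doc, ref, alpha=1.0, beta=1.0):
    """Assumed combination: weighted sum of the two component distances."""
    return (alpha * euclidean(doc.counts, ref.counts)
            + beta * edit_distance(doc.structure, ref.structure))


def classify(doc, references):
    """Assumed rule: pick the label of the closest reference vector."""
    return min(references,
               key=lambda label: composite_distance(doc, references[label]))


# Hypothetical reference vectors and document, for illustration only.
references = {
    "class_a": DocVector([3, 0, 1], ["header", "link", "image"]),
    "class_b": DocVector([0, 2, 0], ["header", "paragraph", "paragraph"]),
}
doc = DocVector([2, 0, 1], ["header", "link", "paragraph"])
label = classify(doc, references)
```

With these sample values the document is at composite distance 2.0 from `class_a` (Euclidean 1.0 plus one substitution) and 4.0 from `class_b` (Euclidean 3.0 plus one substitution), so `label` is `"class_a"`. A tree-valued position component, which the claim also allows, would need a tree edit distance instead of the list version shown here.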
