Document characterization using a tensor space model

  • US 7,529,719 B2
  • Filed: 03/17/2006
  • Issued: 05/05/2009
  • Est. Priority Date: 03/17/2006
  • Status: Expired due to Fees
First Claim

1. A computer-readable medium having computer-executable instructions for controlling a processor of a computer system to categorize a document by a method comprising:

  • for each of a plurality of categories, providing documents within that category, each document having words with characters;

    for each document, generating a high-order tensor having an order of at least three, each order represented by a coordinate with characters as dimensions of the coordinate, each element of the high-order tensor representing a sequence of at least three characters and being set to a weight based on number of occurrences of that sequence of at least three characters within the document, the weight being based on term frequency by inverse document frequency; and

    generating a core tensor by reducing dimensionality of the generated high-order tensor using high-order singular value decomposition;

    training a support vector machine (“SVM”) classifier using the generated core tensors for the documents and the categories of the documents; and

    categorizing a document by generating a high-order tensor for the document, generating a core tensor for the generated high-order tensor for the document, and applying the SVM classifier to the generated core tensor for the document to determine a category for the document.
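
The tensor-generation and dimensionality-reduction steps recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. Several details are assumptions made only for illustration: the 27-symbol alphabet (26 letters plus "_" for every other character), the `log(N / df)` form of inverse document frequency, the single truncation rank shared by all modes, and the helper names `trigrams`, `build_tensors`, `unfold`, and `hosvd_core`. The sketch also decomposes each document tensor independently, which the claim does not specify.

```python
import math
from collections import Counter

import numpy as np

# Assumption: 26 lowercase letters plus "_" standing in for all other characters.
ALPHABET = "abcdefghijklmnopqrstuvwxyz_"
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def trigrams(text):
    """Map text onto ALPHABET and count each sequence of three characters."""
    chars = [ch if ch in INDEX else "_" for ch in text.lower()]
    return Counter(zip(chars, chars[1:], chars[2:]))


def build_tensors(docs):
    """Return one order-3 TF-IDF tensor (27 x 27 x 27) per document."""
    counts = [trigrams(d) for d in docs]
    # Document frequency: number of documents containing each trigram.
    df = Counter(g for c in counts for g in c)
    n = len(docs)
    tensors = []
    for c in counts:
        t = np.zeros((len(ALPHABET),) * 3)
        for (a, b, d), tf in c.items():
            # Weight = term frequency x inverse document frequency (assumed form).
            t[INDEX[a], INDEX[b], INDEX[d]] = tf * math.log(n / df[(a, b, d)])
        tensors.append(t)
    return tensors


def unfold(t, mode):
    """Mode-n unfolding: rows index the chosen mode, columns index the rest."""
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)


def hosvd_core(t, rank):
    """Truncated HOSVD: return (core, factors), core has shape (rank,) * t.ndim."""
    factors = []
    for mode in range(t.ndim):
        # Left singular vectors of the unfolding span that mode's subspace.
        u, _, _ = np.linalg.svd(unfold(t, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = t
    for mode, u in enumerate(factors):
        # Mode-n product with u.T shrinks that mode to `rank` entries.
        core = np.moveaxis(np.tensordot(u.T, core, axes=(1, mode)), 0, mode)
    return core, factors


docs = ["the cat sat on the mat", "the dog sat on the log"]
tensors = build_tensors(docs)
core, factors = hosvd_core(tensors[0], rank=4)
print(core.shape)
```

Each flattened core tensor (`core.ravel()`) could then serve as a feature vector for an SVM classifier, for example scikit-learn's `sklearn.svm.SVC`, trained on the documents' category labels. In practice the projection matrices would probably need to be shared across documents so that the features are comparable; that choice is likewise an assumption and is not taken from the claim.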
