Kernels and methods for selecting kernels for use in learning machines

  • US 7,788,193 B2
  • Filed: 10/30/2007
  • Issued: 08/31/2010
  • Est. Priority Date: 05/07/2001
  • Status: Expired due to Fees
First Claim

1. A computer-implemented method for analyzing data comprising a text document to identify patterns in words or characters within the document, the method comprising:

  • inputting the data into a computing environment comprising one or more pre-processing program modules and one or more support vector machine modules stored on a drive or a system memory of a computer or computer network by;

    dividing the data into a training dataset and a test dataset;

    defining a kernel for structured data for execution by the one or more support vector machine modules by representing the training dataset as a collection of sequences of words or characters and an index set within the document structure, wherein the indices within the index set correspond to locations of words or characters within the document;

    applying a vicinity function to the collection of sequences of words or characters to define a plurality of sequences of words or characters centered at different words or characters;

    measuring similarity of pairs of sequences of words or characters centered at the different indices to define a locational kernel having a value corresponding to each of the different pairs of sequences of words or characters;

    creating additional locational kernels by performing an operation selected from addition, scalar multiplication, multiplication, pointwise limits, transformation and convolution on the locational kernels;

    combining the locational kernels and the additional locational kernels for the different sequences of words or characters by performing an operation to produce a kernel on a set of sequences of words or characters;

    testing the kernel on the test data set having a known set of sequences of words or characters to determine whether an optimal solution has been achieved;

    if the optimal solution has been achieved, applying the kernel on a set of sequences of words or characters to a document having an unknown structure to identify patterns within, and thereby extract knowledge from, the document; and

    generating a display of the identified patterns of words or characters within the document having an unknown structure.
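
The kernel-construction steps recited above (vicinity function, locational kernel, closure operations, and combination into a kernel on sequences) can be illustrated with a minimal Python sketch. This is not the patented implementation. The fixed-radius window used as the vicinity function, the position-wise token match used as the similarity measure, the sum over all index pairs used as the combining operation, and every function name below (vicinity, window_match_kernel, add, scale, multiply, sequence_kernel) are assumptions chosen for illustration only.

```python
from typing import Callable, List, Optional, Sequence

# A locational kernel compares sequence s at index i with sequence t at index j.
LocKernel = Callable[[Sequence[str], int, Sequence[str], int], float]


def vicinity(seq: Sequence[str], center: int, radius: int) -> List[Optional[str]]:
    """Assumed vicinity function: the window of tokens centered at `center`.

    Positions that fall outside the sequence are padded with None.
    """
    return [seq[p] if 0 <= p < len(seq) else None
            for p in range(center - radius, center + radius + 1)]


def window_match_kernel(radius: int) -> LocKernel:
    """Assumed similarity measure: count of positions where both windows agree."""
    def k(s: Sequence[str], i: int, t: Sequence[str], j: int) -> float:
        ws, wt = vicinity(s, i, radius), vicinity(t, j, radius)
        # Padding (None) never counts as a match.
        return float(sum(1 for a, b in zip(ws, wt) if a is not None and a == b))
    return k


# Three of the operations named in the claim for creating additional
# locational kernels: addition, scalar multiplication, multiplication.
def add(k1: LocKernel, k2: LocKernel) -> LocKernel:
    return lambda s, i, t, j: k1(s, i, t, j) + k2(s, i, t, j)


def scale(c: float, k: LocKernel) -> LocKernel:
    return lambda s, i, t, j: c * k(s, i, t, j)


def multiply(k1: LocKernel, k2: LocKernel) -> LocKernel:
    return lambda s, i, t, j: k1(s, i, t, j) * k2(s, i, t, j)


def sequence_kernel(k: LocKernel) -> Callable[[Sequence[str], Sequence[str]], float]:
    """Assumed combining operation: sum the locational kernel over all index pairs."""
    def K(s: Sequence[str], t: Sequence[str]) -> float:
        return sum(k(s, i, t, j) for i in range(len(s)) for j in range(len(t)))
    return K


# Usage: compare two short word sequences.
doc_a = "the cat sat".split()
doc_b = "the cat ran".split()
k = window_match_kernel(radius=1)
K = sequence_kernel(add(k, scale(0.5, multiply(k, k))))
print(K(doc_a, doc_b))
```

Under these assumptions, the values K(s, t) computed for every pair of training sequences would form a Gram matrix that a support vector machine accepting precomputed kernels could consume; the train/test split, the optimality test, and the display step of the claim are outside the scope of this sketch.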
