Methods and apparatus for automated matching and classification of data

  • US 7,657,506 B2
  • Filed: 01/03/2007
  • Issued: 02/02/2010
  • Est. Priority Date: 01/03/2006
  • Status: Active Grant
First Claim

1. A computer-implemented method for processing data, comprising:

    receiving an initial set of records comprising initial terms describing respective items in specified categories;

    calculating, based on the initial set of records, respective term weights for at least some of the initial terms with respect to at least some of the categories, each term weight indicating, for a given initial term and a given category, a likelihood that a record containing the given initial term belongs to the given category, wherein calculating term weights comprises computing a general probability of occurrence of the given initial term over all of the categories, computing a specific probability of the occurrence of the given initial term in the records belonging to the given category, and determining the term weight responsively to a difference between the specific probability and the general probability for the given initial term with respect to the given category;

    receiving a new record, not included in the initial set, the new record comprising particular terms, wherein the particular terms are a subset of the initial terms;

    computing respective assignment metrics for two or more of the categories using the respective term weights of the particular terms in the new record with respect to the two or more of the categories; and

    classifying the new record in one of the two or more of the categories responsively to the assignment metrics.
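
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The claim says only that the term weight is determined "responsively to a difference" between the specific and general probabilities, so this sketch assumes the weight is that difference itself, and it assumes the assignment metric is the plain sum of the weights of the terms in the new record. The function names `term_weights` and `classify` and the sample records are hypothetical.

```python
from collections import Counter, defaultdict


def term_weights(records):
    """Compute per-category term weights from (terms, category) pairs.

    Assumption: weight = specific probability - general probability.
    Returns a mapping {category: {term: weight}}.
    """
    total = len(records)
    general = Counter()               # records containing each term, any category
    specific = defaultdict(Counter)   # records containing each term, per category
    per_category = Counter()          # number of records in each category

    for terms, category in records:
        per_category[category] += 1
        for term in set(terms):       # count each term once per record
            general[term] += 1
            specific[category][term] += 1

    weights = {}
    for category, n in per_category.items():
        weights[category] = {
            term: specific[category][term] / n - general[term] / total
            for term in general
        }
    return weights


def classify(new_terms, weights):
    """Score each category and return (best_category, metrics).

    Assumption: the assignment metric is the sum of the term weights.
    """
    metrics = {
        category: sum(w.get(term, 0.0) for term in set(new_terms))
        for category, w in weights.items()
    }
    return max(metrics, key=metrics.get), metrics


# Hypothetical initial set of records: (terms, category).
initial_records = [
    (["steel", "bolt"], "fasteners"),
    (["steel", "nut"], "fasteners"),
    (["copper", "wire"], "cables"),
    (["steel", "wire"], "cables"),
]

weights = term_weights(initial_records)
best, metrics = classify(["steel", "bolt"], weights)
print(best, metrics)
```

With these sample records, "steel" appears in three of four records overall but in every "fasteners" record, so its weight for "fasteners" is positive and its weight for "cables" is negative. A real system would likely normalize or threshold the weights; the claim leaves that open.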
