
Multilevel taxonomy based on features derived from training documents classification using fisher values as discrimination values

  • US 6,233,575 B1
  • Filed: 06/23/1998
  • Issued: 05/15/2001
  • Est. Priority Date: 06/24/1997
  • Status: Expired due to Fees
First Claim

1. A process for classifying new documents containing features under nodes defining a multilevel taxonomy, based on features derived from a training set of documents that have been classified under respective nodes of the taxonomy, the process comprising:

  • associating a respective set of features with each one of said plurality of nodes, each given set of features comprising a plurality of features that are in at least one training document classified under the associated node; and

    classifying each new document under at least one node, based on the set of features associated with said at least one node, further comprising;

    determining a discrimination value for each term in at least one training document which is classified under each one of a plurality of the nodes of the taxonomy, wherein the discrimination value comprises a Fisher value based on the equation;

    $$\mathrm{Fisher}(t) = \frac{\sum_{c_1,c_2} \left(\mu(c_1,t) - \mu(c_2,t)\right)^2}{\sum_{c} \frac{1}{|c|} \sum_{d \in c} \left(n(t,d,c) - \mu(c,t)\right)^2}$$

    where t represents a term, d represents a document, c represents a class,

    $$\mu(c,t) = \frac{1}{|c|} \sum_{d \in c} x(d,t),$$

    and x(d,t) = an occurrence rate of t in d;

    determining a minimum discrimination value for each of said plurality of nodes;

    wherein the features in each given set of features have discrimination values equal to or above the minimum discrimination value determined for the node associated with the given set of features.
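
For illustration only, the discrimination-value and thresholding steps recited above can be sketched in Python for a single node whose child classes are given. This is a minimal sketch under stated assumptions, not the patented implementation: the names `occurrence_rate`, `fisher_scores`, and `select_features` are hypothetical; the occurrence rate x(d,t) is assumed to be the term count divided by document length; the per-document quantity in the denominator is assumed to be that same rate; the sum over c1,c2 is taken over unordered class pairs; and a zero denominator is mapped to infinity (or to 0 when the class means also coincide).

```python
from itertools import combinations


def occurrence_rate(doc_tokens, term):
    """Assumed x(d,t): count of `term` in the document divided by its length."""
    return doc_tokens.count(term) / len(doc_tokens) if doc_tokens else 0.0


def fisher_scores(classes):
    """Return {term: Fisher value} for one node.

    `classes` maps each child class label to a non-empty list of training
    documents, each document being a list of tokens.
    """
    vocab = {t for docs in classes.values() for d in docs for t in d}
    scores = {}
    for t in vocab:
        # Per-class occurrence rates and their means mu(c,t).
        rates = {c: [occurrence_rate(d, t) for d in docs]
                 for c, docs in classes.items()}
        mu = {c: sum(r) / len(r) for c, r in rates.items()}
        # Numerator: squared differences of class means over class pairs.
        num = sum((mu[a] - mu[b]) ** 2 for a, b in combinations(classes, 2))
        # Denominator: sum over classes of the mean squared deviation.
        den = sum(sum((x - mu[c]) ** 2 for x in r) / len(r)
                  for c, r in rates.items())
        if den > 0:
            scores[t] = num / den
        else:
            # Assumption: no within-class spread means perfect separation
            # unless the class means are also identical.
            scores[t] = float("inf") if num > 0 else 0.0
    return scores


def select_features(scores, min_value):
    """Keep terms whose value is equal to or above the node's minimum."""
    return {t for t, s in scores.items() if s >= min_value}


# Toy training set (invented for this sketch) for one node with two classes.
example = {
    "sports": [["goal", "match", "goal"], ["match", "team"]],
    "finance": [["stock", "market"], ["market", "team", "stock"]],
}
scores = fisher_scores(example)
features = select_features(scores, 0.5)
```

In this toy set the term "team" occurs in both classes at similar rates, so it scores low and falls below the chosen minimum, while terms confined to one class are retained. How the minimum is determined for each node is not specified in the claim text above, so the value 0.5 is purely illustrative.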
