Probablistic models and methods for combining multiple content classifiers

US 7,107,254 B1
Filed: 05/07/2001
Issued: 09/12/2006
Est. Priority Date: 05/07/2001
Status: Expired due to Fees

Abstract

The invention applies a probabilistic approach to combining evidence regarding the correct classification of items. Training data and machine learning techniques are used to construct probabilistic dependency models that effectively utilize evidence. The evidence includes the outputs of one or more classifiers and optionally one or more reliability indicators. The reliability indicators are, in a broad sense, attributes of the items being classified. These attributes can include characteristics of an item, source of an item, and meta-level outputs of classifiers applied to the item. The resulting models include meta-classifiers, which combine evidence from two or more classifiers, and tuned classifiers, which use reliability indicators to inform the interpretation of classical classifier outputs. The invention also provides systems and methods for identifying new reliability indicators.
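
A minimal sketch of the "tuned classifier" idea from the abstract: a reliability indicator changes how a classical classifier score is interpreted. The abstract describes models built with machine learning techniques; the smoothed lookup table below is a simpler stand-in chosen for brevity. The field names `score` and `length`, the thresholds, and the training data are all hypothetical.

```python
from collections import defaultdict

def bucket(example):
    """Discretize the evidence: one classical classifier score plus one
    reliability indicator (document length, a hypothetical choice)."""
    score_high = example["score"] >= 0.5
    is_short = example["length"] < 50
    return (score_high, is_short)

def learn_table(training):
    """Estimate P(in category | evidence bucket) with Laplace smoothing."""
    counts = defaultdict(lambda: [0, 0])  # bucket -> [negatives, positives]
    for example, label in training:
        counts[bucket(example)][int(label)] += 1
    return {b: (pos + 1) / (neg + pos + 2) for b, (neg, pos) in counts.items()}

def predict(table, example, default=0.5):
    """Return the estimated probability; unseen buckets fall back to default."""
    return table.get(bucket(example), default)

# Made-up training data: high scores are trustworthy on long items only.
training = [
    ({"score": 0.9, "length": 400}, True),
    ({"score": 0.8, "length": 300}, True),
    ({"score": 0.7, "length": 350}, True),
    ({"score": 0.9, "length": 10}, False),
    ({"score": 0.8, "length": 20}, True),
    ({"score": 0.7, "length": 15}, False),
    ({"score": 0.2, "length": 500}, False),
    ({"score": 0.1, "length": 30}, False),
]
table = learn_table(training)
long_doc = predict(table, {"score": 0.85, "length": 420})
short_doc = predict(table, {"score": 0.85, "length": 12})
```

With this toy data the same raw score of 0.85 yields a higher estimated probability for the long item than for the short one, which is the effect a reliability indicator is meant to capture.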

128 Citations

30 Claims

1. A computer system for classifying items, comprising:
- a plurality of classifiers;
  
  a computer system component comprising probabilistic dependency models, one for each of a plurality of categories, the computer system component applies the probabilistic dependency models to an item to provide with respect to each of the plurality of categories an indication of whether the item belongs;
  
  wherein the probabilistic dependency models collectively employ outputs from the plurality of classifiers; and
  
  the outputs employed by the probabilistic dependency models vary among the probabilistic dependency models.
- Dependent claims (2, 3, 4):
  - 2. The computer system of claim 1, wherein the dependency models collectively employ one or more reliability indicators.
  - 3. The computer system of claim 1, wherein the probabilistic dependency models are decision trees.
  - 4. The computer system of claim 1, wherein the items are texts.
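
One way to picture claims 1 to 4 is sketched below: one model per category, with each model reading a different subset of the classifier outputs. Everything here is assumed for illustration: the two word-list classifiers, the category names, and the hand-written trees with their leaf probabilities. A real system would learn the models from training data.

```python
SPORTS = {"goal", "match", "team"}
FINANCE = {"stock", "market", "bank"}

def fraction_in(text, vocabulary):
    """Fraction of words in the text that appear in the vocabulary."""
    words = text.lower().split()
    return sum(w in vocabulary for w in words) / max(len(words), 1)

# Hypothetical base classifiers, each producing a score for an item.
CLASSIFIERS = {
    "sports_clf": lambda text: fraction_in(text, SPORTS),
    "finance_clf": lambda text: fraction_in(text, FINANCE),
}

def sports_model(out):
    """Tiny hand-written tree; leaves hold assumed probabilities."""
    return 0.9 if out["sports_clf"] >= 0.2 else 0.1

def finance_model(out):
    """Uses two classifier outputs, unlike the sports model."""
    if out["finance_clf"] < 0.2:
        return 0.05
    return 0.3 if out["sports_clf"] >= 0.2 else 0.85

# One model per category; the outputs each model employs differ.
MODELS = {
    "sports": (("sports_clf",), sports_model),
    "finance": (("finance_clf", "sports_clf"), finance_model),
}

def classify(text):
    """Return, for every category, an indication of whether the item belongs."""
    outputs = {name: clf(text) for name, clf in CLASSIFIERS.items()}
    result = {}
    for category, (used, model) in MODELS.items():
        evidence = {name: outputs[name] for name in used}
        result[category] = model(evidence) >= 0.5
    return result
```

Passing each model only the outputs it declares makes the "outputs vary among the models" condition explicit in the code rather than implicit in what each function happens to read.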

5. A computer system for classifying items, comprising:
- a plurality of classifiers; and
  
  a computer system component that applies a probabilistic dependency model to classify an item, wherein the probabilistic dependency model contains dependencies on one or more classical outputs from the plurality of classifiers and dependencies on one or more reliability indicators.
- Dependent claims (6, 7, 8):
  - 6. The computer system of claim 5, wherein the computer system outputs a quantitative measure relating to confidence that the item belongs in a category.
  - 7. The computer system of claim 6, wherein the probabilistic dependency models are decision trees.
  - 8. The computer system of claim 6, wherein the items are texts.

9. A computer system, comprising:
- a plurality of classifiers; and
  
  a first computer system component that learns, from training examples, probabilistic dependency models for classifying items according to one or more reliability indicators together with classical outputs from the plurality of classifiers.
- Dependent claims (10, 11, 12, 13, 29):
  - 10. The computer system of claim 9, further comprising a second computer system component that repeatedly invokes the first component to learn probabilistic dependency models employing various potentially effective reliability indicators and compares the performances of the resulting probabilistic dependency models to identify reliability indicators that are effective.
  - 11. The computer system of claim 9, wherein the first computer system component employs the classical outputs from classifiers and the reliability indicators in the same manner.
  - 12. The computer system of claim 9, wherein the probabilistic dependency models are decision trees.
  - 13. The computer system of claim 9, wherein the items are texts.
  - 29. The computer system of claim 10, wherein the second component automatically selects the potentially effective reliability indicators.
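
The search described in claim 10 can be sketched as a loop that relearns a model once per candidate reliability indicator and keeps the candidates that beat a baseline. This is an illustration under assumptions, not the patented procedure: the learner is a smoothed count table, performance is held-out accuracy, and the indicator names `is_short` and `is_monday` and the synthetic data are hypothetical.

```python
from collections import defaultdict

def learn(training, indicator=None):
    """Learn P(label | score bucket, indicator value) by smoothed counts."""
    def key(x):
        return (x["score"] >= 0.5, x[indicator] if indicator else None)
    counts = defaultdict(lambda: [0, 0])  # key -> [negatives, positives]
    for x, label in training:
        counts[key(x)][int(label)] += 1
    table = {k: (p + 1) / (n + p + 2) for k, (n, p) in counts.items()}
    return lambda x: table.get(key(x), 0.5) >= 0.5

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

def effective_indicators(train, held_out, candidates):
    """Relearn once per candidate and keep those that beat the baseline."""
    baseline = accuracy(learn(train), held_out)
    results = {c: accuracy(learn(train, c), held_out) for c in candidates}
    return [c for c, acc in results.items() if acc > baseline]

def make_examples(high, low):
    """Synthetic items: a high score is correct only when the item is not short."""
    return [({"score": s, "is_short": short, "is_monday": monday},
             s >= 0.5 and not short)
            for s in (high, low)
            for short in (True, False)
            for monday in (True, False)]

train = make_examples(0.9, 0.1)
held_out = make_examples(0.7, 0.3)
selected = effective_indicators(train, held_out, ["is_short", "is_monday"])
```

In this toy setup only `is_short` improves held-out accuracy, so it is the only indicator kept.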

14. A computer readable medium having computer executable instructions for performing steps comprising:
- implementing a plurality of classifiers adapted to receive and classify at least one item, the plurality of classifiers each generating a score related to classification of the at least one item; and
  
  for each of one or more categories, facilitating classification, selection, and/or utilization of the at least one item with a probabilistic dependency model that employs one or more of the scores and, in addition, one or more reliability indicators.
- Dependent claims (15):
  - 15. The computer readable medium of claim 14, wherein:
    - the instructions implement a different probabilistic dependency model for each of two or more categories;
      
      the probabilistic dependency models are based on subsets of parameters selected from the group consisting of the scores and the reliability indicators; and
      
      the subsets of parameters vary among the probabilistic dependency models.

16. A system for classifying items, comprising:
- means for determining a model that classifies the items based on a probabilistic approach that combines information about the items including one or more classical outputs of classifiers and one or more reliability indicators; and
  
  means for applying the model to classify the items.

17. A computer-readable medium having stored thereon a data structure useful in classifying items, comprising:
- first data fields containing data representing an attribute to test, wherein the attributes represented include both classical classifier outputs and reliability indicators;
  
  second data fields corresponding to the first data fields and containing data representing values against which to compare the attributes;
  
  third data fields containing data representing classifier outcomes;
  
  fourth data fields facilitating determination of relationships among instances of the first, second, and third data fields, the relationships having a decision tree structure with the first and second data fields corresponding to decision nodes and the third data fields corresponding to leaf nodes.
- Dependent claims (18):
  - 18. The computer-readable medium of claim 17, wherein the data represented by the first data fields comprises classical classifier outputs from a plurality of classifiers.
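
The data structure of claims 17 and 18 maps naturally onto a small tree type. The sketch below is one possible layout, not the layout the patent discloses: the class and field names (`Decision`, `Leaf`, `svm_score`, `bayes_score`, `doc_length`) and the numeric values are hypothetical.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    probability: float   # "third data field": the classifier outcome

@dataclass
class Decision:
    attribute: str       # "first data field": attribute to test
    threshold: float     # "second data field": value to compare against
    below: "Node"        # "fourth data fields": links that give the
    at_or_above: "Node"  # fields their decision tree structure

Node = Union[Leaf, Decision]

def evaluate(node, evidence):
    """Walk from the root to a leaf and return the stored outcome."""
    while isinstance(node, Decision):
        if evidence[node.attribute] >= node.threshold:
            node = node.at_or_above
        else:
            node = node.below
    return node.probability

# Attributes mix classical outputs from two classifiers (svm_score,
# bayes_score) with one reliability indicator (doc_length).
tree = Decision(
    "svm_score", 0.5,
    below=Decision("bayes_score", 0.8, below=Leaf(0.05), at_or_above=Leaf(0.6)),
    at_or_above=Decision("doc_length", 50, below=Leaf(0.4), at_or_above=Leaf(0.9)),
)
```

For example, `evaluate(tree, {"svm_score": 0.7, "bayes_score": 0.1, "doc_length": 120})` follows the right-hand branch twice and returns the leaf value 0.9.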

19. A method of generating a classifier, comprising:
- obtaining a set of training examples;
  
  applying a probabilistic approach that uses the training examples to develop a model that combines evidence to provide an output relating to whether an item belongs in a category; and
  
  storing the model in a computer-readable media for use as a classifier;
  
  wherein the evidence comprises one or more classical outputs of other classifiers and one or more attributes of the item other than classical outputs of classifiers.
- Dependent claims (20, 21, 22, 23, 30):
  - 20. A method of identifying useful reliability indicators, comprising:
    - obtaining potentially useful reliability indicators;
      
      applying the method of claim 19 using various of the potentially useful reliability indicators as evidence; and
      
      comparing the resulting classifiers to identify which of the potentially useful reliability indicators are, in fact, useful.
  - 21. The method of claim 19, wherein the model is a decision tree.
  - 22. The method of claim 19, wherein the evidence comprises classical outputs from two or more classifiers.
  - 23. A method of classifying items, comprising:
    - obtaining the items in computer readable format, employing a computer to classify the item using a classifier generated according to the method of claim 19.
  - 30. The method of claim 23, wherein the items are texts.

24. A method of classifying items, comprising:
- applying probabilistic dependency models, one for each of a plurality of categories, to an item stored in computer readable format to provide an output relating to whether the item belongs in the category with respect to each of the plurality of categories;
  
  wherein the probabilistic dependency models collectively contain dependencies on outputs from a plurality of classifiers; and
  
  the outputs considered by the probabilistic dependency models vary among the probabilistic dependency models.
- Dependent claims (25, 26):
  - 25. The method of claim 24, wherein the dependency models collectively contain dependencies based on one or more reliability indicators.
  - 26. The method of claim 24, wherein the probabilistic dependency models are decision trees.

27. A method of combining a plurality of classifiers to classify items, comprising:
- sequentially applying tests to the items to obtain test results; and
  
  classifying the items based on the test results, wherein the sequence of tests applied varies among the items in that the outcome of one or more tests affects whether another test is applied, whereby the classifiers utilized vary depending on the items.
- Dependent claims (28):
  - 28. The method of claim 27, wherein one or more of the tests involves a reliability indicator.
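
Claims 27 and 28 describe tests applied in sequence, where earlier outcomes decide which later tests run. A rough sketch follows, assuming three hypothetical tests (two stand-in classifiers and a word-count reliability indicator) and an invented tree. Each test runs only when the walk reaches it, so the classifiers used differ from item to item.

```python
def cheap_keyword(text):
    """Stand-in classifier: 1.0 if the word 'refund' appears, else 0.0."""
    return 1.0 if "refund" in text.lower() else 0.0

def costly_model(text):
    """Stand-in for an expensive classifier: share of words over 6 letters."""
    words = text.split()
    return sum(len(w) > 6 for w in words) / max(len(words), 1)

def word_count(text):
    """Reliability indicator: number of words in the item."""
    return float(len(text.split()))

TESTS = {"cheap_keyword": cheap_keyword,
         "word_count": word_count,
         "costly_model": costly_model}

# Node layout: (test name, threshold, branch if below, branch if at or above).
# A plain string is a leaf holding the final label.
TREE = ("cheap_keyword", 0.5,
        "not complaint",
        ("word_count", 5,
         "complaint",
         ("costly_model", 0.3, "not complaint", "complaint")))

def classify(text):
    """Apply tests in sequence; return the label and the tests actually run."""
    node, applied = TREE, []
    while not isinstance(node, str):
        name, threshold, below, at_or_above = node
        applied.append(name)
        node = at_or_above if TESTS[name](text) >= threshold else below
    return node, applied
```

Returning the list of applied tests alongside the label makes it easy to confirm that the expensive classifier is skipped whenever an earlier test settles the outcome.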

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Bennett, Paul Nathan; Dumais, Susan T.; Horvitz, Eric J.
Primary Examiner(s): Knight, Anthony
Assistant Examiner(s): Hirl, Joseph P.
Application Number: US09/850,172
Time in Patent Office: 1,954 Days
Field of Search: 706/50, 706/12, 706/14
US Class Current: 706/50
CPC Class Codes: G06N 7/01 Probabilistic graphical mod...
