ENTITY RECOGNITION USING PROBABILITIES FOR OUT-OF-COLLECTION DATA
Abstract
A classifier that disambiguates among entities based on a dictionary, such as a corpus of documents about those entities, is built by incorporating probabilities that an entity exists that is not in the dictionary. Given a document, the classifier associates it with an entity. Incorporating out-of-collection probabilities into the classifier yields a higher level of confidence in the match between an entity and a document.
20 Claims
1. A computer-implemented process for building a classifier for associating an entity with a document, comprising:
- accessing a dictionary that maps entities to related terms;
- partitioning the dictionary by entity names to provide a set of partitions, each partition relating to an entity name;
- estimating a probability that an entity having the entity name for a partition is not represented in the dictionary; and
- creating a classifier for the partition including the estimated probability.
Dependent claims: 2, 3, 4, 5, 6, 7, 8.
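The four steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented method: the tuple layout of the dictionary, the `NameClassifier` class, and the `build_classifiers` function are hypothetical names introduced here, and the out-of-collection estimate is a placeholder heuristic, because the claim does not say how the probability is estimated.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class NameClassifier:
    """Hypothetical per-name classifier: known entities plus an out-of-collection prior."""
    entity_name: str
    entity_terms: dict          # entity id -> list of related terms
    p_out_of_collection: float  # probability the true entity is not in the dictionary


def build_classifiers(dictionary):
    """dictionary: iterable of (entity_id, entity_name, related_terms) tuples (assumed layout)."""
    # Step 2: partition the dictionary by entity name.
    partitions = defaultdict(dict)
    for entity_id, entity_name, terms in dictionary:
        partitions[entity_name][entity_id] = list(terms)

    classifiers = {}
    for name, entities in partitions.items():
        # Step 3. ASSUMPTION: placeholder estimator, not taken from the source.
        # It treats "some unlisted entity" as one extra, equally likely candidate.
        p_ooc = 1.0 / (len(entities) + 1)
        # Step 4: create the classifier for this partition, including the estimate.
        classifiers[name] = NameClassifier(name, entities, p_ooc)
    return classifiers


# Step 1: a toy dictionary mapping entities to related terms (made-up data).
dictionary = [
    ("e1", "John Smith", ["explorer", "virginia"]),
    ("e2", "John Smith", ["economist", "scotland"]),
    ("e3", "Ada Lovelace", ["mathematician", "engine"]),
]
classifiers = build_classifiers(dictionary)
```

Keeping one classifier per entity name means each partition carries its own out-of-collection estimate, so an ambiguous name with many known bearers can be treated differently from a rare one.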
9. A computing machine comprising:
- a partitioner having an input for receiving a dictionary of documents related to entities and an output providing a set of partitions, each partition relating to an entity name;
- a statistics module having an input for receiving the dictionary and an output providing statistics regarding terms used in the dictionary;
- a classification builder having an input for receiving data about the partitions and an input for receiving the statistics, and providing data describing a classifier as an output, wherein the classifier incorporates an estimated probability that an entity having the entity name for a partition is not represented in the dictionary.
Dependent claims: 10, 11, 12, 13, 14, 15, 16.
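The data flow among the three components of claim 9 can be sketched as three Python functions wired together. The function names mirror the claim language, but their signatures, the tuple layout of the dictionary, and the fixed `p_ooc` value are assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def partitioner(dictionary):
    """Group dictionary documents by entity name: name -> {entity_id: terms}."""
    partitions = defaultdict(dict)
    for entity_id, entity_name, terms in dictionary:
        partitions[entity_name][entity_id] = list(terms)
    return dict(partitions)


def statistics_module(dictionary):
    """Count how often each term is used across the whole dictionary."""
    counts = Counter()
    for _entity_id, _entity_name, terms in dictionary:
        counts.update(terms)
    return counts


def classification_builder(partitions, term_counts, p_ooc=0.1):
    """Return plain data describing one classifier per entity name.

    ASSUMPTION: p_ooc is a fixed illustrative value. The claim only says the
    classifier incorporates an estimated probability, not how it is estimated.
    """
    total = sum(term_counts.values())
    # Relative term frequencies over the whole dictionary.
    background = {term: count / total for term, count in term_counts.items()}
    return {
        name: {"entities": entities, "background": background, "p_ooc": p_ooc}
        for name, entities in partitions.items()
    }


# Toy dictionary of documents related to entities (made-up data).
dictionary = [
    ("e1", "John Smith", ["explorer", "virginia"]),
    ("e2", "John Smith", ["economist", "virginia"]),
]
partitions = partitioner(dictionary)
stats = statistics_module(dictionary)
classifiers = classification_builder(partitions, stats)
```

The partitioner and the statistics module both read the same dictionary and do not depend on each other, so in this sketch they could run in either order before the builder combines their outputs.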
17. A computer-implemented process for associating an entity with a document, comprising:
- accessing a classifier associated with an entity name in the document, wherein the classifier incorporates an estimated probability that an entity having the entity name for a partition is not represented in a dictionary; and
- applying the classifier to the document to obtain probabilities that the document is associated with specific entities having the entity name.
Dependent claims: 18, 19, 20.
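One way to picture the application step of claim 17 is a small naive-Bayes-style scorer in which "not in the dictionary" competes as an extra candidate. This is a sketch under stated assumptions, not the claimed classifier: the `classify` function, the even split of the in-collection prior, the add-one smoothing, and the uniform term model for the out-of-collection candidate are all choices made here for illustration.

```python
import math


def classify(document_terms, entity_terms, p_ooc, vocabulary_size):
    """Return {entity_id or None: probability}; None stands for 'not in the dictionary'.

    Requires 0 < p_ooc < 1 and at least one known entity.
    """
    n = len(entity_terms)
    log_scores = {}
    for entity_id, terms in entity_terms.items():
        # ASSUMPTION: the in-collection mass (1 - p_ooc) is split evenly.
        log_p = math.log((1.0 - p_ooc) / n)
        for term in document_terms:
            # ASSUMPTION: add-one smoothed term likelihood.
            log_p += math.log((terms.count(term) + 1) / (len(terms) + vocabulary_size))
        log_scores[entity_id] = log_p
    # ASSUMPTION: the out-of-collection candidate uses a uniform term model.
    log_scores[None] = math.log(p_ooc) + len(document_terms) * math.log(1.0 / vocabulary_size)
    # Normalize in log space for numerical stability.
    top = max(log_scores.values())
    weights = {key: math.exp(value - top) for key, value in log_scores.items()}
    z = sum(weights.values())
    return {key: weight / z for key, weight in weights.items()}


# Two known entities sharing one name (made-up data).
entity_terms = {
    "e1": ["explorer", "virginia"],
    "e2": ["economist", "scotland"],
}
probs = classify(["explorer"], entity_terms, p_ooc=0.2, vocabulary_size=4)
best = max(probs, key=probs.get)
```

Because the `None` candidate receives its own share of probability, a document that matches no known entity well can lower the confidence of every in-dictionary match rather than being forced onto the closest one.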
Specification