Method and system for identifying entities
5 Assignments
0 Petitions
Abstract
Some embodiments provide a program that identifies an entity having an entity attribute specified in a document. The program receives, from each of several methods, a set of candidate identity attributes, each of which is for identifying a particular entity having the entity attribute. Each method generates its set of candidate identity attributes based on the entity attribute specified in the document. The program calculates a score for each candidate identity attribute in the sets of candidate identity attributes. Based on the calculated scores, the program identifies, from the sets of candidate identity attributes, an identity attribute that identifies the entity having the entity attribute specified in the document.
85 Citations
36 Claims
1. A non-transitory computer readable medium storing a program which when executed by at least one processing unit identifies an entity having an entity attribute in a document, the program comprising sets of instructions for:
receiving, from each process of a plurality of processes, a corresponding set of candidate identity attributes that are each for identifying a particular entity having said entity attribute specified in the document, wherein each process of the plurality of processes generates the corresponding set of candidate identity attributes based on the entity attribute specified in the document;
calculating a score for each candidate identity attribute in the sets of candidate identity attributes, the calculating of a score for a particular candidate identity attribute comprising (1) identifying a set of tokens in the particular candidate identity attribute, (2) assigning a value to each token in the set of tokens based on a token count that represents a number of instances of the token across the sets of candidate identity attributes and (3) calculating the score based on the assigned values; and
identifying, based on the scores calculated for the candidate identity attributes, an identity attribute from the sets of candidate identity attributes that identifies the entity having said entity attribute specified in the document, wherein a process in the plurality of processes comprises a service that identifies the set of candidate identity attributes based on a probability of a set of keywords appearing in the document.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12.
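The three scoring steps of claim 1 can be written out as a formula. The claim fixes only that a token's value depends on its count across all candidate sets; using the raw count as the value, averaging the values over a candidate's tokens, and case-folding tokens are assumptions made here for illustration. Let S_1, ..., S_m be the candidate sets returned by the m processes, T(c) = (t_1, ..., t_k) the tokens of candidate c, and #_t(T) the number of occurrences of token t in T.

```latex
\begin{aligned}
n(t)  &= \sum_{j=1}^{m} \sum_{c' \in S_j} \#_t\bigl(T(c')\bigr)
       && \text{token count: instances of } t \text{ across all sets} \\
v(t)  &= n(t)
       && \text{step (2), assumed value function} \\
s(c)  &= \frac{1}{k} \sum_{i=1}^{k} v(t_i)
       && \text{step (3), assumed combination} \\
c^{*} &= \operatorname*{arg\,max}_{c \,\in\, S_1 \cup \dots \cup S_m} s(c)
       && \text{identifying step}
\end{aligned}
```

Worked example under these assumptions: with S_1 = {"Acme Corp", "Acme Inc"}, S_2 = {"Acme Corp"} and S_3 = {"Acme Corporation"}, the counts are n(acme) = 4, n(corp) = 2, n(inc) = 1 and n(corporation) = 1. Then s("Acme Corp") = (4 + 2) / 2 = 3 while s("Acme Inc") = s("Acme Corporation") = (4 + 1) / 2 = 2.5, so "Acme Corp" is identified. Averaging rather than summing keeps a long candidate from winning merely because it has more tokens.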
13. A system for identifying entities having entity attributes in documents, the system comprising:
a set of processing units;
a non-transitory computer readable medium storing a program for execution by at least one processing unit in the set of processing units, the program comprising:
a method processing module for receiving, from each process of a plurality of processes, a corresponding set of candidate identity attributes for identifying a particular entity having an entity attribute specified in a document, wherein each process in the plurality of processes generates the corresponding set of candidate identity attributes based on the entity attribute specified in the document; and
a scoring module for calculating a score for each candidate identity attribute in the sets of candidate identity attributes, the calculating of a score for a particular candidate identity attribute comprising (1) identifying a set of tokens in the particular candidate identity attribute, (2) assigning a value to each token in the set of tokens based on a token count that represents a number of instances of the token across the sets of candidate identity attributes and (3) calculating the score based on the assigned values,
the method processing module further for, based on the scores calculated for the candidate identity attributes, identifying an identity attribute from the sets of candidate identity attributes that identifies the entity having said entity attribute specified in the document, wherein a process in the plurality of processes comprises a service that identifies the corresponding set of candidate identity attributes based on a probability of a set of keywords appearing in the document.
Dependent claims: 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24.
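Claim 13 divides the work between a method processing module and a scoring module. A minimal Python sketch of that division follows. The class names mirror the claim wording, but everything else is an assumption for illustration and not the patented implementation: the tokenizer (lowercase alphanumeric runs), the value function (a token's raw count), the mean used to combine values, and the lambda "processes" standing in for real candidate generators.

```python
import re
from collections import Counter
from typing import Callable, Optional, Sequence

# Hypothetical: a process maps an entity attribute to candidate identity attributes.
Process = Callable[[str], list[str]]


class ScoringModule:
    """Token-count scoring; value = raw count and score = mean value are assumptions."""

    @staticmethod
    def tokenize(candidate: str) -> list[str]:
        # Assumed tokenizer: lowercase alphanumeric runs.
        return re.findall(r"[a-z0-9]+", candidate.lower())

    def score(self, candidate_sets: Sequence[list[str]]) -> dict[str, float]:
        all_candidates = [c for s in candidate_sets for c in s]
        # Token count: instances of each token across all candidate sets,
        # so a candidate proposed by two processes contributes its tokens twice.
        counts = Counter(t for c in all_candidates for t in self.tokenize(c))
        scores: dict[str, float] = {}
        for c in all_candidates:
            tokens = self.tokenize(c)
            # Mean of the token values; a candidate with no tokens scores 0.
            scores[c] = sum(counts[t] for t in tokens) / len(tokens) if tokens else 0.0
        return scores


class MethodProcessingModule:
    """Collects candidate sets from each process and selects the top-scoring one."""

    def __init__(self, processes: Sequence[Process], scorer: ScoringModule):
        self.processes = processes
        self.scorer = scorer

    def identify(self, entity_attribute: str) -> Optional[str]:
        candidate_sets = [p(entity_attribute) for p in self.processes]
        scores = self.scorer.score(candidate_sets)
        # Highest score wins; ties go to the candidate encountered first.
        return max(scores, key=scores.__getitem__) if scores else None


module = MethodProcessingModule(
    processes=[
        lambda attr: ["Acme Corp", "Acme Inc"],
        lambda attr: ["Acme Corp"],
        lambda attr: ["Acme Corporation"],
    ],
    scorer=ScoringModule(),
)
print(module.identify("ACME"))  # Acme Corp
```

Keeping the scorer as a separate object means an alternative value function (for example, normalizing counts by the total number of tokens) can be swapped in without touching the module that gathers candidates.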
25. A method for identifying an entity having an entity attribute in a document, the method comprising:
receiving, from each process of a plurality of processes, a corresponding set of candidate identity attributes that are each for identifying a particular entity having said entity attribute specified in the document, wherein each process of the plurality of processes generates the corresponding set of candidate identity attributes based on the entity attribute specified in the document;
calculating a score for each candidate identity attribute in the sets of candidate identity attributes, the calculating of a score for a particular candidate identity attribute comprising (1) identifying a set of tokens in the particular candidate identity attribute, (2) assigning a value to each token in the set of tokens based on a token count that represents a number of instances of the token across the sets of candidate identity attributes and (3) calculating the score based on the assigned values; and
identifying, based on the scores calculated for the candidate identity attributes, an identity attribute from the sets of candidate identity attributes that identifies the entity having said entity attribute specified in the document, wherein a process in the plurality of processes comprises a service that identifies the corresponding set of candidate identity attributes based on a probability of a set of keywords appearing in the document.
Dependent claims: 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36.
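Each independent claim ends with the same wherein clause: one of the processes is a service that proposes candidates based on the probability of a set of keywords appearing in the document. The claims do not say how that probability is estimated. The following is a minimal sketch under loudly stated assumptions: a hypothetical keyword directory maps each known identity to a set of lowercase keywords, the "probability" is taken to be the fraction of those keywords that occur in the document, and identities at or above a hypothetical threshold are returned as candidates.

```python
import re
from typing import Mapping


def keyword_probability_service(document: str,
                                entity_keywords: Mapping[str, set[str]],
                                threshold: float = 0.5) -> list[str]:
    """Hypothetical service: return candidate identity attributes, most probable first.

    Assumed estimate: the probability for an identity is the fraction of its
    (lowercase) keywords that occur as tokens in the document.
    """
    doc_tokens = set(re.findall(r"[a-z0-9]+", document.lower()))
    scored: list[tuple[float, str]] = []
    for identity, keywords in entity_keywords.items():
        if not keywords:
            continue  # no keywords means no evidence either way
        probability = len(keywords & doc_tokens) / len(keywords)
        if probability >= threshold:
            scored.append((probability, identity))
    # Highest probability first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [identity for _, identity in scored]


directory = {  # hypothetical keyword directory
    "Acme Corporation": {"acme", "insurance", "renewal"},
    "Acme Foods": {"acme", "grocery", "snacks"},
    "Apex Holdings": {"apex", "holdings"},
}
doc = "Acme Corp filed its annual insurance renewal."
print(keyword_probability_service(doc, directory))                 # ['Acme Corporation']
print(keyword_probability_service(doc, directory, threshold=0.3))  # ['Acme Corporation', 'Acme Foods']
```

The returned list is one "corresponding set of candidate identity attributes" in the sense of the claims; it would be pooled with the sets from the other processes before token-count scoring.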
Specification