Identifying entities in email signature blocks
First Claim
1. A system for identifying entities in email signature blocks, the apparatus comprising:
one or more processors; and
a non-transitory computer readable medium storing a plurality of instructions, which when executed, cause the one or more processors to:
create a plurality of scores for each token, in a sequence of tokens from an email signature block, based on a corresponding independent probability distribution that has been previously trained for a plurality of entity types, wherein each token comprises one of a word, a punctuation symbol, and an end-of-line character, an entity being a part of one of a person name, a job title, an enterprise name, a telephone number, an email address, and a uniform resource locator, and being associated with at least one of an entity type, an entity sequence, and a set of entities;
identify each entity sequence that has a total number of entities that is identical to a total number of tokens in the sequence of tokens;
determine, for each of the identified entity sequences, an entity sequence score by combining corresponding scores for each token in the sequence of tokens, that corresponds to an entity type in an identified entity sequence;
identify an entity sequence from the identified entity sequences with a highest entity sequence score; and
output the sequence of tokens as an identified set of entities, in the email signature block, based on the entity sequence with the highest score.
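The claim treats each token as a word, a punctuation symbol, or an end-of-line character. A minimal tokenizer along those lines might look like the sketch below. The regular expression, the function name `tokenize_signature`, and the sample signature are illustrative assumptions and are not taken from the patent.

```python
import re

# Hypothetical token pattern: an end-of-line character, a run of word
# characters, or a single punctuation symbol. Other whitespace is skipped.
TOKEN_RE = re.compile(r"(?P<eol>\n)|(?P<word>\w+)|(?P<punct>[^\w\s])")


def tokenize_signature(block: str) -> list[tuple[str, str]]:
    """Split a signature block into (kind, text) tokens."""
    # match.lastgroup is the name of the alternative that matched.
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(block)]


# Made-up signature block used only for illustration.
signature = "Jane Doe\nVP, Sales\njane@example.com"
tokens = tokenize_signature(signature)
```

Under this sketch an email address is split into several tokens, which fits the claim language that an entity is "a part of" an email address or other field.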
Abstract
Identifying entities in email signature blocks is described. A system scores each token, in a sequence of tokens from an email signature block, based on entity types, wherein each token is a word, a punctuation symbol, or an end-of-line character. The system identifies each entity sequence which includes a number of entities that matches the number of tokens in the sequence of tokens. The system identifies an entity sequence with a highest score based on applying scores for each token in the sequence of tokens to each identified entity sequence. The system outputs the sequence of tokens as an identified set of entities based on the entity sequence with the highest score.
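The steps in the abstract can be sketched as a brute-force search: enumerate every entity sequence whose length equals the number of tokens, combine the per-token scores for each sequence, and keep the highest-scoring one. The sketch below assumes the scores are combined by multiplication, uses only three of the entity types, and hard-codes made-up scores in place of trained probability distributions. All names (`ENTITY_TYPES`, `TOKEN_SCORES`, `best_entity_sequence`) are hypothetical.

```python
from itertools import product
from math import prod

# Assumption: a reduced set of entity types for illustration.
ENTITY_TYPES = ("NAME", "TITLE", "EMAIL")

TOKENS = ["Jane", "Doe", "Engineer"]

# Assumption: made-up per-token scores, one score per entity type.
# In the described system these would come from previously trained
# probability distributions.
TOKEN_SCORES = [
    {"NAME": 0.90, "TITLE": 0.08, "EMAIL": 0.02},  # "Jane"
    {"NAME": 0.85, "TITLE": 0.10, "EMAIL": 0.05},  # "Doe"
    {"NAME": 0.05, "TITLE": 0.90, "EMAIL": 0.05},  # "Engineer"
]


def best_entity_sequence(token_scores, entity_types=ENTITY_TYPES):
    """Return the entity sequence with the highest combined score."""
    best_seq, best_score = None, float("-inf")
    # Every candidate has exactly one entity type per token.
    for seq in product(entity_types, repeat=len(token_scores)):
        # Assumption: scores are combined by multiplication.
        score = prod(token_scores[i][t] for i, t in enumerate(seq))
        if score > best_score:
            best_seq, best_score = seq, score
    return best_seq, best_score


best_seq, best_score = best_entity_sequence(TOKEN_SCORES)
# Output the tokens as labeled entities.
labeled = list(zip(TOKENS, best_seq))
```

The search visits every combination of entity types, so its cost grows exponentially with the number of tokens; it is meant to show the selection rule, not an efficient implementation.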
141 Citations
20 Claims
1. A system for identifying entities in email signature blocks, the apparatus comprising:
one or more processors; and
a non-transitory computer readable medium storing a plurality of instructions, which when executed, cause the one or more processors to:
create a plurality of scores for each token, in a sequence of tokens from an email signature block, based on a corresponding independent probability distribution that has been previously trained for a plurality of entity types, wherein each token comprises one of a word, a punctuation symbol, and an end-of-line character, an entity being a part of one of a person name, a job title, an enterprise name, a telephone number, an email address, and a uniform resource locator, and being associated with at least one of an entity type, an entity sequence, and a set of entities;
identify each entity sequence that has a total number of entities that is identical to a total number of tokens in the sequence of tokens;
determine, for each of the identified entity sequences, an entity sequence score by combining corresponding scores for each token in the sequence of tokens, that corresponds to an entity type in an identified entity sequence;
identify an entity sequence from the identified entity sequences with a highest entity sequence score; and
output the sequence of tokens as an identified set of entities, in the email signature block, based on the entity sequence with the highest score.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A computer program product comprising computer-readable program code to be executed by one or more processors when retrieved from a non-transitory computer-readable medium, the program code including instructions to:
create a plurality of scores for each token, in a sequence of tokens from an email signature block, based on a corresponding independent probability distribution that has been previously trained for a plurality of entity types, wherein each token comprises one of a word, a punctuation symbol, and an end-of-line character, an entity being a part of one of a person name, a job title, an enterprise name, a telephone number, an email address, and a uniform resource locator, and being associated with at least one of an entity type, an entity sequence, and a set of entities;
identify each entity sequence that has a total number of entities that is identical to a total number of tokens in the sequence of tokens;
determine, for each of the identified entity sequences, an entity sequence score by combining corresponding scores for each token, in the sequence of tokens, that corresponds to an entity type in an identified entity sequence;
identify an entity sequence from the identified entity sequences with a highest entity sequence score; and
output the sequence of tokens as an identified set of entities, in the email signature block, based on the entity sequence with the highest score.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A method for identifying entities in email signature blocks, the method comprising:
creating a plurality of scores for each token, in a sequence of tokens from an email signature block, based on a corresponding independent probability distribution that has been previously trained for a plurality of entity types, wherein each token comprises one of a word, a punctuation symbol, and an end-of-line character, an entity being a part of one of a person name, a job title, an enterprise name, a telephone number, an email address, and a uniform resource locator, and being associated with at least one of an entity type, an entity sequence, and a set of entities;
identifying each entity sequence that has a total number of entities that is identical to a total number of tokens in the sequence of tokens;
determining, for each of the identified entity sequences, an entity sequence score by combining corresponding scores for each token, in the sequence of tokens, that corresponds to an entity type in an identified entity sequence;
identifying an entity sequence from the identified entity sequences with a highest entity sequence score; and
outputting the sequence of tokens as an identified set of entities, in the email signature block, based on the entity sequence with the highest score.
Dependent claims: 16, 17, 18, 19, 20
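The method leaves open how the per-token scores are combined. One common choice, assumed here and not stated in the claim, is to sum log scores, which ranks sequences the same way as a product while avoiding numeric underflow on long signatures. The sketch also illustrates a consequence of that assumption: if every assignment of entity types is a candidate and all scores are positive, the highest-scoring sequence can be found by picking the best type for each token independently. If the candidate sequences were restricted in some way, this shortcut would no longer apply. The function names and scores are hypothetical.

```python
import math
from itertools import product


def sequence_log_score(token_scores, seq):
    """Combine per-token scores for one entity sequence in log space."""
    return sum(math.log(token_scores[i][t]) for i, t in enumerate(seq))


def exhaustive_best(token_scores, entity_types):
    """Score every sequence with one entity type per token; keep the best."""
    candidates = product(entity_types, repeat=len(token_scores))
    return max(candidates, key=lambda seq: sequence_log_score(token_scores, seq))


def per_token_best(token_scores):
    """Shortcut: choose the highest-scoring entity type for each token."""
    return tuple(max(scores, key=scores.get) for scores in token_scores)


# Assumption: made-up scores for two tokens and two entity types.
entity_types = ("PHONE", "URL")
token_scores = [
    {"PHONE": 0.7, "URL": 0.3},
    {"PHONE": 0.2, "URL": 0.8},
]

exhaustive = exhaustive_best(token_scores, entity_types)
shortcut = per_token_best(token_scores)
combined = math.exp(sequence_log_score(token_scores, exhaustive))
```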
Specification