Tagging entities with descriptive phrases
First Claim
1. A system comprising:
- at least one device processor;
- an etag collector that obtains a determined set of a plurality of determined description indicators that have been determined as description indicators that are specifically associated with a first domain;
- an entity receiving component that obtains an entity indicator associated with the first domain;
- a context determination component that initiates, via one or more of the at least one device processor, an analysis of a plurality of documents to identify occurrences of mentions of the obtained entity indicator and contexts associated with each of the occurrences of the mentions in each one of the plurality of documents;
- a proximity determination component that, for each one of the plurality of documents that includes one or more of the determined description indicators that are also included in the previously determined set, and one or more identified occurrences of mentions of the obtained entity indicator, determines proximities of the determined description indicators, from the previously determined set, to the identified occurrences of mentions of the obtained entity indicator, based on the obtained contexts;
- an etag association component that generates a description tag association between the obtained entity indicator and one of the description indicators from the determined plurality of description indicators associated with the first domain, using the determined proximities; and
- a fitness evaluator that determines a measure of fitness associated with the description tag.
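Claim 1 does not say how proximities are measured or how the measure of fitness is computed. The Python sketch below fills those gaps with assumptions of its own, so it should be read as an illustration rather than the patented implementation: documents are pre-tokenized lists of lowercase words, proximity is the smallest token distance between the start of an entity mention and the start of a description indicator, each document contributes 1/(1 + distance) to an indicator's score, and fitness is the winning indicator's share of the total score. The names `find_positions` and `tag_entity` are hypothetical.

```python
from collections import defaultdict


def find_positions(tokens, phrase):
    """Return every start index at which the token list `phrase` occurs in `tokens`."""
    n = len(phrase)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase]


def tag_entity(entity, indicators, documents):
    """Hypothetical proximity-based tagger (an illustration, not the claimed system).

    entity:     entity indicator as a list of lowercase tokens
    indicators: description indicators for one domain, each a list of tokens
    documents:  list of tokenized documents
    Returns (best_indicator, fitness), or (None, 0.0) if nothing co-occurs.
    """
    scores = defaultdict(float)
    for tokens in documents:
        mentions = find_positions(tokens, entity)
        if not mentions:
            continue  # only documents that mention the entity are considered
        for indicator in indicators:
            hits = find_positions(tokens, indicator)
            if not hits:
                continue  # indicator absent from this document
            # Assumed proximity: smallest distance between start positions.
            distance = min(abs(m - h) for m in mentions for h in hits)
            scores[" ".join(indicator)] += 1.0 / (1 + distance)
    if not scores:
        return None, 0.0
    best = max(scores, key=scores.get)
    # Assumed fitness: the winner's share of the total proximity score.
    fitness = scores[best] / sum(scores.values())
    return best, fitness


docs = [
    "the film inception is a science fiction thriller".split(),
    "inception is a science fiction film".split(),
    "unlike inception this romantic comedy is light".split(),
]
print(tag_entity(["inception"], [["science", "fiction"], ["romantic", "comedy"]], docs))
```

With these three toy documents, "science fiction" wins with a fitness of about 0.6 (a score of 0.5 out of a total of roughly 0.83). Scoring by inverse distance is one simple choice; a real system might instead weight by syntactic relations within the identified contexts.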
Abstract
A plurality of description phrases associated with a first domain may be determined, based on an analysis of a first plurality of documents to determine co-occurrences of the description phrases with one or more name labels associated with the first domain. An entity associated with the first domain may be obtained. An analysis of a second plurality of documents may be initiated to identify co-occurrences of mentions of the obtained entity and one or more of the plurality of description phrases, and contexts associated with each of the co-occurrences of the mentions and description phrases, in each one of the second plurality of documents. A description tag association between the obtained entity and one of the description phrases may be determined, based on an analysis of the identified contexts.
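The first step of the abstract, finding description phrases for a domain by their co-occurrence with name labels, can be sketched as follows. This is an assumption-laden illustration: "co-occurrence" is taken to mean appearing in the same sentence, candidate phrases are supplied up front, and a phrase is kept only when it co-occurs with at least `min_labels` distinct name labels. The function names and the threshold are hypothetical.

```python
import re
from collections import defaultdict


def contains_phrase(text, phrase):
    """Case-insensitive, whole-word match of `phrase` inside `text`."""
    pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
    return re.search(pattern, text.lower()) is not None


def mine_domain_phrases(sentences, name_labels, candidates, min_labels=2):
    """Keep candidates that share a sentence with >= min_labels distinct name labels.

    sentences:   sentences drawn from the first plurality of documents
    name_labels: known names in the domain (e.g. movie titles)
    candidates:  candidate description phrases to test
    """
    support = defaultdict(set)  # candidate phrase -> name labels seen with it
    for sentence in sentences:
        labels = {n for n in name_labels if contains_phrase(sentence, n)}
        if not labels:
            continue  # no domain name here, so no co-occurrence evidence
        for phrase in candidates:
            if contains_phrase(sentence, phrase):
                support[phrase] |= labels
    return sorted(p for p, seen in support.items() if len(seen) >= min_labels)


sentences = [
    "Inception is a science fiction film.",
    "Alien is a science fiction horror film.",
    "Alien was a box office hit.",
    "The weather is nice today.",
]
print(mine_domain_phrases(sentences, ["Inception", "Alien"],
                          ["science fiction", "box office hit", "weather"]))
```

Here "box office hit" is dropped because it co-occurs with only one name label, and "weather" never appears alongside any name label. Requiring support from several distinct labels is one way to favor phrases that describe the domain in general rather than a single entity.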
20 Claims
14. A method comprising:
- determining a plurality of description phrases that are determined as description phrases associated with a first domain, based on an analysis of a first plurality of documents to determine co-occurrences of the description phrases with one or more name labels associated with the first domain, based on a plurality of text variants of Hearst patterns;
- obtaining an entity associated with the first domain;
- initiating, via a MapReduce platform, an analysis of a second plurality of documents to identify occurrences of mentions of the obtained entity and contexts associated with each of the occurrences of the mentions in each one of the second plurality of documents based on a set-based similarity containment analysis; and
- generating, via a device processor, a description tag association between the obtained entity and one of the determined description phrases from the determined plurality of description phrases associated with the first domain, using the identified occurrences of mentions of the obtained entity and contexts associated with each of the occurrences of the mentions.
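Claim 14 refers to "a plurality of text variants of Hearst patterns" without listing them. The sketch below assumes a set of variants taken from the six lexico-syntactic patterns in Hearst's 1992 work on hyponym extraction ("such as", "such ... as", "including", "especially", "and other", "or other"), and it crudely approximates a noun phrase as one to three words. It counts (description phrase, name label) pairs; `hearst_pairs` and `DESC` are hypothetical names, and the real system may use different variants.

```python
import re
from collections import Counter

# Assumed description-phrase shape: one to three word tokens,
# a crude stand-in for a real noun-phrase chunker.
DESC = r"(?P<desc>\b\w+(?: \w+){0,2})"


def hearst_pairs(text, name_labels):
    """Count (description phrase, name label) pairs matched by Hearst-style patterns."""
    canon = {n.lower(): n for n in name_labels}
    # Longest names first so a longer label wins over a label it contains.
    alt = "|".join(re.escape(n) for n in sorted(name_labels, key=len, reverse=True))
    NAME = rf"(?P<name>{alt})\b"
    variants = [
        rf"{DESC},? such as {NAME}",     # "films such as Inception"
        rf"such {DESC} as {NAME}",       # "such films as Inception"
        rf"{DESC},? including {NAME}",   # "thrillers, including Inception"
        rf"{DESC},? especially {NAME}",  # "thrillers, especially Inception"
        rf"\b{NAME} and other {DESC}",   # "Inception and other thrillers"
        rf"\b{NAME} or other {DESC}",    # "Inception or other thrillers"
    ]
    counts = Counter()
    for pattern in variants:
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            counts[(m.group("desc").lower(), canon[m.group("name").lower()])] += 1
    return counts


text = ("We love science fiction films such as Inception. "
        "Alien and other horror classics. "
        "Dramas, including Inception.")
print(hearst_pairs(text, ["Inception", "Alien"]))
```

Because the phrase group is capped at three words and is greedy, a pattern such as "and other" can swallow trailing words in longer sentences; a production version would likely match against parsed noun phrases instead of raw word windows.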
17. A computer program product comprising a hardware machine readable storage device storing executable instructions that cause at least one data processing apparatus to:
- determine a plurality of description phrases that are determined as description phrases associated with a first domain, based on an analysis of a first plurality of documents to determine co-occurrences of the description phrases with one or more name labels associated with the first domain, based on a plurality of text variants of Hearst patterns;
- obtain an entity associated with the first domain;
- initiate, via a MapReduce platform, an analysis of a second plurality of documents to identify co-occurrences of mentions of the obtained entity and one or more of the plurality of determined description phrases, and contexts associated with each of the co-occurrences of the mentions and determined description phrases, in each one of the second plurality of documents based on a set-based similarity containment analysis; and
- generate a description tag association between the obtained entity and one of the determined description phrases from the determined plurality of description phrases associated with the first domain, using the identified co-occurrences of mentions of the obtained entity and one or more of the plurality of determined description phrases, and an analysis of the identified contexts.
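Claims 14 and 17 run the second-stage analysis "via a MapReduce platform" using "a set-based similarity containment analysis". The sketch below only simulates one map, shuffle, and reduce round inside a single Python process; it is not a Hadoop job. It assumes that containment is |A ∩ B| / |A|, where A is the entity's token set and B is a sentence's token set, that a sentence counts as a mention context when containment reaches a hypothetical `threshold` of 0.6, and that a description phrase is present when all of its tokens appear in that sentence. All function names are illustrative.

```python
import re
from collections import defaultdict
from itertools import chain


def token_set(text):
    """Lowercase word tokens of `text` as a set."""
    return set(re.findall(r"\w+", text.lower()))


def containment(a, b):
    """Set containment of a in b: |a & b| / |a| (0.0 when a is empty)."""
    return len(a & b) / len(a) if a else 0.0


def map_document(doc, entity, phrases, threshold=0.6):
    """Mapper: emit ((entity, phrase), 1) for each sentence that passes the
    containment test for the entity and contains every token of the phrase."""
    entity_tokens = token_set(entity)
    for sentence in re.split(r"[.!?]+", doc):
        context = token_set(sentence)
        if containment(entity_tokens, context) < threshold:
            continue  # sentence is not treated as a mention of the entity
        for phrase in phrases:
            if token_set(phrase) <= context:
                yield (entity, phrase), 1


def reduce_counts(key, values):
    """Reducer: total co-occurrence count for one (entity, phrase) key."""
    return key, sum(values)


def run_job(docs, entity, phrases, threshold=0.6):
    """Simulate map -> shuffle (group by key) -> reduce in one process."""
    groups = defaultdict(list)
    mapped = chain.from_iterable(
        map_document(d, entity, phrases, threshold) for d in docs)
    for key, value in mapped:
        groups[key].append(value)
    return dict(reduce_counts(k, v) for k, v in groups.items())


docs = [
    "Dark Knight is a superhero film. It is also a crime drama.",
    "The Dark Knight, a superhero film, was a hit.",
    "A superhero film without Batman.",
]
counts = run_job(docs, "The Dark Knight", ["superhero film", "crime drama"])
print(counts)
```

In this example, "Dark Knight is a superhero film" still counts as a mention of "The Dark Knight" because two of its three tokens are contained in the sentence. "It is also a crime drama" is not counted, since the sketch does no coreference resolution, which shows how much the choice of context unit affects the resulting tag.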
Specification