NLP-based entity recognition and disambiguation
First Claim
1. A computer-implemented method for disambiguating one or more entities in an indicated text segment, comprising:
processing the indicated text segment to determine a plurality of terms and their associated parts-of-speech tags and grammatical roles;
performing linguistic analysis of the processed text segment to determine one or more potential entities which are referred to in the text segment by potential entity names;
generating and storing, for each potential entity, an entity profile data structure storing one or more associated properties that characterize the entity based upon surrounding context and linguistic information received from the performed linguistic analysis of the processed text segment, the entity profile properties including one or more roles attributable to the potential entity based upon actions and/or modifiers associated with the determined potential entity name that have been retrieved from a linguistic analysis of the surrounding context;
disambiguating which entities are being referred to in the indicated text segment by determining one or more most likely entities that are referred to in the text segment by comparing, using both linguistic and contextual information, the entity profiles generated for each potential entity with attributes of one or more candidate entities retrieved from a data repository; and
invoking the method to annotate information on a web page.
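The profile-comparison step of the claim can be pictured with a small sketch. It is only an illustration under stated assumptions, not the patented implementation: the names `EntityProfile`, `Candidate`, `score`, and `disambiguate`, the set-based layout of roles and modifiers, and the use of Jaccard overlap as the comparison measure are all hypothetical choices made here for clarity.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EntityProfile:
    """Properties gathered for one potential entity name (hypothetical layout)."""
    name: str
    roles: set[str] = field(default_factory=set)      # derived from actions
    modifiers: set[str] = field(default_factory=set)  # derived from modifiers


@dataclass
class Candidate:
    """A candidate entity as it might be stored in a data repository."""
    entity_id: str
    names: set[str]
    attributes: set[str]


def score(profile: EntityProfile, candidate: Candidate) -> float:
    """Jaccard overlap between profile evidence and candidate attributes.

    Assumption: a candidate whose names do not include the profile name
    scores zero.
    """
    if profile.name.lower() not in {n.lower() for n in candidate.names}:
        return 0.0
    evidence = profile.roles | profile.modifiers
    union = evidence | candidate.attributes
    return len(evidence & candidate.attributes) / len(union) if union else 0.0


def disambiguate(profile: EntityProfile, candidates: list[Candidate]) -> list[Candidate]:
    """Rank candidates from most to least likely, dropping zero scores."""
    scored = [(score(profile, c), c) for c in candidates]
    kept = [(s, c) for s, c in scored if s > 0]
    return [c for _, c in sorted(kept, key=lambda sc: sc[0], reverse=True)]


# Made-up example data: one profile and two same-named candidates.
profile = EntityProfile("Will Smith", roles={"actor"}, modifiers={"film"})
candidates = [
    Candidate("will_smith_football", {"Will Smith"}, {"football", "defensive end"}),
    Candidate("will_smith_actor", {"Will Smith"}, {"actor", "film", "musician"}),
]
ranked = disambiguate(profile, candidates)
```

Here the profile's evidence overlaps only with the actor candidate, so `ranked` contains that candidate alone. A real system would presumably weight roles, modifiers, and other contextual signals differently rather than treating them as one flat set.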
Abstract
Methods and systems for entity recognition and disambiguation using natural language processing techniques are provided. Example embodiments provide an entity recognition and disambiguation system (ERDS) and process that, based upon input of a text segment, automatically determines which entities are being referred to by the text using both natural language processing techniques and analysis of information gleaned from contextual data in the surrounding text. In at least some embodiments, supplemental or related information that can be used to assist in the recognition and/or disambiguation process can be retrieved from knowledge repositories such as an ontology knowledge base. In one embodiment, the ERDS comprises a linguistic analysis engine, a knowledge analysis engine, and a disambiguation engine that cooperate to identify candidate entities from a knowledge repository and determine which of the candidates best matches the one or more detected entities in a text segment using context information.
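The cooperation of the three components named in the abstract can be sketched roughly as follows. This is a toy illustration under loud assumptions: `KNOWLEDGE_BASE` is a made-up in-memory stand-in for a knowledge repository, the "linguistic analysis" is reduced to treating capitalized tokens as potential entity names and lowercase tokens as context, and all function names are hypothetical.

```python
import re

# Hypothetical in-memory stand-in for an ontology knowledge base.
KNOWLEDGE_BASE = {
    "Paris": [
        {"id": "paris_france", "context": {"france", "capital", "city"}},
        {"id": "paris_hilton", "context": {"celebrity", "heiress", "hotel"}},
    ],
}


def linguistic_analysis(text):
    """Toy linguistic analysis engine: capitalized tokens are potential
    entity names, lowercase tokens are surrounding context."""
    tokens = re.findall(r"[A-Za-z]+", text)
    names = [t for t in tokens if t[0].isupper()]
    context = {t for t in tokens if t[0].islower()}
    return names, context


def knowledge_analysis(name):
    """Toy knowledge analysis engine: fetch candidate entities for a name."""
    return KNOWLEDGE_BASE.get(name, [])


def disambiguation(context, candidates):
    """Toy disambiguation engine: pick the candidate sharing the most
    context words with the text (first candidate wins ties)."""
    if not candidates:
        return None
    return max(candidates, key=lambda c: len(context & c["context"]))["id"]


def recognize(text):
    """Run the three stages and map each known name to its best candidate."""
    names, context = linguistic_analysis(text)
    result = {}
    for name in names:
        candidates = knowledge_analysis(name)
        if candidates:
            result[name] = disambiguation(context, candidates)
    return result


city = recognize("Tourists love Paris, the capital city of a large country.")
person = recognize("The heiress Paris opened a hotel.")
```

With this data, `city` maps "Paris" to `paris_france` and `person` maps it to `paris_hilton`, because the surrounding context words overlap with different candidates. Names with no knowledge-base entry, such as "Tourists", are simply skipped in this sketch.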
23 Claims
1. Independent claim, reproduced in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
15. A non-transitory computer-readable medium containing contents that, when executed, cause a computing system to perform a method comprising:
processing the indicated text segment to determine a plurality of terms and their associated parts-of-speech tags and grammatical roles;
performing linguistic analysis of the processed text segment to determine one or more potential entities which are referred to in the text segment by potential entity names;
generating and storing, for each potential entity, an entity profile data structure storing one or more associated properties that characterize the entity based upon surrounding context and linguistic information, the entity profile properties including one or more roles attributable to the potential entity based upon actions and/or modifiers associated with the entity name that have been retrieved from a linguistic analysis of the surrounding context;
automatically disambiguating which entities are being referred to in the indicated text segment by determining one or more most likely entities that are referred to in the text segment by comparing, using both linguistic and contextual information, the entity profiles generated for each potential entity with attributes of one or more candidate entities retrieved from a knowledge base; and
causing information to be annotated on a web page.
Dependent claims: 16, 17, 18
19. An entity recognition and disambiguation computing system, comprising:
a memory;
a computer processor; and
a recognition and disambiguation module stored in the memory that is configured, when executed on the computer processor, to:
receive a text segment for processing;
process the received text segment to determine a plurality of terms and their associated parts-of-speech tags and grammatical roles;
perform linguistic analysis of the processed text segment to determine one or more potential entities which are referred to in the processed text segment by potential entity names;
generate and store, for each potential entity, an entity profile data structure storing one or more associated properties that characterize the entity based upon surrounding context and linguistic information received from the linguistic analysis of the processed text segment, the entity profile properties including one or more roles attributable to the potential entity based upon actions and/or modifiers associated with the determined potential entity name that have been retrieved from a linguistic analysis of the surrounding context;
disambiguate the potential entities to determine a single named entity to which the received text segment is deemed to refer, based upon a combination of linguistic analysis, contextual information gleaned from text surrounding the potential entity name in the received text segment, and information retrieved from one or more knowledge repositories, by comparing the entity profiles generated for each potential entity with attributes of one or more candidate entities; and
cause information on a web page to be annotated.
Dependent claims: 20, 21, 22, 23
Specification