NLP-based entity recognition and disambiguation
First Claim
1. A computer-implemented method for disambiguating one or more entities in an indicated text segment, comprising:
- processing the indicated text segment to determine a plurality of terms and their associated parts-of-speech tags and grammatical roles;
- performing linguistic analysis of the processed text segment to determine one or more potential entities which are referred to in the text segment by potential entity names;
- generating and storing, for each potential entity, an entity profile data structure storing one or more associated properties that characterize the entity based upon surrounding context and linguistic information, the surrounding context and linguistic information retrieved from the performed linguistic analysis of the processed text segment, by:
  - retrieving, from the linguistic analysis of the processed text segment, actions and/or modifiers associated with the determined potential entity name, the actions and/or modifiers appearing in the context surrounding the potential entity names in the indicated text segment;
  - determining one or more roles that are attributable to the potential entity based upon the retrieved actions and/or modifiers associated with the determined potential entity names; and
  - storing, in the entity profile data structure as part of the properties associated with the potential entity, the determined one or more roles; and
- disambiguating which entities are being referred to in the indicated text segment by determining one or more most likely entities that are referred to in the text segment by comparing, using both linguistic and contextual information, the entity profiles generated for each potential entity with attributes of one or more candidate entities, including a comparison of the determined one or more roles stored in the profile generated for each potential entity with the one or more roles of each of the candidate entities.
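The profile-building and role-comparison steps recited in the claim can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the patent: the `ROLE_HINTS` lookup table, the `EntityProfile` and `CandidateEntity` classes, the exact-name candidate filter, and the Jaccard-style overlap score. The claim requires only that roles derived from actions and modifiers be compared with the roles of candidate entities; it does not prescribe a scoring function. The actions and modifiers are passed in directly, standing in for the output of the linguistic analysis.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical mapping from actions/modifiers to roles (illustrative only).
ROLE_HINTS = {
    "scored": "athlete",
    "striker": "athlete",
    "elected": "politician",
    "senator": "politician",
}


@dataclass
class EntityProfile:
    """Properties gathered for one potential entity mention."""
    name: str
    actions: list[str] = field(default_factory=list)
    modifiers: list[str] = field(default_factory=list)
    roles: set[str] = field(default_factory=set)


@dataclass
class CandidateEntity:
    """A known entity with the roles recorded for it."""
    entity_id: str
    name: str
    roles: set[str]


def build_profile(name, actions, modifiers):
    """Store actions/modifiers and the roles they suggest in a profile."""
    profile = EntityProfile(name, list(actions), list(modifiers))
    for cue in profile.actions + profile.modifiers:
        role = ROLE_HINTS.get(cue.lower())
        if role:
            profile.roles.add(role)
    return profile


def role_overlap(profile, candidate):
    """Jaccard overlap of role sets (an assumed scoring choice)."""
    union = profile.roles | candidate.roles
    if not union:
        return 0.0
    return len(profile.roles & candidate.roles) / len(union)


def disambiguate(profile, candidates):
    """Rank same-named candidates by how well their roles match the profile."""
    same_name = [c for c in candidates if c.name.lower() == profile.name.lower()]
    return sorted(same_name, key=lambda c: role_overlap(profile, c), reverse=True)


# Usage with made-up data: two candidates share the name "Alex Carter".
profile = build_profile("Alex Carter", actions=["scored"], modifiers=["striker"])
candidates = [
    CandidateEntity("e1", "Alex Carter", {"politician"}),
    CandidateEntity("e2", "Alex Carter", {"athlete"}),
]
ranked = disambiguate(profile, candidates)
```

In this made-up example the athlete candidate ranks first, because the action "scored" and the modifier "striker" both map to the "athlete" role. A fuller system would presumably combine the role comparison with other profile properties, as the claim's reference to "both linguistic and contextual information" suggests.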
Abstract
Methods and systems for entity recognition and disambiguation using natural language processing techniques are provided. Example embodiments provide an entity recognition and disambiguation system (ERDS) and process that, based upon input of a text segment, automatically determines which entities are being referred to by the text using both natural language processing techniques and analysis of information gleaned from contextual data in the surrounding text. In at least some embodiments, supplemental or related information that can be used to assist in the recognition and/or disambiguation process can be retrieved from knowledge repositories such as an ontology knowledge base. In one embodiment, the ERDS comprises a linguistic analysis engine, a knowledge analysis engine, and a disambiguation engine that cooperate to identify candidate entities from a knowledge repository and determine which of the candidates best matches the one or more detected entities in a text segment using context information.
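The cooperation between the three engines named in the abstract can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the ERDS implementation: the function names, the in-memory `KNOWLEDGE_BASE` dictionary standing in for a knowledge repository, the capitalization heuristic for spotting mentions, and the word-overlap score are all hypothetical.

```python
import re

# Hypothetical in-memory stand-in for a knowledge repository such as an ontology.
KNOWLEDGE_BASE = {
    "jaguar": [
        {"id": "jaguar/animal", "context": {"jungle", "cat", "prey", "hunted"}},
        {"id": "jaguar/car", "context": {"engine", "drove", "dealer", "sedan"}},
    ],
}


def linguistic_analysis(text):
    """Toy linguistic analysis engine: return (mention, context words) pairs."""
    tokens = re.findall(r"[A-Za-z]+", text)
    results = []
    for i, token in enumerate(tokens):
        # Assumption: any capitalized token other than the first token
        # of the text is treated as a potential entity name.
        if i > 0 and token[0].isupper():
            context = {t.lower() for t in tokens if t != token}
            results.append((token, context))
    return results


def knowledge_analysis(mention):
    """Toy knowledge analysis engine: look up candidate entities by name."""
    return KNOWLEDGE_BASE.get(mention.lower(), [])


def disambiguation(context, candidates):
    """Toy disambiguation engine: pick the candidate sharing most context words."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: len(c["context"] & context))
    return best["id"]


def resolve(text):
    """Run the three engines in sequence over one text segment."""
    return {
        mention: disambiguation(context, knowledge_analysis(mention))
        for mention, context in linguistic_analysis(text)
    }


car = resolve("The dealer sold a Jaguar with a new engine")
animal = resolve("In the jungle a Jaguar hunted its prey")
```

With this toy data, the first sentence resolves "Jaguar" to the car entry and the second to the animal entry, because each sentence shares context words with only one candidate. A real system would replace the capitalization heuristic with part-of-speech tagging and grammatical-role analysis.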
185 Citations
24 Claims
1. A computer-implemented method for disambiguating one or more entities in an indicated text segment, comprising:
- processing the indicated text segment to determine a plurality of terms and their associated parts-of-speech tags and grammatical roles;
- performing linguistic analysis of the processed text segment to determine one or more potential entities which are referred to in the text segment by potential entity names;
- generating and storing, for each potential entity, an entity profile data structure storing one or more associated properties that characterize the entity based upon surrounding context and linguistic information, the surrounding context and linguistic information retrieved from the performed linguistic analysis of the processed text segment, by:
  - retrieving, from the linguistic analysis of the processed text segment, actions and/or modifiers associated with the determined potential entity name, the actions and/or modifiers appearing in the context surrounding the potential entity names in the indicated text segment;
  - determining one or more roles that are attributable to the potential entity based upon the retrieved actions and/or modifiers associated with the determined potential entity names; and
  - storing, in the entity profile data structure as part of the properties associated with the potential entity, the determined one or more roles; and
- disambiguating which entities are being referred to in the indicated text segment by determining one or more most likely entities that are referred to in the text segment by comparing, using both linguistic and contextual information, the entity profiles generated for each potential entity with attributes of one or more candidate entities, including a comparison of the determined one or more roles stored in the profile generated for each potential entity with the one or more roles of each of the candidate entities.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
16. A non-transitory computer-readable medium containing contents that, when executed, cause a computing system to perform a method comprising:
- processing the indicated text segment to determine a plurality of terms and their associated parts-of-speech tags and grammatical roles;
- performing linguistic analysis of the processed text segment to determine one or more potential entities which are referred to in the text segment by potential entity names;
- generating and storing, for each potential entity, an entity profile data structure storing one or more associated properties that characterize the entity based upon surrounding context and linguistic information, the surrounding context and linguistic information retrieved from the performed linguistic analysis of the processed text segment, by:
  - retrieving, from the linguistic analysis of the processed text segment, actions and/or modifiers associated with the determined potential entity name, the actions and/or modifiers appearing in the context surrounding the potential entity names in the indicated text segment;
  - determining one or more roles that are attributable to the potential entity based upon the retrieved actions and/or modifiers associated with the determined potential entity names; and
  - storing, in the entity profile data structure as part of the properties associated with the potential entity, the determined one or more roles; and
- automatically disambiguating which entities are being referred to in the indicated text segment by determining one or more most likely entities that are referred to in the text segment by comparing, using both linguistic and contextual information, the entity profiles generated for each potential entity with attributes of one or more candidate entities, including a comparison of the determined one or more roles stored in the profile generated for each potential entity with the one or more roles of each of the candidate entities.
Dependent claims: 17, 18, 19
20. An entity recognition and disambiguation computing system, comprising:
- a memory; and
- a recognition and disambiguation module stored in the memory that is configured, when executed, to:
  - receive a text segment for processing;
  - recognize one or more candidate named entities which are referred to by a potential entity detected in a received text segment based, at least in part, upon a natural language analysis of the text segment; and
  - disambiguate the candidate named entities to determine a single named entity to which the detected entity in the received text segment is deemed to refer based upon a combination of linguistic analysis, contextual information gleaned from text surrounding the detected entity mention in the received text segment, and information retrieved from one or more knowledge repositories, wherein the contextual information and linguistic analysis are used to determine and store, as part of an entity profile data structure generated for the detected entity, at least one or more roles of the detected entity based upon determining actions and/or modifiers from the linguistic analysis of the surrounding text, and wherein the one or more roles of the entity profile data structure generated for the detected entity are compared to one or more roles of the candidate entities to help disambiguate the candidate named entities.
Dependent claims: 21, 22, 23, 24
Specification