NLP-based entity recognition and disambiguation
Abstract
Methods and systems for entity recognition and disambiguation using natural language processing techniques are provided. Example embodiments provide an entity recognition and disambiguation system (ERDS) and process that, based upon input of a text segment, automatically determines which entities are being referred to by the text using both natural language processing techniques and analysis of information gleaned from contextual data in the surrounding text. In at least some embodiments, supplemental or related information that can be used to assist in the recognition and/or disambiguation process can be retrieved from knowledge repositories such as an ontology knowledge base. In one embodiment, the ERDS comprises a linguistic analysis engine, a knowledge analysis engine, and a disambiguation engine that cooperate to identify candidate entities from a knowledge repository and determine which of the candidates best matches the one or more detected entities in a text segment using context information.
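The cooperation of the three engines named in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the toy knowledge repository, the capitalization heuristic used in place of real linguistic analysis, and the attribute-overlap score. It is not the patented implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical knowledge-repository entry for one candidate entity."""
    entity_id: str
    attributes: frozenset = frozenset()


# Toy knowledge repository keyed by lowercase entity name (assumed layout).
KNOWLEDGE_BASE = {
    "jaguar": [
        Candidate("animal/jaguar", frozenset({"hunts", "prey", "jungle", "cat"})),
        Candidate("company/jaguar", frozenset({"engine", "car", "dealer", "sedan"})),
    ],
}


def linguistic_analysis(text):
    """Stand-in for the linguistic analysis engine.

    Treats any capitalized token that is not sentence-initial as a potential
    entity name and returns it with the other words as its context.
    """
    tokens = [t.strip(".,;:!?") for t in text.split()]
    tokens = [t for t in tokens if t]
    context = {t.lower() for t in tokens}
    return [
        (t, context - {t.lower()})
        for i, t in enumerate(tokens)
        if i > 0 and t[:1].isupper()
    ]


def knowledge_analysis(name):
    """Stand-in for the knowledge analysis engine: look up candidate entities."""
    return KNOWLEDGE_BASE.get(name.lower(), [])


def disambiguate(context, candidates):
    """Stand-in for the disambiguation engine: pick the best-overlapping candidate."""
    if not candidates:
        return None
    return max(candidates, key=lambda c: len(c.attributes & context))


def resolve(text):
    """Run the three stages in order and map each entity name to an entity id."""
    resolved = {}
    for name, context in linguistic_analysis(text):
        best = disambiguate(context, knowledge_analysis(name))
        if best is not None:
            resolved[name] = best.entity_id
    return resolved
```

With this toy data, the same name resolves differently depending on the surrounding words, which is the behavior the abstract describes.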
24 Claims
1. A computer-implemented method for disambiguating one or more entities in an indicated text segment to present entity information to a user using a web browser, comprising:
processing the indicated text segment to determine a plurality of terms and their associated parts-of-speech tags and grammatical roles;
performing linguistic analysis of the processed text segment to determine one or more potential entities which are referred to in the text segment by potential entity names;
generating and storing, for each potential entity, an entity profile data structure storing one or more associated properties that characterize the entity based upon surrounding context and linguistic information received from the performed linguistic analysis of the processed text segment, the entity profile properties including one or more roles attributable to the potential entity based upon actions and/or modifiers associated with the determined potential entity name that have been retrieved from a linguistic analysis of the surrounding context;
disambiguating which entities are being referred to in the indicated text segment by determining one or more most likely entities that are referred to in the text segment by comparing, using both linguistic and contextual information, the entity profiles generated for each potential entity with attributes of one or more candidate entities; and
presenting the entity information to the user using the web browser based on the disambiguation.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.
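The entity profile and comparison steps recited in claim 1 can be sketched as follows. This is an illustrative sketch only: the `Token` and `EntityProfile` classes, the tag names, the hand-tagged sentence, the candidate identifiers, and the overlap-count score are all hypothetical and are not taken from the patent. The parts-of-speech tags and grammatical roles are assumed to come from an upstream parser.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    text: str
    pos: str        # part-of-speech tag (assumed tag set), e.g. "PROPN", "VERB"
    role: str       # grammatical role, e.g. "subject", "modifier", "object"
    head: int = -1  # index of the token this one attaches to; -1 for the root


@dataclass
class EntityProfile:
    """Properties that characterize one potential entity."""
    name: str
    actions: set = field(default_factory=set)
    modifiers: set = field(default_factory=set)

    def roles(self):
        # Roles attributable to the entity: its actions plus its modifiers.
        return self.actions | self.modifiers


def build_profiles(tokens):
    """Create one profile per proper noun from its governing verb and modifiers."""
    profiles = []
    for i, tok in enumerate(tokens):
        if tok.pos != "PROPN":
            continue
        profile = EntityProfile(name=tok.text)
        if tok.head >= 0 and tokens[tok.head].pos == "VERB":
            profile.actions.add(tokens[tok.head].text.lower())
        for other in tokens:
            if other.head == i and other.role == "modifier":
                profile.modifiers.add(other.text.lower())
        profiles.append(profile)
    return profiles


def rank_candidates(profile, candidates, top_n=1):
    """Return up to top_n candidate ids whose attributes overlap the profile."""
    scored = [(len(profile.roles() & attrs), cid) for cid, attrs in candidates.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [cid for score, cid in scored[:top_n] if score > 0]


# Hand-tagged sentence: "Quarterback Manning threw a touchdown"
SENTENCE = [
    Token("Quarterback", "NOUN", "modifier", head=1),
    Token("Manning", "PROPN", "subject", head=2),
    Token("threw", "VERB", "root"),
    Token("a", "DET", "determiner", head=4),
    Token("touchdown", "NOUN", "object", head=2),
]

# Hypothetical candidate entities with their stored attributes.
CANDIDATES = {
    "person/manning_quarterback": {"quarterback", "threw", "football"},
    "place/manning_town": {"town", "county", "located"},
}
```

Here the modifier "Quarterback" and the action "threw" both land in the profile for "Manning", so the person candidate outranks the place candidate. A real system would presumably use a richer similarity measure than a raw overlap count.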
11. A non-transitory computer-readable medium containing contents that, when executed, cause a computing system to perform a method to present entity information to a user using a web browser comprising:
processing the indicated text segment to determine a plurality of terms and their associated parts-of-speech tags and grammatical roles;
performing linguistic analysis of the processed text segment to determine one or more potential entities which are referred to in the text segment by potential entity names;
generating and storing, for each potential entity, an entity profile data structure storing one or more associated properties that characterize the entity based upon surrounding context and linguistic information, the entity profile properties including one or more roles attributable to the potential entity based upon actions and/or modifiers associated with the entity name that have been retrieved from a linguistic analysis of the surrounding context;
automatically disambiguating which entities are being referred to in the indicated text segment by determining one or more most likely entities that are referred to in the text segment by comparing, using both linguistic and contextual information, the entity profiles generated for each potential entity with attributes of one or more candidate entities retrieved from a knowledge base; and
presenting the entity information to the user using the web browser based on the disambiguation.
Dependent claims: 12.
13. An entity recognition and disambiguation computing system to present entity information to a user using a web browser, comprising:
a memory;
a computer processor; and
a recognition and disambiguation module stored in the memory that is configured, when executed on the computer processor, to
receive a text segment for processing;
process the received text segment to determine a plurality of terms and their associated parts-of-speech tags and grammatical roles;
perform linguistic analysis of the processed text segment to determine one or more potential entities which are referred to in the processed text segment by potential entity names;
generate and store, for each potential entity, an entity profile data structure storing one or more associated properties that characterize the entity based upon surrounding context and linguistic information received from the linguistic analysis of the processed text segment, the entity profile properties including one or more roles attributable to the potential entity based upon actions and/or modifiers associated with the determined potential entity name that have been retrieved from a linguistic analysis of the surrounding context;
disambiguate the potential entities to determine a single named entity to which the received text segment is deemed to refer, based upon a combination of linguistic analysis, contextual information gleaned from text surrounding the potential entity name in the received text segment, and information retrieved from one or more knowledge repositories, by comparing the entity profiles generated for each potential entity with attributes of one or more stored candidate entities; and
present the entity information to the user using the web browser based on the disambiguation.
Dependent claims: 14, 15.
16. A computer-implemented method for presenting entity information to a user using a web browser comprising:
receiving an indication of a segment of text; and
invoking a recognition and disambiguation module to process the indicated text segment to automatically determine one or more named entities referred to in the text segment, wherein the recognition and disambiguation module is configured to
receive the indicated text segment for processing;
process the indicated text segment to determine a plurality of terms and their associated parts-of-speech tags and grammatical roles;
perform linguistic analysis of the processed text segment to determine one or more potential entities which are referred to in the processed text segment by potential entity names;
generate and store, for each potential entity, an entity profile data structure storing one or more associated properties that characterize the entity based upon surrounding context and linguistic information received from the linguistic analysis of the processed text segment;
disambiguate the potential entities to determine one or more named entities to which the received text segment is deemed to refer, based upon a combination of linguistic analysis, contextual information gleaned from text surrounding the potential entity names in the received text segment, and information retrieved from one or more knowledge repositories, by comparing the entity profiles generated for each potential entity with attributes of one or more candidate entities; and
for each determined one or more named entities, cause entity information to be annotated by presenting, to the user using the web browser, a link to the entity information associated with the named entity based on the disambiguation.
Dependent claims: 17, 18, 19, 20, 21, 22, 23, 24.
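The final annotation step of claim 16, presenting a link to entity information for each resolved named entity, might look like the sketch below. The `annotate` function, the `https://example.com/entity/` base URL, and the entity identifiers are hypothetical placeholders; the claim does not specify how links are built or rendered.

```python
import html
import re


def annotate(text, resolved, base_url="https://example.com/entity/"):
    """Wrap each resolved entity name in an HTML link to its entity page.

    `resolved` maps an entity name as it appears in the text to an entity id
    produced by an earlier disambiguation step (assumed input).
    """
    if not resolved:
        return html.escape(text)
    # Try longer names first so a long name is not split by a shorter one.
    names = sorted(resolved, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    parts, last = [], 0
    for match in pattern.finditer(text):
        name = match.group(1)
        parts.append(html.escape(text[last:match.start()]))
        href = html.escape(base_url + resolved[name], quote=True)
        parts.append(f'<a href="{href}">{html.escape(name)}</a>')
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)
```

Escaping the surrounding text keeps the output safe to place in a web page, and leaving unresolved names untouched means only disambiguated entities receive links.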
Specification