NLP-based entity recognition and disambiguation
3 Assignments
0 Petitions
Abstract
Methods and systems for entity recognition and disambiguation using natural language processing techniques are provided. Example embodiments provide an entity recognition and disambiguation system (ERDS) and process that, based upon input of a text segment, automatically determines which entities are being referred to by the text using both natural language processing techniques and analysis of information gleaned from contextual data in the surrounding text. In at least some embodiments, supplemental or related information that can be used to assist in the recognition and/or disambiguation process can be retrieved from knowledge repositories such as an ontology knowledge base. In one embodiment, the ERDS comprises a linguistic analysis engine, a knowledge analysis engine, and a disambiguation engine that cooperate to identify candidate entities from a knowledge repository and determine which of the candidates best matches the one or more detected entities in a text segment using context information.
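To make the division of labor among the three engines concrete, the following is a minimal sketch, not the patented implementation. The class names mirror the engines named in the abstract, but every method name (`detect`, `candidates_for`, `pick`), the `Candidate` type, the capitalization heuristic, the keyword-overlap scoring, and the in-memory `repository` are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    entity_id: str
    keywords: frozenset  # terms associated with the entity in the repository


class LinguisticAnalysisEngine:
    """Toy detector: each run of capitalized words is one detected entity."""

    def detect(self, text):
        mentions, run = [], []
        for raw in text.split():
            token = raw.strip(".,;:!?")
            if token[:1].isupper():
                run.append(token)
            elif run:
                mentions.append(" ".join(run))
                run = []
        if run:
            mentions.append(" ".join(run))
        return mentions


class KnowledgeAnalysisEngine:
    """Looks up candidate entities for a mention in an in-memory repository."""

    def __init__(self, repository):
        self.repository = repository  # maps lowercased name -> list of Candidate

    def candidates_for(self, mention):
        return self.repository.get(mention.lower(), [])


class DisambiguationEngine:
    """Picks the candidate whose keywords overlap most with the surrounding text."""

    def pick(self, candidates, text):
        context = {w.strip(".,;:!?").lower() for w in text.split()}
        return max(candidates, key=lambda c: len(c.keywords & context))


def resolve(text, linguistic, knowledge, disambiguation):
    """Run the three engines in sequence and map each mention to an entity id."""
    resolved = {}
    for mention in linguistic.detect(text):
        candidates = knowledge.candidates_for(mention)
        if candidates:
            resolved[mention] = disambiguation.pick(candidates, text).entity_id
    return resolved


# Hypothetical repository contents, invented for this example.
repository = {
    "jaguar": [
        Candidate("jaguar_animal", frozenset({"jungle", "cat", "prey"})),
        Candidate("jaguar_cars", frozenset({"car", "engine", "dealer"})),
    ]
}
result = resolve(
    "the new Jaguar has a powerful engine",
    LinguisticAnalysisEngine(),
    KnowledgeAnalysisEngine(repository),
    DisambiguationEngine(),
)
print(result)  # {'Jaguar': 'jaguar_cars'}
```

A real linguistic analysis engine would rely on parsing rather than capitalization, and the knowledge repository would typically be an ontology knowledge base rather than a dictionary; the sketch only shows how the three roles hand data to one another.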
614 Citations
24 Claims
1. A computer-implemented method for identifying one or more entities in an indicated text segment, comprising:
- processing the indicated text segment to determine a plurality of terms and their associated parts-of-speech tags and grammatical roles;
- performing linguistic analysis of the processed text segment to determine one or more potential entity names which are referred to in the text segment;
- generating, for each potential entity name, an entity profile having one or more associated properties that characterize the entity based upon surrounding context; and
- determining one or more most likely entities which are referred to in the text segment by comparing the entity profiles generated for each potential entity name with one or more candidate entities using both linguistic and contextual information.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
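The four steps of claim 1 can be walked through with a toy example. This is a sketch under stated assumptions rather than the claimed method itself: the `LEXICON` lookup stands in for a real part-of-speech tagger, the subject/object guess is a crude word-order heuristic, and the candidate fields `keywords` and `can_act` are hypothetical properties invented here to show one linguistic signal and one contextual signal being combined.

```python
from dataclasses import dataclass

# Hypothetical mini-lexicon standing in for a real part-of-speech tagger.
LEXICON = {"the": "DET", "a": "DET", "in": "PREP", "scored": "VERB", "borders": "VERB"}


@dataclass
class EntityProfile:
    name: str
    role: str      # crude grammatical role: "subject" or "object"
    context: set   # lowercased content words found around the name


def tag_terms(text):
    """Step 1: split into terms and attach a POS tag and a rough grammatical role."""
    tagged, seen_verb = [], False
    for raw in text.split():
        term = raw.strip(".,;:!?")
        if not term:
            continue
        # Unknown capitalized terms are assumed to be proper nouns (toy rule).
        pos = LEXICON.get(term.lower(), "PROPN" if term[0].isupper() else "NOUN")
        if pos == "VERB":
            role, seen_verb = "predicate", True
        elif pos in ("NOUN", "PROPN"):
            role = "object" if seen_verb else "subject"
        else:
            role = "other"
        tagged.append((term, pos, role))
    return tagged


def build_profiles(tagged):
    """Steps 2 and 3: proper nouns become potential entity names, each with a profile."""
    content = {t.lower() for t, pos, _ in tagged if pos in ("NOUN", "VERB")}
    return [EntityProfile(t, role, set(content)) for t, pos, role in tagged if pos == "PROPN"]


def rank_candidates(profile, candidates):
    """Step 4: order candidates by a contextual score plus a linguistic score."""
    def score(cand):
        contextual = len(profile.context & cand["keywords"])
        linguistic = 1 if profile.role == "subject" and cand["can_act"] else 0
        return contextual + linguistic
    return sorted(candidates, key=score, reverse=True)


# Hypothetical candidate entities, invented for this example.
candidates = [
    {"id": "jordan_country", "keywords": {"amman", "river", "border"}, "can_act": False},
    {"id": "jordan_person", "keywords": {"basketball", "game", "points", "scored"}, "can_act": True},
]
profiles = build_profiles(tag_terms("Jordan scored 30 points in the game."))
best = rank_candidates(profiles[0], candidates)[0]
print(best["id"])  # jordan_person
```

Because the toy tagger treats any unknown capitalized term as a proper noun, a sentence-initial common word would also be flagged; a real tagger and parser would be needed to avoid that.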
16. A computer-readable medium containing contents that, when executed, cause a computing system to perform a method comprising:
- processing the indicated text segment to determine a plurality of terms and their associated parts-of-speech tags and grammatical roles;
- performing linguistic analysis of the processed text segment to determine one or more potential entity names which are referred to in the text segment;
- generating, for each potential entity name, an entity profile having one or more associated properties that characterize the entity based upon surrounding context; and
- automatically determining one or more most likely entities which are referred to in the text segment by comparing the entity profiles generated for each potential entity name with one or more candidate entities from a knowledge base using both linguistic and contextual information.
Dependent claims: 17, 18, 19
20. An entity recognition and disambiguation computing system, comprising:
- a memory; and
- a recognition and disambiguation module stored in the memory that is configured, when executed, to
  - receive a text segment for processing;
  - recognize one or more candidate named entities which are referred to by a detected entity in a received text segment based, at least in part, upon a natural language analysis of the text segment; and
  - disambiguate the candidate named entities to determine a single named entity to which the detected entity in the received text segment is deemed to refer based upon a combination of linguistic analysis, contextual information gleaned from surrounding text, and information retrieved from one or more knowledge repositories.
Dependent claims: 21, 22, 23, 24
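Claim 20 requires that the three kinds of evidence be combined to settle on a single named entity, but the claim text does not say how. One simple possibility, shown here only as an illustration, is a weighted sum followed by an argmax. The `Evidence` type, the score ranges, and the `WEIGHTS` values are all assumptions, not details taken from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    entity_id: str
    linguistic: float  # e.g. fit between the mention's grammatical role and the entity type
    contextual: float  # e.g. overlap between surrounding text and the entity's description
    knowledge: float   # e.g. relatedness information retrieved from a knowledge repository


# Hypothetical weights; the source does not specify how the signals are combined.
WEIGHTS = {"linguistic": 0.3, "contextual": 0.5, "knowledge": 0.2}


def combined_score(ev, weights=WEIGHTS):
    """Weighted sum of the three evidence scores for one candidate."""
    return (
        weights["linguistic"] * ev.linguistic
        + weights["contextual"] * ev.contextual
        + weights["knowledge"] * ev.knowledge
    )


def disambiguate(evidence):
    """Return the id of the single best-scoring candidate, or None if there are none."""
    if not evidence:
        return None
    return max(evidence, key=combined_score).entity_id


evidence = [
    Evidence("candidate_a", linguistic=1.0, contextual=0.0, knowledge=0.5),
    Evidence("candidate_b", linguistic=0.0, contextual=1.0, knowledge=0.5),
]
print(disambiguate(evidence))  # candidate_b
```

With these weights the contextual signal dominates, so `candidate_b` wins; a different weighting, or a learned model in place of the fixed sum, would be an equally plausible reading of the claim.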
Specification