Ambiguous entity disambiguation method
First Claim
1. An ambiguous entity disambiguation method, wherein an article comprises entities and each entity is a single-word or a multi-word entity, wherein at least one entity has an ambiguous meaning, the method comprising the steps of:
providing a disambiguation database which references a digital encyclopedia database, the disambiguation database comprising links to redirect pages of the digital encyclopedia database, links to disambiguation pages of the digital encyclopedia database, and for each redirect page and disambiguation page, the popularity of the page and the type of page;
extracting entities from the article;
combining multi-word entities;
creating entity aliases for combined multi-word entities;
searching the disambiguation database for pages in the digital encyclopedia database matching each extracted entity and entity alias;
for each matching page, creating a list of links to other encyclopedia pages;
scoring each extracted entity and entity alias according to the list of links and disambiguation database;
adjusting each of the scores; and
for each entity, selecting the highest scoring entity alias;
whereby the entity type for each entity is the type of matching page for the highest scoring entity alias in the disambiguation database.
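The claim does not say how entities are extracted, how multi-word entities are combined, or which aliases are generated. The Python sketch below shows one possible reading of the first steps, under stated assumptions: every capitalized word counts as an entity, capitalized words at adjacent positions are combined into one multi-word entity, every contiguous sub-span of a combined entity is an alias, and the disambiguation database is an in-memory dictionary keyed by lowercase page title. The names `DisambigEntry`, `extract_entities`, `combine_multiword`, `make_aliases` and `search`, and the contents of `db`, are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class DisambigEntry:
    """One row of a hypothetical in-memory disambiguation database."""
    title: str       # encyclopedia page title
    kind: str        # "article", "redirect" or "disambiguation" (assumed labels)
    targets: list    # redirect target or disambiguation candidates
    popularity: int  # assumed popularity metric, e.g. page views
    page_type: str   # assumed type label, e.g. "Person" or "Place"


def extract_entities(text):
    """Assumption: every capitalized word is an entity; returns (word, position)."""
    words = re.findall(r"[A-Za-z]+", text)
    return [(w, i) for i, w in enumerate(words) if w[0].isupper()]


def combine_multiword(tokens):
    """Merge entities at consecutive word positions into one multi-word entity."""
    entities, current, last = [], [], None
    for word, pos in tokens:
        if current and pos == last + 1:
            current.append(word)
        else:
            if current:
                entities.append(current)
            current = [word]
        last = pos
    if current:
        entities.append(current)
    return entities


def make_aliases(words):
    """Assumed alias rule: every contiguous sub-span, longest first."""
    n = len(words)
    return [" ".join(words[i:i + size])
            for size in range(n, 0, -1)
            for i in range(n - size + 1)]


def search(db, alias):
    """Return pages matching an alias, expanding redirect and disambiguation rows."""
    entry = db.get(alias.lower())
    if entry is None:
        return []
    if entry.kind in ("redirect", "disambiguation"):
        return [db[t.lower()] for t in entry.targets if t.lower() in db]
    return [entry]


# Toy database; titles, popularity values and types are invented for illustration.
db = {
    "michael jordan": DisambigEntry("Michael Jordan", "article", [], 900, "Person"),
    "jordan (country)": DisambigEntry("Jordan (country)", "article", [], 700, "Place"),
    "paris": DisambigEntry("Paris", "article", [], 1000, "Place"),
    "mj": DisambigEntry("MJ", "redirect", ["Michael Jordan"], 50, "Person"),
    "jordan": DisambigEntry("Jordan", "disambiguation",
                            ["Jordan (country)", "Michael Jordan"], 300, "Disambiguation"),
}

for words in combine_multiword(extract_entities("Jordan met Michael Jordan in Paris.")):
    for alias in make_aliases(words):
        print(alias, "->", [e.title for e in search(db, alias)])
```

With this toy data, the lone "Jordan" still matches two candidate pages; choosing between them is left to the scoring, adjusting and selecting steps of the claim.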
Abstract
Ambiguous entities extracted from an article are disambiguated to determine an entity type. Entities are extracted from the article, multi-word entities are combined, and entity aliases are created. The entity type is determined by searching a disambiguation database for matching pages in a digital encyclopedia database. A score is computed for each entity and entity alias according to the number of links in the matching pages and the popularity recorded for the matching pages in the disambiguation database. The highest-scoring entity alias is selected, and the entity type is the page type of its matching page. Abstracts for the entities may also be retrieved from the matching pages.
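The abstract names two scoring signals, the links in the matching pages and the page popularity, but not how they are combined or how the scores are adjusted. A minimal sketch, assuming the link signal is the number of a candidate page's links that point to pages matched by other entities in the same article, that popularity is damped with a logarithm and scaled by a hypothetical `pop_weight`, and that the adjustment is a small bonus for multi-word aliases. `Candidate`, `score`, `adjust` and `resolve` are illustrative names, not the patent's.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """A matching encyclopedia page for one entity or entity alias (hypothetical shape)."""
    alias: str       # entity or entity alias that matched
    title: str       # matching encyclopedia page
    links: set       # titles this page links to (its "list of links")
    popularity: int  # popularity stored in the disambiguation database
    page_type: str   # page type stored in the disambiguation database


def score(c, context_titles, pop_weight=0.5):
    """Assumed score: links into other matched pages plus log-damped popularity."""
    link_hits = len(c.links & context_titles)
    return link_hits + pop_weight * math.log1p(c.popularity)


def adjust(raw, c):
    """Assumed adjustment: +0.1 per extra alias word so fuller aliases win near-ties."""
    return raw + 0.1 * (len(c.alias.split()) - 1)


def resolve(candidates, context_titles):
    """Select the highest-scoring candidate; its page type becomes the entity type."""
    best = max(candidates, key=lambda c: adjust(score(c, context_titles), c))
    return best.title, best.page_type


# Toy candidates for the ambiguous entity "Jordan"; all values are invented.
cands = [
    Candidate("Jordan", "Jordan (country)", {"Amman", "Middle East"}, 700, "Place"),
    Candidate("Jordan", "Michael Jordan", {"Chicago Bulls", "NBA"}, 900, "Person"),
]
print(resolve(cands, {"Chicago Bulls", "NBA"}))
print(resolve(cands, {"Amman"}))
```

With context {"Chicago Bulls", "NBA"} the sketch returns ('Michael Jordan', 'Person'); with context {"Amman"} it returns ('Jordan (country)', 'Place'), so a shared link can outweigh a modest popularity lead under these assumed weights.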
16 Claims
11. An ambiguous entity disambiguation method for an entity in an article, the method comprising:
providing a digital encyclopedia database;
creating a disambiguation database from the digital encyclopedia database; and
determining the entity type of the entity in the article from the disambiguation database and digital encyclopedia database.
16. A computer program product for ambiguous entity disambiguation, wherein an article comprises entities and each entity is a single-word or a multi-word entity, wherein at least one entity has an ambiguous meaning, the program product comprising:
a computer readable medium;
disambiguation database means stored on said computer readable medium for providing a disambiguation database which references a digital encyclopedia database, the disambiguation database comprising links to redirect pages of the digital encyclopedia database, links to disambiguation pages of the digital encyclopedia database, and for each redirect page and disambiguation page, the popularity of the page and the type of page;
extracting entities means stored on said computer readable medium for extracting entities from the article;
combining means stored on said computer readable medium for combining multi-word entities;
creating means stored on said computer readable medium for creating entity aliases for combined multi-word entities;
searching means stored on said computer readable medium for searching the disambiguation database for pages in the digital encyclopedia database matching each extracted entity and entity alias;
creating means stored on said computer readable medium for creating a list of links for each matching page to other encyclopedia pages;
scoring means stored on said computer readable medium for scoring each extracted entity and entity alias according to the list of links and disambiguation database;
adjusting means stored on said computer readable medium for adjusting each of the scores; and
selecting means stored on said computer readable medium for selecting the highest scoring entity alias for each entity.
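Claim 11 recites creating the disambiguation database from the digital encyclopedia database without fixing a source format. Purely as an assumption, the sketch below treats the encyclopedia as MediaWiki-style wikitext, in which a redirect page begins with `#REDIRECT [[Target]]` and a disambiguation page carries a `{{disambiguation}}` template. The input tuple shape, the use of view counts as popularity, and the names `Row` and `build_disambiguation_db` are hypothetical.

```python
import re
from dataclasses import dataclass

# Assumption: MediaWiki-style markup for redirects and internal links.
REDIRECT_RE = re.compile(r"#REDIRECT\s*\[\[([^\]|#]+)", re.IGNORECASE)
LINK_RE = re.compile(r"\[\[([^\]|#]+)")


@dataclass
class Row:
    """One disambiguation database row (hypothetical layout)."""
    title: str
    kind: str        # "article", "redirect" or "disambiguation"
    targets: list    # redirect target or disambiguation candidates
    popularity: int  # assumption: page view count
    page_type: str


def build_disambiguation_db(pages):
    """pages: iterable of (title, wikitext, page_type, views) tuples (assumed input shape)."""
    db = {}
    for title, text, page_type, views in pages:
        m = REDIRECT_RE.match(text.strip())
        if m:
            kind, targets = "redirect", [m.group(1).strip()]
        elif "{{disambiguation" in text.lower():
            # Every internal link on a disambiguation page is treated as a candidate.
            kind, targets = "disambiguation", [t.strip() for t in LINK_RE.findall(text)]
        else:
            kind, targets = "article", []
        db[title.lower()] = Row(title, kind, targets, views, page_type)
    return db


# Toy pages; text, types and view counts are invented for illustration.
pages = [
    ("Michael Jordan", "'''Michael Jordan''' played for the [[Chicago Bulls]].", "Person", 900),
    ("MJ", "#REDIRECT [[Michael Jordan]]", "Person", 50),
    ("Jordan", "'''Jordan''' may refer to:\n* [[Jordan (country)]]\n* [[Michael Jordan]]\n{{disambiguation}}",
     "Disambiguation", 300),
]
db = build_disambiguation_db(pages)
for key, row in db.items():
    print(key, row.kind, row.targets)
```

Each row keeps the kind of page, its link targets, a popularity value and a page type, which are the fields claim 1 lists for redirect and disambiguation pages; ordinary articles are stored too so that redirect and disambiguation targets can be looked up. A real build would also have to parse the encyclopedia's dump format, follow redirect chains, and decide how popularity is measured, none of which the claims specify.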
Specification