Learning synonymous object names from anchor texts
Abstract
A repository contains objects representing entities. The objects also include facts about the represented entities. The facts are derived from source documents. A synonymous name of an object is determined by identifying a source document from which one or more facts of the entity represented by the object were derived, identifying a plurality of linking documents that link to the source document through hyperlinks, each hyperlink having an anchor text, processing the anchor texts in the plurality of linking documents to generate a collection of synonym candidates for the entity represented by the object, and selecting a synonymous name for the entity represented by the object from the collection of synonym candidates.
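The candidate-generation step described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `AnchorCollector` and `collect_candidates` are hypothetical, linking documents are assumed to be available as HTML strings, and a hyperlink is assumed to match only when its `href` equals the source document URL exactly.

```python
from collections import Counter
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects the anchor text of every hyperlink pointing at target_url."""

    def __init__(self, target_url):
        super().__init__()
        self.target_url = target_url
        self.anchors = []
        self._buffer = None  # text pieces of the link being read, else None

    def handle_starttag(self, tag, attrs):
        # Assumption: exact href match; real URLs would need normalization.
        if tag == "a" and dict(attrs).get("href") == self.target_url:
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._buffer is not None:
            # Collapse runs of whitespace inside the anchor text.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.anchors.append(text)
            self._buffer = None


def collect_candidates(source_url, linking_pages):
    """Return a Counter mapping each anchor text to its number of occurrences."""
    candidates = Counter()
    for html in linking_pages:
        parser = AnchorCollector(source_url)
        parser.feed(html)
        parser.close()
        candidates.update(parser.anchors)
    return candidates
```

Keeping the candidates in a `Counter` preserves how often each anchor text occurs, which a later selection step can use.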
27 Claims
1. A method of determining a synonymous name for an entity represented by an object stored in a fact repository, comprising:
at a server having a plurality of processors and memory storing the repository and programs configured for execution by the processors, wherein the repository includes a plurality of facts extracted from web documents, wherein a subset of the facts is associated with the object, and wherein the object has an object name associated with the entity, identifying a source document from which one or more of the subset of facts of the entity represented by the object were derived;
identifying a plurality of linking documents having hyperlinks to the source document, each hyperlink having an anchor text;
generating a collection of synonym candidates for the entity using the anchor texts in the plurality of linking documents;
selecting a synonymous name for the entity represented by the object from the collection of synonym candidates, wherein selecting the synonymous name for the entity represented by the object from the collection of synonym candidates further comprises:
determining a score for each synonym candidate in the collection of synonym candidates based on a score function, the score function taking into account:
a frequency of occurrence of the synonym candidate in the collection of synonym candidates, and
a proportion of the synonym candidate in the collection of synonym candidates;
selecting the synonymous name for the entity represented by the object from the collection of synonym candidates based on their scores; and
storing the synonymous name in the repository in association with the object in addition to the object name.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
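The scoring and selection steps of claim 1 can be illustrated with a short sketch. The claim states that the score function takes frequency and proportion into account but does not give a formula, so the product used here (frequency times proportion) is purely an assumption for illustration. The function names, and the choice to skip a candidate that equals the existing object name, are likewise hypothetical.

```python
from collections import Counter


def score_candidates(anchor_texts):
    """Score each distinct candidate in a list of anchor texts."""
    counts = Counter(anchor_texts)
    total = sum(counts.values())
    scores = {}
    for candidate, frequency in counts.items():
        proportion = frequency / total
        # Assumed combination: the claim names both inputs, not the formula.
        scores[candidate] = frequency * proportion
    return scores


def select_synonym(anchor_texts, object_name):
    """Return the best-scoring candidate that differs from object_name."""
    scores = score_candidates(anchor_texts)
    # Assumption: the existing object name is not a useful synonym of itself.
    ranked = sorted(
        (c for c in scores if c != object_name),
        key=lambda c: (-scores[c], c),  # highest score first, ties by name
    )
    return ranked[0] if ranked else None
```

For example, with three occurrences of "IBM", two of "Big Blue" and one of "click here", the sketch gives "IBM" a score of 3 × 3/6 = 1.5 and selects it unless "IBM" is already the object name.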
15. A system for determining a synonymous name for an entity represented by an object stored in a repository, comprising:
a processor for executing programs;
memory for storing the repository and the programs, wherein the repository includes a plurality of facts extracted from web documents, wherein a subset of the facts is associated with the object, and wherein the object has an object name associated with the entity; and
a subsystem executable by the processor, the subsystem including:
instructions for identifying a source document from which one or more of the subset of facts of the entity represented by the object were derived;
instructions for identifying a plurality of linking documents having hyperlinks to the source document, each hyperlink having an anchor text;
instructions for generating a collection of synonym candidates for the entity using the anchor texts in the plurality of linking documents;
instructions for selecting a synonymous name for the entity represented by the object from the collection of synonym candidates, wherein selecting the synonymous name for the entity represented by the object from the collection of synonym candidates further comprises:
determining a score for each synonym candidate in the collection of synonym candidates based on a score function, the score function taking into account:
a frequency of occurrence of the synonym candidate in the collection of synonym candidates, and
a proportion of the synonym candidate in the collection of synonym candidates;
selecting the synonymous name for the entity represented by the object from the collection of synonym candidates based on their scores; and
instructions for storing the synonymous name in the repository in association with the object in addition to the object name.
Dependent claims: 16, 17, 21, 22, 23, 24, 25, 26, 27
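One way to picture the repository that claim 15 relies on is sketched below. The claim does not prescribe a data model, so every class, field and method name here (`Fact`, `RepositoryObject`, `FactRepository`, `source_documents`, `store_synonym`) is a hypothetical choice made for illustration. Each fact is assumed to record the URL of the document it was extracted from, which is what makes the source-document lookup possible.

```python
from dataclasses import dataclass, field


@dataclass
class Fact:
    attribute: str
    value: str
    source_url: str  # assumed: each fact remembers where it was extracted


@dataclass
class RepositoryObject:
    object_id: str
    name: str  # the object name associated with the entity
    facts: list = field(default_factory=list)
    synonyms: list = field(default_factory=list)


class FactRepository:
    def __init__(self):
        self._objects = {}

    def add_object(self, obj):
        self._objects[obj.object_id] = obj

    def source_documents(self, object_id):
        """Distinct source URLs of the object's facts, in first-seen order."""
        obj = self._objects[object_id]
        return list(dict.fromkeys(f.source_url for f in obj.facts))

    def store_synonym(self, object_id, synonym):
        """Store a synonym alongside the object name, without replacing it."""
        obj = self._objects[object_id]
        if synonym != obj.name and synonym not in obj.synonyms:
            obj.synonyms.append(synonym)

    def names_for(self, object_id):
        obj = self._objects[object_id]
        return [obj.name] + obj.synonyms
```

Storing synonyms in a separate list keeps the original object name intact, matching the claim's wording that the synonymous name is stored "in addition to the object name".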
18. A computer program product stored on a non-transitory computer readable storage medium and for use in conjunction with a computer system, the computer program product comprising a computer program mechanism embedded therein, the computer program mechanism including:
instructions for identifying a source document from which one or more facts of an entity represented by an object were derived, the facts and the object being stored in a repository that can be accessed by the computer system, the facts being associated with the object, and the object having an object name associated with the entity;
instructions for identifying a plurality of linking documents having hyperlinks to the source document, each hyperlink having an anchor text;
instructions for generating a collection of synonym candidates for the entity using the anchor texts in the plurality of linking documents;
instructions for selecting a synonymous name for the entity represented by the object from the collection of synonym candidates, wherein selecting the synonymous name for the entity represented by the object from the collection of synonym candidates further comprises:
determining a score for each synonym candidate in the collection of synonym candidates based on a score function, the score function taking into account:
a frequency of occurrence of the synonym candidate in the collection of synonym candidates, and
a proportion of the synonym candidate in the collection of synonym candidates;
selecting the synonymous name for the entity represented by the object from the collection of synonym candidates based on their scores; and
instructions for storing the synonymous name in the repository in association with the object in addition to the object name.
Dependent claims: 19, 20
Specification