Disambiguation and tagging of entities
1 Assignment
0 Petitions
Abstract
Tagging of content items and entities identified therein may include a matching process, a classification process, and a disambiguation process. Matching may include identifying potentially matching candidate entities in a content item, whereas classification may categorize or group the identified candidate entities according to the known entities they are likely to match. In some instances, a candidate entity may be categorized with multiple known entities. Accordingly, a disambiguation process may be used to reduce the potential matches to a single known entity. In one example, the disambiguation process may include ranking the potentially matching known entities according to a hierarchy of criteria.
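The abstract's three stages can be illustrated with a toy pipeline. This is a minimal sketch under assumptions that are not from the source: the catalog `KNOWN`, the relation map `RELATED`, and the `popularity` field are hypothetical; matching is an exact single-token lookup; and the "hierarchy of criteria" is modeled as a tuple compared lexicographically, so popularity only decides between entities that contextual support leaves tied.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class KnownEntity:
    entity_id: str
    label: str
    popularity: int  # hypothetical fallback criterion

# Hypothetical catalog: lowercase surface form -> known entities it may denote.
KNOWN = {
    "jordan": [
        KnownEntity("p1", "Michael Jordan (basketball player)", 80),
        KnownEntity("p2", "Michael B. Jordan (actor)", 90),
    ],
    "bulls": [KnownEntity("t1", "Chicago Bulls", 70)],
}

# Hypothetical relations between known entities, used as contextual evidence.
RELATED = {"p1": {"t1"}, "p2": set(), "t1": {"p1"}}

def match(text):
    """Matching: find tokens that may refer to some known entity."""
    tokens = (t.strip(".,;:!?").lower() for t in text.split())
    return [t for t in tokens if t in KNOWN]

def classify(candidates):
    """Classification: group each candidate with every known entity it may match."""
    return {c: KNOWN[c] for c in candidates}

def disambiguate(groups):
    """Disambiguation: reduce each group to a single known entity.

    Ranking key, compared lexicographically (a hierarchy of criteria):
      1. how many unambiguous entities in the same text it is related to,
      2. popularity, which only breaks ties left by criterion 1.
    """
    anchors = {opts[0].entity_id for opts in groups.values() if len(opts) == 1}
    return {
        surface: max(opts, key=lambda e: (len(RELATED[e.entity_id] & anchors), e.popularity))
        for surface, opts in groups.items()
    }

def tag(text):
    return disambiguate(classify(match(text)))

print(tag("Jordan scored 40 for the Bulls.")["jordan"].label)
print(tag("Jordan starred in a new film.")["jordan"].label)
```

Because Python compares tuples element by element, adding a further criterion to the hierarchy amounts to appending it to the ranking key.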
208 Citations
25 Claims
1. A method comprising:
determining, by a computing device, a name in a sequence of text that identifies two or more candidate persons;
creating a first reference chain for a first candidate person of the two or more candidate persons;
creating a second reference chain for a second candidate person of the two or more candidate persons;
determining that the first reference chain and the second reference chain both comprise the name as conflicted entities;
determining first co-occurrence information based on one or more unconflicted entities, from the first reference chain, occurring in the sequence of text;
determining second co-occurrence information based on one or more unconflicted entities, from the second reference chain, occurring in the sequence of text;
determining, based on a comparison of the first co-occurrence information and the second co-occurrence information, a highest-ranked reference chain from the first reference chain and the second reference chain; and
determining, based on the highest-ranked reference chain, a person of the two or more candidate persons as being identified by the name.
Dependent claims: 2 through 17.
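The steps of claim 1 can be sketched in Python as follows. This is an illustrative sketch only: the reference-chain representation (a set of entity strings per candidate), whole-word regex matching as the measure of co-occurrence, and the example data are assumptions, not details taken from the claim. It also accepts any number of candidates rather than exactly two.

```python
import re
from collections import Counter

def count_mentions(entity, text):
    """Count whole-word, case-insensitive mentions of an entity string in text."""
    return len(re.findall(r"\b" + re.escape(entity) + r"\b", text, flags=re.IGNORECASE))

def highest_ranked_chain(name, text, chains):
    """Return (best candidate, scores) for an ambiguous name.

    chains maps each candidate person to the set of entity strings in that
    person's reference chain (a hypothetical representation).
    """
    # Entities found in more than one chain are conflicted; the name must be one.
    seen = Counter(entity for chain in chains.values() for entity in chain)
    conflicted = {entity for entity, n in seen.items() if n > 1}
    if name not in conflicted:
        raise ValueError(f"{name!r} is not shared by two or more reference chains")
    # Co-occurrence information: mentions of each chain's unconflicted entities.
    scores = {
        candidate: sum(count_mentions(e, text) for e in chain - conflicted)
        for candidate, chain in chains.items()
    }
    # max() keeps the first candidate on a tie; the claim does not specify a tie rule.
    best = max(scores, key=scores.get)
    return best, scores

chains = {
    "actor": {"Chris Evans", "Captain America", "Marvel"},
    "presenter": {"Chris Evans", "BBC Radio 2", "TFI Friday"},
}
best, scores = highest_ranked_chain(
    "Chris Evans", "Chris Evans returns to BBC Radio 2 this morning.", chains
)
print(best, scores)
```

Excluding the shared name from scoring is what makes the comparison meaningful: it appears in every chain, so only the entities unique to one candidate can tell the candidates apart.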
18. A method comprising:
determining, by a computing device, a title in a textual-content item, the title corresponding to a plurality of candidate content assets;
creating a first reference chain for a first candidate video content asset of the plurality of candidate content assets, the first reference chain comprising the title;
creating a second reference chain for a second candidate content asset of the plurality of candidate content assets, the second reference chain comprising the title;
determining first co-occurrence information based on one or more unconflicted entities from the first reference chain for the first candidate content asset, occurring in the textual-content item;
determining second co-occurrence information based on one or more unconflicted entities, from the second reference chain for the second candidate content asset, occurring in the textual-content item;
determining a highest-ranked reference chain from the first reference chain and the second reference chain based on the first co-occurrence information and the second co-occurrence information; and
determining, based on the highest-ranked reference chain, one of the first candidate content asset and the second candidate content asset as being identified by the title.
Dependent claims: 19 through 22.
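Claim 18 applies the same idea to titles of content assets. The sketch below is hypothetical throughout: `ContentAsset`, `identify_title`, the related-entity sets, and the two-level ranking key (distinct unconflicted entities found, then total mentions) are assumptions chosen for illustration. Only assets whose title matches become candidates, so the "Arrival" asset in the example is never considered even though it shares a director with one of the candidates.

```python
import re
from dataclasses import dataclass, field

@dataclass
class ContentAsset:
    asset_id: str
    title: str
    related: set = field(default_factory=set)  # hypothetical: director, cast, ...

    def reference_chain(self):
        """Chain comprising the title plus the asset's related entities."""
        return {self.title} | self.related

def count_mentions(entity, text):
    """Count whole-word, case-insensitive mentions of an entity string in text."""
    return len(re.findall(r"\b" + re.escape(entity) + r"\b", text, flags=re.IGNORECASE))

def identify_title(title, text, assets):
    """Return the asset_id the title most likely identifies, or None if no asset has it."""
    chains = {a.asset_id: a.reference_chain() for a in assets if a.title == title}
    if not chains:
        return None
    # Entities shared by two or more chains (at least the title) are conflicted.
    pooled = [e for chain in chains.values() for e in chain]
    conflicted = {e for e in pooled if pooled.count(e) > 1}

    def cooccurrence(asset_id):
        hits = [count_mentions(e, text) for e in chains[asset_id] - conflicted]
        # Assumed ranking: distinct entities found first, total mentions second.
        return (sum(1 for h in hits if h > 0), sum(hits))

    return max(chains, key=cooccurrence)

assets = [
    ContentAsset("dune-1984", "Dune", {"David Lynch", "Kyle MacLachlan"}),
    ContentAsset("dune-2021", "Dune", {"Denis Villeneuve", "Zendaya"}),
    ContentAsset("arrival-2016", "Arrival", {"Denis Villeneuve", "Amy Adams"}),
]
print(identify_title("Dune", "Denis Villeneuve's Dune gives Zendaya a bigger role.", assets))
```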
23. A method comprising:
determining, by a computing device, an ambiguity of a name in a string of text associated with a piece of media content, wherein the ambiguity is based on the name identifying a plurality of persons;
creating a first reference chain for a first person of the plurality of persons, the first reference chain comprising the name;
creating a second reference chain for a second person of the plurality of persons, the second reference chain comprising the name;
determining first co-occurrence information based on one or more unconflicted entities, from the first reference chain for the first person of the plurality of persons occurring in the string of text associated with the piece of media content;
determining second co-occurrence information based on one or more unconflicted entities, from the second reference chain for the second person of the plurality of persons occurring in the string of text associated with the piece of media content;
determining a highest-ranked reference chain from the first reference chain and the second reference chain based on the first co-occurrence information and the second co-occurrence information; and
resolving the ambiguity based on the highest-ranked reference chain.
Dependent claims: 24 and 25.
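Claim 23 frames the procedure as first detecting an ambiguity and then resolving it. In this hypothetical sketch, `Person`, `resolve_name`, and the directory contents are assumptions, as is the policy of returning `None` when the evidence is empty or tied; the claim itself only requires resolving the ambiguity based on the highest-ranked reference chain.

```python
import re
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Person:
    person_id: str
    name: str
    known_for: frozenset  # hypothetical: works, roles, affiliations

def count_mentions(entity, text):
    """Count whole-word, case-insensitive mentions of an entity string in text."""
    return len(re.findall(r"\b" + re.escape(entity) + r"\b", text, flags=re.IGNORECASE))

def resolve_name(name, text, directory):
    """Return the person_id that `name` refers to in `text`, or None if unresolved."""
    persons = [p for p in directory if p.name == name]
    if len(persons) < 2:
        # No ambiguity: zero or one known person carries this name.
        return persons[0].person_id if persons else None
    # Ambiguity detected: one reference chain per person, each comprising the name.
    chains = {p.person_id: {name} | p.known_for for p in persons}
    counts = Counter(e for chain in chains.values() for e in chain)
    # Score each chain only on entities that appear in no other chain.
    scores = {
        pid: sum(count_mentions(e, text) for e in chain if counts[e] == 1)
        for pid, chain in chains.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    # Assumed policy: leave the ambiguity unresolved on zero evidence or a tie.
    if ranked[0][1] == 0 or ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]

directory = [
    Person("jw-composer", "John Williams", frozenset({"Jaws", "Star Wars"})),
    Person("jw-guitarist", "John Williams", frozenset({"Cavatina", "classical guitar"})),
    Person("hz", "Hans Zimmer", frozenset({"Inception"})),
]
print(resolve_name("John Williams", "Tonight: John Williams conducts his score for Jaws.", directory))
```

Returning `None` rather than guessing keeps an unsupported choice out of the tags; a caller could fall back to a further criterion at that point.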
Specification