Disambiguation and tagging of entities

US 9,626,424 B2
Filed: 08/28/2013
Issued: 04/18/2017
Est. Priority Date: 05/12/2009
Status: Active Grant

1 Assignment

0 Petitions

Abstract

Tagging of content items and entities identified therein may include a matching process, a classification process and a disambiguation process. Matching may include the identification of potential matching candidate entities in a content item whereas the classification process may categorize or group identified candidate entities according to known entities to which they are likely a match. In some instances, a candidate entity may be categorized with multiple known entities. Accordingly, a disambiguation process may be used to reduce the potential matches to a single known entity. In one example, the disambiguation process may include ranking potentially matching known entities according to a hierarchy of criteria.
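The last stage described in the abstract, ranking potentially matching known entities by a hierarchy of criteria, can be illustrated with a lexicographic sort. This is only a sketch under stated assumptions: the `Candidate` fields, their names, and the order of the criteria are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    entity_id: str           # hypothetical identifier of a known entity
    cooccurrence: int        # hypothetical: related entities found in the same item
    chain_length: int        # hypothetical: mentions grouped with this entity
    match_confidence: float  # hypothetical: score from the matching stage


def disambiguate(candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Reduce several potential matches to one known entity.

    The criteria form a hierarchy: tuples compare element by element, so a
    later criterion only matters when all earlier ones are tied.
    """
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda c: (c.cooccurrence, c.chain_length, c.match_confidence),
    )
```

Using a tuple as the sort key keeps the hierarchy explicit: reordering the tuple reorders the priority of the criteria without touching the rest of the code.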

208 Citations

25 Claims

1. A method comprising:
- determining, by a computing device, a name in a sequence of text that identifies two or more candidate persons;
  
  creating a first reference chain for a first candidate person of the two or more candidate persons;
  
  creating a second reference chain for a second candidate person of the two or more candidate persons;
  
  determining that the first reference chain and the second reference chain both comprise the name as conflicted entities;
  
  determining first co-occurrence information based on one or more unconflicted entities, from the first reference chain, occurring in the sequence of text;
  
  determining second co-occurrence information based on one or more unconflicted entities, from the second reference chain, occurring in the sequence of text;
  
  determining, based on a comparison of the first co-occurrence information and the second co-occurrence information, a highest-ranked reference chain from the first reference chain and the second reference chain; and
  
  determining, based on the highest-ranked reference chain, a person of the two or more candidate persons as being identified by the name.
  - 2. The method of claim 1, wherein a first database identifier is uniquely associated with the first candidate person, and a second database identifier is uniquely associated with the second candidate person.
  - 3. The method of claim 1, wherein the two or more candidate persons comprise a first candidate person, a second candidate person, and a third candidate person, and wherein the determining the person of the two or more candidate persons as being identified by the name comprises:
    - ranking the first candidate person, the second candidate person, and the third candidate person based on a number of potential identifications respectively for the first candidate person, the second candidate person, and the third candidate person;
      
      determining that the third candidate person is ranked lower than the first candidate person and the second candidate person; and
      
      re-ranking the first candidate person and the second candidate person based on the number of potential identifications respectively for the first candidate person and the second candidate person.
  - 4. The method of claim 1, further comprising:
    - determining, based on capitalization of the name in the sequence of text, whether the person of the two or more candidate persons is identified by the name.
  - 5. The method of claim 1, further comprising:
    - based on the determining the name in the sequence of text, determining a relationship between one of the two or more candidate persons identified by the name in the sequence of text and one of a plurality of persons identified by a different name in the sequence of text, wherein the determining the person of the two or more candidate persons as being identified by the name is based at least in part on the relationship.
  - 6. The method of claim 1, wherein the sequence of text comprises a different name, wherein the different name is a name of a piece of media content, and wherein the determining the person of the two or more candidate persons as being identified by the name comprises determining respective relationships between the piece of media content and each of the two or more candidate persons.
  - 7. The method of claim 1, further comprising:
    - matching the name in the sequence of text with a string associated with each of the two or more candidate persons in a database comprising a plurality of previously-tagged persons;
      
      matching a different name in the sequence of text with a different string associated with a different person in the database comprising the plurality of previously-tagged persons; and
      
      evaluating respective relationships between each of the two or more candidate persons and the different person, wherein the determining the person of the two or more candidate persons as being identified by the name is based on one or more of the respective relationships between each of the two or more candidate persons and the different person.
  - 8. The method of claim 1, comprising:
    - determining that the person is referred to using an epithet in the sequence of text;
      
      finding the epithet in a different sequence of text; and
      
      determining that the epithet in the different sequence of text refers to the person, wherein the person is determined as being identified by the name based on determining that the epithet in the different sequence of text refers to the person.
  - 9. The method of claim 1, wherein the determining the highest-ranked reference chain is further based on a confidence of a matching process used to determine the two or more candidate persons.
  - 10. The method of claim 1, wherein the determining the highest-ranked reference chain is further based on a comparison of a length of each of the first reference chain and the second reference chain.
  - 11. The method of claim 1, wherein the first reference chain is a sequence comprising one or more potentially-matching mentions of the first candidate person in the sequence of text, wherein the first reference chain comprises the name.
  - 12. The method of claim 11, wherein the first reference chain is formed according to an order in which the one or more potentially-matching mentions of the first candidate person appear in the sequence of text.
  - 13. The method of claim 1, wherein the sequence of text comprises multiple instances of the name, wherein the first reference chain and the second reference chain both comprise at least one instance of the multiple instances of the name, and wherein determining the person of the two or more candidate persons as being identified by the name comprises determining the person of the two or more candidate persons as being identified by the at least one instance of the multiple instances of the name.
  - 14. The method of claim 13, wherein at least one of the multiple instances of the name is not identical to at least one other of the multiple instances of the name.
  - 15. The method of claim 1, further comprising:
    - before creating the first reference chain, classifying the name in the sequence of text according to type of entity, wherein the creating the first reference chain and the creating the second reference chain are based on the classifying the name in the sequence of text according to the type of entity.
  - 16. The method of claim 1, further comprising:
    - tagging the name as identifying the person based on the determining the person of the two or more candidate persons.
  - 17. The method of claim 1, wherein the first co-occurrence information comprises a rate at which the one or more unconflicted entities, from the first reference chain, occur in the sequence of text.
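The steps of claim 1, together with the ordered chains of claims 11 and 12 and the rate of claim 17, can be sketched roughly as follows. This is an illustration under assumptions, not the patented implementation: mentions are assumed to be already extracted as strings, each candidate is described by a hypothetical set of aliases, a mention counts as conflicted when more than one chain contains it, and co-occurrence is taken to be the rate of unconflicted chain mentions per mention in the text.

```python
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple

Chain = List[Tuple[int, str]]  # (position in text, mention), in text order


def build_chain(mentions: List[str], aliases: Set[str]) -> Chain:
    """Collect potentially matching mentions of one candidate, in text order."""
    return [(i, m) for i, m in enumerate(mentions) if m in aliases]


def rank_chains(
    mentions: List[str], candidate_aliases: Dict[str, Set[str]]
) -> Tuple[Optional[str], Dict[str, float]]:
    """Return the candidate with the highest-ranked chain, plus all scores."""
    chains = {
        cid: build_chain(mentions, aliases)
        for cid, aliases in candidate_aliases.items()
    }
    # Count how many chains contain each position; more than one means conflicted.
    claimed = Counter(pos for chain in chains.values() for pos, _ in chain)
    scores: Dict[str, float] = {}
    for cid, chain in chains.items():
        unconflicted = [m for pos, m in chain if claimed[pos] == 1]
        # Co-occurrence as a rate: unconflicted mentions per mention in the text.
        scores[cid] = len(unconflicted) / len(mentions) if mentions else 0.0
    if not scores:
        return None, scores
    return max(scores, key=scores.get), scores


# Hypothetical example data: "Clinton" alone is ambiguous between two people.
mentions = ["Bill Clinton", "Clinton", "President Clinton", "Hillary", "Clinton"]
candidate_aliases = {
    "bill": {"Bill Clinton", "President Clinton", "Clinton"},
    "hillary": {"Hillary", "Hillary Clinton", "Clinton"},
}
best, scores = rank_chains(mentions, candidate_aliases)
```

In this example both chains contain the bare "Clinton" mentions, so those are conflicted and ignored for scoring; the chain with more unconflicted mentions decides which person the ambiguous name is taken to identify.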

18. A method comprising:
- determining, by a computing device, a title in a textual-content item, the title corresponding to a plurality of candidate content assets;
  
  creating a first reference chain for a first candidate video content asset of the plurality of candidate content assets, the first reference chain comprising the title;
  
  creating a second reference chain for a second candidate content asset of the plurality of candidate content assets, the second reference chain comprising the title;
  
  determining first co-occurrence information based on one or more unconflicted entities from the first reference chain for the first candidate content asset, occurring in the textual-content item;
  
  determining second co-occurrence information based on one or more unconflicted entities, from the second reference chain for the second candidate content asset, occurring in the textual-content item;
  
  determining a highest-ranked reference chain from the first reference chain and the second reference chain based on the first co-occurrence information and the second co-occurrence information; and
  
  determining, based on the highest-ranked reference chain, one of the first candidate content asset and the second candidate content asset as being identified by the title.
  - 19. The method of claim 18, further comprising:
    - tagging the title with a database identifier of the one of the first candidate content asset and the second candidate content asset; and
      
      associating the title with a link to additional information about the one of the first candidate content asset and the second candidate content asset.
  - 20. The method of claim 18, further comprising:
    - determining a third co-occurrence between the title and a different title in the textual-content item, wherein the different title is associated with a third candidate content asset of the plurality of candidate content assets.
  - 21. The method of claim 18, further comprising:
    - classifying a first string in the textual-content item as referencing the title based on a plurality of words associated with titles being within a threshold number of words of the first string; and
      
      classifying a second string in the textual-content item as being at least one name in the textual-content item based on a plurality of words associated with names being within the threshold number of words of the second string.
  - 22. The method of claim 18, wherein the first co-occurrence information comprises a rate at which the one or more unconflicted entities, from the first reference chain for the first candidate content asset, occur in the textual-content item.
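The context-based classification in claim 21 can be sketched as a window check around a string. The cue-word sets, the function name, and the default threshold below are hypothetical; the claim only requires that a plurality of words associated with titles (or with names) fall within a threshold number of words of the string.

```python
from typing import List

# Hypothetical cue vocabularies; a real system would derive these elsewhere.
TITLE_CUES = {"film", "movie", "episode", "starring", "directed"}
NAME_CUES = {"actor", "actress", "director", "producer", "singer"}


def classify_span(
    tokens: List[str], start: int, end: int, threshold: int = 3, min_cues: int = 2
) -> str:
    """Classify tokens[start:end] as 'title', 'name', or 'unknown'.

    Only words within `threshold` positions before or after the span are
    considered, and at least `min_cues` cue words (a plurality) are required.
    """
    window = tokens[max(0, start - threshold):start] + tokens[end:end + threshold]
    words = [t.lower().strip(".,;:!?\"'") for t in window]
    title_hits = sum(w in TITLE_CUES for w in words)
    name_hits = sum(w in NAME_CUES for w in words)
    if title_hits >= min_cues and title_hits > name_hits:
        return "title"
    if name_hits >= min_cues and name_hits > title_hits:
        return "name"
    return "unknown"


title_tokens = "The new movie Heat directed by Michael Mann".split()
name_tokens = "veteran actor and director Michael Mann spoke".split()
title_label = classify_span(title_tokens, 3, 4)  # span is "Heat"
name_label = classify_span(name_tokens, 4, 6)    # span is "Michael Mann"
```

Classifying a string before building reference chains narrows the candidate set: a string labeled as a title only needs to be compared against content assets, not against persons.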

23. A method comprising:
- determining, by a computing device, an ambiguity of a name in a string of text associated with a piece of media content, wherein the ambiguity is based on the name identifying a plurality of persons;
  
  creating a first reference chain for a first person of the plurality of persons, the first reference chain comprising the name;
  
  creating a second reference chain for a second person of the plurality of persons, the second reference chain comprising the name;
  
  determining first co-occurrence information based on one or more unconflicted entities, from the first reference chain for the first person of the plurality of persons occurring in the string of text associated with the piece of media content;
  
  determining second co-occurrence information based on one or more unconflicted entities, from the second reference chain for the second person of the plurality of persons occurring in the string of text associated with the piece of media content;
  
  determining a highest-ranked reference chain from the first reference chain and the second reference chain based on the first co-occurrence information and the second co-occurrence information; and
  
  resolving the ambiguity based on the highest-ranked reference chain.
  - 24. The method of claim 23, further comprising:
    - determining a first relationship between a word in the string of text and the first person of the plurality of persons, wherein the word is different from the name;
      
      determining a second relationship between the word in the string of text and the second person of the plurality of persons; and
      
      determining, based on a comparison between the first relationship and the second relationship, that the name does not identify the second person.
  - 25. The method of claim 23, wherein the first co-occurrence information comprises a rate at which the one or more unconflicted entities, from the first reference chain for the first person of the plurality of persons, occur in the string of text.
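Claim 24 rules out a person by comparing how strongly some other word in the text relates to each candidate. A minimal sketch follows, assuming relationships are available as numeric strengths in a hypothetical lookup table; the patent does not specify this representation, and the identifiers and values are invented for illustration.

```python
from typing import Dict, Optional

# Hypothetical relationship strengths between candidates and context words.
RELATIONSHIPS: Dict[str, Dict[str, float]] = {
    "person-1": {"senate": 0.9, "campaign": 0.6},
    "person-2": {"guitar": 0.8, "campaign": 0.1},
}


def excluded_person(
    word: str, first: str, second: str, table: Dict[str, Dict[str, float]]
) -> Optional[str]:
    """Return the candidate the name is judged NOT to identify, if any.

    The weaker relationship to `word` loses; a tie excludes nobody.
    """
    key = word.lower()
    first_score = table.get(first, {}).get(key, 0.0)
    second_score = table.get(second, {}).get(key, 0.0)
    if first_score > second_score:
        return second
    if second_score > first_score:
        return first
    return None


ruled_out = excluded_person("Senate", "person-1", "person-2", RELATIONSHIPS)
```

Returning `None` on a tie leaves the ambiguity to be resolved by other evidence, such as the co-occurrence ranking of the reference chains.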

Resources

Current Assignee
Comcast Interactive Media LLC (Comcast Corporation)
Original Assignee
Comcast Interactive Media LLC (Comcast Corporation)
Inventors
Houghton, David F.
Primary Examiner(s)
ASPINWALL, EVAN S

Application Number

US14/012,289
Publication Number

US 20140040272A1
Time in Patent Office

1,329 Days
Field of Search

707/740, 707/741, 707/748, 707/913, 707/723, 707/749, 707/758
US Class Current
CPC Class Codes

G06F 16/285   Clustering or classification

G06F 40/169   Annotation, e.g. comment da...

G06F 40/284   Lexical analysis, e.g. toke...

G06F 40/289   Phrasal analysis, e.g. fini...

G06F 40/295   Named entity recognition

G06F 40/30   Semantic analysis
