Training image-recognition systems using a joint embedding model on online social networks

US 10,026,021 B2
Filed: 09/27/2016
Issued: 07/17/2018
Est. Priority Date: 09/27/2016
Status: Active Grant

2 Assignments
0 Petitions

Abstract

In one embodiment, a method includes identifying a shared visual concept in visual-media items based on shared visual features in images of the visual-media items; extracting, for each of the visual-media items, n-grams from communications associated with the visual-media item; generating, in a d-dimensional space, an embedding for each of the visual-media items at a location based on the visual concepts included in the visual-media item; generating, in the d-dimensional space, an embedding for each of the extracted n-grams at a location based on a frequency of occurrence of the n-gram in the communications associated with the visual-media items; and associating, with the shared visual concept, the extracted n-grams that have embeddings within a threshold area of the embeddings for the identified visual-media items.
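The final association step of the abstract can be illustrated with a small sketch. It is not taken from the patent: the "threshold area" is assumed here to be a Euclidean ball of a chosen radius around the centroid of the media-item embeddings, and the names `associate_ngrams`, `centroid`, `radius`, and the toy 2-dimensional vectors are all hypothetical.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def associate_ngrams(media_embeddings, ngram_embeddings, radius):
    """Return the n-grams whose embeddings lie within `radius` of the
    centroid of the media-item embeddings.

    Assumption: the claimed "threshold area" is modeled as a Euclidean
    ball around the centroid; the patent does not fix this geometry.
    """
    center = centroid(list(media_embeddings.values()))
    return sorted(
        ngram
        for ngram, vec in ngram_embeddings.items()
        if math.dist(center, vec) <= radius
    )


# Toy data: two media items that share a visual concept, three n-grams.
media = {"photo_1": [1.0, 1.0], "photo_2": [1.2, 0.8]}
ngrams = {
    "golden retriever": [1.1, 1.0],
    "dog": [0.9, 0.9],
    "lol": [5.0, -3.0],
}
associated = associate_ngrams(media, ngrams, radius=0.5)
print(associated)
```

In this toy data the two descriptive n-grams fall inside the ball and the unrelated one does not. Other readings of "threshold area" (for example, proximity to any single media-item embedding) would change only the predicate inside `associate_ngrams`.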

198 Citations

20 Claims

1. A method comprising, by one or more computing systems:
- identifying a shared visual concept in two or more visual-media items, wherein each visual-media item comprises one or more images, each image comprising one or more visual features, and wherein each visual-media item comprises one or more visual concepts, the shared visual concept being identified based on one or more shared visual features in the respective images of the visual-media items;
  
  extracting, for each of the visual-media items, one or more n-grams from one or more communications associated with the visual-media item;
  
  generating, in a d-dimensional space, an embedding for each of the visual-media items, wherein a location of the embedding for the visual-media item is based on the one or more visual concepts included in the visual-media item;
  
  generating, in the d-dimensional space, an embedding for each of the extracted n-grams, wherein a location of the embedding for the n-gram is based on a frequency of occurrence of the n-gram in the communications associated with the visual-media items;
  
  associating with the shared visual concept, one or more of the extracted n-grams that have embeddings within a threshold area of the embeddings for the identified visual-media items;
  
  populating a visual-concept index that indexes visual concepts with their respective associated n-grams;
  
  receiving, from a client system of a user, a search query comprising one or more n-grams;
  
  determining, based on the visual-concept index, one or more visual concepts associated with the n-grams of the search query; and
  
  sending, to the client system of the user, one or more search results comprising visual-media items in which the determined visual concepts are identified.
  - 2. The method of claim 1, further comprising:
    - accessing a social graph comprising a plurality of nodes and a plurality of edges connecting the nodes, each of the edges between two of the nodes representing a single degree of separation between them, the nodes comprising:
      
      a first node corresponding to a user associated with an online social network; and
      
      a plurality of second nodes that each correspond to a visual-media item or a visual concept associated with the online social network.
  - 3. The method of claim 1, wherein extracting the one or more n-grams from communications associated with the visual-media items comprises filtering out one or more non-descriptive n-grams from a plurality of n-grams included in the communications, wherein the non-descriptive n-grams are present on a pre-generated list of non-descriptive n-grams.
  - 4. The method of claim 1, wherein one or more of the communications associated with the visual-media items are communications that include one or more of the visual-media items or one or more references to one or more of the visual-media items.
  - 5. The method of claim 1, wherein the location of the embedding for each of one or more of the visual-media items is a point in the d-dimensional space determined by projecting a vector representation of the visual-media item in the d-dimensional space.
  - 6. The method of claim 1, wherein the location of the embedding for each of one or more of the visual-media items is further based on metadata of the visual-media item.
  - 7. The method of claim 1, wherein the location of the embedding for each of one or more visual-media items is further based on a title or a description of the visual-media item.
  - 8. The method of claim 1, wherein the location of the embedding for each of one or more extracted n-grams is based on a triplet-loss algorithm, wherein the triplet-loss algorithm analyzes a plurality of information triplets, each of the information triplets comprising:
    - a media-item identifier corresponding to a particular visual-media item including a particular visual concept;
      
      a positive n-gram, wherein the positive n-gram is an n-gram that is included in a number of communications associated with the particular visual-media item that is greater than a threshold number; and
      
      a negative n-gram, wherein the negative n-gram is an n-gram that is not included in a minimum number of communications associated with the particular visual-media item.
  - 9. The method of claim 8, further comprising, for each particular visual concept:
    - compiling occurrences of the positive n-grams and the negative n-grams from information triplets comprising media-item identifiers corresponding to visual-media items including the particular visual concept;
      
      determining, for each positive n-gram, a count of occurrences of the positive n-gram;
      
      determining, for each negative n-gram, a count of occurrences of the negative n-gram; and
      
      determining locations of embeddings for the positive n-grams and the negative n-grams with respect to the locations of embeddings for the visual-media items having the particular visual concept, the locations of embeddings for each of the positive n-grams and each of the negative n-grams being based on their respective counts of occurrences.
  - 10. The method of claim 9, wherein a distance between the embedding for each positive n-gram and the embedding for the particular visual-media item is less than a distance between the embedding for each negative n-gram and the embedding for the particular visual-media item.
  - 11. The method of claim 1, wherein the location of the embedding for each of one or more extracted n-grams is further based on a topic associated with the n-gram, the topic being determined based on a topic index that indexes n-grams by topic.
  - 12. The method of claim 1, wherein the search results are displayed on the client system of the user in an order based on relative proximities of the embeddings for the respective visual-media items to the embeddings for one or more of the n-grams of the search query.
  - 13. The method of claim 12, wherein the order is further based on a relative degree of matching between one or more of the n-grams of the search query and one or more n-grams of the respective title or description of each of the visual-media items.
  - 14. The method of claim 1, wherein the visual-media items comprise one or more of videos, photos, or image files.
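Claims 8 through 10 refer to a triplet-loss algorithm and require that each positive n-gram sit closer to the media item's embedding than each negative n-gram. The claims do not give a formula, so the sketch below assumes the common margin-based hinge form with Euclidean distance; `triplet_loss`, `margin`, and the example vectors are illustrative choices, not the patent's.

```python
import math


def triplet_loss(media_vec, positive_vec, negative_vec, margin=1.0):
    """Margin-based triplet loss (an assumed, standard formulation).

    The loss is zero once the positive n-gram is closer to the media
    item than the negative n-gram by at least `margin`; otherwise it
    grows with the size of the violation.
    """
    d_pos = math.dist(media_vec, positive_vec)
    d_neg = math.dist(media_vec, negative_vec)
    return max(0.0, margin + d_pos - d_neg)


media_vec = [0.0, 0.0]

# Positive n-gram at distance 1, negative at distance 5: ordering holds.
loss_satisfied = triplet_loss(media_vec, [0.0, 1.0], [3.0, 4.0])

# Roles swapped: the "positive" is farther away, so the loss is nonzero.
loss_violated = triplet_loss(media_vec, [3.0, 4.0], [0.0, 1.0])

print(loss_satisfied, loss_violated)
```

Minimizing a loss of this kind over many triplets is one way to arrive at the distance ordering that claim 10 recites.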

15. One or more computer-readable non-transitory storage media embodying software that is operable when executed to:
- identify a shared visual concept in two or more visual-media items, wherein each visual-media item comprises one or more images, each image comprising one or more visual features, and wherein each visual-media item comprises one or more visual concepts, the shared visual concept being identified based on one or more shared visual features in the respective images of the visual-media items;
  
  extract, for each of the visual-media items, one or more n-grams from one or more communications associated with the visual-media item;
  
  generate, in a d-dimensional space, an embedding for each of the visual-media items, wherein a location of the embedding for the visual-media item is based on the one or more visual concepts included in the visual-media item;
  
  generate, in the d-dimensional space, an embedding for each of the extracted n-grams, wherein a location of the embedding for the n-gram is based on a frequency of occurrence of the n-gram in the communications associated with the visual-media items;
  
  associate, with the shared visual concept, one or more of the extracted n-grams that have embeddings within a threshold area of the embeddings for the identified visual-media items;
  
  populate a visual-concept index that indexes visual concepts with their respective associated n-grams;
  
  receive, from a client system of a user, a search query comprising one or more n-grams;
  
  determine, based on the visual-concept index, one or more visual concepts associated with the n-grams of the search query; and
  
  send, to the client system of the user, one or more search results comprising visual-media items in which the determined visual concepts are identified.
  - 16. The media of claim 15, wherein the location of the embedding for each of one or more extracted n-grams is based on a triplet-loss algorithm, wherein the triplet-loss algorithm analyzes a plurality of information triplets, each of the information triplets comprising:
    - a media-item identifier corresponding to a particular visual-media item including a particular visual concept;
      
      a positive n-gram, wherein the positive n-gram is an n-gram that is included in a number of communications associated with the particular visual-media item that is greater than a threshold number; and
      
      a negative n-gram, wherein the negative n-gram is an n-gram that is not included in a minimum number of communications associated with the particular visual-media item.
  - 17. The media of claim 16, wherein the software is further operable when executed to, for each particular visual concept:
    - compile occurrences of the positive n-grams and the negative n-grams from information triplets comprising media-item identifiers corresponding to visual-media items including the particular visual concept;
      
      determine, for each positive n-gram, a count of occurrences of the positive n-gram;
      
      determine, for each negative n-gram, a count of occurrences of the negative n-gram; and
      
      determine locations of embeddings for the positive n-grams and the negative n-grams with respect to the locations of embeddings for the visual-media items having the particular visual concept, the locations of embeddings for each of the positive n-grams and each of the negative n-grams being based on their respective counts of occurrences.
  - 19. The media of claim 17, wherein a distance between the embedding for each positive n-gram and the embedding for the particular visual-media item is less than a distance between the embedding for each negative n-gram and the embedding for the particular visual-media item.
  - 20. The media of claim 15, wherein the location of the embedding for each of one or more extracted n-grams is further based on a topic associated with the n-gram, the topic being determined based on a topic index that indexes n-grams by topic.
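The information triplets described in claims 16 and 17 can be assembled roughly as follows. This is a sketch under stated assumptions: each communication is represented as a list of already-extracted n-grams, counts are taken per communication, and the function name `build_triplets` and parameters `positive_threshold` and `negative_minimum` are hypothetical.

```python
from collections import Counter
from itertools import product


def build_triplets(media_id, communications, vocabulary,
                   positive_threshold, negative_minimum):
    """Build (media_id, positive n-gram, negative n-gram) triplets.

    positive: appears in MORE than `positive_threshold` communications
    negative: appears in FEWER than `negative_minimum` communications
    """
    # Count how many communications contain each n-gram (each
    # communication contributes at most once per n-gram).
    comm_count = Counter()
    for ngrams in communications:
        comm_count.update(set(ngrams))

    positives = sorted(g for g in vocabulary if comm_count[g] > positive_threshold)
    negatives = sorted(g for g in vocabulary if comm_count[g] < negative_minimum)
    return [(media_id, p, n) for p, n in product(positives, negatives)]


# Three communications associated with one hypothetical photo.
communications = [["cute", "dog"], ["dog", "park"], ["dog"]]
vocabulary = {"dog", "cute", "park", "car"}

triplets = build_triplets(
    "photo_1", communications, vocabulary,
    positive_threshold=2, negative_minimum=1,
)
print(triplets)
```

With these toy inputs only "dog" clears the positive threshold and only "car" falls below the negative minimum, so a single triplet results. N-grams between the two cutoffs ("cute", "park") are used as neither positives nor negatives in this sketch.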

18. A system comprising:
- one or more processors; and
  
  a non-transitory memory coupled to the processors comprising instructions executable by the processors, the processors operable when executing the instructions to:
  
  identify a shared visual concept in two or more visual-media items, wherein each visual-media item comprises one or more images, each image comprising one or more visual features, and wherein each visual-media item comprises one or more visual concepts, the shared visual concept being identified based on one or more shared visual features in the respective images of the visual-media items;
  
  extract, for each of the visual-media items, one or more n-grams from one or more communications associated with the visual-media item;
  
  generate, in a d-dimensional space, an embedding for each of the visual-media items, wherein a location of the embedding for the visual-media item is based on the one or more visual concepts included in the visual-media item;
  
  generate, in the d-dimensional space, an embedding for each of the extracted n-grams, wherein a location of the embedding for the n-gram is based on a frequency of occurrence of the n-gram in the communications associated with the visual-media items;
  
  associate with the shared visual concept, one or more of the extracted n-grams that have embeddings within a threshold area of the embeddings for the identified visual-media items;
  
  populate a visual-concept index that indexes visual concepts with their respective associated n-grams;
  
  receive, from a client system of a user, a search query comprising one or more n-grams;
  
  determine, based on the visual-concept index, one or more visual concepts associated with the n-grams of the search query; and
  
  send, to the client system of the user, one or more search results comprising visual-media items in which the determined visual concepts are identified.
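The last four steps of the independent claims (populate the visual-concept index, receive a query, determine concepts, send results) can be sketched with plain dictionaries. The data layout, the function names `populate_index` and `search`, and the sample concepts are assumptions made for illustration; ranking by embedding proximity, as in claim 12, is left out.

```python
from collections import defaultdict


def populate_index(concept_to_ngrams):
    """Invert concept -> n-grams into an n-gram -> concepts lookup."""
    ngram_to_concepts = defaultdict(set)
    for concept, ngrams in concept_to_ngrams.items():
        for ngram in ngrams:
            ngram_to_concepts[ngram].add(concept)
    return ngram_to_concepts


def search(query_ngrams, ngram_to_concepts, media_concepts):
    """Return media items containing any concept tied to the query n-grams."""
    concepts = set()
    for ngram in query_ngrams:
        concepts |= ngram_to_concepts.get(ngram, set())
    return sorted(
        media_id
        for media_id, found in media_concepts.items()
        if concepts & found
    )


# Hypothetical associations produced by the embedding step.
concept_to_ngrams = {
    "dog": {"puppy", "golden retriever"},
    "beach": {"seaside"},
}
# Hypothetical visual concepts identified in each media item.
media_concepts = {
    "photo_1": {"dog"},
    "photo_2": {"beach"},
    "photo_3": {"dog", "beach"},
}

index = populate_index(concept_to_ngrams)
results = search(["puppy"], index, media_concepts)
print(results)
```

The point of the index is that a query such as "puppy" can retrieve media items that were never captioned with that word, because the lookup goes through the visual concept rather than the item's own text.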

Current Assignee: Meta Platforms, Inc. (f/k/a Facebook, Inc.)
Original Assignee: Meta Platforms, Inc. (f/k/a Facebook, Inc.)
Inventors: Stoop, Dirk John; Paluri, Balmanohar
Primary Examiner(s): Vu, Kim
Assistant Examiner(s): Bloom, Nathan J
Application Number: US15/277,938
Publication Number: US 20180089541A1
Time in Patent Office: 658 Days
Field of Search: None

CPC Class Codes

G06F 16/24573   using data annotations, e.g...
G06F 16/435   Filtering based on addition...
G06F 16/5838   using colour
G06F 16/5866   using information manually ...
G06F 16/587   using geographical or spati...
G06F 16/9535   Search customisation based ...
G06F 18/2415   based on parametric or prob...
G06N 20/00   Machine learning
G06N 3/045   Combinations of networks
G06N 3/08   Learning methods
G06Q 50/01   Social networking
G06V 10/424   Syntactic representation, e...
G06V 20/30   in albums, collections or s...
G06V 30/1985   Syntactic analysis, e.g. us...
