System and method for content-based media analysis

US 10,055,489 B2
Filed: 02/29/2016
Issued: 08/21/2018
Est. Priority Date: 02/08/2016
Status: Active Grant

1 Assignment

0 Petitions

Abstract

A media analysis system includes one or more hardware processors, a memory storing synopses associated with catalog books, and a content analysis engine. The content analysis engine generates a media vector for each catalog book based on the associated synopsis by generating a word vector for each word in the synopsis, combining the plurality of word vectors into a mean vector for the catalog book, and storing the mean vector as the media vector associated with the catalog book. The content analysis engine also identifies a target book associated with a seed media vector, determines R nearest neighbors for the target book from the plurality of catalog books based on (1) the seed media vector and (2) the media vectors associated with the plurality of catalog books, clusters the R nearest neighbors into K clusters, and selects catalog books for recommendation to a user based on the K clusters.
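
The media-vector and nearest-neighbor steps summarized above can be sketched in plain Python. This is a minimal sketch, not the patented implementation. The `EMBEDDINGS` table is a hypothetical stand-in for trained word2vec vectors. Synopses are split on whitespace, and words without an embedding are skipped, so N counts only known words. `cosine_distance` uses the common definition of one minus the cosine similarity, which the claim text here does not spell out.

```python
import math

# Hypothetical toy embeddings standing in for trained word2vec vectors.
EMBEDDINGS = {
    "dragon": [0.9, 0.1, 0.0],
    "wizard": [0.8, 0.2, 0.1],
    "detective": [0.1, 0.9, 0.2],
    "murder": [0.0, 0.8, 0.3],
    "space": [0.1, 0.1, 0.9],
}


def media_vector(synopsis):
    """Mean of the word vectors of the synopsis words found in EMBEDDINGS."""
    # Repeated words are kept, so the vectors form a multiset.
    vectors = [EMBEDDINGS[w] for w in synopsis.lower().split() if w in EMBEDDINGS]
    if not vectors:
        raise ValueError("no synopsis word has a known embedding")
    n = len(vectors)
    # Component-wise mean: v = (1/N) * sum of the N word vectors.
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(s, t):
    """1 minus cosine similarity: 0.0 for vectors pointing the same way."""
    dot = sum(a * b for a, b in zip(s, t))
    return 1.0 - dot / (math.hypot(*s) * math.hypot(*t))


def nearest_neighbors(seed, catalog, r):
    """Ids of the r catalog entries with the smallest cosine distance to seed."""
    return sorted(catalog, key=lambda item_id: cosine_distance(seed, catalog[item_id]))[:r]


synopses = {
    "book_a": "A wizard fights a dragon",
    "book_b": "A detective solves a murder",
    "book_c": "A space wizard saga",
}
catalog = {book_id: media_vector(text) for book_id, text in synopses.items()}
seed = media_vector("The old wizard and the dragon")
print(nearest_neighbors(seed, catalog, 2))  # ['book_a', 'book_c']
```

A real catalog would replace the linear sort with an approximate nearest-neighbor index; the sort keeps the sketch dependency-free.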

20 Claims

1. A media analysis system comprising:
- one or more hardware processors;
  
  a memory storing synopses associated with a plurality of catalog books, each catalog book in the plurality of catalog books including a different synopsis; and
  
  a content analysis engine, executable by the one or more hardware processors, configured to perform operations comprising:
  
  generating a different media vector for each catalog book of the plurality of catalog books based on the synopsis of the catalog book, the generating comprising:
  
  generating a word vector for each word of a plurality of words in the synopsis of the catalog book, thereby generating a plurality of word vectors;
  
  combining the plurality of word vectors into a mean vector for the catalog book, the mean vector being the media vector; and
  
  storing the mean vector, in the memory, as the media vector associated with the catalog book;
  
  identifying a target book, the target book associated with a seed media vector;
  
  determining R nearest neighbors for the target book from the plurality of catalog books based on (1) the seed media vector and (2) the media vectors associated with the plurality of catalog books;
  
  clustering the R nearest neighbors for the target book into K clusters; and
  
  selecting a second plurality of catalog books for recommendation to a user based on the K clusters.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The media analysis system of claim 1, wherein generating the word vector for each word of the plurality of words in the synopsis of the catalog book comprises generating the word vector using word2vec.
  - 3. The media analysis system of claim 1, wherein the synopsis of the catalog book includes N words, wherein the plurality of word vectors includes a multiset of word vectors, X, of N word vectors {x₁, x₂, . . . , x_N}, and wherein combining the plurality of word vectors into the mean vector, v, comprises computing:

    $$v = \frac{1}{N} \sum_{i=1}^{N} x_i$$
  - 4. The media analysis system of claim 1, wherein the content analysis engine is further configured to perform operations comprising:
    - computing a cosine distance, cosine(s,t), between each pairing of the seed media vector, s, and the associated media vector, t, of each catalog book of the plurality of catalog books;
  - 5. The media analysis system of claim 1, wherein the content analysis engine is further configured to perform operations comprising selecting one catalog book from each of the K clusters.
  - 6. The media analysis system of claim 1, wherein the content analysis engine is further configured to perform operations comprising:
    - clustering the seed media vector into a seed cluster, the seed cluster being one of the K clusters, wherein selecting the second plurality of catalog books based on the K clusters excludes the seed cluster.
  - 7. The media analysis system of claim 1, wherein the content analysis engine is further configured to perform operations comprising:
    - generating a media vector for each catalog movie of a plurality of catalog movies based on an associated synopsis of each catalog movie, wherein determining the R nearest neighbors for the target book further includes identifying the R nearest neighbors from the plurality of catalog movies, and further based on the media vectors associated with the plurality of catalog movies.
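
Claims 5 and 6 leave the clustering algorithm and the per-cluster choice open. The sketch below assumes Lloyd's k-means with Euclidean distance, seeded with the first K points, and takes from each cluster the neighbor nearest to the seed. All three choices, and the names `kmeans` and `select_recommendations`, are illustrative assumptions rather than the patent's method. The seed is clustered together with its R neighbors so that its own cluster can be skipped.

```python
def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Lloyd's k-means; returns one cluster label per point.

    Seeding with the first k points is a simplification for this sketch.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_euclidean(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def select_recommendations(seed, neighbors, k):
    """Cluster the seed with its R neighbors into k clusters, then take the
    neighbor closest to the seed from each cluster other than the seed's own."""
    ids = list(neighbors)
    labels = kmeans([seed] + [neighbors[i] for i in ids], k)
    seed_label, neighbor_labels = labels[0], labels[1:]
    picks = []
    for c in range(k):
        if c == seed_label:
            continue  # claim 6: skip the seed cluster
        members = [i for i, label in zip(ids, neighbor_labels) if label == c]
        if members:  # claim 5: one pick per cluster
            picks.append(min(members, key=lambda i: squared_euclidean(seed, neighbors[i])))
    return picks


neighbors = {
    "n1": [0.1, 0.0],
    "n2": [5.0, 5.0],
    "n3": [5.2, 5.0],
    "n4": [-5.0, 5.0],
    "n5": [-5.1, 5.2],
}
print(select_recommendations([0.0, 0.0], neighbors, 3))  # ['n4', 'n2']
```

Here "n1" lands in the seed's cluster and is dropped, so the picks come from the two other regions of the neighborhood. Skipping the seed cluster trades a near-duplicate of the target for a more varied recommendation set.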

8. A computer-implemented method for content-based media analysis, the method comprising:
- generating a different media vector for each catalog book of a plurality of catalog books based on a synopsis of the catalog book, the generating comprising:
  
  generating a word vector for each word of a plurality of words in the synopsis of the catalog book, thereby generating a plurality of word vectors;
  
  combining the plurality of word vectors into a mean vector for the catalog book, the mean vector being the media vector; and
  
  storing the mean vector, in a memory, as the media vector associated with the catalog book;
  
  identifying a target book, the target book associated with a seed media vector;
  
  determining R nearest neighbors for the target book from the plurality of catalog books based on (1) the seed media vector and (2) the media vectors associated with the plurality of catalog books;
  
  clustering the R nearest neighbors for the target book into K clusters; and
  
  selecting a second plurality of catalog books for recommendation to a user based on the K clusters.
- Dependent claims (9, 10, 11, 12, 13, 14):
  - 9. The method of claim 8, wherein generating the word vector for each word of the plurality of words in the synopsis of the catalog book comprises generating the word vector using word2vec.
  - 10. The method of claim 8, wherein the synopsis of the catalog book includes N words, wherein the plurality of word vectors includes a multiset of word vectors, X, of N word vectors {x₁, x₂, . . . , x_N}, and wherein combining the plurality of word vectors into the mean vector, v, includes computing:

    $$v = \frac{1}{N} \sum_{i=1}^{N} x_i$$
  - 11. The method of claim 8, further comprising:
    - computing a cosine distance, cosine(s,t), between each pairing of the seed media vector, s, and the associated media vector, t, of each catalog book of the plurality of catalog books;
  - 12. The method of claim 8, further comprising selecting one catalog book from each of the K clusters.
  - 13. The method of claim 8, further comprising:
    - clustering the seed media vector into a seed cluster, the seed cluster being one of the K clusters, wherein selecting the second plurality of catalog books based on the K clusters excludes the seed cluster.
  - 14. The method of claim 8, further comprising:
    - generating a media vector for each catalog movie of a plurality of catalog movies based on an associated synopsis of each catalog movie, wherein determining the R nearest neighbors for the target book further includes identifying the R nearest neighbors from the plurality of catalog movies, and further based on the media vectors associated with the plurality of catalog movies.
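
Claims 4, 11, and 17 name a cosine distance, cosine(s,t), but its defining formula does not survive in this text. A common definition, assumed here rather than taken from the patent, is one minus the cosine similarity. Under that definition the R nearest neighbors are the R catalog vectors with the smallest distance to the seed:

$$
\operatorname{cosine}(s, t) = 1 - \frac{s \cdot t}{\lVert s \rVert \, \lVert t \rVert},
\qquad
\mathcal{N}_R(s) = \{\, t_{(1)}, \dots, t_{(R)} \,\},
\quad \operatorname{cosine}(s, t_{(1)}) \le \operatorname{cosine}(s, t_{(2)}) \le \cdots
$$

For example, s = (1, 0) and t = (1, 1) give s · t = 1 and ‖s‖‖t‖ = √2, so cosine(s,t) = 1 − 1/√2 ≈ 0.293. Ranking by the largest cosine similarity instead selects the same R neighbors.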

15. A non-transitory machine-readable medium storing processor-executable instructions which, when executed by a processor, cause the processor to:
- generate a different media vector for each catalog book of a plurality of catalog books based on a synopsis of the catalog book, the generating comprising:
  
  generating a word vector for each word of a plurality of words in the synopsis of the catalog book, thereby generating a plurality of word vectors;
  
  combining the plurality of word vectors into a mean vector for the catalog book, the mean vector being the media vector; and
  
  storing the mean vector, in a memory, as the media vector associated with the catalog book;
  
  identify a target book, the target book being associated with a seed media vector;
  
  determine R nearest neighbors for the target book from the plurality of catalog books based on (1) the seed media vector and (2) the media vectors associated with the plurality of catalog books;
  
  cluster the R nearest neighbors for the target book into K clusters; and
  
  select a second plurality of catalog books for recommendation to a user based on the K clusters.
- Dependent claims (16, 17, 18, 19, 20):
  - 16. The machine-readable medium of claim 15, wherein the synopsis of the catalog book includes N words, wherein the plurality of word vectors includes a multiset of word vectors, X, of N word vectors {x₁, x₂, . . . , x_N}, and wherein combining the plurality of word vectors into the mean vector, v, includes computing:

    $$v = \frac{1}{N} \sum_{i=1}^{N} x_i$$
  - 17. The machine-readable medium of claim 15, wherein the processor-executable instructions further cause the processor to:
    - compute a cosine distance, cosine(s,t), between each pairing of the seed media vector, s, and the associated media vector, t, of each catalog book of the plurality of catalog books;
  - 18. The machine-readable medium of claim 15, wherein the processor-executable instructions further cause the processor to select one catalog book from each of the K clusters.
  - 19. The machine-readable medium of claim 15, wherein the processor-executable instructions further cause the processor to:
    - cluster the seed media vector into a seed cluster, the seed cluster being one of the K clusters, wherein selecting the second plurality of catalog books based on the K clusters excludes the seed cluster.
  - 20. The machine-readable medium of claim 15, wherein the processor-executable instructions further cause the processor to:
    - generate a media vector for each catalog movie of a plurality of catalog movies based on an associated synopsis of each catalog movie, wherein determining the R nearest neighbors for the target book further includes identifying the R nearest neighbors from the plurality of catalog movies, and further based on the media vectors associated with the plurality of catalog movies.
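
Claims 7, 14, and 20 extend the neighbor search to catalog movies. A minimal sketch pools both catalogs and keeps the R smallest cosine distances with `heapq.nsmallest`. It assumes book and movie synopses were embedded with the same word model, so their media vectors are directly comparable. The function name `nearest_across_media` and the `(media_type, item_id)` output shape are hypothetical.

```python
import heapq
import math


def cosine_distance(s, t):
    """1 minus cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(s, t))
    return 1.0 - dot / (math.hypot(*s) * math.hypot(*t))


def nearest_across_media(seed, books, movies, r):
    """Return the r (media_type, item_id) pairs closest to seed, drawn from
    books and movies together; assumes both share one embedding space."""
    pooled = [("book", i, v) for i, v in books.items()]
    pooled += [("movie", i, v) for i, v in movies.items()]
    closest = heapq.nsmallest(r, pooled, key=lambda item: cosine_distance(seed, item[2]))
    return [(media_type, item_id) for media_type, item_id, _ in closest]


books = {"b1": [1.0, 0.1], "b2": [0.0, 1.0]}
movies = {"m1": [1.0, 0.05], "m2": [-1.0, 0.0]}
print(nearest_across_media([1.0, 0.0], books, movies, 2))
# [('movie', 'm1'), ('book', 'b1')]
```

`heapq.nsmallest` avoids sorting the whole pooled catalog when R is much smaller than the number of items.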

Current Assignee
eBay Inc.
Original Assignee
eBay Inc.
Inventors
Haviv, Adi Guila, Klein, Benjamin Eliot, Shetty, Krutika
Primary Examiner(s)
Cao, Phuong Thao

Application Number

US15/057,024
Publication Number

US 20170228382A1
Time in Patent Office

904 Days
Field of Search

707/739
CPC Class Codes

G06F 16/355   Class or cluster creation o...

G06N 20/00   Machine learning

G06Q 30/0631   Item recommendations
