Determination of relationships between collections of disparate media types

US 9,864,817 B2
Filed: 01/28/2012
Issued: 01/09/2018
Est. Priority Date: 01/28/2012
Status: Active Grant

2 Assignments

0 Petitions

Abstract

Architecture that automatically determines relationships between vector spaces of disparate media types, and outputs ranker signals based on these relationships, all in a single process. The architecture improves search result relevance by simultaneously clustering queries and documents, and enables the training of a model for creating one or more ranker signals using simultaneous clustering of queries and documents in their respective spaces.
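The abstract says the architecture outputs ranker signals from the learned relationships but gives no formula. Below is a minimal sketch that assumes each query and document already has a vector of cluster-membership probabilities (as in claims 5 and 12). It also assumes the signal is the probability that the two fall into the same cluster (claims 14 and 17), with the two assignments treated as independent. `ranker_signal`, `rank`, and the sample vectors are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

def ranker_signal(p_query: Sequence[float], p_doc: Sequence[float]) -> float:
    """Hypothetical signal: P(query and document share a cluster),
    assuming independent cluster assignments: sum_k p_q[k] * p_d[k]."""
    return sum(a * b for a, b in zip(p_query, p_doc))

def rank(p_query: Sequence[float],
         docs: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Score every candidate document and sort best-first."""
    scored = [(doc_id, ranker_signal(p_query, p_doc)) for doc_id, p_doc in docs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Three clusters; a text query and two candidate images (made-up vectors).
p_query = [0.7, 0.2, 0.1]
docs = {"img_a": [0.1, 0.8, 0.1], "img_b": [0.9, 0.05, 0.05]}
print(rank(p_query, docs))
```

In a full search stack this value would more plausibly be one feature among many fed to a ranker than the sole ranking criterion. The patent does not say which.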

67 Citations

20 Claims

1. A system, comprising:
- a relationship component that automatically determines relationships between disparate collections of media types, the collections and relationships computed in a single process and employed as part of processing a query to return documents at query time, the relationship component computing vectors for the collections, the vectors including query vectors and document vectors both of which are generated based on a collection-wise probabilistic algorithm and are processed with a similarity function to create a combined model of query-document labeled data; and
- a processor that executes computer-executable instructions associated with at least the relationship component, wherein the relationship component employs a cost function that defines the relationships between the disparate collections of media based on truth data, the defined relationships being usable in the processing of a query, and wherein the collections are clusters that are concurrently created as query clusters and document clusters, and the relationship component computes the relationships between the query clusters and document clusters.

Dependent claims (2, 3, 4, 5, 6, 7):
2. The system of claim 1, wherein the query is of a media type that is different from a media type of the documents.
3. The system of claim 1, wherein the disparate collections include a text media type as the query and an image media type as the documents.
4. The system of claim 1, wherein the cost function minimizes cross-entropy and KL-divergence to create the combined model of query-document labeled data.
5. The system of claim 1, wherein the relationship component computes vectors for both query collections and document collections, elements of the vectors being probabilistic data that a given query and document belong to a common collection.
6. The system of claim 1, wherein the system returns documents that are images, and wherein the processing a query includes processing respective second-order image features that are invariant with respect to a size of the image.
7. The system of claim 1, wherein the number of clusters is fixed, wherein the queries and documents are projected into a probabilistic space of N clusters, and wherein respective functions are used to map the queries and documents into a simplex of a dimension N.
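Claim 7 fixes the number of clusters at N and maps queries and documents into an N-dimensional simplex. Claim 5 makes each vector element a membership probability. The patent does not name the mapping functions. The sketch below assumes a separate linear scorer per media type followed by a softmax, which is one common way to land on the simplex. `project`, the weight matrices, and the feature vectors are hypothetical.

```python
import math
from typing import List, Sequence

def softmax(scores: Sequence[float]) -> List[float]:
    """Map real-valued scores onto the probability simplex (entries >= 0, sum 1)."""
    m = max(scores)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def project(features: Sequence[float], weights: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical mapping into an N-cluster simplex.

    `weights` holds one row per cluster (N rows), each with one weight per
    input feature. Queries and documents get separate matrices because they
    live in different feature spaces (e.g. text terms vs. image features).
    """
    scores = [sum(w * x for w, x in zip(row, features)) for row in weights]
    return softmax(scores)

# N = 3 clusters; a 2-feature text query and a 4-feature image document.
W_query = [[1.5, 0.0], [0.0, 1.5], [0.3, 0.3]]
W_doc = [[2.0, 0.0, 0.0, 0.5], [0.0, 2.0, 0.0, 0.5], [0.0, 0.0, 2.0, 0.5]]

p_query = project([1.0, 0.2], W_query)
p_doc = project([0.9, 0.1, 0.0, 0.4], W_doc)
print([round(p, 3) for p in p_query], [round(p, 3) for p in p_doc])
```

Because both outputs sum to 1 over the same N clusters, they can be compared directly even though the inputs came from different media types.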

8. A system, comprising:
- a relationship component that automatically determines relationships between disparate collections of media types, the collections and relationships computed in a single process and employed as part of processing a query to return documents at query time, the relationship component computing a total weight of each query in a query collection, and a probability that the query belongs in a given query collection; and
- a processor that executes computer-executable instructions associated with at least the relationship component, wherein the relationship component employs a cost function that defines the relationships between the disparate collections of media based on truth data, the defined relationships being usable in the processing of a query, and wherein the collections are clusters that are concurrently created as query clusters and document clusters, and the relationship component computes the relationships between the query clusters and document clusters.

9. A system, comprising:
- a relationship component that automatically determines relationships between disparate collections of media types, the collections and relationships computed in a single process and employed as part of processing a query to return documents at query time, the relationship component computing a total weight of each document in a document collection, and a probability that the document belongs in a given document collection; and
- a processor that executes computer-executable instructions associated with at least the relationship component, wherein the relationship component employs a cost function that defines the relationships between the disparate collections of media based on truth data, the defined relationships being usable in the processing of a query, and wherein the collections are clusters that are concurrently created as query clusters and document clusters, and the relationship component computes the relationships between the query clusters and document clusters.
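Claims 8 and 9 mirror each other. Claim 8 computes a total weight for each query and the probability that the query belongs to a given query collection; claim 9 does the same for documents. Neither claim defines the weight. The sketch below makes two assumptions. First, an item's total weight is the summed weight of the labeled (query, document, weight) pairs it appears in. Second, its weight inside collection k is that total times its membership probability. The record layout and every function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical labeled record: (query, document, weight).
Pair = Tuple[str, str, float]
QUERY, DOCUMENT = 0, 1  # which side of the pair to aggregate on

def total_weights(pairs: Iterable[Pair], side: int) -> Dict[str, float]:
    """Total weight of each query (side=QUERY) or document (side=DOCUMENT)."""
    totals: Dict[str, float] = defaultdict(float)
    for pair in pairs:
        totals[pair[side]] += pair[2]
    return dict(totals)

def weight_per_collection(totals: Dict[str, float],
                          membership: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Split each item's total weight across collections by P(collection | item)."""
    return {item: [w * p for p in membership[item]] for item, w in totals.items()}

def collection_mass(per_item: Dict[str, List[float]]) -> List[float]:
    """Total weight that lands in each collection, summed over all items."""
    n = len(next(iter(per_item.values())))
    return [sum(vec[k] for vec in per_item.values()) for k in range(n)]

pairs = [("cats", "img1", 2.0), ("cats", "img2", 1.0), ("dogs", "img1", 1.0)]
query_totals = total_weights(pairs, QUERY)   # {'cats': 3.0, 'dogs': 1.0}
doc_totals = total_weights(pairs, DOCUMENT)  # {'img1': 3.0, 'img2': 1.0}
query_membership = {"cats": [0.8, 0.2], "dogs": [0.1, 0.9]}
per_query = weight_per_collection(query_totals, query_membership)
print(collection_mass(per_query))
```

The same two functions serve both claims; only the `side` argument and the membership table change between queries and documents.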

10. A method, comprising:
- processing a query of a single, first media type to return documents of a different media type, as part of a training phase;
- converting the query into collections of multi-dimensional query vectors and the documents into collections of multi-dimensional document vectors;
- automatically computing relationships between the query vectors and the document vectors for relevancy of the query to a given document;
- computing the relationship based on vector probabilities; and
- utilizing a processor that executes instructions stored in memory to perform at least one of the acts of processing, converting, or computing.

Dependent claims (11, 12, 13, 14, 15, 16):
11. The method of claim 10, further comprising performing the acts of converting and computing as a single process.
12. The method of claim 10, further comprising creating each query vector as a vector of probabilities that the query belongs to a given query collection and creating each document vector as a vector of probabilities that a document belongs to a given document collection.
13. The method of claim 10, further comprising creating a query-level model that employs logistic regression on query words.
14. The method of claim 10, further comprising computing a probability that the query and a document simultaneously belong to a same collection.
15. The method of claim 10, further comprising applying a cost function to measure similarity between expected true labels and predicted labels as the relationships.
16. The method of claim 15, wherein the cost function to measure similarity between expected true labels and predicted labels is performed using cross-entropy and KL-divergence functions.
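Claim 13 adds a query-level model that applies logistic regression to query words, but it does not say what the model predicts. As a hedged illustration, the sketch below assumes a binary query-level target, for example whether a text query tends to have relevant results of the other media type. It trains a bag-of-words logistic regression with plain stochastic gradient descent on log loss. The training data, learning rate, and epoch count are made up, and `train`/`predict` are hypothetical names.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(weights: Dict[str, float], bias: float, words: Sequence[str]) -> float:
    """P(label = 1 | query words) under a bag-of-words logistic model."""
    return sigmoid(bias + sum(weights.get(w, 0.0) for w in words))

def train(examples: List[Tuple[str, int]], epochs: int = 200,
          lr: float = 0.5) -> Tuple[Dict[str, float], float]:
    """SGD on log loss; the gradient w.r.t. the logit is (prediction - label)."""
    weights: Dict[str, float] = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for query, label in examples:
            words = query.lower().split()
            err = predict(weights, bias, words) - label
            bias -= lr * err
            for w in words:  # each occurrence of a word gets its share of the update
                weights[w] -= lr * err
    return dict(weights), bias

# Hypothetical query-level labels: 1 = query tends to want image results.
examples = [("red car photo", 1), ("car wallpaper", 1),
            ("cheap flights", 0), ("flights to paris", 0)]
weights, bias = train(examples)
print(round(predict(weights, bias, ["car"]), 3),
      round(predict(weights, bias, ["flights"]), 3))
```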

17. A method, comprising:
- as part of a training phase, processing a query of a single, first media type to return documents of a different media type;
- converting the query into clusters of multi-dimensional query vectors and the documents into clusters of multi-dimensional document vectors, the query and document vectors having elements of probabilities;
- automatically computing relationships between the query vectors and the document vectors based on vector probabilities;
- computing a probability that the query and a document simultaneously belong to a same cluster based on the relationships;
- applying a cost function to measure similarity between expected true labels and predicted labels as the relationships; and
- utilizing a processor that executes instructions stored in memory to perform at least one of the acts of processing, converting, computing relationships, or computing a probability.

Dependent claims (18, 19, 20):
18. The method of claim 17, further comprising performing the acts of converting and computing as a single process.
19. The method of claim 17, further comprising creating a combined model based on query and document labeled data for utilization with a similarity function.
20. The method of claim 17, wherein the cost function to measure similarity between expected true labels and predicted labels is performed using cross-entropy and KL-divergence functions.
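Claims 17 and 20 compare the predicted relationship, the probability that a query and a document share a cluster, against the expected true label using cross-entropy and KL-divergence. The claims do not say which two distributions the KL term compares. The sketch below treats cross-entropy as binary cross-entropy between a relevance label in [0, 1] and the predicted co-membership probability. Purely as an assumption, it uses the KL term as a regularizer that pulls average cluster usage toward uniform, so that training does not collapse every pair into one cluster. `lam`, the batch layout, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

EPS = 1e-12  # keeps log() finite when a probability is exactly 0 or 1

def same_cluster_prob(p_query: Sequence[float], p_doc: Sequence[float]) -> float:
    """P(query and document share a cluster), assuming independent assignments."""
    return sum(a * b for a, b in zip(p_query, p_doc))

def binary_cross_entropy(y_true: float, y_pred: float) -> float:
    y_pred = min(max(y_pred, EPS), 1.0 - EPS)
    return -(y_true * math.log(y_pred) + (1.0 - y_true) * math.log(1.0 - y_pred))

def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / max(qi, EPS)) for pi, qi in zip(p, q) if pi > 0)

Example = Tuple[List[float], List[float], float]  # (query vector, doc vector, label)

def cost(batch: List[Example], lam: float = 0.1) -> float:
    """Mean cross-entropy of labels vs. predicted co-membership, plus
    lam * KL(average cluster usage || uniform) as a hypothetical regularizer."""
    ce = sum(binary_cross_entropy(y, same_cluster_prob(pq, pd))
             for pq, pd, y in batch) / len(batch)
    n = len(batch[0][0])
    usage = [0.0] * n
    for pq, pd, _ in batch:
        for k in range(n):
            usage[k] += (pq[k] + pd[k]) / (2 * len(batch))
    return ce + lam * kl_divergence(usage, [1.0 / n] * n)

# Relevant pair co-clustered, irrelevant pair split: low cost.
good = [([1.0, 0.0], [1.0, 0.0], 1.0), ([0.0, 1.0], [1.0, 0.0], 0.0)]
# The opposite assignment: high cost.
bad = [([1.0, 0.0], [0.0, 1.0], 1.0), ([0.0, 1.0], [0.0, 1.0], 0.0)]
print(round(cost(good), 4), round(cost(bad), 4))
```

Minimizing this cost over the parameters that produce the cluster vectors would fit queries and documents jointly, in the spirit of the combined model of claim 19.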

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Inventors: Parakhin, Mikhail; Korolev, Dmitry; Poyarkov, Alexey
Primary Examiner(s): Herndon, Heather
Assistant Examiner(s): Davanlou, Soheila (Gina)
Application Number: US13/360,664
Publication Number: US 20130198186A1
Time in Patent Office: 2,173 Days
Field of Search: 707737, 707748
US Class Current:
CPC Class Codes: G06F 16/90335 Query processing
