Enterprise relevancy ranking using a neural network

US 7,840,569 B2
Filed: 10/18/2007
Issued: 11/23/2010
Est. Priority Date: 10/18/2007
Status: Active Grant

2 Assignments

0 Petitions

Abstract

A neural network is used to process a set of ranking features in order to determine the relevancy ranking for a set of documents or other items. The neural network calculates a predicted relevancy score for each document and the documents can then be ordered by that score. Alternate embodiments apply a set of data transformations to the ranking features before they are input to the neural network. Training can be used to adapt both the neural network and certain of the data transformations to target environments.
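The flow described in the abstract (transform each ranking feature, normalize it, feed every feature to every hidden node, combine the hidden node scores, then sort) can be sketched as below. This is a minimal illustration, not the patented implementation: the `log1p` transform, the mean/standard-deviation normalization, the `tanh` activation, and all weights and statistics are assumptions chosen only to make the example runnable.

```python
import math

def transform(raw):
    # Hypothetical transform: compress large raw values with log(1 + x).
    return math.log1p(raw)

def normalize(value, mean, std):
    # Shift and scale with precomputed statistics (assumed values below).
    return (value - mean) / std

def relevancy_score(inputs, hidden_weights, hidden_biases, output_weights):
    # Every input feature is provided to every hidden node.
    hidden = [
        math.tanh(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    # The relevancy score is a weighted sum of the hidden node scores.
    return sum(w * h for w, h in zip(output_weights, hidden))

def rank(candidates, stats, hidden_weights, hidden_biases, output_weights):
    scored = []
    for doc_id, raw_features in candidates.items():
        inputs = [
            normalize(transform(raw), mean, std)
            for raw, (mean, std) in zip(raw_features, stats)
        ]
        score = relevancy_score(inputs, hidden_weights, hidden_biases, output_weights)
        scored.append((score, doc_id))
    # Highest score first.
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored]

# Toy data: each document has [BM25 score, click distance] as raw features.
candidates = {"a": [12.0, 1.0], "b": [3.0, 4.0], "c": [7.5, 2.0]}
stats = [(2.0, 0.5), (1.0, 0.5)]             # assumed (mean, std) per feature
hidden_weights = [[1.0, -1.0], [0.5, -0.5]]  # assumed: reward BM25, penalize distance
hidden_biases = [0.0, 0.0]
output_weights = [1.0, 1.0]

ranking = rank(candidates, stats, hidden_weights, hidden_biases, output_weights)
print(ranking)
```

In a trained system the weights, and possibly some transform constants, would come from training rather than being set by hand as they are here.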

252 Citations

17 Claims

1. A computer-implemented method of determining a relevancy rank ordering score for a plurality of documents comprising:
- (a) identifying, by at least one processing unit, a finite set of candidate documents;
  
  (b) for each of the candidate documents:
  
  (i) obtaining raw data for a plurality of ranking features associated with the candidate document, the plurality of ranking features comprising at least two of:
  
  BM25, click distance, URL depth, file type, and language of the candidate document;
  
  (ii) transforming the raw data for the plurality of ranking features;
  
  (iii) normalizing the transformed raw data for the plurality of ranking features;
  
  (iv) using a neural network to calculate a relevancy score from the transformed, normalized raw data for the plurality of ranking features, wherein calculating the relevancy score further comprises:
  
  calculating hidden node scores at a plurality of hidden nodes from the transformed, normalized raw data, wherein the transformed, normalized raw data for each of the ranking features is provided to each of the plurality of hidden nodes; and
  
  calculating the relevancy score based on the hidden node scores;
  
  (c) ranking the candidate documents according to the relevancy score for each of the candidate documents; and
  
  (d) displaying a list of the ranked documents.
- Dependent claims (2, 3, 4, 5, 6)
  - 2. The method of claim 1 wherein at least one of the transformations is of the form
  - 3. The method of claim 2 wherein at least one of the saturation values is adjusted during training of the neural network.
  - 4. The method of claim 1 wherein at least one of the transformations comprises mapping each value of an enumerated data type to a discrete binary value.
  - 5. The method of claim 4 wherein the neural network accepts each discrete binary value as a separate input and applies a separate trainable weight to each of the separate inputs.
  - 6. The method of claim 1 wherein the BM25 feature comprises the BM25G formula which uses at least one property selected from the group consisting of body, title, author, anchor text, URL, and extracted title.
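Claims 4 and 5 describe mapping each value of an enumerated data type to a discrete binary value, with a separate trainable weight per binary input. A minimal sketch of that idea follows; the `FILE_TYPES` list and the weight values are hypothetical and stand in for whatever enumeration and trained weights a real deployment would use.

```python
# Hypothetical enumeration of file types; the claims do not list specific values.
FILE_TYPES = ["html", "doc", "pdf", "ppt"]

def one_hot(value, categories):
    # One binary input per enumerated value; unknown values map to all zeros.
    return [1.0 if value == category else 0.0 for category in categories]

def weighted_input(value, categories, weights):
    # Each binary input has its own weight, so only the weight of the
    # matching category contributes to the sum.
    encoded = one_hot(value, categories)
    return sum(w * x for w, x in zip(weights, encoded))

# Assumed weights, one per file type; in practice these would be learned.
file_type_weights = [0.3, 0.1, -0.2, 0.05]

encoded = one_hot("pdf", FILE_TYPES)
contribution = weighted_input("pdf", FILE_TYPES, file_type_weights)
print(encoded, contribution)
```

Encoding the enumeration this way avoids imposing an artificial numeric order on values such as file type or language.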

7. A system for generating a relevancy ranking for documents comprising:
- at least one processing unit;
  
  a memory, communicatively coupled to the at least one processing unit, containing instructions that, when executed by the at least one processing unit, comprise:
  
  a module which identifies a set of candidate documents and makes available raw data for a plurality of ranking features for each of the candidate documents, the plurality of ranking features comprising at least two of:
  
  BM25, click distance, URL depth, file type, and language of the candidate document; and
  
  a ranking module comprising at least one input transformation, at least one input normalization, and a neural network, wherein the ranking module accepts the raw data for the plurality of ranking features for each of the candidate documents individually, applies the at least one input transformation to the raw data for each of the plurality of ranking features, applies the at least one input normalization to the transformed raw data for each of the plurality of ranking features, provides the transformed, normalized raw data for the plurality of ranking features to the neural network which calculates hidden node scores at a plurality of hidden nodes from the transformed, normalized raw data, wherein the transformed, normalized raw data for each of the ranking features is provided to each of the plurality of hidden nodes, and wherein the neural network calculates a relevancy score based on each of the hidden node scores for each of the candidate documents, and wherein the ranking module ranks the candidate documents and provides a list of the candidate documents for display.
- Dependent claims (8, 9, 10, 11, 12)
  - 8. The relevancy ranking system of claim 7 wherein the BM25 feature comprises the BM25G formula which uses at least one property selected from the group consisting of body, title, author, anchor text, URL, and extracted title.
  - 9. The relevancy ranking system of claim 7 wherein the data module further comprises at least one transformation constant and the input transformation utilizes the transformation constant.
  - 10. The relevancy ranking system of claim 7 wherein at least one of the input transformations is of the form
  - 11. The relevancy ranking system of claim 10 wherein at least one of the configurable constants is adjusted during training of the neural network.
  - 12. The relevancy ranking system of claim 7 wherein at least one of the input transformations comprises mapping each value of an enumerated data type to a discrete binary value and the neural network accepts each discrete binary value as a separate ranking feature and applies a separate ranking feature weight to each of the separate ranking features.
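The claims name BM25 as a ranking feature but this listing does not reproduce the BM25G formula. For orientation only, the sketch below computes the classic single-field Okapi BM25 score, which is not the multi-property BM25G variant of claims 6, 8, and 17. The parameter defaults `k1=1.2` and `b=0.75` and the smoothed IDF form are common conventions assumed here, not values taken from the patent.

```python
import math
from collections import Counter

def bm25(query_terms, doc_terms, corpus, k1=1.2, b=0.75):
    # Classic Okapi BM25 over a single text field (tokenized documents).
    n_docs = len(corpus)
    avg_len = sum(len(doc) for doc in corpus) / n_docs
    term_counts = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        doc_freq = sum(1 for doc in corpus if term in doc)
        tf = term_counts[term]
        if doc_freq == 0 or tf == 0:
            continue
        # Smoothed, non-negative IDF (one common variant).
        idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
        # Term-frequency saturation with document-length normalization.
        denom = tf + k1 * (1 - b + b * len(doc_terms) / avg_len)
        score += idf * tf * (k1 + 1) / denom
    return score

corpus = [
    ["neural", "network", "ranking"],
    ["enterprise", "search"],
    ["neural", "search", "ranking", "ranking"],
]
scores = [bm25(["ranking"], doc, corpus) for doc in corpus]
print(scores)
```

A field-aware variant would compute term statistics per property (body, title, anchor text, and so on) and combine them with per-property weights before applying saturation.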

13. A computer implemented method of rank ordering a plurality of documents by relevancy comprising:
- (a) identifying, by at least one processing unit, a finite set of candidate documents;
  
  (b) for each of the candidate documents:
  
  (i) obtaining raw data for a plurality of ranking features associated with the candidate document, the plurality of ranking features comprising at least two of:
  
  BM25, click distance, URL depth, file type, and language of the candidate documents;
  
  (ii) applying a transformation to the raw data for the plurality of ranking features, wherein the transformation comprises a constant which is configurable;
  
  (iii) normalizing the transformed raw data for the plurality of ranking features;
  
  (iv) using a neural network to calculate a relevancy score from the transformed, normalized raw data for the plurality of ranking features, wherein calculating the relevancy score further comprises:
  
  calculating hidden node scores at a plurality of hidden nodes from the transformed, normalized raw data, wherein the transformed, normalized raw data for each of the ranking features is provided to each of the plurality of hidden nodes; and
  
  calculating the relevancy score based on each of the hidden node scores;
  
  (c) ordering the candidate documents by the calculated relevancy scores; and
  
  (d) displaying a list of the ordered candidate documents.
- Dependent claims (14, 15, 16, 17)
  - 14. The rank ordering method of claim 13 wherein at least one of the transformations is of the form
  - 15. The rank ordering method of claim 14 further comprising at least one ranking feature transformation comprising mapping each value of an enumerated data type to a discrete binary value and wherein the neural network accepts each discrete binary value as a separate ranking feature and applies a separate trainable weight to each of the discrete binary values.
  - 16. The rank ordering method of claim 15 wherein the ranking features comprise at least BM25, click distance, URL depth, file type, and language and wherein click distance and URL depth are transformed using:
  - 17. The method of claim 16 wherein the BM25 feature comprises the BM25G formula which uses the properties of body, title, author, anchor text, URL, and extracted title.
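The claims list click distance and URL depth as ranking features without defining how the raw values are obtained. One plausible way to derive them is sketched below, under two assumptions that do not come from this listing: URL depth is taken as the number of non-empty path segments, and click distance is taken as the fewest link hops from a set of designated start pages, found by breadth-first search. The function names, link graph, and URL are hypothetical.

```python
from collections import deque
from urllib.parse import urlparse

def url_depth(url):
    # Assumed definition: count of non-empty segments in the URL path.
    path = urlparse(url).path
    return len([segment for segment in path.split("/") if segment])

def click_distances(links, start_pages):
    # Breadth-first search over the link graph; each start page has distance 0.
    distances = {page: 0 for page in start_pages}
    queue = deque(start_pages)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in distances:
                distances[target] = distances[page] + 1
                queue.append(target)
    return distances

# Hypothetical intranet link graph: page -> pages it links to.
links = {"home": ["hr", "it"], "hr": ["leave"]}

depth = url_depth("http://intranet.example/hr/policies/leave.html")
distances = click_distances(links, ["home"])
print(depth, distances)
```

Pages unreachable from the start pages are absent from the result, so a caller would need to choose a fallback value for them before transformation and normalization.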

Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
Microsoft Corporation
Inventors
Taylor, Michael James; Burges, Chris J. C.; Meyerzon, Dmitriy; Shnitko, Yauhen
Primary Examiner(s)
Le; Debbie

Application Number

US11/874,844
Publication Number

US 20090106223A1
Time in Patent Office

1,132 Days
Field of Search

707/1-7, 706/62
US Class Current

707/748
CPC Class Codes

G06F 16/951 Indexing; Web crawling tech...

G06N 3/02 Neural networks
