System and method for labeling a document
Abstract
A system and method for selecting a proxy keyword for an unknown document. An unknown document is received by a receiver. A plurality of candidate documents, and keywords corresponding to each, are determined for the unknown document. Using the keywords from the candidate documents, proxy keywords are determined for the unknown document based on a plurality of factors, including the length of each keyword, the distance of each candidate document from the unknown document, the similarity of the text of the unknown document to that of the respective candidate document, the rank of each keyword within its candidate document, and the frequency of each keyword within its candidate document.
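The abstract names the factors used to choose proxy keywords but not how they are combined. The following is a minimal sketch, assuming a simple multiplicative weighting in which closer, more similar candidates and higher-ranked, more frequent, multi-word keywords score higher. The `Candidate` structure, the `score_proxy_keywords` function, and every weight are hypothetical illustrations, not the patented formula.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """A candidate document found for the unknown document (hypothetical structure)."""
    distance: float          # e.g. link hops from the unknown document; smaller = closer
    similarity: float        # text similarity to the unknown document, assumed in [0, 1]
    keywords: list[str]      # keywords in rank order (index 0 = rank 1)
    frequency: dict[str, int] = field(default_factory=dict)  # occurrences of each keyword


def score_proxy_keywords(candidates: list[Candidate]) -> list[tuple[str, float]]:
    """Score every keyword from every candidate and return them best-first.

    The weighting below is an illustrative assumption, not the patented formula.
    """
    scores: dict[str, float] = defaultdict(float)
    for cand in candidates:
        total = sum(cand.frequency.values()) or 1          # avoid division by zero
        for rank, kw in enumerate(cand.keywords, start=1):
            length_w = min(len(kw.split()), 3) / 3         # favour multi-word phrases, capped at 3 words
            distance_w = 1.0 / (1.0 + cand.distance)       # closer candidates count more
            rank_w = 1.0 / rank                            # higher-ranked keywords count more
            freq_w = cand.frequency.get(kw, 0) / total     # relative frequency in the candidate
            # Scores from different candidates accumulate for the same keyword.
            scores[kw] += length_w * distance_w * cand.similarity * rank_w * (1 + freq_w)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


cands = [
    Candidate(distance=1, similarity=0.8, keywords=["solar panel", "energy"],
              frequency={"solar panel": 4, "energy": 2}),
    Candidate(distance=2, similarity=0.5, keywords=["energy", "roof"],
              frequency={"energy": 3, "roof": 1}),
]
ranking = score_proxy_keywords(cands)
```

Summing contributions across candidates means a keyword shared by several candidates can outrank one that appears only once, which is one plausible reading of "based on the first and second keywords."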
23 Claims
1. A method for selecting a proxy keyword for a document, the method comprising:
receiving a first document;
determining at least a first candidate document from the world wide web for the first document;
determining at least a first keyword for the first candidate document;
determining at least a second candidate document from the world wide web for the first document;
determining at least a second keyword for the second candidate document;
determining a proxy keyword for the first document based on the first and second keywords.
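The steps of claim 1 can be sketched as a small pipeline. The claim does not say how candidate documents are found on the web or how keywords are extracted, so both are passed in as functions; `find_candidates`, `extract_keywords`, and the majority-vote combination are assumptions made only for illustration.

```python
from collections import Counter
from typing import Callable, Iterable, List, Optional

# Hypothetical interfaces: the claim does not specify how either step is done.
FindCandidates = Callable[[str], Iterable[str]]   # first document -> candidate documents
ExtractKeywords = Callable[[str], List[str]]      # candidate document -> its keywords


def select_proxy_keyword(document: str,
                         find_candidates: FindCandidates,
                         extract_keywords: ExtractKeywords) -> Optional[str]:
    """Candidates -> keywords per candidate -> one proxy keyword.

    Combining keywords by majority vote is an illustrative assumption.
    """
    votes = Counter()
    for candidate in find_candidates(document):        # first, second, ... candidate documents
        votes.update(set(extract_keywords(candidate)))  # each candidate votes once per keyword
    if not votes:
        return None                                     # no candidates or no keywords found
    return votes.most_common(1)[0][0]
```

For example, with stub candidates `["solar roof", "solar panel"]` and `str.split` as the extractor, "solar" is chosen because it is the only keyword shared by both candidates.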
13. A system for selecting a proxy keyword for a document, the system comprising:
a receiver effective to receive a first document;
a candidate document determination module effective to determine at least a first candidate document from the World Wide Web for the first document;
a proxy phrase determination module effective to determine at least a first keyword for the first candidate document;
the candidate document determination module further effective to determine at least a second candidate document from the World Wide Web for the first document;
the proxy phrase determination module effective to determine at least a second keyword for the second candidate document; and
a processor effective to determine a proxy keyword for the first document based on the first and second keywords.
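Claim 13 recasts the method as a set of cooperating modules. One way to picture that wiring is the sketch below, in which the two determination modules are injected callables and the processor prefers a keyword shared by the first and second candidates. The class name, the whitespace-normalising receiver, and the shared-keyword rule are all hypothetical choices, not details taken from the claim.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ProxyKeywordSystem:
    """Hypothetical wiring of the modules named in claim 13."""
    candidate_module: Callable[[str], List[str]]   # candidate document determination module
    phrase_module: Callable[[str], List[str]]      # proxy phrase determination module

    def receive(self, document: str) -> str:
        """Receiver: here it only normalises whitespace (an assumption)."""
        return " ".join(document.split())

    def process(self, document: str) -> Optional[str]:
        """Processor: prefer a keyword shared by the first and second candidates."""
        doc = self.receive(document)
        candidates = self.candidate_module(doc)
        if len(candidates) < 2:
            return None                                      # the claim requires two candidates
        first_kw = self.phrase_module(candidates[0])
        second_kw = self.phrase_module(candidates[1])
        shared = [kw for kw in first_kw if kw in second_kw]  # keeps the first candidate's ranking
        if shared:
            return shared[0]
        return first_kw[0] if first_kw else None             # fall back to the top first keyword
```

Keeping the modules as injected callables lets each be replaced (for example, a web search client for the candidate module) without changing the processor logic.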
23. A document label file for a first document produced by the method of:
receiving a first document;
determining at least a first candidate document from the world wide web for the first document;
determining at least a first keyword for the first candidate document;
determining at least a second candidate document from the world wide web for the first document;
determining at least a second keyword for the second candidate document;
determining a proxy keyword for the first document based on the first and second keywords.
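Claim 23 covers the document label file produced by this method but does not define its format. As a minimal sketch, assuming a JSON sidecar file stored next to the document, the label file might be written as follows; the file naming scheme, the JSON keys, and `write_label_file` are hypothetical.

```python
import json
from pathlib import Path
from typing import List


def write_label_file(document_path: str, proxy_keywords: List[str]) -> Path:
    """Write a JSON 'document label file' beside the document (hypothetical format)."""
    doc = Path(document_path)
    label_path = doc.with_name(doc.stem + ".labels.json")   # e.g. report.txt -> report.labels.json
    payload = {"document": doc.name, "proxy_keywords": proxy_keywords}
    label_path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return label_path
```

A plain-text sidecar keeps the labels readable and leaves the original document untouched; embedding the keywords in the document's own metadata would be an equally valid alternative.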
Specification