System and method for labeling a document
First Claim
1. A computerized method for selecting a proxy keyword for a document, the method comprising:
receiving an unknown document;
electronically determining, using a processing device, at least a first candidate document from the world wide web for the unknown document on the basis of a search of one or more valued search terms;
determining at least a first keyword for the first candidate document;
electronically determining, using the processing device, at least a second candidate document from the world wide web for the unknown document on the basis of a search of one or more valued search terms;
determining at least a second keyword for the second candidate document;
electronically determining, using the processing device, a proxy keyword for the unknown document based on the first and second keywords; and
wherein the first candidate document and the second candidate document are documents which are most similar to the unknown document.
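As an illustration of the candidate-selection steps in the claim above, the following is a minimal sketch only. The claim does not specify how similarity is measured; Jaccard overlap of word sets is an assumption made here, and the function names and the in-memory `candidates` dictionary (standing in for documents returned by a web search) are hypothetical.

```python
import re


def tokens(text):
    """Return the set of lowercase word tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (an assumed similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def most_similar_candidates(unknown_text, candidates, k=2):
    """Return the k (doc_id, similarity) pairs most similar to the unknown document.

    candidates: dict mapping a hypothetical doc_id to its text, standing in
    for documents found by searching on selected terms.
    """
    unknown = tokens(unknown_text)
    scored = [(doc_id, jaccard(unknown, tokens(text)))
              for doc_id, text in candidates.items()]
    # Highest similarity first; the first and second entries play the role
    # of the first and second candidate documents.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

With k=2, the two returned entries correspond to the first and second candidate documents, whose keywords would then feed the proxy keyword determination.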
Abstract
A system and method for selecting a proxy keyword for an unknown document. An unknown document is received by a receiver. A plurality of candidate documents and corresponding keywords are determined for the unknown document. Using the keywords from the candidate documents, proxy keywords are determined for the unknown document based on a plurality of factors including a length of the keywords, a distance of the candidate documents from the unknown document, a similarity of the text between the unknown document and the respective candidate document, a rank of the keywords within each candidate document, and a frequency of the keyword within its respective candidate document.
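The abstract names the factors but not how they are combined. The sketch below is one hypothetical combination, not the patented formula: it assumes that longer keywords, higher text similarity, better rank and higher frequency all raise a keyword's score, and it treats similarity as a stand-in for the distance factor (closer documents are more similar). All class, field and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class CandidateKeyword:
    keyword: str
    similarity: float  # similarity of the candidate document to the unknown document, in [0, 1]
    rank: int          # 1 = top-ranked keyword within its candidate document
    frequency: int     # occurrences of the keyword within its candidate document


def score(ck):
    """Hypothetical score: product of the length, similarity, rank and frequency factors."""
    length_factor = len(ck.keyword.split())  # number of words in the keyword
    rank_factor = 1.0 / ck.rank              # rank 1 contributes most
    return ck.similarity * rank_factor * length_factor * ck.frequency


def select_proxy_keyword(candidate_keywords):
    """Sum scores per keyword across candidate documents and return the best one.

    Raises ValueError if candidate_keywords is empty.
    """
    totals = {}
    for ck in candidate_keywords:
        totals[ck.keyword] = totals.get(ck.keyword, 0.0) + score(ck)
    return max(totals, key=totals.get)
```

Summing per keyword means a keyword that appears in several candidate documents accumulates evidence from each of them; a weighted sum of the factors would be an equally plausible design under the same assumptions.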
21 Claims
1. A computerized method for selecting a proxy keyword for a document, the method comprising:
receiving an unknown document;
electronically determining, using a processing device, at least a first candidate document from the world wide web for the unknown document on the basis of a search of one or more valued search terms;
determining at least a first keyword for the first candidate document;
electronically determining, using the processing device, at least a second candidate document from the world wide web for the unknown document on the basis of a search of one or more valued search terms;
determining at least a second keyword for the second candidate document;
electronically determining, using the processing device, a proxy keyword for the unknown document based on the first and second keywords; and
wherein the first candidate document and the second candidate document are documents which are most similar to the unknown document.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A system for selecting a proxy keyword for a document, the system comprising:
a receiver effective to receive an unknown document;
a candidate document determination module effective to determine at least a first candidate document from the World Wide Web for the unknown document on the basis of a search of one or more valued search terms;
a proxy phrase determination module effective to determine at least a first keyword for the first candidate document;
the candidate document determination module further effective to determine at least a second candidate document from the World Wide Web for the unknown document on the basis of a search of one or more valued search terms;
the proxy phrase determination module effective to determine at least a second keyword for the second candidate document; and
a processor effective to determine a proxy keyword for the unknown document based on the first and second keywords; and
wherein the first candidate document and the second candidate document are documents which are most similar to the unknown document.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20, 21
Specification