OPTIMIZATION FILTERS FOR USER GENERATED CONTENT SEARCHES
First Claim
1. A network device, comprising:
a transceiver to send and receive data over a network; and
a processor that is operative to perform actions, including;
determining a plurality of document features for each document in a plurality of documents, wherein the plurality of documents include at least one document defined as having sufficient subject matter specificity, and at least one other document having insufficient subject matter specificity;
training a classifier based at least on the plurality of document features;
providing to the trained classifier, at least one other document, wherein the trained classifier determines a quality value for the at least one other document; and
if the determined quality value of the at least one other document is above a quality threshold value, identifying the at least one other document to have sufficient subject matter specificity, and providing the at least one other document to a client device for display.
Abstract
Embodiments are directed towards filtering, from a user generated content (UGC) search result, those documents determined to have insufficient subject matter specificity, as defined by the training of a classification filter. The training comprises selecting a set of UGC that is definable as having sufficient subject matter specificity (a good set), and another set of UGC that is definable as having insufficient subject matter specificity (a bad set). The trained UGC classifier may examine search documents and, based on whether a document's values are above a defined threshold, categorize the document as having sufficient subject matter specificity (or not). Those documents found to have insufficient subject matter specificity, based on their determined values relative to the threshold, may be filtered out of the submitted UGC search results. The documents remaining within the UGC search results may then be provided to a searcher for display at a client device.
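The training and thresholding steps described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the three features (capped length, unique-token ratio, average token length), the use of a small logistic regression as the classifier, the 0.5 threshold, the function names, and the made-up toy documents in the good and bad sets.

```python
import math

# Made-up toy training sets (hypothetical, for illustration only).
GOOD_SET = [
    "The Canon EOS 5D autofocus hunts in low light unless the AF assist beam is enabled",
    "Replacing the thermal paste on a 2015 MacBook Pro dropped idle temperatures by about ten degrees",
    "To fix the leaking kitchen faucet I replaced the worn cartridge and reseated both rubber O-rings",
]
BAD_SET = ["lol", "nice nice nice", "cool pic", "so true"]


def extract_features(text):
    """Return hypothetical document features, each scaled to [0, 1]."""
    tokens = text.lower().split()
    n = len(tokens)
    if n == 0:
        return [0.0, 0.0, 0.0]
    length = min(n, 20) / 20.0                      # capped token count
    unique_ratio = len(set(tokens)) / n             # vocabulary variety
    avg_token_len = min(sum(len(t) for t in tokens) / n, 10.0) / 10.0
    return [length, unique_ratio, avg_token_len]


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_classifier(good_docs, bad_docs, epochs=2000, lr=0.5):
    """Fit a logistic regression by plain per-sample gradient descent."""
    data = [(extract_features(d), 1.0) for d in good_docs]
    data += [(extract_features(d), 0.0) for d in bad_docs]
    weights, bias = [0.0, 0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            p = _sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y                              # gradient of log loss
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def quality_value(model, text):
    """Score a document in (0, 1); higher means more specific."""
    weights, bias = model
    x = extract_features(text)
    return _sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def has_sufficient_specificity(model, text, threshold=0.5):
    """Apply the quality threshold (0.5 is an assumed value)."""
    return quality_value(model, text) > threshold


if __name__ == "__main__":
    model = train_classifier(GOOD_SET, BAD_SET)
    for doc in GOOD_SET + BAD_SET:
        print(has_sufficient_specificity(model, doc), doc[:40])
```

A real system would likely use richer features and a held-out set to choose the threshold; the sketch only shows how a good set and a bad set feed a classifier whose output is compared against a threshold.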
20 Claims
1. A network device, comprising:
a transceiver to send and receive data over a network; and
a processor that is operative to perform actions, including;
determining a plurality of document features for each document in a plurality of documents, wherein the plurality of documents include at least one document defined as having sufficient subject matter specificity, and at least one other document having insufficient subject matter specificity;
training a classifier based at least on the plurality of document features;
providing to the trained classifier, at least one other document, wherein the trained classifier determines a quality value for the at least one other document; and
if the determined quality value of the at least one other document is above a quality threshold value, identifying the at least one other document to have sufficient subject matter specificity, and providing the at least one other document to a client device for display.
Dependent claims: 2, 3, 4, 5, 6
7. A system that is operative to manage a search query, comprising:
a search tool configured to receive a search query request and to provide a plurality of documents in response; and
a document classifier that is configured to detect a document's subject matter specificity based on a plurality of document features about the document, wherein the document classifier performs actions, including;
receiving a plurality of documents from the search tool; and
for each document in the plurality of documents;
extracting from each respective document, a plurality of document features;
employing a machine learning algorithm to generate a document quality feature score for the respective document based on the extracted document features; and
if the document quality feature score for the respective document is below a threshold value, removing the respective document from the received plurality of documents as indicating that the document has insufficient subject matter specificity; and
providing a resulting set of documents to a search requester based on each document quality feature score, such that the provided documents are determined to have sufficient subject matter specificity.
Dependent claims: 8, 9, 10, 11
12. A computer-readable storage medium having computer-executable instructions, the computer-executable instructions when installed onto a computing device enable the computing device to perform actions, comprising:
receiving a plurality of documents in response to a search query; and
for each document in the plurality of documents;
extracting from each respective document, a plurality of document features;
employing a document classifier to generate a document quality feature score for the respective document based on the extracted document features;
if the document quality feature score for the respective document is below a threshold value, removing the respective document from the received plurality of documents as indicating that the document has insufficient subject matter specificity; and
providing a resulting set of documents to a search requester based on each document quality feature score, such that the provided documents are determined to have sufficient subject matter specificity.
Dependent claims: 13, 14, 15, 16
17. A method of performing a search query, comprising:
receiving a plurality of documents in response to the search query over user generated content (UGC); and
for each document in the plurality of documents;
extracting document features from each respective document useable to identify subject matter specificity of a UGC document;
employing a document classifier to generate a document quality feature score for the respective document based on the extracted document features; and
if the document quality feature score for the respective document is below a threshold value, removing the respective document from the received plurality of documents as indicating that the document has insufficient subject matter specificity; and
providing a resulting set of documents to a search requester based on each document quality feature score, such that the provided documents are determined to have sufficient subject matter specificity.
Dependent claims: 18, 19, 20
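The per-document loop recited in the method claim (extract features, score, remove below threshold, provide the rest) might be sketched as follows. This is only one possible reading and not the patented implementation: the function name `filter_search_results`, the pluggable `extract_features` and `score` callables, and the toy length-based scorer are all hypothetical. Ordering the surviving documents by descending score is likewise an assumption about what "based on each document quality feature score" could mean.

```python
from typing import Callable, List, Sequence


def filter_search_results(
    documents: Sequence[str],
    extract_features: Callable[[str], List[float]],
    score: Callable[[List[float]], float],
    threshold: float,
) -> List[str]:
    """Drop documents scoring below the threshold; return the rest, best first."""
    kept = []
    for doc in documents:
        features = extract_features(doc)     # per-document feature extraction
        quality = score(features)            # document quality feature score
        if quality < threshold:
            continue                         # insufficient specificity: remove
        kept.append((quality, doc))
    # Assumed ordering: highest score first (the sort is stable on ties).
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in kept]


# Toy stand-ins for a trained classifier (hypothetical, illustration only).
def toy_features(text: str) -> List[float]:
    return [min(len(text.split()), 20) / 20.0]


def toy_score(features: List[float]) -> float:
    return features[0]


if __name__ == "__main__":
    results = [
        "lol",
        "a b c d e f g h i j k l",
        "one two three four five six seven eight nine ten "
        "eleven twelve thirteen fourteen fifteen sixteen",
    ]
    for doc in filter_search_results(results, toy_features, toy_score, 0.5):
        print(doc)
```

Passing the feature extractor and scorer as parameters keeps the filtering step independent of how the classifier was trained, which mirrors the way the claim separates scoring from removal.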
Specification