CONTENT AND QUALITY ASSESSMENT METHOD AND APPARATUS FOR QUALITY SEARCHING
Abstract
A computer-based process retrieves information organized in documents containing text and/or coded representations of text. The process involves obtaining a selected set of documents, labeling each document based on content quality, and extracting and representing features from each document in the selected set. The extracted and represented features are modified, and models are constructed using parametric learning algorithms. The constructed models are capable of assigning a label to each document. The model parameters are instantiated using a first subset of the selected set of documents, and the parameters are chosen by validating the corresponding model against at least a second subset of the full document set. The constructed models are also capable of assigning labels to similar documents that lie outside the selected subset and were not previously given to the model-construction process.
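The feature steps named in the abstract (extracting, representing, and then modifying features) can be illustrated with a minimal Python sketch. It assumes a bag-of-words representation, a small hypothetical stopword list, and logarithmic damping of term counts as the modification. The source does not specify any of these choices, and the names `STOPWORDS`, `extract_features`, and `modify_features` are illustrative only.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; the source does not specify one.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "to", "is", "was", "were"}


def extract_features(text):
    """Represent a document as raw term counts (bag of words)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def modify_features(counts):
    """Modify the representation: drop stopwords and dampen each count
    with 1 + log(count). Both choices are assumptions for illustration."""
    return {
        term: 1.0 + math.log(n)
        for term, n in counts.items()
        if term not in STOPWORDS
    }


doc = "The trial was randomized and the trial was controlled."
raw = extract_features(doc)
features = modify_features(raw)
print(sorted(features))
```

Other modifications, such as term weighting across the whole document set or feature selection, would slot into `modify_features` without changing the rest of the sketch.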
33 Citations
25 Claims
1. A computer-based process for filtering information according to content quality that is organized in documents comprising the steps of:
obtaining a selected set of documents;
labeling each document of the selected set based on content quality;
extracting and representing features from each document in the selected set;
modifying the extracted and represented features;
constructing models using a pattern recognition algorithm, the constructed models being capable of assigning a label based on the content quality of each document, model parameters being instantiated using a first subset of the selected set of documents, and the parameters being chosen by validating the corresponding model against at least a second subset of the selected set, wherein during a search for related documents, the validated model labels the documents resulting from the search based on content quality.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11.
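The model-construction step of claim 1 (instantiate parameters on a first subset, choose among candidate parameters by validating on a second subset, then label search results) can be sketched as follows. This is only an illustration under stated assumptions: the claim says "a pattern recognition algorithm" without naming one, so a multinomial naive Bayes classifier stands in here, the smoothing value `alpha` plays the role of the parameter chosen by validation, and the tiny document sets and all function names are hypothetical.

```python
import math
from collections import Counter


def train_nb(docs, alpha):
    """Instantiate model parameters from (tokens, label) pairs.
    alpha is the smoothing value and must be positive."""
    label_counts = Counter(label for _, label in docs)
    term_counts = {label: Counter() for label in label_counts}
    for tokens, label in docs:
        term_counts[label].update(tokens)
    vocab = {t for counts in term_counts.values() for t in counts}
    return {"labels": label_counts, "terms": term_counts,
            "vocab": vocab, "alpha": alpha}


def predict(model, tokens):
    """Return the label with the highest log posterior."""
    total_docs = sum(model["labels"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["labels"].items():
        counts = model["terms"][label]
        denom = sum(counts.values()) + model["alpha"] * vocab_size
        score = math.log(n_docs / total_docs)
        for t in tokens:
            if t in model["vocab"]:  # ignore terms unseen in training
                score += math.log((counts[t] + model["alpha"]) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, docs):
    """Fraction of (tokens, label) pairs the model labels correctly."""
    return sum(predict(model, toks) == label for toks, label in docs) / len(docs)


def select_model(train, validation, alphas):
    """Train one model per alpha on the first subset and keep the one
    that scores best on the second (validation) subset."""
    candidates = [train_nb(train, a) for a in alphas]
    return max(candidates, key=lambda m: accuracy(m, validation))


# Hypothetical labeled documents, already tokenized.
train = [
    (["randomized", "controlled", "trial"], "high"),
    (["double", "blind", "trial"], "high"),
    (["case", "report"], "low"),
    (["editorial", "opinion"], "low"),
]
validation = [
    (["randomized", "trial"], "high"),
    (["opinion", "report"], "low"),
]

model = select_model(train, validation, alphas=[0.1, 1.0, 10.0])

# Documents returned by a later search are labeled by the validated model.
search_results = [["blind", "controlled", "trial"], ["editorial", "case"]]
labels = [predict(model, tokens) for tokens in search_results]
print(labels)
```

Keeping the validation subset separate from the training subset is what lets the parameter choice reflect performance on documents the model has not seen.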
12. A process of using a pattern recognition algorithm to filter documents based on quality, comprising the steps of:
obtaining a first set of documents;
obtaining a model for labeling documents based on content quality, constructed by the pattern recognition algorithm using a second set of documents that is related to the first set of documents;
using the model to label the first set of documents, wherein the label of at least one of the documents of the first set is displayed.
Dependent claims: 13, 14, 15, 16, 17, 18.
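Claim 12 covers the use side only: a model that already exists is applied to a first set of documents, and at least one label is displayed. A minimal sketch of that flow is shown below. It assumes the model can be treated as any callable from document text to a label. The `keyword_model` function is a hypothetical stand-in for a trained model and is not the claimed algorithm, and `search` is a toy substring match used only to produce a first set of documents.

```python
from typing import Callable, Iterable, List, Tuple

# A "model" here is any callable mapping document text to a quality label.
QualityModel = Callable[[str], str]


def keyword_model(text: str) -> str:
    """Hypothetical stand-in for a trained model; not the claimed algorithm."""
    cues = ("randomized", "controlled", "cohort")
    return "high" if any(cue in text.lower() for cue in cues) else "low"


def search(corpus: Iterable[str], query: str) -> List[str]:
    """Toy search: return the documents that contain the query string."""
    return [doc for doc in corpus if query.lower() in doc.lower()]


def label_results(model: QualityModel,
                  results: Iterable[str]) -> List[Tuple[str, str]]:
    """Pair each retrieved document with the label the model assigns."""
    return [(doc, model(doc)) for doc in results]


corpus = [
    "Randomized controlled trial of aspirin",
    "Editorial: thoughts on aspirin",
    "Cohort study of statins",
]

labeled = label_results(keyword_model, search(corpus, "aspirin"))
for doc, label in labeled:
    print(f"[{label}] {doc}")  # display the label next to each result
```

Because `label_results` accepts any callable, a model built by a separate training process could be passed in place of `keyword_model` without changing the display code.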
19. A system using a model to filter documents according to quality comprising:
a storage device;
at least one processor programmed to:
obtain a first set of documents labeled according to content quality from the storage device;
extract and represent features from the first set of documents;
modify the extracted and represented features;
construct models for labeling documents based on content quality using a pattern recognition algorithm stored in the storage device;
use a first subset of the first set of documents to instantiate parameters for the models;
use a second subset of the first set of documents to validate the models to select the parameters of the model;
obtain a second set of documents, related to the first set of documents;
use the validated model to label the second set of documents according to content quality;
distinguish between labels of the second set of documents.
Dependent claims: 20, 21, 22, 23, 24, 25.
Specification