Content and quality assessment method and apparatus for quality searching

  • US 8,275,772 B2
  • Filed: 08/20/2008
  • Issued: 09/25/2012
  • Est. Priority Date: 05/14/2004
  • Status: Expired due to Fees
First Claim

1. A computer-based process utilizing a specifically programmed computer for filtering information according to content quality that is organized in documents comprising the steps of:

    A) obtaining a selected set of documents;

    B) labeling each document of the selected set based on content quality with the documents labeled as positive if they match content and quality criteria and labeled as negative if they do not match content and quality criteria;

    C) extracting and representing features from each document in the selected set;

    D) modifying the extracted represented features;

    E) constructing models using pattern recognition algorithms to assign a label to a document;

    F) constructing models for labeling documents based on content quality using pattern recognition algorithms stored in the storage device consisting of the following steps;

    1) dividing the first set of documents into N subsets such that the union of all the subsets is the first set of documents;

    2) choosing at least one pattern recognition algorithm;

    a) instantiating a set of parameters for the pattern recognition algorithm;

    i) processing each of the N subsets comprising the following steps;

    a′) defining a first subset and defining a second subset mutually exclusive of the first subset;

    b′) training the pattern recognition algorithm to build a model using the first subset and the parameter set;

    c′) applying the model to the second subset of documents to obtain labels and scores for each document;

    d′) evaluating the labels and scores;

    e′) storing the evaluation measure, set of parameters, and current pattern recognition algorithm; and

    b) repeating step 2a) until all appropriate sets of parameters for the pattern recognition algorithm have been applied; and

    3) repeating step 2) until all pattern recognition algorithms have been applied;

    4) aggregating the evaluation measures for the N subsets, the pattern recognition algorithms with the set of parameters;

    5) selecting the parameter set and pattern recognition algorithm with an aggregate evaluation measure that meets a selection criteria;

    6) applying the parameter set and the pattern recognition algorithm identified from step 5) to the first set of documents to build a final model; and

    G) constructing a final model with the appropriate pattern recognition model and associated parameter to assign a label and/or a score to a set of previously unlabeled documents; and

    H) displaying the label and/or score of at least one of the previously unlabeled documents.
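
The model-selection loop in steps F1 through F6, followed by steps G and H, resembles N-fold cross-validation over a grid of algorithms and parameter sets. The following is a minimal sketch of that reading, not the patented implementation. Everything concrete in it is an assumption made for illustration: each document is reduced to a single hypothetical numeric feature, the two "pattern recognition algorithms" (`train_threshold`, `train_centroid`) are toy stand-ins, accuracy is used as the evaluation measure, and "highest mean accuracy" is used as the selection criterion. The claim itself does not name any of these.

```python
from statistics import mean

def split_into_folds(docs, n):
    """Step F1: divide the labeled set into N subsets whose union is the full set."""
    return [docs[i::n] for i in range(n)]

def train_threshold(train, params):
    """Toy algorithm (assumption): positive when the feature >= a fixed threshold.
    It ignores the training data; the threshold is its only parameter."""
    t = params["threshold"]
    return lambda x: (x >= t, x - t)  # returns (label, score)

def train_centroid(train, params):
    """Toy algorithm (assumption): cut halfway between the class means, plus a bias.
    Assumes positive documents tend to have the larger feature value."""
    pos = mean(x for x, y in train if y)
    neg = mean(x for x, y in train if not y)
    cut = (pos + neg) / 2 + params["bias"]
    return lambda x: (x >= cut, x - cut)

def accuracy(model, held_out):
    """Step d': evaluate predicted labels against the known labels."""
    return mean(1.0 if model(x)[0] == y else 0.0 for x, y in held_out)

def select_and_build(docs, algorithms, n=4):
    folds = split_into_folds(docs, n)
    stored = {}  # step e': evaluation measures keyed by (algorithm, parameter set)
    for name, (train_fn, grid) in algorithms.items():      # steps F2 and F3
        for params in grid:                                 # steps F2a and F2b
            key = (name, tuple(sorted(params.items())))
            for i, second in enumerate(folds):              # step i: each subset
                # Step a': the held-out fold is the second subset, the rest the first.
                first = [d for j, f in enumerate(folds) if j != i for d in f]
                model = train_fn(first, params)             # step b'
                stored.setdefault(key, []).append(accuracy(model, second))  # c'-e'
    aggregated = {k: mean(v) for k, v in stored.items()}    # step F4
    best = max(aggregated, key=aggregated.get)              # step F5
    name, param_items = best
    final = algorithms[name][0](docs, dict(param_items))    # step F6: refit on all docs
    return name, dict(param_items), aggregated[best], final

# Hypothetical labeled documents: (feature value, is_high_quality).
DOCS = [(0.1, False), (0.2, False), (0.3, False), (0.4, False),
        (0.6, True), (0.7, True), (0.8, True), (0.9, True)]

ALGORITHMS = {
    "threshold": (train_threshold, [{"threshold": 0.2}, {"threshold": 0.8}]),
    "centroid": (train_centroid, [{"bias": 0.0}, {"bias": 0.3}]),
}

best_name, best_params, best_score, final_model = select_and_build(DOCS, ALGORITHMS)

# Steps G and H: label and score previously unlabeled documents, then display.
for feature in (0.95, 0.05):
    label, score = final_model(feature)
    print(best_name, best_params, feature, label, round(score, 2))
```

Refitting on the full labeled set in step F6 follows the claim wording: the held-out folds are used only to compare algorithm and parameter combinations, and the selected combination is then trained once more on all labeled documents before it is applied to unlabeled ones.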
