METHOD AND APPARATUS FOR CLEANING DATA SETS FOR A SEARCH PROCESS
Abstract
An approach is provided for cleaning data sets for a search process. The cleanup platform determines one or more reference documents associated with at least one region. Next, the cleanup platform processes and/or facilitates a processing of the one or more reference documents to determine a frequency distribution of one or more candidate stop words with respect to the at least one region. Then, the cleanup platform causes, at least in part, selection of one or more stop words applicable to the at least one region from the one or more candidate stop words based, at least in part, on one or more frequency distribution criteria. Additionally, the cleanup platform processes and/or facilitates a processing of at least one data set associated with a search process to generate at least one enhanced data set by filtering the one or more stop words from the at least one data set.
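The first two steps of the abstract (determining a per-region frequency distribution, then selecting stop words from it) can be illustrated with a minimal sketch. The abstract does not say which frequency distribution criteria are used, so the document-frequency threshold `min_fraction`, the function names, whitespace tokenization, and the sample reference documents below are all assumptions made for illustration only.

```python
from collections import Counter


def document_frequencies(reference_docs):
    """Return, for each lowercased token, the fraction of documents containing it."""
    counts = Counter()
    for doc in reference_docs:
        # Count each token at most once per document (document frequency).
        counts.update(set(doc.lower().split()))
    total = len(reference_docs)
    return {tok: n / total for tok, n in counts.items()} if total else {}


def select_stop_words(reference_docs_by_region, min_fraction=0.5):
    """Pick, per region, the candidate tokens whose document frequency
    meets the (assumed) threshold criterion."""
    return {
        region: {
            tok
            for tok, frac in document_frequencies(docs).items()
            if frac >= min_fraction
        }
        for region, docs in reference_docs_by_region.items()
    }


# Hypothetical reference documents, keyed by region.
docs = {
    "DE": ["Hotel am Markt", "Cafe am See", "Bahnhof am Ring", "Museum Berlin"],
    "US": ["The Coffee House", "The Book Store", "Main Street Diner", "The Gym"],
}
stop_words = select_stop_words(docs, min_fraction=0.5)
print(stop_words)
```

Keeping a separate stop word set per region reflects the abstract's point that the selection is made "with respect to the at least one region": a token that is uninformative in one region's documents may be meaningful in another's.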
20 Claims
1. A method comprising facilitating a processing of and/or processing (1) data and/or (2) information and/or (3) at least one signal, the (1) data and/or (2) information and/or (3) at least one signal based, at least in part, on the following:
one or more reference documents associated with at least one region;
a processing of the one or more reference documents to determine a frequency distribution of one or more candidate stop words with respect to the at least one region;
a selection of one or more stop words applicable to the at least one region from the one or more candidate stop words based, at least in part, on one or more frequency distribution criteria; and
a processing of at least one data set associated with a search process to generate at least one enhanced data set by filtering the one or more stop words from the at least one data set.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. An apparatus comprising:
at least one processor; and
at least one memory including computer program code for one or more programs, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus to perform at least the following,
determine one or more reference documents associated with at least one region;
process and/or facilitate a processing of the one or more reference documents to determine a frequency distribution of one or more candidate stop words with respect to the at least one region;
cause, at least in part, selection of one or more stop words applicable to the at least one region from the one or more candidate stop words based, at least in part, on one or more frequency distribution criteria; and
process and/or facilitate a processing of at least one data set associated with a search process to generate at least one enhanced data set by filtering the one or more stop words from the at least one data set.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
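The final element of both independent claims, filtering the selected stop words from a data set to generate an enhanced data set, might look like the sketch below. The record layout (a region code paired with a text field), the function names, and the sample stop word sets are hypothetical; the claims do not prescribe any particular data structure or matching rule.

```python
def filter_stop_words(text, stop_words):
    """Drop tokens that match a stop word (case-insensitive), keeping order."""
    kept = [tok for tok in text.split() if tok.lower() not in stop_words]
    return " ".join(kept)


def enhance_data_set(records, stop_words_by_region):
    """Return a new data set with each record's text filtered using the
    stop words of that record's region; unknown regions pass through unchanged."""
    enhanced = []
    for region, text in records:
        stops = stop_words_by_region.get(region, set())
        enhanced.append((region, filter_stop_words(text, stops)))
    return enhanced


# Hypothetical per-region stop words and data set entries.
stop_words_by_region = {"DE": {"am"}, "US": {"the"}}
data_set = [
    ("DE", "Gasthaus am Fluss"),
    ("US", "The Corner Bakery"),
    ("FR", "Le Petit Cafe"),
]
enhanced = enhance_data_set(data_set, stop_words_by_region)
print(enhanced)
```

In this sketch the original data set is left untouched and a filtered copy is returned, which keeps the unfiltered entries available for display while the enhanced entries are used for matching in the search process. That split is a design assumption, not something the claims require.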
Specification