Unsupervised learning tool for feature correction
Abstract
Techniques for correcting miscategorized features excerpted from web pages are provided. For each of several categories and several pages on a particular web site, a separate feature may be excerpted from that page and associated with that page in relation to that category. Often, many of the “high confidence” features that have been associated with the same category are found to be associated with similar characteristics regardless of the pages from which those features were excerpted. Thus, a set of category characteristics, which are often found associated with the “high confidence” features in a particular category, may be determined. For each page, a candidate feature that is associated with the set of category characteristics may be identified in that page. If, in relation to the particular category, a feature other than the candidate feature is associated with that page, then that other feature may be replaced by the candidate feature.
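The correction loop described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the `Page` record, the confidence scores, the thresholds `min_confidence` and `min_share`, and the representation of characteristics as strings such as `"tag:h1"` are all assumptions introduced here.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Tuple


@dataclass
class Page:
    """Hypothetical record for one page of a web site, for one category."""
    url: str
    features: Dict[str, FrozenSet[str]]  # every excerptable feature -> its characteristics
    assigned: str                        # feature currently associated with the category
    confidence: float                    # assumed confidence in the current assignment


def category_characteristics(pages: Iterable[Page],
                             min_confidence: float = 0.9,
                             min_share: float = 0.8) -> FrozenSet[str]:
    """Characteristics shared by most of the high-confidence features."""
    trusted = [p for p in pages if p.confidence >= min_confidence]
    if not trusted:
        return frozenset()
    counts: Counter = Counter()
    for page in trusted:
        counts.update(page.features[page.assigned])
    return frozenset(c for c, n in counts.items() if n / len(trusted) >= min_share)


def correct_features(pages: List[Page]) -> Dict[str, Tuple[str, str]]:
    """Replace a page's assigned feature when a different feature on that page
    carries all of the category characteristics. Returns url -> (old, new)."""
    wanted = category_characteristics(pages)
    corrections: Dict[str, Tuple[str, str]] = {}
    if not wanted:
        return corrections
    for page in pages:
        matches = [f for f, chars in page.features.items() if wanted <= chars]
        # Only act when exactly one candidate matches, to avoid ambiguous swaps.
        if len(matches) == 1 and matches[0] != page.assigned:
            corrections[page.url] = (page.assigned, matches[0])
            page.assigned = matches[0]
    return corrections
```

Requiring exactly one matching candidate per page is a conservative choice made for this sketch; the abstract itself only says that a candidate feature associated with the category characteristics may replace the other feature.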
24 Claims
1. A method of maintaining a search index, the method comprising:
identifying one or more characteristics that are associated with particular data items that were excerpted from a first plurality of documents;
selecting, from the one or more characteristics, one or more particular characteristics; and
for each particular document in a second plurality of documents, performing steps comprising:
determining whether to insert, into the search index, a candidate data item excerpted from the particular document, wherein determining whether to insert the candidate data item is based at least in part on whether the candidate data item is associated with characteristics in the one or more particular characteristics; and
inserting the candidate data item into the search index in response to a determination that the candidate data item should be inserted.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23
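One way to picture the steps of claim 1 is the sketch below. It is a hedged illustration under stated assumptions, not the claimed method itself: the `Item` tuple, the frequency threshold `min_count` used to select characteristics, and the index layout (item text mapped to the ids of documents containing it) are hypothetical choices made here.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# Hypothetical data item: (excerpted text, characteristics observed in its document)
Item = Tuple[str, FrozenSet[str]]


def select_characteristics(first_items: Iterable[Item], min_count: int = 2) -> FrozenSet[str]:
    """Identify characteristics of items excerpted from the first documents and
    keep those that occur at least `min_count` times (an assumed selection rule)."""
    counts: Counter = Counter()
    for _text, chars in first_items:
        counts.update(chars)
    return frozenset(c for c, n in counts.items() if n >= min_count)


def maintain_index(index: Dict[str, Set[str]],
                   second_docs: Dict[str, Item],
                   selected: FrozenSet[str]) -> None:
    """For each document in the second set, insert its candidate item only if the
    candidate carries every selected characteristic."""
    for doc_id, (text, chars) in second_docs.items():
        should_insert = bool(selected) and selected <= chars
        if should_insert:
            index.setdefault(text, set()).add(doc_id)
```

The claim only requires that the decision be based "at least in part" on the selected characteristics; requiring all of them, as done here, is the strictest possible reading and is chosen only to keep the example short.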
12. A method of revising a search index, the method comprising:
for each data item in the search index, (a) identifying, within a document from which the data item was excerpted, a set of one or more characteristics that are associated with the data item, and (b) adding the set of one or more characteristics to a candidate set;
selecting a particular set of one or more characteristics from the candidate set; and
for each particular document in a plurality of documents, (a) identifying, within the particular document, a candidate data item that is associated with the characteristics in the particular set of one or more characteristics, and (b) inserting the candidate data item into the search index if the search index does not contain the candidate data item.
Dependent claims: 24
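The revision procedure of claim 12 might be sketched as below. The data layout is an assumption made for illustration: the index maps item text to the id of its source document, each document maps item text to a set of characteristics, and the "particular set" is chosen as the most frequently seen characteristic set, which is only one possible selection rule.

```python
from collections import Counter
from typing import Dict, FrozenSet, List

# Hypothetical layout: document id -> {item text -> characteristics of that item}
Documents = Dict[str, Dict[str, FrozenSet[str]]]


def revise_index(index: Dict[str, str], documents: Documents) -> List[str]:
    """Revise `index` (item text -> source document id) in place and return the
    texts that were inserted."""
    # Steps (a)/(b) of the first element: collect the characteristic set of
    # every indexed item into a candidate set, counting repeats.
    candidate_sets: Counter = Counter()
    for text, doc_id in index.items():
        chars = documents.get(doc_id, {}).get(text)
        if chars:
            candidate_sets[chars] += 1
    if not candidate_sets:
        return []

    # Select one particular set; here, the most common one (an assumption).
    particular, _count = candidate_sets.most_common(1)[0]

    added: List[str] = []
    for doc_id, items in documents.items():
        for text, chars in items.items():
            if particular <= chars:
                # First matching item is this document's candidate.
                if text not in index:
                    index[text] = doc_id
                    added.append(text)
                break
    return added
```

Counting whole characteristic sets, rather than individual characteristics, mirrors the claim's wording that each item contributes "a set of one or more characteristics" to the candidate set.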
Specification