Ranking search results using feature extraction
Abstract
Methods and computer-readable media are provided for ranking search results using feature extraction data. Each of the results of a search engine query is parsed to obtain data, such as text, formatting information, metadata, and the like. The text, the formatting information and the metadata are passed through a feature extraction application to extract data that may be used to improve a ranking of the search results based on relevance of the search results to the search engine query. The feature extraction application extracts features, such as titles, found in any of the text based on formatting information applied to or associated with the text. The extracted titles, the text, the formatting information and the metadata for any given search results item are processed according to a field weighting application for determining a ranking of the given search results item. Ranked search results items may then be displayed according to ranking.
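The field weighting step described in the abstract can be illustrated with a minimal sketch. The abstract gives no formula, so the weights, the field names, and the term-count scoring below are assumptions chosen for illustration, not the method actually claimed.

```python
from dataclasses import dataclass, field

# Hypothetical per-field weights; the abstract does not specify any values.
FIELD_WEIGHTS = {"extracted_title": 3.0, "metadata": 2.0, "text": 1.0}


@dataclass
class ResultItem:
    url: str
    fields: dict = field(default_factory=dict)  # field name -> field text


def field_weighted_score(item, query):
    """Sum the weighted counts of query terms across the item's fields."""
    terms = query.lower().split()
    score = 0.0
    for name, weight in FIELD_WEIGHTS.items():
        words = item.fields.get(name, "").lower().split()
        score += weight * sum(words.count(t) for t in terms)
    return score


def rank(items, query):
    """Return the items ordered from most to least relevant."""
    return sorted(items, key=lambda it: field_weighted_score(it, query), reverse=True)
```

Under these assumed weights, a query term found in an extracted title counts three times as much as the same term found in body text, so an item whose title matches the query tends to be listed first.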
236 Citations
20 Claims
1. A method, comprising:
receiving on a computing device resource items generated by a search engine in response to a search request;
parsing each of the resource items to obtain data, wherein the data includes: text, formatting information and metadata;
passing the data for each of the resource items that includes the text, the formatting information and the metadata through a feature extraction application for determining applicability of the obtained metadata by comparing the obtained metadata against known metadata for use in ranking search results;
comparing words parsed from the text that is separate from the obtained metadata against a database of known words that have been previously characterized;
using formatting characteristics of the data to determine when to extract a title from the resource item;
wherein the formatting characteristics that are used include:
a bold formatting characteristic;
an underlining formatting characteristic;
wherein the feature extraction application stores statistical information for each extracted title that provides a number of and a frequency of appearance of the extracted title in the data; and
extracting features from the one or more portions for each of the resource items;
passing extracted features through a ranking application for generating a ranking value for each of the resource items based on a relevance of each of the resource items to the search request; and
generating a list of the resource items in an order according to the ranking value for each of the resource items, whereby when the resource item has the ranking value associated with being ranked as most relevant to the search request received by the search engine is displayed first.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
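The title extraction step of claim 1 can be sketched as follows. The claim names bold and underlining as formatting characteristics but does not define the data model or the exact statistics, so the `TextRun` type, the rule "any bold or underlined run is a title candidate", and the reading of "frequency" as a share of all parsed runs are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TextRun:
    """Hypothetical unit of parsed text with its formatting flags."""
    text: str
    bold: bool = False
    underline: bool = False


def extract_titles(runs):
    """Treat bold or underlined runs as title candidates and record statistics."""
    candidates = [r.text.strip() for r in runs
                  if (r.bold or r.underline) and r.text.strip()]
    total_runs = len(runs)
    # How often each run's text occurs anywhere in the parsed data.
    occurrences = Counter(r.text.strip() for r in runs)
    stats = {}
    for title in dict.fromkeys(candidates):  # de-duplicate, keep first-seen order
        count = occurrences[title]
        stats[title] = {"count": count, "frequency": count / total_runs}
    return stats
```

A call such as `extract_titles([TextRun("Quarterly Results", bold=True), TextRun("Revenue grew.")])` returns one entry keyed by the bold run's text, with its count and its share of all runs.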
9. A method, comprising:
receiving on a computing device resource items generated by a search engine in response to a search request;
obtaining resource items from an information source;
parsing a metadata source in each of the resource items for one or more metadata items, and parsing a content portion of each of the resource items into one or more text selections and associated formatting information applied to the one or more text selections;
passing the one or more metadata items and the one or more text selections and associated formatting information for each of the resource items through a feature extraction application;
extracting titles from the one or more text selections and associated formatting information for each of the resources including using formatting characteristics to determine when to extract a title from the resource item;
wherein the formatting characteristics that are used include:
a bold formatting characteristic;
an underlining formatting characteristic;
wherein the feature extraction application stores statistical information for each extracted title that provides a number of and a frequency of appearance of the extracted title;
determining applicability of the one or more metadata items by comparing the one or more metadata items against known metadata for use in ranking search results;
comparing words parsed from the text that is separate from the one or more metadata items against a database of known words that have been previously characterized;
processing the extracted titles, the one or more metadata items, the one or more text selections and the associated formatting information according to a ranking algorithm for generating a ranking value for each of the resource items based on a relevance of each of the resource items to the search request received by the search engine; and
generating a list of the resource items in an order according to the ranking value for each of the resource items.
Dependent claims: 10, 11, 12
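The two comparison steps of claim 9 (metadata against known metadata, and body-text words against a database of previously characterized words) can be sketched as simple lookups. The claim does not say what the known data looks like or how a match is decided, so the dictionaries, the exact-match rule, and the category labels below are all hypothetical.

```python
# Hypothetical reference data; the claim only says such data is "known".
KNOWN_METADATA = {
    "author": {"j. smith", "a. jones"},
    "doctype": {"report", "memo"},
}
KNOWN_WORDS = {"budget": "finance", "forecast": "finance", "compiler": "software"}


def applicable_metadata(metadata):
    """Keep only metadata items whose value matches a known value for that key."""
    return {
        key: value
        for key, value in metadata.items()
        if value.lower() in KNOWN_METADATA.get(key, set())
    }


def characterize_words(text):
    """Map each body-text word found in the known-word database to its category."""
    found = {}
    for raw in text.split():
        word = raw.strip(".,;:!?()\"'").lower()
        if word in KNOWN_WORDS:
            found[word] = KNOWN_WORDS[word]
    return found
```

In this sketch, metadata that fails the lookup is simply dropped before ranking, and the word categories are returned for a later ranking step to use; both choices are assumptions about how the comparison results might be consumed.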
13. A computer-readable medium having stored thereon computer-executable instructions which when executed by a computer perform a method, comprising:
receiving resource items generated by a search engine in response to a search request, the resource items including one or more content portions and one or more metadata portions;
parsing each of the resource items into the one or more content portions and associated formatting information applied to the one or more content portions;
passing the one or more content portions and associated formatting information for each of the resource items through a feature extraction application;
determining applicability of the one or more metadata portions by comparing the one or more metadata portions against known metadata for use in ranking search results;
comparing words parsed from the content portions that is separate from the one or more metadata portions against a database of known words that have been previously characterized;
using formatting characteristics to determine when to extract a title from the resource item;
wherein the formatting characteristics that are used include:
a bold formatting characteristic;
an underlining formatting characteristic;
wherein the feature extraction application stores statistical information for each extracted title that provides a number of and a frequency of appearance of the extracted title in the data; and
extracts features from the one or more content portions and associated formatting information for each of the resource items;
passing the extracted features and the one or more content portions and associated formatting information through a ranking application for generating a ranking value for each of the resource items based on a relevance of each of the resource items to the search request received by the search engine; and
generating a list of the resource items in an order according to the ranking value for each of the resource items.
Dependent claims: 14, 15, 16, 17, 18, 19, 20
Specification