Question answering over structured content on the web
Abstract
Structured content and associated metadata from the Web are leveraged to provide specific answer string responses to user questions. The structured content can also be indexed at crawl-time to facilitate searching of the content at search-time. Ranking techniques can also be employed to help provide an optimum answer string and/or a top K list of answer strings for a query. Ranking can be based on trainable algorithms that utilize feature vectors for candidate answer strings. In one instance, at crawl-time, structured content is indexed and automatically associated with metadata relating to the structured content and the source web page. At search-time, candidate indexed structured content is then utilized to extract an appropriate answer string in response to a user query.
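The crawl-time/search-time split described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes the structured content arrives as an already-parsed table (a header row plus data rows), that the first column names the entity and every other column header names a relationship, and that the only metadata kept is the source URL. `Record`, `extract_records`, `build_index`, and `lookup` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One relationship taken from a crawled table, plus metadata (hypothetical schema)."""
    subject: str      # key-column cell, e.g. "France"
    attribute: str    # column header naming the relationship, e.g. "capital"
    value: str        # related cell text, e.g. "Paris"
    source_url: str   # metadata about the page the table came from


def extract_records(header, rows, source_url):
    """Crawl-time step: turn one table into (subject, attribute, value) records.

    Assumes column 0 holds the subject and each other header names a relationship.
    """
    records = []
    for row in rows:
        subject = row[0]
        for attribute, value in zip(header[1:], row[1:]):
            records.append(Record(subject, attribute.lower(), value, source_url))
    return records


def build_index(records):
    """Crawl-time step: key records by (subject, attribute) for fast search-time lookup."""
    index = defaultdict(list)
    for rec in records:
        index[(rec.subject.lower(), rec.attribute)].append(rec)
    return index


def lookup(index, focus, attribute):
    """Search-time step: candidate answer strings for a question focus and relationship."""
    # .get() avoids inserting empty entries into the defaultdict on a miss.
    return [rec.value for rec in index.get((focus.lower(), attribute.lower()), [])]


header = ["Country", "Capital", "Currency"]
rows = [["France", "Paris", "Euro"], ["Japan", "Tokyo", "Yen"]]
index = build_index(extract_records(header, rows, "http://example.com/countries"))
print(lookup(index, "France", "capital"))  # ['Paris']
```

Doing the table-to-relationship conversion once, at crawl-time, means a search-time query costs a single dictionary lookup rather than a re-parse of every candidate page.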
18 Claims
1. A system that facilitates determination of search query answers, comprising:
a processor that executes computer executable components stored in a memory;
an extraction component that crawls content located via multiple sources on a network and obtains, at crawl-time, structured content along with associated metadata;
an indexing component that indexes the structured content from the extraction component based at least on the associated metadata, and stores the indexed structured data in a database;
a query response component that employs the indexed structured content to determine a plurality of query answer strings in response to receiving a user query, wherein the query response component:
parses the user query to ascertain a question focus and an answer type;
searches the indexed structured content stored in the database for the question focus and identifies the plurality of query answer strings that conform with the answer type in accordance with one or more relationships specified by the obtained structured content; and
provides feature information included in a feature vector for each of the plurality of query answer strings, wherein the feature information includes a set of features that describe characteristics of an answer, the set of features including at least one source feature relating to a source from which a particular string of information was extracted, and at least one answer type feature relating to an intrinsic property of an actual query answer string; and
a ranking component that utilizes the feature vectors to automatically order a top K number of query answer strings, where K is an integer from one to a total number of available query answer strings.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
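Claim 1's feature vectors (at least one source feature and at least one answer-type feature) and its top-K ordering can be illustrated with a small linear ranker. This is a sketch under stated assumptions, not the claimed ranking component: the four features, the `Candidate` fields, and the `WEIGHTS` values are all hypothetical, and a plain weighted sum stands in for whatever trainable algorithm an implementation would use. The check on `k` mirrors the claim's bound of K between one and the number of available answer strings.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    answer: str
    support: int          # source feature: distinct pages that yielded this string (assumed)
    page_quality: float   # source feature: a 0-1 quality score for the best source (assumed)


def feature_vector(c: Candidate, answer_type: str) -> List[float]:
    """Combine source features with answer-type features for one candidate string."""
    is_numeric = c.answer.replace(",", "").isdigit()
    wants_numeric = answer_type in ("number", "year")
    type_match = 1.0 if is_numeric == wants_numeric else 0.0   # intrinsic property of the string
    num_tokens = float(len(c.answer.split()))                  # intrinsic property of the string
    # First two entries are source features, last two are answer-type features.
    return [float(c.support), c.page_quality, type_match, num_tokens]


# Placeholder weights; a trained ranker would learn these from labeled answers.
WEIGHTS = [0.5, 1.0, 2.0, -0.1]


def top_k(candidates: List[Candidate], answer_type: str, weights: List[float], k: int) -> List[str]:
    """Order candidates by a linear score over their feature vectors and keep the best k."""
    if not 1 <= k <= len(candidates):
        raise ValueError("k must be between 1 and the number of candidates")
    scored = [
        (sum(w * f for w, f in zip(weights, feature_vector(c, answer_type))), c.answer)
        for c in candidates
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [answer for _, answer in scored[:k]]


candidates = [Candidate("1889", 3, 0.9), Candidate("1887", 2, 0.8), Candidate("Paris", 5, 0.7)]
print(top_k(candidates, "year", WEIGHTS, 2))  # ['1889', '1887']
```

Replacing the hand-set weights with ones fitted on labeled question/answer pairs would leave `top_k` unchanged; only `WEIGHTS` would differ. Note that "Paris" has the most supporting pages yet ranks last for a "year" question, because the answer-type feature outweighs source support.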
11. A computer-implemented method for facilitating determination of search query answers, comprising:
employing a processor executing computer-executable instructions stored on a computer-readable storage medium to implement the following acts:
extracting structured content and associated metadata from a plurality of web pages on the Internet at crawl-time, wherein the structured content includes tables that specify relationships between strings of information;
indexing the structured content utilizing the associated metadata;
storing the indexed structured content in a database;
obtaining a query from at least one user;
ascertaining a question focus and an answer type based upon the obtained query;
searching the indexed structured content stored in the database based upon the question focus;
identifying a plurality of candidate answer strings for the query based upon one or more relationships specified by the indexed structured content, wherein the one or more relationships include the question focus such that the plurality of candidate answer strings conform to the answer type;
providing feature information included in a feature vector for each of the plurality of candidate answer strings, wherein the feature information includes a set of features that describe characteristics of an answer, the set of features including at least one source feature relating to a source from which a particular candidate answer string was extracted, and at least one answer type feature relating to an intrinsic property of the particular candidate answer string;
ranking the candidate answer strings based on the feature vectors; and
providing the ranked candidate answer strings to the at least one user.
Dependent claims: 12, 13, 14, 15, 16
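The step in claim 11 that ascertains a question focus and an answer type, together with the requirement that candidates conform to that answer type, can be illustrated with a deliberately small pattern-based sketch. Everything here is hypothetical: the single regular expression (which only handles "what/who is the <relation> of <focus>" phrasing), the `ANSWER_TYPES` table, and the `conforms` heuristic. A practical query parser would need far broader coverage.

```python
import re
from typing import NamedTuple, Optional


class ParsedQuery(NamedTuple):
    focus: str          # entity the question is about, e.g. "France"
    relation: str       # relationship asked for, e.g. "capital"
    answer_type: str    # expected kind of answer, e.g. "city"


# Hypothetical mapping from a relation word to the type of answer it expects.
ANSWER_TYPES = {"capital": "city", "population": "number", "currency": "currency", "founded": "year"}

# Only matches "what/who is/was the <relation> of <focus>", with an optional trailing "?".
PATTERN = re.compile(r"^(?:what|who)\s+(?:is|was)\s+the\s+(\w+)\s+of\s+(.+?)\??$", re.IGNORECASE)


def parse_query(query: str) -> Optional[ParsedQuery]:
    """Return the question focus, relation, and answer type, or None if unrecognized."""
    match = PATTERN.match(query.strip())
    if match is None:
        return None
    relation, focus = match.group(1).lower(), match.group(2).strip()
    return ParsedQuery(focus, relation, ANSWER_TYPES.get(relation, "unknown"))


def conforms(answer: str, answer_type: str) -> bool:
    """Crude answer-type check: numeric types need digits, other types need letters."""
    if answer_type in ("number", "year"):
        return answer.replace(",", "").isdigit()
    return any(ch.isalpha() for ch in answer)


def filter_candidates(candidates, answer_type):
    """Keep only candidate strings that conform to the expected answer type."""
    return [c for c in candidates if conforms(c, answer_type)]


parsed = parse_query("What is the capital of France?")
print(parsed)  # ParsedQuery(focus='France', relation='capital', answer_type='city')
print(filter_candidates(["Paris", "1889", "Lyon"], parsed.answer_type))  # ['Paris', 'Lyon']
```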
17. A computer-implemented system that facilitates determination of search query answers, comprising:
at least one processor that executes computer executable code stored in memory;
means for extracting structured content from a plurality of web pages on the Internet along with associated metadata, wherein the structured content is extracted at an Internet crawl-time and includes at least one list that provides relationships between strings of information that facilitates formation of an answer string;
means for indexing the structured content based at least on the associated metadata;
means for storing indexed structured content at a database;
means for obtaining a query from a user;
means for determining a question focus of the query and an answer type for the query;
means for searching the indexed structured content stored at the database based upon the question focus;
means for identifying a plurality of candidate answer strings for the query based upon one or more relationships specified by the indexed structured content, wherein the one or more relationships include the question focus such that the plurality of candidate answer strings conform to the answer type;
means for providing feature information included in a feature vector for each of the plurality of candidate answer strings, wherein the feature information includes a set of features that describe characteristics of an answer, the set of features including at least one source feature relating to a source from which a particular candidate answer string was extracted, and at least one answer type feature relating to an intrinsic property of the particular candidate answer string; and
means for ranking the candidate answer strings based on the feature vectors.
Dependent claims: 18
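Claim 17 names lists, rather than tables, as structured content that carries relationships between strings. A minimal sketch of crawl-time list extraction using Python's standard `html.parser` module is shown below. It assumes list items follow a "Label: value" convention and keeps only the source URL as metadata; `ListItemParser` and `extract_list_relations` are hypothetical names. Nested lists and `<li>` elements whose closing tag is omitted are not handled.

```python
from html.parser import HTMLParser


class ListItemParser(HTMLParser):
    """Collects the text of every <li> element on a page (nested lists not supported)."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._in_li = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._in_li = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "li" and self._in_li:
            self.items.append("".join(self._buffer).strip())
            self._in_li = False

    def handle_data(self, data):
        # Text inside inline tags such as <b> is still collected while in an <li>.
        if self._in_li:
            self._buffer.append(data)


def extract_list_relations(html, source_url, separator=":"):
    """Turn "Label: value" list items into (label, value, source_url) triples.

    Items without the separator are skipped; the separator convention is an assumption.
    """
    parser = ListItemParser()
    parser.feed(html)
    parser.close()
    relations = []
    for item in parser.items:
        label, sep, value = item.partition(separator)
        if sep and label.strip() and value.strip():
            relations.append((label.strip().lower(), value.strip(), source_url))
    return relations


page = "<ul><li>Capital: <b>Paris</b></li><li>Currency: Euro</li></ul>"
print(extract_list_relations(page, "http://example.com/france"))
# [('capital', 'Paris', 'http://example.com/france'), ('currency', 'Euro', 'http://example.com/france')]
```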
Specification