Mining forums for solutions to questions and scoring candidate answers
First Claim
1. A method, in an information handling system comprising a processor and a memory, of mining threaded online discussions, the method comprising:
performing, by the information handling system, a natural language processing (NLP) analysis of one or more threaded discussions pertaining to a given topic, wherein the analysis is performed across one or more web sites with each of the web sites including one or more of the threaded discussions, wherein the analysis results in a plurality of harvested discussions;
correlating the plurality of harvested discussions across a plurality of threads from the one or more web sites;
identifying a question from the harvested discussions;
identifying a plurality of candidate answers from the harvested discussions, wherein each of the plurality of candidate answers pertain to the identified question;
aggregating and merging a selected plurality of harvested discussions corresponding to each of the candidate answers, wherein the selected plurality of harvested discussions are supporting evidence corresponding to the respective candidate answer;
generating a supporting evidence score based on one or more factors of the supporting evidence for each of the candidate answers, wherein at least one of the factors is selected from the group consisting of a quality of the supporting evidence, and a quantity of the supporting evidence;
generating an answer post score for each of the candidate answers based on an identification of a rating within the threaded discussions pertaining to the respective candidate answer;
generating a post provider score for each of the candidate answers based on an identified expertise level that corresponds to a provider of the respective candidate answer;
generating a follow-up score for each of the candidate answers based on one or more follow-up comments from posters that indicate that the respective candidate answer was correct; and
scoring each of the plurality of candidate answers, wherein the scoring calculates an overall score corresponding to each of the candidate answers, wherein the overall score is based upon one or more component scores selected from the group consisting of the supporting evidence score, the answer post score, the post provider score, and the follow-up score, and wherein a selected answer has the highest overall score when compared to the other candidate answers.
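The claim says only that the overall score is "based upon" one or more of the four component scores and that the selected answer has the highest overall score. It does not say how the components are combined. The sketch below assumes a weighted average over whichever component scores are present, with each score already normalized to [0, 1]. The weights, the `CandidateAnswer` type, and the function names are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical weights: the claim does not specify how component scores combine.
DEFAULT_WEIGHTS = {
    "supporting_evidence": 0.4,
    "answer_post": 0.2,
    "post_provider": 0.2,
    "follow_up": 0.2,
}


@dataclass
class CandidateAnswer:
    text: str
    # Component scores keyed by name, each assumed to lie in [0, 1].
    component_scores: dict[str, float]


def overall_score(candidate: CandidateAnswer,
                  weights: dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted average of the component scores that are present ("one or more")."""
    used = {name: w for name, w in weights.items() if name in candidate.component_scores}
    total_weight = sum(used.values())
    if total_weight == 0:
        return 0.0
    weighted = sum(w * candidate.component_scores[name] for name, w in used.items())
    return weighted / total_weight


def select_answer(candidates: list[CandidateAnswer]) -> Optional[CandidateAnswer]:
    """Return the candidate with the highest overall score, or None if there are none."""
    if not candidates:
        return None
    return max(candidates, key=overall_score)
```

A weighted average is only one plausible reading; a plain sum or a learned ranking model would satisfy the claim language equally well.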
1 Assignment
0 Petitions
Abstract
An approach is provided for mining threaded online discussions. In the approach, performed by an information handling system, a natural language processing (NLP) analysis is performed on threaded discussions pertaining to a given topic. The analysis is performed across multiple web sites with each of the web sites including one or more threaded discussions. The analysis results in harvested discussions pertaining to the topic. The harvested discussions are correlated and a question is identified from the harvested discussions. A set of candidate answers is also identified from the harvested discussions, with one of the candidate answers being selected as the most likely answer to the identified question.
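The abstract says the harvested discussions are correlated and a question is identified, but not how. As a rough stand-in for the NLP analysis, the sketch below swaps in two simple heuristics: threads from different sites are grouped by bag-of-words Jaccard similarity of their titles, and the question is the first sentence ending in "?" found in an opening post. The `Thread` type, the stopword list, and the 0.3 threshold are illustrative assumptions, not the patent's method.

```python
import re
from dataclasses import dataclass


@dataclass
class Thread:
    site: str
    title: str
    posts: list[str]  # posts[0] is assumed to be the opening post


TOKEN_RE = re.compile(r"[a-z0-9]+")
# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"the", "a", "an", "to", "is", "my", "i", "how", "do", "on", "of", "and", "after", "with"}


def tokens(text: str) -> set[str]:
    return {t for t in TOKEN_RE.findall(text.lower()) if t not in STOPWORDS}


def jaccard(a: set[str], b: set[str]) -> float:
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def correlate(threads: list[Thread], threshold: float = 0.3) -> list[list[Thread]]:
    """Greedy single pass: a thread joins the first group whose seed title is
    similar enough, otherwise it starts a new group."""
    groups: list[list[Thread]] = []
    for thread in threads:
        t = tokens(thread.title)
        for group in groups:
            if jaccard(t, tokens(group[0].title)) >= threshold:
                group.append(thread)
                break
        else:
            groups.append([thread])
    return groups


def identify_question(group: list[Thread]) -> str:
    """First sentence ending in '?' from any opening post, else the seed title."""
    for thread in group:
        if not thread.posts:
            continue
        for sentence in re.split(r"(?<=[.?!])\s+", thread.posts[0]):
            if sentence.strip().endswith("?"):
                return sentence.strip()
    return group[0].title
```

For example, threads titled "Wi-Fi drops after driver update" and "Wi-Fi keeps dropping after driver update" on two different sites share four of seven title tokens and land in the same group, while an unrelated keyboard thread starts its own group.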
16 Citations
14 Claims
6. An information handling system comprising:
one or more processors;
a memory coupled to at least one of the processors;
a display; and
a set of instructions stored in the memory and executed by at least one of the processors to mine threaded online discussions, wherein the set of instructions perform actions of:
performing, by the information handling system, a natural language processing (NLP) analysis of one or more threaded discussions pertaining to a given topic, wherein the analysis is performed across one or more web sites with each of the web sites including one or more of the threaded discussions, wherein the analysis results in a plurality of harvested discussions;
correlating the plurality of harvested discussions across a plurality of threads from the one or more web sites;
identifying a question from the harvested discussions;
identifying a plurality of candidate answers from the harvested discussions, wherein each of the plurality of candidate answers pertain to the identified question;
aggregating and merging a selected plurality of harvested discussions corresponding to each of the candidate answers, wherein the selected plurality of harvested discussions are supporting evidence corresponding to the respective candidate answer;
generating a supporting evidence score based on one or more factors of the supporting evidence for each of the candidate answers, wherein at least one of the factors is selected from the group consisting of a quality of the supporting evidence, and a quantity of the supporting evidence;
generating an answer post score for each of the candidate answers based on an identification of a rating within the threaded discussions pertaining to the respective candidate answer;
generating a post provider score for each of the candidate answers based on an identified expertise level that corresponds to a provider of the respective candidate answer;
generating a follow-up score for each of the candidate answers based on one or more follow-up comments from posters that indicate that the respective candidate answer was correct; and
scoring each of the plurality of candidate answers, wherein the scoring calculates an overall score corresponding to each of the candidate answers, wherein the overall score is based upon one or more component scores selected from the group consisting of the supporting evidence score, the answer post score, the post provider score, and the follow-up score, and wherein a selected answer has the highest overall score when compared to the other candidate answers.
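Claim 6 recites the same answer post score and follow-up score as claim 1 without fixing a formula for either. The following is one hypothetical reading: the answer post score is a Laplace-smoothed up/down vote ratio, forced to 1.0 when the post carries an accepted-answer marker, and the follow-up score is the fraction of follow-up comments that contain an assumed confirmation phrase and no assumed rejection phrase. The vote fields and both phrase lists are assumptions, not part of the claim.

```python
# Hypothetical cue phrases; the claim does not say how a follow-up comment is
# judged to "indicate that the respective candidate answer was correct".
CONFIRM_CUES = ("that worked", "this worked", "fixed it", "solved it", "works now")
REJECT_CUES = ("didn't work", "did not work", "doesn't work", "still broken")


def answer_post_score(upvotes: int, downvotes: int, accepted: bool) -> float:
    """Rating-based score in [0, 1]; assumes the forum exposes votes and an accepted flag."""
    if accepted:
        return 1.0
    # Laplace smoothing: an unrated post scores 0.5 rather than 0 or 1.
    return (upvotes + 1) / (upvotes + downvotes + 2)


def confirms(comment: str) -> bool:
    """True if the comment contains a confirmation cue and no rejection cue."""
    text = comment.lower()
    return any(c in text for c in CONFIRM_CUES) and not any(r in text for r in REJECT_CUES)


def follow_up_score(follow_ups: list[str]) -> float:
    """Fraction of follow-up comments that confirm the answer (0.0 if there are none)."""
    if not follow_ups:
        return 0.0
    return sum(confirms(c) for c in follow_ups) / len(follow_ups)
```

Plain substring matching is deliberately crude here; it stands in for whatever NLP the claimed analysis step would supply.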
10. A computer program product stored in a computer readable medium, comprising computer instructions that, when executed by an information handling system, causes the information handling system to mine threaded online discussions by performing actions comprising:
performing, by the information handling system, a natural language processing (NLP) analysis of one or more threaded discussions pertaining to a given topic, wherein the analysis is performed across one or more web sites with each of the web sites including one or more of the threaded discussions, wherein the analysis results in a plurality of harvested discussions;
correlating the plurality of harvested discussions across a plurality of threads from the one or more web sites;
identifying a question from the harvested discussions;
identifying a plurality of candidate answers from the harvested discussions, wherein each of the plurality of candidate answers pertain to the identified question;
aggregating and merging a selected plurality of harvested discussions corresponding to each of the candidate answers, wherein the selected plurality of harvested discussions are supporting evidence corresponding to the respective candidate answer;
generating a supporting evidence score based on one or more factors of the supporting evidence for each of the candidate answers, wherein at least one of the factors is selected from the group consisting of a quality of the supporting evidence, and a quantity of the supporting evidence;
generating an answer post score for each of the candidate answers based on an identification of a rating within the threaded discussions pertaining to the respective candidate answer;
generating a post provider score for each of the candidate answers based on an identified expertise level that corresponds to a provider of the respective candidate answer;
generating a follow-up score for each of the candidate answers based on one or more follow-up comments from posters that indicate that the respective candidate answer was correct; and
scoring each of the plurality of candidate answers, wherein the scoring calculates an overall score corresponding to each of the candidate answers, wherein the overall score is based upon one or more component scores selected from the group consisting of the supporting evidence score, the answer post score, the post provider score, and the follow-up score, and wherein a selected answer has the highest overall score when compared to the other candidate answers.
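Claim 10 recites the aggregating-and-merging step and the supporting evidence score exactly as claims 1 and 6 do, naming quality and quantity as possible factors but giving no formula. The sketch below assumes each evidence post is already linked to a candidate answer and carries a per-post quality value in [0, 1] (for example, from some upstream classifier). It merges posts whose case- and whitespace-normalized text is identical, then multiplies the mean quality by a saturating quantity factor n / (n + 3). The `EvidencePost` type, the function names, and the saturation constant are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidencePost:
    candidate_id: str  # candidate answer this post supports (assumed already linked)
    text: str
    quality: float     # hypothetical per-post quality in [0, 1]


def aggregate_evidence(posts: list[EvidencePost]) -> dict[str, list[EvidencePost]]:
    """Group posts by candidate and merge duplicates with the same normalized text."""
    merged: dict[str, dict[str, EvidencePost]] = defaultdict(dict)
    for post in posts:
        key = " ".join(post.text.lower().split())
        existing = merged[post.candidate_id].get(key)
        # Keep the higher-quality copy when the same text appears more than once.
        if existing is None or post.quality > existing.quality:
            merged[post.candidate_id][key] = post
    return {cid: list(by_text.values()) for cid, by_text in merged.items()}


def supporting_evidence_score(evidence: list[EvidencePost], saturation: float = 3.0) -> float:
    """Mean quality times a quantity factor n / (n + saturation), which approaches 1 as n grows."""
    n = len(evidence)
    if n == 0:
        return 0.0
    mean_quality = sum(p.quality for p in evidence) / n
    quantity = n / (n + saturation)
    return mean_quality * quantity
```

The saturating factor rewards additional evidence with diminishing returns, so quantity alone can never push the score above the mean quality.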
Specification