Knowledge extraction from online discussion forums
Abstract
Concepts presented herein relate to extracting knowledge for a chatbot knowledge base from online discussion forums. Within a thread of an online discussion forum, replies are selected based on structural features and content features therein. The replies can be ranked and used in a chatbot knowledge base.
17 Claims
1. A method, comprising:
accessing a thread from a discussion forum having a plurality of threads, the accessed thread having a root message with a thread title and a plurality of replies associated with the root message;
selecting replies from the plurality of replies in the accessed thread by analyzing structural features and content features of each reply, wherein the structural features provide context of a given reply as related to other of the plurality of replies to the root message and the content features include words related to the root message;
applying a filter to remove one or more replies from the previously selected replies by comparing a keyword list having a plurality of words, wherein the keyword list includes words indicative of personal identifying information, to content features in each of the selected replies and removing those replies that have at least one of the words indicative of personal identifying information in its content features;
ranking the replies previously selected from the plurality of replies in the accessed thread that remain after applying the filter using a ranking model based on ranking features of the replies;
generating a list of replies from the selected replies based on results of the ranking; and
storing the list of replies in a data store to create a knowledge base for an automated conversational agent.
Dependent claims: 2, 3, 4, 5, 6, 7
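The filter, rank, and store steps of claim 1 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the keyword list, the `Reply` class, the precomputed `score` field (standing in for the output of a ranking model), and the dictionary used as a data store are all hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword list; the claim does not enumerate the actual words.
PII_KEYWORDS = {"email", "phone", "address", "ssn"}


@dataclass
class Reply:
    text: str
    score: float  # assumed stand-in for the output of a ranking model


def tokens(text):
    """Lowercased word set, used here as the reply's content features."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def filter_pii(replies, keywords=PII_KEYWORDS):
    """Remove every reply containing at least one keyword from the list."""
    return [r for r in replies if not (tokens(r.text) & keywords)]


def build_knowledge_base(thread_title, selected_replies):
    """Filter, rank by descending score, and store the result under the title."""
    kept = filter_pii(selected_replies)
    ranked = sorted(kept, key=lambda r: r.score, reverse=True)
    # An in-memory dict stands in for the data store named in the claim.
    return {thread_title: [r.text for r in ranked]}
```

Note that the filter drops a whole reply on a single keyword match, which mirrors the claim language ("at least one of the words") rather than redacting the matching word.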
8. A method, comprising:
accessing a thread of a discussion forum having a root message and a plurality of replies to the root message;
identifying structural features for each reply of the plurality of replies, the structural features being indicative of a contextual relationship in the thread between a given reply and at least one of the root message and another reply of the plurality of replies;
identifying content features for each reply of the plurality of replies, the content features being indicative of terms used in the reply;
selecting replies from the plurality of replies based on the structural features and the content features for each reply;
applying a filter to remove one or more entire replies from the previously selected replies by comparing a keyword list having a plurality of words, including words that are indicative of personal identifying information to content features in each of the selected replies and removing those replies that have at least one of the words indicative of personal identifying information in its content features; and
ranking the selected replies that remain after applying the filter based on the structural features, content features, and labels applied to the replies, wherein the labels indicate a quality of the replies.
Dependent claims: 9, 10, 11, 12, 13
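To make the distinction between structural and content features in claim 8 concrete, the following Python sketch computes a few of each for one reply and applies a simple selection rule. The specific features (position in the thread, authorship by the root poster, quoting of the root message, word overlap with the root) and the threshold rule are assumptions chosen for illustration; the claim does not prescribe them.

```python
import re
from dataclasses import dataclass


@dataclass
class Message:
    author: str
    text: str


def words(text):
    """Lowercased set of distinct words in a message."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def extract_features(root, replies, index):
    """Return hypothetical structural and content features for one reply."""
    reply = replies[index]
    reply_words = words(reply.text)
    return {
        # Structural: how the reply relates to the root and the thread.
        "position": index + 1,
        "by_root_author": reply.author == root.author,
        "quotes_root": root.text.lower() in reply.text.lower(),
        # Content: terms used in the reply.
        "n_distinct_words": len(reply_words),
        "root_overlap": len(words(root.text) & reply_words),
    }


def select_replies(root, replies, min_overlap=1):
    """Keep replies not written by the root author that share terms with the root."""
    picked = []
    for i, reply in enumerate(replies):
        f = extract_features(root, replies, i)
        if not f["by_root_author"] and f["root_overlap"] >= min_overlap:
            picked.append(reply)
    return picked
```

In a fuller system, the same feature dictionaries, together with quality labels, could serve as training input for the ranking step; that step is omitted here.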
14. A system having a computer readable storage medium storing a plurality of executable modules for processing threads in an online discussion forum, each thread having a title and a plurality of replies, the executable modules comprising:
an identification module that accesses the threads and selects some of the replies located within the threads as a function of structural features, the structural features being indicative of a relationship between replies in the context of the thread and content features contained within the replies;
a filter that removes one or more replies from the previously selected replies by comparing a keyword list having a plurality of words, including words that are indicative of personal information, to content features in each of the selected replies and removing those replies that have at least one of the words that are indicative of personal information in its content features;
a ranking module for ranking the previously selected replies based on ranking features contained therein and a computer processor being a functional component of the system and activated by the identification module, the filter and the ranking module, to facilitate selecting, removing and ranking.
Dependent claims: 15, 16, 17
Specification