Method for automatically finding frequently asked questions in a helpdesk data set
Abstract
A system and method automatically identify candidate helpdesk problem categories that are most amenable to automated solutions. The system generates a dictionary by identifying each word in the text data set and counting the number of documents that contain it. The documents are partitioned into clusters. For each generated cluster, the system sorts the dictionary terms in order of decreasing occurrence frequency. It then determines a search space by selecting the top dictionary terms, as specified by a user-defined depth of search. Next, the system chooses a set of terms from the search space, as specified by a user-defined value indicating the desired level of detail. For each possible combination of frequent terms in the search space, the system finds the set of examples containing all the terms and then determines whether the frequency is sufficiently high and the overlap sufficiently low for this candidate set of examples to be a frequently asked question.
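The per-cluster search described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The function name `find_faq_candidates`, its parameters, and the two measures are assumptions made for illustration: frequency is taken as the fraction of the cluster's documents that contain all terms of a combination, and overlap as the fraction of those documents already claimed by a previously accepted candidate. The abstract does not define either measure.

```python
from collections import Counter
from itertools import combinations


def find_faq_candidates(cluster_docs, dictionary, depth, detail,
                        min_freq, max_overlap):
    """Hypothetical sketch of the per-cluster FAQ search.

    cluster_docs: list of sets of words, one set per document in the cluster.
    dictionary:   set of dictionary terms.
    depth:        number of top terms kept in the search space.
    detail:       number of terms combined into one candidate.
    """
    # Count how many documents in the cluster contain each dictionary term.
    doc_freq = Counter()
    for words in cluster_docs:
        doc_freq.update(words & dictionary)

    # Sort by decreasing frequency (ties broken alphabetically) and keep
    # the top `depth` terms as the search space.
    ranked = sorted(doc_freq, key=lambda term: (-doc_freq[term], term))
    search_space = ranked[:depth]

    covered = set()  # documents already assigned to an accepted candidate
    candidates = []
    for terms in combinations(search_space, detail):
        # Documents that contain every term of this combination.
        matches = {i for i, words in enumerate(cluster_docs)
                   if words.issuperset(terms)}
        if not matches:
            continue
        frequency = len(matches) / len(cluster_docs)   # assumed measure
        overlap = len(matches & covered) / len(matches)  # assumed measure
        if frequency >= min_freq and overlap <= max_overlap:
            candidates.append((terms, sorted(matches)))
            covered |= matches
    return candidates


docs = [
    {"password", "reset", "login"},
    {"password", "reset"},
    {"password", "reset", "email"},
    {"printer", "jam"},
    {"printer", "jam", "paper"},
]
terms = {"password", "reset", "printer", "jam", "login", "email", "paper"}
result = find_faq_candidates(docs, terms, depth=4, detail=2,
                             min_freq=0.3, max_overlap=0.5)
```

In this toy cluster, `depth` limits the search to the four most frequent terms and `detail` sets the size of each term combination, so a larger `detail` yields narrower, more specific candidates.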
22 Claims
1. A method for automatically classifying frequently asked questions, comprising the steps of:
generating a dictionary including a subset of words contained in a document set based on a frequency of occurrence of each word in the document set;
generating a count of occurrences of each word in the dictionary within each document in the document set;
partitioning the set of documents into a plurality of clusters, each cluster containing at least one document;
for each cluster, sorting dictionary terms with reference to occurrence frequency within the cluster;
determining a search space by selecting candidate dictionary terms within a desired depth of search; and
selecting a plurality of terms from the candidate dictionary terms that correspond to a predetermined level of detail.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
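The first two steps of claim 1 (generating the dictionary and counting occurrences of each dictionary word within each document) might be sketched as follows. This is an illustration under stated assumptions, not the claimed method: the helper names `tokenize`, `build_dictionary`, and `count_matrix` are hypothetical, and the `min_docs`/`max_docs` document-frequency thresholds are one possible reading of "based on a frequency of occurrence". The claim does not fix a particular rule.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_dictionary(documents, min_docs, max_docs):
    """Keep words that appear in at least min_docs and at most max_docs documents."""
    doc_freq = Counter()
    for text in documents:
        # Use a set so each word is counted once per document.
        doc_freq.update(set(tokenize(text)))
    return sorted(w for w, n in doc_freq.items() if min_docs <= n <= max_docs)


def count_matrix(documents, dictionary):
    """Return one row per document: occurrences of each dictionary word."""
    index = {word: j for j, word in enumerate(dictionary)}
    rows = []
    for text in documents:
        row = [0] * len(dictionary)
        for token in tokenize(text):
            j = index.get(token)
            if j is not None:
                row[j] += 1
        rows.append(row)
    return rows


tickets = [
    "Cannot reset my password",
    "Password reset link expired",
    "The printer is jammed",
    "reset the printer",
]
dictionary = build_dictionary(tickets, min_docs=2, max_docs=3)
counts = count_matrix(tickets, dictionary)
```

The resulting rows are the kind of per-document count vectors that a clustering step could take as input. The claim leaves the choice of partitioning algorithm open.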
13. A system for automatically classifying frequently asked questions, comprising:
a dictionary including a subset of words contained in a document set based on a frequency of occurrence of each word in the document set;
a count of occurrences of each word in the dictionary generated within each document in the document set;
a cluster module that partitions the set of documents into a plurality of clusters, each cluster containing at least one document, wherein dictionary terms for each cluster are sorted with reference to occurrence frequency;
a processing routine that determines a search space by selecting candidate dictionary terms within a desired depth of search, and that selects a plurality of terms from the candidate dictionary terms that correspond to a predetermined level of detail.
Dependent claims: 14, 15, 16, 17
18. A computer program product for automatically classifying frequently asked questions, comprising:
a dictionary including a subset of words contained in a document set based on a frequency of occurrence of each word in the document set;
means for generating a count of occurrences of each word in the dictionary within each document in the document set;
means for partitioning the set of documents into a plurality of clusters, each cluster containing at least one document, wherein dictionary terms for each cluster are sorted with reference to occurrence frequency;
means for determining a search space by selecting candidate dictionary terms within a desired depth of search, and that selects a plurality of terms from the candidate dictionary terms that correspond to a predetermined level of detail.
Dependent claims: 19, 20, 21, 22
Specification