Method for automatically finding frequently asked questions in a helpdesk data set

US 20030050908A1
Filed: 08/22/2001
Published: 03/13/2003
Est. Priority Date: 08/22/2001
Status: Active Grant

Abstract

A system and method automatically identify candidate helpdesk problem categories that are most amenable to automated solutions. The system generates a dictionary wherein each word in the text data set is identified, and the number of documents containing these words is counted, and a corresponding count is generated. The documents are partitioned into clusters. For each generated cluster, the system sorts the dictionary terms in order of decreasing occurrence frequency. It then determines a search space by selecting the top dictionary terms as specified by a user-defined depth of search. Next, the system chooses a set of terms from the search space as specified by a user-defined value indicating the desired level of detail. For each possible combination of frequent terms in the search space, the system finds the set of examples containing all the terms, and then determines if the frequency is sufficiently high and the overlap sufficiently low for this candidate set of examples to be a frequently asked question.
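
The combination search described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `find_faq_candidates`, the thresholds `min_count` and `max_overlap`, and the definition of overlap as the fraction of a candidate's examples shared with an already accepted set are all hypothetical choices made for the example.

```python
from itertools import combinations


def find_faq_candidates(docs, top_terms, detail, min_count, max_overlap):
    """Return (terms, doc_ids) pairs that look like frequently asked questions.

    docs        -- list of word sets, one per document in a cluster
    top_terms   -- search space: most frequent terms, already cut to the depth of search
    detail      -- number of terms per combination (the level of detail)
    min_count   -- assumed threshold: minimum number of matching documents
    max_overlap -- assumed threshold: largest allowed shared fraction with an accepted set
    """
    accepted = []
    for combo in combinations(top_terms, detail):
        # Examples that contain every term of the combination.
        matches = {i for i, words in enumerate(docs) if all(t in words for t in combo)}
        if not matches or len(matches) < min_count:
            continue  # frequency too low
        # Fraction of this candidate's examples already covered by each accepted set.
        overlaps = [len(matches & prev) / len(matches) for _, prev in accepted]
        if all(o <= max_overlap for o in overlaps):
            accepted.append((combo, matches))
    return accepted


# Toy cluster of helpdesk tickets, each reduced to a set of words.
tickets = [
    {"password", "reset", "login"},
    {"password", "reset", "email"},
    {"password", "reset"},
    {"printer", "jam", "paper"},
    {"printer", "jam"},
    {"login", "email"},
]
faqs = find_faq_candidates(
    tickets, ["password", "reset", "printer", "jam"],
    detail=2, min_count=2, max_overlap=0.5,
)
for terms, ids in faqs:
    print(terms, sorted(ids))
```

With the toy data, only the pairs ("password", "reset") and ("printer", "jam") match at least two tickets, and their example sets do not overlap, so both are kept.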

37 Citations

22 Claims

1. A method for automatically classifying frequently asked questions, comprising the steps of:
- generating a dictionary including a subset of words contained in a document set based on a frequency of occurrence of each word in the document set;
- generating a count of occurrences of each word in the dictionary within each document in the document set;
- partitioning the set of documents into a plurality of clusters, each cluster containing at least one document;
- for each cluster, sorting dictionary terms with reference to occurrence frequency within the cluster;
- determining a search space by selecting candidate dictionary terms within a desired depth of search; and
- selecting a plurality of terms from the candidate dictionary terms that correspond to a predetermined level of detail.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12):
  - 2. The method of claim 1, further including the step of identifying a set of examples containing the selected set of terms.
  - 3. The method of claim 2, further including the step of setting the identified set of examples as a frequently asked question.
  - 4. The method of claim 3, wherein the step of setting the identified set of examples includes the step of determining if the number of identified set of examples exceeds zero.
  - 5. The method of claim 4, wherein if the number of identified set of examples exceeds zero, selecting an overlap between the identified set of examples and other sets of examples is less than a predetermined value, P, then setting the identified set of examples as a frequently asked question.
  - 6. The method of claim 4, wherein the step of setting the identified set of examples further includes the step of removing frequently asked questions whose frequencies occur below a user-selected confidence.
  - 7. The method of claim 6, further including the step of specifying the user-selected confidence by defining a maximum number of frequently asked questions.
  - 8. The method of claim 7, further including the step of generating a centroid for each cluster in the search space; and wherein if the number of identified set of examples exceeds zero, comparing the identified set of examples to the centroid.
  - 9. The method of claim 7, further including the step of preparing a report listing frequently asked questions having the user-selected confidence.
  - 10. The method of claim 1, wherein the step of sorting includes sorting the dictionary terms in order of decreasing occurrence frequency within the cluster.
  - 11. The method of claim 1, further including the step of generating a name for each cluster.
  - 12. The method of claim 1, further including the step of displaying a table including a name of each cluster and a frequency of occurrence of the frequently asked question.
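
The first steps of claim 1, together with the decreasing-frequency sort of claim 10, might look like the sketch below. It rests on several assumptions that are not in the claims: the dictionary cutoff is a hypothetical minimum document count (`min_docs`), the cluster labels are supplied by the caller because the claims listed here do not name a clustering algorithm, and ties in frequency are broken alphabetically. All function names are illustrative.

```python
from collections import Counter


def build_dictionary(docs, min_docs):
    """Keep words found in at least `min_docs` documents (assumed cutoff rule)."""
    doc_freq = Counter(word for doc in docs for word in set(doc))
    return sorted(w for w, n in doc_freq.items() if n >= min_docs)


def count_matrix(docs, dictionary):
    """One row per document, one column per dictionary word."""
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[w] for w in dictionary])
    return rows


def search_space(rows, dictionary, labels, cluster, depth):
    """Top `depth` dictionary terms of one cluster, by decreasing frequency."""
    totals = [0] * len(dictionary)
    for row, label in zip(rows, labels):
        if label == cluster:
            totals = [t + c for t, c in zip(totals, row)]
    # Sort by decreasing total count; break ties alphabetically (an assumption).
    ranked = sorted(range(len(dictionary)), key=lambda j: (-totals[j], dictionary[j]))
    return [dictionary[j] for j in ranked[:depth] if totals[j] > 0]


docs = [
    "reset my password please".split(),
    "password reset link broken".split(),
    "printer paper jam".split(),
    "printer jam again".split(),
]
labels = [0, 0, 1, 1]  # cluster assignment, assumed to come from a prior clustering step

dictionary = build_dictionary(docs, min_docs=2)
rows = count_matrix(docs, dictionary)
print(dictionary)
print(search_space(rows, dictionary, labels, cluster=0, depth=2))
```

Terms with a zero count inside the cluster are dropped from the search space even when `depth` would allow more entries, since a term that never occurs in the cluster cannot match any of its examples.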

13. A system for automatically classifying frequently asked questions, comprising:
- a dictionary including a subset of words contained in a document set based on a frequency of occurrence of each word in the document set;
- a count of occurrences of each word in the dictionary generated within each document in the document set;
- a cluster module that partitions the set of documents into a plurality of clusters, each cluster containing at least one document, wherein dictionary terms for each cluster are sorted with reference to occurrence frequency;
- a processing routine that determines a search space by selecting candidate dictionary terms within a desired depth of search, and that selects a plurality of terms from the candidate dictionary terms that correspond to a predetermined level of detail.
- Dependent claims (14, 15, 16, 17):
  - 14. The system of claim 13, wherein the processing routine identifies a set of examples containing the selected set of terms.
  - 15. The system of claim 14, wherein the processing routine further sets the identified set of examples as a frequently asked question.
  - 16. The system of claim 13, further including a database system that generates a centroid for each cluster in the search space.
  - 17. The system of claim 16, wherein if the number of identified set of examples exceeds zero, the database system compares the identified set of examples to the centroid.
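
Claims 16 and 17 mention generating a centroid per cluster and comparing a non-empty set of examples to it, without stating how the comparison is made. The sketch below assumes one plausible reading: the centroid is the component-wise mean of the word-count vectors, and the comparison is the cosine similarity between the centroid of the example set and the centroid of the cluster. The function names and the choice of cosine similarity are assumptions for illustration only.

```python
import math


def centroid(vectors):
    """Component-wise mean of equal-length count vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_to_cluster(example_rows, cluster_rows):
    """Compare an example set to its cluster centroid; None if the set is empty."""
    if not example_rows:
        return None  # the comparison only applies when the number of examples exceeds zero
    return cosine(centroid(example_rows), centroid(cluster_rows))


# Word-count rows for one cluster; the first two rows form the candidate example set.
cluster_rows = [[0, 1, 0, 1], [0, 1, 0, 1], [0, 2, 0, 0]]
example_rows = cluster_rows[:2]
print(round(similarity_to_cluster(example_rows, cluster_rows), 3))
```

A value close to 1 would suggest that the example set is representative of its cluster; what threshold to apply, if any, is not specified in the claims shown here.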

18. A computer program product for automatically classifying frequently asked questions, comprising:
- a dictionary including a subset of words contained in a document set based on a frequency of occurrence of each word in the document set;
- means for generating a count of occurrences of each word in the dictionary within each document in the document set;
- means for partitioning the set of documents into a plurality of clusters, each cluster containing at least one document, wherein dictionary terms for each cluster are sorted with reference to occurrence frequency;
- means for determining a search space by selecting candidate dictionary terms within a desired depth of search, and that selects a plurality of terms from the candidate dictionary terms that correspond to a predetermined level of detail.
- Dependent claims (19, 20, 21, 22):
  - 19. The system of claim 18, wherein the means for determining the search space identifies a set of examples containing the selected set of terms.
  - 20. The system of claim 19, wherein the means for determining the search space further sets the identified set of examples as a frequently asked question.
  - 21. The system of claim 18, further including means for generating a centroid for each cluster in the search space.
  - 22. The system of claim 21, wherein if the number of identified set of examples exceeds zero, the means for determining the search space compares the identified set of examples to the centroid.

Current Assignee: Lenovo PC International Limited (Lenovo Group Ltd.)
Original Assignee: International Business Machines Corporation
Inventors: Sanchez, Michael Ponce; Spangler, William Scott; Kreulen, Jeffrey Thomas; Lessler, Justin Thomas
Granted Patent: US 6,804,670 B2
US Class Current: 707/1
CPC Class Codes:
- G06F 16/355   Class or cluster creation o...
- Y10S 707/99933   Query processing, i.e. sear...
- Y10S 707/99935   Query augmenting and refini...
- Y10S 707/99937   Sorting
- Y10S 707/99943   Generating database or data...
