Method and apparatus for text classification
First Claim
1. A method for classifying natural language text input into a computer system, the system includes memory having a domain specific knowledge base having a plurality of categories stored therein, the method comprising the steps of:
(a) accepting as input natural language input text;
(b) parsing the natural language input text into a first list of recognized keywords;
(c) using the first list to deduce further facts from the natural language input text;
(d) compiling the deduced facts into a second list;
(e) calculating a numeric similarity score for each one of the plurality of categories in the knowledge base to indicate how similar one of the plurality of categories is to the natural language input text;
(f) applying a dynamic threshold to determine which ones of the plurality of categories are most similar to the recognized keywords of the first list, comprising the sub-steps of;
(I) calculating a value for the dynamic threshold based upon a similarity score of a most similar category and a predefined threshold offset, and
(II) classifying the categories based upon their respective similarity scores by discarding categories whose similarity scores are below the threshold value;
(g) compiling the ones of the plurality of categories determined to be most similar in step (f) into a third list; and
(i) passing the first list, the second list and the third list to an external application.
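Sub-steps (I) and (II) of step (f) can be illustrated with a minimal Python sketch. The claim only says the threshold is "based upon" the top similarity score and a predefined offset; the subtraction used here, the function name `apply_dynamic_threshold`, and the sample scores are assumptions made for illustration, not the patented formula.

```python
def apply_dynamic_threshold(scores, threshold_offset):
    """Keep categories whose score is within `threshold_offset` of the best score.

    scores: dict mapping category name -> numeric similarity score.
    threshold_offset: predefined offset (assumed here to be subtracted).
    """
    if not scores:
        return []
    # (I) Derive the threshold from the most similar category's score.
    top_score = max(scores.values())
    threshold = top_score - threshold_offset
    # (II) Discard categories scoring below the threshold; order best first.
    kept = [(name, s) for name, s in scores.items() if s >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in kept]


# Hypothetical scores for three categories.
example_scores = {"printers": 0.82, "networking": 0.75, "billing": 0.20}
third_list = apply_dynamic_threshold(example_scores, threshold_offset=0.10)
```

Because the cutoff moves with the best score, a weak overall match can still return its closest categories, while a strong match suppresses distant runners-up.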
Abstract
A text classification system and method that can be used by an application for classifying natural language text input into a computer system having a domain specific knowledge base with a plurality of categories. The text classification system classifies natural language input text by first parsing it into a first list of recognized keywords. This list is then used to deduce further facts from the natural language input text, which are compiled into a second list. Next, a numeric similarity score is calculated for each one of the plurality of categories in the knowledge base, indicating how similar that category is to the natural language input text. A dynamic threshold is then applied to determine which ones of the plurality of categories are most similar to the recognized keywords of the natural language input text. A third list is compiled of the ones of the plurality of categories determined to be most similar to the recognized keywords. An optional rule base can be utilized to further refine the determination of which ones of the plurality of categories are most similar to the recognized keywords of the natural language input text. Also, an optional learning capability can be added to improve the accuracy of the text classification system.
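The flow described in the abstract (keywords, deduced facts, similarity scores, dynamic threshold, three output lists) can be sketched end to end in Python. Everything concrete here is hypothetical: the toy `LEXICON`, `IMPLIED_FACTS`, and `PROFILES` data, the `classify` function, the weighted-sum score, and the "top score minus offset" threshold are assumptions chosen to make the sketch runnable, not details taken from the patent.

```python
import re

# Hypothetical toy knowledge base; none of these entries come from the patent.
LEXICON = {"printer", "jam", "paper", "invoice", "refund"}
IMPLIED_FACTS = {"jam": ["hardware_fault"], "refund": ["payment_issue"]}
PROFILES = {
    "hardware_support": {"printer": 2.0, "jam": 3.0, "hardware_fault": 2.5},
    "billing": {"invoice": 3.0, "refund": 3.0, "payment_issue": 2.5},
}


def classify(text, threshold_offset=2.0):
    # First list: words of the input that appear in the lexicon.
    words = re.findall(r"[a-z]+", text.lower())
    keywords = [w for w in words if w in LEXICON]
    # Second list: facts deduced from the recognized keywords.
    facts = [f for k in keywords for f in IMPLIED_FACTS.get(k, [])]
    # Similarity score per category: sum of matching profile weights (assumed).
    evidence = set(keywords) | set(facts)
    scores = {
        cat: sum(w for term, w in profile.items() if term in evidence)
        for cat, profile in PROFILES.items()
    }
    # Dynamic threshold: top score minus an offset (assumed form).
    threshold = max(scores.values()) - threshold_offset
    # Third list: surviving categories, most similar first.
    categories = sorted(
        (c for c, s in scores.items() if s >= threshold),
        key=lambda c: scores[c],
        reverse=True,
    )
    return keywords, facts, categories


first, second, third = classify("My printer has a paper jam")
```

The three return values correspond to the first, second, and third lists that the abstract says are made available to the calling application.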
315 Citations
24 Claims
-
-
7. A text classification system comprising:
-
memory;
a domain specific knowledge base stored in said memory having a plurality of categories, the domain specific knowledge base includes a knowledge base of keyword/category profiles, each category in the keyword/category profiles knowledge base having an associated profile which indicates what information provides evidence for a given category, the keyword/profile weight knowledge base arranged to have associated with each keyword in a profile a profile weight that represents the amount of evidence a keyword provides for a given category; and
a computer coupled to the memory, the computer including;
a natural language module for accepting as input into the computer natural language input text, the natural language module includes means for parsing the natural language input text into a first list of recognized keywords;
an intelligent inferencer module for using the first list to deduce further facts from the information explicitly stated in the natural language input text, the intelligent inferencer module includes means for compiling the deduced facts into a second list;
a similarity measuring module for calculating a numeric similarity score for each one of the plurality of categories in the knowledge base to indicate how similar one of the plurality of categories is to the natural language input text, the similarity measuring module includes;
means for applying a dynamic threshold to determine which ones of the plurality of categories are most similar to the recognized keywords of the natural language input text, and
means for compiling the ones of the plurality of categories determined to be most similar into a third list; and
a relevance feedback learning module for adjusting the profile weights in the keyword/category profiles in the domain specific knowledge base based upon the ones of the plurality of categories determined most relevant to the natural language input text by the similarity measuring module and a second ones of the plurality of categories determined most relevant to the natural language input text by an external source.
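The relevance feedback learning module compares the categories chosen by the similarity measure with those supplied by an external source and adjusts profile weights accordingly. The claim does not state an update rule, so the following Python sketch assumes a simple additive one: raise weights for categories the classifier missed and lower them for categories it wrongly selected. The function name `relevance_feedback`, the `rate` step size, and the sample data are hypothetical.

```python
def relevance_feedback(profiles, evidence, predicted, confirmed, rate=0.1):
    """Nudge profile weights toward the externally confirmed categories.

    profiles: dict category -> dict keyword -> profile weight (updated in place).
    evidence: keywords and deduced facts found in the input text.
    predicted: categories the similarity measure selected.
    confirmed: categories an external source judged relevant.
    rate: hypothetical step size; the claim does not specify an update rule.
    """
    predicted, confirmed = set(predicted), set(confirmed)
    for category in confirmed - predicted:
        # Missed category: strengthen the evidence that should have pointed to it.
        profile = profiles.setdefault(category, {})
        for term in evidence:
            profile[term] = profile.get(term, 0.0) + rate
    for category in predicted - confirmed:
        # False positive: weaken the evidence that misled the classifier.
        profile = profiles.get(category, {})
        for term in evidence:
            if term in profile:
                profile[term] = max(0.0, profile[term] - rate)
    return profiles


profiles = {"billing": {"refund": 1.0}, "hardware_support": {"printer": 1.0}}
relevance_feedback(
    profiles,
    evidence=["printer", "refund"],
    predicted=["billing"],
    confirmed=["hardware_support"],
    rate=0.5,
)
```

Categories on which both sources agree are left untouched in this sketch; only disagreements drive a weight change.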
-
-
8. A method for classifying natural language text input into a computer system, the system includes memory having a domain specific knowledge base having a plurality of categories stored therein, the method comprising the steps of:
-
(a) accepting as input natural language input text;
(b) parsing the natural language input text into a first list of recognized keywords;
(c) using the first list to deduce further facts from the natural language input text;
(d) compiling the deduced facts into a second list;
(e) calculating a numeric similarity score for each one of the plurality of categories in the knowledge base to indicate how similar one of the plurality of categories is to the natural language input text;
(f) applying a dynamic threshold to determine which ones of the plurality of categories are most similar to the recognized keywords of the first list, the step of applying a dynamic threshold further comprising the sub-steps of;
(1) calculating a value for the dynamic threshold based upon a similarity score of a most similar category and a predefined threshold offset, and
(2) classifying the categories based upon their respective similarity scores by discarding categories whose similarity scores are below the threshold value; and
(g) compiling the ones of the plurality of categories determined to be most similar in step (f) into a third list.
-
-
11. A method for routing customer service requests by a computer system in a customer support center which includes support groups to service customer requests, the computer system including a call handling system, a text classification system and memory having a domain specific knowledge base having a plurality of categories stored therein representative of the support groups within the customer support center, each support group being identified by a name, the method comprising the steps of:
-
(a) receiving a customer service request by the computer system from the call handling system;
(b) passing the customer service request to the text classification system to determine where to route the customer service request within the customer support center;
(c) parsing the customer service request into a first list of recognized keywords;
(d) using the first list to deduce further facts from the customer service request;
(e) compiling the deduced facts into a second list;
(f) calculating a numeric similarity score for each one of the plurality of categories in the knowledge base to indicate how similar each one of the plurality of categories is to the customer service request;
(g) applying a dynamic threshold to identify which one of the support groups should handle the customer service request by determining which ones of the plurality of categories are most similar to the recognized keywords of the customer service request;
(h) compiling the ones of the plurality of categories determined to be most similar in step (g) into a third list;
(i) passing the first list, the second list and the third list back to the call handling system; and
(j) routing the customer service request to the identified one of the support groups.
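The hand-off between the call handling system and the text classification system in this claim can be sketched as follows. This is a minimal Python illustration under stated assumptions: `route_request`, `stub_classifier`, the `"general"` fallback group, and the list-based queues are hypothetical names and structures, and the classifier is assumed to return categories ordered from most to least similar.

```python
def route_request(request_text, classifier, queues):
    """Route a request to the queue of the best-matching support group.

    classifier: callable returning (keywords, facts, categories), with
        categories ordered from most to least similar (an assumption).
    queues: dict support-group name -> list acting as that group's queue.
    """
    keywords, facts, categories = classifier(request_text)
    # Fall back to a hypothetical default group when no category survives.
    group = categories[0] if categories else "general"
    queues.setdefault(group, []).append(request_text)
    # The three lists go back to the call handling system with the decision.
    return group, keywords, facts, categories


def stub_classifier(text):
    # Stand-in for the text classification system; returns fixed lists.
    return ["printer", "jam"], ["hardware_fault"], ["hardware_support"]


queues = {"hardware_support": [], "billing": []}
group, *_ = route_request("My printer has a paper jam", stub_classifier, queues)
```

Here each category name doubles as the support group name, matching the claim's statement that the categories are representative of the support groups.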
-
-
12. A method for routing customer service requests by a computer system in a customer support center which includes support groups to service customer requests, the computer system including a call handling system, a text classification system and memory having a domain specific knowledge base having a plurality of categories stored therein representative of the support groups within the customer support center, each support group being identified by a name, and a rule base, the method comprising the steps of:
-
(a) receiving a customer service request by the computer system from the call handling system;
(b) passing the customer service request to the text classification system to determine where to route the customer service request within the customer support center;
(c) parsing the customer service request into a first list of recognized keywords;
(d) using the first list to deduce further facts from the customer service request;
(e) compiling the deduced facts into a second list;
(f) calculating, utilizing the first list, a numeric similarity score for each one of the plurality of categories in the knowledge base to indicate how similar each one of the plurality of categories is to the customer service request;
(g) applying a dynamic threshold to identify which support groups should handle the customer service request by determining which ones of the plurality of categories are most similar to the recognized keywords of the customer service request;
(h) compiling the ones of the plurality of categories determined to be most similar in step (g) into a third list;
(i) utilizing the rule base to select certain ones of the plurality of categories determined to be most similar to the recognized keywords over other ones of the plurality of categories based on the first and second lists;
(j) modifying the third list of the most similar categories to include the certain ones of the plurality of categories selected;
(k) passing the first list, the second list and the third list back to the call handling system; and
(l) routing the customer service request to the selected one of the support groups.
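Steps (i) and (j), in which a rule base prefers some of the most similar categories over others, can be sketched in Python. The claim does not describe how rules are encoded; the `(required_terms, preferred, rejected)` tuple format, the `disambiguate` function, and the sample rule are assumptions for illustration only.

```python
def disambiguate(categories, keywords, facts, rules):
    """Apply preference rules to the list of most similar categories.

    rules: list of (required_terms, preferred, rejected) tuples, a hypothetical
        encoding. A rule fires when every required term appears in the first
        or second list and both categories are still candidates.
    """
    evidence = set(keywords) | set(facts)
    result = list(categories)
    for required_terms, preferred, rejected in rules:
        applies = set(required_terms) <= evidence
        if applies and preferred in result and rejected in result:
            # Select the preferred category over the rejected one.
            result.remove(rejected)
    return result


rules = [(["hardware_fault"], "hardware_support", "software_support")]
refined = disambiguate(
    ["software_support", "hardware_support"],
    keywords=["printer", "jam"],
    facts=["hardware_fault"],
    rules=rules,
)
```

The rule consults both the recognized keywords and the deduced facts, which is why the claim passes the first and second lists to the rule base alongside the third.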
-
-
14. A text classification system comprising:
-
a memory;
a domain specific knowledge base stored in said memory having a plurality of categories wherein the domain specific knowledge base includes a knowledge base of keyword/category profiles, each category in the keyword/category profiles knowledge base having an associated profile which indicates what information provides evidence for a given category, the keyword/profile knowledge base is arranged to have associated with each keyword in a profile a profile weight that represents the amount of evidence a keyword provides for a given category; and
a computer coupled to the memory, the computer including;
means for accepting as input into the computer, natural language input text,
means for parsing the natural language input text into a first list of recognized keywords,
means for using the first list to deduce further facts from the natural language input text,
means for compiling the deduced facts into a second list,
means for calculating a numeric similarity score for each one of the plurality of categories in the knowledge base to indicate how similar one of the plurality of categories is to the natural language input text,
means for applying a dynamic threshold to determine which ones of the plurality of categories are most similar to the recognized keywords of the first list,
means for adjusting the profile weights in the keyword/categories determined to be the most relevant to the natural language input text and a second ones of the plurality of categories determined most relevant to the natural language input text by an external source,
means for compiling the ones of the plurality of categories determined to be most similar into a third list, and
means for passing the first list, the second list and the third list to an external application.
-
-
18. A method for classifying natural language text input into a computer system, the system includes memory having a domain specific knowledge base having a plurality of categories stored therein and including a rule base, the method comprising the steps of:
-
(a) accepting as input natural language input text;
(b) parsing the natural language input text into a first list of recognized keywords;
(c) using the first list to deduce further facts from the natural language input text;
(d) compiling the deduced facts into a second list;
(e) calculating a numeric similarity score for each one of the plurality of categories in the knowledge base to indicate how similar one of the plurality of categories is to the natural language input text;
(f) applying a dynamic threshold to determine which ones of the plurality of categories are most similar to the recognized keywords of the first list;
(g) compiling the ones of the plurality of categories determined to be most similar in step (f) into a third list;
(h) utilizing the rule base to select certain ones of the plurality of categories determined to be most similar to the recognized keywords over other ones of the plurality of categories based on the first and second lists; and
(i) modifying the third list of the most similar categories to include the certain ones of the plurality of categories selected.
-
-
20. A method for classifying natural language text input into a computer system, the system includes memory having a domain specific knowledge base having a plurality of categories stored therein, the knowledge base including a lexicon that includes words, phrases and expressions and a keyword class hierarchy structured such that keywords that share something in common are grouped into classes, each class has associated facts that are true when a member of the class is identified in the natural language input text, the method comprising the steps of:
-
(a) accepting as input natural language input text;
(b) parsing the natural language input text into a first list of recognized keywords;
(c) using the first list to deduce further facts from the natural language input text comprising the sub-steps of;
(1) searching the keyword class hierarchy for all classes of which an identified keyword in the first list is a member,
(2) locating all substitution keywords associated with each class of which the identified keyword is a member,
(3) retrieving the located substitution keywords,
(4) substituting the located substitution keywords for the identified keyword,
(5) using the located substitution keywords to identify matches between the located substitution keywords and phrases in the lexicon,
(6) recursively applying sub-steps (2) through (5) on all classes above the classes of which the identified keyword is a member in the keyword class hierarchy, and
(7) repeating sub-steps (1) through (6) for each keyword in the first list;
(d) compiling the deduced facts into a second list;
(e) calculating a numeric similarity score for each one of the plurality of categories in the knowledge base to indicate how similar one of the plurality of categories is to the natural language input text;
(f) applying a dynamic threshold to determine which ones of the plurality of categories are most similar to the recognized keywords of the first list; and
(g) compiling the ones of the plurality of categories determined to be most similar in step (f) into a third list.
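Sub-steps (1) through (7) of step (c) describe a walk up the keyword class hierarchy with keyword substitution at each level. The Python sketch below is one possible reading, with several assumptions: each class has at most one parent, an iterative loop stands in for the recursion of sub-step (6), lexicon phrases map directly to facts, and matching is plain substring search. All names and data (`CLASS_MEMBERS`, `PARENT`, `SUBSTITUTES`, `LEXICON_PHRASES`, `deduce_facts`) are hypothetical.

```python
# Hypothetical toy data; the structure is an assumption for illustration.
CLASS_MEMBERS = {"laserjet": ["printer_models"]}   # keyword -> classes
PARENT = {"printer_models": "output_devices"}      # class -> parent class
SUBSTITUTES = {"printer_models": ["printer"], "output_devices": ["device"]}
LEXICON_PHRASES = {
    "printer jam": "hardware_fault",
    "device offline": "connectivity_fault",
}


def deduce_facts(keywords):
    facts = []
    for index, keyword in enumerate(keywords):              # sub-step (7)
        for cls in CLASS_MEMBERS.get(keyword, []):          # sub-step (1)
            while cls is not None:                          # sub-step (6)
                for substitute in SUBSTITUTES.get(cls, []): # sub-steps (2)-(3)
                    # Sub-step (4): swap the keyword for its substitute.
                    candidate = keywords[:index] + [substitute] + keywords[index + 1:]
                    phrase = " ".join(candidate)
                    # Sub-step (5): look for lexicon phrases in the result.
                    for lex_phrase, fact in LEXICON_PHRASES.items():
                        if lex_phrase in phrase and fact not in facts:
                            facts.append(fact)
                cls = PARENT.get(cls)                       # move up one level
    return facts


second_list = deduce_facts(["laserjet", "jam"])
```

In this example the model name "laserjet" is generalized to "printer", which lets the lexicon phrase "printer jam" match even though the input never used the word "printer".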
-
-
21. A text classification system comprising:
-
memory;
a domain specific knowledge base stored in said memory having a plurality of categories, the domain specific knowledge base including a rule base; and
a computer coupled to the memory, the computer including;
a natural language module for accepting as input into the computer natural language input text, the natural language module includes means for parsing the natural language input text into a first list of recognized keywords;
an intelligent inferencer module for using the first list to deduce further facts from the information explicitly stated in the natural language input text, the intelligent inferencer module includes means for compiling the deduced facts into a second list;
a similarity measuring module for calculating a numeric similarity score for each one of the plurality of categories in the knowledge base to indicate how similar one of the plurality of categories is to the natural language input text, the similarity measuring module includes;
means for applying a dynamic threshold to determine which ones of the plurality of categories are most similar to the recognized keywords of the natural language input text, and
means for compiling the ones of the plurality of categories determined to be most similar into a third list; and
a category disambiguation module for utilizing the rule base to select certain ones of the plurality of categories determined to be most similar to the recognized keywords over other ones of the plurality of categories based on the first and second lists, the category disambiguation module includes means for modifying the third list of the most similar categories to include the certain ones of the plurality of categories selected.
-
-
22. A text classification system comprising:
-
a memory;
a domain specific knowledge base stored in said memory having a rule base and a plurality of categories; and
a computer coupled to the memory, the computer including;
means for accepting as input into the computer, natural language input text,
means for parsing the natural language input text into a first list of recognized keywords,
means for using the first list to deduce further facts from the natural language input text,
means for compiling the deduced facts into a second list,
means for calculating a numeric similarity score for each one of the plurality of categories in the knowledge base to indicate how similar one of the plurality of categories is to the natural language input text,
means for applying a dynamic threshold to determine which ones of the plurality of categories are most similar to the recognized keywords of the first list,
means for compiling the ones of the plurality of categories determined to be most similar into a third list,
means for utilizing the rule base to select certain ones of the plurality of categories that were determined to be most similar to the recognized keywords over other ones of the plurality of categories based on the first and second lists, and
means for modifying the third list of the most similar categories to include the certain ones of the plurality of categories selected.
-
-
23. A text classification system comprising:
-
a memory;
a domain specific knowledge base stored in said memory having a plurality of categories; and
a computer coupled to the memory, the computer including;
means for accepting as input into the computer, natural language input text,
means for parsing the natural language input text into a first list of recognized keywords,
means for using the first list to deduce further facts from the natural language input text,
means for compiling the deduced facts into a second list,
means for calculating a numeric similarity score for each one of the plurality of categories in the knowledge base to indicate how similar one of the plurality of categories is to the natural language input text,
means for applying a dynamic threshold to determine which ones of the plurality of categories are most similar to the recognized keywords of the first list,
means for calculating a value for the dynamic threshold based upon a similarity score of a most similar category and a predefined threshold offset,
means for classifying the categories based upon their respective similarity scores by discarding categories whose similarity scores are below the threshold value, and
means for compiling the ones of the plurality of categories determined to be most similar into a third list.
-
-
24. A text classification system comprising:
-
a memory;
a domain specific knowledge base stored in said memory having a plurality of categories, the domain specific knowledge base including a knowledge base of keyword/category profiles, each category in the keyword/category profiles knowledge base having an associated profile which indicates what information provides evidence for a given category, the keyword/profile weight knowledge base is arranged to have associated with each keyword in a profile a profile weight that represents the amount of evidence a keyword provides for a given category; and
a computer coupled to the memory, the computer including;
means for accepting as input into the computer, natural language input text,
means for parsing the natural language input text into a first list of recognized keywords,
means for using the first list to deduce further facts from the natural language input text,
means for compiling the deduced facts into a second list,
means for calculating a numeric similarity score for each one of the plurality of categories in the knowledge base to indicate how similar one of the plurality of categories is to the natural language input text,
means for applying a dynamic threshold to determine which ones of the plurality of categories are most similar to the recognized keywords of the first list,
means for compiling the ones of the plurality of categories determined to be most similar into a third list, and
means for adjusting the profile weights in the keyword/category profiles in the domain specific knowledge base based upon the ones of the plurality of categories determined most relevant to the natural language input text and a second ones of the plurality of categories determined most relevant to the natural language input text by an external source.
-
Specification