Categorization based text processing
Abstract
A rules based configurable system efficiently and effectively determines, for a given electronically represented text document, which linguistic analysis and extraction processes and which application specific processes should be invoked to provide more accurate answers to a user's query. In an application such as routing, a rules based classifier, in which each category or topic is represented by a set of rules, can combine the categorization that effects the routing with processes that extract other information. This may take the form of a prompt for the user to input additional information.
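The abstract describes a rules based classifier in which each category is represented by a set of rules. A minimal sketch of such a categorizer follows, assuming that each rule is a regular expression and that a document belongs to every category with at least one matching rule. The category names, patterns and the `categorize` function are hypothetical. Because the result is a list, a document can fall into zero, one or more categories.

```python
import re
from typing import Dict, List

# Hypothetical rule set: each category is represented by a list of regular
# expressions; a document belongs to a category if any of its rules matches.
RULES: Dict[str, List[str]] = {
    "order_status": [r"\bwhere is my order\b", r"\border\s+#?\d+\b"],
    "refund": [r"\brefund\b", r"\bmoney back\b"],
}


def categorize(text: str, rules: Dict[str, List[str]] = RULES) -> List[str]:
    """Return every category whose rules match the text (zero, one or more)."""
    categories: List[str] = []
    for category, patterns in rules.items():
        if any(re.search(p, text, re.IGNORECASE) for p in patterns):
            categories.append(category)
    return categories


if __name__ == "__main__":
    print(categorize("I want a refund for order #1234"))  # ['order_status', 'refund']
    print(categorize("Hello there"))                       # []
```

In a routing application, the list returned here is what would drive both the routing decision and the choice of further extraction processes.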
10 Claims
1. A configurable system for determining for a given electronically represented text document which linguistic analysis and extraction processes and which application specific processes should be invoked to provide formatted answers to a user's query comprising:
a categorizer receiving an input text document and classifying the text document into zero, one or more categories;
a processor selection system receiving a list of categories from the categorizer and determining which of a plurality of extractor processes to invoke on the input document based on categorization of the input text document into zero, one or more categories; and
an extractor system implementing the plurality of extractor processes invoked by the processor selection system for extracting data from the input text document and formatting the extracted data.
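Claim 1 separates the categorizer from a processor selection system and an extractor system. The sketch below assumes that selection is driven by a configuration table mapping each category to extractor names, and that each extractor returns its findings as a dictionary of formatted fields. All category, extractor and function names are hypothetical, and the regular expressions are deliberately simple.

```python
import re
from typing import Callable, Dict, List


# Hypothetical extractor processes: each takes the raw text and returns a
# dict of formatted fields (an empty dict when nothing is found).
def extract_order_number(text: str) -> Dict[str, str]:
    m = re.search(r"#?(\d{3,})", text)
    return {"order_number": m.group(1)} if m else {}


def extract_email(text: str) -> Dict[str, str]:
    m = re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", text)
    return {"email": m.group(0)} if m else {}


EXTRACTORS: Dict[str, Callable[[str], Dict[str, str]]] = {
    "order_number": extract_order_number,
    "email": extract_email,
}

# Hypothetical configuration consulted by the processor selection step:
# which extractors to run for each category.
CATEGORY_TO_EXTRACTORS: Dict[str, List[str]] = {
    "order_status": ["order_number", "email"],
    "refund": ["order_number"],
}


def select_extractors(categories: List[str]) -> List[str]:
    """Union of extractors configured for the categories, in first-seen order."""
    selected: List[str] = []
    for category in categories:
        for name in CATEGORY_TO_EXTRACTORS.get(category, []):
            if name not in selected:
                selected.append(name)
    return selected


def run_extractors(text: str, names: List[str]) -> Dict[str, str]:
    """Invoke the selected extractors and merge their formatted output."""
    result: Dict[str, str] = {}
    for name in names:
        result.update(EXTRACTORS[name](text))
    return result


if __name__ == "__main__":
    text = "Where is order #10442? Reply to jo@example.com"
    names = select_extractors(["order_status", "refund"])
    print(names)                        # ['order_number', 'email']
    print(run_extractors(text, names))  # {'order_number': '10442', 'email': 'jo@example.com'}
```

Under this assumption, changing what happens for a category only requires editing `CATEGORY_TO_EXTRACTORS`, which is one way to read "configurable" in claim 1; a document classified into zero categories selects no extractors.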
6. A computer implemented method of extracting formatted information from unformatted text files to provide formatted answers to a user's query comprising the steps of:
receiving an input text document;
classifying the input text document into zero, one or more categories;
determining which of a plurality of extractor processes to invoke on the input document based on categorization of the input text document;
invoking one or more of said plurality of extractor processes; and
extracting data from the input text document and formatting the extracted data.
receiving formatted extracted data; and
routing the formatted extracted data to a message handler based on categorization of the input text document.
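The routing steps (receiving formatted extracted data and routing it to a message handler based on the categorization) could be sketched as a dispatch table. The handler functions, the `ROUTES` table, and the fallback for documents that fall into no mapped category are illustrative assumptions, not details taken from the claims.

```python
from typing import Callable, Dict, List

Handler = Callable[[Dict[str, str]], str]


# Hypothetical message handlers; each just reports which queue got the data.
def billing_handler(data: Dict[str, str]) -> str:
    return f"billing queue: {data}"


def shipping_handler(data: Dict[str, str]) -> str:
    return f"shipping queue: {data}"


def fallback_handler(data: Dict[str, str]) -> str:
    return f"general queue: {data}"


# Hypothetical routing table from category to message handler.
ROUTES: Dict[str, Handler] = {
    "refund": billing_handler,
    "order_status": shipping_handler,
}


def route(categories: List[str], data: Dict[str, str]) -> List[str]:
    """Send the formatted data to each handler mapped from its categories.

    Assumption: documents in zero categories, or only in unmapped ones,
    go to a fallback handler.
    """
    handlers: List[Handler] = []
    for category in categories:
        handler = ROUTES.get(category)
        if handler is not None and handler not in handlers:
            handlers.append(handler)
    if not handlers:
        handlers = [fallback_handler]
    return [handler(data) for handler in handlers]


if __name__ == "__main__":
    print(route(["refund"], {"order_number": "10442"}))
    # ["billing queue: {'order_number': '10442'}"]
    print(route([], {}))  # ['general queue: {}']
```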
10. A method as recited in claim 6, wherein the step of determining which of a plurality of extractor processes to invoke on the input document based on categorization of the input text document, further comprises the steps of:
identifying a confidence level for each classified category of the input text document; and
selecting a subset of extractor processes associated with a 100% confidence level for each classified category of the input text document, where the plurality of extractor processes to invoke comprises the selected subsets of extractor processes.
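Claim 10 ties extractor selection to a confidence level for each classified category. One possible reading, assumed here, is that a category's configured extractor subset is selected only when the categorizer assigns that category a 100% confidence level, and that the extractors to invoke are the union of the selected subsets. Confidence is expressed as an integer percentage to avoid floating point equality checks; the table contents and function names are hypothetical.

```python
from typing import Dict, List

# Hypothetical configuration: extractor subset associated with each category.
CATEGORY_TO_EXTRACTORS: Dict[str, List[str]] = {
    "order_status": ["order_number", "email"],
    "refund": ["order_number", "amount"],
    "complaint": ["sentiment"],
}

FULL_CONFIDENCE = 100  # percent


def select_by_confidence(confidences: Dict[str, int]) -> List[str]:
    """Assumed reading of claim 10: take a category's extractor subset only
    when its confidence is 100%, and return the union of those subsets."""
    selected: List[str] = []
    for category, level in confidences.items():
        if level == FULL_CONFIDENCE:
            for name in CATEGORY_TO_EXTRACTORS.get(category, []):
                if name not in selected:
                    selected.append(name)
    return selected


if __name__ == "__main__":
    print(select_by_confidence({"order_status": 100, "refund": 100, "complaint": 60}))
    # ['order_number', 'email', 'amount']
```

Under this reading, a category recognized with less than full confidence contributes no extractors; other readings, such as configuring separate subsets per confidence band, would change only the selection loop.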
Specification