Document categorisation system
Abstract
A document categorisation system, including a clusterer for generating clusters of related electronic documents based on features extracted from said documents, and a filter module for generating a filter on the basis of said clusters to categorise further documents received by said system. The system may include an editor for manually browsing and modifying the clusters. The categorisation of the documents is based on n-grams, which are used to determine significant features of the documents. The system includes a trend analyzer for determining trends of changing document categories over time, and for identifying novel clusters. The system may be implemented as a plug-in module for a spreadsheet application, providing a convenient means for one-off or ongoing analysis of text entries in a worksheet.
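The abstract states that categorisation is based on n-grams but does not say how they are extracted or compared. As an illustration only, the sketch below assumes overlapping character trigrams and cosine similarity between count vectors; the function names `char_ngrams` and `cosine` and the sample texts are hypothetical and are not taken from the patent.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in lower-cased, whitespace-normalised text."""
    cleaned = " ".join(text.lower().split())
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical sample documents, invented for this illustration.
docs = [
    "Printer will not print after driver update",
    "Printer driver update fails to install",
    "Invoice total does not match purchase order",
]
profiles = [char_ngrams(d) for d in docs]

# The two printer-related texts share far more trigrams than the invoice text.
print(round(cosine(profiles[0], profiles[1]), 2))
print(round(cosine(profiles[0], profiles[2]), 2))
```

Character n-grams are used here because they tolerate spelling variation; word or phrase features could be substituted by changing `char_ngrams` alone.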
39 Claims
1. A document categorisation system including:
a clusterer for generating clusters of related electronic documents based on features extracted from said documents; and
a filter module for generating a filter on the basis of said clusters to categorise further documents received by said system.
Dependent claims: 3, 4, 6, 8, 9.
2. A document categorisation system including:
a clusterer for generating clusters of related electronic documents based on features extracted from said documents; and
an editor for browsing and modifying said clusters.
5. A document categorisation system including:
an editor for browsing and modifying clustered documents; and
a filter module for generating a filter on the basis of features of said clusters to categorise further documents received by said system.
7. A document categorisation system including:
a clusterer for generating clusters of documents by executing unsupervised learning on said documents; and
a filter module for generating a filter to categorise received documents by executing supervised learning on said clusters.
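Claim 7 pairs an unsupervised clustering step with a supervised step that learns a filter from the resulting clusters. The claim does not name any algorithm, so the sketch below swaps in two simple stand-ins chosen for this illustration: greedy "leader" clustering for the unsupervised step and a nearest-centroid classifier for the supervised step. The bag-of-words features, the `threshold` value, and all function names are assumptions.

```python
from collections import Counter
from math import sqrt


def features(text: str) -> Counter:
    """Bag-of-words features; a stand-in for whatever features the system extracts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(docs: list[str], threshold: float = 0.3) -> list[int]:
    """Unsupervised step: greedy leader clustering; no labels are supplied."""
    centroids: list[Counter] = []
    labels: list[int] = []
    for doc in docs:
        vec = features(doc)
        scores = [cosine(vec, c) for c in centroids]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= threshold:
            centroids[best] += vec  # fold the document into its cluster
            labels.append(best)
        else:
            centroids.append(Counter(vec))  # start a new cluster
            labels.append(len(centroids) - 1)
    return labels


def train_filter(docs: list[str], labels: list[int]):
    """Supervised step: fit a nearest-centroid classifier on the cluster labels."""
    centroids: dict[int, Counter] = {}
    for doc, label in zip(docs, labels):
        centroids.setdefault(label, Counter()).update(features(doc))

    def categorise(text: str) -> int:
        vec = features(text)
        return max(centroids, key=lambda label: cosine(vec, centroids[label]))

    return categorise


# Hypothetical sample documents, invented for this illustration.
docs = [
    "printer driver update fails",
    "printer driver will not install",
    "invoice total does not match order",
    "invoice total is wrong",
]
labels = cluster(docs)
categorise = train_filter(docs, labels)
print(labels)
print(categorise("new printer driver needed"))
print(categorise("invoice total missing"))
```

Keeping the two steps separate mirrors the claim: the cluster labels produced without supervision become the training targets for the filter, which can then categorise documents that arrive later.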
10. A method for categorising documents, including creating categories for said documents based on feature extraction, where said features include at least one of n-grams, words and phrases.
11. A method for categorising documents, including:
creating categories for said documents, based on feature extraction; and
manually modifying said categories with a category editor.
14. A method for categorising a document, including:
creating a document filter for a pre-existing document category by analysing pre-existing documents in said category; and
applying said filter to said document in order to determine whether said document belongs in said category.
Dependent claims: 15, 16, 17, 19.
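The two steps of claim 14 (derive a filter from documents already in a category, then apply it to a new document) can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the patent: the filter is a word-count centroid of the existing documents, and the acceptance rule (a cutoff set to `slack` times the weakest member's similarity) is hypothetical, as are all names and sample texts.

```python
from collections import Counter
from math import sqrt


def features(text: str) -> Counter:
    """Bag-of-words features (an assumption; n-grams or phrases would also fit)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def create_filter(category_docs: list[str], slack: float = 0.5):
    """Analyse the documents already in a category and return a membership test."""
    if not category_docs:
        raise ValueError("a filter needs at least one pre-existing document")
    centroid: Counter = Counter()
    for doc in category_docs:
        centroid.update(features(doc))
    # Hypothetical acceptance rule: accept anything scoring at least
    # `slack` times the similarity of the weakest existing member.
    cutoff = slack * min(cosine(features(d), centroid) for d in category_docs)

    def belongs(text: str) -> bool:
        return cosine(features(text), centroid) >= cutoff

    return belongs


# Hypothetical pre-existing "billing" category, invented for this illustration.
billing = [
    "invoice total does not match order",
    "invoice total is wrong",
    "refund for duplicate invoice charge",
]
is_billing = create_filter(billing)
print(is_billing("invoice charge is wrong"))
print(is_billing("printer driver fails"))
```

Returning a closure keeps the filter independent of the documents it was built from, so it can be stored and applied to each incoming document in turn.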
21. A data categorisation module for use with a spreadsheet application, said module including:
a cluster module for generating clusters of related data from data in a document of the spreadsheet application, based on extracted features of said data; and
a training module for generating a filter on the basis of said clusters to categorise further data.
Dependent claims: 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37.
38. A data categorisation module for use with a spreadsheet application, said module including a cluster module for generating clusters of related data from data in a document of the spreadsheet application, based on extracted features of said data.
39. A method of data categorisation in a spreadsheet application, including the steps of:
a cluster module for generating clusters of related data from data in a document of the spreadsheet application, based on extracted features of said data; and
a training module for generating a filter on the basis of said clusters to categorise further data.
Specification