Apparatus for classifying or disambiguating data
Abstract
A computing system has a data storage device for storing a database including a classified vocabulary of terms. A processor of the apparatus is arranged to associate each term with a number of different categories of data and to associate all terms falling within the same category with a common code identifying a collocation of terms that exemplify that category, so that terms in different categories are associated with different codes and can be disambiguated. The processor is arranged to write, directly or indirectly, a classified vocabulary including the terms together with the associated code onto a computer-readable storage medium or to supply an electrical signal via, for example, a MODEM or a LAN/WAN. The database may be used in classification of documents, spelling checking of documents and refining of keyword search results.
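The association described in the abstract can be pictured with a small sketch. It is illustrative only and is not taken from the patent: the category codes (FIN01, GEO01), the sample terms and the function names are all hypothetical, and an in-memory dictionary stands in for the database.

```python
from collections import defaultdict

# Hypothetical category codes, each identifying one collocation of
# terms that exemplify the category (assumed sample data).
COLLOCATIONS = {
    "FIN01": {"money", "loan", "account", "deposit"},
    "GEO01": {"river", "water", "shore", "erosion"},
}

# Hypothetical (term, category code) pairs; "bank" falls in two categories.
CLASSIFIED_TERMS = [
    ("bank", "FIN01"),
    ("bank", "GEO01"),
    ("cheque", "FIN01"),
    ("estuary", "GEO01"),
]


def build_vocabulary(pairs):
    """Map each term to the set of category codes it is classified under."""
    vocabulary = defaultdict(set)
    for term, code in pairs:
        vocabulary[term].add(code)
    return dict(vocabulary)


def senses(term, vocabulary, collocations):
    """Return, per category code of the term, the collocation behind that code."""
    return {code: collocations[code] for code in sorted(vocabulary.get(term, ()))}


VOCABULARY = build_vocabulary(CLASSIFIED_TERMS)
```

In this sketch an ambiguous term such as "bank" carries one code per category, and each code leads to a different collocation, which is what lets the meanings be told apart.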
33 Claims
1. A computer processing apparatus for classifying a document, comprising:
a database having a database structure providing a classification scheme having a plurality of different subject matter categories, the database containing a classified vocabulary including a plurality of terms in each of the different subject matter categories with each term being classified in accordance with the classification scheme and the database also containing a classification data set comprising a plurality of groups of terms with each group being associated with a specific different one of the subject matter categories and each group including a plurality of terms exemplifying the associated category for facilitating disambiguation between different meanings of the same term;
means for receiving in computer-readable form a document to be classified;
processor means for comparing terms appearing in the text document with the terms in the database and for determining from the comparison the category for the document; and
means for supplying a signal carrying data representing the document and data associating the document with the determined category.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17
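One possible reading of the comparing and signal-supplying elements of claim 1 is sketched below. The claim does not say how the category is derived from the comparison, so the vote count used here is an assumption, as are the sample vocabulary, the function names and the use of a JSON string to stand in for the supplied signal.

```python
import json
import re
from collections import Counter

# Hypothetical classified vocabulary: term -> categories it is classified in.
VOCABULARY = {
    "bank": {"finance", "geography"},
    "loan": {"finance"},
    "interest": {"finance"},
    "river": {"geography"},
    "estuary": {"geography"},
}


def determine_category(text, vocabulary):
    """Count, per category, how many document terms are classified in it."""
    votes = Counter()
    for token in re.findall(r"[a-z]+", text.lower()):
        for category in vocabulary.get(token, ()):
            votes[category] += 1
    if not votes:
        return None  # no vocabulary term appears in the document
    # Assumed tie-break: alphabetical order of category names.
    return max(sorted(votes), key=lambda category: votes[category])


def build_output_record(text, vocabulary):
    """Bundle the document with its category (stand-in for the claimed signal)."""
    return json.dumps({"document": text, "category": determine_category(text, vocabulary)})
```

For a text such as "The bank raised the interest on the loan", the finance category collects three votes against one for geography in this sketch, so the record associates the text with finance.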
18. A computer processing apparatus for classifying a document, comprising:
means for accessing a database having a database structure providing a plurality of different subject matter categories, the database containing a classified vocabulary including a plurality of terms in each of the different subject matter categories with each term being classified in accordance with the subject matter category structure of the database and the database also containing a plurality of collocations each collocation being associated with a specific different one of the subject matter categories and each collocation including a plurality of terms exemplifying the associated category for disambiguating a different meaning of the same term;
means for receiving in computer-readable form a text document to be classified;
processor means for comparing terms appearing in the text document with the collocations to determine the collocation having the most terms in common with the document, and for allocating the category of the determined collocation to the document; and
means for supplying a signal carrying data representing the text document and data associating the text document with the determined category.
Dependent claims: 19, 20, 21
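The collocation comparison recited in claim 18 (find the collocation having the most terms in common with the document, then allocate its category) can be sketched as follows. This is a minimal illustration under stated assumptions: the tokenizer, the sample collocations and the function names are hypothetical, and the claim does not specify what happens on a tie or when nothing matches.

```python
import re


def tokenize(text):
    """Lower-case the text and return its distinct alphabetic terms."""
    return set(re.findall(r"[a-z]+", text.lower()))


def classify_by_collocation(text, collocations):
    """Return (category, overlap) for the collocation sharing most terms with the text."""
    document_terms = tokenize(text)
    best_category, best_overlap = None, 0
    for category, collocation in collocations.items():
        overlap = len(document_terms & collocation)
        # Assumption: on a tie the first collocation encountered is kept.
        if overlap > best_overlap:
            best_category, best_overlap = category, overlap
    return best_category, best_overlap


# Hypothetical collocations, one per subject matter category.
COLLOCATIONS = {
    "finance": {"money", "loan", "account", "deposit", "interest"},
    "geography": {"river", "water", "shore", "erosion"},
}
```

Counting distinct shared terms rather than occurrences is a design choice of this sketch; a weighting by term frequency would be an equally plausible reading.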
22. A method of classifying documents in a computer processing apparatus, comprising:
providing a database having a database structure providing a classification scheme having a plurality of different subject matter categories, the database containing a classified vocabulary including a plurality of terms in each of the different subject matter categories with each term being classified in accordance with the classification scheme and the database also containing a classification data set comprising a plurality of groups of terms with each group being associated with a specific different one of the subject matter categories and each group including a plurality of terms exemplifying the associated category whereby the classification data set facilitates disambiguation between different meanings of the same term, and a receiver configured to receive in computer-readable form a text document to be classified;
comparing terms appearing in the text document with the terms in the database;
determining from the comparison the category for the text document; and
supplying a signal carrying data representing the text document and data associating the text document with the determined category.
Dependent claims: 23, 24, 25, 26, 27
28. A method of classifying documents in a computer processing apparatus, comprising:
providing a database having a database structure providing a classification scheme having a plurality of different subject matter categories, the database containing a classified vocabulary including a plurality of terms in each of the different subject matter categories with each term being classified in accordance with the classification scheme and the database also containing a classification data set comprising a plurality of collocations of terms with each collocation being associated with a specific different one of the subject matter categories and each collocation including a plurality of terms exemplifying the associated category for disambiguating different meanings of the same term, and a receiver configured to receive in computer-readable form a text document to be classified;
comparing terms appearing in the text document with the collocations to determine the collocation having the most terms in common with the text document;
allocating the category of the determined collocation to the document; and
supplying a signal carrying data representing the text document and data associating the text document with the determined category.
Dependent claims: 29
30. A processor readable medium storing processor readable instructions for causing a processor to:
access a database having a database structure providing a classification scheme having a plurality of different subject matter categories, the database containing a classified vocabulary including a plurality of terms in each of the different subject matter categories with each term being classified in accordance with the classification scheme and the database also containing a classification data set comprising a plurality of groups of terms with each group being associated with a specific different one of the subject matter categories and each group including a plurality of terms exemplifying the associated category for facilitating disambiguation of different meanings of the same term;
receive in computer-readable form a text document to be classified;
compare terms appearing in the text document with the terms in the database;
determine from the comparison the category for the document; and
supply a signal carrying data representing the text document and data associating the text document with the determined category.
31. A computer processing apparatus for classifying documents, the apparatus comprising:
a database having a database structure defining a classification scheme for terms;
the classification scheme having subject matter data defining main and subsidiary subject matter domains into which terms can be classified and genera data defining a predetermined number of genera to which terms can be allocated, the classification scheme being such that a term can be allocated to more than one subject matter domain but to only one genus so that each specific combination of subsidiary subject matter domain and genus defines a unique category;
the database also having classified vocabulary comprising a set of terms classified in accordance with the classification scheme such that each term is associated with category data identifying the corresponding category;
the database also including a classification scheme data set which includes a respective different classification scheme data set item associated with each category;
each classification scheme data set item comprising a collocation consisting of a list of terms that may be used to describe the function, appearance or relationship with other objects of classified terms in that category or that may be used in relation to terms in that category;
a receiver operable to receive in computer-readable form a text document to be classified;
a processor configured to compare terms in the text document with terms in at least one of the classified vocabulary and the collocations to determine a category for the text document; and
a signal supplier configured to supply a signal carrying data representing the text document and data associating the text document with the determined category data.
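The classification scheme of claim 31 can be illustrated with a small data structure. The sketch reads the claim literally: a term may be placed in several subject matter domains but in only one genus, and each (subsidiary domain, genus) pair identifies a category. The class name, method name, domain names and genus names below are hypothetical.

```python
class ClassificationScheme:
    """Toy scheme: main/subsidiary domains plus a fixed set of genera."""

    def __init__(self, domains, genera):
        # domains maps each main domain to its subsidiary domains.
        self.domains = {main: set(subs) for main, subs in domains.items()}
        self.genera = set(genera)
        self.term_genus = {}       # term -> its single genus
        self.term_categories = {}  # term -> set of (sub_domain, genus) pairs

    def classify_term(self, term, main_domain, sub_domain, genus):
        """Allocate a term to a category, enforcing the one-genus rule."""
        if sub_domain not in self.domains.get(main_domain, set()):
            raise ValueError(f"unknown domain: {main_domain}/{sub_domain}")
        if genus not in self.genera:
            raise ValueError(f"unknown genus: {genus}")
        if self.term_genus.get(term, genus) != genus:
            raise ValueError(f"{term!r} is already allocated to another genus")
        self.term_genus[term] = genus
        category = (sub_domain, genus)  # the pair defines the category
        self.term_categories.setdefault(term, set()).add(category)
        return category
```

A brief usage example, with assumed sample data:

```python
scheme = ClassificationScheme(
    {"engineering": ["plumbing"], "medicine": ["cardiology"]},
    ["artefact", "process"],
)
scheme.classify_term("valve", "engineering", "plumbing", "artefact")
scheme.classify_term("valve", "medicine", "cardiology", "artefact")
# A third call allocating "valve" to the genus "process" would raise ValueError.
```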
32. A method of classifying documents, the method comprising:
providing a classification scheme having subject matter data defining main and subsidiary subject matter domains into which terms can be classified and genera data defining a predetermined number of genera to which terms can be allocated, the classification scheme being such that a term can be allocated to more than one subject matter domain but to only one genus so that each specific combination of subsidiary subject matter domain and genus defines a unique category;
providing a classified vocabulary comprising a set of terms classified in accordance with the classification scheme such that each term in the classified vocabulary is associated with category data identifying the corresponding category;
providing a classification scheme data set which includes a respective different classification scheme data set item associated with each category with each classification scheme data set item comprising a collocation consisting of a list of terms that may be used to describe the function, appearance or relationship with other objects of classified terms in that category or that may be used in relation to terms in that category;
receiving data representing a text document to be classified; and
comparing terms in the text document with terms in at least one of the classified vocabulary and the collocations to determine a category for the text document.
33. A computer processing apparatus for classifying documents, the apparatus comprising:
a database having a database structure providing a classification scheme having a plurality of different subject matter categories, the database containing a classified vocabulary consisting of a plurality of terms in each of the different subject matter categories with each term being classified in accordance with the classification scheme and the database also containing a classification data set comprising a plurality of groups of terms with each group being associated with a specific different one of the subject matter categories and each group including terms that may be used to describe the function, appearance or relationship with other objects of classified terms in that category or that may be used in relation to terms in that category to facilitate disambiguation between different meanings of the same term;
a receiver configured to receive in computer-readable form a text document to be classified;
a processor configured to use the groups of terms in the classification data set to disambiguate different meanings of terms in the document and to determine a category for the text document using the database; and
a signal supplier configured to supply a signal carrying data representing the text document and data associating the text document with the determined category data.
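The disambiguation step of claim 33 (using the groups of terms to tell apart different meanings of a term in the document) might look like the sketch below. It is one assumed approach, not the patent's stated procedure: each candidate category of the ambiguous term is scored by how many words of its group appear elsewhere in the text. All names and sample data are hypothetical.

```python
import re


def disambiguate(term, text, vocabulary, groups):
    """Pick the category of `term` whose group overlaps most with the text."""
    # Context is every other distinct word in the document.
    context = set(re.findall(r"[a-z]+", text.lower())) - {term}
    candidates = sorted(vocabulary.get(term, ()))
    if not candidates:
        return None  # term is not in the classified vocabulary
    scores = {c: len(context & groups.get(c, set())) for c in candidates}
    # Assumed tie-break: alphabetical order of candidate categories.
    best = max(candidates, key=lambda c: scores[c])
    return best if scores[best] > 0 else None


# Hypothetical vocabulary and groups of exemplifying terms.
VOCABULARY = {"bank": {"finance", "geography"}}
GROUPS = {
    "finance": {"money", "loan", "account", "deposit"},
    "geography": {"river", "water", "shore", "erosion"},
}
```

With this sample data, "bank" in a sentence mentioning "river" and "water" resolves to geography, while the same word next to "loan" and "deposit" resolves to finance; with no supporting context the sketch returns None rather than guessing.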
Specification