Classification method and apparatus
Abstract
A method for building a classification model for classifying unclassified documents based on the classification of a plurality of documents which respectively have been classified as belonging to one of a plurality of classes, said documents being digitally represented in a computer, said documents respectively comprising a plurality of terms which respectively comprise one or more symbols of a finite set of symbols, and said method comprising the following steps: representing each of said plurality of documents by a vector of n dimensions, said n dimensions forming a vector space, whereas the value of each dimension of said vector corresponds to the frequency of occurrence of a certain term in the document corresponding to said vector, so that said n dimensions span up a vector space; representing the classification of said already classified documents into classes by separating said vector space into a plurality of subspaces by one or more hyperplanes, such that each subspace comprises one or more documents as represented by their corresponding vectors in said vector space, so that said each subspace corresponds to a class.
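The term-frequency representation described in the abstract can be illustrated with a minimal sketch. It assumes that terms are whitespace-separated, lower-cased tokens and that the n dimensions are the sorted distinct terms of the document set; the helper names `build_vocabulary` and `to_vector` are hypothetical and do not come from the patent.

```python
from collections import Counter

def build_vocabulary(documents):
    """Assign one dimension index to each distinct term (sorted for stability)."""
    terms = sorted({term for doc in documents for term in doc.lower().split()})
    return {term: index for index, term in enumerate(terms)}

def to_vector(document, vocabulary):
    """Return an n-dimensional vector of term frequencies for one document."""
    counts = Counter(document.lower().split())
    vector = [0] * len(vocabulary)
    for term, count in counts.items():
        if term in vocabulary:  # terms outside the vocabulary are ignored
            vector[vocabulary[term]] = count
    return vector

# Two toy documents (illustrative data only).
docs = ["invoice total due invoice", "meeting agenda due"]
vocab = build_vocabulary(docs)
vectors = [to_vector(doc, vocab) for doc in docs]
```

Each position in a resulting vector holds the number of times the corresponding term occurs in that document, so every document becomes a point in the same n-dimensional space.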
20 Claims
1. A method for building a classification model for classifying unclassified documents based on the classification of a plurality of documents which respectively have been classified as belonging to one of a plurality of classes, said documents being digitally represented in a computer, said documents respectively comprising a plurality of terms which respectively comprise one or more symbols of a finite set of symbols, and said method comprising the following steps:
processing said plurality of documents according to a plurality of classification schemes;
representing each of said plurality of documents in each of said classification schemes by a vector of n dimensions, said n dimensions forming a vector space, whereas the value of each dimension of said vector corresponds to the frequency of occurrence of a certain term in the document corresponding to said vector, so that said n dimensions span up a vector space;
representing the classification of said already classified documents into a plurality of classes corresponding to respective ones of said plurality of classification schemes by, for each of said classification schemes, creating a Voronoi-tessellation of said vector space separating said vector space into a plurality of subspaces by one or more hyperplanes, such that each subspace comprises one or more documents as represented by their corresponding vectors in said vector space, so that said each subspace corresponds to a class defined by a corresponding one of said plurality of classification schemes; and
calculating a maximum margin surrounding said hyperplanes in said vector space such that said margin contains none of the vectors contained in the subspaces corresponding to said classification classes.
Dependent claims: 7, 8, 9, 10, 11
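The margin recited in claim 1 can be illustrated for a single hyperplane and two classes. The sketch below only evaluates the margin of a given hyperplane w·x + b = 0, that is, the smallest distance from any labelled vector to it; it does not search for the hyperplane that maximizes this quantity, which a full maximum-margin method would do. The function names, the ±1 label encoding, and the sample data are assumptions made for illustration.

```python
import math

def signed_distance(vector, w, b):
    """Signed Euclidean distance from `vector` to the hyperplane w.x + b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, vector))
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (dot + b) / norm

def geometric_margin(vectors, labels, w, b):
    """Smallest label-corrected distance of any vector to the hyperplane.

    A positive result means the hyperplane separates the two classes and
    no vector lies closer to it than the returned value.
    """
    return min(y * signed_distance(x, w, b) for x, y in zip(vectors, labels))

# Hypothetical hyperplane and labelled document vectors (labels are +1 / -1).
w, b = [3.0, 4.0], -10.0
vectors = [[0, 0], [1, 0], [4, 2], [2, 4]]
labels = [-1, -1, +1, +1]
margin = geometric_margin(vectors, labels, w, b)
```

Under these assumptions, a band of half-width `margin` on either side of the hyperplane contains none of the training vectors in its interior.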
2. A method for the classification of a document digitally represented in a computer into one of a plurality of classes in each of a plurality of predefined classification schemes, said document respectively comprising a plurality of terms which respectively comprise one or more symbols of a finite set of symbols, said method comprising the steps of:
representing said document by a vector of n dimensions, said n dimensions spanning up a vector space, whereas the value of each dimension of said vector corresponds to the frequency of occurrence of a certain term in the document corresponding to said vector;
classifying said document into one of said plurality of classes in each of said classification schemes by determining, for each of said classification schemes, into which of a plurality of subspaces said vector falls, said subspaces being formed by creating a Voronoi-tessellation of said vector space separating said vector space spanned up by said n-dimensional vector through one or more hyperplanes to define said subspaces such that each subspace corresponds to one of said plurality of classes for each of said classification schemes; and
calculating a location of said vector in at least one said subspaces by calculating a distance of said vector from said hyperplanes.
Dependent claims: 3, 4, 5, 6
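The classification step of claim 2 can be sketched for the simplest case, assuming each classification scheme is described by exactly one hyperplane with one class on each side. The claim itself allows one or more hyperplanes per scheme, so this is a simplification. The scheme names, class names, and numeric values below are hypothetical.

```python
import math

def signed_distance(vector, w, b):
    """Signed Euclidean distance from `vector` to the hyperplane w.x + b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, vector))
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (dot + b) / norm

def classify(vector, schemes):
    """Return {scheme: (class, distance to hyperplane)} for one document vector.

    Assumption: each scheme is a tuple (w, b, positive_class, negative_class).
    """
    result = {}
    for name, (w, b, positive_class, negative_class) in schemes.items():
        distance = signed_distance(vector, w, b)
        chosen = positive_class if distance >= 0 else negative_class
        result[name] = (chosen, abs(distance))
    return result

# Two hypothetical classification schemes over the same 2-dimensional space.
schemes = {
    "topic": ([1.0, -1.0], 0.0, "finance", "travel"),
    "urgency": ([0.0, 1.0], -2.0, "urgent", "routine"),
}
result = classify([3, 1], schemes)
```

The same vector receives one class per scheme, and the distance indicates how far inside the chosen subspace it lies. A small distance means the vector sits close to the separating hyperplane.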
12. An apparatus for classifying a plurality of documents, which respectively have been classified as belonging to one of a plurality of classes, said documents being digitally represented in a computer, said documents respectively comprising a plurality of terms which respectively comprise one or more symbols of a finite set of symbols, said apparatus comprising a processor responsive to a program of stored instructions for:
processing said plurality of documents according to a plurality of classification schemes;
representing each of said plurality of documents in each of said classification schemes by a vector of n dimensions, said n dimensions forming a vector space, whereas the value of each dimension of said vector corresponds to the frequency of occurrence of a certain term in the document corresponding to said vector, so that said n dimensions span up a vector space;
representing the classification of said already classified documents into classes by a plurality of classes corresponding to respective ones of said plurality of classification schemes by, for each of said classification schemes, creating a Voronoi-tessellation of said n-dimensional vector space separating said n-dimensional vector space into a plurality of subspaces by one or more hyperplanes, such that each subspace comprises one or more documents as represented by their corresponding vectors in said vector space, so that said each subspace corresponds to a class defined by a corresponding one of said plurality of classification schemes; and
calculating a maximum margin surrounding said hyperplanes in said vector space such that said margin contains none of the vectors contained in the subspaces corresponding to said classification classes.
13. An apparatus for classifying a document digitally represented in a computer into one of a plurality of classes in each of a plurality of predefined classification schemes, said document respectively comprising a plurality of terms which respectively comprise one or more symbols of a finite set of symbols, said document belonging to one of a plurality of classes, said apparatus comprising a processor responsive to a stored program of instructions for:
representing said document by a vector of n dimensions, said n dimensions spanning up a vector space, whereas the value of each dimension of said vector corresponds to the frequency of occurrence of a certain term in the document corresponding to said vector;
classifying said document into one of said plurality of classes in each of said classification schemes by determining, for each of said classification schemes, into which of a plurality of subspaces said vector falls, said subspaces being formed by creating a Voronoi-tessellation of said vector space separating said n-dimensional vector space spanned up by said n-dimensional vector through one or more hyperplanes to define said subspaces such that each subspace corresponds to one of said plurality of classes for each of said classification schemes; and
calculating a location of said vector in at least one said subspaces by calculating a distance of said vector from said hyperplanes.
14. A software tool embodied on a computer-readable medium for building a classification model for classifying unclassified documents based on the classification of a plurality of documents which respectively have been classified as belonging to one of a plurality of classes, said documents being digitally represented in a computer, said documents respectively comprising a plurality of terms which respectively comprise one or more symbols of a finite set of symbols, and said software tool comprising:
computer program code for processing said plurality of documents according to a plurality of classification schemes;
computer program code for representing each of said plurality of documents in each of said classification schemes by a vector of n dimensions, said n dimensions forming a vector space, whereas the value of each dimension of said vector corresponds to the frequency of occurrence of a certain term in the document corresponding to said vector, so that said n dimensions span up a vector space;
computer program code for representing the classification of said already classified documents into a plurality of classes corresponding to respective ones of said plurality of classification schemes by, for each of said classification schemes, creating a Voronoi-tessellation of said vector space separating said vector space into a plurality of subspaces by one or more hyperplanes, such that each subspace comprises one or more documents as represented by their corresponding vectors in said vector space, so that said each subspace corresponds to a class defined by a corresponding one of said plurality of classification schemes; and
computer program code for calculating a maximum margin surrounding said hyperplanes in said vector space such that said margin contains none of the vectors contained in the subspaces corresponding to said classification classes.
15. A software tool embodied on a computer-readable medium for the classification of a document digitally represented in a computer into one of a plurality of classes in each of a plurality of classification schemes, said document respectively comprising a plurality of terms which respectively comprise one or more symbols of a finite set of symbols, said method classifying said document as belonging to one of a plurality of classes, and said software tool comprising:
computer program code for representing said document by a vector of n dimensions, said n dimensions spanning up a vector space, whereas the value of each dimension of said vector corresponds to the frequency of occurrence of a certain term in the document corresponding to said vector;
computer program code for classifying said document into one of said plurality of classes in each of said classification schemes by determining into which of a plurality of subspaces said vector falls for each of said classification schemes, said subspaces being formed by creating a Voronoi-tessellation of said vector space separating said vector space spanned up by said n-dimensional vector through one or more hyperplanes to define said subspaces such that each subspace corresponds to one of said plurality of classes; and
computer program code for calculating a location of said vector in at least one said subspaces by calculating a distance of said vector from said hyperplanes.
16. A computer readable medium having embodied thereon computer program code, said computer program code comprising:
a computer program code portion for processing a plurality of documents according to a plurality of classification schemes;
a computer program code portion for representing each of said plurality of documents in each said classification schemes by a vector of n dimensions, said n dimensions forming a vector space, whereas the value of each dimension of said vector corresponds to the frequency of occurrence of a certain term in the document corresponding to said vector, so that said n dimensions span up a vector space; and
a computer program code portion for representing the classification of said already classified documents into a plurality of classes corresponding to respective ones of said plurality of classification schemes by, for each of said classification schemes, creating a Voronoi-tessellation of said vector space separating said vector space into a plurality of subspaces by one or more hyperplanes, such that each subspace comprises one or more documents as represented by their corresponding vectors in said vector space, so that said each subspace corresponds to a class defined by a corresponding one of said plurality of classification schemes; and
a computer program code portion for calculating a maximum margin surrounding said hyperplanes in said vector space such that said margin contains none of the vectors contained in the subspaces corresponding to said classification classes.
17. A computer readable medium having embodied thereon computer program code, said computer program code comprising:
a computer program code portion for representing a document by a vector of n dimensions, said n dimensions spanning up a vector space, whereas the value of each dimension of said vector corresponds to the frequency of occurrence of a certain term in the document corresponding to said vector; and
a computer program code portion for classifying said document into one of said plurality of classes in each of a plurality of predefined classification schemes by determining into which of a plurality of subspaces said vector falls for each of said classification schemes, said subspaces being formed in each of said classification schemes by creating a Voronoi-tessellation of an n-dimensional vector space separating said vector space spanned up by said vector through one or more hyperplanes to define said subspaces such that each subspace corresponds to one of said plurality of classes in each of said classification schemes; and
a computer program code portion for calculating a location of said vector in at least one said subspaces by calculating a distance of said vector from said hyperplanes.
18. A computer readable medium comprising computer program code for building a classification model for classifying unclassified documents based on the classification of a plurality of documents which respectively have been classified as belonging to one of a plurality of classes, said documents being digitally represented in a computer, said documents respectively comprising a plurality of terms which respectively comprise one or more symbols of a finite set of symbols, said computer program code comprising:
a computer program code portion for processing said plurality of documents according to a plurality of classification schemes;
a computer program code portion for representing each of said plurality of documents in each said classification schemes by a vector of n dimensions, said n dimensions forming a vector space, whereas the value of each dimension of said vector corresponds to the frequency of occurrence of a certain term in the document corresponding to said vector, so that said n dimensions span up a vector space;
a computer program code portion for representing the classification of said already classified documents into a plurality of classes corresponding to respective ones of said plurality of classification schemes by, for each of said classification schemes, creating a Voronoi-tessellation of said vector space separating said vector space into a plurality of subspaces by one or more hyperplanes, such that each subspace comprises one or more documents as represented by their corresponding vectors in said vector space, so that said each subspace corresponds to a class defined by a corresponding one of said plurality of classification schemes; and
a computer program code portion for calculating a maximum margin surrounding said hyperplanes in said vector space such that said margin contains none of the vectors contained in the subspaces corresponding to said classification classes.
19. A computer readable medium comprising computer program code for the classification of a document digitally represented in a computer into one of a plurality of classes in each of a predefined plurality of classification schemes, said document respectively comprising a plurality of terms which respectively comprise one or more symbols of a finite set of symbols, said method classifying said document as belonging to one of a plurality of classes, said computer program code comprising:
a computer program code portion for representing said document by a vector of n dimensions, said n dimensions spanning up a vector space, whereas the value of each dimension of said vector corresponds to the frequency of occurrence of a certain term in the document corresponding to said vector;
a computer program code portion for classifying said document into, for each of said classification schemes, one of said plurality of classes in a respective one of said classification schemes by determining into which of a plurality of subspaces said vector falls, said subspaces being formed by creating a Voronoi-tessellation of said vector space separating said vector space spanned up by said n-dimensional vector through one or more hyperplanes to define said subspaces such that each subspace corresponds to one of said plurality of classes for each of said classification schemes; and
a computer program code portion for calculating a location of said vector in at least one said subspaces by calculating a distance of said vector from said hyperplanes.
20. A method of generating a classification model, the classification model being stored in a data structure on a computer-readable medium, said data structure representing a classification model for classifying unclassified documents based on the classification of a plurality of documents which respectively have been classified as belonging to one of a plurality of classes, said documents being digitally represented in a computer, said documents respectively comprising a plurality of terms which respectively comprise one or more symbols of a finite set of symbols, wherein said classification model is generated by the steps of:
processing said plurality of documents according to each of a plurality of classification schemes;
representing each of said plurality of documents in each of said classification schemes by a vector of n dimensions, said n dimensions forming a vector space, whereas the value of each dimension of said vector corresponds to the frequency of occurrence of a certain term in the document corresponding to said vector, so that said n dimensions span up a vector space;
representing the classification of said already classified documents into a plurality of classes corresponding to respective ones of said plurality of classification schemes by, for each of said classification schemes, creating a Voronoi-tessellation of said vector space separating said vector space into a plurality of subspaces by one or more hyperplanes, such that each subspace comprises one or more documents as represented by their corresponding vectors in said vector space, so that said each subspace corresponds to a class defined by a corresponding one of said plurality of classification schemes; and
calculating a maximum margin surrounding said hyperplanes in said vector space such that said margin contains none of the vectors contained in the subspaces corresponding to said classification classes.
Specification