Classification method and apparatus
Abstract
A method for building a classification model for classifying unclassified documents based on the classification of a plurality of documents which respectively have been classified as belonging to one of a plurality of classes, said documents being digitally represented in a computer, said documents respectively comprising a plurality of terms which respectively comprise one or more symbols of a finite set of symbols, and said method comprising the following steps: representing each of said plurality of documents by a vector of n dimensions, said n dimensions forming a vector space, whereas the value of each dimension of said vector corresponds to the frequency of occurrence of a certain term in the document corresponding to said vector, so that said n dimensions span up a vector space; representing the classification of said already classified documents into classes by separating said vector space into a plurality of subspaces by one or more hyperplanes, such that each subspace comprises one or more documents as represented by their corresponding vectors in said vector space, so that said each subspace corresponds to a class.
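The vector representation described in the abstract can be illustrated with a minimal sketch. It assumes that a "term" is a lowercase, whitespace-separated token and that each distinct term gets one dimension; the helper names `tokenize`, `build_vocabulary`, and `to_vector` are hypothetical and do not come from the patent text.

```python
from collections import Counter


def tokenize(text):
    # Assumption: terms are lowercase tokens separated by whitespace.
    return text.lower().split()


def build_vocabulary(documents):
    # One dimension per distinct term; sorted so the mapping is reproducible.
    terms = sorted({term for doc in documents for term in tokenize(doc)})
    return {term: index for index, term in enumerate(terms)}


def to_vector(text, vocabulary):
    # The value of each dimension is the frequency of its term in the document.
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for term, frequency in counts.items():
        if term in vocabulary:
            vector[vocabulary[term]] = frequency
    return vector


docs = ["invoice total due", "meeting agenda meeting notes"]
vocab = build_vocabulary(docs)
vectors = [to_vector(doc, vocab) for doc in docs]
```

Here `n` equals the size of the vocabulary, so every document, classified or not, maps to a point in the same n-dimensional space.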
23 Claims
1. A method for building a classification model for classifying unclassified documents based on the classification of a plurality of documents which respectively have been classified as belonging to one of a plurality of classes, said documents being digitally represented in a computer, said documents respectively comprising a plurality of terms which respectively comprise one or more symbols of a finite set of symbols, and said method comprising the following steps:
processing said plurality of documents according to a plurality of classification schemes; representing each of said plurality of documents in each of said classification schemes by a vector of n dimensions, said n dimensions forming a vector space, whereas the value of each dimension of said vector corresponds to the frequency of occurrence of a certain term in the document corresponding to said vector, so that said n dimensions span up a vector space; representing the classification of said already classified documents into a plurality of classes corresponding to respective ones of said plurality of classification schemes by, for each of said classification schemes, separating said vector space into a plurality of subspaces by one or more hyperplanes, such that each subspace comprises one or more documents as represented by their corresponding vectors in said vector space, so that said each subspace corresponds to a class defined by a corresponding one of said plurality of classification schemes.
Dependent claims: 3, 5, 6, 9, 11, 12, 14
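The model-building steps of claim 1 can be sketched as follows. The claim does not say how a separating hyperplane is found; this sketch swaps in the classic perceptron update rule purely for illustration, and it assumes two classes per scheme (a single hyperplane) and linearly separable training data. The function names, the scheme names, and the sample vectors are all hypothetical.

```python
def train_hyperplane(vectors, signs, epochs=100):
    # Perceptron rule (an assumption, not the claimed method): adjust the
    # hyperplane w.x + b = 0 until every vector lies on the side given by
    # its sign (+1 or -1), or until the epoch budget runs out.
    w = [0.0] * len(vectors[0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(vectors, signs):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                errors += 1
        if errors == 0:
            break
    return w, b


def build_model(vectors, scheme_labels):
    # One hyperplane per classification scheme over the same vector space.
    model = {}
    for scheme, labels in scheme_labels.items():
        classes = sorted(set(labels))
        if len(classes) != 2:
            raise ValueError("this sketch handles two classes per scheme")
        signs = [1 if label == classes[1] else -1 for label in labels]
        w, b = train_hyperplane(vectors, signs)
        model[scheme] = {"w": w, "b": b,
                         "negative": classes[0], "positive": classes[1]}
    return model


# Hypothetical term-frequency vectors over the terms (invoice, meeting, urgent).
vectors = [[2, 0, 1], [1, 0, 0], [0, 2, 1], [0, 1, 0]]
scheme_labels = {
    "topic": ["finance", "finance", "meetings", "meetings"],
    "priority": ["high", "low", "high", "low"],
}
model = build_model(vectors, scheme_labels)
```

The same four vectors are separated twice, once per scheme, which is the point of the "plurality of classification schemes": each scheme partitions the one shared vector space with its own hyperplane. A scheme with more than two classes would need additional hyperplanes, which this sketch omits.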
2. A method for the classification of a document digitally represented in a computer into one of a plurality of classes in each of a plurality of predefined classification schemes, said document respectively comprising a plurality of terms which respectively comprise one or more symbols of a finite set of symbols, said method comprising the steps of:
representing said document by a vector of n dimensions, said n dimensions spanning up a vector space, whereas the value of each dimension of said vector corresponds to the frequency of occurrence of a certain term in the document corresponding to said vector; classifying said document into one of said plurality of classes in each of said classification schemes by determining, for each of said classification schemes, into which of a plurality of subspaces said vector falls, said subspaces being formed by separating said vector space spanned up by said n-dimensional vector through one or more hyperplanes to define said subspaces such that each subspace corresponds to one of said plurality of classes for each of said classification schemes.
Dependent claims: 4, 7, 8, 10, 13
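The classification step of claim 2 amounts to checking, for each scheme, on which side of the scheme's hyperplane the document vector falls. The sketch below assumes one hyperplane per scheme, stored as a weight vector `w` and an offset `b`; the hyperplane values, scheme names, and class names are invented for illustration and are not taken from the patent.

```python
def classify(vector, model):
    # For each scheme, the sign of w.x + b tells which subspace
    # (and therefore which class) the vector falls into.
    result = {}
    for scheme, plane in model.items():
        score = sum(w * x for w, x in zip(plane["w"], vector)) + plane["b"]
        result[scheme] = plane["positive"] if score > 0 else plane["negative"]
    return result


# Hypothetical hyperplanes over the terms (invoice, meeting, urgent).
model = {
    "topic": {"w": [-1.0, 1.0, 0.0], "b": 0.0,
              "negative": "finance", "positive": "meetings"},
    "priority": {"w": [0.0, 0.0, 1.0], "b": -0.5,
                 "negative": "low", "positive": "high"},
}

doc_vector = [3, 0, 1]  # "invoice" three times, "urgent" once
labels = classify(doc_vector, model)
```

A single pass over the vector yields one class per scheme, so the document receives a topic label and a priority label at the same time.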
15. An apparatus for classifying a plurality of documents which respectively have been classified as belonging to one of a plurality of classes, said documents being digitally represented in a computer, said documents respectively comprising a plurality of terms which respectively comprise one or more symbols of a finite set of symbols, said apparatus comprising:
means for processing said plurality of documents according to a plurality of classification schemes; means for representing each of said plurality of documents in each said classification schemes by a vector of n dimensions, said n dimensions forming a vector space, whereas the value of each dimension of said vector corresponds to the frequency of occurrence of a certain term in the document corresponding to said vector, so that said n dimensions span up a vector space; and means for representing the classification of said already classified documents into a plurality of classes corresponding to respective ones of said plurality of classification schemes by, for each of said classification schemes, separating said vector space into a plurality of subspaces by one or more hyperplanes, such that each subspace comprises one or more documents as represented by their corresponding vectors in said vector space, so that said each subspace corresponds to a class defined by a corresponding one of said plurality of classification schemes.
16. An apparatus for classifying a document digitally represented in a computer into one of a plurality of classes in each of a plurality of predefined classification schemes, said document respectively comprising a plurality of terms which respectively comprise one or more symbols of a finite set of symbols, said method classifying said document as belonging to one of a plurality of classes, said apparatus comprising:
means for representing said document by a vector of n dimensions, said n dimensions spanning up a vector space, whereas the value of each dimension of said vector corresponds to the frequency of occurrence of a certain term in the document corresponding to said vector; and means for classifying said document into one of said plurality of classes in each of said classification schemes by, for each of said classification schemes, determining into which of a plurality of subspaces said vector falls, said subspaces being formed by separating said vector space spanned up by said n-dimensional vector through one or more hyperplanes to define said subspaces such that each subspace corresponds to one of said plurality of classes defined by a corresponding one of said plurality of classification schemes.
17. A software tool embodied on a computer-readable medium for building a classification model for classifying unclassified documents based on the classification of a plurality of documents which respectively have been classified as belonging to one of a plurality of classes, said documents being digitally represented in a computer, said documents respectively comprising a plurality of terms which respectively comprise one or more symbols of a finite set of symbols, and said software tool comprising the following:
computer program code for processing said plurality of documents according to a plurality of classification schemes; computer program code for representing each of said plurality of documents in each of said classification schemes by a vector of n dimensions, said n dimensions forming a vector space, whereas the value of each dimension of said vector corresponds to the frequency of occurrence of a certain term in the document corresponding to said vector, so that said n dimensions span up a vector space; computer program code for representing the classification of said already classified documents into a plurality of classes corresponding to respective ones of said plurality of classification schemes by, for each of said classification schemes, separating said vector space into a plurality of subspaces by one or more hyperplanes, such that each subspace comprises one or more documents as represented by their corresponding vectors in said vector space, so that said each subspace corresponds to a class defined by a corresponding one of said plurality of classification schemes.
18. A software tool embodied on a computer-readable medium for the classification of a document digitally represented in a computer into one of a plurality of classes in each of a plurality of classification schemes, said document respectively comprising a plurality of terms which respectively comprise one or more symbols of a finite set of symbols, said method classifying said document as belonging to one of a plurality of classes, and said software tool comprising the following:
computer program code for representing said document by a vector of n dimensions, said n dimensions spanning up a vector space, whereas the value of each dimension of said vector corresponds to the frequency of occurrence of a certain term in the document corresponding to said vector; computer program code for classifying said document into one of said plurality of classes in each of said classification schemes by determining into which of a plurality of subspaces said vector falls for each of said classification schemes, said subspaces being formed by separating said vector space spanned up by said n-dimensional vector through one or more hyperplanes to define said subspaces such that each subspace corresponds to one of said plurality of classes.
19. A computer readable medium having embodied thereon computer program code, said computer program code comprising:
a computer program code portion for processing a plurality of documents according to a plurality of classification schemes; a computer program code portion for representing each of said plurality of documents in each said classification schemes by a vector of n dimensions, said n dimensions forming a vector space, whereas the value of each dimension of said vector corresponds to the frequency of occurrence of a certain term in the document corresponding to said vector, so that said n dimensions span up a vector space; a computer program code portion for representing the classification of said already classified documents into a plurality of classes corresponding to respective ones of said plurality of classification schemes by, for each of said classification schemes, separating said vector space into a plurality of subspaces by one or more hyperplanes, such that each subspace comprises one or more documents as represented by their corresponding vectors in said vector space, so that said each subspace corresponds to a class defined by a corresponding one of said plurality of classification schemes.
20. A computer readable medium having embodied thereon computer program code, said computer program code comprising:
a computer program code portion for representing a document by a vector of n dimensions, said n dimensions spanning up a vector space, whereas the value of each dimension of said vector corresponds to the frequency of occurrence of a certain term in the document corresponding to said vector; a computer program code portion for classifying said document into one of said plurality of classes in each of a plurality of predefined classification schemes by determining into which of a plurality of subspaces said vector falls for each of said classification schemes, said subspaces being formed in each of said classification schemes by separating said vector space spanned up by said n-dimensional vector through one or more hyperplanes to define said subspaces such that each subspace corresponds to one of said plurality of classes in each of said classification schemes.
21. A computer-readable medium comprising computer program code for building a classification model for classifying unclassified documents based on the classification of a plurality of documents which respectively have been classified as belonging to one of a plurality of classes, said documents being digitally represented in a computer, said documents respectively comprising a plurality of terms which respectively comprise one or more symbols of a finite set of symbols, said computer program code comprising:
a computer program code portion for processing said plurality of documents according to a plurality of classification schemes; a computer program code portion for representing each of said plurality of documents in each said classification schemes by a vector of n dimensions, said n dimensions forming a vector space, whereas the value of each dimension of said vector corresponds to the frequency of occurrence of a certain term in the document corresponding to said vector, so that said n dimensions span up a vector space; and a computer program code portion for representing the classification of said already classified documents into a plurality of classes corresponding to respective ones of said plurality of classification schemes by, for each of said classification schemes, separating said vector space into a plurality of subspaces by one or more hyperplanes, such that each subspace comprises one or more documents as represented by their corresponding vectors in said vector space, so that said each subspace corresponds to a class defined by a corresponding one of said plurality of classification schemes.
22. A computer-readable medium comprising computer program code for the classification of a document digitally represented in a computer into one of a plurality of classes in each of a predefined plurality of classification schemes, said document respectively comprising a plurality of terms which respectively comprise one or more symbols of a finite set of symbols, said method classifying said document as belonging to one of a plurality of classes, said computer program code comprising:
a computer program code portion for representing said document by a vector of n dimensions, said n dimensions spanning up a vector space, whereas the value of each dimension of said vector corresponds to the frequency of occurrence of a certain term in the document corresponding to said vector; and a computer program code portion for classifying said document into, for each of said classification schemes, one of said plurality of classes in a respective one of said classification schemes by determining into which of a plurality of subspaces said vector falls, said subspaces being formed by separating said vector space spanned up by said n-dimensional vector through one or more hyperplanes to define said subspaces such that each subspace corresponds to one of said plurality of classes for each of said classification schemes.
23. A data structure being stored on a computer-readable medium, said data structure representing a classification model for classifying unclassified documents based on the classification of a plurality of documents which respectively have been classified as belonging to one of a plurality of classes, said documents being digitally represented in a computer, said documents respectively comprising a plurality of terms which respectively comprise one or more symbols of a finite set of symbols, wherein said classification model is generated by the steps of:
processing said plurality of documents according to each of a plurality of classification schemes; representing each of said plurality of documents in each of said classification schemes by a vector of n dimensions, said n dimensions forming a vector space, whereas the value of each dimension of said vector corresponds to the frequency of occurrence of a certain term in the document corresponding to said vector, so that said n dimensions span up a vector space; and representing the classification of said already classified documents into a plurality of classes corresponding to respective ones of said plurality of classification schemes by, for each of said classification schemes, separating said vector space into a plurality of subspaces by one or more hyperplanes, such that each subspace comprises one or more documents as represented by their corresponding vectors in said vector space, so that said each subspace corresponds to a class defined by a corresponding one of said plurality of classification schemes.
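One way to picture the stored classification model of claim 23 is as a record holding the term for each dimension plus the hyperplanes for each scheme. The claim does not prescribe any layout; the field names, the JSON encoding, and the sample values below are assumptions made only for this sketch.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Hyperplane:
    weights: list        # one coefficient per vector dimension
    bias: float          # offset of the hyperplane
    negative_class: str  # class of the subspace where w.x + b <= 0
    positive_class: str  # class of the subspace where w.x + b > 0


@dataclass
class ClassificationModel:
    vocabulary: list  # term assigned to each dimension, in order
    schemes: dict     # scheme name -> list of Hyperplane


def dump_model(model):
    # asdict() converts nested dataclasses into plain dicts and lists.
    return json.dumps(asdict(model), sort_keys=True)


def load_model(text):
    raw = json.loads(text)
    schemes = {name: [Hyperplane(**plane) for plane in planes]
               for name, planes in raw["schemes"].items()}
    return ClassificationModel(vocabulary=raw["vocabulary"], schemes=schemes)


# Hypothetical model with two schemes over three terms.
stored = ClassificationModel(
    vocabulary=["invoice", "meeting", "urgent"],
    schemes={
        "topic": [Hyperplane([-1.0, 1.0, 0.0], 0.0, "finance", "meetings")],
        "priority": [Hyperplane([0.0, 0.0, 1.0], -0.5, "low", "high")],
    },
)
restored = load_model(dump_model(stored))
```

Keeping a list of hyperplanes per scheme leaves room for the "one or more hyperplanes" wording, even though each scheme here holds only one.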
Specification