Methods and apparatuses for determining and designating classifications of electronic documents
5 Assignments
0 Petitions
Abstract
Embodiments of the invention provide methods and apparatuses for automatically determining and designating classifications of electronic documents. In accordance with one embodiment of the invention, each of a plurality of electronic documents is reduced to a corresponding multi-dimensional vector based on a multi-dimensional vector space. The distances between the multi-dimensional vectors are then evaluated. Multi-dimensional vectors within a specified distance of one another are considered to form a multi-dimensional vector cluster, and the multi-dimensional vector space may contain one or more such clusters. Each cluster represents a distinct classification, and the electronic documents corresponding to the multi-dimensional vectors of a cluster are assigned that classification. For one embodiment of the invention, features of the electronic documents corresponding to the multi-dimensional vectors of a cluster are used to designate the classification represented by the cluster.
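The pipeline in the abstract (vectors, pairwise distances, threshold clusters, cluster labels) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the documents are assumed to be already reduced to term-count vectors, the distance is assumed to be Euclidean, and "within a specified distance of one another" is read as single-linkage grouping (two vectors share a cluster when a chain of pairs, each within `max_distance`, connects them). The names `threshold_clusters`, `designate`, `max_distance`, and `top_k`, the labeling rule, and the sample vocabulary are all hypothetical.

```python
import math
from collections import Counter


def euclidean(u, v):
    # Assumption: the abstract does not name a metric; Euclidean is used here.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def threshold_clusters(vectors, max_distance):
    """Return clusters as sorted lists of vector indices.

    Assumption: vectors i and j are linked when their distance is at most
    max_distance, and a cluster is a connected component of those links
    (single-linkage grouping implemented with union-find).
    """
    parent = list(range(len(vectors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if euclidean(vectors[i], vectors[j]) <= max_distance:
                parent[find(i)] = find(j)  # merge the two components

    groups = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def designate(cluster, vectors, dimension_names, top_k=2):
    """Hypothetical labeling rule: name a cluster after the top_k dimensions
    with the largest summed weight across its member vectors."""
    totals = Counter()
    for i in cluster:
        for name, weight in zip(dimension_names, vectors[i]):
            totals[name] += weight
    return [name for name, _ in totals.most_common(top_k)]


# Hypothetical term-count vectors for four documents.
dimension_names = ["invoice", "payment", "goal", "match"]
vectors = [
    [3, 1, 0, 0],
    [2, 2, 0, 0],
    [0, 0, 1, 3],
    [0, 0, 2, 2],
]

for cluster in threshold_clusters(vectors, max_distance=2.0):
    print(cluster, designate(cluster, vectors, dimension_names))
```

With a threshold of 2.0 this prints `[0, 1] ['invoice', 'payment']` and `[2, 3] ['match', 'goal']`. A stricter reading, in which every pair inside a cluster must be within the threshold, would call for clique-style grouping instead; the abstract does not say which reading is intended.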
81 Claims
1. A method comprising:
defining a multi-dimensional vector space;
reducing each of a plurality of electronic documents to a corresponding multi-dimensional vector based upon the defined multi-dimensional vector space;
calculating a distance between each corresponding multi-dimensional vector of one or more portions of the plurality of corresponding multi-dimensional vectors, each portion of the plurality of corresponding multi-dimensional vectors containing a plurality of corresponding multi-dimensional vectors; and
determining one or more classifications for one or more respective portions of the electronic documents based upon the calculated distances, properties of the multi-dimensional vectors, and properties of the defined multi-dimensional vector space.
Dependent claims: 2 to 27.
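The first three steps of claim 1 (defining a vector space, reducing documents to vectors, calculating distances within a portion) can be illustrated with a short sketch. The claim does not say how the vector space is defined or which distance is used; this example assumes, purely for illustration, a bag-of-words space with one dimension per distinct word in the corpus and Euclidean distance. The helper names (`tokenize`, `define_vector_space`, `reduce_to_vector`, `pairwise_distances`) and the sample documents are hypothetical, and the final classification step is not shown.

```python
import math
import re


def tokenize(text):
    # Lowercase alphabetic words only; punctuation and digits are dropped.
    return re.findall(r"[a-z]+", text.lower())


def define_vector_space(documents):
    # Assumption: one dimension per distinct word in the corpus, sorted so the
    # dimension order is reproducible.
    return sorted({word for doc in documents for word in tokenize(doc)})


def reduce_to_vector(document, space):
    # Term-count vector: component k counts occurrences of space[k].
    index = {word: k for k, word in enumerate(space)}
    vector = [0] * len(space)
    for word in tokenize(document):
        if word in index:
            vector[index[word]] += 1
    return vector


def pairwise_distances(vectors):
    # Euclidean distance (assumed metric) for every unordered pair i < j.
    return {
        (i, j): math.dist(vectors[i], vectors[j])
        for i in range(len(vectors))
        for j in range(i + 1, len(vectors))
    }


# Hypothetical sample documents forming one portion.
documents = [
    "Invoice due: payment overdue",
    "Payment received for invoice",
    "Match report: late goal wins match",
]
space = define_vector_space(documents)
vectors = [reduce_to_vector(doc, space) for doc in documents]
for pair, d in sorted(pairwise_distances(vectors).items()):
    print(pair, round(d, 3))
```

Here the two invoice-related documents are closer to each other (distance 2.0) than either is to the sports report (about 3.464), which is the kind of distance signal a classification step could then act on. `math.dist` requires Python 3.8 or later.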
28. A machine-readable medium having stored thereon a set of instructions which when executed cause a system to perform a method comprising:
defining a multi-dimensional vector space;
reducing each of a plurality of electronic documents to a corresponding multi-dimensional vector based upon the defined multi-dimensional vector space;
calculating a distance between each corresponding multi-dimensional vector of one or more portions of the plurality of corresponding multi-dimensional vectors, each portion of the plurality of corresponding multi-dimensional vectors containing a plurality of corresponding multi-dimensional vectors; and
determining one or more classifications for one or more respective portions of the electronic documents based upon the calculated distances, properties of the multi-dimensional vectors, and properties of the defined multi-dimensional vector space.
Dependent claims: 29 to 54.
55. A system comprising:
a processor;
a network interface coupled to the processor; and
a machine-readable medium having stored thereon a set of instructions which when executed cause the system to perform a method comprising:
reducing each of a plurality of electronic documents to a corresponding multi-dimensional vector based upon the defined multi-dimensional vector space;
calculating a distance between each corresponding multi-dimensional vector of one or more portions of the plurality of corresponding multi-dimensional vectors, each portion of the plurality of corresponding multi-dimensional vectors containing a plurality of corresponding multi-dimensional vectors; and
determining one or more classifications for one or more respective portions of the electronic documents based upon the calculated distances, properties of the multi-dimensional vectors, and properties of the defined multi-dimensional vector space.
Dependent claims: 56 to 81.
Specification