Method, apparatus, and system for clustering and classification
Abstract
The invention provides a method, apparatus, and system for classifying and clustering electronic data streams such as email, images, and sound files for identification, sorting, and efficient storage. The method further utilizes learning machines in combination with hashing schemes to cluster and classify documents. In one embodiment, hash apparatuses and methods taxonomize clusters. In yet another embodiment, clusters of documents utilize a geometric hash to contain the documents in a data corpus without the overhead of search and storage.
18 Claims
1. A computer method for detecting a document having identified attributes comprising:
(a) converting a binary coded message into numeric values;
(b) computing a hashing vector based upon the numeric values provided to a mathematical function;
(c) comparing a difference between a hashing vector and a stored vector.
Dependent claims: 2, 3.
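Steps (a) through (c) of claim 1 can be illustrated with a minimal Python sketch. The claim does not name the mathematical function or the difference measure, so the cosine projection in `hashing_vector`, the vector size `dims`, and the Euclidean distance in `difference` are all assumptions made for illustration, not the patented method.

```python
import math

def to_numeric(message: bytes) -> list[float]:
    """Step (a): map each byte (0..255) of the coded message to a value in [0, 1]."""
    return [b / 255.0 for b in message]

def hashing_vector(values: list[float], dims: int = 8) -> list[float]:
    """Step (b): pass the numeric values through a mathematical function.

    Assumption: the function projects the values onto `dims` cosine basis
    functions and averages over the message length. The claim names none.
    """
    vec = [0.0] * dims
    for i, v in enumerate(values):
        for k in range(dims):
            vec[k] += v * math.cos((k + 1) * (i + 1))
    n = max(len(values), 1)
    return [x / n for x in vec]

def difference(a: list[float], b: list[float]) -> float:
    """Step (c): Euclidean distance between a hashing vector and a stored vector."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

stored = hashing_vector(to_numeric(b"Limited time offer, click now"))
incoming = hashing_vector(to_numeric(b"Limited time offer, click now"))
print(difference(incoming, stored))  # identical messages give 0.0
```

A detector built this way would flag a document when the difference falls below some threshold; the threshold value is a tuning choice the claim leaves open.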
4. A computer method for comparing a plurality of documents comprising the steps of:
(a) receiving a first document having coded elements into a random access memory;
(b) converting the coded elements into a number between two limits;
(c) loading a data register serially from the random access memory with at least two adjacent data elements from the document;
(d) computing a vector corresponding to at least two associated adjacent data elements and a uniform filter;
(e) loading the one data register serially from a means for storing with a next adjacent data element from the document;
(f) computing a vector corresponding to at least two associated adjacent data elements and a uniform filter;
(g) repeating the steps (e) through (f) until elements from the first document have a corresponding vector;
(h) summing each associated vector element to form an associated hashing vector elements; and
(i) comparing the hashing vector with an archive of hashing vectors to determine similarity.
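The sliding-register procedure of claim 4 might look like the following Python sketch. Several details are assumptions: the two limits are taken as 0 and 1, the uniform filter is an equal-weight average over the window, each window's vector is a one-hot bin indicator, and similarity is measured by cosine. The names `windowed_hash_vector`, `cosine`, and `most_similar` are hypothetical.

```python
import math

def to_unit(doc: bytes) -> list[float]:
    """Step (b): map each coded element to a number between two limits (0 and 1)."""
    return [b / 255.0 for b in doc]

def windowed_hash_vector(doc: bytes, window: int = 2, bins: int = 16) -> list[float]:
    values = to_unit(doc)
    total = [0.0] * bins
    weight = 1.0 / window  # uniform filter: every element in the window weighs the same
    for start in range(len(values) - window + 1):
        register = values[start:start + window]       # steps (c)/(e): adjacent elements
        filtered = sum(weight * v for v in register)  # steps (d)/(f): apply the filter
        index = min(int(filtered * bins), bins - 1)   # assumed one-hot vector per window
        total[index] += 1.0                           # step (h): sum the vectors
    return total

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def most_similar(vector: list[float], archive: dict[str, list[float]]) -> tuple[str, float]:
    """Step (i): compare against an archive and return the best match and its score."""
    name = max(archive, key=lambda key: cosine(vector, archive[key]))
    return name, cosine(vector, archive[name])

archive = {
    "greeting": windowed_hash_vector(b"hello world hello world"),
    "binary": windowed_hash_vector(b"\x00\x01\x02\xff\xfe"),
}
best, score = most_similar(windowed_hash_vector(b"hello world hello world"), archive)
print(best)  # greeting
```

Because the per-window vectors are summed, the result has a fixed length regardless of document size, which is what makes comparison against an archive cheap.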
5. A computer method for detecting transmission of a cluster of email, comprising the steps of:
(a) receiving one or more email messages;
(b) generating hash values, based on one or more portions of the plurality of email messages;
(c) generating an associated bit mask value based on one or more portions of the plurality of email messages;
(d) determining whether the generated hash values and the associated bit mask values match corresponding hash values and associated bit mask values related to one or more prior email messages in the cluster.
Dependent claims: 6, 7.
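One possible reading of the hash-value and bit-mask matching in claim 5 is sketched below in Python. It assumes that a "portion" is a three-word shingle, that the hash value selects a slot in a small table, and that the bit mask selects one bit within that slot. The table size, the use of SHA-1, and the names `ClusterTable` and `match_fraction` are illustrative assumptions, not details taken from the claim.

```python
import hashlib

SLOTS = 64  # hypothetical number of table slots
BITS = 32   # hypothetical bits per slot

def portions(text: str, size: int = 3) -> list[str]:
    """Split a message into overlapping word shingles (assumed portion type)."""
    words = text.lower().split()
    if len(words) < size:
        return [" ".join(words)] if words else []
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]

def hash_and_mask(portion: str) -> tuple[int, int]:
    """Steps (b) and (c): derive a hash value and an associated bit mask."""
    digest = int.from_bytes(hashlib.sha1(portion.encode("utf-8")).digest()[:8], "big")
    return (digest // BITS) % SLOTS, 1 << (digest % BITS)

def signature(text: str) -> set[tuple[int, int]]:
    return {hash_and_mask(p) for p in portions(text)}

class ClusterTable:
    """Hash values and bit masks recorded from prior messages in one cluster."""

    def __init__(self) -> None:
        self.slots = [0] * SLOTS

    def add(self, text: str) -> None:
        for hash_value, bit_mask in signature(text):
            self.slots[hash_value] |= bit_mask

    def match_fraction(self, text: str) -> float:
        """Step (d): fraction of (hash value, bit mask) pairs already recorded."""
        sig = signature(text)
        if not sig:
            return 0.0
        hits = sum(1 for h, m in sig if self.slots[h] & m)
        return hits / len(sig)

table = ClusterTable()
table.add("win a free cruise today by clicking this link now")
print(table.match_fraction("win a free cruise today by clicking this link now"))  # 1.0
```

A message would be assigned to the cluster when its match fraction exceeds a chosen cutoff. Like a Bloom filter, this layout can produce occasional false matches, which is the price of its compact storage.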
8. A system for detecting transmission of potentially unwanted e-mails, comprising:
means for observing a plurality of e-mails;
a means for creating a hashing vector for one or more portions of the plurality of emails, a means to generate hash values, a means to generate bit masks and a means for determining whether the generated hash values and associated bit mask values match hash values and associated bit mask values related to prior emails; and
a means for determining that the plurality of emails are potentially unwanted e-mails.
9. A computer method for improving the accuracy of text classification by operating within an unsure region comprising the steps of:
utilizing a K-NN processor to determine the document having the greatest similarity to the text.
Dependent claims: 10, 11.
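The idea in claim 9 can be sketched in Python as a two-stage classifier: a primary score decides the clear cases, and a nearest-neighbor lookup decides only when the score falls inside the unsure region. The band limits `low` and `high`, the "spam"/"ham" labels, the bag-of-words cosine similarity, and the use of a single nearest neighbor (K = 1) are assumptions for illustration; the primary scorer is passed in as a hypothetical `score` callable.

```python
import math
from collections import Counter
from typing import Callable

def bag(text: str) -> Counter:
    """Bag-of-words representation (assumed similarity feature)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def nearest_label(text: str, labeled: list[tuple[str, str]]) -> str:
    """K-NN with K = 1: label of the stored document most similar to the text."""
    query = bag(text)
    _, label = max(labeled, key=lambda pair: cosine(query, bag(pair[0])))
    return label

def classify(text: str, score: Callable[[str], float],
             labeled: list[tuple[str, str]],
             low: float = 0.4, high: float = 0.6) -> str:
    s = score(text)
    if s < low:
        return "ham"
    if s > high:
        return "spam"
    # Unsure region: defer to the most similar labeled document.
    return nearest_label(text, labeled)

labeled = [("cheap pills online now", "spam"), ("lunch meeting at noon", "ham")]
print(classify("cheap pills now", lambda t: 0.5, labeled))  # spam
```

Restricting the nearest-neighbor search to the unsure region keeps the more expensive similarity lookup off the common path, where the primary score is already decisive.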
12. A computer method for storing email messages comprising the steps of utilizing a stackable hash process to determine the cluster wherein said cluster determines a delta-storage of the email.
14. A method for creating an accumulation of documents stored as a cluster by utilizing a process to create a hashing vector to determine whether to add a document to a cluster.
16. A computer method for creating an accumulation of documents stored as a set of clusters comprising the step of utilizing a stackable hash to determine whether to add a document to the set of clusters.
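Claims 12, 14, and 16 combine cluster membership with storage. The Python sketch below shows one way this could work, under stated assumptions: a word-level hashing vector built with CRC-32, a cosine threshold of 0.8 for joining a cluster, and delta storage implemented as a diff against the cluster's first document. The class `ClusterStore` and the helpers `make_delta` and `apply_delta` are hypothetical, and the patent's stackable hash is not reproduced here.

```python
import difflib
import math
import zlib

def hashing_vector(text: str, bins: int = 64) -> list[float]:
    """Assumed hashing vector: counts of words hashed into `bins` buckets."""
    vec = [0.0] * bins
    for word in text.lower().split():
        vec[zlib.crc32(word.encode("utf-8")) % bins] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def make_delta(base: str, text: str) -> list[tuple]:
    """Keep copy ranges into `base` plus only the literal text that differs."""
    matcher = difflib.SequenceMatcher(None, base, text, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            delta.append(("add", text[j1:j2]))
    return delta

def apply_delta(base: str, delta: list[tuple]) -> str:
    parts = [base[op[1]:op[2]] if op[0] == "copy" else op[1] for op in delta]
    return "".join(parts)

class ClusterStore:
    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold
        self.clusters: list[dict] = []

    def add(self, text: str) -> int:
        """Add a document; return the index of the cluster it was stored in."""
        vec = hashing_vector(text)
        for index, cluster in enumerate(self.clusters):
            if cosine(vec, cluster["vector"]) >= self.threshold:
                cluster["deltas"].append(make_delta(cluster["base"], text))
                return index
        self.clusters.append({"base": text, "vector": vec, "deltas": []})
        return len(self.clusters) - 1

store = ClusterStore()
first = "Dear customer your invoice 1001 is attached please pay within 30 days"
second = "Dear customer your invoice 1002 is attached please pay within 30 days"
store.add(first)
store.add(second)  # joins the first cluster and is stored as a delta
print(apply_delta(first, store.clusters[0]["deltas"][0]) == second)  # True
```

Only the first document of each cluster is stored in full; later members cost roughly the size of their differences, which is the storage saving the claims point to.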
Specification