Method, apparatus, and system for clustering and classification
Abstract
The invention provides a method, apparatus, and system for classifying and clustering electronic data streams, such as email, images, and sound files, for identification, sorting, and efficient storage. The disclosed systems label a document as belonging to a predefined class through computer methods that comprise the steps of identifying an electronic data stream using one or more learning machines and comparing the outputs from the machines to determine the label to associate with the data. The method further uses learning machines in combination with hashing schemes to cluster and classify documents. In one embodiment, hash apparatuses and methods taxonomize clusters. In another embodiment, clusters of documents use a geometric hash to contain the documents in a data corpus without the overhead of search and storage.
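The labeling step (several learning machines, whose outputs are compared with stored predefined outputs to pick a label) can be illustrated with a minimal sketch. The source does not specify the machines or the form of the stored output, so everything here is an assumption: each "learning machine" is a hypothetical function returning a class label, the stored predefined output is a lookup table keyed by the tuple of machine outputs, unmatched cases fall back to a majority vote, and a tie is treated as ambiguous. The names `label_stream`, `word_machine`, `link_machine`, `sender_machine`, `MACHINES`, and `STORED` are illustrative only.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical model of a "learning machine": a function mapping a feature
# vector extracted from a data stream to a class label.
Machine = Callable[[Sequence[float]], str]


def label_stream(
    features: Sequence[float],
    machines: List[Machine],
    stored_outputs: Dict[Tuple[str, ...], str],
    ambiguous_label: str = "unlabeled",
) -> str:
    """Label one data stream by comparing machine outputs with stored outputs.

    Assumption: stored_outputs maps the tuple of per-machine labels
    (the combined output) to a predefined label.
    """
    if not machines:
        raise ValueError("at least one learning machine is required")
    outputs = tuple(machine(features) for machine in machines)
    # 1. A stored predefined output that matches exactly decides the label.
    if outputs in stored_outputs:
        return stored_outputs[outputs]
    # 2. Otherwise take a majority vote; a tie is treated as ambiguous.
    ranked = Counter(outputs).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ambiguous_label
    return ranked[0][0]


# Toy machines for an email stream; features = (spam-word ratio, link count).
def word_machine(x: Sequence[float]) -> str:
    return "spam" if x[0] > 0.5 else "ham"


def link_machine(x: Sequence[float]) -> str:
    return "spam" if x[1] > 3 else "ham"


def sender_machine(x: Sequence[float]) -> str:
    return "ham"  # stand-in for, e.g., a sender-reputation model


MACHINES = [word_machine, link_machine, sender_machine]
STORED = {("spam", "spam", "ham"): "spam", ("ham", "spam", "ham"): "review"}

print(label_stream([0.9, 5], MACHINES, STORED))  # spam (stored match)
print(label_stream([0.1, 5], MACHINES, STORED))  # review (stored match)
print(label_stream([0.9, 1], MACHINES, STORED))  # ham (majority vote, 2 of 3)
print(label_stream([0.9, 1], MACHINES[:2], {}))  # unlabeled (1-1 tie)
```

Checking the stored table before voting lets a known combination of disagreeing outputs (such as routing one pattern to "review") override the plain majority, which is one plausible reading of "comparing the outputs ... with stored predefined output."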
16 Claims
1. A computer method for labeling an electronic communication data stream comprising the steps of associating an electronic data stream with a predefined class by one or more learning machines including when the electronic communication data stream is ambiguous, comparing the outputs from the learning machines with stored predefined output to determine the label to associate with the electronic communication data stream, and labeling the electronic data stream.
12. A computer method for text classification, the method comprising:
- combining SVM, NB, K-NN, naive-Bayes, and NN processes to optimize a machine-learning utility of text classification;
- comparing output of the optimized machine-learning utility to stored text classifications; and
- classifying text based on the comparison.
13. A computer method for labeling an electronic data stream as belonging to a predefined class comprising the steps of identifying an electronic data stream by one or more learning machines including when the electronic data stream is ambiguous, comparing the outputs from the learning machines with stored predefined output to determine the label to associate with the electronic data stream, pre-defining a label for email users by processing and analyzing aggregate data compiled from an email content and label, and labeling the electronic data stream.
14. A computer method for labeling an electronic communication data stream as belonging to a predefined class comprising the steps of identifying an electronic communication data stream by one or more learning machines including when the electronic communication data stream is ambiguous, comparing the outputs from the learning machines with stored predefined output to determine the label to associate with the electronic communication data stream, deciding whether to use a uniform filter or a stackable hash to determine a cluster for the electronic communication data stream, and labeling the electronic data stream.
15. A computer method for labeling an electronic communication data stream as belonging to a predefined class comprising the steps of identifying an electronic communication data stream by one or more learning machines including when the electronic communication data stream is ambiguous, comparing the outputs from the learning machines to determine the label to associate with the electronic communication data stream, deciding whether to use a uniform filter or a stackable hash to determine a cluster for a document having identified attributes email, and labeling the electronic communication data stream.
16. A computer method for labeling an electronic communication data stream as belonging to a predefined class comprising the steps of identifying an electronic communication data stream by one or more learning machines including when the electronic communication data stream is ambiguous, comparing the outputs from the learning machines with stored predefined output to determine the label to associate with the electronic communication data stream, determining an acceptable level of accuracy after use of a K-NN method to divide space into one or more classes, and labeling the electronic communication data stream.
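Claim 16 recites determining an acceptable level of accuracy after a K-NN method divides the space into classes, but does not say how accuracy is measured. The sketch below is one possible reading under stated assumptions: Euclidean distance, majority vote among the k nearest neighbors, leave-one-out accuracy as the measure, and a hypothetical acceptance threshold of 0.9. The function names `knn_predict`, `loo_accuracy`, and `accuracy_acceptable` and the toy data are illustrative, not taken from the patent.

```python
import math
from collections import Counter
from typing import List, Tuple

Point = Tuple[float, ...]
Sample = Tuple[Point, str]


def knn_predict(train: List[Sample], x: Point, k: int = 3) -> str:
    """Return the majority label among the k training points nearest to x."""
    nearest = sorted(train, key=lambda sample: math.dist(sample[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def loo_accuracy(train: List[Sample], k: int = 3) -> float:
    """Leave-one-out accuracy: classify each point using all the others."""
    hits = 0
    for i, (x, y) in enumerate(train):
        rest = train[:i] + train[i + 1:]
        hits += knn_predict(rest, x, k) == y
    return hits / len(train)


def accuracy_acceptable(train: List[Sample], k: int = 3, threshold: float = 0.9) -> bool:
    """Hypothetical acceptance test: LOO accuracy must reach the threshold."""
    return loo_accuracy(train, k) >= threshold


clean = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
noisy = clean + [((0.5, 0.5), "b")]  # one point labeled against its neighbors

print(knn_predict(clean, (0.5, 0.5)))  # a
print(loo_accuracy(clean), accuracy_acceptable(clean))  # 1.0 True
print(round(loo_accuracy(noisy), 3), accuracy_acceptable(noisy))  # 0.857 False
```

Leave-one-out is used here only because it needs no separate test set for a small labeled corpus; a held-out split or cross-validation would serve the same purpose.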
Specification