HYBRID HUMAN MACHINE LEARNING SYSTEM AND METHOD
Abstract
Embodiments of the present invention provide a system, method, and article for a hybrid human machine learning system with tagging and scoring techniques for sentiment magnitude scoring of textual passages. The combination of machine learning systems with data from human pooled language extraction techniques enables the present system to achieve high accuracy in human sentiment measurement and textual categorization of raw text, blog posts, and social media streams. This information can then be aggregated to provide brand and product strength analysis. A data processing module is configured to get streaming data and then tag the streaming data automatically using the machine learning output. A crowdsourcing module is configured to select a subset of social media posts that have been previously stored in the database and present them on the web, where each social media post is tagged with a selected set of attributes. A score aggregator module is configured to provide a score based on a user's feedback for each social media post.
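The agreement check and score aggregation described above can be illustrated with a minimal sketch. It assumes that "agreement" means the most common response for an attribute reaches a minimum share of all responses; the source does not specify the rule, and the names `aggregate_if_agreed`, `collect_feedback`, and `min_agreement` are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Optional, Sequence, Tuple


def aggregate_if_agreed(
    responses: Sequence[Hashable], min_agreement: float = 0.6
) -> Optional[Hashable]:
    """Return the most common response if enough raters chose it, else None.

    Assumption: agreement is measured as the share of raters who gave the
    most common response. The source only says the responses must
    "collectively meet a predetermined threshold".
    """
    if not responses:
        return None
    value, count = Counter(responses).most_common(1)[0]
    if count / len(responses) >= min_agreement:
        return value
    return None


def collect_feedback(
    piece_responses: Dict[str, Dict[str, List[Hashable]]],
    min_agreement: float = 0.6,
) -> List[Tuple[str, str, Hashable]]:
    """Keep (piece, attribute, score) triples only where the raters agreed.

    The returned triples stand in for the feedback that would be passed to
    the machine learning system; how the model consumes them is not shown.
    """
    feedback = []
    for piece_id, by_attribute in piece_responses.items():
        for attribute, responses in by_attribute.items():
            score = aggregate_if_agreed(responses, min_agreement)
            if score is not None:
                feedback.append((piece_id, attribute, score))
    return feedback
```

In this sketch, an attribute on which the raters disagree is simply dropped, so only agreed scores would reach the model as training feedback.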
20 Claims
1. A computer-implemented method for analyzing sentiment-bearing documents in a hybrid system, comprising:
sampling a document from the database according to a predetermined selection criteria;
tagging each sample document from the sample documents, each document having one or more pieces of text;
presenting each piece in the document to a group of humans, each human scoring a different attribute associated with an inquiry from the hybrid system;
determining whether the responses from the group of humans for each attribute collectively meet a predetermined threshold;
storing an aggregated score if there is a human agreement from the group of humans for a particular attribute; and
providing an aggregated score as a feedback to a machine learning system for adaptive adjustment of a model associated with the machine learning system for automatic tagging of unsampled documents.
2. The method of claim 1, wherein the aggregated score comprises a piece level score aggregation.
3. The method of claim 1, wherein the aggregated score comprises an item level score aggregation.
4. The method of claim 1, wherein the aggregated score comprises a stream level score aggregation.
5. The method of claim 1, further comprising data processing to tag unsampled documents using the model in the machine learning system.
6. The method of claim 1, wherein the piece comprises a keyword, a phrase, a sentence, or a paragraph.
7. The method of claim 1, wherein the unsampled data comprises raw data, historical data, and new data.
8. The method of claim 1, wherein the documents comprise social media posts, electronic messages and speech-to-text messages.
9. The method of claim 1, wherein the one or more attributes comprise a spam type, a category type, an industry type, and a sentiment magnitude.
10. The method of claim 1, wherein the threshold comprises a number and a weighted value.
11. A hybrid system for analyzing sentiment-bearing documents, comprising:
a sampling component configured to sample documents from the database according to a predetermined selection criteria;
a tagging component configured to tag each sample document from the sample documents, each document having one or more pieces of text, the tagging component configured to present each piece in the document to a group of humans, each human scoring a different attribute associated with an inquiry from the hybrid system;
a score aggregation module configured to determine whether the responses from the group of humans for each attribute collectively meet a predetermined threshold, the score aggregation module configured to receive an aggregated score if there is a human agreement on a particular attribute; and
a machine learning module configured to receive an aggregated score as a feedback for adaptive adjustment of a model associated with the machine learning module for automatic tagging of unsampled documents.
12. The system of claim 11, further comprising a data processing module configured to tag unsampled documents using the model in the machine learning module.
13. The system of claim 11, wherein the aggregated score comprises a piece level score aggregation.
14. The system of claim 11, wherein the aggregated score comprises an item level score aggregation.
15. The system of claim 11, wherein the aggregated score comprises a stream level score aggregation.
16. The system of claim 11, wherein the piece comprises a keyword, a phrase, a sentence, or a paragraph.
17. The system of claim 11, wherein the unsampled data comprises raw data, historical data, and new data.
18. The system of claim 11, wherein the documents comprise social media posts, electronic messages and speech-to-text messages.
19. The system of claim 11, wherein the one or more attributes comprise a spam type, a category type, an industry type, and a sentiment magnitude.
20. The system of claim 11, wherein the threshold comprises a number and a weighted value.
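The piece level, item level, and stream level aggregations named in the claims can be illustrated with a short sketch. It assumes that an item is a single document made up of scored pieces, that a stream is a collection of items, and that each level is aggregated with an arithmetic mean; the claims do not define the aggregation function, and `item_level_scores` and `stream_level_score` are hypothetical names.

```python
from statistics import mean
from typing import Dict, List


def item_level_scores(piece_scores: Dict[str, List[float]]) -> Dict[str, float]:
    """Roll piece level scores up to one score per item (document).

    Assumption: an item's score is the mean of its piece scores.
    Items with no scored pieces are skipped.
    """
    return {
        item_id: mean(scores)
        for item_id, scores in piece_scores.items()
        if scores
    }


def stream_level_score(item_scores: Dict[str, float]) -> float:
    """Roll item level scores up to one score for the whole stream.

    Assumption: the stream score is the mean of its item scores.
    """
    if not item_scores:
        raise ValueError("stream has no scored items")
    return mean(item_scores.values())


# Piece level sentiment scores, keyed by the post they belong to.
pieces = {"post-1": [1.0, 3.0], "post-2": [4.0]}
items = item_level_scores(pieces)   # {"post-1": 2.0, "post-2": 4.0}
stream = stream_level_score(items)  # 3.0
```

A weighted mean, for example one that weights items by rater count, would fit the same structure if the threshold's weighted value were meant to carry through to aggregation.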
Specification