Real-time data mining
First Claim
1. A computer-implemented method comprising:
using a processor to mine user related content information, wherein the information is mined from an information repository;
filtering the mined user related content information from the information repository, wherein the filtering comprises identifying a subset of the mined user related content information comprising information related to a predetermined category;
identifying, using a cosine similarity measure, a plurality of words having a similarity to a seed set of words by analyzing the subset, using a plurality of analyzers, wherein each analyzer is configured to capture a plurality of representational variations, from the information repository, related to the seed set of words;
classifying, based on the analyzing, the plurality of representational variations, wherein the classifying comprises ranking the filtered user related content information;
combining the classified plurality of representational variations of the user related content information from each of the plurality of analyzers, wherein the combining comprises identifying a relevancy of a representational variation to a user intent based upon the ranking of the filtered user related content information; and
training a classifier for characterizing real-time intention content from information repositories using the combined plurality of representational variations.
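The identifying step names a cosine similarity measure but does not say what vectors it is computed over. The following is a minimal sketch, assuming each word is represented by counts of the other words it shares a post with; the function names (`cooccurrence_vectors`, `cosine`, `expand_seed`), the whitespace tokenizer, and the `threshold` parameter are all hypothetical and are not taken from the claim.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(posts):
    """Map each word to a Counter of the words it shares a post with."""
    vectors = defaultdict(Counter)
    for post in posts:
        tokens = post.lower().split()  # assumed tokenizer: whitespace split
        for i, word in enumerate(tokens):
            for j, other in enumerate(tokens):
                if i != j:
                    vectors[word][other] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expand_seed(posts, seed, threshold=0.5):
    """Return (word, score) pairs whose context resembles the seed set's."""
    vectors = cooccurrence_vectors(posts)
    seed_vector = Counter()
    for word in seed:
        seed_vector.update(vectors.get(word, {}))
    scored = [
        (word, cosine(vector, seed_vector))
        for word, vector in vectors.items()
        if word not in seed
    ]
    # Highest similarity first; ties broken alphabetically.
    return sorted(
        [(w, s) for w, s in scored if s >= threshold],
        key=lambda pair: (-pair[1], pair[0]),
    )
```

Under these assumptions, a word such as "film" that appears in nearly the same contexts as the seed word "movie" scores high, while a word from an unrelated post scores zero and is dropped.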
Abstract
A significant recent trend in the internet and mobile telephony has been the dominance of user-generated content. As such, advances in mobile technology have permitted users to upload content onto the internet, where sites provide an easily accessible and manageable medium for users to share their thoughts and form a portal for media-rich exchanges. It has been found that much of what is exchanged by users in such settings is context-sensitive, ranging from users' moods and opinions to communication about users' plans. Broadly contemplated herein, in accordance with at least one embodiment of the invention, is the employment of data mining in information repository settings to efficiently classify an information stream in real time and thereby discern user intent.
20 Claims
1. A computer-implemented method as recited under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
14. A system comprising:
at least a processor and a memory configured for:
mining user related content information, wherein the information is mined from an information repository;
filtering the mined user related content information from the information repository, wherein the filtering comprises identifying a subset of the mined user related content information comprising information related to a predetermined category;
identifying, using a cosine similarity measure, a plurality of words having a similarity to a seed set of words by analyzing the subset using a plurality of analyzers, wherein each analyzer is configured to capture a plurality of representational variations, from the information repository, related to the seed set of words;
classifying, based on the analyzing, the plurality of representational variations, wherein the classifying comprises ranking the filtered user related content information;
combining the classified plurality of representational variations of the user content information from each of the plurality of analyzers, wherein the combining comprises identifying a relevancy of a representational variation to a user intent based upon the ranking of the filtered user related content information; and
training a classifier for characterizing real-time intention content from information repositories using the combined plurality of representational variations.
Dependent claims: 15, 16, 17, 18, 19, 20
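The classifying, combining, and training steps recited in the independent claims can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration: the two analyzers (unigrams and bigrams), the ranking by how many seed-bearing posts contain a feature, the combination by summed reciprocal rank, and the perceptron standing in for the classifier. The claims do not specify any of these choices.

```python
from collections import Counter


def unigram_analyzer(text):
    """Hypothetical analyzer: single-word variations."""
    return text.lower().split()


def bigram_analyzer(text):
    """Hypothetical analyzer: two-word variations."""
    tokens = text.lower().split()
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]


ANALYZERS = (unigram_analyzer, bigram_analyzer)


def rank_features(posts, analyzer, seed):
    """Rank an analyzer's features by how many seed-bearing posts contain them."""
    counts = Counter()
    for post in posts:
        if seed & set(post.lower().split()):
            counts.update(set(analyzer(post)))
    # Most frequent first; ties broken alphabetically for determinism.
    return sorted(counts, key=lambda f: (-counts[f], f))


def combine(rankings):
    """Merge ranked lists by summed reciprocal rank (an assumed scheme)."""
    scores = Counter()
    for ranking in rankings:
        for position, feature in enumerate(ranking, start=1):
            scores[feature] += 1.0 / position
    return dict(scores)


def extract(text, features):
    """Collect the known features that occur in a text, across all analyzers."""
    found = set()
    for analyzer in ANALYZERS:
        found.update(analyzer(text))
    return found & features


def train(examples, features, epochs=10):
    """Train a perceptron on (text, label) pairs with labels +1 or -1."""
    features = set(features)
    weights = {f: 0.0 for f in features}
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            present = extract(text, features)
            score = bias + sum(weights[f] for f in present)
            predicted = 1 if score > 0 else -1
            if predicted != label:
                for f in present:
                    weights[f] += label
                bias += label
    return weights, bias


def predict(text, weights, bias):
    """Return +1 if the text is classified as carrying the intent, else -1."""
    present = extract(text, set(weights))
    return 1 if bias + sum(weights[f] for f in present) > 0 else -1
```

A possible use: build the combined feature set with `combine([rank_features(posts, a, seed) for a in ANALYZERS])`, pass it to `train` with a handful of labeled posts, and call `predict` on each incoming post as it arrives. A production system would likely weight features by their combined score and use a more robust learner; this sketch only shows how the pieces could fit together.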
Specification