System and method for generating personal vocabulary from network data
First Claim
1. A method, comprising:
receiving data propagating in a network environment at a streaming database feeder;
ignoring Joint Photographic Experts Group (JPEG) documents in the data;
updating tags for each user in the network environment using a user-sub stream created for the user by the streaming database feeder, wherein each user-sub stream includes at least a portion of the data propagating in the network environment, wherein the tags are words and phrases that are associated with each user, wherein the data includes documents and, for at least a portion of the documents in the data, each original document is copied to create an anonymous document and a document that contains selected words within the data based on a whitelist, wherein the whitelist includes a plurality of designated words to be tagged, wherein documents that include data in a blacklist are dropped, and wherein the anonymous documents contain a concept field and some of the data in the anonymous documents is selected for the whitelist, and wherein the document that contains selected words does not include the concept field;
assigning a weight to the selected words based on at least one characteristic associated with the data;
associating the selected words to an individual, wherein the weight for a selected word is higher if the individual propagates the data; and
generating a resultant composite of the selected words that are tagged.
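The steps recited in this claim can be sketched in code. The following is a minimal illustration only, not the patented implementation. The `Document` fields, the weight values `SENDER_WEIGHT` and `RECEIVER_WEIGHT`, the sample whitelist and blacklist, and the function names are all assumptions introduced here; the claim itself states only that the weight for a selected word is higher if the individual propagates the data.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical values: the claim only requires that an individual who
# propagates the data receives a higher weight than one who does not.
SENDER_WEIGHT = 2.0
RECEIVER_WEIGHT = 1.0


@dataclass
class Document:
    sender: str
    recipients: list
    content_type: str
    text: str


def select_words(doc, whitelist, blacklist):
    """Return the whitelisted words in doc, or None if the document is skipped."""
    if doc.content_type == "image/jpeg":
        return None  # JPEG documents are ignored
    words = [w.strip(".,;:!?").lower() for w in doc.text.split()]
    if any(w in blacklist for w in words):
        return None  # documents that include blacklisted data are dropped
    return [w for w in words if w in whitelist]


def build_composite(docs, whitelist, blacklist):
    """Accumulate weighted word tags per user (the resultant composite)."""
    composite = defaultdict(lambda: defaultdict(float))
    for doc in docs:
        selected = select_words(doc, whitelist, blacklist)
        if not selected:
            continue
        for word in selected:
            # The propagating individual gets the higher weight.
            composite[doc.sender][word] += SENDER_WEIGHT
            for user in doc.recipients:
                composite[user][word] += RECEIVER_WEIGHT
    return {user: dict(tags) for user, tags in composite.items()}


docs = [
    Document("alice", ["bob"], "text/plain", "Optical switching and routing."),
    Document("bob", ["alice"], "image/jpeg", "optical"),
    Document("bob", ["alice"], "text/plain", "Confidential routing plans."),
]
whitelist = {"optical", "switching", "routing"}
blacklist = {"confidential"}
composite = build_composite(docs, whitelist, blacklist)
```

In this sketch only the first document contributes tags: the second is skipped as a JPEG and the third is dropped because it contains a blacklisted word. The anonymous-document copy and the concept field recited in the claim are not modeled here.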
Abstract
A method is provided in one example and includes receiving data propagating in a network environment, and identifying selected words within the data based on a whitelist. The whitelist includes a plurality of designated words to be tagged. The method further includes assigning a weight to the selected words based on at least one characteristic associated with the data, and associating the selected words to an individual. A resultant composite is generated for the selected words that are tagged. In more specific embodiments, the resultant composite is partitioned amongst a plurality of individuals associated with the data propagating in the network environment. A social graph can be generated that identifies a relationship between a selected individual and the plurality of individuals based on a plurality of words exchanged between the selected individual and the plurality of individuals.
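The social graph mentioned in the abstract can be illustrated with a short sketch. This is a hypothetical example under stated assumptions: the `(sender, recipient, words)` tuple layout, the function name `social_graph`, and the use of a simple count of exchanged tagged words as the relationship strength are all introduced here. The abstract says only that the relationship is based on words exchanged between the selected individual and the other individuals.

```python
from collections import Counter


def social_graph(selected, exchanges):
    """Map each contact of `selected` to the number of tagged words exchanged."""
    edges = Counter()
    for sender, recipient, words in exchanges:
        if sender == selected:
            edges[recipient] += len(words)
        elif recipient == selected:
            edges[sender] += len(words)
    return dict(edges)


# Each entry: (sender, recipient, tagged words carried by the message).
exchanges = [
    ("alice", "bob", ["optical", "routing"]),
    ("bob", "alice", ["switching"]),
    ("alice", "carol", ["routing"]),
    ("bob", "carol", ["optical"]),
]
graph = social_graph("alice", exchanges)
```

Exchanges that do not involve the selected individual, such as the message from bob to carol, do not contribute to that individual's graph.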
20 Claims
1. A method, comprising:
receiving data propagating in a network environment at a streaming database feeder;
ignoring Joint Photographic Experts Group (JPEG) documents in the data;
updating tags for each user in the network environment using a user-sub stream created for the user by the streaming database feeder, wherein each user-sub stream includes at least a portion of the data propagating in the network environment, wherein the tags are words and phrases that are associated with each user, wherein the data includes documents and, for at least a portion of the documents in the data, each original document is copied to create an anonymous document and a document that contains selected words within the data based on a whitelist, wherein the whitelist includes a plurality of designated words to be tagged, wherein documents that include data in a blacklist are dropped, and wherein the anonymous documents contain a concept field and some of the data in the anonymous documents is selected for the whitelist, and wherein the document that contains selected words does not include the concept field;
assigning a weight to the selected words based on at least one characteristic associated with the data;
associating the selected words to an individual, wherein the weight for a selected word is higher if the individual propagates the data; and
generating a resultant composite of the selected words that are tagged.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. Logic encoded in one or more non-transitory media that includes code for execution and when executed by a processor is operable to perform operations comprising:
receiving data propagating in a network environment at a streaming database feeder;
ignoring Joint Photographic Experts Group (JPEG) documents from the data;
updating tags for each user in the network environment using a user-sub stream created for the user by the streaming database feeder, wherein each user-sub stream includes at least a portion of the data propagating in the network environment, wherein the tags are words and phrases that are associated with each user, wherein the data includes documents and, for at least a portion of the documents in the data, each original document is copied to create an anonymous document and a document that contains selected words within the data based on a whitelist, wherein the whitelist includes a plurality of designated words to be tagged, wherein documents that include data in a blacklist are dropped, and wherein the anonymous documents contain a concept field and some of the data in the anonymous documents is selected for the whitelist, and wherein the document that contains selected words does not include the concept field;
assigning a weight to the selected words based on at least one characteristic associated with the data;
associating the selected words to an individual, wherein the weight for a selected word is higher if the individual propagates the data; and
generating a resultant composite of the selected words that are tagged.
Dependent claims: 11, 12, 13, 14, 15, 16
17. An apparatus, comprising:
a memory element configured to store data;
a processor operable to execute instructions associated with the data;
a network sensor configured to interface with the memory element and the processor, the network sensor being configured to:
receive data propagating in a network environment at a streaming database feeder;
ignore Joint Photographic Experts Group (JPEG) documents from the data;
update tags for each user in the network environment using a user-sub stream created for the user by the streaming database feeder, wherein each user-sub stream includes at least a portion of the data propagating in the network environment, wherein the tags are words and phrases that are associated with each user, wherein the data includes documents and, for at least a portion of the documents in the data, each original document is copied to create an anonymous document and a document that contains selected words within the data based on a whitelist, wherein the whitelist includes a plurality of designated words to be tagged, wherein documents that include data in a blacklist are dropped, and wherein the anonymous documents contain a concept field and some of the data in the anonymous documents is selected for the whitelist, and wherein the document that contains selected words does not include the concept field; and
a weighting module configured to:
assign a weight to the selected words based on at least one characteristic associated with the data, wherein the selected words are associated to an individual and the weight for a selected word is higher if the individual propagates the data, and wherein a resultant composite of the selected words that are tagged is generated.
Dependent claims: 18, 19, 20
Specification