System and method for collecting and processing information of an internet user via IP-web correlation
First Claim
1. A method for collecting and processing information of a target comprising a user of a communication network, comprising:
obtaining a first identifier of the target, wherein the first identifier comprises a first domain and a first handle;
identifying a first Internet site based on the first domain;
accessing, in the first Internet site and based on the first handle, a first public webpage associated with the target;
selecting a first parsing rule from a plurality of parsing rules that is associated with the first Internet site;
extracting, by a central processing unit (CPU) of a computer, a second identifier from the first public webpage using the first parsing rule;
determining, by the CPU and in response to extracting the second identifier, a similarity measure by comparing the second identifier to the first identifier using a pre-determined algorithm;
determining, in response to the similarity measure exceeding a pre-determined threshold and without based on any prior association between the first identifier and the second identifier, that the first identifier and the second identifier identify the same person as the user of the communication network based on a pre-determined criterion; and
collecting information of the target based on the second identifier for including in target data of the target.
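The comparison and threshold steps of the claim can be sketched as follows. This is only an illustration under stated assumptions: the claim does not name the "pre-determined algorithm" or the "pre-determined threshold", so the sketch substitutes Python's `difflib.SequenceMatcher` ratio and a hypothetical threshold of 0.8. It also assumes identifiers take the form `handle@domain`, which the claim does not require. The function names are hypothetical.

```python
from difflib import SequenceMatcher


def split_identifier(identifier: str) -> tuple[str, str]:
    """Split an identifier into (handle, domain).

    Assumption: identifiers look like 'handle@domain'; the claim only says
    an identifier comprises a domain and a handle.
    """
    handle, _, domain = identifier.partition("@")
    if not handle or not domain:
        raise ValueError(f"not a handle@domain identifier: {identifier!r}")
    return handle, domain


def similarity(first: str, second: str) -> float:
    """Similarity measure between the handles of two identifiers, in [0, 1].

    Assumption: SequenceMatcher stands in for the unspecified
    'pre-determined algorithm'.
    """
    first_handle, _ = split_identifier(first)
    second_handle, _ = split_identifier(second)
    matcher = SequenceMatcher(None, first_handle.lower(), second_handle.lower())
    return matcher.ratio()


def same_person(first: str, second: str, threshold: float = 0.8) -> bool:
    """True when the similarity measure exceeds the (hypothetical) threshold."""
    return similarity(first, second) > threshold
```

For example, `same_person("john.doe@example.com", "john.doe@example.org")` compares two identical handles on different domains and returns `True`, while two unrelated handles fall below the threshold.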
Abstract
A method is provided for collecting and processing information of a target who is a user of a communication network. The method includes obtaining a first identifier of the target, accessing, based on a handle of the first identifier, a first public webpage associated with the target in a first Internet site identified based on a domain of the first identifier, extracting content of the first public webpage for including in target data of the target, obtaining a third identifier of the target, intercepting a document associated with the target from a private portion of communication network traffic identified based on a domain of the third identifier, extracting content of the document for including in the target data, determining a second identifier by searching the target data, associating the second identifier with the target based on a pre-determined criterion, and collecting information of the target based on the second identifier.
34 Claims
1. A method for collecting and processing information of a target comprising a user of a communication network, comprising:
obtaining a first identifier of the target, wherein the first identifier comprises a first domain and a first handle;
identifying a first Internet site based on the first domain;
accessing, in the first Internet site and based on the first handle, a first public webpage associated with the target;
selecting a first parsing rule from a plurality of parsing rules that is associated with the first Internet site;
extracting, by a central processing unit (CPU) of a computer, a second identifier from the first public webpage using the first parsing rule;
determining, by the CPU and in response to extracting the second identifier, a similarity measure by comparing the second identifier to the first identifier using a pre-determined algorithm;
determining, in response to the similarity measure exceeding a pre-determined threshold and without based on any prior association between the first identifier and the second identifier, that the first identifier and the second identifier identify the same person as the user of the communication network based on a pre-determined criterion; and
collecting information of the target based on the second identifier for including in target data of the target.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
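The step of selecting a parsing rule associated with a site and using it to extract a second identifier could look roughly like the sketch below. Everything concrete here is an assumption for illustration: the claim does not say how parsing rules are represented, so the sketch uses one regular expression per site, and the site names `social.example` and `forum.example`, the page markup, and the function names are all hypothetical. The sketch works on page text that has already been retrieved.

```python
import re

# Hypothetical rule table: one regular expression per site domain.
# Each pattern captures an identifier from that site's page layout.
PARSING_RULES: dict[str, re.Pattern[str]] = {
    "social.example": re.compile(r'<span class="email">([^<]+)</span>'),
    "forum.example": re.compile(r"Contact:\s*(\S+@\S+)"),
}


def select_parsing_rule(domain: str) -> re.Pattern[str]:
    """Pick the parsing rule associated with the given site domain."""
    try:
        return PARSING_RULES[domain]
    except KeyError:
        raise LookupError(f"no parsing rule registered for {domain!r}") from None


def extract_identifiers(domain: str, page_text: str) -> list[str]:
    """Apply the site's parsing rule and return every identifier it captures."""
    rule = select_parsing_rule(domain)
    return rule.findall(page_text)
```

Keying the rules by domain mirrors the claim's ordering: the domain of the first identifier picks the site, and the site picks the rule.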
17. A non-transitory computer readable medium, embodying instructions when executed by the computer to collect and process information of a target comprising a user of a communication network, the instructions comprising functionality for:
obtaining a first identifier of the target, wherein the first identifier comprises a first domain and a first handle;
identifying a private portion of communication network traffic based on the first domain;
intercepting, from the private portion of communication network traffic and based on the first handle, a document associated with the target;
selecting a first parsing rule from a plurality of parsing rules that is associated with the private portion of the communication network traffic;
extracting a second identifier from the document using the first parsing rule;
determining, in response to extracting the second identifier, a similarity measure by comparing the second identifier to the first identifier using a pre-determined algorithm;
determining, in response to the similarity measure exceeding a pre-determined threshold and without based on any prior association between the first identifier and the second identifier, that the first identifier and the second identifier identify the same person as the user of the communication network based on a pre-determined criterion; and
collecting information of the target based on the second identifier.
Dependent claims: 18
19. A system for collecting and processing information of a target comprising a user of a communication network, comprising:
a repository storing a target profile of the target and target data of the target, wherein the target profile comprises a list of identifiers associated with the target, wherein the list of identifiers associated with the target comprises a list of identifiers belonging to the target and a list of identifiers belonging to associates of the target;
a target data population engine comprising:
a web crawler configured to extract contents of Internet web pages based on the identifiers associated with the target for including in the target data of the target, wherein the contents of the Internet web pages are extracted using a plurality of parsing rules corresponding to the Internet web pages;
a target data analysis engine comprising:
an identifier retrieval engine configured to associate an identifier of the identifiers with the target as belonging to the target; and
an association retrieval engine configured to associate another identifier of the identifiers with the target as belonging to an associate of the target;
a processor; and
memory storing instructions when executed by the processor comprising functionalities for:
obtaining a first identifier of the target, wherein the first identifier comprises a first domain and a first handle;
identifying a first Internet site based on the first domain;
accessing, in the first Internet site and based on the first handle, a first public webpage associated with the target;
selecting a first parsing rule from a plurality of parsing rules that is associated with the first Internet site;
extracting a second identifier from the first public webpage using the first parsing rule;
determining, in response to extracting the second identifier, a similarity measure by comparing the second identifier to the first identifier using a pre-determined algorithm;
determining, in response to the similarity measure exceeding a pre-determined threshold and without based on any prior association between the first identifier and the second identifier, that the first identifier and the second identifier identify the same person as the user of the communication network based on a pre-determined criterion; and
collecting information of the target based on the second identifier for including in target data of the target.
Dependent claims: 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34
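The target profile recited in the system claim, with one list of identifiers belonging to the target and another belonging to associates of the target, might be modeled as in the sketch below. The class name, field names, and methods are hypothetical and chosen only for illustration; the claim does not specify a storage layout.

```python
from dataclasses import dataclass, field


@dataclass
class TargetProfile:
    """Hypothetical in-memory form of the claimed target profile."""

    # Identifiers determined to belong to the target.
    own_identifiers: list[str] = field(default_factory=list)
    # Identifiers determined to belong to associates of the target.
    associate_identifiers: list[str] = field(default_factory=list)

    def add_own(self, identifier: str) -> None:
        """Record an identifier as belonging to the target (no duplicates)."""
        if identifier not in self.own_identifiers:
            self.own_identifiers.append(identifier)

    def add_associate(self, identifier: str) -> None:
        """Record an identifier as belonging to an associate (no duplicates)."""
        if identifier not in self.associate_identifiers:
            self.associate_identifiers.append(identifier)

    def all_identifiers(self) -> list[str]:
        """Full list of identifiers associated with the target."""
        return self.own_identifiers + self.associate_identifiers
```

Keeping the two lists separate reflects the claim's distinction between the identifier retrieval engine and the association retrieval engine.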
Specification