Method and system for extracting user behavior features to personalize recommendations
First Claim
1. A method comprising:
obtaining clickstream data of a current user, the clickstream data including a plurality of clickstream data points at a website;
dividing the plurality of clickstream data points into one or more sessions, difference between click times of any two adjacent sessions sequentially sorted by their respective click times being less than or equal to a preset time threshold;
computing a click path correlation between the current user and other users using the clickstream data, the computing the click path correlation including:
generating a clickstream path tree in each session using a current webpage and its source webpage, the clickstream path tree including a node and a path, the node being the current webpage, and the path indicating a connection between the current webpage and its source webpage;
generating a weighted-directed graph by merging one or more clickstream path trees;
assigning a hierarchical weight to each merged node; and
assigning a proportional weight to each merged path;
selecting X other users whose click path correlations with the current user rank among the highest, X being a positive integer;
configuring a comprehensive weight in connection to each of preset tags of webpages visited by the selected X other users; and
computing a user correlation between the current user and the X other users based on the preset tags and comprehensive weights.
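The session-splitting and graph-building steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Click` record, the function names, and the 1800-second threshold are hypothetical. The claim does not define the hierarchical or proportional weights, so the sketch assumes a node weight of 1/depth summed over trees and a path weight equal to the path's share of all recorded transitions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Click:
    time: float             # click time, in seconds
    page: str               # current webpage
    source: Optional[str]   # source webpage; None for an entry page


def split_sessions(clicks, threshold):
    """Group time-sorted clicks; a gap above `threshold` opens a new session."""
    sessions = []
    for click in sorted(clicks, key=lambda c: c.time):
        if sessions and click.time - sessions[-1][-1].time <= threshold:
            sessions[-1].append(click)
        else:
            sessions.append([click])
    return sessions


def path_tree(session):
    """Return (depth of each node, list of source->page paths) for one session."""
    depth, paths = {}, []
    for click in session:
        if click.source in depth:
            parent_depth = depth[click.source]
            paths.append((click.source, click.page))
        else:
            parent_depth = 0  # source not seen in this session: page becomes a root
        depth.setdefault(click.page, parent_depth + 1)
    return depth, paths


def merge_trees(trees):
    """Merge path trees into node weights and path weights of one directed graph."""
    node_weight = defaultdict(float)
    path_count = defaultdict(int)
    for depth, paths in trees:
        for page, d in depth.items():
            node_weight[page] += 1.0 / d   # ASSUMED hierarchical weight: 1/depth
        for path in paths:
            path_count[path] += 1
    total = sum(path_count.values())
    # ASSUMED proportional weight: a path's share of all recorded transitions.
    path_weight = {p: n / total for p, n in path_count.items()} if total else {}
    return dict(node_weight), path_weight


# Hypothetical clickstream: two visits separated by more than the threshold.
clicks = [
    Click(0, "home", None),
    Click(10, "list", "home"),
    Click(20, "item", "list"),
    Click(5000, "home", None),
    Click(5010, "list", "home"),
]
sessions = split_sessions(clicks, threshold=1800)
nodes, paths = merge_trees(path_tree(s) for s in sessions)
```

Here `nodes` maps each merged webpage to its assumed hierarchical weight and `paths` maps each (source, page) pair to its assumed proportional weight; together they stand in for the weighted-directed graph of one user.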
Abstract
A method for extracting user features based on user behaviors. The method uses webpage clickstream data of a current user to compute a path correlation between the current user and other users, selects a number of other users whose path correlation with the current user ranks among the highest, and then configures weights in connection to preset tags of websites visited by the selected other users, and computes a user correlation between the current user and the selected other users based on the preset tags and the weights. The method constructs weighted-directed graphs of webpage click paths based on clickstream data, and converts computing user correlation to computing a similarity of weighted-directed graphs. The method further combines computing correlation of webpage tags to discover the user's clicking habits and personal preferences, and improve the accuracy and efficiency of user clustering.
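The two-stage comparison described in the abstract, graph similarity first and tag correlation second, might be sketched as follows. The abstract does not name a similarity measure, so cosine similarity over sparse weight maps is an assumption here, as are the function names, the user identifiers, and all sample weights; the "comprehensive weight" per tag is simply taken as given input.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two sparse {key: weight} maps (ASSUMED measure)."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_x(current_graph, other_graphs, x):
    """Select the X users whose path-weight maps are most similar to the current one."""
    return heapq.nlargest(
        x, other_graphs, key=lambda u: cosine(current_graph, other_graphs[u])
    )


def user_correlation(current_tags, other_tags, selected):
    """Correlate tag-weight maps, but only for the users selected in stage one."""
    return {u: cosine(current_tags, other_tags.get(u, {})) for u in selected}


# Hypothetical path weights keyed by (source page, current page).
current_graph = {("home", "list"): 2 / 3, ("list", "item"): 1 / 3}
other_graphs = {
    "u1": {("home", "list"): 2 / 3, ("list", "item"): 1 / 3},
    "u2": {("home", "cart"): 1.0},
    "u3": {("home", "list"): 1.0},
}
# Hypothetical comprehensive weights keyed by preset webpage tag.
current_tags = {"shoes": 3.0, "sale": 1.0}
other_tags = {"u1": {"shoes": 3.0, "sale": 1.0}, "u3": {"books": 2.0}}

selected = top_x(current_graph, other_graphs, x=2)
correlation = user_correlation(current_tags, other_tags, selected)
```

Restricting the tag comparison to the X selected users is what keeps the second stage cheap: tag weights are never compared for users whose click paths already look dissimilar.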
20 Claims
1. A method comprising:
obtaining clickstream data of a current user, the clickstream data including a plurality of clickstream data points at a website;
dividing the plurality of clickstream data points into one or more sessions, difference between click times of any two adjacent sessions sequentially sorted by their respective click times being less than or equal to a preset time threshold;
computing a click path correlation between the current user and other users using the clickstream data, the computing the click path correlation including:
generating a clickstream path tree in each session using a current webpage and its source webpage, the clickstream path tree including a node and a path, the node being the current webpage, and the path indicating a connection between the current webpage and its source webpage;
generating a weighted-directed graph by merging one or more clickstream path trees;
assigning a hierarchical weight to each merged node; and
assigning a proportional weight to each merged path;
selecting X other users whose click path correlations with the current user rank among the highest, X being a positive integer;
configuring a comprehensive weight in connection to each of preset tags of webpages visited by the selected X other users; and
computing a user correlation between the current user and the X other users based on the preset tags and comprehensive weights.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A method comprising:
obtaining user information including a user identifier;
selecting Z other users who have a user similarity ranked among the highest to the user, where Z is a positive integer; and
making a recommendation to the user based on information of the selected Z other users,
wherein the user similarity is generated by a process comprising:
obtaining clickstream data of the user, the clickstream data including a plurality of clickstream data points;
dividing the plurality of clickstream data points into one or more sessions, a difference between click times of any two adjacent sessions sequentially sorted by their respective click times being less than or equal to a preset time threshold;
computing a click path correlation between the user and other users using the clickstream data, the computing the click path correlation including:
generating a clickstream path tree in each session using a current webpage and its source webpage, the clickstream path tree including a node and a path, the node being the current webpage, and the path indicating a connection between the current webpage and its source webpage;
generating a weighted-directed graph by merging one or more clickstream path trees;
assigning a hierarchical weight to each merged node; and
assigning a proportional weight to each merged path;
selecting X other users whose click path correlations with the current user rank among the highest, where X is a positive integer;
configuring a comprehensive weight in connection to each of preset tags of webpages visited by the selected X other users; and
computing a user correlation between the current user and the X other users based on the preset tags and comprehensive weights.
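The recommendation step that claim 11 adds on top of the similarity computation can be sketched as below. The claim says only that the recommendation is based on information of the selected Z other users; scoring unseen items by the summed similarity of the neighbours who have them is an assumption made for illustration, and `recommend`, its parameters, and the sample data are all hypothetical.

```python
import heapq
from collections import defaultdict


def recommend(user_items, similarity, other_items, z, n):
    """Recommend up to n items from the Z most similar users.

    user_items:  items the user already has (excluded from the result)
    similarity:  {other_user: user similarity score}
    other_items: {other_user: set of items associated with that user}
    """
    neighbours = heapq.nlargest(z, similarity, key=similarity.get)
    scores = defaultdict(float)
    for other in neighbours:
        for item in other_items.get(other, ()):
            if item not in user_items:
                # ASSUMED scoring rule: sum the similarity of every neighbour
                # associated with the item.
                scores[item] += similarity[other]
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))[:n]


# Hypothetical similarity scores and item sets.
similarity = {"u1": 0.9, "u2": 0.1, "u3": 0.8}
other_items = {
    "u1": {"boots", "socks"},
    "u2": {"hat"},
    "u3": {"boots", "scarf"},
}
suggestions = recommend({"socks"}, similarity, other_items, z=2, n=2)
```

With Z = 2 the low-similarity user `u2` is never consulted, so its items cannot reach the result regardless of how the remaining scores fall.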
12. A computer-based apparatus comprising:
one or more processors; and
one or more memories stored thereon computer-executable instructions that when executed by the one or more processors, cause the one or more processors to perform acts comprising:
obtaining clickstream data of a current user, the clickstream data including a plurality of clickstream data points;
dividing the plurality of clickstream data points into one or more sessions, difference between click times of any two adjacent sessions sequentially sorted by their respective click times being less than or equal to a preset time threshold;
computing a click path correlation between the current user and other users using the clickstream data, the computing the click path correlation including:
generating a clickstream path tree in each session using a current webpage and its source webpage, the clickstream path tree including a node and a path, the node being the current webpage, and the path indicating a connection between the current webpage and its source webpage;
generating a weighted-directed graph by merging one or more clickstream path trees;
assigning a hierarchical weight to each merged node; and
assigning a proportional weight to each merged path;
selecting X other users whose click path correlation with the current user ranks among the highest, where X is a positive integer;
configuring a comprehensive weight in connection to each of preset tags of webpages visited by the selected X other users; and
computing a user correlation between the current user and the X other users based on the preset tags and comprehensive weights.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20
Specification