Identifying and reporting on frequent sequences of events in usage data
First Claim
1. A computer-implemented method for identifying sequences of Web pages that are frequently visited in order non-consecutively during user browsing sessions, the Web pages of a sequence visited in order during a user browsing session when each Web page of the sequence is visited before a next Web page of the sequence is visited, the Web pages of a sequence visited non-consecutively during a user browsing session when at least one intervening Web page that is not part of the sequence is visited between the visits to the Web pages of the sequence, the method comprising:
- receiving a Web server log generated by a Web server serving a Website having Web pages, the Web server log reflecting usage of the Website by users;
identifying multiple user browsing sessions from the received Web server log, each identified user browsing session indicating a series of Web pages from the Website that were consecutively visited in order by a user; and
identifying sequences of the Web pages of the Website that were frequently visited in order non-consecutively in the identified user browsing sessions bydetermining a minimum threshold number of user browsing sessions;
determining multiple Web pages that are each visited during more of the identified user browsing sessions than the minimum threshold;
creating a tree data structure having a root node and having a child node of the root node for each of the determined Web pages, each of the determined Web pages represented by one of the children nodes, the children nodes forming a current lowest level of the tree data structure;
repeatedly expanding the tree data structure by adding a new lowest level of nodes that are children nodes to the nodes of a previous lowest level of the tree data structure, the added children nodes such that each of the determined Web pages has a node that represents that determined Web page that is added as a child node to each of the nodes of the previous level, each of the added children nodes having an associated sequence of Web pages consisting of the determined Web pages that are represented by the nodes in a path from the root node to that node;
determining the nodes of the tree data structure whose associated sequence of Web pages is visited in order non-consecutively during more of the identified user browsing sessions than the minimum threshold; and
removing the nodes of the tree data structure that are not among the determined nodes,such that after creation of the tree data structure is completed, the sequences of Web pages that are associated with the nodes remaining in the tree data structure are the identified sequences of the Web pages that were frequently visited in order non-consecutively during the identified user browsing sessions.
10 Assignments
0 Petitions
Accused Products
Abstract
A method, system, and computer-readable medium for identifying sequences of interaction events of interest that frequently occur is described. In particular, techniques are described for receiving multiple groups each having related interaction events in serial or sequential order, and for determining sequences of interaction events that frequently occur in the multiple groups. Reports can also be generated and provided that include information about the determined frequent sequences. The techniques can at times be used to provide a service to customers in which logs containing data about interaction events related to that customer (e.g., usage events for a provided service or of a provided Website) are received or obtained, in which frequent sequences in the log data are identified, and in which reports are provided to representatives of the customer about the frequent sequences (e.g., remotely over the Web based on interactive specifications).
87 Citations
43 Claims
-
1. A computer-implemented method for identifying sequences of Web pages that are frequently visited in order non-consecutively during user browsing sessions, the Web pages of a sequence visited in order during a user browsing session when each Web page of the sequence is visited before a next Web page of the sequence is visited, the Web pages of a sequence visited non-consecutively during a user browsing session when at least one intervening Web page that is not part of the sequence is visited between the visits to the Web pages of the sequence, the method comprising:
-
receiving a Web server log generated by a Web server serving a Website having Web pages, the Web server log reflecting usage of the Website by users; identifying multiple user browsing sessions from the received Web server log, each identified user browsing session indicating a series of Web pages from the Website that were consecutively visited in order by a user; and identifying sequences of the Web pages of the Website that were frequently visited in order non-consecutively in the identified user browsing sessions by determining a minimum threshold number of user browsing sessions; determining multiple Web pages that are each visited during more of the identified user browsing sessions than the minimum threshold; creating a tree data structure having a root node and having a child node of the root node for each of the determined Web pages, each of the determined Web pages represented by one of the children nodes, the children nodes forming a current lowest level of the tree data structure; repeatedly expanding the tree data structure by adding a new lowest level of nodes that are children nodes to the nodes of a previous lowest level of the tree data structure, the added children nodes such that each of the determined Web pages has a node that represents that determined Web page that is added as a child node to each of the nodes of the previous level, each of the added children nodes having an associated sequence of Web pages consisting of the determined Web pages that are represented by the nodes in a path from the root node to that node; determining the nodes of the tree data structure whose associated sequence of Web pages is visited in order non-consecutively during more of the identified user browsing sessions than the minimum threshold; and removing the nodes of the tree data structure that are not among the determined nodes, such that after creation of the tree data structure is completed, the sequences of Web pages that are associated with the nodes remaining in the tree data structure are the identified sequences of the Web pages that were frequently visited in order non-consecutively during the identified user browsing sessions. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
-
-
12. A computer-implemented method for identifying frequent sequences of interaction events, the method comprising:
-
from at least one interaction log that contains data reflecting interactions with at least one executing application, identifying multiple interaction sessions each indicating an ordered series of related interaction events; and identifying sequences of interaction events that frequently occur in the identified interaction sessions by determining multiple interaction events that are each present in at least a first threshold number of the identified interaction sessions; for each of the determined interaction events, identifying a sequence consisting of that interaction event as being a sequence of interaction events of length 1 that frequently occurs, the length 1 being a current longest length of identified sequences; and repeatedly identifying sequences of interaction events of increasing lengths that frequently occur by, for each of the identified sequences of a current longest length, generating candidate sequences of interaction events by adding one of the determined interaction events at the end of the identified sequence; identifying the newly generated candidate sequences that are present in at least a second threshold number of the identified interaction sessions as being sequences of interaction events that frequently occur, each of the identified newly generated sequences having a current longest length that is one greater than the previous longest length; if none of the newly generated candidate sequences are identified as sequences of interaction events that frequently occur, ending the repeated identification of the sequences; and
wherein the identifying of the sequences of interaction events of length 1 includes creating a data structure with a root element and multiple children elements of the root element, each child element representing one of the determined interaction events and being associated with the identified sequence of length 1 that consists of that one determined interaction event, the children elements forming a current lowest level of the data structure, and wherein the generating of candidate sequences of a length one greater than the current longest length includes expanding the data structure by adding a new lowest level of elements to the data structure such that the added elements are children elements to the elements of a previous lowest level of the data structure, each of the added children elements representing one of the determined interaction events and having an associated sequence of interaction events that is one of the generated candidate sequences. - View Dependent Claims (13, 14, 15, 16, 17, 18, 19, 20, 21)
-
-
22. A computer-readable medium containing instructions that when executed cause a computing device to identify frequent sequences of interaction events by:
-
identifying multiple interaction sessions each indicating an ordered series of related interaction events; and identifying sequences of interaction events that frequently occur in the identified interaction sessions by determining multiple interaction events that are each present in at least a first threshold number of the identified interaction sessions; for each of the determined interaction events, identifying a sequence consisting of that interaction event as being a sequence of interaction events of length 1 that frequently occurs, the length 1 being a current longest length of identified sequences; and repeatedly identifying sequences of interaction events of increasing lengths that frequently occur by, for each of the identified sequences of a current longest length, generating candidate sequences by adding one of the determined interaction events at the end of the identified sequence; identifying the newly generated candidate sequences that are present in at least a second threshold number of the identified interaction sessions as being sequences of interaction events that frequently occur, each of the identified newly generated sequences having a current longest length that is one greater than the previous longest length; if none of the newly generated candidate sequences are identified as sequences of interaction events that frequently occur, ending the repeated identification of the sequences; and
wherein the identifying of the sequences of interaction events of length 1 includes creating a data structure with a root element and multiple children elements of the root element, each child element representing one of the determined interaction events and being associated with the identified sequence of length 1 that consists of that one determined interaction event, the children elements forming a current lowest level of the data structure, and wherein the generating of candidate sequences of a length one greater than the current longest length includes expanding the data structure by adding a new lowest level of elements to the data structure such that the added elements are children elements to the elements of a previous lowest level of the data structure, each of the added children elements representing one of the determined interaction events and having an associated sequence of interaction events that is one of the generated candidate sequences. - View Dependent Claims (23, 24, 25)
-
-
26. A computing device for identifying frequent sequences of interaction events, comprising:
-
an interaction event supplier component capable of identifying multiple interaction sessions each indicating an ordered series of related interaction events from at least one interaction log; and a frequent sequence analyzer component capable of identifying sequences of interaction events that frequently occur in the identified interaction sessions by determining multiple interaction events that are each present in at least a first threshold number of the identified interaction sessions; for each of the determined interaction events, identifying a sequence consisting of that interaction event as being a sequence of interaction events of length 1 that frequently occurs, the length 1 being a current longest length of identified sequences; and repeatedly identifying sequences of interaction events of increasing lengths that frequently occur by, for each of the identified sequences of a current longest length, generating candidate sequences by adding one of the determined interaction events at the end of the identified sequence; identifying the newly generated candidate sequences that are present in at least a second threshold number of the identified interaction sessions as being sequences of interaction events that frequently occur, each of the identified newly generated sequences having a current longest length that is one greater than the previous longest length; if none of the newly generated candidate sequences are identified as sequences of interaction events that frequently occur, ending the repeated identification of the sequences; and
wherein the identifying of the sequences of interaction events of length 1 includes creating a data structure with a root element and multiple children elements of the root element, each child element representing one of the determined interaction events and being associated with the identified sequence of length 1 that consists of that one determined interaction event, the children elements forming a current lowest level of the data structure, and wherein the generating of candidate sequences of a length one greater than the current longest length includes expanding the data structure by adding a new lowest level of elements to the data structure such that the added elements are children elements to the elements of a previous lowest level of the data structure, each of the added children elements representing one of the determined interaction events and having an associated sequence of interaction events that is one of the generated candidate sequences. - View Dependent Claims (27, 28, 29)
-
-
30. A computer-implemented method for identifying frequent sequences of interaction events, the method comprising:
-
from at least one interaction log that contains data reflecting interactions with at least one executing application, identifying multiple interaction sessions each indicating an ordered series of related interaction events; and identifying sequences of interaction events that frequently occur in the identified interaction sessions by determining sequences of interaction events each having a specified number of multiple interaction events such that the determined sequences are each present in at least a first threshold number of the identified interaction sessions; generating candidate sequences of interaction events having varying lengths, each generated candidate sequence such that each subsequence of the candidate sequence that is of a length that is the specified number is one of the determined sequences, by adding at least one interaction event at the end of each of the determined sequences and by repeatedly generating additional sequences by adding at least one interaction event at the end of previously generated sequences; identifying the generated candidate sequences that are present in at least a second threshold number of the identified interaction sessions as being sequences of interaction events that frequently occur;
wherein the generating of the candidate sequences of interaction events includes creating a multi-level data structure having a root element and a plurality of other elements, a first of the levels of the data structure having elements that are children elements of the root element, each of the other levels having elements that are children elements of elements of a previous level, and each of the other elements representing one of the interaction events and being associated with one of the generated candidate sequences; andwherein each element other than the root element has an associated sequential path of elements between the root element and that element, a first element in each sequential path being a child element of the root element, each element in each sequential path other than the first element being a child element of the previous element in the sequential path, and wherein the sequence of interaction events that is associated with each node other than the root node consists of a sequence of the interaction events represented by the elements in the path associated with that node followed by the interaction event represented by that node. - View Dependent Claims (31, 32, 33, 34, 35, 36, 37)
-
-
38. A computer-readable medium containing instructions that when executed cause a computing device to identify frequent sequences of interaction events by:
-
from at least one interaction log that contains data reflecting interactions with at least one executing application, identifying multiple interaction sessions each indicating an ordered series of related interaction events; and identifying sequences of interaction events that frequently occur in the identified interaction sessions by determining sequences of interaction events each having a specified number of multiple interaction events such that the determined sequences are each present in at least a first threshold number of the identified interaction sessions; generating candidate sequences of interaction events of varying lengths, each generated candidate sequence such that each subsequence of the candidate sequence that is of a length that is the specified number is one of the determined sequences, by adding at least one interaction event at the end of each of the determined sequences and by repeatedly generating additional sequences by adding at least one interaction event at the end of previously generated sequences; identifying the generated candidate sequences that are present in at least a second threshold number of the identified interaction sessions as being sequences of interaction events that frequently occur;
wherein the generating of the candidate sequences of interaction events includes creating a multi-level data structure having a root element and a plurality of other elements, a first of the levels of the data structure having elements that are children elements of the root element, each of the other levels having elements that are children elements of elements of a previous level, and each of the other elements representing one of the interaction events and being associated with one of the generated candidate sequences; andwherein each element other than the root element has an associated sequential path of elements between the root element and that element, a first element in each sequential path being a child element of the root element, each element in each sequential path other than the first element being a child element of the previous element in the sequential path, and wherein the sequence of interaction events that is associated with each node other than the root node consists of a sequence of the interaction events represented by the elements in the path associated with that node followed by the interaction event represented by that node. - View Dependent Claims (39)
-
-
40. A computing device for identifying frequent sequences of interaction events, comprising:
-
an interaction event supplier component capable of identifying multiple interaction sessions each indicating an ordered series of related interaction events from at least one interaction log; and a frequent sequence analyzer component capable of identifying sequences of interaction events that frequently occur in the identified interaction sessions by determining sequences of interaction events each having a specified number of multiple interaction events such that the determined sequences are each present in at least a first threshold number of the identified interaction sessions; generating candidate sequences of interaction events of varying lengths, each generated candidate sequence such that each subsequence of the candidate sequence that is of a length that is the specified number is one of the determined sequences, by adding at least one interaction event at the end of each of the determined sequences and by repeatedly generating additional sequences by adding at least one interaction event at the end of previously generated sequences; and identifying the generated candidate sequences that are present in at least a second threshold number of the identified interaction sessions as being sequences of interaction events that frequently occur. - View Dependent Claims (41)
-
-
42. A data structure stored in a memory, to be accessed by an application program being executed on a data processing system for use in identifying frequent sequences of interaction events, the data structure containing a root element and a plurality of other elements, a first level of the data structure having elements that are children elements of the root element, each of multiple other levels having elements that are children elements of elements of a previous level, each of the plurality of other elements representing one of the interaction events and being associated with a candidate sequence, so that after the elements are scored while validating the data structure against groups of related interaction events, the candidate sequences that are associated with elements whose score is above a threshold can be selected as the identified frequent sequences of interaction events;
- wherein each element other than the root element has an associated sequential path of elements between the root element and that element, a first element in each sequential path being a child element of the root element, each element in each sequential path other than the first element being a child element of the previous element in the sequential path, and wherein the candidate sequence of interaction events that is associated with each node other than the root node consists of a sequence of the interaction events represented by the elements in the path associated with that node followed by the interaction event represented by that node; and
for each interaction event represented by a child element of the root element, a linked list data structure that is associated with that interaction event such that the linked list data structure includes entries for at least one of the other elements, so that as the data structure is validated against one of the groups of interaction events by selecting in order each interaction event in the group, if the selected interaction event is one of the interaction events having an associated linked list data structure, the score of each of the elements having entries in the associated linked list data structure can be incremented. - View Dependent Claims (43)
- wherein each element other than the root element has an associated sequential path of elements between the root element and that element, a first element in each sequential path being a child element of the root element, each element in each sequential path other than the first element being a child element of the previous element in the sequential path, and wherein the candidate sequence of interaction events that is associated with each node other than the root node consists of a sequence of the interaction events represented by the elements in the path associated with that node followed by the interaction event represented by that node; and
Specification