Method and system for collecting and analyzing time-series data

US 9,037,698 B1
Filed: 03/14/2006
Issued: 05/19/2015
Est. Priority Date: 03/14/2006
Status: Expired due to Fees

Abstract

A computer-implemented data processing method comprises receiving information from a user computer concerning a desired output to be generated, adding the information concerning the desired output to be generated to a data structure, and adding additional information to the data structure concerning intermediate outputs to be generated. The information concerning the desired output to be generated is received at a host computer. The host computer is one of a plurality of host computers configured to collect and analyze data received from a plurality of source computers. The data structure represents a list of outputs to be generated by the plurality of host computers. The intermediate outputs are precursor inputs needed to generate the desired output. The additional information is added to the data structure based on the information received from the user computer and based on stored information.
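The expansion step described in the abstract can be illustrated with a minimal sketch. It assumes the "stored information" is a simple mapping from each derived output to its precursor inputs; the mapping, the output names, and the function `add_desired_output` are hypothetical and do not appear in the patent.

```python
from collections import deque

# Hypothetical stored information: the precursor inputs each derived output needs.
# Names that are not keys of this mapping are treated as raw source data.
STORED_PRECURSORS = {
    "error_rate_5min": ["error_count_1min", "request_count_1min"],
    "error_count_1min": ["raw_errors"],
    "request_count_1min": ["raw_requests"],
}

def add_desired_output(output_list, desired, stored=STORED_PRECURSORS):
    """Append the desired output, then every intermediate output it depends on."""
    queue = deque([desired])
    while queue:
        name = queue.popleft()
        if name in output_list:
            continue  # already listed, nothing to add
        output_list.append(name)
        # Queue only derived precursors; raw source data is not an output to generate.
        queue.extend(p for p in stored.get(name, []) if p in stored)
    return output_list

outputs = add_desired_output([], "error_rate_5min")
print(outputs)
```

Here the user asks only for `error_rate_5min`; the two one-minute counts are added as intermediate outputs because the stored mapping lists them as precursors.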

43 Claims

1. A computer-implemented data collection and analysis method comprising:
- receiving information from a user computer concerning at least one of a desired data analysis datapoint or a desired data index datapoint, wherein a datapoint comprises a datakey which provides information to allow the datapoint to be properly routed, a data value which provides data to be processed and a data interval which provides a time interval associated with the datapoint, the information being received at a host computer, the host computer being one of a plurality of host computers configured to collect and analyze data received from a plurality of source computers;
  
  adding the information concerning at least one of the desired data analysis datapoint or the desired data index datapoint to a data structure, the data structure representing a list of at least one of data analysis datapoints or data index datapoints by the plurality of host computers;
  
  adding additional information to the data structure concerning at least one of intermediate data analysis datapoints or intermediate data index datapoints, at least one of the intermediate data analysis datapoints or the intermediate data index datapoints being precursor inputs needed to generate at least one of the desired data analysis datapoints or the desired data index datapoints, the additional information being added based on the information received from the user computer and based on stored information; and
  
  maintaining at least partially consistent copies of the data structure across the plurality of host computers, wherein the data structure stores information indicating how routing of at least one of the intermediate data analysis datapoints or the intermediate data index datapoints should be performed between the plurality of host computers and wherein the routing of at least one of the intermediate data analysis datapoints or the intermediate data index datapoints is performed by a datapoint router associated with each copy of the data structure.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
  - 2. A method as defined in claim 1, wherein the information is added to the data structure in the form of a calculation descriptor, and wherein the additional information is added to the data structure in the form of a plurality of additional calculation descriptors.
  - 3. A method as defined in claim 2, wherein the list of at least one of data analysis datapoints or data index datapoints is represented as a list of calculation descriptors.
  - 4. A method as defined in claim 2, further comprising initiating processes on the plurality of host computers, wherein individual processes are initiated responsive to the addition of a calculation descriptor in the data structure.
  - 5. A method as defined in claim 1, further comprising providing users with a library of generic pre-programmed process functions, the process functions being selectable to specify information concerning at least one of the desired data analysis datapoint or the desired data index datapoint.
  - 6. A method as defined in claim 5, further comprising receiving a user selection of one of the preprogrammed process functions, and prompting the user to provide additional information useable to customize the process function for use in producing at least one of the desired data analysis datapoint or the desired data index datapoint.
  - 7. A method as defined in claim 6, wherein the additional information comprises information concerning the precursor inputs to be used by the process function.
  - 8. A method as defined in claim 6, wherein the additional information comprises information concerning the type of data analysis datapoint or data index datapoint to be generated.
  - 9. A method as defined in claim 5, further comprising providing users with the ability to program custom process functions by modifying the pre-programmed process functions.
  - 10. A method as defined in claim 5, further comprising providing users with the ability to program custom process functions by programming new process functions without reference to the pre-programmed process functions.
  - 11. A method as defined in claim 1, further comprising providing a graphical user interface accessible to users to specify at least one of the desired data analysis datapoints or the desired data index datapoints.
  - 12. A method as defined in claim 11, wherein the information is added to the data structure in the form of a calculation descriptor, and wherein the graphical user interface is configured to provide an interface to receive information useable to create calculation descriptors.
  - 13. A method as defined in claim 12, wherein the interface is a web-based interface.
  - 14. A method as defined in claim 1, further comprising parallelizing the generation of at least one of the desired data analysis datapoint or the desired data index datapoint specified by the information received from the user computer, including decomposing the generation of at least one of the desired data analysis datapoint or the desired data index datapoint into constituent components and allocating the constituent components across the plurality of host computers.
  - 15. A method as defined in claim 13, wherein a plurality of partitions are allocated across the plurality of hosts, and wherein parallelizing includes performing computations on data elements of the data received from the plurality of data source computers, the computations generating partition numbers for routing the data to one or more of the plurality of partitions.
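As an illustration of the datapoint layout in claim 1 and the partition-number computation in claim 15, the following is a minimal sketch. It assumes the computation is a hash of the datakey reduced modulo the number of partitions; the claims do not specify the computation, and the `Datapoint` class, field types, and function names are hypothetical.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class Datapoint:
    datakey: str     # information used to route the datapoint
    value: float     # data to be processed
    interval: tuple  # (start, end) of the associated time interval, in seconds

def partition_number(datakey: str, num_partitions: int) -> int:
    """Assumed computation: a stable hash of the datakey, modulo the partition count."""
    digest = hashlib.sha256(datakey.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

def route(datapoints, num_partitions):
    """Group datapoints by the partition number computed from each datakey."""
    partitions = {n: [] for n in range(num_partitions)}
    for dp in datapoints:
        partitions[partition_number(dp.datakey, num_partitions)].append(dp)
    return partitions

dps = [
    Datapoint("host1.cpu", 0.5, (0, 60)),
    Datapoint("host1.cpu", 0.7, (60, 120)),
    Datapoint("host2.cpu", 0.1, (0, 60)),
]
parts = route(dps, 4)
```

A cryptographic hash is used only because it is stable across processes; any deterministic function of the data elements would serve the same illustrative purpose.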

16. A system for collecting and analyzing time-series data from a plurality of data source computers, comprising:
- a data structure comprising a list of calculation descriptors inserted by a plurality of user computers, the calculation descriptors describing at least one of desired data analysis datapoints or desired data index datapoints of the system, wherein a datapoint comprises a datakey which provides information to allow the datapoint to be properly routed, a data value which provides data to be processed and a data interval which provides a time interval associated with the datapoint; and
  
  a plurality of computer-implemented nodes having access to the data structure, the data structure comprising a plurality of copies maintained at each of the plurality of computer-implemented nodes, consistency of the copies of the data structure being maintained through a gossip protocol, the plurality of computer-implemented nodes comprising a plurality of processes, the plurality of processes being configured to generate at least one of the desired data analysis datapoints or the desired data index datapoints, and the processes being configured to insert additional calculation descriptors in the data structure to prompt the generation of additional at least one of data analysis datapoints or desired data index datapoints needed to generate at least one of the desired data analysis datapoints or the desired data index datapoints described by the calculation descriptors inserted by the plurality of user computers.
- Dependent Claims (17, 18, 19, 20, 21, 22, 23, 24)
  - 17. A system as defined in claim 16, wherein the processes comprise time-series processes, wherein at least one of the desired data analysis datapoints or the desired data index datapoints is generated by the time-series processes, and wherein the time-series processes are created based on the calculation descriptors listed in the data structure.
  - 18. A system as defined in claim 17, wherein the time-series processes have a one-to-one correspondence with the calculation descriptors.
  - 19. A system as defined in claim 18, wherein the additional calculation descriptors are inserted by the time-series processes, and wherein the insertion of the additional calculation descriptors prompts corresponding additional time-series processes to be created.
  - 20. A system as defined in claim 19, wherein the additional time-series processes are created until time-series processes exist to generate each of at least one of the additional data analysis datapoints or additional data index datapoints needed to generate at least one of the desired data analysis datapoints or the desired data index datapoints.
  - 21. A system as defined in claim 16, wherein a copy of the data structure is maintained at each of the plurality of nodes.
  - 22. A system as defined in claim 21, wherein consistency of the copies of the data structure is at least partially maintained.
  - 23. A system as defined in claim 16, wherein the number of data source computers is in excess of one thousand.
  - 24. A system as defined in claim 16, wherein the number of data source computers is in excess of five thousand.
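Claim 16 states that consistency of the copies is maintained through a gossip protocol but does not describe the protocol. The sketch below shows one common variant as an assumption: each node repeatedly exchanges its copy with a randomly chosen peer and both keep the union. The node names, descriptor names, and helper functions are hypothetical.

```python
import random

def gossip_round(copies, rng):
    """Each node exchanges its copy with one random peer; both keep the union."""
    nodes = list(copies)
    for node in nodes:
        peer = rng.choice([n for n in nodes if n != node])
        merged = copies[node] | copies[peer]
        copies[node] = merged
        copies[peer] = set(merged)  # separate object so copies stay independent

def converged(copies):
    """True when every node holds the same set of calculation descriptors."""
    views = list(copies.values())
    return all(view == views[0] for view in views)

rng = random.Random(0)
copies = {
    "node-a": {"desc-1"},
    "node-b": {"desc-2"},
    "node-c": set(),
    "node-d": {"desc-3"},
}
rounds = 0
while not converged(copies) and rounds < 1000:
    gossip_round(copies, rng)
    rounds += 1
```

Between rounds the copies are only partially consistent, which matches the "at least partially maintained" wording of claim 22; because merges never discard a descriptor, the copies can only agree once every node holds all three.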

25. A system for collecting and analyzing time-series data from a plurality of data source computers, comprising:
- a calculation table comprising a list of calculation descriptors inserted by a plurality of user computers, the calculation descriptors describing at least one of desired data analysis datapoints or desired data index datapoints of the system, wherein a datapoint comprises a datakey which provides information to allow the datapoint to be properly routed, a data value which provides data to be processed and a data interval;
  
  a plurality of computer-implemented nodes having access to the calculation table, wherein the calculation table comprises a plurality of copies maintained at the plurality of computer-implemented nodes and wherein consistency of the copies of the calculation table is maintained through a gossip protocol; and
  
  a plurality of computer-implemented partitions owned by the plurality of computer-implemented nodes, the plurality of computer-implemented partitions comprising time-series processes that process the time-series data;
  
  wherein the time-series processes comprise first time-series processes configured to generate at least one of the desired data analysis datapoints or desired data index datapoints described by the calculation descriptors inserted by the plurality of user computers;
  
  wherein the first time-series processes have precursor inputs;
  
  wherein the first time-series processes are configured to insert additional calculation descriptors in the calculation table; and
  
  wherein the time-series processes comprise additional time-series processes created in response to the insertion of the additional calculation descriptors in the calculation table by the first time-series processes, the additional time-series processes being configured to generate the precursor inputs for the first time-series processes based on the additional calculation descriptors.
- Dependent Claims (26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37)
  - 26. A system as defined in claim 25, wherein the time-series processes have a one-to-one correspondence with the calculation descriptors.
  - 27. A system as defined in claim 25, wherein the calculation table is accessible to the plurality of nodes by way of the plurality of copies, such that each node accesses one or more local copies of the calculation table.
  - 28. A system as defined in claim 25 wherein, for each of the plurality of calculation descriptors, information is stored indicating what data is needed to produce at least one of the desired data analysis datapoint or data index datapoint specified by the calculation descriptor.
  - 29. A system as defined in claim 25, wherein the calculation table stores information indicating how data routing should be performed between the plurality of partitions.
  - 30. A system as defined in claim 29, further comprising a datapoint router associated with the calculation table.
  - 31. A system as defined in claim 25, wherein users may specify processing to be performed by the plurality of nodes by providing information useable to generate calculation descriptors for insertion into the calculation table.
  - 32. A system as defined in claim 25, further comprising a stored library of generic pre-programmed process functions selectable by users to specify information for creation of the calculation descriptors.
  - 33. A system as defined in claim 32, further comprising interface logic, and wherein the interface logic is configured to prompt the user to provide additional information used to customize the time-series process function for use in generating at least one of the desired data analysis datapoint or data index datapoint, the user being prompted upon selecting one of the preprogrammed process functions.
  - 34. A system as defined in claim 33, wherein the additional information comprises information concerning data inputs to be used by the process function.
  - 35. A system as defined in claim 33, wherein the additional information comprises information concerning the type of data analysis datapoint or data index datapoint to be generated.
  - 36. A system as defined in claim 32, wherein the interface logic provides the user with the ability to program custom process functions by modifying existing process functions in the library of generic pre-programmed process functions.
  - 37. A system as defined in claim 25, further comprising logic configured to provide a graphical user interface, the graphical user interface being configured to receive information useable to create calculation descriptors.
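The interplay in claims 25 and 26 between calculation descriptors and time-series processes can be sketched as follows. This is a single-node illustration under stated assumptions: descriptors are plain strings, precursor information is a dictionary, and the class and attribute names are hypothetical rather than taken from the patent.

```python
class CalculationTable:
    """Holds calculation descriptors; each insertion creates one process."""

    def __init__(self, precursor_info):
        self.precursor_info = precursor_info  # descriptor -> precursor descriptors
        self.descriptors = []
        self.processes = {}

    def insert(self, descriptor):
        if descriptor in self.descriptors:
            return  # already present, so its process already exists
        self.descriptors.append(descriptor)
        # One-to-one correspondence: exactly one process per descriptor.
        self.processes[descriptor] = TimeSeriesProcess(descriptor, self)

class TimeSeriesProcess:
    """On creation, inserts descriptors for the precursor inputs it needs."""

    def __init__(self, descriptor, table):
        self.descriptor = descriptor
        self.precursors = table.precursor_info.get(descriptor, [])
        for precursor in self.precursors:
            table.insert(precursor)  # prompts creation of an additional process

table = CalculationTable({"avg_latency_1h": ["sum_latency_1min", "count_1min"]})
table.insert("avg_latency_1h")  # descriptor inserted on behalf of a user
print(table.descriptors)
```

Only one descriptor is inserted by the user; the first process adds the other two, and each addition in turn creates the process that generates that precursor input.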

38. A non-transitory computer readable medium having computer executable instructions that direct a computing system comprising a plurality of host computers to:
- receive information from a user computer concerning at least one of a desired data analysis datapoint or a desired data index datapoint, wherein a datapoint comprises a datakey which provides information to allow the datapoint to be properly routed, a data value which provides data to be processed and a data interval, the information being received at a host computer, the host computer being one of the plurality of host computers configured to collect and analyze data received from a plurality of source computers;
  
  add the information concerning at least one of the desired data analysis datapoint or the desired data index datapoint to a data structure, the data structure representing a list of at least one of data analysis datapoints or data index datapoints by the plurality of host computers;
  
  add additional information to the data structure concerning at least one of intermediate data analysis datapoints or intermediate data index datapoints, at least one of the intermediate data analysis datapoints or the intermediate data index datapoints being precursor inputs needed to generate at least one of the desired data analysis datapoints or the desired data index datapoints, the additional information being added based on the information received from the user computer and based on stored information; and
  
  maintain at least partially consistent copies of the data structure across the plurality of host computers, wherein the data structure stores information indicating how routing of at least one of the intermediate data analysis datapoints or the intermediate data index datapoints should be performed between the plurality of host computers and wherein the routing of at least one of the intermediate data analysis datapoints or the intermediate data index datapoints is performed by a datapoint router associated with each copy of the data structure.
- Dependent Claims (39, 40, 41, 42)
  - 39. A non-transitory computer readable medium as defined in claim 38, wherein the information is added to the data structure in the form of a calculation descriptor, and wherein the additional information is added to the data structure in the form of a plurality of additional calculation descriptors.
  - 40. A non-transitory computer readable medium as defined in claim 39, wherein the list of at least one of data analysis datapoints or data index datapoints is represented as a list of calculation descriptors.
  - 41. A non-transitory computer readable medium as defined in claim 39, wherein the computer executable instructions further direct the computing system to initiate processes on the plurality of host computers, wherein individual processes are initiated responsive to the addition of a calculation descriptor in the data structure.
  - 42. A non-transitory computer readable medium as defined in claim 38, wherein the computer executable instructions further direct the computing system to provide users with a library of generic pre-programmed process functions, the process functions being selectable to specify information concerning at least one of the desired data analysis datapoint or desired data index datapoint.

43. A system for collecting and analyzing time-series data from a plurality of data source computers, comprising:
- a calculation table comprising a list of calculation descriptors inserted by a plurality of user computers, the calculation descriptors describing at least one of desired data analysis datapoints or desired data index datapoints of the system, wherein a datapoint comprises a datakey which provides information to allow the datapoint to be properly routed, a data value which provides data to be processed and a data interval;
  
  a plurality of computer-implemented nodes having access to the calculation table;
  
  a plurality of computer-implemented partitions owned by the plurality of computer-implemented nodes, the plurality of computer-implemented partitions comprising time-series processes that process the time-series data, wherein the calculation table stores information indicating how data routing should be performed between the plurality of computer-implemented partitions; and
  
  a datapoint router associated with the calculation table;
  
  wherein the time-series processes comprise first time-series processes configured to generate at least one of the desired data analysis datapoints or desired data index datapoints described by the calculation descriptors inserted by the plurality of user computers;
  
  wherein the first time-series processes have precursor inputs;
  
  wherein the first time-series processes are configured to insert additional calculation descriptors in the calculation table; and
  
  wherein the time-series processes comprise additional time-series processes created in response to the insertion of the additional calculation descriptors in the calculation table by the first time-series processes, the additional time-series processes being configured to generate the precursor inputs for the first time-series processes based on the additional calculation descriptors.
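Claim 43 pairs the calculation table with a datapoint router and says the table stores how data should be routed between partitions. One way to picture this, as a sketch only, is a table that maps each datakey to the partitions hosting the processes that consume it. The dictionary layout, the `DatapointRouter` class, and the in-memory inboxes are assumptions made for illustration.

```python
from collections import defaultdict

class DatapointRouter:
    """Forwards datapoints to partitions listed in the table's routing information."""

    def __init__(self, routing_info):
        # Assumed layout: datakey -> partitions whose processes consume that datakey.
        self.routing_info = routing_info
        self.inboxes = defaultdict(list)  # partition number -> delivered datapoints

    def route(self, datapoint):
        destinations = self.routing_info.get(datapoint["datakey"], [])
        for partition in destinations:
            self.inboxes[partition].append(datapoint)
        return destinations

routing_info = {"sum_latency_1min": [2], "count_1min": [2, 5]}
router = DatapointRouter(routing_info)
sent = router.route({"datakey": "count_1min", "value": 42.0, "interval": (0, 60)})
unknown = router.route({"datakey": "unlisted", "value": 1.0, "interval": (0, 60)})
```

A datapoint whose datakey has no entry is delivered nowhere in this sketch; how a real system would treat such datapoints is not addressed by the claim.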

Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors: Nordstrom, Paul G.; Thompson, Aaron C.
Primary Examiner(s): Snyder, Steven
Application Number: US11/375,067
Time in Patent Office: 3,353 Days
Field of Search: 710/30
US Class Current: 709/224
CPC Class Codes:
G06F 11/00   Error detection; Error corr...
G06F 11/1448   Management of the data invo...
G06F 11/30   Monitoring
G06F 11/3006   where the computing system ...
G06F 11/301   where the computing system ...
G06F 11/3058   Monitoring arrangements for...
G06F 11/3442   for planning or managing th...
G06F 11/3447   Performance evaluation by m...
G06F 11/3495   for systems
G06F 16/22   Indexing; Data structures t...
G06F 16/951   Indexing; Web crawling tech...
G06F 2201/835   Timestamp
G06F 2201/875   Monitoring of systems inclu...