Big data analytics process data from a variety of sources to perform analyses. This processing generates output datasets and informational products that may need to be kept up to date so that the results remain accurate for those who depend on them.
As data from input data sources changes over time, with new observations made and new features or values stored, a big data analytic must be run again to generate results for the latest data. These results can either replace prior outputs or be appended to existing outputs to build a record of the analysis over time.
Scheduling a big data analytic to run periodically or at a recurring time ensures that it runs at the appropriate frequency to generate up-to-date outputs and informational products for use in your organization.
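The difference between running periodically and running at a recurring time can be illustrated with a minimal Python sketch. It is not the product's scheduling interface; the function names `next_periodic_run` and `next_recurring_run` are hypothetical, and the recurring case assumes a simple once-a-day clock time.

```python
from datetime import datetime, timedelta


def next_periodic_run(last_run: datetime, interval: timedelta) -> datetime:
    """Periodic schedule (hypothetical helper): the next run is a fixed
    interval after the previous run."""
    return last_run + interval


def next_recurring_run(now: datetime, hour: int, minute: int = 0) -> datetime:
    """Recurring schedule (hypothetical helper): the next run is the next
    occurrence of a given clock time, assumed here to repeat daily."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed, so move to tomorrow.
        candidate += timedelta(days=1)
    return candidate


# Every 6 hours after the last run.
periodic = next_periodic_run(datetime(2024, 1, 1, 8, 0), timedelta(hours=6))

# Daily at 02:00, evaluated at 08:30 on the same day.
recurring = next_recurring_run(datetime(2024, 1, 1, 8, 30), hour=2)
```

A periodic schedule drifts with the time of the previous run, while a recurring schedule stays tied to the clock, which may matter when a report is expected at a set time each day.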
Consider the following examples:
- A transportation organization wants to generate a daily or weekly email report that indicates the total mileage driven by each of its vehicles or employees over that period.
- An environmental group wants to calculate statistics on one or more attributes from sensor readings across a region once a week to understand how environmental patterns change over time or vary with conditions.
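As a rough illustration of the first example, the following Python sketch totals mileage per vehicle for one reporting period. The function name `total_mileage_by_vehicle`, the `(vehicle_id, trip_date, miles)` record layout, and the sample values are all assumptions made for illustration; a real analytic would read from the configured data sources instead.

```python
from collections import defaultdict
from datetime import date


def total_mileage_by_vehicle(trips, start, end):
    """Sum miles per vehicle for trips dated within [start, end].

    `trips` is an iterable of (vehicle_id, trip_date, miles) tuples,
    a hypothetical record layout used only for this sketch.
    """
    totals = defaultdict(float)
    for vehicle_id, trip_date, miles in trips:
        if start <= trip_date <= end:
            totals[vehicle_id] += miles
    return dict(totals)


# Made-up sample records; the last trip falls outside the reporting week.
trips = [
    ("truck-1", date(2024, 1, 1), 12.5),
    ("truck-2", date(2024, 1, 1), 30.0),
    ("truck-1", date(2024, 1, 2), 7.5),
    ("truck-1", date(2024, 1, 9), 100.0),
]

weekly_totals = total_mileage_by_vehicle(trips, date(2024, 1, 1), date(2024, 1, 7))
```

Each scheduled run would call this with the latest period, and the result could either replace the previous report or be appended to a running history.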