Perform near real-time analysis

One of the most common reasons to use recurring big data analytics is to perform processing in near real-time. For example, a big data analytic can be configured to run every few minutes or hours and process only the most recent features written to and stored in a feature layer.

As another example, consider a real-time analytic configured to receive data from a feed that collects vehicle location updates every 10 seconds. This real-time analytic writes event data to a Feature Layer (new) output and uses the Arcade Date() function to calculate a date field (such as process_timestamp) containing the time at which each event was processed. To complement this real-time analytic, a scheduled recurring big data analytic can be configured that uses the output of the real-time analytic as its data source. In this recurring big data analytic, a Feature Layer source would be configured to ingest the feature layer output created by the real-time analytic. When configuring the Feature Layer source, on the Filter Data tab, a where clause can be specified that uses the processed-date field together with the global variables of the analytic to load only the features processed since the previous scheduled run of the big data analytic.

This where clause expression would look like process_timestamp >= $analytic.AnalyticLastScheduledStartTime and process_timestamp < $analytic.AnalyticScheduledStartTime, assuming that the name of the date field created in the real-time analytic was process_timestamp.

Configure feature layer to load a subset of features based off a processed date field and global variables
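
The where clause above selects a half-open time window: the lower bound is inclusive and the upper bound is exclusive. The following Python sketch only illustrates that comparison logic; it is not how the product evaluates the clause. The function name in_window, the sample features, and the two datetime values standing in for $analytic.AnalyticLastScheduledStartTime and $analytic.AnalyticScheduledStartTime are all hypothetical.

```python
from datetime import datetime, timedelta


def in_window(process_timestamp, last_scheduled_start, scheduled_start):
    """Mirror the where clause: >= previous start and < current start."""
    return last_scheduled_start <= process_timestamp < scheduled_start


# Hypothetical stand-ins for the two global variables of the analytic.
last_start = datetime(2024, 1, 1, 12, 0, 0)
current_start = last_start + timedelta(minutes=5)

# Hypothetical features with a process_timestamp date field.
features = [
    {"id": 1, "process_timestamp": datetime(2024, 1, 1, 11, 59, 50)},  # before the window
    {"id": 2, "process_timestamp": datetime(2024, 1, 1, 12, 0, 0)},    # on the lower bound
    {"id": 3, "process_timestamp": datetime(2024, 1, 1, 12, 4, 50)},   # inside the window
    {"id": 4, "process_timestamp": datetime(2024, 1, 1, 12, 5, 0)},    # on the upper bound
]

selected_ids = [
    f["id"]
    for f in features
    if in_window(f["process_timestamp"], last_start, current_start)
]
print(selected_ids)  # [2, 3]
```

The feature on the lower bound is loaded and the feature on the upper bound is left for the next run, which is what keeps adjacent runs from loading the same feature twice.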

The big data analytic would be configured to run at the desired repeat interval, such as every 5 minutes. Using the global variables specified above, each run of the big data analytic would analyze only the most recent features that had not yet been processed by a previous run.
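
To see why this pattern avoids both gaps and duplicates, the sketch below simulates three consecutive 5-minute runs over vehicle updates arriving every 10 seconds and counts how many runs load each feature. It is a simplified model under stated assumptions: every run starts exactly at its scheduled time, and every feature is already written to the feature layer before the run whose window contains it. The function simulate_runs and all values are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def simulate_runs(timestamps, first_start, interval, run_count):
    """Count how many scheduled runs load each timestamp.

    Each run loads timestamps in the half-open window
    [previous scheduled start, current scheduled start).
    """
    loads = Counter()
    previous_start = first_start
    for run_index in range(1, run_count + 1):
        current_start = first_start + run_index * interval
        for ts in timestamps:
            if previous_start <= ts < current_start:
                loads[ts] += 1
        previous_start = current_start
    return loads


first_start = datetime(2024, 1, 1, 12, 0, 0)
interval = timedelta(minutes=5)

# One update every 10 seconds for 15 minutes: 90 hypothetical features.
timestamps = [first_start + timedelta(seconds=10 * i) for i in range(90)]

loads = simulate_runs(timestamps, first_start, interval, run_count=3)
all_loaded_once = all(loads[ts] == 1 for ts in timestamps)
print(len(loads), all_loaded_once)  # 90 True
```

In this model, replacing the exclusive upper bound with an inclusive one would make any feature stamped exactly on a scheduled start time load in two consecutive runs.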