Real-time analytics process data ingested through a feed, analyzing each message as it is received. They are commonly used to transform data, perform geofencing, and detect incidents. A real-time analytic concludes with one or more outputs, such as storing data in a feature layer or sending an email alert.
Examples of real-time analysis
- As an emergency operations manager, you track and archive the current locations of your field crews in real time, send alerts if a crew is inside a restricted zone, and calculate the distance of each field crew from its assigned base of operations.
- As a supply chain analyst at an oil and gas company, you connect to an Automatic Identification System (AIS) data stream to monitor your vessels, calculate expected arrival information, and understand when vessels are either inside or outside areas of interest.
- As an environmental scientist managing a large number of sensors, you archive observations for later processing in a big data analytic.
Components of a real-time analytic
There are four components of a real-time analytic:
- A feed is a real-time stream of data coming into ArcGIS Velocity. Feeds typically connect to external sources of observational data such as Internet of Things (IoT) platforms, message brokers, or third-party APIs. Feeds parse incoming tabular, point, polyline, or polygon data and expose it for analysis and visualization.
- A data source is used to load static or near real-time data in a big data analytic. In real-time analytics, data sources load data used in conjunction with tools that require an ancillary spatial or tabular dataset to enrich, filter, join to, or calculate distance from events.
  - Data sources in a real-time analytic are only used as a secondary dataset in applicable tools such as Join Features, Filter by Geometry, and Calculate Distance.
- Tools process or analyze events coming in from feeds. A real-time analytic can include zero, one, or multiple tools, depending on the use case.
  - Tools can be connected to each other so that the output of one tool becomes the input of another.
  - Not all tools available in big data analytics are available in real-time analytics. This is because some tools, such as Find Hot Spots, analyze an entire set of data at once. Real-time analytics, by contrast, operate on each incoming event as it is received.
- An output defines what should be done with each event as it is processed by a real-time analytic.
  - Many output options are available, including storing features in a new or existing feature layer, sending an email, and sending messages to Kafka or RabbitMQ. For additional information, see Fundamentals of analytic outputs.
  - The events received from a tool or feed can be sent to multiple outputs.
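The way these components fit together can be sketched in a few lines of Python. This is only an illustrative model, not ArcGIS Velocity's API: the `RealTimeAnalytic` class, the event dictionaries, and the two example tools are hypothetical names chosen for the sketch. It shows events arriving one at a time, passing through a chain of zero or more tools, and being sent to every configured output.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional

Event = dict                                # one message from a feed (hypothetical shape)
Tool = Callable[[Event], Optional[Event]]   # a tool returns None to drop the event
Output = Callable[[Event], None]


@dataclass
class RealTimeAnalytic:
    """Illustrative stand-in for feed -> tools -> outputs (not Velocity's API)."""
    tools: List[Tool] = field(default_factory=list)      # zero or more, applied in order
    outputs: List[Output] = field(default_factory=list)  # each surviving event goes to all

    def run(self, feed: Iterable[Event]) -> None:
        for event in feed:                  # one event at a time, as it is received
            for tool in self.tools:         # the output of one tool feeds the next
                event = tool(event)
                if event is None:           # a filtering tool dropped the event
                    break
            else:
                for output in self.outputs:
                    output(event)


def add_speed_kph(event: Event) -> Event:
    # Transform step: derive a new attribute from an existing one.
    return {**event, "speed_kph": event["speed_mps"] * 3.6}


def only_moving(event: Event) -> Optional[Event]:
    # Filter step: keep only events with a positive speed.
    return event if event["speed_kph"] > 0 else None


stored, alerts = [], []                     # two list-backed stand-ins for outputs
analytic = RealTimeAnalytic(tools=[add_speed_kph, only_moving],
                            outputs=[stored.append, alerts.append])
analytic.run([{"id": "a", "speed_mps": 10.0}, {"id": "b", "speed_mps": 0.0}])
```

After the run, both `stored` and `alerts` hold the single event for `"a"`, because the stationary event for `"b"` was dropped by the filter before reaching any output.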
Stateless versus stateful processing
Most tools in real-time analytics function in a stateless manner, meaning they operate on each observation received and do not maintain in-memory records of any previous observations. However, several of the available tools function in a stateful manner, on tracks rather than on individual observations.
Stateful tools gather multiple consecutive observations per track so they can compare spatial or attribute conditions within each track and detect changes. When an observation is received for a track, it is added to a small cache of observations for that track. This is used, for example, to detect whether the track has entered or exited a geofence by comparing the most recent observation to the previous one.
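The enter and exit comparison described above can be illustrated with a minimal sketch. It assumes a single rectangular geofence and a cache of two observations per track; the names `FENCE`, `cache`, and `observe` are hypothetical and do not correspond to any Velocity tool or parameter.

```python
from collections import defaultdict, deque

# Hypothetical rectangular geofence: (min_x, min_y, max_x, max_y).
FENCE = (0.0, 0.0, 10.0, 10.0)


def inside(x, y, fence=FENCE):
    min_x, min_y, max_x, max_y = fence
    return min_x <= x <= max_x and min_y <= y <= max_y


# Per-track cache that keeps only the two most recent inside/outside states.
cache = defaultdict(lambda: deque(maxlen=2))


def observe(track_id, x, y):
    """Return 'enter', 'exit', or None for this observation."""
    history = cache[track_id]
    history.append(inside(x, y))
    if len(history) < 2:
        return None                 # first observation: nothing to compare against
    previous, current = history
    if current and not previous:
        return "enter"
    if previous and not current:
        return "exit"
    return None                     # no change in the spatial relationship


results = [
    observe("truck-1", -5, 5),      # outside, first observation
    observe("truck-1", 5, 5),       # outside -> inside
    observe("truck-1", 6, 5),       # still inside
    observe("truck-1", 20, 5),      # inside -> outside
]
# results == [None, "enter", None, "exit"]
```

A stateless tool would only see whether each point is inside the fence. The per-track cache is what makes the transition itself detectable.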
The available stateful tools include the following:
Stateful tools cannot maintain an indefinite number of observations in memory, so to avoid over-consumption of memory resources, the cache for each track is periodically purged of observations that are older than a specified age.
Some of the stateful tools allow you to specify a purging duration using the Target Time Window parameter. When purging happens, observations older than the value specified in the Target Time Window parameter are purged from memory. Note that purging only affects observations in memory that were retained for purposes of stateful processing. Purging does not affect observations already sent to outputs and does not delete stored data.
The Target Time Window parameter should be set to a value equal to or greater than the longest anticipated period of time between observations for any single track. For example, if vehicles report their locations every 5 minutes and you are using the Filter by Geometry tool to detect when each vehicle enters a certain area, you would set the Target Time Window value on the filter to be slightly more than 5 minutes to ensure multiple observations are received before being purged. Setting it to less than 5 minutes results in a cache containing only one observation per track, eliminating the ability to determine that a vehicle's spatial relationship to the geofence has changed from outside to inside. The Calculate Motion Statistics, Detect Incidents, Filter by Geometry, and Join Features tools all have the Target Time Window parameter.
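The effect of the window length can be checked with a small simulation. This sketch assumes purging runs every time an observation arrives (the text only says purging is periodic) and measures time in seconds; `cached_after_purge` is a hypothetical helper, not a Velocity function.

```python
from collections import deque


def cached_after_purge(report_interval_s, window_s, n_reports=4):
    """Return how many observations remain in one track's cache after each report.

    Assumption: a purge runs on every arrival and removes observations
    older than window_s.
    """
    cache = deque()                 # timestamps of retained observations
    counts = []
    for i in range(n_reports):
        now = i * report_interval_s
        cache.append(now)
        while cache and now - cache[0] > window_s:
            cache.popleft()         # older than the time window: purge
        counts.append(len(cache))
    return counts


# Vehicles report every 5 minutes (300 s).
too_short = cached_after_purge(300, window_s=240)   # [1, 1, 1, 1]
long_enough = cached_after_purge(300, window_s=330) # [1, 2, 2, 2]
```

With a 4-minute window the cache never holds more than one observation, so there is no previous observation to compare against. With a window slightly longer than the reporting interval, the previous observation is still available when the next one arrives.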
Geofencing
Geofencing is a quintessential form of real-time spatial analysis in which features (often track points) are assessed against areas of interest (often polygon areas). Most commonly, point-based observations are analyzed to determine if they have entered or exited a virtual perimeter.
Several real-time and big data analytic tools can perform geofencing, identifying spatial relationships between features in a target feed or data source and a set of spatial join features (the geofences). The features used as geofences must be connected to the join port of the geofencing tool. Geofences can be points, lines, or polygons. The spatial relationships available depend on the geometry types of the input target and join data.
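For the common case of point targets and polygon geofences, the underlying spatial test can be sketched as follows. This is a generic ray-casting point-in-polygon check, not the algorithm any Velocity tool is documented to use; the function names, the event shape, and the example geofences are all hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test. polygon is a list of (x, y) vertices, not closed."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray cast to the right of (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside     # each crossing toggles inside/outside
    return inside


def geofences_containing(event, geofences):
    """Return the names of all join polygons that contain the target point."""
    return [name for name, polygon in geofences.items()
            if point_in_polygon(event["x"], event["y"], polygon)]


# Join features (the geofences), keyed by a hypothetical name.
geofences = {
    "depot": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "port": [(10, 0), (14, 0), (12, 5)],
}

in_depot = geofences_containing({"x": 2, "y": 2}, geofences)     # ["depot"]
in_port = geofences_containing({"x": 12, "y": 2}, geofences)     # ["port"]
in_none = geofences_containing({"x": 20, "y": 20}, geofences)    # []
```

Points that fall exactly on a polygon boundary are not handled specially in this sketch.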
Real-time and big data analytic tools that support geofencing include the following:
For additional details and example use cases, see Geofencing analysis.
Dynamic geofencing
Several real-time analytic tools can perform dynamic geofencing, identifying spatial relationships between features in a target feed and a set of features in another join feed (the geofences), both of which are updating in real time or near real time. The tool performing the geofencing uses the most recent observation for each track ID as the geofence.
- If a feed is connected to the join port, the join features (the geofences) are continuously refreshed based on the incoming features in the join feed. In this case, geofencing will be performed dynamically based on the changing features in both the target and join feeds.
- With dynamic geofencing, the Join Time Window parameter is required.
- If the join feed does not have a field tagged END_TIME, and the last known observation for a join feature is older than the specified join time window, that feature is purged from the tool's memory and is not included in the analysis.
- If the join feed has a field tagged END_TIME, the feature will age out of the geofence store according to the value in the field tagged as END_TIME or at the close of the join time window, whichever comes first.
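The aging rules in the list above can be modeled with a small in-memory store. This is a sketch under stated assumptions, not Velocity's implementation: `JoinFeature` and `GeofenceStore` are hypothetical names, times are plain seconds, and "the close of the join time window" is assumed to mean the observation time plus the Join Time Window value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JoinFeature:
    track_id: str
    observed_at: float                  # when the observation was received (seconds)
    end_time: Optional[float] = None    # value of the field tagged END_TIME, if present


class GeofenceStore:
    """Keeps the most recent join observation per track ID and ages features out."""

    def __init__(self, join_time_window_s):
        self.window = join_time_window_s
        self.latest = {}

    def update(self, feature):
        # A newer observation for the same track ID replaces the older geofence.
        self.latest[feature.track_id] = feature

    def expires_at(self, feature):
        window_close = feature.observed_at + self.window   # assumed meaning of "close"
        if feature.end_time is None:
            return window_close
        return min(feature.end_time, window_close)         # whichever comes first

    def active(self, now):
        # Purge expired features, then report the track IDs still acting as geofences.
        self.latest = {tid: f for tid, f in self.latest.items()
                       if now <= self.expires_at(f)}
        return sorted(self.latest)


store = GeofenceStore(join_time_window_s=600)
store.update(JoinFeature("storm-1", observed_at=0))                   # no END_TIME
store.update(JoinFeature("closure-7", observed_at=0, end_time=120))   # END_TIME first
store.update(JoinFeature("event-3", observed_at=0, end_time=5000))    # window first

at_60 = store.active(60)      # ["closure-7", "event-3", "storm-1"]
at_300 = store.active(300)    # ["event-3", "storm-1"]
at_900 = store.active(900)    # []
```

In this model, `closure-7` ages out at its END_TIME value, while `event-3` ages out when the join time window closes even though its END_TIME is later.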
Real-time analytic tools that support dynamic geofencing include the following:
The size of the geofences used in real-time analytics cannot exceed 768 MB.
For additional details and example use cases, see Geofencing analysis.