About data retention

When you store output features in a feature layer, ArcGIS Analytics for IoT manages data according to a set of data retention policies. Data retention generally refers to the length of time for which data is actively maintained in the feature layer.

For more information on writing data to a new feature layer in Analytics for IoT, see Feature layer (new). For writing to an existing feature layer, see Feature layer (existing).

Purpose of data retention

Data retention allows feature layers to be maintained at a given size even as real-time data streams continuously add new features. This ensures the underlying dataset does not grow indefinitely, especially as older data becomes less relevant for understanding trends and viewing the latest activity.

Data retention is not intended as a way to limit the features available to a specific time frame. It ensures that data is retained in the feature layer for at least the specified period. Because the data removal process runs on a periodic schedule, the feature layer may at any given time contain data older than the specified period. To ensure your maps display a specific time period of data, the best practice is to filter by time in the queries issued by your client applications.
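As an illustration of filtering by time on the client side, the following minimal sketch builds the parameters for a time-bounded query. It assumes the feature layer is time-enabled and is queried through the ArcGIS REST API query operation, whose time parameter takes a start and end instant in epoch milliseconds. The function name time_window_params and the 24-hour window are hypothetical choices for this example.

```python
from datetime import datetime, timedelta, timezone


def to_epoch_ms(dt):
    """Convert a timezone-aware datetime to epoch milliseconds."""
    return int(dt.timestamp() * 1000)


def time_window_params(now, window):
    """Build query parameters that restrict results to [now - window, now].

    Hypothetical helper: the parameter names follow the ArcGIS REST API
    query operation and assume a time-enabled layer.
    """
    start = now - window
    return {
        "where": "1=1",                                        # no attribute filter
        "time": f"{to_epoch_ms(start)},{to_epoch_ms(now)}",    # start,end in epoch ms
        "outFields": "*",
        "f": "json",
    }


# Example: request only the 24 hours ending at a fixed instant.
now = datetime(2024, 1, 2, tzinfo=timezone.utc)
params = time_window_params(now, timedelta(hours=24))
print(params["time"])
```

Filtering this way keeps the displayed time frame exact, regardless of when the periodic removal process last ran.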

How data retention works

Whenever you define an output feature layer in a real-time or big data analytic, you can specify the data retention period to apply to that feature layer. For example, you may want to keep weather data for only the past day but maintain a history of your fleet vehicle positions for up to 6 months. You can also optionally export older data to a feature layer archive (cold store), which can be accessed when you need to run analysis on the historical data.

Data retention options for output feature layers

When a data retention period is set for a feature layer, features older than the specified time period are removed from the underlying dataset on a regular basis. If you choose to export data, these features are exported to the feature layer archive (cold store) before they are removed. For the purposes of data retention, feature age is based on the timestamp of when the data was created in the underlying dataset, which may or may not be the same as the start time of the feature. Data retention is based on creation time in order to apply a consistent approach across all datasets, including those that represent interval data or do not have date/time information in the feature record.
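The behavior described above can be sketched as a single purge pass. This is an illustrative model only, not the service's implementation: the Feature class, its field names, and the purge_pass function are hypothetical. It shows that a feature's age is measured from its creation time in the dataset, not from its start time, and that expired features are exported before removal only when export is enabled.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Feature:
    track_id: str
    start_time: Optional[datetime]  # may be absent or unrelated to storage time
    created_at: datetime            # when the record was written to the dataset


def purge_pass(features, now, retention, export=True):
    """Model one periodic purge: return (kept, archived).

    Age is based on created_at only. Expired features go to the archive
    when export is True; otherwise they are dropped and cannot be recovered.
    """
    cutoff = now - retention
    kept = [f for f in features if f.created_at >= cutoff]
    expired = [f for f in features if f.created_at < cutoff]
    archived = expired if export else []
    return kept, archived


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
features = [
    # Old start time but written 2 hours ago: still within a 1-day retention.
    Feature("truck-1", now - timedelta(days=90), now - timedelta(hours=2)),
    # No start time, written 2 days ago: expired under a 1-day retention.
    Feature("truck-2", None, now - timedelta(days=2)),
]

kept, archived = purge_pass(features, now, timedelta(days=1))
print([f.track_id for f in kept], [f.track_id for f in archived])
```

Because a real purge runs on a schedule, features can remain in the layer somewhat longer than the retention period; the sketch models only what happens at the moment a pass runs.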

Note:

If you select the Do not export data option for Data Export (archiving data), data that is purged cannot be recovered.

Data retention is only required when you are storing data that will accumulate in size over time. This is evaluated based on your settings for Data Storage Method and how you choose to preserve data between analytic runs.

Data storage options for output feature layers

For example, if you choose Add New Features (as opposed to keeping only the latest feature) and you choose Keep existing features and schema if the analytic is restarted, then your stored data will grow over time and a data retention period is required.

If, however, you choose Keep Latest Feature, only the latest observation of each track is stored. This data may grow as new sensors are deployed in your organization, but it will generally stabilize at a maximum size. In this case, a data retention period is not required and you can select the No Purge option. Feature layers created with the No Purge option retain data indefinitely.

Archiving data (feature layer archive retention)

When a data retention period is required for a feature layer, you can optionally export older data to a feature layer archive (cold store). When enabled, data older than the retention period is exported in Parquet data format to an archive that is maintained by Analytics for IoT. Data in the archive is maintained for a maximum of 1 year following the date it was exported.

For example, if you select a One Year data retention period and choose to export the older data to the archive, Analytics for IoT will effectively maintain up to 2 years of your data: 1 year in the feature layer and up to 1 year in the archive. If you select a One Month data retention period and choose to export the older data to the archive, Analytics for IoT will effectively maintain up to 1 year and 1 month of your data.
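The arithmetic behind these examples can be sketched as follows. The names ARCHIVE_MAX, RETENTION, and nominal_lifetime are hypothetical, and the durations are approximations for illustration (a month is taken as 30 days and a year as 365 days). The result is a nominal figure, since removal runs on a periodic schedule.

```python
from datetime import timedelta

# Archive data is kept for a maximum of 1 year after export (approximated as 365 days).
ARCHIVE_MAX = timedelta(days=365)

# Approximate durations for two retention options (assumed values for illustration).
RETENTION = {
    "One Month": timedelta(days=30),
    "One Year": timedelta(days=365),
}


def nominal_lifetime(retention, export):
    """Nominal total time data is maintained: the retention period in the
    feature layer, plus the archive period when export is enabled."""
    return retention + ARCHIVE_MAX if export else retention


print(nominal_lifetime(RETENTION["One Year"], export=True).days)
print(nominal_lifetime(RETENTION["One Month"], export=True).days)
print(nominal_lifetime(RETENTION["One Month"], export=False).days)
```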

Data retention export options for output feature layers

Data that is exported to the archive is not displayed in the feature layer. To work with features exported to the archive, import them using the Feature Layer (archive) data source type in a big data analytic.

You can also export older data to your own cloud storage, such as Amazon S3 or Azure Blob Storage, if you need to retain your data indefinitely.