Introduction to data retention

When you store output features in a feature layer, ArcGIS Velocity manages data according to a set of data retention policies. Data retention generally refers to the length of time that data is actively maintained in the feature layer.

For more information about writing data to a new feature layer in Velocity, see Feature Layer (new). For writing to an existing feature layer, see Feature Layer (existing).

Purpose of data retention

Using data retention, feature layers can be maintained at a given size, even as real-time data streams continuously add features. This ensures that the underlying dataset does not grow indefinitely, especially as older data becomes less relevant for understanding trends and viewing the latest activity.

Data retention is not intended to be used for limiting the features available to specific time frames. Data retention ensures that data is retained in the feature layer for at least the specified period. At any given time, there may be data older than the specified period, as the data removal process runs on a periodic schedule. To ensure that maps display a specified time period of data, the best practice is to query data accordingly in client applications.
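As a minimal sketch of querying by time in a client application, the following Python builds query parameters that limit results to the last 24 hours. It assumes the feature layer is time-enabled and is queried through the ArcGIS REST API query operation, whose time parameter accepts a start and end instant in epoch milliseconds. The function name build_time_window_query and the fixed example timestamp are hypothetical and used only for illustration.

```python
from datetime import datetime, timedelta, timezone


def to_epoch_ms(dt):
    """Convert a timezone-aware datetime to epoch milliseconds."""
    return int(dt.timestamp() * 1000)


def build_time_window_query(now, window):
    """Return query parameters that limit results to the last `window` of data.

    Hypothetical helper: it only builds the parameters and does not send a request.
    """
    start = now - window
    return {
        "where": "1=1",
        # Time extent as "start,end" in epoch milliseconds (assumes a time-enabled layer).
        "time": f"{to_epoch_ms(start)},{to_epoch_ms(now)}",
        "outFields": "*",
        "f": "json",
    }


# Example: request only the most recent 24 hours, regardless of what is still retained.
now = datetime(2024, 1, 2, 0, 0, tzinfo=timezone.utc)
params = build_time_window_query(now, timedelta(hours=24))
print(params["time"])
```

Because the filter is applied at query time, the map shows exactly the requested window even when the feature layer still holds features that are older than the retention period.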

Data retention process

When you define an output feature layer in a real-time or big data analytic, you can specify the data retention period to apply to that feature layer. For example, you may want to keep weather data for the past day but maintain a history of the fleet or vehicle positions for up to six months. You can also export older data to a feature layer archive (cold store) that can be accessed when you need to run analysis on the historical data.

Data retention options for output feature layers

When a data retention period is set for a feature layer, features older than the specified time period are removed from the underlying dataset on a regular basis. If you export the data, these features are exported to the feature layer archive (cold store) before they are removed. For data retention, feature age is based on the time stamp of when the data was created in the underlying dataset, which may or may not be the same as the start time of the feature. Data retention is performed based on creation time to apply a consistent approach across all datasets, including those that represent interval data or do not have date or time information in the feature record.

Note:

If you choose the Do not export data option for the Data export (feature layer archive) parameter, data that is removed cannot be recovered.
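The removal behavior described above can be modeled with a short Python sketch. This is an illustrative model only, not how Velocity implements the process: the run_purge function, the created and start_time field names, and the sample features are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def run_purge(features, now, retention, export=True):
    """Model one purge run: split features into (kept, archived) by creation time.

    Hypothetical helper. Age is measured from `created`, not from `start_time`.
    """
    cutoff = now - retention
    kept = [f for f in features if f["created"] >= cutoff]
    expired = [f for f in features if f["created"] < cutoff]
    # Without export, expired features are dropped and cannot be recovered.
    archived = expired if export else []
    return kept, archived


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
features = [
    # Created 30 hours ago, although the observation itself is recent.
    {"id": 1, "created": now - timedelta(hours=30), "start_time": now - timedelta(hours=2)},
    # Created 3 hours ago, although the observation is five days old.
    {"id": 2, "created": now - timedelta(hours=3), "start_time": now - timedelta(days=5)},
]

kept, archived = run_purge(features, now, retention=timedelta(days=1))
print([f["id"] for f in kept], [f["id"] for f in archived])
```

With a one-day retention period, feature 1 is removed because it was created more than a day ago, and feature 2 is kept even though its start time is older. Between purge runs, a feature such as feature 1 can remain in the layer past the retention period.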

Data retention is only required when you are storing data that will accumulate in size over time. This is evaluated based on the Data storage method settings and how you preserve data between analytic runs.

Data storage options for output feature layers

For example, if you choose the Add new features option (as opposed to only keeping the latest feature) and you choose the Keep existing features and schema option if the analytic is restarted, the incoming data will grow over time and a data retention period is required.

If, however, you choose the Keep latest feature option, you are only storing the latest observation of each track. This data may grow as new sensors are deployed in your organization, but it generally stabilizes at a maximum size. In this case, a data retention period is not required and you can choose the No purge option. Feature layers created with the No purge option retain data indefinitely.

Data storage and retention options for Keep latest feature
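To illustrate why the two storage methods grow differently, the following Python sketch models Add new features as an append-only list and Keep latest feature as a dictionary keyed by track ID. The function names, the track_id and seq fields, and the sample observations are hypothetical; the sketch only models the growth pattern and does not reflect Velocity internals.

```python
def add_new_features(store, observation):
    """Add new features: every observation becomes a new stored record."""
    store.append(observation)


def keep_latest_feature(store, observation):
    """Keep latest feature: one record per track, overwritten on each update."""
    store[observation["track_id"]] = observation


observations = [
    {"track_id": "truck-1", "seq": 1},
    {"track_id": "truck-2", "seq": 1},
    {"track_id": "truck-1", "seq": 2},
    {"track_id": "truck-1", "seq": 3},
    {"track_id": "truck-2", "seq": 2},
]

history = []   # grows with every observation
latest = {}    # grows only when a new track appears

for obs in observations:
    add_new_features(history, obs)
    keep_latest_feature(latest, obs)

print(len(history), len(latest))
```

The list grows with every incoming observation, which is why a retention period is needed, while the dictionary size is bounded by the number of distinct tracks.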

Archive data (feature layer archive retention)

When a data retention period is required for a feature layer, you can export older data to a feature layer archive (cold store). When this option is enabled, data older than the retention period is exported in Parquet format to an archive that is maintained by Velocity. Data in the archive is maintained for a maximum of one year following the date it was exported, or until the overall maximum size of the feature archive is reached, whichever limit is reached first.

For example, if you choose the 1 year data retention period and choose to export the older data to the archive, Velocity maintains up to two years of data (one year in the feature layer plus one year in the archive). If you choose the 1 month data retention period and choose to export the older data to the archive, Velocity maintains up to 13 months of data (one month in the feature layer plus one year in the archive).

Data retention export options for output feature layers
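The arithmetic in the examples above can be written as a small Python sketch. The constant and function names are hypothetical, and the result is an upper bound only, because the archive is also limited by its overall maximum size.

```python
# Archive data is maintained for at most one year after it is exported.
ARCHIVE_MAX_MONTHS = 12


def max_months_available(retention_months, export_to_archive):
    """Upper bound, in months, on the age of data Velocity still maintains.

    Hypothetical helper: ignores the archive size limit, which can lower the bound.
    """
    archive_months = ARCHIVE_MAX_MONTHS if export_to_archive else 0
    return retention_months + archive_months


for retention, export in [(12, True), (1, True), (1, False)]:
    total = max_months_available(retention, export)
    print(f"retention={retention} months, export={export}: up to {total} months")
```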

Data that is exported to the archive is not displayed in the feature layer. To work with features exported to the archive, import them using the Feature Layer (archive) data source type in a real-time or big data analytic. You can then use the Merge Layers tool to merge the data from the feature layer and the feature layer (archive) into a single pipeline for additional analysis.

You can also export older data to your own cloud storage, such as Amazon S3 or Azure Blob Storage, if you need to retain the data indefinitely.