Introduction to data retention

When you store output features in a feature layer, ArcGIS Velocity manages data according to a set of data retention policies. Data retention generally refers to the length of time that data is actively maintained in the feature layer.

For more information about writing data to a new feature layer in Velocity, refer to Feature Layer (new). For writing to an existing feature layer, refer to Feature Layer (existing).

Purpose of data retention

Using data retention, feature layers can be maintained at a given size, even as real-time data streams continuously add features. This ensures that the underlying dataset does not grow indefinitely, especially as older data becomes less relevant for understanding trends and viewing the latest activity.

Data retention is not intended for limiting the available features to a specific time frame. It ensures that data is retained in the feature layer for at least the specified period. Because the data removal process runs on a periodic schedule, the feature layer can contain data older than the specified period at any given time. To ensure that maps display a specific time period of data, the best practice is to query the data accordingly in client applications.
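As an illustration of filtering by time on the client side rather than relying on data retention, the following minimal Python sketch keeps only the features from the last 24 hours. The feature dictionaries, the start_time field name, and the filter_to_window function are hypothetical and are not part of any Velocity or ArcGIS API. In practice, the same time window would typically be expressed as a time filter or query in the client application.

```python
from datetime import datetime, timedelta, timezone


def filter_to_window(features, now, window, time_field="start_time"):
    """Return only the features whose timestamp falls within [now - window, now].

    `features` is a list of plain dictionaries (a hypothetical stand-in for
    features returned by a feature layer query).
    """
    earliest = now - window
    return [f for f in features if earliest <= f[time_field] <= now]


now = datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)
features = [
    {"id": "a", "start_time": now - timedelta(hours=2)},
    # Older than 24 hours, but possibly still stored until the next removal run.
    {"id": "b", "start_time": now - timedelta(hours=30)},
]

last_day = filter_to_window(features, now, timedelta(hours=24))
print([f["id"] for f in last_day])  # ['a']
```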

Data retention process

When you define an output feature layer in a real-time or big data analytic, you can specify the data retention period to apply to that feature layer. For example, you may want to keep weather data for the past day but maintain a history of the fleet or vehicle positions for up to six months. You can also export older data to a feature layer archive that can be accessed when you want to run analysis on the historical data.

Data retention options for output feature layers

When a data retention period is set for a feature layer, features older than the specified time period are deleted from the underlying dataset on a regular basis. If you export the data, these features are exported to the feature layer archive before they are deleted. For data retention, feature age is based on the timestamp of when the feature was created in the underlying dataset, which may or may not be the same as the start time of the feature. Basing retention on creation time applies a consistent approach across all datasets, including those that represent interval data or do not have date or time information in the feature record.

Note:

If you choose the Do not export data option for the Data export (feature layer archive) parameter, data that is deleted cannot be recovered.
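The deletion and export behavior described above can be sketched as follows. This is a simplified illustration, not Velocity's actual implementation. The Record class, its field names, and the apply_retention function are assumptions made for the example. The point it shows is that the cutoff is compared against the creation time of the record, not the start time of the feature, and that expired records are only preserved when export is enabled.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Record:
    track_id: str
    start_time: datetime  # time the observation describes
    created_at: datetime  # time the record was written to the dataset


def apply_retention(records, now, retention, export):
    """Split records into (kept, archived) using creation time, not start time.

    Expired records are returned as `archived` only when `export` is True;
    otherwise they are dropped and cannot be recovered.
    """
    cutoff = now - retention
    kept = [r for r in records if r.created_at >= cutoff]
    expired = [r for r in records if r.created_at < cutoff]
    archived = expired if export else []
    return kept, archived


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
records = [
    # Describes an event 10 days ago but was written 1 hour ago: kept.
    Record("late-write", now - timedelta(days=10), now - timedelta(hours=1)),
    # Written 9 days ago: older than the 1 day retention period.
    Record("old-write", now - timedelta(days=10), now - timedelta(days=9)),
]

kept, archived = apply_retention(records, now, timedelta(days=1), export=True)
```

With export set to False, the same call returns an empty archived list, which corresponds to the Do not export data option.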

Data retention is only required when you are storing data that accumulates in size over time. This is evaluated based on the Data storage method settings and how you preserve data between analytic runs.

Data storage options for output feature layers

For example, if you choose the Add new features option (as opposed to keeping only the latest feature) and you choose the Keep existing features and schema option for when the analytic is restarted, the stored data grows over time. Therefore, you must specify a value for the Data retention (time period for retaining data) parameter.

If, however, you choose the Keep latest feature option, you are only storing the latest observation of each track. This data can grow as new sensors are deployed in your organization, but it generally stabilizes at a maximum size. If you choose the Keep existing features and schema option, you must specify a value for the Data retention (time period for retaining data) parameter. If you choose the Replace existing features and schema option, the Data retention (time period for retaining data) parameter does not apply.

Data storage and retention options for Keep latest feature
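The combinations described in this section can be summarized as a small lookup. The sketch below is only an illustration: the option keys are hypothetical identifiers chosen for the example, not Velocity parameter values, and only the combinations stated in the text are encoded. Any other combination returns None rather than guessing.

```python
# (data storage method, behavior on analytic restart) -> is a retention period required?
# Only combinations stated in the text are listed; keys are hypothetical names.
RETENTION_RULES = {
    ("add_new_features", "keep_existing"): True,
    ("keep_latest_feature", "keep_existing"): True,
    ("keep_latest_feature", "replace_existing"): False,  # parameter does not apply
}


def retention_required(storage_method, on_restart):
    """Return True or False for a stated combination, or None if the text does not cover it."""
    return RETENTION_RULES.get((storage_method, on_restart))


print(retention_required("add_new_features", "keep_existing"))       # True
print(retention_required("keep_latest_feature", "replace_existing")) # False
```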

Archive data (feature layer archive retention)

When a data retention period is required for a feature layer, you can export older data to a feature layer archive. When this option is enabled, data older than the retention period is exported in Parquet format to an archive that is maintained by Velocity. Data in the archive is maintained for a maximum of one year following the date it was exported, or until the archive reaches its overall maximum size, whichever comes first.

For example, if you choose the 1 year data retention period and choose to export older data to the archive, Velocity maintains up to two years of data: one year in the feature layer and one year in the archive. If you choose the 1 month data retention period and choose to export older data to the archive, Velocity maintains up to 13 months of data: one month in the feature layer and one year in the archive.

Data retention export options for output feature layers
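The arithmetic behind these examples is the retention period plus the one-year archive limit. A minimal sketch, assuming periods are expressed in whole months (the function and constant names are made up for the example, and the archive size limit, which can shorten the result, is ignored):

```python
ARCHIVE_MAX_MONTHS = 12  # archive keeps data for at most one year after export


def max_history_months(retention_months, export_to_archive):
    """Upper bound, in months, on how far back data is maintained.

    Ignores the overall maximum size of the feature layer archive.
    """
    if export_to_archive:
        return retention_months + ARCHIVE_MAX_MONTHS
    return retention_months


print(max_history_months(12, True))  # 24
print(max_history_months(1, True))   # 13
print(max_history_months(1, False))  # 1
```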

Data that is exported to the archive is not displayed in the feature layer. To work with features exported to the archive, import them using the Feature Layer (archive) source type in a big data or real-time analytic. In big data analytics, you can then use the Merge Layers tool to merge the data from the feature layer and the feature layer archive into a single pipeline for additional analysis. In real-time analytics, the Feature Layer (archive) source brings the archived features in as a source for the analytic.

Additionally, you can export older data to your own cloud stores such as Amazon S3 or Azure Blob Storage if you need to retain the data indefinitely.