ArcGIS Velocity ingests data into real-time and big data analytics using feeds or data sources. A feed is a real-time stream of data whereas a data source loads static or near real-time data. For more information, see What is a feed? and What is a data source?
Velocity provides a streamlined and contextual workflow designed around optimizing user experience when configuring input data in a feed or data source. This configuration workflow is common across the various feed and source types.
Configuring input data when establishing a new feed or data source involves the following:
- Set connection and configuration properties
- Confirm schema
- Identify key fields
- Schedule polling interval (only for certain feeds)
- Save (only for feeds)
Set connection and configuration properties
The first step when configuring any feed or data source is to define the required connection and configuration properties so that Velocity can connect to the data. The properties available vary depending upon the feed or data source type selected.
For example, when configuring a Kafka feed, simply enter the Broker and Topic to connect to the data. Alternatively, when configuring an Amazon S3 data source, you must enter all relevant connection properties in order to establish a successful connection.
In the next step, Velocity validates the connection using the configuration properties provided. Velocity then attempts to sample the data and derive its schema. If the connection fails or the schema cannot be derived, update the configuration properties accordingly and try again.
Confirm schema
The Confirm Schema step displays the returned schema as well as a sample of the data. Depending on the format of the data, additional properties are exposed to adjust data parsing into a valid schema.
On the Confirm Schema step, you can review and adjust the field names, field types, and data formats. Additionally, you can derive the data again to acquire new samples or derive schema after adjustments to the data format or data format properties. This ensures Velocity understands the format of the data being ingested by the feed or data source.
Automatic sampling and schema derivation
On the Confirm Schema step, Velocity connects to the feed or data source using the connection and configuration properties you set in the previous step and retrieves sample data.
From the sample of data, Velocity automatically derives the data format and the schema of the data, which consists of the field names and field types. For some data formats, geometry and datetime key fields are also identified.
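As a rough illustration of what schema derivation from a sample involves, the following Python sketch infers field names and field types from a few JSON records. It is not Velocity's implementation. The function names (`infer_type`, `derive_schema`), the sample records, and the rule of falling back to String for mixed values are all assumptions made for this example.

```python
import json

# Hypothetical sample of newline-delimited JSON records.
SAMPLE = """\
{"id": "truck-1", "speed": 41.5, "stops": 3}
{"id": "truck-2", "speed": 38, "stops": 5}"""


def infer_type(values):
    """Return a field type name that fits every non-null sample value."""
    seen = {type(v) for v in values if v is not None}
    if not seen:
        return "String"  # nothing to go on, so use the most permissive type
    if seen <= {int}:
        return "Int64"
    if seen <= {int, float}:
        return "Float64"
    return "String"


def derive_schema(records):
    """Collect the values seen for each field name and infer one type per field."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return {name: infer_type(values) for name, values in columns.items()}


records = [json.loads(line) for line in SAMPLE.splitlines()]
schema = derive_schema(records)
print(schema)
```

Because the derived types depend on which records happen to be in the sample, a field such as `speed` could be typed as an integer if only whole numbers were sampled, which is one reason to review the derived schema before continuing.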
Changing field types and field names
Velocity displays the field types and field names identified by schema derivation from the acquired data sample.
From the derived schema, you can make adjustments as necessary including:
- Changing field types:
- Use the drop-down to the left of a field name to change the field type.
- Note that you cannot change field types when using certain feed or data source types, such as feature layer feeds or data sources.
- Use caution when changing the field type. For example:
- Any field type can be changed to a String. However, you cannot change a string field containing letters to an Integer; doing so would cause an error on data ingestion.
- Changing fields from a float type (Float32 or Float64) to an integer type (Int32 or Int64) is not recommended. Changing field types is not intended for on-the-fly conversion of numerical values. For some formats, downgrading from a float to an integer can cause the value to be skipped entirely.
- Changing field names:
- Modify the field name as necessary.
- Disabling field(s):
- To disable a field, uncheck the box next to the field type. The field will be ignored when data is ingested from the source.
- As a best practice, disable any fields your workflows do not need. Ingesting fewer fields improves performance at high data velocity and volume.
Modifying the data format properties and resampling the schema should be performed prior to adjusting the field types and field names. If the data format or data format properties are changed and schema derivation is required, any changes you have made to field types and field names will be overwritten.
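The cautions above about changing field types can be illustrated with ordinary Python casts. This is only an analogy under stated assumptions: `try_cast` is a hypothetical helper, and Python's conversion rules are not necessarily those Velocity applies during ingestion.

```python
def try_cast(value, target):
    """Attempt a cast and return (succeeded, result_or_error_message)."""
    casts = {"String": str, "Int64": int, "Float64": float}
    try:
        return True, casts[target](value)
    except (ValueError, TypeError) as exc:
        return False, str(exc)


# Any value can be represented as a string.
print(try_cast(42, "String"))

# A string containing letters cannot become an integer.
print(try_cast("AB123", "Int64"))

# A float cast to an integer silently loses its fractional part in Python.
print(try_cast(41.9, "Int64"))

# A float written as text fails outright when parsed as an integer.
print(try_cast("41.9", "Int64"))
```

The last two cases show why a float-to-integer change is risky: depending on how the value arrives, it is either truncated or rejected.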
Changing data format and data format properties
Velocity can consume data from various feed and data source types in a variety of data formats. Some feed or data source types such as Website (Poll) can consume data in various formats. Other feed or data source types such as Feature Layer have a fixed data format.
The supported data formats include:
- Shapefile (big data analytics only)
- Parquet (big data analytics only)
Velocity will automatically attempt to derive the format of the data. However, you can change the derived data format as necessary.
Additionally, some data formats have various properties available for adjusting how Velocity parses the data into a schema. For example, the delimited data format has two properties, field delimiter and header row.
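To show how a field delimiter and a header row setting change the way delimited text is parsed, here is a minimal Python sketch using the standard `csv` module. The function `parse_delimited`, its parameter names, the `field_N` placeholder names, and the sample text are assumptions for illustration, not Velocity's behavior.

```python
import csv
import io


def parse_delimited(text, field_delimiter=",", has_header_row=True):
    """Parse delimited text into a list of records keyed by field name."""
    rows = list(csv.reader(io.StringIO(text), delimiter=field_delimiter))
    if not rows:
        return []
    if has_header_row:
        names, data = rows[0], rows[1:]
    else:
        # Without a header row, generate placeholder field names.
        names = [f"field_{i + 1}" for i in range(len(rows[0]))]
        data = rows
    return [dict(zip(names, row)) for row in data]


sample = "id;lat;lon\nbus-7;34.05;-118.25\n"

with_header = parse_delimited(sample, field_delimiter=";", has_header_row=True)
without_header = parse_delimited(sample, field_delimiter=";", has_header_row=False)
print(with_header)
print(without_header)
```

With the header row setting turned off, the first line is treated as a data record, so the same text yields two records instead of one.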
For details on the different data formats and properties associated with each data format, see Supported data formats.
Changing data format properties and deriving schema
Using the derived data sample, Velocity attempts to define the format, schema, and properties of the data.
You can choose to modify data format properties or even specify a different data format. To do this, simply change any data format property as desired and then click Derive schema to derive the data again according to the changes made. The properties will update accordingly again based upon the derived data.
For example, suppose you are connecting to a JSON source with multi-level nested JSON. If you want to ingest data from only a certain JSON node, or flatten the multi-level JSON to retrieve all attribute values, you can use the root node and flatten properties to configure how Velocity reads your JSON data.
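The idea behind a root node and a flatten option can be sketched in Python as follows. This is an illustration only: the helpers `select_root` and `flatten`, the dotted path syntax, the underscore naming of flattened fields, and the sample document are assumptions and may differ from how Velocity names or selects fields.

```python
def select_root(document, root_node):
    """Walk a dotted path (for example "response.vehicles") into a nested document."""
    node = document
    for key in root_node.split("."):
        node = node[key]
    return node


def flatten(obj, prefix="", sep="_"):
    """Collapse nested objects into one level, joining keys with a separator."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


document = {
    "response": {
        "status": "ok",
        "vehicles": [
            {"id": "v1", "position": {"lat": 34.05, "lon": -118.25}},
            {"id": "v2", "position": {"lat": 40.71, "lon": -74.0}},
        ],
    }
}

records = [flatten(item) for item in select_root(document, "response.vehicles")]
print(records)
```

Selecting a root node discards everything outside that node (here, `status`), while flattening turns each nested attribute into its own top-level field.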
Sampled data is not returned
If sampled data is not returned in Velocity, try any one of the following options:
- Verify your connection and configuration properties are correct.
- Click Derive schema again to resample when data is flowing or data is available.
- Provide your own samples by pasting records. Samples can be reviewed for their data format and to derive a valid schema.
- Manually define the format and schema of the data.
Identify key fields
The next step in configuring input data for your new feed or data source is to identify key fields. Key fields are used to parse feature geometry from fields, construct dates from strings, specify start and end time fields, and designate a field as a Track ID.
Location
For many feed and data source types, you will need to define how Velocity determines the geometry of features from observations or records. Geometry can be defined using a single geometry field or X/Y fields. Alternatively, you can simply load tabular data without location and not specify any geometry fields.
For details on configuring the Location properties, see Define location properties.
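As a simple picture of building geometry from X/Y fields, the Python sketch below turns a record into a GeoJSON-style point feature, or leaves the geometry empty for tabular data. The function `to_feature`, its parameters, and the field names `lon` and `lat` are hypothetical and used only for this example.

```python
def to_feature(record, x_field=None, y_field=None):
    """Build a GeoJSON-style feature; geometry is None when no X/Y fields are given."""
    geometry = None
    if x_field is not None and y_field is not None:
        geometry = {
            "type": "Point",
            # GeoJSON orders coordinates as [x, y], that is [longitude, latitude].
            "coordinates": [float(record[x_field]), float(record[y_field])],
        }
    return {"type": "Feature", "geometry": geometry, "properties": dict(record)}


record = {"id": "bus-7", "lon": "-118.25", "lat": "34.05"}

spatial = to_feature(record, x_field="lon", y_field="lat")
tabular = to_feature(record)
print(spatial["geometry"])
print(tabular["geometry"])
```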
Date & Time
The features in a feed or data source may or may not have datetime fields available. If you specify that the data has date fields, you may also need to specify the date format. The two choices are Epoch Values or Other (String). If Other (String) is chosen, a Date Formatting string must be specified so Velocity knows how to parse the string into a date.
Additionally, you will have the option to choose a Start Time field. It is not required to set a start time or end time to analyze and process data. However, some tools in real-time and big data analytics require a start time or a start time and end time to be identified in order to perform temporal analysis.
For details on the configuration of the Date & Time properties, see Define date and time properties.
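The difference between epoch values and string dates can be seen in a short Python sketch. Note the assumptions: `parse_epoch` and `parse_string` are hypothetical helpers, the epoch unit is taken to be milliseconds by default, string dates are treated as UTC, and the format string uses Python's `strptime` directives, which are not the same syntax as Velocity's date formatting strings.

```python
from datetime import datetime, timezone


def parse_epoch(value, unit="ms"):
    """Convert an epoch value in milliseconds ("ms") or seconds ("s") to a UTC datetime."""
    seconds = value / 1000 if unit == "ms" else value
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def parse_string(value, fmt):
    """Parse a date string with an explicit format, treating it as UTC."""
    return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)


from_epoch = parse_epoch(1700000000000)
from_string = parse_string("2023-11-14 22:13:20", "%Y-%m-%d %H:%M:%S")

# Both inputs describe the same instant.
print(from_epoch.isoformat())
print(from_string.isoformat())
```

An epoch value carries its own meaning, whereas a string is ambiguous until a format is supplied, which is why a formatting string is required for the Other (String) option.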
Track ID
The Track ID key field is a unique identifier in the data that relates features to specific entities. For example, a truck might be identified by its license plate number or an aircraft by an assigned flight number. These identifiers can be used as Track IDs to track the features associated with a particular real-world entity or set of incidents.
It is not required to set a Track ID field to analyze and process data. However, some tools in real-time and big data analytics require a Track ID to be identified on the feed or data source.
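To illustrate what a Track ID makes possible, the following Python sketch groups observations by an identifier field and orders each group by time. The function `group_by_track`, the field names `plate` and `timestamp`, and the sample observations are assumptions for this example only.

```python
from collections import defaultdict


def group_by_track(observations, track_id_field, time_field="timestamp"):
    """Group observations by Track ID and sort each track chronologically."""
    tracks = defaultdict(list)
    for observation in observations:
        tracks[observation[track_id_field]].append(observation)
    for track in tracks.values():
        track.sort(key=lambda obs: obs[time_field])
    return dict(tracks)


observations = [
    {"plate": "ABC-123", "timestamp": 20, "speed": 55},
    {"plate": "XYZ-789", "timestamp": 10, "speed": 40},
    {"plate": "ABC-123", "timestamp": 10, "speed": 50},
]

tracks = group_by_track(observations, track_id_field="plate")
for plate, track in tracks.items():
    print(plate, [obs["timestamp"] for obs in track])
```

Without an identifier to group on, the three observations are unrelated points; with one, they become two tracks whose change over time can be analyzed.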
Schedule polling interval
While many feeds are streaming in nature, some feed types require retrieving the data at regular intervals. The defined interval determines how often the feed will reach out to the source to retrieve data. The following feed types allow you to set a polling interval:
- Feature Layer
- Website (Poll)
For details on the configuration and important considerations of the feed polling interval, see Schedule feed polling interval.
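Conceptually, a polling interval works like the loop in this Python sketch: retrieve, wait, and repeat. The function `poll` and its parameters are hypothetical, and the `max_polls` limit and injectable `sleep` exist only so the example terminates and can be run without waiting.

```python
import time


def poll(fetch, interval_seconds, max_polls, sleep=time.sleep):
    """Call fetch() max_polls times, waiting interval_seconds between calls."""
    results = []
    for attempt in range(max_polls):
        results.append(fetch())
        if attempt < max_polls - 1:
            sleep(interval_seconds)
    return results


# Stand-ins for a real data source and a real clock.
counter = {"calls": 0}
waits = []


def fake_fetch():
    counter["calls"] += 1
    return f"batch-{counter['calls']}"


batches = poll(fake_fetch, interval_seconds=30, max_polls=3, sleep=waits.append)
print(batches)
print(waits)
```

A shorter interval retrieves data more often at the cost of more requests to the source, which is the trade-off to weigh when choosing the interval.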
Save
When configuring a feed, the final step is to provide a feed name and optionally a feed summary. When complete, save the feed.