Configure input data

ArcGIS Velocity ingests data for real-time and big data analytics using feeds or data sources. A feed is a real-time stream of data; a data source loads static or near real-time data once when the real-time analytic starts, making it available for rapid joins, enrichment, and geofencing. For more information, see Work with feeds and Work with data sources.

Use a feed to ingest real-time data or to serve as join data for analytic tools in real-time analytics.

Velocity provides a streamlined and contextual workflow to optimize the user experience when configuring input data from a feed or data source. This configuration workflow is common across the various feed and source types.

Set connection and configuration parameters

The first step when configuring a feed or data source is to define the required connection and configuration parameters so that Velocity can connect to the data. The available parameters depend on the feed or data source type.

For example, when configuring a Kafka feed, fill in the Broker and Topic parameters to connect to the data. When configuring an Amazon S3 data source, you must provide all relevant connection parameter values to establish a successful connection.

configuration parameters

Next, Velocity validates the connection using the configuration parameters provided, then attempts to sample the data and derive its schema. If the connection fails or a schema cannot be derived, update the configuration parameters and try again.
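Conceptually, this validation and sampling step resembles connecting to the source and reading a handful of records. The following sketch uses the kafka-python package with placeholder broker and topic values; it illustrates the idea only and is not how Velocity itself connects.

```python
# Minimal sketch of connection validation and data sampling for a Kafka source.
# The broker address, topic name, and sample size are placeholder values.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "vehicle-positions",                            # Topic parameter
    bootstrap_servers="broker.example.com:9092",    # Broker parameter
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,                       # stop waiting if no data arrives
)

samples = []
for message in consumer:
    samples.append(message.value.decode("utf-8"))
    if len(samples) >= 10:                          # collect a small sample to derive a schema
        break

consumer.close()
print(samples)
```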

Confirm schema

The Confirm Schema step displays the returned schema as well as a sample of the data. Depending on the data format, additional parameters are available to adjust data parsing to a valid schema.

For the Confirm Schema step, you can review and adjust the field names, field types, and data formats. Additionally, you can derive the data again to acquire new samples or derive schema after adjustments to the data format or the data format parameters. This ensures that Velocity can identify the format of the data being ingested by the feed or data source.

HTTP Poller Confirm Schema step

Automatic sampling and schema derivation

For the Confirm Schema step, Velocity connects to the specified feed or data source using the connection and configuration parameters you set in the previous step and retrieves sample data.

From the sample data, Velocity automatically derives the data format and the schema, which consists of the field names and field types. For some data formats, geometry and date and time key fields are also identified.
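Schema derivation can be thought of as inspecting the sampled values and assigning each field the narrowest type that fits them all. The following is a simplified sketch with made-up records and type rules; it is not Velocity's actual derivation logic.

```python
# Illustration of deriving field names and types from sampled records.
# The records and the type-promotion rules are simplified assumptions.
import json

sample = [
    '{"plate": "ABC123", "speed": 52.5, "heading": 180, "ts": 1672531200000}',
    '{"plate": "XYZ789", "speed": 47.0, "heading": 90,  "ts": 1672531205000}',
]

def derive_type(values):
    if all(isinstance(v, bool) for v in values):
        return "Boolean"
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "Int64"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "Float64"
    return "String"

records = [json.loads(r) for r in sample]
schema = {name: derive_type([r[name] for r in records]) for name in records[0]}
print(schema)  # {'plate': 'String', 'speed': 'Float64', 'heading': 'Int64', 'ts': 'Int64'}
```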

Change field types and field names

Velocity displays field types and field names as identified by schema derivation based on the acquired data sample.

You can make the following adjustments to the derived schema:

  • Change field types
    • Use the drop-down arrow next to the field name to change the field type.
    • You cannot change field types when using certain feed or data source types such as feature layer feeds or data sources.
    • Use caution when changing the field type because of the following (see the sketch after this list):
      • Any field type can be changed to a string type field; however, if you attempt to change a string type field containing letters to an integer type field, an error occurs during data ingestion.
      • Changing fields from a float type (Float32 or Float64) to an integer type (Int32 or Int64) is not recommended. Changing field types is not intended for on-the-fly conversion of numerical values. For some formats, downgrading from a float to an integer can cause the value to be skipped entirely.
  • Change field names
    • Modify the field name as necessary.
  • Disable fields
    • To disable a field, uncheck the check box next to the field type. The field will be ignored when data is ingested from the source.
    • As a best practice, disable unnecessary fields to improve performance with high-velocity and high-volume data.
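The field type cautions above can be sketched in plain Python with example values only; this is not how Velocity performs ingestion, just an illustration of why the conversions fail or lose data.

```python
# A string field that contains letters cannot be parsed as an integer,
# so changing that field to an integer type causes an ingestion error.
try:
    int("ABC123")
except ValueError as err:
    print("ingestion error:", err)

# Converting a float value to an integer type discards the fractional part,
# so treating Float32/Float64 data as Int32/Int64 is not a safe on-the-fly conversion.
print(int(52.9))  # 52 -- precision is lost
```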

Note:

Modify the data format parameters and resample the schema before adjusting field types and field names. If you change the data format or data format parameters and derive the schema again, any field adjustments you made are overwritten.

Change the data format and data format parameters

Velocity can consume data from various feed and data source types in a variety of data formats. Some feed and data source types such as HTTP Poller can consume data in various formats. Other feed and data source types such as Feature Layer have a fixed data format.

The following are supported data formats:

  • Delimited
  • JSON
  • GeoJSON
  • EsriJSON
  • RSS
  • GeoRSS
  • Shapefile (big data analytics only)
  • Parquet (big data analytics only)

Velocity automatically attempts to derive the format of the data. However, you can change the derived data format as necessary.

Additionally, some data formats have parameters that you can adjust regarding how Velocity parses the data into a schema. For example, the delimited data format has two parameters, field delimiter and header row.
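As an illustration of what those two parameters control, the following sketch parses a small made-up delimited sample with Python's csv module; it is not the Velocity parser.

```python
# Illustration of the delimited-format parameters: field delimiter and header row.
import csv
import io

raw = "plate|speed|heading\nABC123|52.5|180\nXYZ789|47.0|90\n"

field_delimiter = "|"    # field delimiter parameter
has_header_row = True    # header row parameter

reader = csv.reader(io.StringIO(raw), delimiter=field_delimiter)
rows = list(reader)

if has_header_row:
    field_names, data_rows = rows[0], rows[1:]
else:
    field_names = [f"field_{i}" for i in range(len(rows[0]))]
    data_rows = rows

print(field_names)  # ['plate', 'speed', 'heading']
print(data_rows)
```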

For details on the various formats and parameters associated with each data format, see Supported data formats.

Change data format parameters and derive schema

Using the sampled data, Velocity attempts to determine the format, schema, and parameters of the data.

You can modify the data format parameters or specify a different data format. To do this, change the data format property and click Derive schema to derive the data again according to the changes you made. The parameters update accordingly based on the derived data.

For example, if you're connecting to a JSON source with multilevel nested JSON and you only want to collect data from a specific JSON node, or you want to flatten multilevel JSON to retrieve all attribute values, you can use the root node and flatten parameters to control how Velocity parses the JSON data.
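A minimal sketch of what a root node and flattening do to nested JSON follows, using a made-up payload and a hypothetical features root node; the flattening rules shown are simplified assumptions, not Velocity's implementation.

```python
# Sketch of the effect of a root node and flattening on nested JSON.
import json

payload = json.loads("""
{
  "metadata": {"count": 1},
  "features": [
    {"id": "ABC123", "attributes": {"speed": 52.5, "status": {"code": 2}}}
  ]
}
""")

# Root node: start parsing at the "features" array instead of the document root.
records = payload["features"]

def flatten(obj, prefix=""):
    """Flatten nested objects into dotted keys, e.g. attributes.status.code."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, f"{name}."))
        else:
            flat[name] = value
    return flat

print([flatten(r) for r in records])
# [{'id': 'ABC123', 'attributes.speed': 52.5, 'attributes.status.code': 2}]
```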

Sampled data is not returned

If sampled data is not returned in Velocity, try any of the following options:

  • Verify that the connection and configuration parameters are correct.
  • Click Derive schema to resample when data is flowing or available.
  • Provide your own sample by pasting in copied records; the sample is used to identify the data format and derive a valid schema.
  • Manually define the format and schema of the data.

Identify key fields

The next step in configuring input data for the new feed or data source is to identify key fields. Key fields are used to parse feature geometry from fields, construct dates from strings, specify start and end time fields, and designate a field as a Track ID.

Location

For many feed and data source types, you must define how Velocity determines the geometry of features from observations or records. Geometry can be defined using a single geometry field or X/Y fields. Alternatively, you can load tabular data without location and not specify geometry fields.
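As a simple illustration, the sketch below builds a point geometry from X/Y fields in a record; the field names and the WGS84 spatial reference are example assumptions, not values Velocity requires.

```python
# Illustration of constructing point geometry from X/Y fields in a record.
record = {"plate": "ABC123", "longitude": -117.19, "latitude": 34.05}

x_field, y_field = "longitude", "latitude"

geometry = {
    "x": float(record[x_field]),
    "y": float(record[y_field]),
    "spatialReference": {"wkid": 4326},   # WGS84, assumed for this example
}
print(geometry)
```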

For details on configuring the location parameters, see Location parameters.

Date and time

The features in a feed or data source may have date and time fields available. If you specify that the data has date fields, you may also need to specify the date format. The two options are Epoch Values and Other (String). If you choose Other (String), you must specify a Date Formatting string value so Velocity can parse the string into a date.
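The difference between the two options can be sketched as follows; Python's strptime codes stand in for the date formatting string, and the exact format tokens Velocity expects may differ from these.

```python
# Illustration of the two date options: epoch values vs. a formatted string.
from datetime import datetime, timezone

# Epoch Values: milliseconds since January 1, 1970 UTC.
epoch_ms = 1672531200000
print(datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc))

# Other (String): a date formatting string tells the parser how to read the text.
date_text = "2023-01-01 00:00:00"
print(datetime.strptime(date_text, "%Y-%m-%d %H:%M:%S"))
```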

Additionally, you can choose a Start Time key field value. You do not need to set a start time or end time to analyze and process data. However, some tools in real-time and big data analytics require a start time or a start time and an end time to be identified to perform temporal analysis.

For details about the configuration of the date and time parameters, see Date and time parameters.

Tracking

The Track ID key field is a unique identifier in the data that relates features to specific entities. For example, a truck might be identified by its license plate number or an aircraft by an assigned flight number. These identifiers can be used as Track IDs to track the features associated with a particular real-world entity or a set of incidents.
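Conceptually, a Track ID groups observations into per-entity tracks, as in the following sketch with made-up observations and a hypothetical plate field as the Track ID.

```python
# Illustration of how a Track ID groups observations into per-entity tracks.
from collections import defaultdict

observations = [
    {"plate": "ABC123", "speed": 52.5, "ts": 1672531200000},
    {"plate": "XYZ789", "speed": 47.0, "ts": 1672531201000},
    {"plate": "ABC123", "speed": 55.0, "ts": 1672531205000},
]

tracks = defaultdict(list)
for obs in observations:
    tracks[obs["plate"]].append(obs)      # Track ID field: plate

for track_id, points in tracks.items():
    print(track_id, len(points), "observations")
```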

You do not need to set a Track ID field to analyze and process data. However, some tools in real-time and big data analytics require a Track ID to be identified for the feed or data source.

Schedule polling interval

While many feeds stream data, some feed types require the data to be retrieved at regular intervals. The defined interval determines how often the feed connects to the source to retrieve data. You can set a polling interval for feed types that poll their source, such as HTTP Poller.
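Conceptually, a polled feed behaves like the sketch below; the URL, the response format, and the 60-second interval are placeholder assumptions, not a Velocity implementation.

```python
# Sketch of what a polling interval means for a polled feed type.
import time
import requests

polling_interval_seconds = 60
url = "https://example.com/latest-observations.json"

for _ in range(3):                          # a few polls for illustration
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    records = response.json()
    print(f"retrieved {len(records)} records")
    time.sleep(polling_interval_seconds)    # wait until the next poll
```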

For details on the configuration and considerations of the feed polling interval, see Schedule a feed polling interval.

Save

The final step is to provide a name and, optionally, a summary for the feed or data source, and then save it.