ArcGIS Data Pipelines performs batch processing on stored vector and tabular data, such as data in a feature layer or in a cloud store such as Amazon S3 or Google BigQuery. Data Pipelines provides data preparation and engineering capabilities so you can blend and build data and integrate it into ArcGIS. Processing is performed with tools that are grouped into the following toolsets:
- Clean—Clean the data. For example, you can remove unnecessary fields. You can also modify the fields or fill missing values.
- Construct—Create fields that are derived from existing fields or properties of the layer. For example, you can add and calculate a new field; standardize, transform, or reclassify an existing field; and add a field based on the input layer's geometry.
- Format—Change the format of the fields or reorganize the fields in the table or feature class. For example, you can convert time fields, encode categorical fields, or reduce the dimensions of existing fields.
- Integrate—Integrate or add data from another data source to the input table or feature class. For example, you can join fields or add fields by enriching the data.
- Output datasets—Choose the output type to write and store the result.
Examples
The following are example scenarios in which Data Pipelines can be used:
- As a data scientist, you can combine disparate datasets and calculate variables as fields using ArcGIS Arcade functions.
- As a GIS analyst, you can build and share reproducible data preparation workflows.
- As an environmental scientist, you can combine and standardize field information that is stored as a collection of .csv files.
Tools
The tables in the sections below describe the tools in the various categories in the Data Pipelines editor.
Clean
The following tools are in the Clean category:
Tool | Description |
---|---|
Filter by attribute | Returns a subset of a dataset based on a query. The output is a new dataset containing only the records that meet the condition specified in the query. |
Filter by extent | Returns a subset of a dataset based on a specified spatial extent. The output is a new dataset containing only the records that are geographically within the specified extent. |
Remove duplicates | Removes duplicate records based on one or more key fields. The output is a new dataset with no duplicate records. |
Select fields | Maintains one or more specified fields in the output dataset. The output is a new dataset containing only the specified fields. |
Simplify geometry | Reduces the complexity of polylines or polygons by removing unnecessary vertices and maintaining only the most critical vertices. |
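To make the record-level behavior of the attribute-based Clean tools concrete, the following is a minimal Python sketch. It is not how Data Pipelines is implemented or scripted; the helper names (`filter_by_attribute`, `remove_duplicates`, `select_fields`) and the sample records are hypothetical and only mirror the tool descriptions above. Keeping the first record of each duplicate group is an assumption of this sketch.

```python
# Illustrative only: plain-Python analogues of three Clean tools.
# Helper names and sample records are hypothetical, not part of Data Pipelines.

def filter_by_attribute(records, predicate):
    """Return only the records that satisfy the query predicate."""
    return [r for r in records if predicate(r)]

def remove_duplicates(records, key_fields):
    """Return records with one entry per combination of key field values.

    Assumption: the first record seen for each key is the one kept.
    """
    seen = set()
    result = []
    for r in records:
        key = tuple(r[f] for f in key_fields)
        if key not in seen:
            seen.add(key)
            result.append(r)
    return result

def select_fields(records, fields):
    """Return new records that contain only the specified fields."""
    return [{f: r[f] for f in fields} for r in records]

# Made-up sample data.
records = [
    {"id": 1, "site": "A", "count": 12},
    {"id": 2, "site": "B", "count": 3},
    {"id": 1, "site": "A", "count": 12},
    {"id": 3, "site": "C", "count": 25},
]

filtered = filter_by_attribute(records, lambda r: r["count"] >= 10)
unique = remove_duplicates(filtered, ["id"])
cleaned = select_fields(unique, ["id", "site"])
print(cleaned)
```

Each helper returns a new list rather than modifying its input, which matches the descriptions above: the output of each tool is a new dataset.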
Construct
The following tools are in the Construct category:
Tool | Description |
---|---|
Calculate field | Calculates field values for a new or existing field. You can use Arcade functions to define the calculation expression. |
Create date time | Creates a date field using existing field values. |
Create geometry | Creates a geometry field using one or more fields. |
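The Construct tools all derive a new field from existing field values. As a rough illustration, the Python sketch below mimics that idea on a list of records. It is not Arcade and not the Data Pipelines implementation; the helper names, the field names, the date format string, and the GeoJSON-like point dictionary are all assumptions made for the example.

```python
# Illustrative only: plain-Python analogues of the Construct tools.
# Helper names, field names, and the sample record are hypothetical.
from datetime import datetime

def calculate_field(records, field, expression):
    """Add or overwrite a field using a per-record expression."""
    return [{**r, field: expression(r)} for r in records]

def create_date_time(records, field, source_field, fmt):
    """Add a datetime field parsed from an existing string field."""
    return [{**r, field: datetime.strptime(r[source_field], fmt)} for r in records]

def create_point_geometry(records, x_field, y_field, field="geometry"):
    """Add a GeoJSON-like point built from two coordinate fields."""
    return [
        {**r, field: {"type": "Point", "coordinates": [r[x_field], r[y_field]]}}
        for r in records
    ]

# Made-up sample data.
records = [
    {"name": "a", "x": 10.0, "y": 60.0, "observed": "2024-03-01", "count": 4, "area": 2.0},
]

with_density = calculate_field(records, "density", lambda r: r["count"] / r["area"])
with_date = create_date_time(with_density, "observed_at", "observed", "%Y-%m-%d")
with_geom = create_point_geometry(with_date, "x", "y")
print(with_geom[0]["density"], with_geom[0]["observed_at"], with_geom[0]["geometry"])
```

In Data Pipelines itself, the calculation expression would be written with Arcade functions, as noted in the table; the Python lambda here only stands in for that expression.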
Format
The following tools are in the Format category:
Tool | Description |
---|---|
Map fields | Transforms a dataset's schema by matching it to a target schema. |
Pivot | Converts a long dataset to a wide dataset by using distinct values from an existing field to create new fields. |
Project geometry | Projects a geometry field to a new spatial reference. |
Unnest field | Returns values stored in array, map, or struct fields as new fields or rows. |
Update fields | Updates a field name or field type. |
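The long-to-wide conversion and the unnesting of array values are easier to see on a small example. The Python sketch below is an assumption-laden illustration, not the Data Pipelines implementation: the helper names `pivot` and `unnest_to_rows` and the sample records are hypothetical, and the pivot does no aggregation (if a key and name combination repeats, the last value wins).

```python
# Illustrative only: plain-Python analogues of the Pivot and Unnest field tools.
# Helper names and sample records are hypothetical.

def pivot(records, key_field, names_field, values_field):
    """Convert long records to wide records.

    Each distinct value in names_field becomes a new field.
    Assumption: no aggregation; a repeated (key, name) pair keeps the last value.
    """
    wide = {}
    for r in records:
        row = wide.setdefault(r[key_field], {key_field: r[key_field]})
        row[r[names_field]] = r[values_field]
    return list(wide.values())

def unnest_to_rows(records, array_field):
    """Return one record per element of an array field."""
    return [{**r, array_field: value} for r in records for value in r[array_field]]

# Made-up sample data.
long_records = [
    {"station": "s1", "measure": "temp", "value": 4.0},
    {"station": "s1", "measure": "wind", "value": 7.5},
    {"station": "s2", "measure": "temp", "value": 6.0},
]
wide_records = pivot(long_records, "station", "measure", "value")
print(wide_records)

nested = [{"station": "s1", "tags": ["coastal", "north"]}]
unnested = unnest_to_rows(nested, "tags")
print(unnested)
```

Note that station `s2` has no `wind` field in the wide result because no such value exists in the long input; how missing combinations are filled in a real pipeline is not covered by this sketch.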
Integrate
The following tools are in the Integrate category:
Tool | Description |
---|---|
Dissolve | Finds polygons or polylines that overlap, share a common boundary, or share common attributes, and merges them to form a single polygon or polyline. |
Join | Joins datasets based on the specified relationships. Datasets can be joined using matching attributes, spatial relationships, temporal relationships, or any combination of the three. |
Merge | Combines one or more datasets into a single, new dataset. You can combine point, line, polygon, or tabular datasets. |
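The difference between joining and merging can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Data Pipelines implementation: the helper names are hypothetical, only an attribute-based join is shown (spatial and temporal relationships are omitted), the join keeps matching records only, and the merge fills fields that a dataset lacks with `None`.

```python
# Illustrative only: plain-Python analogues of an attribute join and a merge.
# Helper names and sample records are hypothetical.

def join_by_attribute(left, right, left_key, right_key):
    """Join records whose key values match.

    Assumptions: only matching records are kept, and left-side values
    win when both sides share a field name.
    """
    index = {}
    for r in right:
        index.setdefault(r[right_key], []).append(r)
    joined = []
    for l in left:
        for r in index.get(l[left_key], []):
            joined.append({**r, **l})
    return joined

def merge(*datasets):
    """Stack datasets into one, using the union of their fields.

    Assumption: fields missing from a record are filled with None.
    """
    fields = []
    for dataset in datasets:
        for record in dataset:
            for name in record:
                if name not in fields:
                    fields.append(name)
    return [{f: r.get(f) for f in fields} for dataset in datasets for r in dataset]

# Made-up sample data.
sites = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
readings = [{"site_id": 1, "value": 5}]

joined = join_by_attribute(sites, readings, "id", "site_id")
merged = merge([{"id": 1, "name": "a"}], [{"id": 3, "value": 9}])
print(joined)
print(merged)
```

A join widens records by attaching fields from a related dataset, whereas a merge lengthens the dataset by appending records.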
Output dataset
The following output dataset is supported:
Tool | Description |
---|---|
Feature layer | Writes data pipeline datasets to a hosted feature layer or hosted table. You can create a feature layer or table, replace the data in an existing feature layer or table, or add and update records in an existing feature layer or table. |
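The three write options described above (create, replace, and add and update) differ in how incoming records are reconciled with existing ones. The Python sketch below illustrates one plausible reading of that behavior; it is not the Data Pipelines implementation. The function name `write_output`, the mode strings, and the use of a unique key field to match records for updates are assumptions made for the example.

```python
# Illustrative only: a plain-Python sketch of three output write modes.
# The function name, mode strings, and key-field matching are hypothetical.

def write_output(existing, incoming, mode, key_field=None):
    """Return the records a target would hold after writing.

    "create" and "replace" both result in only the incoming records.
    "add_and_update" updates records whose key matches and appends the rest.
    """
    if mode in ("create", "replace"):
        return list(incoming)
    if mode == "add_and_update":
        if key_field is None:
            raise ValueError("add_and_update needs a key field to match records")
        by_key = {r[key_field]: r for r in existing}
        for r in incoming:
            by_key[r[key_field]] = r
        return list(by_key.values())
    raise ValueError(f"unknown mode: {mode}")

# Made-up sample data.
existing = [{"id": 1, "status": "old"}, {"id": 2, "status": "keep"}]
incoming = [{"id": 1, "status": "new"}, {"id": 3, "status": "added"}]

replaced = write_output(existing, incoming, "replace")
upserted = write_output(existing, incoming, "add_and_update", key_field="id")
print(replaced)
print(upserted)
```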