Data processing—ArcGIS Data Pipelines

ArcGIS Data Pipelines performs batch processing on stored vector and tabular data, such as data in a feature layer or in a cloud or object store such as Amazon S3 or Google BigQuery. Data Pipelines provides data preparation and engineering capabilities so you can blend and build data and integrate it into ArcGIS. Processing is performed with tools that are grouped into the following toolsets:

Clean—Clean the data. For example, you can remove unnecessary fields. You can also modify the fields or fill missing values.
Construct—Create fields that are derived from existing fields or properties of the layer. For example, you can add and calculate a new field; standardize, transform, or reclassify an existing field; and add a field based on the input layer's geometry.
Format—Change the format of the fields or reorganize the fields in the table or feature class. For example, you can convert time fields, encode categorical fields, or reduce the dimensions of existing fields.
Integrate—Integrate or add data from another data source to the input table or feature class. For example, you can join fields or add fields by enriching the data.
Output datasets—Choose the output type to write and store the result.

Examples

The following are example scenarios in which Data Pipelines can be used:

As a data scientist, you can combine disparate datasets and calculate variables as fields using ArcGIS Arcade functions.
As a GIS analyst, you can build and share reproducible data preparation workflows.
As an environmental scientist, you can combine and standardize field information that is stored as a collection of .csv files.

Tools

The tables in the sections below describe the tools in the various categories in the Data Pipelines editor.

Clean

The following tools are in the Clean category:


Tool	Description
Clip	The Clip tool extracts input records that overlay the clip records.
Filter by attribute	The Filter by attribute tool returns a subset of a dataset based on a query. The output is a new dataset containing only the records that meet the condition specified in the query.
Filter by extent	The Filter by extent tool returns a subset of a dataset based on a specified spatial extent. The output is a new dataset containing only the records that are geographically within the specified extent.
Remove duplicates	The Remove duplicates tool removes duplicate records based on one or more key fields. The output is a new dataset with no duplicate records.
Select fields	The Select fields tool maintains one or more specified fields in the output dataset. The output is a new dataset containing only the specified fields.
Simplify geometry	The Simplify geometry tool reduces the complexity of polylines or polygons by removing unnecessary vertices and keeping only the most critical ones.
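
The attribute-based Clean tools above can be pictured as simple operations on a list of records. The following is a minimal Python sketch of that behavior, not the Data Pipelines implementation or API. The function names, the record layout, and the rule that the first record of each duplicate group is kept are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def filter_by_attribute(records: List[Record],
                        predicate: Callable[[Record], bool]) -> List[Record]:
    """Return a new list with only the records that meet the condition."""
    return [r for r in records if predicate(r)]


def remove_duplicates(records: List[Record], key_fields: List[str]) -> List[Record]:
    """Keep one record per combination of key-field values.

    Assumption: the first record seen for each key is the one kept.
    """
    seen = set()
    unique = []
    for r in records:
        key = tuple(r.get(f) for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


def select_fields(records: List[Record], fields: List[str]) -> List[Record]:
    """Return new records that contain only the specified fields."""
    return [{f: r[f] for f in fields if f in r} for r in records]


# Hypothetical sample data
rows = [
    {"id": 1, "city": "A", "pop": 10},
    {"id": 1, "city": "A", "pop": 12},
    {"id": 2, "city": "B", "pop": 5},
]
deduplicated = remove_duplicates(rows, ["id"])           # two records remain
large = filter_by_attribute(rows, lambda r: r["pop"] > 8)
ids_only = select_fields(rows, ["id"])
```

Each function returns a new list and leaves its input untouched, which mirrors the "output is a new dataset" wording in the table.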

Construct

The following tools are in the Construct category:


Tool	Description
Calculate field	The Calculate field tool calculates field values for a new or existing field. You can use Arcade functions to define the calculation expression.
Create date time	The Create date time tool creates a date field using existing field values.
Create geometry	The Create geometry tool creates a geometry field using one or more fields.
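
As a rough illustration of what the Construct tools produce, the sketch below derives a date field and a point geometry field from existing field values in plain Python. It is not the Data Pipelines API. The function names, the format string, and the dictionary used to represent a point are hypothetical choices for this example only.

```python
from datetime import datetime
from typing import Any, Dict, List

Record = Dict[str, Any]


def create_date_time(records: List[Record], source_field: str,
                     fmt: str, new_field: str) -> List[Record]:
    """Add a date field parsed from an existing string field."""
    out = []
    for r in records:
        row = dict(r)  # copy so the input records are not modified
        row[new_field] = datetime.strptime(r[source_field], fmt)
        out.append(row)
    return out


def create_point_geometry(records: List[Record], x_field: str, y_field: str,
                          new_field: str = "geometry") -> List[Record]:
    """Add a point geometry field built from two coordinate fields.

    Assumption: a point is represented as a dict with "x" and "y" keys.
    """
    out = []
    for r in records:
        row = dict(r)
        row[new_field] = {"x": float(r[x_field]), "y": float(r[y_field])}
        out.append(row)
    return out


# Hypothetical sample data, for example rows read from a .csv file
rows = [{"when": "2024-03-15 08:30:00", "lon": "-117.19", "lat": "34.05"}]
with_dates = create_date_time(rows, "when", "%Y-%m-%d %H:%M:%S", "when_dt")
with_points = create_point_geometry(rows, "lon", "lat")
```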

Format

The following tools are in the Format category:


Tool	Description
Map fields	The Map fields tool transforms a dataset's schema by matching it to a target schema.
Pivot	The Pivot tool converts a long dataset to a wide dataset by using distinct values from an existing field to create new fields.
Project geometry	The Project geometry tool projects a geometry field to a new spatial reference.
Unnest field	The Unnest field tool returns values stored in array, map, or struct fields as new fields or rows.
Update fields	The Update fields tool updates a field name or field type.
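
The long-to-wide conversion that the Pivot tool describes can be sketched in a few lines of Python. This is a conceptual illustration only, not the tool's implementation. The function name and parameters are hypothetical, and the sketch assumes one value per combination of key and pivot value (if there are several, the last one wins, whereas a real tool may aggregate them instead).

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def pivot(records: List[Record], key_field: str,
          names_field: str, values_field: str) -> List[Record]:
    """Turn distinct values of names_field into new fields, one row per key."""
    # Collect the distinct values that become new field names, in first-seen order.
    new_fields: List[str] = []
    for r in records:
        name = str(r[names_field])
        if name not in new_fields:
            new_fields.append(name)

    wide: Dict[Any, Record] = {}
    for r in records:
        key = r[key_field]
        if key not in wide:
            # Start each output row with every new field set to None.
            wide[key] = {key_field: key, **{n: None for n in new_fields}}
        wide[key][str(r[names_field])] = r[values_field]
    return list(wide.values())


# Hypothetical long-format sample data
long_rows = [
    {"station": "S1", "metric": "temp", "value": 21.5},
    {"station": "S1", "metric": "rain", "value": 3.0},
    {"station": "S2", "metric": "temp", "value": 19.0},
]
wide_rows = pivot(long_rows, "station", "metric", "value")
```

In this sample, the three long rows become two wide rows with the fields station, temp, and rain. Station S2 has no rain record, so its rain field is None.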

Integrate

The following tools are in the Integrate category:


Tool	Description
Dissolve	The Dissolve tool finds polygons or polylines that overlap, share a common boundary, or share common attributes and merges them into a single polygon or polyline.
Join	The Join tool joins datasets based on the specified relationships. Datasets can be joined using matching attributes, spatial relationships, temporal relationships, or any combination of the three.
Merge	The Merge tool combines one or more datasets into a single, new dataset. You can combine point, line, polygon, or tabular datasets.
Summarize attributes	The Summarize attributes tool aggregates records and calculates statistics. You can aggregate all records, or you can aggregate based on matching values from one or more fields.
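
To make the attribute join and the grouped summary concrete, the following Python sketch shows one plausible reading of both operations. It is not the Data Pipelines API. The function names are hypothetical, the join is assumed to be an inner join in which left-side values win on field-name collisions, and the statistics calculated (count, sum, mean) are chosen for illustration. Spatial and temporal joins are not covered.

```python
from collections import defaultdict
from typing import Any, Dict, List

Record = Dict[str, Any]


def join_by_attribute(left: List[Record], right: List[Record],
                      left_key: str, right_key: str) -> List[Record]:
    """Inner join on matching attribute values (assumed join type)."""
    index = defaultdict(list)
    for r in right:
        index[r[right_key]].append(r)
    joined = []
    for l in left:
        for r in index.get(l[left_key], []):
            joined.append({**r, **l})  # left values win on name collisions
    return joined


def summarize_attributes(records: List[Record], group_fields: List[str],
                         value_field: str) -> List[Record]:
    """Aggregate records and calculate count, sum, and mean per group.

    An empty group_fields list aggregates all records into one row.
    """
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[f] for f in group_fields)].append(r[value_field])
    summary = []
    for key, values in groups.items():
        row = dict(zip(group_fields, key))
        row["count"] = len(values)
        row["sum"] = sum(values)
        row["mean"] = sum(values) / len(values)
        summary.append(row)
    return summary


# Hypothetical sample data
sales = [
    {"region": "N", "amount": 10},
    {"region": "N", "amount": 30},
    {"region": "S", "amount": 5},
]
regions = [{"code": "N", "name": "North"}]

by_region = summarize_attributes(sales, ["region"], "amount")
overall = summarize_attributes(sales, [], "amount")
named = join_by_attribute(sales, regions, "region", "code")
```

Passing an empty list of group fields corresponds to aggregating all records, and passing one or more fields corresponds to aggregating on matching values.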

Outputs

The following output type is supported:


Output	Description
Feature layer	The Feature layer output writes data pipeline datasets to a hosted feature layer or hosted table. You can create a feature layer or table, replace the data in an existing feature layer or table, or add and update records in an existing feature layer or table.
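
The difference between replacing the data and adding and updating records can be sketched in plain Python as follows. This is a conceptual illustration, not how Data Pipelines writes to a feature layer. The function names are hypothetical, and matching records on a unique identifier field is an assumption of this sketch.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def replace_data(existing: List[Record], incoming: List[Record]) -> List[Record]:
    """Discard the existing records and keep only the incoming ones."""
    return [dict(r) for r in incoming]


def add_and_update(existing: List[Record], incoming: List[Record],
                   id_field: str) -> List[Record]:
    """Update records whose identifier already exists and add the rest.

    Assumption: id_field uniquely identifies a record in both inputs.
    """
    merged = {r[id_field]: dict(r) for r in existing}
    for r in incoming:
        # Existing values are kept unless the incoming record overrides them.
        merged[r[id_field]] = {**merged.get(r[id_field], {}), **r}
    return list(merged.values())


# Hypothetical sample data
layer_rows = [{"id": 1, "status": "open"}, {"id": 2, "status": "open"}]
pipeline_rows = [{"id": 2, "status": "closed"}, {"id": 3, "status": "open"}]

replaced = replace_data(layer_rows, pipeline_rows)
upserted = add_and_update(layer_rows, pipeline_rows, "id")
```

With this sample data, replacing leaves records 2 and 3 only, while adding and updating keeps record 1, updates record 2, and adds record 3.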
