Data processing

ArcGIS Data Pipelines performs batch processing on stored vector and tabular data, for example, data in a feature layer or in a cloud store such as Amazon S3 or Google BigQuery. Data Pipelines provides data preparation and engineering capabilities so you can blend and build data and integrate it into ArcGIS. The processing tools are grouped into the following toolset categories:

  • Clean—Clean the data. For example, you can remove unnecessary fields. You can also modify the fields or fill missing values.
  • Construct—Create fields that are derived from existing fields or properties of the layer. For example, you can add and calculate a new field; standardize, transform, or reclassify an existing field; and add a field based on the input layer's geometry.
  • Format—Change the format of the fields or reorganize the fields in the table or feature class. For example, you can convert time fields, encode categorical fields, or reduce the dimensions of existing fields.
  • Integrate—Integrate or add data from another data source to the input table or feature class. For example, you can join fields or add fields by enriching the data.
  • Output datasets—Choose the output type to write and store the result.

Examples

The following are example scenarios in which Data Pipelines can be used:

  • As a data scientist, you can combine disparate datasets and calculate variables as fields using ArcGIS Arcade functions.
  • As a GIS analyst, you can build and share reproducible data preparation workflows.
  • As an environmental scientist, you can combine and standardize field information that is stored as a collection of .csv files.

Tools

The tables in the sections below describe the tools in the various categories in the Data Pipelines editor.

Clean

The following tools are in the Clean category:

| Tool | Description |
| --- | --- |
| Filter by attribute | The Filter by attribute tool returns a subset of a dataset based on a query. The output is a new dataset containing only the records that meet the condition specified in the query. |
| Filter by extent | The Filter by extent tool returns a subset of a dataset based on a specified spatial extent. The output is a new dataset containing only the records that are geographically within the specified extent. |
| Remove duplicates | The Remove duplicates tool removes duplicate records based on one or more key fields. The output is a new dataset with no duplicate records. |
| Select fields | The Select fields tool keeps one or more specified fields in the output dataset. The output is a new dataset containing only the specified fields. |
| Simplify geometry | The Simplify geometry tool reduces the complexity of polylines or polygons by removing unnecessary vertices and keeping only the most critical vertices. |
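
To make the behavior of the attribute-based Clean tools concrete, the following is a minimal conceptual sketch in plain Python. It is not the Data Pipelines API (the tools are configured in the Data Pipelines editor); the function names, the sample records, and the choice to keep the first record of each duplicate group are all assumptions made for illustration.

```python
# Conceptual sketch only: plain Python over a list of dict records.
# Function names and data are hypothetical, not part of Data Pipelines.

def filter_by_attribute(records, predicate):
    """Return only the records for which predicate(record) is true."""
    return [r for r in records if predicate(r)]

def remove_duplicates(records, key_fields):
    """Keep the first record seen for each combination of key field values.

    Which duplicate survives is an assumption of this sketch.
    """
    seen = set()
    unique = []
    for r in records:
        key = tuple(r[f] for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique

def select_fields(records, fields):
    """Return new records containing only the specified fields."""
    return [{f: r[f] for f in fields} for r in records]

records = [
    {"id": 1, "city": "Oslo", "pop": 700000, "note": "a"},
    {"id": 2, "city": "Oslo", "pop": 700000, "note": "b"},
    {"id": 3, "city": "Bergen", "pop": 290000, "note": "c"},
    {"id": 4, "city": "Voss", "pop": 16000, "note": "d"},
]

large = filter_by_attribute(records, lambda r: r["pop"] > 100000)
unique = remove_duplicates(large, ["city"])
cleaned = select_fields(unique, ["city", "pop"])
```

Each function returns a new list and leaves its input unchanged, mirroring the descriptions above in which the output of each tool is a new dataset.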

Construct

The following tools are in the Construct category:

| Tool | Description |
| --- | --- |
| Calculate field | The Calculate field tool calculates field values for a new or existing field. You can use Arcade functions to define the calculation expression. |
| Create date time | The Create date time tool creates a date field using existing field values. |
| Create geometry | The Create geometry tool creates a geometry field using one or more fields. |
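
As a rough illustration of what the Construct tools derive, the sketch below models each one in plain Python. In Data Pipelines the Calculate field expression is written in Arcade, so the Python lambda here is only an analogy; the function names, the field names, the date format string, and the GeoJSON-like point dictionary are hypothetical choices for this example.

```python
from datetime import datetime

# Conceptual sketch only; none of these functions exist in Data Pipelines.

def calculate_field(records, field_name, expression):
    """Add or overwrite field_name with expression(record) for every record."""
    return [{**r, field_name: expression(r)} for r in records]

def create_date_time(records, source_field, target_field, fmt):
    """Parse a string field into a datetime stored in target_field."""
    return [
        {**r, target_field: datetime.strptime(r[source_field], fmt)}
        for r in records
    ]

def create_point_geometry(records, x_field, y_field, target_field="geometry"):
    """Build a GeoJSON-like point from two coordinate fields."""
    return [
        {**r, target_field: {
            "type": "Point",
            "coordinates": (float(r[x_field]), float(r[y_field])),
        }}
        for r in records
    ]

rows = [
    {"name": "A", "lon": "10.75", "lat": "59.91",
     "observed": "2024-03-01 08:30", "count": 4, "area_km2": 2.0},
]

rows = calculate_field(rows, "density", lambda r: r["count"] / r["area_km2"])
rows = create_date_time(rows, "observed", "observed_at", "%Y-%m-%d %H:%M")
rows = create_point_geometry(rows, "lon", "lat")
```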

Format

The following tools are in the Format category:

| Tool | Description |
| --- | --- |
| Map fields | The Map fields tool transforms a dataset's schema by matching it to a target schema. |
| Project geometry | The Project geometry tool projects a geometry field to a new spatial reference. |
| Unnest field | The Unnest field tool returns values stored in array, map, or struct fields as new fields or rows. |
| Update fields | The Update fields tool updates a field name or field type. |
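
The difference between unnesting to rows and unnesting to fields, and what updating a field name or type amounts to, can be sketched in plain Python as follows. This is an illustrative model rather than the tools' implementation: the function names, the `field_key` naming of unnested fields, and the treatment of empty arrays (they produce no rows) are assumptions.

```python
# Conceptual sketch only; names and conventions here are hypothetical.

def unnest_array_to_rows(records, field):
    """Emit one output row per element of the array stored in field.

    Records with an empty array produce no rows in this sketch.
    """
    out = []
    for r in records:
        for value in r[field]:
            out.append({**r, field: value})
    return out

def unnest_struct_to_fields(records, field):
    """Replace a struct (dict) field with one new field per key."""
    out = []
    for r in records:
        rest = {k: v for k, v in r.items() if k != field}
        nested = {f"{field}_{k}": v for k, v in r[field].items()}
        out.append({**rest, **nested})
    return out

def update_field(records, old_name, new_name=None, new_type=None):
    """Rename a field and/or convert its values with new_type."""
    new_name = new_name or old_name
    out = []
    for r in records:
        updated = {}
        for k, v in r.items():
            if k == old_name:
                updated[new_name] = new_type(v) if new_type else v
            else:
                updated[k] = v
        out.append(updated)
    return out

rows = [
    {"id": 1, "tags": ["a", "b"], "loc": {"x": 1, "y": 2}, "count": "7"},
    {"id": 2, "tags": [], "loc": {"x": 3, "y": 4}, "count": "9"},
]

as_rows = unnest_array_to_rows(rows, "tags")
as_fields = unnest_struct_to_fields(rows, "loc")
retyped = update_field(rows, "count", new_name="total", new_type=int)
```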

Integrate

The following tools are in the Integrate category:

| Tool | Description |
| --- | --- |
| Join | The Join tool joins datasets based on the specified relationships. Datasets can be joined using matching attributes, spatial relationships, temporal relationships, or any combination of the three. |
| Merge | The Merge tool combines two or more datasets into a single, new dataset. You can combine point, line, polygon, or tabular datasets. |
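
The sketch below contrasts the two Integrate operations in plain Python: a join widens records by matching them across datasets, while a merge stacks the records of several datasets. Only an attribute-based inner join is shown; spatial and temporal relationships are out of scope here. The function names, the rule that target values win on field name collisions, and the filling of missing fields with `None` are assumptions of this example, not documented tool behavior.

```python
# Conceptual sketch only; not the Data Pipelines implementation.

def attribute_join(target, join, target_key, join_key):
    """Inner join: one output record per matching (target, join) pair."""
    index = {}
    for j in join:
        index.setdefault(j[join_key], []).append(j)
    out = []
    for t in target:
        for j in index.get(t[target_key], []):
            # Target values win if both records share a field name.
            out.append({**j, **t})
    return out

def merge(*datasets):
    """Stack all records; fields missing from a record are set to None."""
    fields = []
    for ds in datasets:
        for r in ds:
            for k in r:
                if k not in fields:
                    fields.append(k)
    return [{f: r.get(f) for f in fields} for ds in datasets for r in ds]

stations = [
    {"station_id": "S1", "name": "North"},
    {"station_id": "S2", "name": "South"},
    {"station_id": "S3", "name": "East"},
]
readings = [
    {"sid": "S1", "temp": 4.5},
    {"sid": "S1", "temp": 5.0},
    {"sid": "S2", "temp": 7.2},
]

joined = attribute_join(stations, readings, "station_id", "sid")
merged = merge(stations, [{"station_id": "S4", "elev": 120}])
```

In this sketch, station S3 has no matching reading and so does not appear in `joined`, whereas `merged` contains every input record.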

Output dataset

The following output dataset is supported:

| Tool | Description |
| --- | --- |
| Feature layer | The Feature layer output writes data pipeline datasets to a hosted feature layer or hosted table. You can create a feature layer or table, replace the data in an existing feature layer or table, or add and update records in an existing feature layer or table. |
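
The difference between replacing data and adding and updating records can be modeled in a few lines of plain Python. This sketch assumes that records are matched on a unique identifier field (here a hypothetical `site_id`); the function names and data are illustrative and do not describe how the Feature layer output is implemented.

```python
# Conceptual sketch only; assumes a unique identifier field for matching.

def replace_data(existing, incoming):
    """Discard all existing records and keep only the incoming ones."""
    return list(incoming)

def add_and_update(existing, incoming, key):
    """Update records whose key already exists and append the rest."""
    by_key = {r[key]: r for r in existing}
    for r in incoming:
        by_key[r[key]] = r
    return list(by_key.values())

layer = [
    {"site_id": 1, "status": "open"},
    {"site_id": 2, "status": "open"},
]
batch = [
    {"site_id": 2, "status": "closed"},
    {"site_id": 3, "status": "open"},
]

replaced = replace_data(layer, batch)
upserted = add_and_update(layer, batch, "site_id")
```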

