The data pipelines you create in the ArcGIS Data Pipelines app are stored as items in your content. You'll use the Data Pipelines editor to create and edit data pipelines. The following sections outline the Data Pipelines editor and explain how to create and run a data pipeline in it.
Data pipeline elements
The following are the elements of a data pipeline:
- Inputs
- An input is used to load data into the data pipeline for downstream processing. There are many input source types available. For more information about sources and source types, see Dataset configuration.
- There can be multiple data sources in a single data pipeline. At least one is required in a data pipeline workflow.
- Tools
- Tools process data that has been loaded from input datasets.
- There can be multiple tools in a single data pipeline.
- Tools can be connected to each other when the output of one tool represents the input of the next tool.
- To learn more about the available tools and how to use them, see Data processing.
- Outputs
- An output defines what will be done with the results of the data pipeline.
- You can output data pipeline results to a new feature layer, replace the data in an existing feature layer, or add to and update the existing data in a feature layer.
- There can be multiple outputs in a single data pipeline.
- You can configure multiple outputs for a single tool result or input dataset. At least one is required to run a data pipeline.
- To learn more about writing results, see Feature layer.
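The three output modes above correspond to familiar write semantics: create writes to a new layer, replace overwrites existing data, and add-and-update behaves like an upsert keyed on a unique ID. The following is a conceptual sketch of those semantics in plain Python (not a Data Pipelines API); the record structure and ID field are hypothetical:

```python
# Conceptual sketch of the three feature layer output modes.
# Plain Python illustrating write semantics; not an ArcGIS API.

def create(target, results):
    """Create: write results to a new, empty layer."""
    assert target == {}, "target layer must be new"
    target.update({r["id"]: r for r in results})
    return target

def replace(target, results):
    """Replace: discard the existing records, then write results."""
    target.clear()
    target.update({r["id"]: r for r in results})
    return target

def add_and_update(target, results):
    """Add and update (upsert): update matching IDs, append the rest."""
    for r in results:
        target[r["id"]] = r  # overwrites a match, inserts otherwise
    return target

existing = {1: {"id": 1, "mag": 5.0}, 2: {"id": 2, "mag": 6.1}}
new_rows = [{"id": 2, "mag": 6.3}, {"id": 3, "mag": 4.8}]
merged = add_and_update(dict(existing), new_rows)
# merged now holds ID 1 (unchanged), ID 2 (updated), and ID 3 (added)
```

In the actual product, the matching is configured on the output element; this sketch only shows why add-and-update preserves records that the latest run did not touch.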
Data pipeline workflow
The data pipeline workflow is composed of the elements outlined above: connect to existing data, perform data engineering, and write the newly prepared data. When a data pipeline is run, it generates one or more outputs. All output results are available in your content.
![Data pipeline workflow](GUID-4925187E-A0BA-42B8-9177-B97F35752865-web.png)
Connect to the data
The first step in creating a data pipeline is to connect to the data. On the editor toolbar, under Inputs, choose the source type to connect to. For example, choose Feature layer and browse to the layer, or choose Amazon S3 and browse to the data store item representing the bucket and folder containing the dataset. To learn more about connecting to data and how to optimize read performance, see Dataset configuration.
Perform data processing
The second step is to process the input data. On the editor toolbar, under Tools, choose the process to complete on the dataset. For example, to calculate locations for CSV data and filter the locations for a specific area of interest, you can use the Create geometry and Filter by extent tools.
To specify the dataset to use as input to a tool, do one of the following:
- Draw a line by dragging the pointer from the connector of one element to the connector of the next.
- Use the input dataset parameter to identify the input dataset.
Processing the data is optional. After connecting to the dataset, you can write it out as a feature layer with no processing.
To improve the performance of the data pipeline processing, you can limit the amount of data you are working with by using one or a combination of the following tools:
- Select fields—Maintain only the fields of interest. For example, you have a census dataset with fields for the years 2000 and 2010 but you are only interested in the year 2010. Select only the fields that represent 2010 values.
- Filter by attribute—Maintain a subset of records that contain certain attribute values. For example, filter an earthquakes dataset for earthquakes with a magnitude greater than 5.5.
- Filter by extent—Maintain a subset of records within a certain spatial extent. For example, filter a dataset of United States flood hazard areas to the extent of another dataset that represents a state boundary.
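To make the three reduction tools concrete, the following is a plain-Python sketch of what each one does to a set of records; the field names, coordinates, and extent values are hypothetical, and the real tools operate on dataset geometries rather than raw x/y columns:

```python
# Conceptual sketch of Select fields, Filter by attribute, and Filter by extent
# using a list of dict records; all data values are hypothetical.

quakes = [
    {"place": "A", "mag": 6.1, "x": -120.0, "y": 36.5, "depth_km": 10.0},
    {"place": "B", "mag": 4.2, "x": -118.2, "y": 34.1, "depth_km": 7.5},
    {"place": "C", "mag": 5.8, "x": -155.3, "y": 19.4, "depth_km": 3.2},
]

# Select fields: keep only the fields of interest (drops depth_km here).
selected = [{k: r[k] for k in ("place", "mag", "x", "y")} for r in quakes]

# Filter by attribute: magnitude greater than 5.5, as in the example above.
strong = [r for r in selected if r["mag"] > 5.5]

# Filter by extent: keep records inside a bounding box (xmin, ymin, xmax, ymax).
def in_extent(r, extent):
    xmin, ymin, xmax, ymax = extent
    return xmin <= r["x"] <= xmax and ymin <= r["y"] <= ymax

bbox = (-125.0, 32.0, -114.0, 42.0)  # an illustrative state-sized extent
result = [r for r in strong if in_extent(r, bbox)]
# Only record "A" passes both filters
```

Each step hands a smaller dataset to the next, which is why applying these tools early in the workflow reduces the work done by every downstream tool.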
Preview data pipeline elements
Use preview to investigate the data at any step of the workflow. Preview includes the following methods to inspect data:
- Table preview—Display a tabular representation of the data.
- Map preview—Display the locations of the dataset on a map. In map preview, you can pan, zoom, and inspect attributes.
- Schema—View the schema of the dataset.
- Messages—Review messages returned from the preview action.
Previews show up to 8,000 data records.
When previewing date-time fields, the values are displayed in the time zone of your browser. When the values are written to a feature layer, they are stored in coordinated universal time (UTC).
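The display-versus-storage distinction can be sketched with Python's standard `zoneinfo` module; the time zone and timestamp below are hypothetical examples, not values from the product:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library, Python 3.9+

# A value stored in a feature layer in UTC...
stored_utc = datetime(2024, 3, 15, 17, 30, tzinfo=timezone.utc)

# ...is displayed in the viewer's local time zone (illustrative zone).
browser_zone = ZoneInfo("America/Los_Angeles")
displayed = stored_utc.astimezone(browser_zone)
print(displayed.isoformat())  # 2024-03-15T10:30:00-07:00

# Converting the displayed value back to UTC recovers the stored value,
# so the preview and the written layer describe the same instant.
assert displayed.astimezone(timezone.utc) == stored_utc
```

The point of the sketch is that preview and storage differ only in presentation: both represent the same instant in time.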
Previewing datasets with complex geometries can consume a large amount of the available memory. If memory thresholds are exceeded, map previews may not render, or the status may change to reconnecting while it recovers. To improve preview performance, you can do the following:
- For any geometry type, consider adding a filter to the dataset using the Filter by attribute tool or the Filter by extent tool.
- For polygon geometries, consider generalizing the geometries using the Simplify geometry tool.
To write the full dataset to a feature layer, ensure that you remove the filtering or simplification tool before running the data pipeline.
Run a data pipeline
Use the Run button in the canvas action bar to run the configured processes. To run a data pipeline, at least one output feature layer element must be configured. Job results and messages can be accessed from the latest run details console. You can click a result to open the item page.
To run a data pipeline on an automated schedule, you can create a task. To learn more about creating scheduled tasks for data pipelines, see Schedule a data pipeline task.
Add notes to a data pipeline
Add notes to document your workflow. You can add a note to a specific element in the canvas or to the data pipeline generally.
To add a note to a specific element, select the element, and click the Notes button on the element action bar. Once an element note has been added, you can click the Notes button again to view or edit the note. To view all element notes, click the Notes button on the editor toolbar and select Element notes. Here, you can delete an element note, or click a note to open it in the canvas, where you can view or edit it. You can only have one note per element. Element notes are limited to 16,000 characters.
To add a note to the data pipeline generally, click the Notes button on the editor toolbar and select General notes. Here, you can create, view, edit, or delete the general note. You can only have one general note per data pipeline. General notes are limited to 16,000 characters.
When you copy an element, its notes are not copied with it.
When you save an existing data pipeline with notes as a new item, the notes are saved with the new item.