
Dataset configuration

Data Pipelines can work with data from a variety of external data sources, as well as data from ArcGIS Online. The types of data that can be used are grouped into the following categories:

  • File—Upload files to your content and use them as input datasets, or connect to files using a URL to a public dataset.
  • Cloud storage—Connect to and read from external cloud storage.
  • Database—Connect to and read from external cloud databases.
  • ArcGIS—Read from feature layers or hosted tables that are available to you in ArcGIS Online.

Inputs

To add data to your canvas, select the data source type in the Inputs section, and fill out the parameters. Input parameters are specific to the type of source you are connecting to; see the specific input dataset topic for more information.

The following sections describe the input dataset types available in each category in the Data Pipelines editor.

File

The following input types are in the File category:

  • File—Use records in files as input to Data Pipelines.
  • Public URL—Use records from a public URL as input to Data Pipelines.
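Connecting to a public URL means the file is read over HTTP and its rows become input records. The following is a minimal Python sketch of that idea using only the standard library; the sample content and field names are hypothetical, and an in-memory string stands in for the downloaded file so the sketch runs offline (in practice the bytes would come from something like `urllib.request.urlopen`).

```python
import csv
import io

# Hypothetical CSV content, standing in for a file fetched from a
# public URL such as https://example.com/stations.csv
sample = """station_id,name,lat,lon
101,Harbor,47.60,-122.33
102,Uptown,47.66,-122.31
"""

# Each data row becomes one record, keyed by the header fields.
records = list(csv.DictReader(io.StringIO(sample)))

print(len(records))           # number of records read
print(records[0]["name"])     # field value from the first record
```

This mirrors what the Public URL input does conceptually: fetch the file, parse it by its format, and expose the rows as records on the canvas.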

Cloud storage

The following input types are in the Cloud storage category:

  • Amazon S3—Use records from files stored in an Amazon S3 bucket as input to Data Pipelines.
  • Microsoft Azure Blob—Use records from files stored in a Microsoft Azure Blob storage container as input to Data Pipelines.

Database

The following input types are in the Database category:

  • Google BigQuery—Use records from a Google BigQuery table as input to Data Pipelines.
  • Snowflake—Use records from a Snowflake table as input to Data Pipelines.

ArcGIS

The following input types are in the ArcGIS category:

  • Feature layer—Use records in feature layers as input to Data Pipelines.
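A hosted feature layer in ArcGIS Online is backed by a feature service REST endpoint, and reading its records corresponds to the service's standard query operation. The following is a minimal Python sketch that builds such a query request URL; the service URL is hypothetical, but real hosted layers follow the same `.../FeatureServer/<layer-id>` pattern.

```python
from urllib.parse import urlencode

# Hypothetical hosted feature layer endpoint (layer 0 of a feature
# service); real ArcGIS Online services use the same URL pattern.
layer_url = (
    "https://services.arcgis.com/abc123/arcgis/rest/services/"
    "Stations/FeatureServer/0"
)

# Standard parameters of the feature service query operation:
# select all records and all fields, returned as JSON.
params = {"where": "1=1", "outFields": "*", "f": "json"}
query_url = f"{layer_url}/query?{urlencode(params)}"

print(query_url)
```

Requesting `query_url` (with appropriate authentication for non-public layers) returns the layer's features as JSON records, which is the kind of data the Feature layer input makes available on the canvas.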

Dataset options

In addition to the specific parameters for each input dataset type, all types support the following options:

  • Label—The name displayed when the input dataset element is added to the canvas.
  • Use caching—Store a copy of the dataset in the Data Pipelines session. The cached copy is maintained only while at least one session is active; when all sessions are inactive, the cached copy is discarded. If the source data has been updated since it was cached, turn off the Use caching option and preview or run the process again to read the latest data. See Sessions for more information.

