
Dataset configuration

ArcGIS Data Pipelines can work with data from a variety of external data sources, as well as data from ArcGIS Online. The types of data that can be used are grouped into categories:

  • File—Upload files to your content and use them as input datasets, or connect to files using a URL to a public dataset.
  • Cloud storage—Connect to and read from external cloud storage.
  • Database—Connect to and read from external cloud databases.
  • ArcGIS—Read from feature layers or hosted tables that are available to you in ArcGIS Online.

Inputs

To add data to your canvas, select the data source type in the Inputs section, and fill in the parameters. Input parameters are specific to the type of source you are connecting to; see the specific input dataset topic for more information.

The following tables describe the input datasets in the various categories in the Data Pipelines editor.

File

The following input types are in the File category:

| Input type | Description |
| --- | --- |
| File | Use records in files as input to ArcGIS Data Pipelines. |
| Public URL | Use records from a public URL as input to ArcGIS Data Pipelines. |

Cloud storage

The following input types are in the Cloud storage category:

| Input type | Description |
| --- | --- |
| Amazon S3 | Use records from files stored in an Amazon S3 bucket as input to ArcGIS Data Pipelines. |
| Microsoft Azure Storage | Use records from files stored in a Microsoft Azure Storage container as input to ArcGIS Data Pipelines. |

Database

The following input types are in the Database category:

| Input type | Description |
| --- | --- |
| Google BigQuery | Use records from a Google BigQuery table as input to ArcGIS Data Pipelines. |
| Snowflake | Use records from a Snowflake table as input to ArcGIS Data Pipelines. |

ArcGIS

The following input types are in the ArcGIS category:

| Input type | Description |
| --- | --- |
| Feature layer | Use records in hosted feature layers or tables as input to ArcGIS Data Pipelines. |
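The categories and input types listed above can be summarized as a simple lookup structure. The following is a minimal sketch for reference only; `INPUT_TYPES` and `category_of` are hypothetical names that mirror the labels shown in the editor and are not part of any ArcGIS API.

```python
# Hypothetical lookup table: keys are the category labels and values are the
# input type labels described in this topic. Not an ArcGIS API.
INPUT_TYPES = {
    "File": ["File", "Public URL"],
    "Cloud storage": ["Amazon S3", "Microsoft Azure Storage"],
    "Database": ["Google BigQuery", "Snowflake"],
    "ArcGIS": ["Feature layer"],
}


def category_of(input_type: str) -> str:
    """Return the category that lists the given input type."""
    for category, types in INPUT_TYPES.items():
        if input_type in types:
            return category
    raise KeyError(f"Unknown input type: {input_type}")
```

Note that File is both a category name and an input type within that category.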

Additional options

In addition to the parameters specific to each input dataset type, all input types support the Use caching parameter. Use this parameter to store a copy of the dataset. The cached copy is maintained only while at least one browser tab open to the editor is connected; when all browser tabs open to the editor are inactive, the cached copy no longer exists. Caching may improve performance when interactively previewing and processing data. If the source data has been updated since it was cached, turn off the Use caching option and preview or run the process again.
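The caching behavior described above can be pictured with a small toy model. This sketch is an illustration only and does not reflect how ArcGIS Data Pipelines implements caching internally; the class `EditorDatasetCache`, its methods, and the choice to discard a stale copy when caching is turned off are all assumptions made for the example.

```python
from typing import Any, Callable, Dict, Set


class EditorDatasetCache:
    """Toy model: cached copies exist only while an editor tab is connected."""

    def __init__(self) -> None:
        self._connected_tabs: Set[str] = set()
        self._cache: Dict[str, Any] = {}

    def connect(self, tab_id: str) -> None:
        self._connected_tabs.add(tab_id)

    def disconnect(self, tab_id: str) -> None:
        self._connected_tabs.discard(tab_id)
        # Once no tab is connected, every cached copy is dropped.
        if not self._connected_tabs:
            self._cache.clear()

    def read(self, dataset_id: str, load: Callable[[], Any],
             use_caching: bool = True) -> Any:
        # Serve the cached copy when caching is on and a copy exists.
        if use_caching and dataset_id in self._cache:
            return self._cache[dataset_id]
        records = load()  # Read from the source again.
        if use_caching and self._connected_tabs:
            self._cache[dataset_id] = records
        else:
            # Assumption: with caching off, any stale copy is discarded.
            self._cache.pop(dataset_id, None)
        return records
```

In this model, a second `read` with caching on returns the stored copy without calling `load`, while a `read` with `use_caching=False` always goes back to the source, which corresponds to turning off Use caching after the source data has changed.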

