
Dataset configuration

Data Pipelines can work with data from a variety of external data sources, as well as data from ArcGIS Online. The types of data that can be used are grouped into the following categories:

  • File—Upload files to your content and use them as input datasets, or connect to files using a URL to a public dataset.
  • Cloud storage—Connect to and read from external cloud storage.
  • Database—Connect to and read from external cloud databases.
  • ArcGIS—Read from feature layers or hosted tables that are available to you in ArcGIS Online.

Inputs

To add data to your canvas, select the data source type in the Inputs section, and fill out the parameters. Input parameters are specific to the type of source you are connecting to; see the specific input dataset topic for more information.

The following sections describe the input dataset types available in each category in the Data Pipelines editor.

File

The following input types are in the File category:

  • File—Use records in files as input to Data Pipelines.
  • Public URL—Use records from a public URL as input to Data Pipelines.
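Connecting to a public URL means the file is read over HTTP and its rows become input records. The following is a minimal Python sketch of that idea using only the standard library; the sample content and field names are hypothetical, and an in-memory string stands in for the downloaded file so the sketch runs offline (in practice the bytes would come from something like `urllib.request.urlopen`).

```python
import csv
import io

# Hypothetical CSV content, standing in for a file fetched from a
# public URL such as https://example.com/stations.csv
sample = """station_id,name,lat,lon
101,Harbor,47.60,-122.33
102,Uptown,47.66,-122.31
"""

# Each data row becomes one record, keyed by the header fields.
records = list(csv.DictReader(io.StringIO(sample)))

print(len(records))           # number of records read
print(records[0]["name"])     # field value from the first record
```

This mirrors what the Public URL input does conceptually: fetch the file, parse it by its format, and expose the rows as records on the canvas.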

Cloud storage

The following input types are in the Cloud storage category:

  • Amazon S3—Use records from files stored in an Amazon S3 bucket as input to Data Pipelines.
  • Microsoft Azure Blob—Use records from files stored in a Microsoft Azure Blob storage container as input to Data Pipelines.

Database

The following input types are in the Database category:

  • Google BigQuery—Use records from a Google BigQuery table as input to Data Pipelines.
  • Snowflake—Use records from a Snowflake table as input to Data Pipelines.

ArcGIS

The following input types are in the ArcGIS category:

  • Feature layer—Use records in feature layers as input to Data Pipelines.
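A hosted feature layer in ArcGIS Online is backed by a feature service REST endpoint, and reading its records corresponds to the service's standard query operation. The following is a minimal Python sketch that builds such a query request URL; the service URL is hypothetical, but real hosted layers follow the same `.../FeatureServer/<layer-id>` pattern.

```python
from urllib.parse import urlencode

# Hypothetical hosted feature layer endpoint (layer 0 of a feature
# service); real ArcGIS Online services use the same URL pattern.
layer_url = (
    "https://services.arcgis.com/abc123/arcgis/rest/services/"
    "Stations/FeatureServer/0"
)

# Standard parameters of the feature service query operation:
# select all records and all fields, returned as JSON.
params = {"where": "1=1", "outFields": "*", "f": "json"}
query_url = f"{layer_url}/query?{urlencode(params)}"

print(query_url)
```

Requesting `query_url` (with appropriate authentication for non-public layers) returns the layer's features as JSON records, which is the kind of data the Feature layer input makes available on the canvas.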

Dataset options

In addition to the specific parameters for each input dataset type, all types support the following options:

  • Label—The name displayed when the input dataset element is added to the canvas.
  • Use caching—Store a copy of the dataset in the Data Pipelines session. The cached copy is maintained only while at least one session is active; when all sessions are inactive, the cached copy is discarded. If the source data has been updated since it was cached, turn off the Use caching option and preview or run the process again to read the latest data. See Sessions for more information.

