ArcGIS Data Pipelines can work with data from a variety of external data sources, as well as data from ArcGIS Online. Supported data sources are grouped into the following categories:
- File—Upload files to your content and use them as input datasets, or connect to files using a URL to a public dataset.
- Cloud storage—Connect to and read from external cloud storage.
- Database—Connect to and read from external cloud databases.
- ArcGIS—Read from feature layers or hosted tables that are available to you in ArcGIS Online.
Inputs
To add data to your canvas, select the data source type in the Inputs section, and fill in the parameters. Input parameters are specific to the type of source you are connecting to; see the specific input dataset topic for more information.
The following tables describe the input datasets in the various categories in the Data Pipelines editor.
File
The following input types are in the File category:
Input type | Description |
---|---|
File | Use records in files as input to ArcGIS Data Pipelines. |
Public URL | Use records from a public URL as input to ArcGIS Data Pipelines. |
Cloud storage
The following input types are in the Cloud storage category:
Input type | Description |
---|---|
Amazon S3 | Use records from files stored in an Amazon S3 bucket as input to ArcGIS Data Pipelines. |
Microsoft Azure Storage | Use records from files stored in a Microsoft Azure Storage container as input to ArcGIS Data Pipelines. |
Database
The following input types are in the Database category:
Input type | Description |
---|---|
Google BigQuery | Use records from a Google BigQuery table as input to ArcGIS Data Pipelines. |
Snowflake | Use records from a Snowflake table as input to ArcGIS Data Pipelines. |
ArcGIS
The following input types are in the ArcGIS category:
Input type | Description |
---|---|
Feature layer | Use records in hosted feature layers or tables as input to ArcGIS Data Pipelines. |
Additional options
In addition to the parameters specific to each input dataset type, all input types support the Use caching parameter. When this parameter is enabled, a copy of the dataset is stored, which may improve performance when you interactively preview and process data. The cached copy is maintained only while at least one browser tab open to the editor is connected; when all browser tabs open to the editor are inactive, the cached copy no longer exists. If the source data has been updated since it was cached, turn off the Use caching option and preview or run the process again.
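The caching behavior described above can be pictured with a small conceptual model. The following Python sketch is illustrative only and is not how Data Pipelines is implemented: the `SessionCache` class, its method names, and the `load` callback are all hypothetical, and the sketch assumes that reads happen only from a connected tab. It shows three ideas from the text: a cached copy is reused while a tab is connected, turning off caching forces a fresh read from the source, and the cache disappears once no tabs remain connected.

```python
from typing import Callable, Dict, List, Set

Records = List[dict]


class SessionCache:
    """Toy model of a session-scoped dataset cache (hypothetical, not Esri's code)."""

    def __init__(self) -> None:
        self._store: Dict[str, Records] = {}
        self._connected_tabs: Set[str] = set()

    def connect(self, tab_id: str) -> None:
        # A browser tab opens the editor and connects.
        self._connected_tabs.add(tab_id)

    def disconnect(self, tab_id: str) -> None:
        # When the last connected tab goes away, the cached copies are dropped.
        self._connected_tabs.discard(tab_id)
        if not self._connected_tabs:
            self._store.clear()

    def read(
        self,
        dataset_id: str,
        load: Callable[[], Records],
        use_caching: bool,
    ) -> Records:
        if not use_caching:
            # Caching turned off: discard any stale copy and read the source.
            self._store.pop(dataset_id, None)
            return load()
        if dataset_id not in self._store:
            # First cached read: fetch from the source and keep a copy.
            self._store[dataset_id] = load()
        return self._store[dataset_id]
```

In this model, calling `read` with `use_caching=False` mirrors turning off the Use caching option to pick up updated source data before previewing or running again.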