The following are answers to frequently asked questions about Data Pipelines.
- What is Data Pipelines?
- This release of Data Pipelines is in beta. What does that mean?
- Does Data Pipelines consume credits?
- Is Data Pipelines available in ArcGIS Enterprise?
- How do I access Data Pipelines?
- How can I get started with Data Pipelines?
- What data can I use in Data Pipelines?
- Can I use ArcGIS Living Atlas layers as input to my data pipeline?
- Can I connect to my datasets on Google Cloud Platform?
- My data was updated in its source location. How do I sync my dataset in my data pipeline?
- Where can I store my Data Pipelines results? Can I store them in Amazon S3?
- How many features can I write to a feature layer or table using Data Pipelines?
- Can I geocode addresses using Data Pipelines?
- What tools are coming in future releases?
- Can I share a data pipeline?
- Is there a way to undo or redo an action in the Data Pipelines editor?
- Is there a way to copy and paste elements in a diagram?
- Which languages is Data Pipelines available in?
- How do I stay up to date with the known limitations in Data Pipelines beta?
- Can I schedule a data pipeline run?
- How is Data Pipelines different from ArcGIS Velocity?
Data Pipelines is an ArcGIS Online capability that allows you to connect to and process data from various sources. You can perform data preparation and save the results to your Web GIS to complete your organization's workflows. All of this is completed using an intuitive interface where you can construct, run, save, share, and reproduce your data preparation workflows.
When software is in beta, components and features may have incomplete functionality or documentation, may undergo minor unannounced changes, and are subject to change without notice.
To access Data Pipelines, your organization must enable access to apps while they are in beta. See Blocked Esri apps for more information.
If you have issues or experience problems with any of the beta functionality, leave a comment in the Data Pipelines Community.
Yes. Credit consumption is based on compute resource usage time. See Sessions for more information on compute resources and sessions.
Credits are consumed at a rate of 30 credits per hour, calculated per minute (see the worked example after these lists). Credits are consumed only while a compute resource is running. Compute resources are running in the following scenarios:
- In the editor, while the session status shows Connected. Credits are not consumed when any other status is shown.
- When scheduled data pipeline tasks are running. Credits are consumed based on the time it takes to run the data pipeline.
Compute resources stop running in the following scenarios:
- After all browser tabs with editor sessions have been closed for at least 15 minutes.
- After 60 minutes of inactivity in all browser tabs with editor sessions. The session status will show as Disconnected.
- When a scheduled data pipeline task run is complete.
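As a worked example of the per-minute proration, the following minimal Python sketch computes the cost of a session (the 20-minute duration is hypothetical):

```python
# Credits are billed at 30 credits per hour, prorated per minute.
CREDITS_PER_HOUR = 30

def credits_consumed(minutes_running: float) -> float:
    """Estimate credits for a compute resource that ran for the given minutes."""
    return CREDITS_PER_HOUR * (minutes_running / 60)

# A hypothetical 20-minute editor session with the status Connected:
print(credits_consumed(20))  # 10.0 credits
```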
For the beta release, you can access Data Pipelines in the following ways:
- Use the app launcher and choose Data Pipelines (beta).
- Use direct URLs to the app. The following URLs are available:
  - App landing page—https://arcgis.com/apps/datapipelines/
  - Editor for a new data pipeline—https://arcgis.com/apps/datapipelines/editor
  - Editor for an existing data pipeline—https://arcgis.com/apps/datapipelines/editor?item=<data pipeline item id>. The data pipeline item ID is the ID associated with the item in your content (see the sketch following this list).
  - Manage scheduled tasks page—https://arcgis.com/apps/datapipelines/tasks
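For example, here is a minimal sketch of assembling the editor URL for an existing data pipeline (the item ID is a placeholder, not a real item):

```python
# Build the direct editor URL for an existing data pipeline.
# Use the item ID from the item's page in your content.
item_id = "0123456789abcdef0123456789abcdef"  # placeholder item ID

editor_url = f"https://arcgis.com/apps/datapipelines/editor?item={item_id}"
print(editor_url)
```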
To access Data Pipelines, the following requirements must be met:
- Your organization must have access to apps that are currently in beta. See Configure security settings to learn more about blocked apps.
- Your user account must have the required privileges. See Requirements to learn more about the privileges required to access Data Pipelines.
If you are unsure whether you or your organization meets the requirements above, contact your organization administrator.
To get started with Data Pipelines, see Tutorial: Create a data pipeline. The tutorial outlines the key components for using Data Pipelines, including connecting to and processing data, running a data pipeline, and more.
For more resources to get started, see the Esri Community blog posts.
The following types of data are supported as input:
- Amazon S3
- Feature layers
- Files from public URLs
- Files uploaded to content
- Google BigQuery
- Microsoft Azure Blob Storage
- Snowflake
See the linked input data type documentation to learn more about supported file types and how to connect to an input dataset.
Yes. You can use ArcGIS Living Atlas feature layers as input. To add a layer to a diagram, see Feature layer. By default, the feature layer browse dialog box opens to My content. To search for an ArcGIS Living Atlas layer, switch to Living Atlas in the dialog box.
No, not yet. In future releases, the following additional types of external data sources will be supported:
- Google Cloud Platform
- Microsoft Azure Cosmos DB for PostgreSQL
- Data returned from API requests
The data sources in this list are not guaranteed for a specific release, and data sources that are not listed here may be added. If you have suggestions for data sources that will improve your workflows, leave a comment in the Data Pipelines Community forums.
If the data is regularly updated in the source location and you want the latest data in your data pipeline, it is recommended that you do not enable the Use caching parameter on the inputs. When caching is off, Data Pipelines reads the latest data every time you request a preview or run. When caching is on, only the data available at the time the cache was created is used.
If you created an output feature layer and need to update it with the latest data, use the Replace or Add and update options in the Feature layer tool, and run the data pipeline again. You can automate rerunning a data pipeline by scheduling a task for the data pipeline item. To learn more about automating data pipeline workflows, see Schedule a data pipeline task.
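If you automate around a data pipeline item, you can reference the item programmatically. The following is a minimal sketch using the ArcGIS API for Python to retrieve the item by its ID; the credentials and item ID are placeholders, and the scheduled task itself is created in the Data Pipelines app:

```python
from arcgis.gis import GIS

# Sign in to ArcGIS Online (placeholder credentials).
gis = GIS("https://www.arcgis.com", "username", "password")

# Retrieve the data pipeline item by its item ID (placeholder ID).
item = gis.content.get("0123456789abcdef0123456789abcdef")
print(item.title, item.type)
```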
The following tools may be included in future releases:
- Find and replace—Search fields for specific values and replace them with a new value.
- Geocode addresses—Use string addresses from a table or file to return the geocoded results.
- Flatten field—Flatten array type fields into a new field.
- Explode field—Explode map and array type fields into new rows.
The tools in this list are not guaranteed for any release, and tools that are not listed here may be added. If you have suggestions for tools that will improve your workflows, leave a comment in the Data Pipelines Community forums.
Yes. You can share data pipeline items with groups in your organization or with the public. By default, only the owner of a data pipeline item can edit it. To allow everyone in a group to edit and save a data pipeline, share it with a shared update group. If a data pipeline is shared with a group that does not have shared update capabilities, you can save the data pipeline as an editable copy in your content using the Save As option on the editor toolbar.
For the beta release, the Data Pipelines app is translated into all languages supported by ArcGIS Online. The app appears in the language specified in your user settings.
For the beta release, the web help for Data Pipelines is available in the following languages:
- English
- French
- German
- Japanese
- Simplified Chinese
- Spanish
In a future release, the web help will be translated into additional languages.
Visit the Data Pipelines Community to find posts about known issues or limitations. You can also create a post with your inquiry.
Yes. You can create tasks for data pipeline items to run your workflows on a schedule. To learn more about creating data pipeline tasks, see Schedule a data pipeline task.
Data Pipelines and ArcGIS Velocity have some similarities. Both applications allow you to connect to external data sources and import the data into ArcGIS Online for use across the ArcGIS system. However, they serve distinct purposes. Velocity is specifically designed for real-time and big data processing, efficiently handling high-speed data streams from sensors and similar sources, and it is focused on enabling analytics such as device tracking, incident detection, and pattern analysis. Data Pipelines is primarily a data integration application that focuses on data engineering tasks, particularly for non-sensor-based data. While Velocity is used for handling real-time data, Data Pipelines is used for managing and preparing data that is updated less frequently.