The File output writes data pipeline datasets to a CSV, GeoJSON or Apache Parquet file in your content. You can create a new file, or overwrite an existing file.
Parameters
The following table describes the parameters used in the File output:
| Parameter | Description |
|---|---|
| Input dataset | The dataset that will be written as a file. |
| Output method | Specifies the method that will be used to write the output results. The options are Create (the default) and Overwrite. |
| File format | The format of the file that will be written. The options are CSV, GeoJSON and Apache Parquet. |
| Geometry field | The geometry field that will be represented in the resulting GeoJSON file. This parameter is not available for CSV or Apache Parquet file formats. |
| Geospatial format | The geospatial format used to write an Apache Parquet file output. Currently the only option is GeoParquet 1.1. This parameter is not available for CSV or GeoJSON file formats. |
| Primary geometry field | The primary geometry field used to write an Apache Parquet file output. A primary geometry field is required by the GeoParquet 1.1 specification. This parameter is not available for CSV or GeoJSON file formats. |
| Title | The title of the output file item. |
| Overwrite if item already exists | Specifies whether an existing item with the provided output name will be overwritten (checked). This parameter is enabled by default. |
| Folder | The output folder where the file will be saved. |
| File item | An existing file to be overwritten the next time the data pipeline runs. This option is only available if the Output method parameter is set to Overwrite. |
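
The format-specific rows in the table above can be summarized as a small lookup. The following is only an illustrative sketch for the Create output method. The names `COMMON_PARAMETERS`, `FORMAT_SPECIFIC_PARAMETERS`, and `create_parameters` are hypothetical and are not part of any Data Pipelines API.

```python
from typing import List

# Hypothetical summary of the parameter table for the Create output method.
# Parameters shown regardless of the selected file format.
COMMON_PARAMETERS = [
    "Input dataset",
    "Output method",
    "File format",
    "Title",
    "Overwrite if item already exists",
    "Folder",
]

# Parameters that appear only for a particular file format.
FORMAT_SPECIFIC_PARAMETERS = {
    "CSV": [],
    "GeoJSON": ["Geometry field"],
    "Apache Parquet": ["Geospatial format", "Primary geometry field"],
}


def create_parameters(file_format: str) -> List[str]:
    """Return the parameter names that apply to the given file format."""
    if file_format not in FORMAT_SPECIFIC_PARAMETERS:
        raise ValueError(f"Unsupported file format: {file_format}")
    return COMMON_PARAMETERS + FORMAT_SPECIFIC_PARAMETERS[file_format]


print(create_parameters("Apache Parquet"))
```
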
Usage notes
To run a data pipeline, at least one output must be configured.
Use the Input dataset parameter to identify the dataset that will be written as a file.
Use the Output method parameter to specify how the data pipeline results will be written. The options are as follows:
- Create—A new file will be created in your content. You can optionally use the Overwrite if item already exists parameter to overwrite the file each time the data pipeline is run.
- Overwrite—An existing file will be completely overwritten.
The following parameters are available when the Create output method is selected:
- The File format parameter determines the type of file that will be written. The options are CSV, GeoJSON and Apache Parquet.
- The Geometry field parameter is only available for GeoJSON file formats and is used to determine the geometry of the resulting GeoJSON. GeoJSON files only support the WGS84 (4326) coordinate system. If the geometry field uses a coordinate system other than WGS84 (4326), it will be automatically projected before the data is written.
- The Primary geometry field parameter determines which geometry field will be used when writing an Apache Parquet file. This parameter is only available for Apache Parquet file formats and is required per the GeoParquet 1.1 specifications. Currently, GeoParquet 1.1 is the only supported option for the Geospatial format parameter.
- The Title parameter specifies the name of the item that will be created or overwritten.
- The Overwrite if item already exists parameter allows you to rerun the data pipeline without changing the output name. This operation will completely overwrite a file with the specified title that is stored in the specified folder. This parameter is enabled by default.
- The Folder parameter determines which folder the newly created result will be stored in, or which folder the item to overwrite is currently stored in.
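
To illustrate what the automatic projection for GeoJSON outputs means in practice, the following minimal sketch converts a point from Web Mercator (EPSG:3857) to WGS84 (4326) and wraps it in a GeoJSON feature. It is not how Data Pipelines performs the projection. The function names and the sample point are assumptions made for this example, and only the spherical Web Mercator case is covered.

```python
import json
import math
from typing import Tuple

EARTH_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator (EPSG:3857)


def web_mercator_to_wgs84(x: float, y: float) -> Tuple[float, float]:
    """Convert Web Mercator meters to WGS84 (longitude, latitude) in degrees."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(2.0 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2.0)
    return lon, lat


def point_feature(x: float, y: float, properties: dict) -> dict:
    """Build a GeoJSON Feature whose coordinates are stored as WGS84 lon/lat."""
    lon, lat = web_mercator_to_wgs84(x, y)
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }


# Hypothetical input: a single point at the Web Mercator origin.
feature_collection = {
    "type": "FeatureCollection",
    "features": [point_feature(0.0, 0.0, {"name": "origin"})],
}
print(json.dumps(feature_collection, indent=2))
```

GeoJSON coordinates are ordered longitude first, then latitude, which is why the sketch returns `(lon, lat)`.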
When the Overwrite output method is selected, the File item parameter is available. Use this parameter to browse to and select the item to overwrite. You can overwrite items you own, or items that are shared with you in a shared update group. If the selected item is of type GeoJSON or Apache Parquet, a geometry field parameter will be available to specify which geometry from the input dataset will be used when overwriting the existing file.
For scheduling automated updates to files, it is recommended that you use the Overwrite output method instead of the Overwrite if item already exists option.
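
One way to see why the Overwrite output method is the more predictable choice is to model how Overwrite if item already exists selects its target: by title and folder, and only among items you own (see Limitations). The sketch below is a simplified model of that rule, not the actual implementation. `FileItem` and `find_overwrite_target` are hypothetical names, and the sample items are made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FileItem:
    """Hypothetical stand-in for a file item in your content."""
    title: str
    folder: str
    owner: str


def find_overwrite_target(
    items: List[FileItem], title: str, folder: str, user: str
) -> Optional[FileItem]:
    """Return the item that matches on title, folder, and owner, if any."""
    for item in items:
        if item.title == title and item.folder == folder and item.owner == user:
            return item
    return None  # nothing matches, so no existing item is overwritten


content = [
    FileItem("report.csv", "Reports", "alice"),
    FileItem("report.csv", "Archive", "alice"),
]

# Same title but a different folder: the existing items are left untouched.
print(find_overwrite_target(content, "report.csv", "Drafts", "alice"))
```

With the Overwrite output method, the File item is selected explicitly, so the target does not depend on the title or folder staying the same between runs.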
To modify the item properties, such as the summary or tags, browse to your portal content page and edit the item directly.
Limitations
The following are known limitations of the File output:
- CSV file outputs will always be tabular. To store string representations of a geometry in a CSV file, use the Update fields tool to convert the geometry field to a string. To drop geometry fields from a dataset before writing it to a CSV file, use the Select fields tool.
- GeoJSON file outputs require a geometry and will always store geometries in the WGS84 (4326) coordinate system to abide by official GeoJSON specifications. If the geometry field uses a coordinate system other than WGS84 (4326), it will be automatically projected before the item is written. If the coordinate system is a custom WKT string, the transformation to WGS84 (4326) may fail. In that case, use the Feature layer output instead.
- Apache Parquet file outputs will always store the geometry in a binary column. Some custom coordinate systems (WKT) are not supported for Apache Parquet file outputs, and the file may fail to write. In that case, use the Feature layer output instead.
- Overwrite operations do not roll back when a failure to write occurs, which may result in the loss of the item until the data pipeline is run again.
- Not all field types supported from input sources will be maintained when writing results to files. Field types will be automatically converted for compatibility with the selected file format.
- You cannot use the Overwrite option for files you do not own unless you are an administrator, or the file is shared with a shared update group that you are a member of.
- When using a combination of the Create and Overwrite if item already exists output options, the following limitations apply:
  - This option is not recommended for scheduled or automated runs. For more reliable scheduled updates of files, use the Overwrite output method and explicitly browse to the File item value to overwrite.
  - The item to overwrite must be owned by you and must be stored in the specified folder. If the file has the same name in another folder, it will not be overwritten.
  - Overwrite operations do not roll back when a failure to write occurs, which may result in the loss of the item until the data pipeline is run again.
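
As an illustration of the CSV limitation above, the following sketch shows what a tabular CSV looks like when the geometry is carried as a string column. It assumes well-known text (WKT) as the string representation. The actual representation depends on how the geometry field is converted with the Update fields tool. The rows, column names, and helper functions are made up for this example.

```python
import csv
import io
from typing import Dict, List


def point_to_wkt(x: float, y: float) -> str:
    """Format a point as a WKT string (assumed string representation)."""
    return f"POINT ({x} {y})"


def rows_to_csv(rows: List[Dict]) -> str:
    """Write rows to CSV text, storing each geometry as a plain string column."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["name", "geometry_wkt"], lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(
            {"name": row["name"], "geometry_wkt": point_to_wkt(row["x"], row["y"])}
        )
    return buffer.getvalue()


# Hypothetical sample data.
sample_rows = [
    {"name": "Station A", "x": -117.19, "y": 34.05},
    {"name": "Station B", "x": -118.24, "y": 34.11},
]
print(rows_to_csv(sample_rows))
```

The resulting file has no geometry column in the spatial sense. Any application that reads it must parse the string column back into geometries itself.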
Licensing requirements
The following licensing and configurations are required:
- Creator or Professional user type
- Publisher, Facilitator, or Administrator role, or an equivalent custom role
To learn more about Data Pipelines requirements, see Requirements.
Related topics
See the following for additional information:
- To learn how to update items on an automated schedule, see Schedule a data pipeline task.
- To learn more about items, see Items supported in ArcGIS Online.