Use Microsoft Azure Storage records—ArcGIS Data Pipelines

Use records from files stored in a Microsoft Azure Storage container as input to ArcGIS Data Pipelines.

Usage notes

Keep the following in mind when working with Microsoft Azure Storage:

To use a dataset from Azure Storage, you must first create a data store item. Data store items securely store credentials and connection information so the data can be read by Data Pipelines. To create a data store, follow the steps in the Connect to Azure Storage section below.
To change the data store item you configured, use the Data store item parameter to remove the currently selected item, and choose one of the following options:
- Add data store—Create a new data store item.
- Select item—Browse your content to select an existing data store item.
Use the Dataset path parameter to specify the name of the dataset, or the name of the folder that contains the dataset. For example, you can specify dataset paths in the following ways:
- Reference a single file by specifying the path to that file such as Hurricanes.shp or CustomerInfo.csv.
- Reference a folder containing multiple datasets by specifying a path such as MyFolder/. All files in the folder must have the same schema and file type.
- Reference specific file types from a folder that contains multiple files and formats by specifying a path such as MyFolder/*.parquet. In this example, only the Parquet files will be read. All of the Parquet files in the folder must have the same schema.
- Reference multiple files and nested folders using glob patterns by specifying a path such as MyFolder/**/*.geojson. In this example, any subfolders within MyFolder are searched, and any GeoJSON files within those subfolders will be loaded.
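To make the path patterns above concrete, the following Python sketch matches blob names against a Dataset path value. It illustrates assumed semantics and is not the matcher Data Pipelines uses: * and ? are assumed not to cross folder boundaries, **/ is assumed to match zero or more folder levels, and a trailing / is assumed to select only the files directly inside that folder. The function names glob_to_regex and select_blobs are hypothetical.

```python
import re


def glob_to_regex(pattern):
    """Compile a dataset-path glob into a regex over '/'-separated blob names.

    Assumed (hypothetical) semantics:
      '**/' matches zero or more whole folder levels
      '*'   matches any run of characters except '/'
      '?'   matches exactly one character except '/'
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:[^/]+/)*")
            i += 3
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    # \Z anchors the end; regex.match already anchors the start.
    return re.compile("".join(parts) + r"\Z")


def select_blobs(pattern, blob_names):
    """Return the blob names selected by a Dataset path value, in input order."""
    if pattern.endswith("/"):
        # Assumption: a bare folder path selects only the files directly inside it.
        pattern += "*"
    regex = glob_to_regex(pattern)
    return [name for name in blob_names if regex.match(name)]


blobs = [
    "MyFolder/a.parquet",
    "MyFolder/b.csv",
    "MyFolder/f.geojson",
    "MyFolder/sub/c.parquet",
    "MyFolder/sub/d.geojson",
    "MyFolder/sub/deeper/e.geojson",
]
print(select_blobs("MyFolder/*.parquet", blobs))     # ['MyFolder/a.parquet']
print(select_blobs("MyFolder/**/*.geojson", blobs))
```

Under these assumptions, the second call returns the GeoJSON file at the top of MyFolder as well as those in sub and sub/deeper, because **/ may match zero folder levels.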
The dataset path must also be relative to the container and folder that were specified when creating the data store item. For example, if the full dataset path is https://myaccount.blob.core.windows.net/my-container/my-folder/my-subfolder/file.csv, and the data store item specifies my-container for the container, and my-folder for the folder, the dataset path should be my-subfolder/file.csv.
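The following Python sketch derives the Dataset path value from a full blob URL, given the container and optional folder registered on the data store item. The function name relative_dataset_path is hypothetical; it assumes the https://<account>.blob.core.windows.net/<container>/<blob name> URL layout shown in the example above.

```python
from urllib.parse import unquote, urlparse


def relative_dataset_path(full_url, container, folder=""):
    """Return the Dataset path for a blob URL, relative to the data store's
    registered container and optional folder."""
    # Blob URLs look like https://<account>.blob.core.windows.net/<container>/<blob name>
    blob_path = unquote(urlparse(full_url).path).lstrip("/")
    prefix = container.strip("/") + "/"
    if folder.strip("/"):
        prefix += folder.strip("/") + "/"
    # The trailing "/" in prefix keeps "my-container" from matching "my-container2".
    if not blob_path.startswith(prefix):
        raise ValueError(f"{full_url!r} is not under {prefix!r}")
    return blob_path[len(prefix):]


url = "https://myaccount.blob.core.windows.net/my-container/my-folder/my-subfolder/file.csv"
print(relative_dataset_path(url, "my-container", "my-folder"))  # my-subfolder/file.csv
```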
Use the File format parameter to specify the file format of the dataset specified in the Dataset path parameter. The following format options are supported:
- CSV or delimited (for example, .csv, .tsv, and .txt)
- Parquet (.parquet)
- GeoParquet (.parquet)
- JSON (for example, .json or a .txt file containing data formatted as JSON)
- GeoJSON (for example, .json and .geojson, or a .txt file containing data formatted as GeoJSON)
- Shapefile (.shp)
- File Geodatabase (.gdb)
- ORC (.orc)
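Because several formats share an extension, the extension alone does not always identify the format, which is why File format is set explicitly. The following Python sketch lists candidate File format values for a dataset path using the extensions in the list above. That list gives examples rather than an exhaustive set, so the lookup table FORMATS_BY_EXTENSION and the function candidate_formats are illustrative assumptions.

```python
from pathlib import PurePosixPath

# Candidate File format values per extension, taken from the list above.
FORMATS_BY_EXTENSION = {
    ".csv": ["CSV or delimited"],
    ".tsv": ["CSV or delimited"],
    ".txt": ["CSV or delimited", "JSON", "GeoJSON"],
    ".parquet": ["Parquet", "GeoParquet"],
    ".json": ["JSON", "GeoJSON"],
    ".geojson": ["GeoJSON"],
    ".shp": ["Shapefile"],
    ".gdb": ["File Geodatabase"],  # a .gdb is a folder, so a trailing "/" is stripped
    ".orc": ["ORC"],
}


def candidate_formats(dataset_path):
    """Return the formats a path's extension could indicate; [] if unknown."""
    suffix = PurePosixPath(dataset_path.rstrip("/")).suffix.lower()
    return FORMATS_BY_EXTENSION.get(suffix, [])


print(candidate_formats("MyFolder/*.parquet"))  # ['Parquet', 'GeoParquet']
```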
If the CSV or delimited format option is specified, the following dataset definition parameters are available:
- Delimiter—The delimiter used to split field (or column) and record (or row) values. You can choose from the following options or enter your own value:
  - Comma (,)—Field and record values are separated by commas (,). This is the default.
  - Tab (\t)—Field and record values are separated by tabs (\t).
  - Pipe (|)—Field and record values are separated by pipes (|).
  - Semicolon (;)—Field and record values are separated by semicolons (;).
  - Space ( )—Field and record values are separated by spaces ( ).
  If you are entering your own value, it must be one or two characters in length, including spaces. Delimiters longer than two characters are not supported.
- Has header row—Specifies whether the dataset contains a header row. The default is true. If set to false, the first row of the dataset will be considered a record.
- Has multiline data—Specifies whether the dataset has records that contain new line characters. The default is false. If set to true, records that contain multiline values will be read and formatted correctly.
- Character encoding—Specifies the encoding type used to read the specified dataset. The default is UTF-8. You can choose from the available encoding options or specify an encoding type. Spaces are not supported in encoding values. For example, specifying a value of ISO 8859-8 is invalid and must be specified as ISO-8859-8.
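The following Python sketch models these parameters with the standard csv module: a custom delimiter must be one or two characters, encoding names may not contain spaces, and the header row is optional. It is not how Data Pipelines reads files. The function names are hypothetical, and because csv.reader accepts only single-character delimiters, the sketch rejects two-character delimiters instead of reading them.

```python
import codecs
import csv
import io


def validate_delimiter(value):
    # Custom delimiters must be one or two characters long; spaces count.
    if not 1 <= len(value) <= 2:
        raise ValueError(f"delimiter must be 1 or 2 characters, got {len(value)}")
    return value


def validate_encoding(value):
    # Reject spaces explicitly ("ISO 8859-8" is invalid, "ISO-8859-8" is valid),
    # because Python's codec lookup is lenient about spaces in names.
    if " " in value:
        raise ValueError(f"encoding must not contain spaces: {value!r}")
    codecs.lookup(value)  # raises LookupError for an unknown codec
    return value


def read_delimited(data, delimiter=",", has_header=True, encoding="UTF-8"):
    """Return (field_names, records); field_names is None when has_header is False."""
    delimiter = validate_delimiter(delimiter)
    text = data.decode(validate_encoding(encoding))
    if len(delimiter) != 1:
        # Limitation of this sketch: csv.reader only accepts 1-character delimiters.
        raise NotImplementedError("two-character delimiters are not handled here")
    # csv.reader keeps newlines inside quoted values, similar to what
    # Has multiline data = true is described as doing.
    rows = list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))
    if not has_header:
        return None, rows
    if not rows:
        return [], []
    return rows[0], rows[1:]


print(read_delimited(b'name|notes\nx|"line1\nline2"\n', delimiter="|"))
```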

The Fields parameter is available to configure field names and types when the File format value is CSV or delimited. The Configure schema button opens a dialog box containing the dataset fields with the following options:

Include or drop fields—You can remove fields by checking the check box next to the field. By default, all fields are included.
Field name—The name of the field as it will be used in Data Pipelines. This value can be edited. By default, this value will be the same as the field in the source dataset unless the source name contains invalid characters or is a reserved word. Invalid characters will be replaced with an underscore (_), and reserved words will be prefixed with an underscore (_).
Field type—The field type as it will be used in Data Pipelines.

Removing or modifying fields in Data Pipelines will not modify the source data.
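The default renaming rules described above (and in the Limitations section) can be sketched in Python as follows. The set of valid characters (letters, digits, and underscore), the case-insensitive reserved-word comparison, and the RESERVED_WORDS list are assumptions for illustration; the actual reserved words are not listed in this topic, and sanitize_field_name is a hypothetical name.

```python
import re

# Hypothetical reserved words, for illustration only.
RESERVED_WORDS = {"SELECT", "FROM", "WHERE", "TABLE"}


def sanitize_field_name(name):
    # Replace each character outside letters, digits, and underscore with "_".
    cleaned = re.sub(r"[^0-9A-Za-z_]", "_", name)
    # Prefix reserved words with "_" (case-insensitive match is an assumption).
    if cleaned.upper() in RESERVED_WORDS:
        cleaned = "_" + cleaned
    return cleaned


print(sanitize_field_name("Population 2022"))  # Population_2022
print(sanitize_field_name("%Employed"))        # _Employed
```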

The following table describes the available field types:


Field type	Description
String	String fields support a string of text characters.
Small integer	Small integer fields support whole numbers between -32768 and 32767.
Integer	Integer fields support whole numbers between -2147483648 and 2147483647.
Big integer	Big integer fields support whole numbers between -9223372036854775808 and 9223372036854775807.
Float	Float fields support fractional numbers between approximately -3.4E38 and 3.4E38.
Double	Double fields support fractional numbers between approximately -1.8E308 and 1.8E308.
Date	Date fields support values in the format yyyy-MM-dd HH:mm:ss; for example, 2025-12-31 13:30:30 is a valid value. If the date values are stored in a different format, use the Create date time tool to calculate a date field.
Date only	Date only fields support values in the format yyyy-MM-dd; for example, 2025-12-31 is a valid value. If the date only values are stored in a different format, use the values as input to the Calculate field tool to calculate a date only field.
Boolean	Boolean fields support values of True and False. If a field contains integer representations of Boolean values (0 and 1), use the Update fields tool to cast the integers to Boolean values instead.
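The following Python sketch checks values against the integer ranges in the table and parses the two date formats. Big integer is assumed here to be a signed 64-bit integer, the yyyy-MM-dd HH:mm:ss and yyyy-MM-dd patterns are translated to Python strptime codes, and the function names are hypothetical.

```python
from datetime import date, datetime

# Whole-number ranges from the table, narrowest first (Big integer assumed signed 64-bit).
INTEGER_RANGES = {
    "Small integer": (-32768, 32767),
    "Integer": (-2147483648, 2147483647),
    "Big integer": (-(2**63), 2**63 - 1),
}


def smallest_integer_type(value):
    """Return the narrowest integer field type that can hold value."""
    for field_type, (low, high) in INTEGER_RANGES.items():
        if low <= value <= high:
            return field_type
    raise OverflowError(f"{value} does not fit in a Big integer field")


def parse_date(text):
    # yyyy-MM-dd HH:mm:ss; raises ValueError for any other layout.
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


def parse_date_only(text):
    # yyyy-MM-dd
    return datetime.strptime(text, "%Y-%m-%d").date()


print(smallest_integer_type(40000))        # Integer
print(parse_date("2025-12-31 13:30:30"))   # 2025-12-31 13:30:30
```

A value such as 12/31/2025 raises ValueError in parse_date, which corresponds to the case where the table recommends calculating the date field with another tool.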

If the JSON format option is specified, the Root property parameter is available. You can use this parameter to specify a property in the JSON to read data from. Reference nested properties by separating the property names with a period (.), for example, property.subProperty. By default, the full JSON file will be read.
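The following Python sketch shows how a dot-separated Root property such as property.subProperty selects part of a JSON document. The function name read_root_property is hypothetical, and this sketch cannot address property names that themselves contain a period.

```python
import json


def read_root_property(json_text, root_property=""):
    """Return the part of a JSON document selected by a dot-separated Root property."""
    data = json.loads(json_text)
    if not root_property:
        return data  # default: the full JSON document
    for key in root_property.split("."):
        if not isinstance(data, dict) or key not in data:
            raise KeyError(f"property {key!r} not found while resolving {root_property!r}")
        data = data[key]
    return data


doc = '{"property": {"subProperty": [{"id": 1}, {"id": 2}]}, "meta": {"count": 2}}'
print(read_root_property(doc, "property.subProperty"))  # [{'id': 1}, {'id': 2}]
```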
If the GeoJSON format option is specified, the Geometry type parameter is available. This parameter is optional. By default, the geometry type in the GeoJSON file is used. If the GeoJSON file contains more than one geometry type, you must specify the value for this parameter. Mixed geometry types are not supported, and only the specified type will be used. The options are Point, Multipoint, Polyline, and Polygon. A geometry field containing the locations of the GeoJSON data will be automatically calculated and added to the input dataset. The geometry field can be used as input to spatial operations or to enable geometry on the output result.
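The following Python sketch mirrors the Geometry type behavior described above for a GeoJSON FeatureCollection: with no type specified, it uses the features as is but rejects a file with mixed geometry types; with a type specified, it keeps only features of that type. The mapping from GeoJSON geometry types (for example, LineString and MultiLineString) to the Point, Multipoint, Polyline, and Polygon options is an assumption, as are the function names.

```python
from typing import Optional

# Assumed mapping from GeoJSON geometry types to the Geometry type options.
GEOJSON_TO_OPTION = {
    "Point": "Point",
    "MultiPoint": "Multipoint",
    "LineString": "Polyline",
    "MultiLineString": "Polyline",
    "Polygon": "Polygon",
    "MultiPolygon": "Polygon",
}


def option_for(feature):
    # Features with a null or unrecognized geometry map to None.
    geometry = feature.get("geometry") or {}
    return GEOJSON_TO_OPTION.get(geometry.get("type"))


def features_for_geometry_type(feature_collection, geometry_type: Optional[str] = None):
    features = feature_collection.get("features", [])
    if geometry_type is None:
        found = {option_for(f) for f in features} - {None}
        if len(found) > 1:
            raise ValueError(f"mixed geometry types {sorted(found)}; specify geometry_type")
        return features  # single type: use the file's own geometry type
    # Mixed types are not supported, so keep only features of the requested type.
    return [f for f in features if option_for(f) == geometry_type]
```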
If the File Geodatabase format option is specified, the Feature class or table name parameter is available. Use this parameter to specify the name of the feature class or table you want to use as input. Only point, multipoint, polyline, and polygon feature classes and tables are supported. Datasets such as raster, mosaic, trajectory, and others are not supported. Advanced feature types such as geometric network features are not supported.
To improve the performance of reading input datasets, consider the following options:
- Use the Use caching parameter to store a copy of the dataset. The cached copy is only maintained while at least one browser tab open to the editor is connected. This may make it faster to access the data during processing. If the source data has been updated since it was cached, uncheck this parameter and preview or run the tool again.
- After configuring an input dataset, configure any of the following tools that limit the amount of data being processed:
  - Filter by attribute—Maintain a subset of records that contain certain attribute values.
  - Filter by extent—Maintain a subset of records within a certain spatial extent.
  - Select fields—Maintain only the fields of interest.
  - Clip—Maintain a subset of records that intersect with specific geometries.
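The tools above run inside Data Pipelines; the following Python sketch only illustrates why applying them early helps, since each record is dropped or trimmed before any later step sees it. The function limit_records, its parameters, and the x and y point fields are hypothetical stand-ins for Filter by attribute, Filter by extent, and Select fields, not those tools' implementations.

```python
from typing import Iterable, Iterator


def limit_records(records: Iterable[dict], *, where=None, extent=None,
                  fields=None) -> Iterator[dict]:
    """Apply an attribute filter, an extent filter, and field selection in one pass.

    where:  predicate on a record (stand-in for Filter by attribute)
    extent: (xmin, ymin, xmax, ymax) tested against record["x"] and record["y"]
            (stand-in for Filter by extent; point records assumed)
    fields: names to keep (stand-in for Select fields)
    """
    for record in records:
        if where is not None and not where(record):
            continue
        if extent is not None:
            xmin, ymin, xmax, ymax = extent
            if not (xmin <= record["x"] <= xmax and ymin <= record["y"] <= ymax):
                continue
        if fields is not None:
            record = {name: record[name] for name in fields}
        yield record


records = [
    {"x": 1, "y": 1, "pop": 10, "name": "a"},
    {"x": 5, "y": 5, "pop": 20, "name": "b"},
    {"x": 2, "y": 2, "pop": 30, "name": "c"},
]
print(list(limit_records(records, where=lambda r: r["pop"] >= 20,
                         extent=(0, 0, 3, 3), fields=["name"])))  # [{'name': 'c'}]
```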

Connect to Azure Storage

To use data stored in Azure Storage, complete the following steps to create a data store item in the Data Pipelines editor:

1. On the Data Pipelines editor toolbar, click Inputs and choose Microsoft Azure Storage.
   The Select a data store connection dialog box appears.
2. Choose Add a new data store.
3. Click Next.
   The Add a connection to a data store dialog box appears.
4. Select the authentication type used to access the data.
5. Provide the authentication values.
   The authentication values vary depending on the authentication type selected.
6. Provide the name of the container that the data is stored in.
7. Optionally, provide the path to a folder within the container to register it.
8. Click Next.
   The item details pane appears.
9. Click Create connection to create the data store item.
   The Select datasets dialog box appears.
10. Provide the path to the dataset to use as input to the data pipeline.
11. Provide the file format of the dataset specified in the previous step.
12. Click Add.
    A Microsoft Azure Storage element is added to the canvas.

Limitations

The following are known limitations:

Your credentials must have at least READ and LIST permissions. These permissions are required to access the specified container and read the datasets within it.
If you specify a folder containing multiple files that represent a single dataset, all files identified in the Azure Storage folder must have the same schema and geometry type.
Zipped files (.zip) are not supported.
Esri JSON files (.esrijson) are not supported.
The Azure Storage account containing the data used as input to Data Pipelines must have the Enable soft delete for blobs setting disabled.
If the dataset includes field names with spaces or invalid characters, the names are automatically updated to use underscores. For example, a field named Population 2022 is renamed Population_2022, and a field named %Employed is renamed _Employed.
To use a data store item to connect to external data sources, you must be the owner of the data store item. Data store items are private and cannot be shared.
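If your data store item authenticates with a shared access signature (SAS) token, one way to confirm the READ and LIST requirement before creating the item is to inspect the token's sp (signed permissions) query parameter, where r grants read and l grants list. Whether SAS is among the authentication types available to you is an assumption here, and the function name missing_sas_permissions is hypothetical. The check covers only the permissions the token declares; it cannot detect other problems, such as an expired token.

```python
from urllib.parse import parse_qs

# SAS permission letters that correspond to the required READ and LIST permissions.
REQUIRED = {"r": "READ", "l": "LIST"}


def missing_sas_permissions(sas_token):
    """Return the required permissions missing from a SAS token's sp field."""
    params = parse_qs(sas_token.lstrip("?"))
    granted = set("".join(params.get("sp", [])))
    return [name for letter, name in REQUIRED.items() if letter not in granted]


print(missing_sas_permissions("?sp=r&sig=placeholder"))  # ['LIST']
```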

Licensing requirements

The following licensing and configurations are required:

Creator or Professional user type
Publisher, Facilitator, or Administrator role, or an equivalent custom role

To learn more about Data Pipelines requirements, see Requirements.
