File

Use records in files as input to ArcGIS Data Pipelines.

Usage notes

Keep the following in mind when working with files:

  • Using file input allows you to load data from files available in ArcGIS Online content.
  • When you add a file input to the canvas, the Select a file dialog box will appear with the following options:
    • Browse to existing files—Browse content for a previously uploaded item. You can browse your content, content that has been shared with you, and content that is available to your organization and ArcGIS Online.
    • Upload a new file—Upload a file from disk or choose from a list of cloud-hosted options. See Add files as items for more information.
    Data Pipelines does not support all of the file types that can be uploaded directly to your content. See the file format information below for the supported formats.
  • The File format parameter populates automatically with the format of the file you select. The following format options are supported:
    • CSV or delimited—A file containing delimited values (.csv)
    • Shapefile—A zipped folder containing a set of related files that make up the shapefile (.shp)
    • GeoJSON—An open standard geospatial data interchange format that represents simple geographic features and their nonspatial attributes (.geojson or .json)
    • Parquet—A highly compressed column-oriented tabular, nonspatial storage and sharing format (.parquet)
    • File Geodatabase—A zipped file geodatabase (.gdb)
  • If the CSV or delimited format option is specified, the following dataset definition parameters are available:
    • Delimiter—The delimiter used to split field (or column) and record (or row) values. The default is comma delimited (,). Other common delimiter formats include, but are not limited to, tab (\t), semicolon (;), vertical bar (|), and forward and backward slashes (/ and \).
    • Has header row—Specifies whether the dataset contains a header row. The default is true. If set to false, the first row of the dataset will be considered a record.
    • Has multiline data—Specifies whether the dataset has records that contain new line characters. The default is false. If set to true, data that contains multiline data will be read and formatted correctly.
    • Character encoding—Specifies the encoding type used to read the specified dataset. The default is UTF-8. You can choose from the available encoding options, or specify an encoding type. Spaces are not supported in encoding values. For example, specifying a value of ISO 8859-8 is invalid and must be specified as ISO-8859-8.
  • Fields is available to configure field names and types when the data format value is CSV or delimited. The Configure schema button opens a dialog box containing the dataset fields with the following options:
    • Include or drop fields—You can remove fields by checking the check box next to the field. By default, all fields are included.
    • Field name—The name of the field as it will be used in Data Pipelines. This value can be edited. By default, this value will be the same as the field in the source dataset unless the source name contains invalid characters or is a reserved word. Invalid characters will be replaced with an underscore (_), and reserved words will be prefixed with an underscore (_).
    • Field type—The field type as it will be used in Data Pipelines. This value can be edited.
    The following table describes the available field types:

    Field type | Description
    String | String fields support a string of text characters.
    Small integer | Small integer fields support whole numbers between -32768 and 32767.
    Integer | Integer fields support whole numbers between -2147483648 and 2147483647.
    Big integer | Big integer fields support whole numbers between -9223372036854775808 and 9223372036854775807.
    Float | Float fields support fractional numbers between approximately -3.4E38 and 3.4E38.
    Double | Double fields support fractional numbers between approximately -1.8E308 and 1.8E308.
    Date | Date fields support values in the format yyyy-MM-dd HH:mm:ss. For example, 2022-12-31 13:30:30 is a valid value. If the date values are stored in a different format, use the Create date time tool to calculate a date field.
    Boolean | Boolean fields support values of True and False. If a field contains integer representations of Boolean values (0 and 1), use the Update fields tool to cast the integers to Boolean values instead.

  • If the GeoJSON format option is specified, the Geometry type parameter is available. This parameter is optional. By default, the geometry type in the GeoJSON file is used. If the GeoJSON file contains more than one geometry type, you must specify the value for this parameter. Mixed geometry types are not supported and only the specified type will be used. The options are Point, Multipoint, Polyline, and Polygon. A geometry field containing the locations of the GeoJSON data will be automatically calculated and added to the input dataset. The geometry field can be used as input to spatial operations or to enable geometry on the output result.
  • If the File Geodatabase format option is specified, the Feature class or table name parameter is available. Use this parameter to specify the name of the feature class or table you want to use as input. Only point, multipoint, polyline, and polygon feature classes and tables are supported. Datasets such as raster, mosaic, trajectory, and others are not supported. Advanced feature types such as geometric network features are not supported.
  • To improve the performance of reading input datasets, consider the following options:
    • Use the Use caching parameter to store a copy of the dataset, which may make the data faster to access during processing. The cached copy is maintained only while at least one browser tab open to the editor is connected. If the source data has been updated since it was cached, uncheck this parameter and preview or run the tool again.
    • After configuring an input dataset, configure tools that limit the amount of data being processed.
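
The CSV parameters and field types described in the notes above can be checked locally before a file is uploaded. The following Python sketch is not part of Data Pipelines. It reads delimited text with the standard csv module and suggests the narrowest field type from the table for each column. The function names suggest_field_type and suggest_schema are hypothetical, empty values are skipped as an assumption, and Big integer is assumed to be a 64-bit signed integer.

```python
import csv
import io
from datetime import datetime

# Integer ranges from the field type table; the Big integer range is an
# assumption (64-bit signed integer).
SMALL_INT = (-32768, 32767)
INTEGER = (-2147483648, 2147483647)
BIG_INT = (-2**63, 2**63 - 1)

# Numeric types ordered from narrowest to widest, used to merge a column.
NUMERIC_ORDER = ["Small integer", "Integer", "Big integer", "Double"]


def suggest_field_type(value: str) -> str:
    """Suggest a field type for a single text value (hypothetical helper)."""
    text = value.strip()
    if text in ("True", "False"):
        return "Boolean"
    try:
        number = int(text)
    except ValueError:
        pass
    else:
        if SMALL_INT[0] <= number <= SMALL_INT[1]:
            return "Small integer"
        if INTEGER[0] <= number <= INTEGER[1]:
            return "Integer"
        if BIG_INT[0] <= number <= BIG_INT[1]:
            return "Big integer"
        return "String"  # too large for any integer type in the table
    try:
        float(text)  # note: Python also accepts "nan" and "inf" here
        return "Double"
    except ValueError:
        pass
    try:
        # Python equivalent of the yyyy-MM-dd HH:mm:ss pattern.
        datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
        return "Date"
    except ValueError:
        return "String"


def suggest_schema(csv_text: str, delimiter: str = ",", has_header: bool = True) -> dict:
    """Suggest one field type per column of delimited text."""
    # newline="" lets the csv module handle quoted multiline values itself.
    rows = list(csv.reader(io.StringIO(csv_text, newline=""), delimiter=delimiter))
    if not rows:
        return {}
    if has_header:
        names, records = rows[0], rows[1:]
    else:
        # Placeholder names; the names Data Pipelines assigns are not assumed.
        names, records = [f"field_{i}" for i in range(len(rows[0]))], rows
    schema = {}
    for i, name in enumerate(names):
        values = [row[i] for row in records if i < len(row) and row[i] != ""]
        types = {suggest_field_type(v) for v in values}
        if not types:
            schema[name] = "String"
        elif len(types) == 1:
            schema[name] = types.pop()
        elif types <= set(NUMERIC_ORDER):
            schema[name] = max(types, key=NUMERIC_ORDER.index)  # widest numeric
        else:
            schema[name] = "String"
    return schema


if __name__ == "__main__":
    sample = 'id;name;when;pop\n1;"A\nB";2022-12-31 13:30:30;40000\n'
    print(suggest_schema(sample, delimiter=";"))
```

A column that mixes numeric and non-numeric values falls back to String in this sketch, which is a conservative choice. The types that Data Pipelines detects may differ and can be edited in the Configure schema dialog box.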

Limitations

The following are known limitations:

  • Excel (.xlsx) files are not supported in Data Pipelines.
  • Text files (.txt), ORC files (.orc), JSON files (.json), GeoParquet (.geoparquet), and EsriJSON files (.esrijson) are not supported for file upload. To learn more about supported items in ArcGIS Online, see What can you add to ArcGIS Online.
  • If you have a .txt file that contains delimited values, save it as a .csv file and upload it in that format.
  • If the dataset includes field names with spaces or invalid characters, the names are automatically updated to use underscores. For example, a field named Population 2022 is renamed Population_2022, and a field named %Employed is renamed _Employed.
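
The renaming rule can be previewed before upload with a short Python sketch. It is an approximation, not the Data Pipelines implementation: it assumes that any character other than a letter, digit, or underscore is invalid, and the reserved-word set is a hypothetical placeholder because the actual list is not given in this topic.

```python
import re

# Assumption: anything other than a letter, digit, or underscore is invalid.
INVALID_CHARS = re.compile(r"[^A-Za-z0-9_]")

# Hypothetical placeholder; the real reserved-word list is not documented here.
RESERVED_WORDS = frozenset({"select", "from", "where"})


def preview_field_name(name: str, reserved: frozenset = RESERVED_WORDS) -> str:
    """Approximate the field name after invalid characters are replaced."""
    cleaned = INVALID_CHARS.sub("_", name)  # one underscore per invalid character
    if cleaned.lower() in reserved:
        cleaned = "_" + cleaned  # reserved words are prefixed with an underscore
    return cleaned


if __name__ == "__main__":
    for original in ("Population 2022", "%Employed"):
        print(original, "->", preview_field_name(original))
```

For the two examples in this topic, the sketch returns Population_2022 and _Employed. Names that differ only in invalid characters would collide after renaming, so check the result in the Configure schema dialog box.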

Licensing requirements

The following licensing and configurations are required:

  • Creator or Professional user type
  • Publisher, Facilitator, or Administrator role, or an equivalent custom role

To learn more about Data Pipelines requirements, see Requirements.

Related topics

See Dataset configuration for more information.