Amazon S3

The Amazon S3 source reads records from files stored in an Amazon S3 bucket and performs analysis in ArcGIS Velocity.

Examples

The following are example uses of the Amazon S3 source:

  • A researcher wants to load hundreds of delimited text files from an Amazon S3 bucket into Velocity to perform analysis.
  • A GIS department stores commonly used boundary shapefiles in an Amazon S3 bucket and wants to load the county boundary shapefile into Velocity as an aggregation boundary.

Usage notes

Keep the following in mind when working with the Amazon S3 source:

  • All files identified in the Amazon S3 bucket by the naming pattern specified in the Dataset parameter must have the same schema and geometry type. If specifying a folder name for the Dataset parameter, all files in that folder and its subfolders must have the same file type and schema.
  • The secret access key is encrypted the first time the analytic is saved and is stored in an encrypted state.
  • When specifying the folder path, use forward slashes (/).
  • After configuring source connection properties, see Configure input data to learn how to define the schema and the key properties.
  • When using the Public access mode in Velocity to connect to a public Amazon S3 bucket, the bucket must have the List action granted to Everyone (public access) in the bucket access control list.
  • Certain Amazon S3 actions are required in the user policy associated with the provided Amazon access key for Velocity to connect to an Amazon S3 bucket and read data from the specified bucket and folder path; a verification sketch follows this list.
    • The s3:ListBucket action is required for the specified bucket.
    • The s3:GetObject action is required on the specified folder path and subresources (arn:aws:s3:::yourBucketName/*) for an Amazon S3 source to read data.
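
The following is a minimal sketch, run outside Velocity with the boto3 library, of how you might confirm that an access key pair grants these two actions before configuring the source. The bucket name, folder path, and credentials are placeholders; the key values are the same sample values shown in the parameter descriptions below.

```python
# Minimal pre-check (outside Velocity) that a key pair grants s3:ListBucket and
# s3:GetObject on the bucket and folder path; all names below are placeholders.
import boto3

ACCESS_KEY = "AKIAIOSFODNN7EXAMPLE"                      # Amazon access key ID
SECRET_KEY = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"  # Amazon secret access key
BUCKET = "yourBucketName"
PREFIX = "gis_data_folder/folder_containing_desired_dataset/"  # folder path without the leading slash

s3 = boto3.client("s3", aws_access_key_id=ACCESS_KEY, aws_secret_access_key=SECRET_KEY)

# s3:ListBucket: list the objects under the folder path (pagination omitted for brevity).
listing = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX)
keys = [obj["Key"] for obj in listing.get("Contents", [])]
print(f"{len(keys)} object(s) visible under {PREFIX}")

# s3:GetObject: read one of the listed objects.
if keys:
    body = s3.get_object(Bucket=BUCKET, Key=keys[0])["Body"].read()
    print(f"Read {len(body)} bytes from {keys[0]}")
```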

Parameters

The following parameters are available for the Amazon S3 source. Each parameter's data type is listed after its description.

Access key

The Amazon access key ID for the S3 bucket, for example, AKIAIOSFODNN7EXAMPLE.

Velocity uses the access key to load specified data sources into the app.

For details on Amazon access keys, see Accessing AWS using your AWS credentials in the AWS documentation.

Data type: String

Secret key

The Amazon secret access key for the S3 bucket, for example, wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY.

Velocity uses the secret access key to load specified data sources into the app.

The secret access key is encrypted the first time the analytic is saved and is stored in an encrypted state.

For details on Amazon secret access keys, see Accessing AWS using your AWS credentials in the AWS documentation.

Data type: String

S3 bucket name

The name of the Amazon S3 bucket containing the files to read.

Data type: String

Folder path

The path to the folder containing the files to load into Velocity.

  • If you're loading files from the root level of an Amazon S3 bucket, enter a single forward slash (/).
  • If you're loading files from a folder in the Amazon S3 bucket, enter a forward slash followed by the path to the folder, for example, /gis_data_folder/folder_containing_desired_dataset.

Data type: String

Dataset

The name of the file to read if you are loading a single file, or a pattern indicating a set of files, followed by the file type extension.

To build a pattern indicating a set of files, use an asterisk (*) as a wildcard either on its own or in conjunction with a partial file name.

All files identified by the naming pattern must have the same schema and geometry type.

Alternatively, if loading multiple files or nested folders, you can specify the containing folder name as the dataset name instead of a file name with extension. When specifying a containing folder name as the dataset, you cannot use wildcards or restrict file types. All files from the specified folder are ingested and must have the same file type.

The following are examples:

  • A single file in a folder—filename.csv
  • All files in a folder—*.shp
  • Select files in a folder—sensor_data_201*.json
  • All files from a directory or a directory of directories (subdirectories)—containingFolderName

Data type: String
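
As an illustration of the wildcard semantics described above, the following sketch matches a set of made-up file names against the example pattern sensor_data_201*.json. Velocity performs this matching internally, so the snippet is only an approximation.

```python
# Illustrative only: approximates how a wildcard dataset name selects files.
# The file names below are made up; Velocity performs the matching internally.
from fnmatch import fnmatch

files_in_folder = [
    "sensor_data_2017.json",
    "sensor_data_2018.json",
    "sensor_data_2020.json",
    "counties.shp",
]

dataset_pattern = "sensor_data_201*.json"
matched = [name for name in files_in_folder if fnmatch(name, dataset_pattern)]
print(matched)  # ['sensor_data_2017.json', 'sensor_data_2018.json']
```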

Load recent files only

Specifies whether the Amazon S3 source loads all files or only the files created or modified since the last run of the analytic.

  • The default is false, which means that each time the analytic runs, all files in the specified bucket and path with the provided dataset name are loaded.
  • When set to true, only files that were created or modified since the last run of the analytic are loaded.

This parameter can be set to true only for scheduled big data analytics.

For the first run of a scheduled big data analytic with the parameter set to true, no files are loaded and the analytic run completes. Subsequent runs load files with a last modified date later than the last scheduled run of the analytic.

Data type: Boolean
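
The following sketch approximates the Load recent files only behavior by listing objects with boto3 and keeping those whose last modified time is later than a previous run time. The bucket name, prefix, and previous run time are placeholders; Velocity tracks the last scheduled run internally.

```python
# Illustrative sketch of "Load recent files only": keep only objects whose
# LastModified time is later than the previous scheduled run.
# Bucket, prefix, and the previous run time are placeholders; credentials are
# assumed to be configured in the environment.
from datetime import datetime, timezone
import boto3

BUCKET = "yourBucketName"
PREFIX = "gis_data_folder/"
last_run = datetime(2024, 6, 1, tzinfo=timezone.utc)  # placeholder previous run time

s3 = boto3.client("s3")
listing = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX)
recent = [
    obj["Key"]
    for obj in listing.get("Contents", [])
    if obj["LastModified"] > last_run
]
print(f"{len(recent)} file(s) created or modified since the last run")
```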

Considerations and limitations

There are several considerations to keep in mind when using the Amazon S3 source:

  • All files identified in the Amazon S3 bucket by the naming pattern in the dataset property must have the same schema and geometry type.
  • Ingesting JSON with an array of objects referenced by a root node is currently not supported for Amazon S3 or Azure Blob Storage.