Imagery storage in the cloud

Cloud storage options

There are two broad categories of virtual storage—file storage and object storage.

Object storage, the most common type of cloud storage, uses metadata to organize and access pieces of data in a storage pool.

File storage—such as what you use on your local machine (for example, C:/)—stores data hierarchically and allows machines on the same network to share files using the Server Message Block (SMB) network protocol. It’s possible to use file storage on the disk attached to the virtual machine (the disk can be shared), but this solution doesn’t scale well.

Object storage is distinct from file storage in a number of ways:

  • It’s relatively inexpensive.
  • Many machines can simultaneously and efficiently access the same storage.
  • It’s REST based (allowing HTTP requests).
  • It’s nearly unlimited in size (object storage can be many petabytes).
  • It has high latency (an object storage request might take 40 milliseconds to return, which doesn’t scale in applications that make thousands of requests to read rasters).

Caching is used to mitigate the high latency of object storage. Ideally, you’ll use cloud storage to store massive volumes of data, and then cache data as it’s accessed on the ephemeral disks. Since people tend to revisit the same data, this will improve performance.
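To make the latency argument concrete, the following is a minimal sketch that simulates reading tiles with and without a local cache. The 40-millisecond figure comes from the text above; the 0.2-millisecond cost of a cache hit, the cache capacity, and the request pattern are assumptions chosen only for illustration, and the cache is a generic least-recently-used cache rather than the one ArcGIS uses.

```python
from collections import OrderedDict

OBJECT_STORE_LATENCY_MS = 40.0   # figure quoted in the text above
LOCAL_CACHE_LATENCY_MS = 0.2     # assumed cost of a hit on ephemeral disk


class TileCache:
    """Tiny least-recently-used cache keyed by tile identifier."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._tiles = OrderedDict()

    def read(self, tile_id):
        """Return the simulated latency (ms) of reading one tile."""
        if tile_id in self._tiles:
            self._tiles.move_to_end(tile_id)      # mark as recently used
            return LOCAL_CACHE_LATENCY_MS
        self._tiles[tile_id] = True               # pretend the bytes were stored
        if len(self._tiles) > self.capacity:
            self._tiles.popitem(last=False)       # evict least recently used
        return OBJECT_STORE_LATENCY_MS


def total_latency_ms(requests, cache=None):
    """Sum the latency of a sequence of tile requests, with or without a cache."""
    if cache is None:
        return len(requests) * OBJECT_STORE_LATENCY_MS
    return sum(cache.read(tile_id) for tile_id in requests)


# 5,000 requests that keep revisiting the same 100 tiles
requests = [i % 100 for i in range(5000)]
uncached = total_latency_ms(requests)
cached = total_latency_ms(requests, TileCache(capacity=200))
print(f"uncached: {uncached / 1000:.1f} s, cached: {cached / 1000:.1f} s")
```

Under these assumptions, only the first 100 requests pay the object storage latency; every later request is served from the cache, which is why revisited data benefits most.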

Access to imagery stored in the cloud

There are four ways to access imagery stored in the cloud when using cloud storage:

  • VSI file handlers (/vsicurl/, /vsis3/, /vsiaz/)—You can access imagery natively within ArcGIS by using VSI file handlers to read the data directly. Imagery accessed this way behaves as if it were a local file, requires multiple requests, and has limited caching. Another limitation of VSI file handlers is that only a single cloud storage access policy can be defined, so all data needs to come from a single organization.
  • Cloud storage connection (ACS) files—Just like you can create a connection to a database, you can create a connection to cloud storage using cloud storage connection files (ACS files) in ArcGIS Pro. When you create the connection, you enter security credentials to be stored in the ACS file. You can then access that cloud storage in a way that looks very similar to a local file system, browsing and selecting files to add to ArcGIS Pro. You can use multiple cloud security profiles to access imagery from one machine. With ACS files, data that is accessed is cached in the server’s temp directory for a short period of time, reducing repeated data requests for frequently accessed imagery.
    Note:

    The temp directory on the server, called localTempFolder, is located in the server admin system properties. For example, for https://zero:6443/arcgis/admin/system/properties, it would be defined as {"localTempFolder":"E:/Temp/data"}. For desktop, the temp location is defined as an environment variable called TempFolder.

  • Raster proxies—These are small XML files that embed information about the raster file in cloud storage; ArcGIS treats raster proxies as rasters, then accesses the actual data from the cloud only as needed. They can reference most GDAL-readable formats, can have any file extension, and can be referenced or embedded in a mosaic dataset or used directly in ArcGIS.
    • When accessed, the pixels that have been read and the index to the tiles are cached locally so subsequent requests don’t have to go back to the cloud. You’ll need to consider the cache location and manage cache periodically.

      Diagram showing how raster proxies work

  • Cloud raster store—This is created with ArcGIS Server Manager and defines a cloud location to store rasters. These are typically used with ArcGIS Image Server for the output of raster analytics or for image hosting. If you use the ArcGIS Image Server hosting capabilities, the rasters will be stored in the cloud raster store in CRF format.
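As an illustration of the VSI file handler option, the following sketch builds path strings in the form that GDAL documents for its virtual file systems (/vsis3/bucket/key, /vsiaz/container/blob, and /vsicurl/ followed by a URL). The helper function names and the bucket, container, and file names are hypothetical; the sketch only formats strings and doesn't contact any storage service.

```python
def vsis3_path(bucket, key):
    """GDAL virtual path for an object in Amazon S3."""
    return f"/vsis3/{bucket}/{key.lstrip('/')}"


def vsiaz_path(container, blob):
    """GDAL virtual path for a blob in Azure Blob Storage."""
    return f"/vsiaz/{container}/{blob.lstrip('/')}"


def vsicurl_path(url):
    """GDAL virtual path for a file served over HTTP(S)."""
    # This sketch only accepts http(s) URLs to catch obvious mistakes.
    if not url.startswith(("http://", "https://")):
        raise ValueError("this sketch expects an http(s) URL")
    return f"/vsicurl/{url}"


# Hypothetical names, for illustration only
print(vsis3_path("my-imagery-bucket", "landsat/scene_001.tif"))
print(vsiaz_path("imagery", "ortho/tile_12.tif"))
print(vsicurl_path("https://example.com/data/scene_001.tif"))
```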

References to cloud-based imagery in a mosaic dataset

When you create a mosaic dataset, it typically references a file on disk, which isn't suitable in the cloud. The following are options for adding rasters to a mosaic dataset that don’t reference files on disk:

  • A cloud storage connection (ACS) file—Create the ACS file, then use it to add data to a mosaic dataset (you can create a connection to the cloud store, or add all the files accessed by the ACS file directly to the mosaic dataset—on the Add Rasters dialog box, under input data, choose File, change File List (*.csv) to All Files (*.*), and browse to the ACS file).
    Note:
    As of ArcGIS Pro 2.5, publishing a mosaic dataset that accesses data in the cloud via an ACS file requires that the mosaic dataset be published by reference, the ACS file be copied manually to the server, and the mosaic dataset item paths be repaired to point to the ACS file on the server.
  • Embed raster proxies in a mosaic dataset. The raster proxy text string is embedded into the mosaic dataset—no external file is referenced. There are two ways to do this:
    • Create the mosaic dataset using raster proxy files, and use MDTools (part of MDCS) to embed them in the mosaic dataset.
    • Use OptimizeRasters to create the raster proxies as a table, and use that table to create your mosaic dataset.
  • File share to raster proxies—Raster proxies are small XML files that include a reference to a cloud storage location. Since the files are so small, they can be placed in the same file structure on the authoring machine and the server.
  • VSI file handlers (/vsicurl/, /vsis3/, /vsiaz/)—Using the Table raster type, you can use these paths to reference rasters directly in the mosaic dataset.
  • A file share—In this case, the same path for authoring the raster must be available to the server, which is not suitable for cloud storage.

Cloud security for imagery

There are multiple ways of handling access and security.

Public buckets

  • No restrictions—You can have a public bucket that anyone can read.
  • Public, no-list permissions with obfuscated files—You can have a public bucket with a no-list option. Users with the URL to a file can access it, but if you go to the bucket and query what’s there, it won’t return anything. If the path of the file is also obfuscated, the URL is practically impossible to guess. However, if someone gives the URL to someone else, that person can also access the file, and it’s not possible to restrict access without removing the file. (The level of security is analogous to allowing someone to download a copy of the file through a secured connection, which they could then possibly share with someone else.)
Note:

Public buckets often make use of Requester Pays, where the user pays for egress, which requires an account with the cloud storage provider.
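The obfuscated-path option for public buckets can be sketched as follows. This is one possible way to generate a hard-to-guess object key, using Python's standard secrets module; the function name, the file name, and the choice of a 16-byte random prefix are assumptions, not a scheme prescribed by any cloud provider.

```python
import secrets


def obfuscated_key(filename, prefix_bytes=16):
    """Prepend a random, URL-safe prefix so the object key is hard to guess."""
    token = secrets.token_urlsafe(prefix_bytes)   # 16 random bytes = 128 bits
    return f"{token}/{filename}"


# Hypothetical file name; the printed prefix differs on every run
key = obfuscated_key("scene_001.tif")
print(key)
```

Anyone who has the full key can still read the object, so this only protects against guessing, as described above.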

Secure buckets

  • Access control list (ACL)—File-level permissions for specific users or system processes.
  • Role-based access control (RBAC)—Permissions are set by user or role. RBAC can be combined with presigned URLs (token-based access), access control lists that define permissions at the file level, and bucket policies that provide fine-grained control, for example by canonical ID (all data is shared with whatever system has the given canonical ID).
  • Token-based control—This includes presigned URLs, AWS’s query string request authentication, and Azure’s Shared Access Signature (SAS).
  • Bucket policies—Nuanced access control. You can set this according to canonical IDs, IP addresses, and so on.
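The general idea behind token-based control can be sketched with a generic signed URL that carries an expiry time. This is not the AWS or Azure signing algorithm; it is a simplified illustration that uses an HMAC from the Python standard library, and the function names, query parameter names, secret, and URL are all hypothetical.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse


def sign_url(base_url, secret, expires_at):
    """Append an expiry time and an HMAC signature to a URL."""
    message = f"{base_url}|{expires_at}".encode()
    signature = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return f"{base_url}?{urlencode({'expires': expires_at, 'sig': signature})}"


def verify_url(signed_url, secret, now=None):
    """Return True if the signature matches and the URL has not expired."""
    now = time.time() if now is None else now
    parsed = urlparse(signed_url)
    query = parse_qs(parsed.query)
    try:
        expires_at = int(query["expires"][0])
        signature = query["sig"][0]
    except (KeyError, ValueError):
        return False
    base_url = f"{parsed.scheme}://{parsed.netloc}{parsed.path}"
    message = f"{base_url}|{expires_at}".encode()
    expected = hmac.new(secret, message, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched
    return hmac.compare_digest(signature.encode(), expected.encode()) and now < expires_at


secret = b"demo-secret"                                   # hypothetical shared secret
url = sign_url("https://example.com/bucket/scene.tif", secret, expires_at=1000)
print(verify_url(url, secret, now=999))                   # before expiry
print(verify_url(url, secret, now=1001))                  # after expiry
```

Because the signature covers both the path and the expiry time, changing either one invalidates the URL, which is what lets a token grant temporary access to a single object.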

Note that cross-origin resource sharing can be an issue. If you have data in the cloud that you want web apps to be able to access directly, you need to be thoughtful about these settings so you don’t prevent access.

Which security option to use depends on many factors, but each level of security can affect performance. Typically, the public and obfuscated public options provide faster performance, since they do not require additional security checks.

Performance optimization with cloud storage

Performance in the cloud can be affected by the volume of data read, how efficient the process is, latency, bandwidth, and data structure. ArcGIS does a large amount of back-end optimization to improve performance, including minimizing the number of requests to cloud storage, caching when required, and so on.

Implementing general image management best practices will also improve performance.

File format

It is important to make sure you have data structured correctly to minimize requests that will slow down processing. Different file types—simple TIFF files, netCDF, GRIB, different varieties of GeoTIFF, COG, MRF, and CRF—all have advantages and disadvantages:

  • Tiling enables partial access to the file, reducing data transfer for large datasets.
  • Compression reduces storage and transfer but has additional compute requirements.
  • Some raster data structures are more complex, decreasing performance by requiring multiple requests to access.
  • Pyramids provide faster access at smaller scales.
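The effect of tiling on data transfer can be estimated with some simple index arithmetic. The sketch below assumes a hypothetical single-band, 8-bit, uncompressed image that is 40,000 pixels wide, 512-pixel tiles, and an untiled layout in which whole scanlines must be fetched; real readers may behave differently.

```python
def tiles_for_window(col_off, row_off, width, height, tile_size=512):
    """Return (tile_col, tile_row) pairs that overlap a pixel window."""
    first_col = col_off // tile_size
    last_col = (col_off + width - 1) // tile_size
    first_row = row_off // tile_size
    last_row = (row_off + height - 1) // tile_size
    return [(c, r)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]


def rows_for_window(row_off, height):
    """An untiled (scanline) layout reads every full row in the window."""
    return list(range(row_off, row_off + height))


# 1,000 x 1,000 pixel window from a hypothetical 40,000-pixel-wide image
image_width = 40_000
tiles = tiles_for_window(col_off=10_000, row_off=20_000, width=1_000, height=1_000)
tiled_bytes = len(tiles) * 512 * 512                       # 1 byte per pixel
untiled_bytes = len(rows_for_window(20_000, 1_000)) * image_width

print(f"tiles read: {len(tiles)}")
print(f"tiled: {tiled_bytes:,} bytes, untiled: {untiled_bytes:,} bytes")
```

In this example the window overlaps nine tiles (about 2.4 MB), whereas reading the same window as full scanlines transfers 40 MB.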

Following are summaries of common raster formats.

TIFF or GeoTIFF (Untiled)

Diagram showing how untiled TIFFs are structured

  • Popular format for imagery and rasters.
  • Supports different bit depths and numbers of bands.
  • Includes additional metadata in tags internal to the file.
  • Can include georeferencing information embedded as tags (sometimes called GeoTIFF).
  • Often doesn’t include pyramids and doesn’t use compression.
  • TIFF files from data providers are often in the simplest form and inefficient to access, both generally and in the cloud.

Tiled TIFF

Diagram showing how tiled TIFF files are structured

  • Type of TIFF or GeoTIFF.
  • Pixels are structured into tiles to optimize access, especially for large files. This minimizes the number of disk access requests to get a subset of pixels.
  • Tiling is done by including an index to the tiles, which is stored as part of the tags.
  • Optional JPEG or LZW/Deflate compression can reduce file sizes.
  • Optional pyramids (sometimes referred to as reduced-resolution datasets or overviews) increase access efficiency at smaller scales. These pyramids increase the file size by 30 to 50 percent depending on the compression and type of data.
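The size cost of pyramids can be approximated by summing the pixel counts of each reduced-resolution level. The sketch below assumes each level halves the width and height and that pixels are stored uncompressed; the stopping size of 256 pixels and the raster dimensions are arbitrary choices for illustration.

```python
def pyramid_levels(width, height, min_size=256):
    """Halve the raster until both sides fit within min_size pixels."""
    levels = []
    while width > min_size or height > min_size:
        width = max(1, (width + 1) // 2)
        height = max(1, (height + 1) // 2)
        levels.append((width, height))
    return levels


def pyramid_overhead(width, height, min_size=256):
    """Ratio of pyramid pixels to full-resolution pixels (uncompressed)."""
    base = width * height
    extra = sum(w * h for w, h in pyramid_levels(width, height, min_size))
    return extra / base


overhead = pyramid_overhead(32_768, 32_768)
print(f"levels: {len(pyramid_levels(32_768, 32_768))}, overhead: {overhead:.1%}")
```

With a factor of 2 between levels, each level has a quarter of the pixels of the previous one, so the total approaches one third of the base image. Actual overhead varies with the compression used for the base image and the pyramids, which is consistent with the range quoted above.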

COG

Diagram showing how Cloud Optimized GeoTIFFs are structured

  • Cloud Optimized GeoTIFF (COG) is a type of tiled TIFF where pyramids are required and the index and pyramids are moved to the beginning of the file.
  • This file restructuring can provide a slight performance improvement in applications that only view the image at small scales or need to crawl for the metadata.
  • Creating COG files takes longer than tiled TIFF because the pyramids and tags are moved to the start of the files.
  • Performance improvements are most noticeable when ArcGIS Pro runs on a local machine and accesses data from the cloud or over a slow network. When using local storage, or if both ArcGIS Pro and your data are in the cloud, performance improvements are negligible compared to tiled TIFF.

MRF

Diagram showing how untiled MRFs are structured

  • Meta Raster Format (MRF) is a tile-based format developed by NASA for storing and accessing rasters more efficiently and improving performance, especially in cloud storage.
  • The data is tiled and has pyramids (like tiled TIFF or COG).
  • The pyramids, index, and metadata can be stored as separate files, which can improve access speed when the small index and metadata files are stored separately in fast-access ephemeral storage.
  • MRF supports Limited Error Raster Compression (LERC) in addition to JPEG or LZW/Deflate. LERC provides better and faster compression and decompression than LZW/Deflate. It also supports controlled lossy compression (important for large-bit-depth rasters such as elevation data or digital camera imagery). LERC saves additional storage space while speeding up data access.
  • The way NoData is managed helps remove artifacts at the edges of some images (a result of LERC compression and the way the JPEG tiles are stored).
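The idea of controlled lossy compression, where the error is bounded by a user-defined tolerance, can be illustrated with simple quantization. This sketch is not the LERC algorithm, which is considerably more involved; it only shows how rounding values to a fixed step keeps every value within a chosen maximum error. The elevation values and the 0.05 tolerance are made up.

```python
def quantize(values, max_error):
    """Map each value to an integer step so the round trip stays within max_error."""
    step = 2.0 * max_error
    return [round(v / step) for v in values]


def dequantize(codes, max_error):
    """Recover approximate values from the integer codes."""
    step = 2.0 * max_error
    return [c * step for c in codes]


elevations = [1203.27, 1203.31, 1204.02, 1198.76]   # hypothetical values in meters
codes = quantize(elevations, max_error=0.05)
restored = dequantize(codes, max_error=0.05)
errors = [abs(a - b) for a, b in zip(elevations, restored)]
print(codes)
print(max(errors))
```

The integer codes span a much smaller range than the original floating-point values, so they can be stored in fewer bits, while the largest error stays at or below the tolerance (apart from floating-point rounding).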

netCDF, HDF, or GRIB

Diagram showing how multidimensional file formats are structured

  • These file types are used to store multidimensional data.
  • Metadata and data are spread among multiple files, so you need to access many files to read a given subset.
  • Accessing these file types from the cloud results in poor performance.

CRF

  • This format is optimized for storing large rasters in cloud storage.
  • The raster is split into bundles, each of which has its own index and a set number of tiles.
  • This structure enables separate processes to write to different bundles in parallel.
  • The tile structure is built into the directory structure.
  • The format is best suited for large rasters, since the file is divided into multiple directories and files.
  • When accessed in ArcGIS Pro, each required bundle is read and cached locally.
  • CRF is not accessed through GDAL and can result in a large number of files.

Transposed CRF

  • This is a CRF option that is optimized for multidimensional data.
  • Conceptually, it creates a copy of the data cube turned on its side to optimize time series queries.
  • GDAL has no suitable API for additional dimensions.
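Why a transposed copy helps time series queries can be seen by counting how many storage chunks a query touches under two layouts. The cube dimensions and chunk shapes below are hypothetical and don't represent the actual internal layout of CRF; the sketch only compares a layout chunked by time slice with one chunked along the time axis.

```python
import math


def chunks_touched(query_shape, chunk_shape):
    """Number of chunks overlapped by a query aligned to the cube origin."""
    return math.prod(math.ceil(q / c) for q, c in zip(query_shape, chunk_shape))


# Hypothetical cube: 3,650 daily slices of 4,096 x 4,096 pixels (time, y, x)
time_series_query = (3650, 1, 1)      # every time step at one pixel
map_query = (1, 4096, 4096)           # one full slice in time

spatial_chunks = (1, 256, 256)        # each chunk holds a single time step
transposed_chunks = (365, 16, 16)     # each chunk holds many time steps

for name, chunks in (("spatial", spatial_chunks), ("transposed", transposed_chunks)):
    print(name,
          "time series:", chunks_touched(time_series_query, chunks),
          "map:", chunks_touched(map_query, chunks))
```

Under these assumptions, the time series query touches 3,650 chunks in the spatial layout but only 10 in the transposed layout, while the map query shows the opposite trade-off. This is why the transposed copy is kept in addition to the original rather than replacing it.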

JP2

  • This format is optimized for high compression.
  • There are many types of JP2, some more optimized than others for storage size or access.
  • It uses wavelet compression, which typically provides higher compression than JPEG and supports additional bit depths but is relatively slow to access, especially from cloud storage.

Compression

Choosing a compression format means balancing the following factors:

  • The reduction in required storage
  • Loss of data (lossy, controlled lossy, or lossless compression)
  • Time required to write the compressed file
  • Time required to read the compressed file

Different types of compression balance these factors differently. For lossless or controlled lossy compression, LERC generally provides the best compression and is fast to both compress and decompress. LERC has the most value when used with floating-point data, such as elevation. Deflate often provides good compression. For lossy compression, JPEG provides fast compression and decompression while maintaining most of the image information.
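The trade-off between storage reduction, write time, and read time can be measured directly. The sketch below uses Deflate through Python's standard zlib module on a synthetic 8-bit array; the data is made up, so the ratios and timings it prints are not representative of real imagery and will vary by machine.

```python
import random
import time
import zlib


def deflate_stats(data, level):
    """Compress with Deflate and report ratio plus write/read timings."""
    start = time.perf_counter()
    packed = zlib.compress(data, level)
    write_s = time.perf_counter() - start

    start = time.perf_counter()
    restored = zlib.decompress(packed)
    read_s = time.perf_counter() - start

    assert restored == data            # Deflate is lossless
    return {"ratio": len(data) / len(packed), "write_s": write_s, "read_s": read_s}


# Synthetic 8-bit "raster": a slow gradient with a little noise
random.seed(0)
pixels = bytes((x // 64 + random.randint(0, 3)) % 256 for x in range(200_000))

for level in (1, 6, 9):
    stats = deflate_stats(pixels, level)
    print(level, f"{stats['ratio']:.2f}x",
          f"write {stats['write_s']:.4f}s", f"read {stats['read_s']:.4f}s")
```

Higher compression levels generally spend more time writing for a smaller file, while read time changes far less, which is the kind of balance the factors above describe.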

Compression performance is dependent on the data source, but the following table shows typical results for a variety of common compression types:

Table showing the time to write, time to read, and resulting size of various raster compression formats

File conversion

Converting imagery to an optimized format is best done before or during the upload process.

There are various ways you can convert imagery into an optimized format and upload simultaneously:

  • The Export Raster pane in ArcGIS Pro
  • Copy Raster geoprocessing tool in ArcGIS Pro
  • GDAL
  • OptimizeRasters, an open-source tool from Esri that uses GDAL behind the scenes, available from GitHub. (It simplifies the process of optimizing data as you copy it to the cloud.)
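As an example of the GDAL option, the following sketch builds a gdal_translate command that writes a Cloud Optimized GeoTIFF. It assumes GDAL 3.1 or later (which includes the COG output driver) is installed and on the path; the file names and the compression and block size values are placeholders, and the command is only printed here, not run.

```python
import shlex


def cog_translate_command(src, dst, compress="DEFLATE", blocksize=512):
    """Build a gdal_translate argument list that writes a Cloud Optimized GeoTIFF."""
    return [
        "gdal_translate",
        "-of", "COG",                      # COG output driver (GDAL 3.1 or later)
        "-co", f"COMPRESS={compress}",     # creation option: compression method
        "-co", f"BLOCKSIZE={blocksize}",   # creation option: tile size in pixels
        src,
        dst,
    ]


# Hypothetical file names
cmd = cog_translate_command("scene.tif", "scene_cog.tif")
print(shlex.join(cmd))
# To execute it: import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids quoting problems when paths contain spaces.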

File transfer

There are also additional upload options that will transfer your files without optimizing the format:

  • ArcGIS Enterprise portal (files go to the raster store).
  • OptimizeRasters can be used to upload data to the cloud without file conversion.
  • Third-party tools such as CloudBerry and the AWS CLI.
  • White glove services, where the cloud-storage company ships you a disk for you to copy data, you send it back, and the company uploads it to the cloud for you. It’s a quick way to move large amounts of data into the cloud.
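To judge when shipping a disk is worthwhile, you can estimate how long a network upload would take. The sketch below is simple arithmetic; the 100 TB data volume, the link speeds, and the assumed 80 percent effective throughput are illustrative values only.

```python
def transfer_days(size_tb, bandwidth_mbps, efficiency=0.8):
    """Days needed to upload size_tb terabytes over a link of bandwidth_mbps."""
    bits = size_tb * 1e12 * 8                              # decimal terabytes to bits
    seconds = bits / (bandwidth_mbps * 1e6 * efficiency)   # effective bits per second
    return seconds / 86_400


for mbps in (100, 1000):
    print(f"100 TB at {mbps} Mbps: {transfer_days(100, mbps):.1f} days")
```

Under these assumptions, 100 TB takes more than 11 days on a 1 Gbps link and more than 100 days at 100 Mbps, which is the scale at which a shipped disk can be faster.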