The optimal way to manage your multidimensional data depends on the data itself, how the data will be used, and other factors. This section discusses these considerations to help you make the best choices for your project.
Supported multidimensional data types
Many scientific datasets, including climate, ocean, and atmospheric data, are captured at multiple times, depths, or heights. These data are often stored in netCDF, HDF, or GRIB file formats, which represent the spatial, temporal, and other multidimensional characteristics of the data. A single file can store multiple variables, each of which can be a 2D, 3D, or 4D array. You can use ArcGIS Pro or ArcGIS Notebooks to display and analyze these datasets or to publish them as web-accessible image services. Multidimensional raster types are also available for creating mosaic datasets from these file formats.
The specifications for these file formats are flexible, so data may be formatted using a variety of conventions. In general, ArcGIS supports the following conventions:
- netCDF following the CF-1 convention
- HDF (HDF4 and HDF5) following the HDF-EOS convention
- GRIB written with the NCEP library
If you are a data producer and would like your data to be supported natively in ArcGIS, we recommend following these conventions.
Managing multidimensional data
ArcGIS Pro can access netCDF, GRIB, and HDF formats directly for display and analysis. If you want to publish multidimensional datasets as image services, you’ll first need to manage your files either in a mosaic dataset or by converting them to cloud raster format (CRF).
A mosaic dataset manages a collection of images by storing image metadata and paths to the images as a catalog in a geodatabase. When it is used with the appropriate raster type to manage multidimensional rasters, the mosaic dataset also manages the variables and dimensions: each row in the catalog describes a 2D array with its variable name and dimension values. ArcGIS Pro can display any slice (2D array) using a multidimensional filter, or display a 3D or 4D array as a cube using a voxel layer.
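To make the catalog idea concrete, here is a minimal Python sketch that models each catalog row as one 2D slice tagged with a variable name and dimension values, and filters slices the way a multidimensional filter selects them. The `CatalogRow` class, the `filter_slices` function, and the `ocean.nc` file name are hypothetical illustrations, not the mosaic dataset schema or an ArcGIS API; the `std_time` and `std_z` fields simply borrow the StdTime and StdZ dimension names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class CatalogRow:
    """One row of a hypothetical multidimensional catalog: a single 2D slice."""
    path: str           # source file that holds the slice
    variable: str       # variable name, e.g. "temperature"
    std_time: datetime  # value along the normalized time dimension
    std_z: float        # value along the normalized depth/height dimension


def filter_slices(
    catalog: List[CatalogRow],
    variable: str,
    std_time: Optional[datetime] = None,
    std_z: Optional[float] = None,
) -> List[CatalogRow]:
    """Return the slices for `variable`, optionally pinned to dimension values."""
    return [
        row
        for row in catalog
        if row.variable == variable
        and (std_time is None or row.std_time == std_time)
        and (std_z is None or row.std_z == std_z)
    ]


# Two variables x two times x two depths = eight rows, one per 2D slice.
times = [datetime(2020, 1, 1), datetime(2020, 2, 1)]
catalog = [
    CatalogRow("ocean.nc", variable, t, z)
    for variable in ("temperature", "salinity")
    for t in times
    for z in (0.0, 10.0)
]
# Pinning every dimension selects exactly one slice to display;
# leaving dimensions unpinned returns a stack of slices (a cube).
single_slice = filter_slices(catalog, "temperature", std_time=times[0], std_z=0.0)
```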
Cloud raster format (CRF) is an Esri proprietary file format optimized for storing both regular and multidimensional raster data for distributed computing. CRF stores pixels in 128 x 128 tiles grouped into bundle files in separate folders, a layout that supports multithreaded writing.
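The arithmetic below is a minimal sketch of how the 128 x 128 tiling scales with raster size and slice count. It counts only full-resolution tiles for one variable and ignores pyramids and how tiles are grouped into bundle files; the function names and the 1440 x 720 example grid are illustrative assumptions.

```python
import math
from typing import Tuple

TILE_SIZE = 128  # CRF tile edge length in pixels


def tile_grid(columns: int, rows: int, tile_size: int = TILE_SIZE) -> Tuple[int, int]:
    """Tiles needed across and down; partial tiles at the edges still count."""
    return math.ceil(columns / tile_size), math.ceil(rows / tile_size)


def tile_count(columns: int, rows: int, slices: int, tile_size: int = TILE_SIZE) -> int:
    """Total full-resolution tiles for one variable with `slices` 2D slices."""
    across, down = tile_grid(columns, rows, tile_size)
    return across * down * slices


# A hypothetical 1440 x 720 grid needs 12 x 6 = 72 tiles per slice,
# so a year of daily slices needs 72 * 365 = 26,280 tiles.
per_slice = tile_count(1440, 720, 1)
per_year = tile_count(1440, 720, 365)
```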
For multidimensional CRFs, you can build a transpose to make analysis along time or depth (retrieving time series, aggregating by dimension, trend analysis, and so on) more efficient.
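Why does a transpose help? The toy model below is an assumption for illustration, not Esri's actual on-disk layout. It stores a small cube two ways: slice by slice (one storage unit per slice and tile) and transposed (one unit per tile holding every slice). All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Cube = List[List[List[float]]]  # cube[slice][row][col]


def build_standard(cube: Cube, tile: int) -> Dict[Tuple[int, int, int], Dict]:
    """Slice-major layout: one storage unit per (slice, tile_row, tile_col)."""
    units: Dict = defaultdict(dict)
    for t, grid in enumerate(cube):
        for r, line in enumerate(grid):
            for c, value in enumerate(line):
                units[(t, r // tile, c // tile)][(r, c)] = value
    return dict(units)


def build_transpose(cube: Cube, tile: int) -> Dict[Tuple[int, int], Dict]:
    """Transposed layout: one unit per (tile_row, tile_col) holding all slices."""
    units: Dict = defaultdict(lambda: defaultdict(list))
    for grid in cube:  # slices are visited in order, so each list stays ordered
        for r, line in enumerate(grid):
            for c, value in enumerate(line):
                units[(r // tile, c // tile)][(r, c)].append(value)
    return {key: dict(pixels) for key, pixels in units.items()}


def profile_standard(units, n_slices: int, r: int, c: int, tile: int):
    """Temporal profile from the slice-major layout: (values, units read)."""
    keys = [(t, r // tile, c // tile) for t in range(n_slices)]
    return [units[k][(r, c)] for k in keys], len(keys)


def profile_transpose(units, r: int, c: int, tile: int):
    """Temporal profile from the transposed layout: (values, units read)."""
    return list(units[(r // tile, c // tile)][(r, c)]), 1
```

Both profile functions return the same values, but the slice-major version touches one storage unit per slice while the transposed version touches a single unit. The price is a second copy of the pixels, which is why the transpose needs extra build time and storage.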
Note that if your data includes multiple files, you’ll likely manage them with a mosaic dataset first, then use this to create the CRF.
Publishing a multidimensional image service
If you plan to publish your multidimensional data as an image service, there are additional considerations to keep in mind, including whether to manage your data using a mosaic dataset or CRF, whether to build a transpose CRF, how many services to publish, and how to compress the data.
Mosaic dataset or CRF
When publishing a multidimensional service, your service requirements will determine whether you should manage data using a mosaic dataset or CRF. Is the service for cataloging, viewing, or analysis (using raster analytics)? Will the service be on-premises or in the cloud? Do you need to update the service? How frequently?
The following table compares some of the capabilities of multidimensional mosaic datasets and multidimensional CRFs.
Features | CRF (ArcGIS Pro 2.4+) | Mosaic dataset |
---|---|---|
Able to store multiple variables? | Yes | Yes |
Support for setting a default variable? | Yes (ArcGIS Pro 2.7+) | Yes |
Able to include processing templates? | Yes (ArcGIS Pro 2.7+) | Yes |
Speed when generating a temporal profile | Transpose CRF is faster than a mosaic dataset | Slow if the number of slices is large |
Speed performing multidimensional analysis | Transpose CRF is faster than a mosaic dataset | Slow if the number of slices is large |
Storage requirements | CRF supports LERC and LZ77 compression, helping minimize storage requirements. If you’re building a transpose CRF, it will write a temporary file in the output folder; you should have enough disk space to accommodate twice the size of the transpose. | Mosaic datasets don’t store pixels. Storage requirements depend on the source formats of the data. |
Ability to update data | You can append and update data; you can’t insert or delete. | You can append, update, insert, or delete data through table operations. After any change, you’ll have to rebuild the multidimensional information. |
Support for discontinuous datasets | The area between datasets is stored as NoData, which requires extra storage. | Since mosaic datasets don’t store pixels, no extra storage is required. |
For many use cases, multidimensional CRF has advantages over multidimensional mosaic datasets. CRF is recommended in the following scenarios:
- You need to publish an image service in the cloud and place the data in cloud storage. In cloud storage, CRF performs better than a mosaic dataset that manages netCDF, GRIB, or HDF files.
- The service is designed for multidimensional analysis, such as generating temporal profiles or performing trend analysis. Transpose CRF will enable faster multidimensional analysis.
- The data for an on-premises service is stored in a low-performing format (irregular gridded data, for example). In this case, you can improve performance by converting the data to CRF and publishing the CRF as an image service.
Note:
If your data includes multiple files, consider managing them with a mosaic dataset first, then use this to create the CRF.
Mosaic datasets are a better option in the following scenarios:
- You need to maintain access to overlapping images, which is preserved with a mosaic dataset.
- You need to handle discontinuous data. For example, since Alaska and Hawaii are not adjacent to the mainland of the United States, there will be a large area of NoData if you create one CRF for the whole United States, which will require additional storage.
- You do not want to create a copy of the data, and the service is mainly used for cataloging operations; you will not need to generate temporal profiles.
- The service will need to be updated by inserting or deleting data (not just appending new data).
- You are managing datasets with different numbers of bands together (3-band imagery and temperature data, for example).
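The two lists above can be read as a checklist. The sketch below encodes them as a hypothetical `ServiceNeeds` record and a hypothetical `compare_options` helper that collects the reasons pointing toward each option. Because the guidance doesn't rank the scenarios against one another, the helper only makes a recommendation when all the reasons point the same way; otherwise it asks you to weigh the trade-offs.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class ServiceNeeds:
    """Hypothetical checklist mirroring the CRF and mosaic dataset scenarios."""
    cloud_storage: bool = False             # service in the cloud, data in cloud storage
    dimension_analysis: bool = False        # temporal profiles, trend analysis
    low_performing_source: bool = False     # e.g. irregular gridded data on-premises
    overlapping_images: bool = False        # must keep access to overlaps
    discontinuous_data: bool = False        # regions separated by large gaps
    catalog_only_no_copy: bool = False      # no data copy; no temporal profiles
    insert_or_delete_updates: bool = False  # updates beyond appending
    mixed_band_counts: bool = False         # e.g. 3-band imagery with temperature


def compare_options(needs: ServiceNeeds) -> Dict[str, Union[str, List[str]]]:
    """Collect the reasons favoring each option and summarize them."""
    crf = [reason for flag, reason in [
        (needs.cloud_storage, "performs better than a mosaic dataset in cloud storage"),
        (needs.dimension_analysis, "transpose CRF speeds up multidimensional analysis"),
        (needs.low_performing_source, "converting to CRF improves performance"),
    ] if flag]
    mosaic = [reason for flag, reason in [
        (needs.overlapping_images, "preserves access to overlapping images"),
        (needs.discontinuous_data, "no storage spent on NoData between regions"),
        (needs.catalog_only_no_copy, "no copy of the data is created"),
        (needs.insert_or_delete_updates, "supports inserting and deleting data"),
        (needs.mixed_band_counts, "manages different band counts together"),
    ] if flag]
    if crf and not mosaic:
        recommendation = "CRF"
    elif mosaic and not crf:
        recommendation = "mosaic dataset"
    elif crf and mosaic:
        recommendation = "weigh the trade-offs"
    else:
        recommendation = "either"
    return {"crf": crf, "mosaic_dataset": mosaic, "recommendation": recommendation}
```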
Building transpose CRF
Calculations along dimensions (such as generating a temporal profile) are significantly faster when using transpose CRF than with either standard CRF or a mosaic dataset. However, building or updating the transpose requires additional time, and the transpose requires additional storage.
If the end user of the dataset will perform dimension-wise operations (using the trend tools, generating temporal profiles, and so on) or will display more than one slice at a time, we recommend building a transpose. Otherwise, it’s faster and more storage-efficient to stick with standard CRF.
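This rule, combined with the storage note in the comparison table (a transpose build writes a temporary file in the output folder, so plan for free space of twice the transpose size), can be sketched as a pre-flight check. The function names are hypothetical, and you must supply your own estimate of the transpose size; this sketch does not compute it. Free space comes from the standard-library `shutil.disk_usage`.

```python
import shutil


def should_build_transpose(dimension_wise_ops: bool, multi_slice_display: bool) -> bool:
    """Build a transpose when users run dimension-wise operations or view several slices."""
    return dimension_wise_ops or multi_slice_display


def required_free_bytes(estimated_transpose_bytes: int) -> int:
    """Free space to have in the output folder: twice the transpose size."""
    return 2 * estimated_transpose_bytes


def ready_to_build(output_folder: str, estimated_transpose_bytes: int,
                   dimension_wise_ops: bool, multi_slice_display: bool) -> bool:
    """True only if a transpose is warranted and the output folder has room for it."""
    if not should_build_transpose(dimension_wise_ops, multi_slice_display):
        return False  # standard CRF is faster to build and uses less storage
    free = shutil.disk_usage(output_folder).free
    return free >= required_free_bytes(estimated_transpose_bytes)
```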
How many services to build
A common question is whether to create one service that includes all variables or a separate service for each variable. Reducing the number of services will generally reduce the overall cost. However, the specifics of your data and requirements of your applications may dictate multiple services.
The following characteristics of your data may indicate that you should publish it as separate services.
Different temporal resolutions
Variables with different temporal resolutions should be published as separate services. This recommendation also covers the same variable recorded at different temporal resolutions: because ArcGIS normalizes time, depth, and pressure as StdTime, StdZ, and StdPressure, both resolutions would share a single time dimension within one service. For example, it is better to manage yearly temperature and daily temperature as separate services.
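As a minimal sketch, assuming each variable is described by a hypothetical `(name, temporal_resolution)` pair with free-form resolution labels, you could plan one service per temporal resolution. The same variable can appear in more than one group, which matches the yearly versus daily temperature example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def plan_services(variables: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (variable, temporal_resolution) pairs into one service per resolution."""
    services: Dict[str, List[str]] = defaultdict(list)
    for name, resolution in variables:
        services[resolution].append(name)
    return dict(services)


plan = plan_services([
    ("temperature", "daily"),
    ("salinity", "daily"),
    ("temperature", "yearly"),
])
```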
Wind or ocean data
Wind and ocean data are commonly displayed using the Vector Field renderer, so they should be published as separate services that can easily be overlaid on other variables.
Different bands
Datasets with different numbers of bands should be published as separate services.
Different spatial resolutions
If you’re managing your data as a CRF, you should publish variables with different spatial resolutions as separate services. CRF manages all variables at the same spatial resolution, meaning you’d have to either down-sample high-resolution data, losing accuracy, or up-sample lower-resolution data, wasting storage space.
Mosaic datasets can manage datasets at different spatial resolutions and can publish them as a single service. In this case, the data is resampled on the fly for each display request. This type of service is not recommended for analysis, though, because clients will not know the resolution of the source data.
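To see why forcing one resolution into a CRF wastes storage, this sketch counts pixels for the same extent at two cell sizes: up-sampling by a factor k in each direction multiplies the pixel count by k². The 360 x 180 degree extent and the 1° and 0.25° cell sizes are hypothetical example values, and the function names are illustrative.

```python
def grid_pixels(width: float, height: float, cell_size: float) -> int:
    """Pixels needed to cover a width x height extent at the given cell size."""
    return round(width / cell_size) * round(height / cell_size)


def extra_pixels_from_upsampling(width: float, height: float,
                                 coarse_cell: float, fine_cell: float) -> int:
    """Additional pixels per slice when coarse data is stored at the fine cell size."""
    return grid_pixels(width, height, fine_cell) - grid_pixels(width, height, coarse_cell)


# A 360 x 180 extent: 64,800 pixels at 1 degree versus 1,036,800 at 0.25 degree,
# so up-sampling the coarse variable stores 16 times as many pixels per slice.
coarse = grid_pixels(360, 180, 1.0)
fine = grid_pixels(360, 180, 0.25)
extra = extra_pixels_from_upsampling(360, 180, 1.0, 0.25)
```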
Discontinuous datasets
If you’re managing your data using CRF, you may want to publish discontinuous datasets, which contain the same variable but in different regions, as separate services. Otherwise, the service will contain a lot of NoData between the regions, increasing storage requirements.
Alternatively, publishing discontinuous data as one service using a mosaic dataset can be an option if you won’t need to perform analysis on a long time series.
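A rough way to gauge the NoData overhead of a single CRF is to compare the area of the regions with the area of the one rectangle a combined raster must cover. The sketch assumes rectangular, non-overlapping regions given as hypothetical `(xmin, ymin, xmax, ymax)` extents; real footprints are irregular, so treat the result as an estimate.

```python
from typing import Iterable, Tuple

Extent = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def area(extent: Extent) -> float:
    """Area of a rectangular extent."""
    xmin, ymin, xmax, ymax = extent
    return (xmax - xmin) * (ymax - ymin)


def union_extent(extents: Iterable[Extent]) -> Extent:
    """Smallest rectangle covering every region."""
    xmins, ymins, xmaxs, ymaxs = zip(*extents)
    return min(xmins), min(ymins), max(xmaxs), max(ymaxs)


def nodata_fraction(extents: Iterable[Extent]) -> float:
    """Share of one combined raster that would be NoData (non-overlapping regions)."""
    extents = list(extents)
    covered = sum(area(e) for e in extents)
    return 1 - covered / area(union_extent(extents))
```

For two 10 x 10 regions at opposite corners of a 100 x 100 extent, 98 percent of the combined raster would be NoData.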
Different pixel types
If your variables have different pixel types, you can generally publish them as a single service. If you choose to create one service from mixed pixel types, choose a pixel type that can hold all the data you plan to manage in the mosaic dataset, and define that pixel type when you create the mosaic dataset. If you convert this mosaic dataset to a CRF, specify LERC compression with an accuracy (maximum error) appropriate for your data.
However, if a client application needs to access the data as an integer type, you need to publish the integer data separately from the floating-point data.
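Choosing a pixel type that is inclusive of all the data can be sketched as picking the smallest type that holds every source type exactly. The generic type names below are assumptions (ArcGIS uses its own pixel type names), and the sketch assumes 32-bit integers need a 64-bit float because a 32-bit float carries only 24 significant bits. It also ignores any value you reserve for NoData.

```python
from typing import Iterable

# Integer ranges and float significand bits, listed from smallest to largest.
INT_RANGES = {
    "uint8": (0, 2**8 - 1),
    "int8": (-2**7, 2**7 - 1),
    "uint16": (0, 2**16 - 1),
    "int16": (-2**15, 2**15 - 1),
    "uint32": (0, 2**32 - 1),
    "int32": (-2**31, 2**31 - 1),
}
FLOAT_SIGNIFICAND_BITS = {"float32": 24, "float64": 53}
CANDIDATES = list(INT_RANGES) + list(FLOAT_SIGNIFICAND_BITS)


def holds_exactly(target: str, source: str) -> bool:
    """True if every value of `source` is representable exactly in `target`."""
    if source in FLOAT_SIGNIFICAND_BITS:
        # A float only fits a float with at least as many significand bits.
        return (target in FLOAT_SIGNIFICAND_BITS
                and FLOAT_SIGNIFICAND_BITS[target] >= FLOAT_SIGNIFICAND_BITS[source])
    lo, hi = INT_RANGES[source]
    if target in INT_RANGES:
        t_lo, t_hi = INT_RANGES[target]
        return t_lo <= lo and hi <= t_hi
    # Every integer with magnitude <= 2**bits is exact in a float with `bits` significand bits.
    return max(-lo, hi) <= 2 ** FLOAT_SIGNIFICAND_BITS[target]


def inclusive_pixel_type(source_types: Iterable[str]) -> str:
    """Smallest candidate type that holds every source type exactly."""
    sources = list(source_types)
    for candidate in CANDIDATES:
        if all(holds_exactly(candidate, s) for s in sources):
            return candidate
    raise ValueError("no candidate pixel type holds all inputs")
```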
Compression
Compression is used in two places in the publication workflow. First, the data is compressed when the CRF is created. Second, data is compressed for image transfer over the internet when the image service is published. Limited Error Raster Compression (LERC) can be used in both places.
LERC lets you control accuracy while compressing the data. It is very efficient at compressing floating-point data and achieves an equally good compression ratio for lossless compression of integer data. It’s well suited for scientific data, since you can compress the data to minimize storage requirements while maintaining the required accuracy. LZ77 compression is also supported.
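The idea behind LERC's accuracy control can be illustrated with bounded-error quantization: snap each value to a grid whose step is twice the allowed maximum error, so no value moves by more than that error, and store small integer codes. This is a simplified sketch, not the LERC codec, which also works block by block and bit-packs the codes; the function names and sample values are hypothetical. In this sketch, a maximum error of 0.5 makes integer data round-trip exactly.

```python
from typing import List, Sequence, Tuple


def quantize(values: Sequence[float], max_error: float) -> Tuple[float, float, List[int]]:
    """Encode values as integer steps of size 2 * max_error above the minimum."""
    base = min(values)
    step = 2 * max_error
    codes = [round((v - base) / step) for v in values]
    return base, step, codes


def dequantize(base: float, step: float, codes: Sequence[int]) -> List[float]:
    """Rebuild values; each is within max_error of the original (up to float rounding)."""
    return [base + code * step for code in codes]


# Float data: every reconstructed value lies within 0.01 of the original.
readings = [12.34, 12.9, 15.001, 20.5]
base, step, codes = quantize(readings, max_error=0.01)
restored = dequantize(base, step, codes)
bits_per_code = max(codes).bit_length()  # small integer codes are cheap to store

# Integer data with max_error=0.5 uses a step of 1, so it round-trips exactly.
counts = [3, 7, 7, 10]
exact = dequantize(*quantize(counts, max_error=0.5))
```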
Using multidimensional data stored in the cloud
Recommendations for managing multidimensional data stored in the cloud are similar to general recommendations for managing data in the cloud.
If the source data is already in the cloud, you should create the CRF on a virtual machine (for example, an EC2 instance) and write it directly to cloud storage using a cloud storage connection file (.acs).
If the source data is stored on-premises, you can create the CRF locally and then copy it to the cloud. If you create the CRF locally, a solid-state drive (SSD) is highly recommended.