Preparing input data—Managing Elevation

This section covers best practices and specifications for preparing your data for the Managing Elevation workflow.

Imagery requirements

The Managing Elevation workflow is intended to work with specific types of data. The basic data requirements are listed in the table below.

Parameter	Requirement
Bit depth	Required
Recommended file format	Tiled TIF, LZW compression, with internal tile size 128 or 256 pixels (provides the fastest access and smallest size). Consider converting other formats to tiled TIF to improve efficiency and reduce data volume. Note: Converting LIDAR data to tiled TIF will result in loss of point information. Instead, in ArcGIS, manage LIDAR data using LAS files or LAS datasets within a mosaic dataset so that no data is lost. Alternatively, LIDAR can be rasterized to a DEM or DSM with fixed pixel size. For terrains that are best represented by model points and breaklines, you can (1) manage the data as a LAS dataset that includes both the LAS data and breaklines or (2) extract only the required points and breaklines into a managed terrain.
Reprojection or Resampling?	Elevation data should not be unnecessarily reprojected or resampled from its original source. Transforming rasterized elevation data from one coordinate system to another will degrade the quality of the data.

Typical imagery types

Many organizations have their own collections of elevation data acquired by various means. Typically, these have been managed as separate projects, often resulting in users spending a considerable amount of time finding data suitable for different projects. To simplify this process, data managers can combine elevation sources into a single set of mosaic datasets and elevation services that can then be referenced by multiple applications.

Datasets that cover areas larger than the individual project areas are often helpful, providing a better, more extensive base. The following are locations for datasets that are publicly available and can be downloaded for use in such services.

Approximate Pixel Size (m)	Data Identifier	Primary Sources
3.1	NED 1/9 Arcsecond	http://www.usgs.gov
10	NED 1/3 Arcsecond	http://www.usgs.gov
31	NED 1 Arcsecond	http://www.usgs.gov
62	NED 2 Arcsecond	http://www.usgs.gov
93	SRTM	http://www.usgs.gov, http://www.cgiar-csi.org, http://www.nasa.gov
232	GMTED2010	http://www.usgs.gov
928	GEBCO bathymetry	http://www.bodc.ac.uk
4,638	EGM2008 geoid	http://earth-info.nga.mil

Data structure recommendations

Organize your files, directories, and mosaic datasets according to these guidelines.

How should I structure my directories?

Store each collection of image files in a separate directory.
Define a folder hierarchy that makes sense for the data, and plan ahead to provide sufficient granularity later.
- For example: directories might be separated by location, with subdirectories for multiple elevation datasets.
To maximize performance, try to keep the number of files per directory under 1,000.
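The file-count guideline can be checked with a short script before the data is added to a mosaic dataset. The following is a minimal sketch in plain Python (standard library only). The function name find_crowded_directories and its max_files parameter are hypothetical names chosen for illustration; they are not part of ArcGIS.

```python
import os


def find_crowded_directories(root, max_files=1000):
    """Return (directory, file_count) pairs where file_count >= max_files.

    Only files directly inside each directory are counted, not files in
    its subdirectories.
    """
    crowded = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if len(filenames) >= max_files:
            crowded.append((dirpath, len(filenames)))
    # Largest directories first, so the worst offenders are easy to spot.
    return sorted(crowded, key=lambda item: item[1], reverse=True)
```

An empty result means every directory under the root holds fewer than max_files files. Directories that are reported are candidates for splitting into subdirectories, for example by location.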

How should I manage my files?

File names are generally defined by the data provider; keep the original names, if possible.
Store metadata that comes with your imagery in the same location as the imagery files.
Store main imagery files as read-only when possible. This helps ensure that the original files are not modified and that they are backed up multiple times.
Don't set the directory in which the files are stored as read-only. Many of the workflows result in additional pyramid, statistics, or metadata files being written along with the source files. If the directories are read-only during the authoring processes, these files will be stored in separate locations disconnected from the originals.
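One way to apply both file guidelines together is to clear the write permission on the source files while leaving the containing directory alone. The sketch below uses only the Python standard library. The function name make_files_read_only is a hypothetical helper, not an ArcGIS tool. Note that on Windows, os.chmod can only toggle the read-only flag.

```python
import os
import stat


def make_files_read_only(directory):
    """Clear the write bits on regular files directly inside `directory`.

    The directory itself is left untouched so that pyramid, statistics, or
    metadata files can still be written next to the source files.
    """
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    changed = []
    for entry in os.scandir(directory):
        if entry.is_file(follow_symlinks=False):
            mode = entry.stat(follow_symlinks=False).st_mode
            os.chmod(entry.path, stat.S_IMODE(mode) & ~write_bits)
            changed.append(entry.path)
    return sorted(changed)
```

If pyramids or statistics are built after this step, the new auxiliary files are created with normal permissions, so you may want to rerun the helper once authoring is complete.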

How should I organize my mosaic datasets?

Store mosaic datasets in a file geodatabase (most cases) or enterprise geodatabase (when multiple users need to edit the mosaic dataset at the same time).
Typically, use one geodatabase for each mosaic dataset or for a small group of related mosaic datasets that define a project. This makes backup and restoring simpler.
Use a standardized naming convention. Imagery Workflows use the following prefixes:
- S_xxx: Source mosaic dataset
- D_xxx: Derived mosaic dataset
- R_xxx: Referenced mosaic dataset
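A naming convention is easier to keep when it is checked automatically. The following sketch maps a mosaic dataset name to the role implied by its prefix. The dictionary and the function name classify_mosaic_name are illustrative assumptions; only the three prefixes come from the list above.

```python
MOSAIC_PREFIXES = {
    "S_": "Source mosaic dataset",
    "D_": "Derived mosaic dataset",
    "R_": "Referenced mosaic dataset",
}


def classify_mosaic_name(name):
    """Return the mosaic dataset role implied by the name's prefix, or None."""
    for prefix, role in MOSAIC_PREFIXES.items():
        # Require at least one character after the prefix (the "xxx" part).
        if name.startswith(prefix) and len(name) > len(prefix):
            return role
    return None
```

A return value of None flags a name that does not follow the convention.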

Preparing metadata

Store any metadata files that came with your data in the same directory as your data. Key metadata will be recorded in the attribute table of each mosaic dataset you create to support queries, sorting, and general data management.

If metadata is readable by ArcGIS, view it in the mosaic dataset's properties.
If metadata is not readable by ArcGIS, store it in a network-accessible location and record hyperlinks to that location in the mosaic dataset's attribute table.

Preprocessing

Follow these preprocessing guidelines before creating mosaic datasets.

Do I need to create pyramids or generate statistics?

Yes. If pyramids are not provided with the original data, or if the data has been reformatted, create them. (Multi-image mosaics typically have built-in pyramids, so you won't need to create them.)

Find more information about building pyramids and calculating statistics in ArcGIS Pro.

In the majority of cases, pyramids can be compressed even if the original data sources contain no compressed data, since analysis is typically not performed on the pyramids themselves. For a table of pyramid sizes, see Raster pyramids.

When creating pyramids, there are environmental variables that control how they are generated. These include the following:

Variable	Description
Pyramid/Compression	For elevation or categorical data, LZW compression is recommended.
Sampling method	For elevation, carefully consider what sampling method to use, but in most cases bilinear is recommended. Using nearest sampling with a factor of 2 will result in a half-pixel shift at each overview level due to alignment of the image extents. If nearest neighbor sampling is required, it is generally better to set the sampling factor to 3 to avoid such shifts, although this can affect the performance at smaller scales by about 20 percent.

As a general rule, statistics should be created for elevation datasets, where the range of valid data values can be large. The creation of statistics should be done at the same time (and using the same tool) as pyramids.

Note: Statistics are used by the system primarily to ensure suitable default display of the images. If statistics exist with a raster dataset, ArcGIS will apply a stretch to the imagery to make the imagery appear brighter. If statistics are not present, then when displaying a single image, the system will attempt to approximate statistics by reading the central part of the imagery.

When creating statistics, an environmental variable called Skip Factor can be used to reduce the time it takes by not reading all the pixels in a raster dataset. One way to identify a reasonable skip factor value is to divide the number of columns by 1,000 and use the resulting integer as the skip factor.

Using such a skip factor only reduces the time taken if pyramids exist.
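The skip factor rule of thumb is simple integer arithmetic, sketched below in plain Python. The function name suggest_skip_factor is hypothetical. Clamping the result to a minimum of 1 is an added assumption for rasters narrower than 1,000 columns, on the understanding that a skip factor of 1 reads every pixel.

```python
def suggest_skip_factor(columns):
    """Suggest a statistics skip factor from the raster's column count.

    Follows the rule of thumb in the text: divide the number of columns by
    1,000 and keep the integer part. The result is clamped to at least 1,
    which is an assumption added here for small rasters.
    """
    if columns < 1:
        raise ValueError("columns must be a positive integer")
    return max(1, columns // 1000)
```

For example, a raster with 25,000 columns gives a suggested skip factor of 25, while a raster with 800 columns gives 1.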

Which parameters should I verify before I make mosaic datasets?

In order to make informed decisions when creating mosaic datasets, verify the following beforehand.

Parameter	Guidelines
NoData values?	Ensure that they are correctly defined by reviewing a representative sample of the dataset. This should be done before pyramids are generated, to ensure that pyramids do not have artifacts from incorrectly sampling NoData values. Note: NoData areas for elevation are often very irregular, so it is difficult to use footprints to clip the data. However, since elevation data is usually not compressed using lossy compression, NoData values should be easily defined.
Orthometric or ellipsoidal?	Verify whether the elevation data is orthometric or ellipsoidal, and note the datum to include in the metadata.
What are the units?	If elevation data is received in units other than meters, the source mosaic dataset should be created in original units. An arithmetic function to convert units will need to be applied when the source mosaic dataset is ingested into the derived mosaic dataset.
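The unit conversion mentioned in the table is a multiplication by a constant factor. In ArcGIS this would be done with an arithmetic function on the derived mosaic dataset; the sketch below only illustrates the arithmetic on a plain list of values, and it leaves NoData values untouched so they are not rescaled by mistake. The function name to_meters, the unit keys, and the nodata parameter are hypothetical names used for illustration. The factors themselves are the standard definitions of the international foot and the US survey foot.

```python
# Exact conversion factors to meters.
METERS_PER_UNIT = {
    "meter": 1.0,
    "international_foot": 0.3048,
    "us_survey_foot": 1200.0 / 3937.0,
}


def to_meters(values, unit, nodata=None):
    """Convert elevation values to meters, leaving NoData values unchanged."""
    factor = METERS_PER_UNIT[unit]
    return [v if v == nodata else v * factor for v in values]
```

Confirm which foot definition the data provider used before converting. The two differ by about 2 parts per million, which is small but can matter for high-accuracy elevation data.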

Check out the next section to learn more about best practices for creating mosaic datasets in the Managing Elevation workflow.
