Source, derived, and referenced mosaic datasets—Imagery Workflows

When managing large collections of imagery, it is often impractical to use a single mosaic dataset for all of it, so most workflows follow a pattern of source and derived mosaic datasets. Sometimes referenced mosaic datasets are also created as subsets. This pattern breaks a potentially complex task into smaller tasks and makes it easier to manage multiple sources, perform quality assurance on the mosaic datasets, and maintain the services.

Although it is possible to create a single mosaic dataset from many collections of imagery, the best practice is to use a combination of different mosaic datasets, as is summarized in the following sections and diagrammed in the image below.

Source mosaic datasets

Source mosaic datasets are typically created for subsets of the imagery in a large project and then combined into a derived mosaic dataset. A source mosaic dataset is created for each collection of similar images and represents a single manageable unit, typically used to check that metadata is defined correctly, to define specific processes to be applied, or to perform quality assurance. Each record in the source mosaic dataset defines an image or raster with specific metadata. A source mosaic dataset could represent all imagery from a specific type of sensor, or imagery acquired as part of a discrete project that covers a known extent or period of time. The number of images in each source mosaic dataset typically ranges from tens to hundreds of thousands. Source mosaic datasets are generally not made accessible to end users or served as image services.

All imagery in a source mosaic dataset should:

Be similar in number of bands, bit depth, and type of metadata
Be added using a single raster type
Have similar scales or pixel sizes (though possibly in different projections)
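Before loading a collection, it can help to screen it against these criteria. The following is a minimal sketch in plain Python, not an ArcGIS tool: `ImageInfo`, `check_source_collection`, and the 4x pixel-size ratio threshold are hypothetical names and values chosen for illustration, and the per-image metadata is assumed to have been read beforehand.

```python
from dataclasses import dataclass


@dataclass
class ImageInfo:
    """Hypothetical per-image summary; field names are illustrative only."""
    name: str
    band_count: int
    bit_depth: int
    raster_type: str
    pixel_size: float  # ground units, for example meters


def check_source_collection(images, max_size_ratio=4.0):
    """Return reasons why the images may not belong in one source mosaic
    dataset. max_size_ratio is an assumed threshold, not an ArcGIS default."""
    problems = []
    if not images:
        return problems
    bands = {img.band_count for img in images}
    depths = {img.bit_depth for img in images}
    types = {img.raster_type for img in images}
    if len(bands) > 1:
        problems.append(f"mixed band counts: {sorted(bands)}")
    if len(depths) > 1:
        problems.append(f"mixed bit depths: {sorted(depths)}")
    if len(types) > 1:
        problems.append(f"mixed raster types: {sorted(types)}")
    sizes = [img.pixel_size for img in images]
    if max(sizes) / min(sizes) > max_size_ratio:
        problems.append(f"pixel sizes differ by more than {max_size_ratio}x")
    return problems


# Usage: two similar scenes pass; adding an unlike scene is flagged.
similar = [
    ImageInfo("scene_a", 4, 16, "QuickBird", 0.6),
    ImageInfo("scene_b", 4, 16, "QuickBird", 0.6),
]
mixed = similar + [ImageInfo("scene_c", 3, 8, "Raster Dataset", 15.0)]
```

A flagged collection would usually be split into separate source mosaic datasets rather than forced into one.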

Typically, if modifications to the raster items within the mosaic dataset are required, such as clipping images to a footprint or applying a stretch or orthorectification, they are defined and refined in the source mosaic dataset.

The spatial reference of a source mosaic dataset should be the one best suited to encompass all the imagery. For example, do not use a state plane projection to contain data across an entire country; instead, use a projection suitable for the entire country's data. The imagery added to the source mosaic dataset should fall within the horizon (valid extent) of the selected spatial reference system. If all the imagery is in a single projection, the mosaic dataset is typically created in that projection.

The number of bands and bit depth of the source mosaic dataset should be set so that they can contain all the data. For example, a source mosaic dataset with high-resolution satellite imagery, such as GeoEye-1, Ikonos, or QuickBird, would be defined as 4-band, 16-bit.

Source mosaic datasets do not have to be static; over time additional rasters can be added. In some workflows, source mosaic datasets are created manually, while for others the creation of source mosaic datasets may be fully automated.

Overviews are typically computed for source mosaic datasets, and summary attributes are then copied to the overview records. For example, if all the imagery was collected for a specific project, an attribute called ProjectID may be added to all the images, including the overviews. Later, if multiple source mosaic datasets are added to a derived mosaic dataset that is published, users can still apply a query such as ProjectID=1234 and see only the imagery (including overviews) for that project.
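The effect of stamping a summary attribute on every record, overviews included, can be sketched with plain Python lists of dictionaries standing in for mosaic dataset records. This is an illustration only; `make_source`, `where_equals`, and the `Name` values are hypothetical, and only the ProjectID field and the ProjectID=1234 query come from the text above.

```python
def make_source(project_id, n_images, n_overviews):
    """Build a hypothetical source catalog (dicts standing in for mosaic
    dataset records) and copy ProjectID to every record."""
    records = [{"Name": f"img_{project_id}_{i}"} for i in range(n_images)]
    records += [{"Name": f"ov_{project_id}_{i}"} for i in range(n_overviews)]
    for rec in records:
        rec["ProjectID"] = project_id  # summary attribute, overviews included
    return records


def where_equals(records, field, value):
    """Mimic a simple attribute query such as ProjectID=1234."""
    return [rec for rec in records if rec.get(field) == value]


# Combine two source catalogs, as a derived mosaic dataset would.
derived = make_source(1234, n_images=3, n_overviews=1) + make_source(
    5678, n_images=2, n_overviews=1
)
subset = where_equals(derived, "ProjectID", 1234)
names = [rec["Name"] for rec in subset]
```

Because the overview record carries ProjectID too, the query keeps it, so small-scale views stay restricted to the same project.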

As source mosaic datasets are generally not used directly as image services, setting their properties is less critical. The primary reason to set properties on source mosaic datasets is to enable quality assurance checks, so most workflows set all the properties required for suitable quality assurance.

Derived mosaic datasets

Derived mosaic datasets are created from multiple source mosaic datasets, typically combining them into a single larger collection.

Imagery is added to the derived mosaic dataset using the Table raster type, which enables all records from one or more source mosaic datasets to be added. When the Table raster type is used and the source is another mosaic dataset, the complete record, including processing and metadata attributes, is copied from the source. In some cases, only a subset of the source mosaic datasets is added to a derived mosaic dataset; for example, images with too much cloud cover may be excluded based on metadata provided in the source mosaic dataset. The spatial reference of the derived mosaic dataset is set to encompass all the imagery and may differ from that of the source. The number of bands and bit depth are set to be appropriate for all the data sources.
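The idea of copying complete records, optionally filtered on source metadata, can be sketched as follows. This is not how the Table raster type is implemented; it is a plain Python illustration in which `add_from_table`, the `CloudCover` and `Functions` fields, and the 20 percent threshold are all hypothetical.

```python
import copy


def add_from_table(derived, source, where=None):
    """Append full copies of source records that satisfy `where`.

    Stands in for the idea behind the Table raster type: the whole record,
    metadata and processing definition included, is copied rather than
    re-read from the imagery. All field names here are hypothetical."""
    added = 0
    for rec in source:
        if where is None or where(rec):
            derived.append(copy.deepcopy(rec))
            added += 1
    return added


source = [
    {"Name": "s1", "CloudCover": 5, "Functions": ["Stretch"]},
    {"Name": "s2", "CloudCover": 35, "Functions": ["Stretch"]},
    {"Name": "s3", "CloudCover": 12, "Functions": []},
]
derived = []
count = add_from_table(derived, source, where=lambda r: r["CloudCover"] < 20)
```

The deep copy means later edits to a source record do not silently change the derived record, which is one reason synchronization is a separate step.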

Optionally, functions can be applied to transform the data. For example, the Extract Bands function may be used to convert imagery from 4-band to 3-band, or a stretch might be applied to convert from 16-bit to 8-bit. In most cases, each derived mosaic dataset will have a range of functions added to define different products. For example, a mosaic dataset that provides elevation data may have a set of functions added to provide hillshade, slope, and aspect representations.
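As a rough per-pixel illustration of the two example functions, the sketch below selects three of four bands and then applies a min-max linear stretch into the 8-bit range. It is not the ArcGIS Extract Bands or Stretch function; the band order (1=blue, 2=green, 3=red, 4=near-infrared), the 0 to 2047 input range (11-bit data stored in 16 bits), and the function names are assumptions.

```python
def extract_bands(pixel, band_ids):
    """Return the requested bands of one pixel; band IDs are 1-based."""
    return [pixel[i - 1] for i in band_ids]


def stretch_to_8bit(value, in_min, in_max):
    """Min-max linear stretch of one value into 0..255, clipping outliers."""
    if in_max <= in_min:
        raise ValueError("in_max must be greater than in_min")
    clipped = min(max(value, in_min), in_max)
    return round((clipped - in_min) * 255 / (in_max - in_min))


# Assumed band order: 1=blue, 2=green, 3=red, 4=near-infrared.
pixel = [400, 600, 800, 2000]
rgb = extract_bands(pixel, [3, 2, 1])  # natural color, 4-band to 3-band
# Assumed input range: 11-bit values stored in 16 bits (0..2047).
rgb_8bit = [stretch_to_8bit(v, 0, 2047) for v in rgb]
```

Here `rgb_8bit` is `[100, 75, 50]`. Real stretches often use percent clips or standard deviations instead of the full range; min-max is used only to keep the arithmetic visible.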

Multiple derived mosaic datasets may use the same source mosaic datasets. For example, a derived mosaic dataset for natural color imagery and one for enabling multispectral analysis may use the same source mosaic dataset from a high-resolution satellite.

In many workflows, overviews are computed on the source mosaic datasets, then added to the derived mosaic datasets. When attributed correctly, they allow users to view collections of imagery at small scales by setting appropriate filters.

In some cases, imagery is directly added to a derived mosaic dataset, rather than being organized into a source mosaic dataset first. For example, an image source such as World Imagery or NaturalVue (available on ArcGIS Online as an image service or cached map service providing global 15-meter resolution imagery) may be added to provide a background image for natural color imagery, or an overview image from some other source may be added to provide context at small scales. If no suitable overview exists for the derived mosaic dataset, then overviews may be built.

Derived mosaic datasets do not need to be static; over time, the source mosaic datasets from which they are derived may change, or new source mosaic datasets may be added. Two approaches can be used to update the derived mosaic datasets. The Synchronize Mosaic Dataset tool checks all sources for changes and applies any updates. Alternatively, if the process of creating the derived mosaic dataset is automated, the derived mosaic dataset can be re-created, as the process is generally very fast.
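The kind of change detection involved in synchronizing can be sketched by comparing a source catalog with the derived catalog. This does not reproduce what the Synchronize Mosaic Dataset tool does internally; the `id` key, the `modified` counter, and `plan_sync` are hypothetical stand-ins used only to show which records would be added, updated, or removed.

```python
def plan_sync(source, derived):
    """Compare two catalogs keyed by a hypothetical 'id' field and return the
    ids to add, to update (source newer), and to remove from the derived one."""
    src = {rec["id"]: rec for rec in source}
    dst = {rec["id"]: rec for rec in derived}
    to_add = sorted(src.keys() - dst.keys())
    to_remove = sorted(dst.keys() - src.keys())
    to_update = sorted(
        key for key in src.keys() & dst.keys()
        if src[key]["modified"] > dst[key]["modified"]
    )
    return to_add, to_update, to_remove


source = [{"id": 1, "modified": 2}, {"id": 2, "modified": 1}, {"id": 4, "modified": 1}]
derived = [{"id": 1, "modified": 1}, {"id": 2, "modified": 1}, {"id": 3, "modified": 1}]
```

Here record 4 is new, record 1 has changed, and record 3 no longer exists in the source. When a full rebuild is cheap, re-creating the derived catalog avoids this bookkeeping entirely.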

Derived mosaic datasets may be directly served, but there are cases where it is better to create and serve referenced mosaic datasets.

The steps to create a derived mosaic dataset are similar to those of a source mosaic dataset:

Create a new mosaic dataset
Add the source mosaic datasets using the Table raster type
Refine the mosaic dataset properties
Compute cell sizes
Refine footprints and define NoData
Generate overviews

Create a new mosaic dataset

Derived mosaic datasets are created using the spatial reference system, bands, and bit depth appropriate for the final service. For organizations that work on local datasets and have standardized on one spatial reference system, that system is typically used. For global datasets, the Web Mercator Auxiliary Sphere projection is often used. The spatial reference system of the derived mosaic dataset does not need to match that of the source, but note that when source footprints are transformed into the derived mosaic dataset's spatial reference system, they are densified if the two projections differ in curvature. This densification can add a large number of vertices to each footprint, which can affect performance.
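Why densification happens can be sketched in a few lines: an edge that is straight in geographic coordinates is curved after projection to Web Mercator, so extra vertices are needed to follow it. The sketch below is not ArcGIS's densification algorithm; the skewed footprint and the 0.5-degree maximum step are hypothetical, and the projection is the standard spherical Web Mercator forward formula.

```python
import math

R = 6378137.0  # sphere radius (WGS 84 semi-major axis) used by Web Mercator


def to_web_mercator(lon, lat):
    """Forward spherical Web Mercator projection; degrees in, meters out."""
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def densify(ring, max_step):
    """Insert vertices so that no step along an edge exceeds max_step
    (in the ring's own units). The ring must be closed (first == last)."""
    out = []
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        n = max(1, math.ceil(math.hypot(x1 - x0, y1 - y0) / max_step))
        for i in range(n):
            t = i / n
            out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    out.append(ring[-1])
    return out


# Hypothetical skewed scene footprint in geographic coordinates (degrees).
footprint = [(0, 0), (10, 2), (8, 12), (-2, 10), (0, 0)]
dense = densify(footprint, max_step=0.5)
projected = [to_web_mercator(lon, lat) for lon, lat in dense]
```

The 5-vertex footprint grows to 85 vertices. Multiplied across many thousands of footprints, this is the performance cost the paragraph above warns about.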

Add rasters

The Table raster type is used when creating a derived mosaic dataset. This raster type duplicates every item in the source mosaic dataset in the derived mosaic dataset and ensures that all records and associated raster item properties are quickly accessible. Creating a derived mosaic dataset this way is fast because the system does not need to read metadata from the source imagery; instead, all the metadata and attributes are copied directly.

Although this may result in a large number of records in the derived mosaic dataset, it is the more scalable method. An alternative is to add each source mosaic dataset using the Raster Dataset raster type, which adds it as a single item, so the derived mosaic dataset has only one record per source mosaic dataset. Although this works, it does not scale well, because the system may need to open and close many different mosaic datasets.

There are cases where rasters are added directly to a derived mosaic dataset. For example, a service may use an image, image service, or map service as a background when there is no other imagery to display. This can be achieved by adding the selected image or service as a raster dataset and then setting its ZOrder field to a large positive value, which gives it a low display priority. As a result, the added raster is displayed only where no other imagery is available. Conversely, setting a negative ZOrder value forces the imagery to be displayed at a higher priority than the other rasters.
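The display rule described above (lower ZOrder drawn on top) can be sketched as a sort followed by a point lookup. This models only ZOrder and ignores the mosaic method, so it is a simplification; the item names, extents, and the ZOrder value of 0 for ordinary imagery are hypothetical.

```python
def display_order(items):
    """Sort by ZOrder; lower values draw on top. sorted() is stable, so items
    with equal ZOrder keep their original relative order."""
    return sorted(items, key=lambda item: item["ZOrder"])


def visible_item(items, x, y):
    """Name of the top-most item whose extent contains (x, y), else None."""
    for item in display_order(items):
        xmin, ymin, xmax, ymax = item["extent"]
        if xmin <= x <= xmax and ymin <= y <= ymax:
            return item["name"]
    return None


items = [
    {"name": "background", "ZOrder": 1000, "extent": (-180, -90, 180, 90)},
    {"name": "imagery", "ZOrder": 0, "extent": (0, 0, 10, 10)},
    {"name": "priority", "ZOrder": -1, "extent": (5, 5, 15, 15)},
]
```

With these values, `visible_item(items, 2, 2)` returns `"imagery"`, a point far from any scene falls through to `"background"`, and the overlap at (7, 7) shows `"priority"`.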

When adding rasters to the derived mosaic dataset, it is important to turn off the Update Cell Size Ranges parameter. If it's not turned off, every cell size will be recomputed, which can potentially break the ordering that is defined in each source mosaic dataset.

Cell sizes

Cell sizes are copied from the source mosaic dataset, so there is no need to recompute them. Do not run the Calculate Cell Size Ranges tool with its default settings: it recomputes the cell sizes based on the standard overlap rules, which is rarely required, and it overwrites the imported values, which are difficult to restore. Where additional rasters have been added individually, set their MinPS and MaxPS values manually.

The Calculate Cell Size Ranges tool not only computes the MinPS and MaxPS cell size values for each raster item, but also computes values for a levels table. This table is used to determine how to group images together based on their scale ranges so that functionality such as seamline generation can correctly create lines around images of similar pixel sizes. The grouping is determined based on the mosaic dataset's Cell Size Tolerance Factor property. Therefore, it may be necessary to set this value and run the Calculate Cell Size Ranges tool, with the Compute Minimum and Maximum Cell Sizes parameters turned off (unchecked).
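To make the idea of grouping images into levels by pixel size concrete, the sketch below applies a tolerance factor to sorted pixel sizes. The exact rule ArcGIS uses is not described here, so the rule in this code (join the current level while smallest-in-level divided by the new size is at least the tolerance) and the 0.8 value are assumptions for illustration only.

```python
def group_levels(pixel_sizes, tolerance=0.8):
    """Group pixel sizes into levels. ASSUMED rule, not the documented ArcGIS
    algorithm: a size joins the current level if the level's smallest size
    divided by it is at least `tolerance`; otherwise it starts a new level."""
    levels = []
    for size in sorted(pixel_sizes):
        if levels and levels[-1][0] / size >= tolerance:
            levels[-1].append(size)
        else:
            levels.append([size])
    return levels


sizes = [0.5, 0.6, 1.0, 15, 30, 0.55]
levels = group_levels(sizes)
```

Under this rule the sub-meter scenes form one level, while 1 m, 15 m, and 30 m each get their own, which is the kind of separation that lets seamlines be built around images of similar pixel sizes.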

Footprints, boundaries, and NoData

In most cases, there is no need to refine footprints or change NoData values in the derived mosaic datasets. There are cases where the boundary may need to be recomputed. Instead of computing the boundary when the source mosaic datasets are added, the boundary is usually computed once after all the sources are added using the Build Boundary tool. In many cases where the boundary geometry becomes unnecessarily complex, the boundary is set to the envelope of the footprints using the Build Boundary tool with the simplification method set to Envelope.
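What an envelope simplification produces can be sketched directly: a single closed rectangle enclosing every footprint vertex, however complex the union of footprints is. This is plain Python, not the Build Boundary tool; `envelope_ring` and the sample footprints are hypothetical.

```python
def envelope_ring(footprints):
    """Return the closed rectangle enclosing all footprint vertices;
    footprints are lists of (x, y) tuples."""
    xs = [x for ring in footprints for x, _ in ring]
    ys = [y for ring in footprints for _, y in ring]
    xmin, xmax, ymin, ymax = min(xs), max(xs), min(ys), max(ys)
    return [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax), (xmin, ymin)]


footprints = [
    [(0, 0), (4, 1), (3, 5), (-1, 4), (0, 0)],
    [(6, 2), (9, 2.5), (8.5, 7), (5.5, 6.5), (6, 2)],
]
boundary = envelope_ring(footprints)
```

The result always has five vertices, which is why an envelope is a practical fallback when the true boundary becomes unnecessarily complex; the tradeoff is that it also covers the gaps between footprints.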

Consider whether the imagery should be clipped by the boundary. The mosaic dataset's Always Clip the mosaic dataset to its Boundary property can be set to clip or not clip the imagery to the boundary. Typically, it is set to clip only when the boundary is used to restrict access to imagery outside it. Otherwise, it is better not to clip, which avoids the additional clip processing.

The extent of an image service is set from the boundary when the service is published and cannot be changed while the service is running. In applications where new imagery is added to the service after it has been published, ensure that the extent (envelope) of the service is large enough to cover all new imagery. It is therefore sometimes necessary to redefine the boundary of a service as a rectangle covering the complete extent of all imagery that may be added. This can be done by modifying the boundary feature with the standard feature editing tools.
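A quick pre-publication check is whether anticipated imagery would fall inside the extent the service will be fixed to. The sketch below is a hypothetical helper in plain Python, not an ArcGIS API; the published extent, the planned area, and the new scene are made-up values.

```python
def covers(extent, footprint):
    """True if every footprint vertex lies inside extent (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = extent
    return all(xmin <= x <= xmax and ymin <= y <= ymax for x, y in footprint)


def union_extent(a, b):
    """Smallest extent containing both extents a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


published = (0, 0, 10, 10)  # extent derived from the current boundary
planned_area = (0, 0, 20, 20)  # hypothetical area where future imagery is expected
new_scene = [(12, 3), (15, 3), (15, 6), (12, 6), (12, 3)]
```

Here `covers(published, new_scene)` is False, so the scene would fall outside the running service, while `covers(union_extent(published, planned_area), new_scene)` is True. Enlarging the boundary to the planned area before publishing avoids that gap.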

Overviews

In many cases, the overviews in the source mosaic datasets are used in the derived mosaic datasets. As long as suitable attributes are defined for the overviews, they can be used in queries. For example, a derived mosaic dataset of high-resolution satellite imagery created from source mosaic datasets from different sensors may have overviews attributed as QuickBird or GeoEye1. Note that when overviews are imported using the Table raster type, the Category field is set back to Primary.

It is often advantageous to create a separate overview from the derived mosaic dataset for use at very small scales. When a user zooms to the full extent of a mosaic dataset (which often occurs), it is better if the system needs to read only a single raster. To enable this, define and build overviews for the smallest scales. Typically, the pixel size for these overviews may be set to about 1/5000 of the width of the mosaic dataset's extent. As with creating overviews for source mosaic datasets, it is best to build these overviews after the appropriate default mosaic method has been defined.
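The 1/5000 guideline is simple arithmetic, sketched below under the assumption that "width" means the width of the mosaic dataset's extent in map units. The example assumes a global mosaic dataset in Web Mercator, whose x extent runs from -20037508.342789244 to 20037508.342789244 meters; `overview_pixel_size` is a hypothetical helper.

```python
def overview_pixel_size(extent_width, divisor=5000):
    """Pixel size for a small-scale overview roughly `divisor` pixels wide."""
    return extent_width / divisor


# Assumption: a global mosaic dataset in Web Mercator (full x extent in meters).
world_width = 2 * 20037508.342789244
size = overview_pixel_size(world_width)
```

This gives a pixel size of roughly 8015 meters, so the whole world fits in a single overview about 5000 pixels across.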

For some imagery (such as 3-band 8-bit) it is often also possible to create a cache instead of overviews, then use the cache in place of overviews. This can be advantageous in cases where on-demand caching is used for the larger scales. This is discussed in more detail in the caching section.

Referenced mosaic datasets

Referenced mosaic datasets are sometimes created by referencing derived or source mosaic datasets. A referenced mosaic dataset has its own properties and service-level functions but uses the footprint table of the dataset it references.

The reference can be defined with a query, so a referenced mosaic dataset can also be a subset of its source. For example, from a derived mosaic dataset representing elevation data for the whole world, a referenced mosaic dataset may be created to define a hillshade or slope map product for a selected area.
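The subset idea (an attribute query plus an area of interest, applied to records that are referenced rather than copied) can be sketched in plain Python. This is an illustration, not how ArcGIS stores references; `referenced_subset`, the `Product` field, and the extents are hypothetical.

```python
def referenced_subset(records, where, area):
    """Return the records (the same objects, not copies) that match `where`
    and whose extent intersects `area`; extents are (xmin, ymin, xmax, ymax)."""
    axmin, aymin, axmax, aymax = area

    def intersects(extent):
        xmin, ymin, xmax, ymax = extent
        return xmin <= axmax and xmax >= axmin and ymin <= aymax and ymax >= aymin

    return [rec for rec in records if where(rec) and intersects(rec["extent"])]


elevation = [
    {"name": "dem_a", "Product": "DEM", "extent": (0, 0, 10, 10)},
    {"name": "dem_b", "Product": "DEM", "extent": (20, 0, 30, 10)},
    {"name": "dsm_c", "Product": "DSM", "extent": (8, 8, 12, 12)},
]
hillshade_items = referenced_subset(
    elevation, lambda r: r["Product"] == "DEM", (5, 5, 15, 15)
)
```

Only `dem_a` is selected, and it is the same object as in the source list, mirroring how a referenced mosaic dataset reuses the footprint table instead of duplicating it.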

As ArcGIS manages security at the service level, one way of defining different access rights to different user groups is to create separate referenced mosaic datasets for each group.

Referenced mosaic datasets are also often created to define different restrictions. For example, downloading may be restricted in one service but enabled in another that is used for geoprocessing. Similarly, color correction is a property of the mosaic dataset and is not set by the client application. If you want to publish image services both with and without color correction, you can do so by creating and publishing a referenced mosaic dataset.

Another possible use is for services that require different default properties. For example, you may need to serve two web map services, one serving natural color and the other false color. This could be done by creating a single 4-band image service that defaults to natural color, together with a separate referenced mosaic dataset that applies the Extract Bands to False Color server function.
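The two defaults differ only in which bands are displayed from the same 4-band pixels, which the sketch below shows as a band mapping. The band order (1=blue, 2=green, 3=red, 4=near-infrared) and the 4-3-2 false color combination are assumptions typical of high-resolution satellite sensors; `BAND_COMBINATIONS` and `render` are hypothetical names.

```python
# Assumed band order for a 4-band sensor: 1=blue, 2=green, 3=red, 4=near-infrared.
BAND_COMBINATIONS = {
    "natural_color": (3, 2, 1),
    "false_color": (4, 3, 2),  # color infrared: near-infrared displayed as red
}


def render(pixel, product):
    """Select the display bands (1-based IDs) of one pixel for a product."""
    return tuple(pixel[band - 1] for band in BAND_COMBINATIONS[product])


pixel = (410, 520, 630, 1900)
```

`render(pixel, "natural_color")` gives `(630, 520, 410)` and `render(pixel, "false_color")` gives `(1900, 630, 520)`; the underlying data is shared, and only the default band selection changes between the two services.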
