Data structure

In most cases, the directory structure in which the data is stored on disk is not important, and it is often best to leave existing data in its original structure; otherwise, existing applications may have difficulty accessing the data. If new data is being acquired, a few recommendations on how to structure the files can simplify data management. As a general rule, it is better to structure imagery based on a directory naming convention and hierarchy agreed on by the organization.

The following are recommendations for data structure:

Directory structures

Define a hierarchy that makes sense for the data, and plan ahead to provide sufficient granularity later. A typical hierarchy could be as follows:

DataType\Source\Type\Geography\Date

For example: Satellite\GeoEye\GeoEye1\Europe\2001

Even if you initially only have imagery of one specific subcategory, planning in advance with sufficient granularity makes it easier to extend later.
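As an illustration only, the hierarchy above can be encoded so that every new directory is built the same way. The sketch below uses the Python standard library; the root `D:\Imagery`, the function name, and the level names are assumptions made for this example, not part of ArcGIS.

```python
from pathlib import PureWindowsPath

# Hypothetical level names mirroring DataType\Source\Type\Geography\Date.
HIERARCHY = ("data_type", "source", "sensor_type", "geography", "date")


def imagery_directory(root, **levels):
    """Build a directory path from the agreed hierarchy levels, in order."""
    missing = [name for name in HIERARCHY if name not in levels]
    if missing:
        raise ValueError(f"missing hierarchy levels: {missing}")
    return PureWindowsPath(root, *(str(levels[name]) for name in HIERARCHY))


# The root D:\Imagery is an assumed location used only for illustration.
path = imagery_directory(
    r"D:\Imagery",
    data_type="Satellite",
    source="GeoEye",
    sensor_type="GeoEye1",
    geography="Europe",
    date=2001,
)
print(path)  # D:\Imagery\Satellite\GeoEye\GeoEye1\Europe\2001
```

Requiring every level, even when only one subcategory exists today, is one way to keep the granularity the recommendation asks for.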

File naming

The names of the files are generally defined by the data provider. It is recommended that you do not rename any files. If creating new files, it is best to include key descriptors in the name, such as date and some geospatial indicator. Unique names can help later in linking metadata attributes.
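For newly created files only (not for renaming provider files), a naming helper along the following lines could embed the key descriptors. This is a sketch under assumed conventions: the field order, the underscore separator, and the tile identifier `N48E011` are hypothetical choices, not an Esri standard.

```python
from datetime import date


def new_image_name(sensor, acquired, tile_id, extension=".tif"):
    """Compose a file name from sensor, acquisition date, and a tile identifier."""
    # The date is written as YYYYMMDD so that names sort chronologically.
    return f"{sensor}_{acquired:%Y%m%d}_{tile_id}{extension}"


print(new_image_name("GeoEye1", date(2001, 6, 15), "N48E011"))
# GeoEye1_20010615_N48E011.tif
```

Because each descriptor occupies a fixed position, the name can later be split on the separator to populate metadata attributes.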

Metadata location

Keep all metadata that comes with the imagery in the same location as the imagery.

Files per directory

Try to keep the number of files per directory under 1,000. There is no specific maximum and this does not affect performance of data access, but if there are very large numbers of files in a directory, it can take a long time to list the files in Catalog or Explorer. It is generally better to create subdirectories based on some hierarchy.
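A quick way to find directories that exceed this guideline is to walk the tree and count files. The following is a minimal sketch using only the Python standard library; the function name and the default limit of 1,000 are assumptions taken from the guideline above.

```python
import os


def crowded_directories(root, limit=1000):
    """Return {directory: file_count} for directories holding more than `limit` files."""
    crowded = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        # Only files directly in this directory are counted, not those in subdirectories.
        if len(filenames) > limit:
            crowded[dirpath] = len(filenames)
    return crowded


# Example call (the path is hypothetical):
# for directory, count in crowded_directories(r"D:\Imagery").items():
#     print(f"{count:>8}  {directory}")
```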

Setting files as read-only

Images generally do not change. Files such as pyramids (.ovr) and metadata (.aux.xml) may be added, but in most cases it is possible, and recommended, to set the main imagery files (.tif, and so on) as read-only. Some processes (such as setting the spatial reference) can change the image file itself, but can alternatively modify the associated .aux.xml file instead. By having image files set as read-only you can help ensure that the original files are not modified and that they are backed up multiple times. Do not set the directory in which the files are stored as read-only, since many of the workflows result in additional pyramid, statistics, or metadata files being written along with the source files. If the directories are read-only during the authoring processes, these files will be stored in separate locations disconnected from the originals.
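The read-only recommendation can be applied in bulk with a short script. The sketch below clears the write permission on image files only, leaving directories and sidecar files (.ovr, .aux.xml) writable; the extension list is an assumption and should be adjusted to the formats actually in use.

```python
import stat
from pathlib import Path

# Assumed set of main imagery extensions; sidecar files are deliberately excluded.
IMAGE_EXTENSIONS = {".tif", ".tiff", ".jp2", ".img"}


def set_images_read_only(root):
    """Clear the write bits on image files under `root`; return the files changed."""
    changed = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
            mode = path.stat().st_mode
            # Remove write permission for user, group, and others.
            path.chmod(mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
            changed.append(path)
    return changed
```

On Windows, `chmod` only toggles the file's read-only attribute, which is the behavior wanted here. Directories are never modified, so pyramids, statistics, and metadata can still be written next to the source files.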

Drive performance

As described in the section above, imagery is too large for the system to load entirely into memory, so ArcGIS reads the required pixels from the disk system as needed. The performance of the disk system is therefore an important component of optimization, and it is important that the server have fast access to the imagery. If the imagery is highly compressed, this is less of a concern. It is recommended that you check the performance of the disk subsystem by using a disk speed testing utility.
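A dedicated disk speed testing utility is the better tool, but a rough first impression of sequential read speed can be obtained with a few lines of Python. The sketch below is an approximation under stated assumptions: the function name and block size are arbitrary, and operating system caching can inflate the result if the file was read recently, so a large file that has not been opened since startup gives a more honest figure.

```python
import time


def read_throughput_mb_s(path, block_size=4 * 1024 * 1024):
    """Read a file sequentially and return the observed throughput in MB/s."""
    total = 0
    start = time.perf_counter()
    # buffering=0 disables Python-level buffering; the OS cache may still apply.
    with open(path, "rb", buffering=0) as f:
        while chunk := f.read(block_size):
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return total / (1024 * 1024) / elapsed if elapsed > 0 else float("inf")


# Example call (the path is hypothetical):
# print(f"{read_throughput_mb_s(r'D:\Imagery\sample.tif'):.1f} MB/s")
```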

UNC versus drive letters

On some file systems it may be better to reference files by drive letters rather than UNC paths. Whether there is a difference in performance, and which is faster, depends on many factors, so it is best to determine this by testing.
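One simple form of such a test is to time reads of the same file through each path style and compare the medians. The following sketch assumes that a mapped drive letter and a UNC path point at the same file; both paths in the usage comment are hypothetical, and caching means the first read through either path may be slower than the rest.

```python
import statistics
import time


def median_read_seconds(path, repeats=5, nbytes=1024 * 1024):
    """Time reading the first `nbytes` of a file several times; return the median."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        with open(path, "rb") as f:
            f.read(nbytes)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def compare_read_times(candidates, repeats=5):
    """Map each label to its median read time for the given {label: path} dict."""
    return {label: median_read_seconds(p, repeats) for label, p in candidates.items()}


# Example call with hypothetical paths to the same file:
# compare_read_times({
#     "drive letter": r"Z:\Imagery\sample.tif",
#     "UNC": r"\\server\share\Imagery\sample.tif",
# })
```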

Location of mosaic datasets

Mosaic datasets can be stored in a file or enterprise geodatabase. In most cases file geodatabases are used, as they provide fast access and required scalability for most applications. The main application for using enterprise geodatabases is in workflows where multiple users may be editing the mosaic dataset simultaneously. These workflows will generally assume file geodatabases are being used.

When working on small collections of files connected via a removable drive, many users store the mosaic dataset in a root directory of the data. The advantage of this is that should the drive letter or location of the files change, ArcGIS will still be able to access the imagery, as the system will look for files in a path relative to the mosaic dataset if the imagery cannot be accessed directly.

For larger implementations and optimization, as a general rule, it is better to store the mosaic datasets in a separate set of directories dedicated to mosaic datasets. This also makes it simpler to optimize access to these small files and to back them up.

By default, when overviews are created for mosaic datasets, they are stored in the same location as the mosaic datasets. In cases where these overviews are small, this is suitable, but when working with mosaic datasets that have large overviews, it is often advantageous to define the location of the overviews so that they are stored similarly to the imagery.

File geodatabases provide very good performance, but access by desktop or server can be very 'chatty', with large numbers of requests made to the file system. It is therefore advisable to store the mosaic datasets on a drive to which the server has fast access. Often it is better to ensure that the file geodatabase is on a faster direct access drive on the server; this is especially true when working with mosaic datasets that contain hundreds of thousands of records. One popular pattern is to store the mosaic dataset being used for authoring on a NAS or SAN and then, prior to publishing, copy it to the appropriate servers.

Naming of mosaic datasets

When using large numbers of different mosaic datasets, it is often useful to adhere to a standardized naming convention. These documented workflows will use the following prefixes:

S_xxx—Source mosaic dataset

D_xxx—Derived mosaic dataset

R_xxx—Referenced mosaic dataset
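Where many mosaic datasets are managed by script, the prefix convention above can be checked automatically. The helper below is a minimal sketch; the function name and the example dataset names are hypothetical.

```python
# Prefixes from the convention above, mapped to the role they denote.
PREFIXES = {"S_": "Source", "D_": "Derived", "R_": "Referenced"}


def mosaic_role(name):
    """Return the role implied by a mosaic dataset name, or None if it has no valid prefix."""
    for prefix, role in PREFIXES.items():
        # Require at least one character after the prefix.
        if name.startswith(prefix) and len(name) > len(prefix):
            return role
    return None


print(mosaic_role("S_Landsat"))  # Source
print(mosaic_role("Landsat"))    # None
```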

Number of mosaic datasets per geodatabase

A single geodatabase may contain a large number of different mosaic datasets. For file geodatabases, there is no significant advantage to storing multiple mosaic datasets in a single geodatabase. Typically, either a separate geodatabase is used for each mosaic dataset or a small group of related mosaic datasets that define a project are stored in a single geodatabase, making backup and restoring simpler.