How the Locate Regions tool works

The Locate Regions tool identifies the best regions in the input raster that meet specified size requirements and spatial constraints. Regions are groups of contiguous cells of the same value. The requirements and constraints that can be defined in this tool include the total area to select, the number of regions among which the total area should be distributed, the shape of the desired regions, and the minimum and maximum distances between the regions.

Locate Regions is often used in conjunction with the Optimal Region Connections tool to select and then join the best available regions in the most efficient way. To do this analysis, you first need a suitability surface, which you can create by employing other tools in this toolset. Next, use Locate Regions to identify the best regions available. Finally, use Optimal Region Connections to determine the least-cost network of paths between the regions. For more information on how to create a suitability model, see Understanding overlay analysis.

Example problems solved by Locate Regions

Using a surface created from a suitability model, you can identify the best regions for the following:

  • The most preferred deer habitat to conserve. Eight habitat patches (regions) are needed to maintain a viable population and each region must be approximately 50 contiguous acres. To support breeding opportunities within the herd, the regions should be close enough to one another so that they can be feasibly connected via wildlife corridors.
  • The best locations to extract timber for a logging operation. To be financially viable, the areas (regions) to be logged must be at least 250 contiguous acres and each region must be within one mile of another.
  • The ideal location for a new shopping center. The shopping center requires the best 60 acres; however, for construction purposes, the area needs to be contiguous and the shape of the building site (region) should be as compact as possible.

Grouping cells into regions

There are six primary ways by which to create regions from the individual cells in the suitability raster.

  • Cells will be grouped into a single region.
  • Cells will be grouped into a specified number of regions of equal area.
  • Cells will be grouped into a specified number of regions of equal area while honoring specified distance constraints among the regions.
  • Cells will be grouped into a specified number of regions of varying size, controlled by the defined minimum and maximum area requirements for the regions.
  • Cells will be grouped into a specified number of regions of varying size, controlled by the defined minimum and maximum area requirements for the regions, and such that no two regions can be within an identified minimum or farther than a maximum distance.
  • Same as the previous option, but pre-existing regions that were already allocated in the study area must be considered in the selection process.

General algorithm for Locate Regions

The Locate Regions tool takes as input a raster in which higher values represent a higher degree of utility. From that raster, the tool selects the best regions that meet the specified region requirements and spatial constraints.

Locating regions with this tool is a four-step process. The four general steps are listed below, with detailed descriptions to follow:

  1. Eliminate locations that are considered unsuitable for the selection process. Examples of such locations typically include those within water bodies, existing buildings, and areas that are too steep. This is a preprocessing step.
  2. Define the characteristics of the desired region or regions. Examples of these characteristics include their size, shape, and orientation. This step is achieved by setting parameters in the tool.
  3. Identify all candidate regions from the input raster based on the user-defined tradeoff between maintaining a region's shape and maximizing utility. This step is accomplished through the region growth algorithm implemented by the tool.
  4. Select the best region or regions from the candidate regions using a user-defined evaluation criterion. For example, select only the regions with the highest average value. This step is carried out within the tool by applying a selection algorithm using the specified evaluation method.

The primary algorithm for identifying the candidate regions uses a parameterized region-growing (PRG) technique which treats each identified cell as a potential seed from which a region will grow. Selecting which contiguous cells will be added to a region is based on the evaluation of the tradeoff between the cells' contribution to maintaining the desired shape of the region relative to the utility (suitability) of the cells' attribute value. The higher the attribute value, the greater the utility. Potential candidate regions will continue to grow until the specified area requirements for the region are met. This growth process is performed for each seed. Each resulting region is considered a candidate option and at this stage there will be many overlapping candidate regions. No cells are allocated in this step and a cell can be a member of multiple candidate regions.
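The tradeoff between shape and utility can be illustrated with a much-simplified sketch. It is not the PRG technique itself or the tool's implementation. It assumes 4-connected cells, approximates "desired shape" by closeness to the seed (which favors compact regions), and uses a hypothetical `tradeoff` weight to blend the two terms.

```python
import math

def grow_region(utility, seed, target_cells, tradeoff=0.5):
    """Greedily grow one candidate region from `seed` on a 2D utility grid.

    `tradeoff` is a hypothetical weight (an assumption of this sketch):
    1.0 scores cells by utility only, 0.0 by closeness to the seed only.
    """
    rows, cols = len(utility), len(utility[0])
    max_u = max(max(row) for row in utility) or 1
    max_d = math.hypot(rows, cols)
    region = {seed}
    while len(region) < target_cells:
        # Frontier: cells that touch the region but are not yet part of it.
        frontier = set()
        for r, c in region:
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in region:
                    frontier.add((nr, nc))
        if not frontier:
            break  # the region already covers every reachable cell

        def score(cell):
            r, c = cell
            u = utility[r][c] / max_u
            shape = 1 - math.hypot(r - seed[0], c - seed[1]) / max_d
            return tradeoff * u + (1 - tradeoff) * shape

        # Sorting first makes tie-breaking deterministic.
        region.add(max(sorted(frontier), key=score))
    return region

utility = [
    [1, 1, 1, 1],
    [1, 9, 9, 1],
    [1, 9, 9, 1],
    [1, 1, 1, 1],
]
by_utility = grow_region(utility, seed=(1, 1), target_cells=4, tradeoff=1.0)
by_shape = grow_region(utility, seed=(1, 1), target_cells=5, tradeoff=0.0)
```

With `tradeoff=1.0` the region follows the block of high-utility cells. With `tradeoff=0.0` it ignores utility and grows evenly around the seed. Running the function once per seed would yield the overlapping candidate regions described above.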

To select the best region or regions, a selection algorithm evaluates each candidate region identified by the PRG technique for the most ideal configuration based on the following preferences:

  • The specified Evaluation method criterion, such as the highest average value, the highest sum, or the greatest amount of edge.
  • The inter-region evaluation criterion, as defined by the Maximum Distance and Minimum Distance parameters.

When multiple regions are desired, the Selection method gives you additional control over how the best regions are selected. The options are COMBINATORIAL and SEQUENTIAL.

  • If the COMBINATORIAL method is selected, all possible combinations of the desired number of regions will be evaluated. For example, with this method, if the Number of regions is set to eight, and the potential number of regions created from the PRG is 150,000, then all combinations of eight regions available in the 150,000 candidate regions will be tested to identify the optimum eight regions based on the Evaluation method and the spatial constraints. It is possible that the single best region will not be selected if it is not part of the optimum combination of eight regions.
  • If the SEQUENTIAL method is selected, the first region selected will be the best region based on the Evaluation method that also meets the spatial constraints. The second region selected will be the next best region based on the Evaluation method that also meets the spatial constraints relative to the first selected region. This process continues until the Number of regions is met.

Candidate regions may overlap; however, a cell can only be allocated to one region. Once a region is selected, any remaining candidate regions that include an allocated cell will no longer be considered in the selection process of subsequent regions. The other cells within those candidate regions will still be considered as part of other candidate regions.
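The difference between the two selection methods can be seen in a minimal sketch. It rests on several assumptions: each candidate is a hypothetical `(cells, score)` pair whose score has already been computed by the evaluation method, a combination is ranked by the sum of its scores, and the minimum and maximum distance constraints are left out. The tool's actual implementation is not described here.

```python
from itertools import combinations

def sequential_select(candidates, n):
    """Pick the best candidate, then the next best that shares no cells, and so on."""
    chosen, used = [], set()
    for cells, score in sorted(candidates, key=lambda c: c[1], reverse=True):
        if used.isdisjoint(cells):
            chosen.append((cells, score))
            used |= cells
            if len(chosen) == n:
                break
    return chosen

def combinatorial_select(candidates, n):
    """Test every combination of n candidates and keep the best non-overlapping one."""
    best, best_total = None, float("-inf")
    for combo in combinations(candidates, n):
        covered = set().union(*(cells for cells, _ in combo))
        if len(covered) != sum(len(cells) for cells, _ in combo):
            continue  # at least one cell appears in two regions
        total = sum(score for _, score in combo)
        if total > best_total:
            best, best_total = combo, total
    return list(best) if best else []

candidates = [
    (frozenset({2, 3}), 10.0),  # A: best single region
    (frozenset({1, 2}), 9.0),   # B: overlaps A at cell 2
    (frozenset({3, 4}), 9.0),   # C: overlaps A at cell 3
    (frozenset({5, 6}), 5.0),   # D: overlaps nothing
]
seq = sequential_select(candidates, 2)
comb = combinatorial_select(candidates, 2)
print([score for _, score in seq])   # [10.0, 5.0]
print([score for _, score in comb])  # [9.0, 9.0]
```

In this toy data the sequential method takes region A first, which rules out B and C, whereas the combinatorial method finds that B and C together outscore any pair containing A. Exhaustive enumeration grows very quickly with the number of candidates, so the sketch is only practical for small inputs.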

How seeds are distributed

To reduce processing time, instead of growing regions from every available cell location within the input raster, candidate regions can be grown from certain identified cell locations known as seeds. The number of seeds from which to grow regions is controlled by the Number of seeds to grow from parameter.

The specified number of seeds are distributed throughout the raster based on the spatial distribution of the utility values within the input raster. That is, more seeds are located in areas of the input raster where the utility values are the highest. It is assumed that it is more likely that the best regions will be located in areas where the input raster utility values are the highest.

To identify the specific locations of the seeds, a distribution is created from all the input raster cells and their utility values. Cells with a high utility value will comprise a greater proportion of the distribution. A value is randomly selected from this distribution to identify the cell location where a seed should be located. Since cells with higher utility values represent a greater proportion of the distribution, it is more likely that these locations will be selected.

An additional adjustment is made to ensure the seeds are not too close to one another while making sure the distribution of the number of seeds in a given area is proportional to the total utility of the cells in that area.

Seed distribution example

For a simplified example, consider a four-cell raster with utility values of 1, 2, 3, and 4. A distribution is created from the four values. The sum of the cell values is 10. The values are then adjusted to a 0 to 1 scale. The cell with a utility value of 1 contributes 10 percent to the distribution (0 to 0.1 of the distribution), the cell with a value of 2 contributes 20 percent (0.1 to 0.3 of the distribution), the cell with a value of 3 contributes 30 percent (0.3 to 0.6 of the distribution), and the cell with a value of 4 contributes 40 percent (0.6 to 1 of the distribution). A random value between 0 and 1 is selected. There is a 40 percent chance that the random value will fall within the 0.6 to 1 range of the distribution, which would mean placing a seed at the cell location assigned the value 4, the cell with the highest utility.
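The same example can be written as a short sketch of utility-weighted sampling. The function names are hypothetical, utility values are assumed to be positive, and the additional adjustment that keeps seeds from being too close to one another is not modeled.

```python
import bisect
import random
from itertools import accumulate

def slice_bounds(utilities):
    """Upper bound of each cell's slice of the distribution on a 0 to 1 scale."""
    total = sum(utilities)
    return [running / total for running in accumulate(utilities)]

def pick_seed(utilities, rng=random):
    """Return the index of the cell whose slice contains a random draw in [0, 1)."""
    bounds = slice_bounds(utilities)
    draw = rng.random()
    # The min() guards against a rounding error in the final bound.
    return min(bisect.bisect_right(bounds, draw), len(bounds) - 1)

utilities = [1, 2, 3, 4]
print(slice_bounds(utilities))  # [0.1, 0.3, 0.6, 1.0]

# Draw many seeds and count how often each cell is chosen.
rng = random.Random(42)
counts = [0] * len(utilities)
for _ in range(10_000):
    counts[pick_seed(utilities, rng)] += 1
```

Over many draws, the share of seeds landing on each cell approaches 10, 20, 30, and 40 percent, matching the proportions in the example.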

Adjusting the region growth resolution based on the size of the desired regions

In addition to using the Number of seeds to grow from parameter to reduce processing time, you can also improve performance by using the Resolution of the growth parameter. You can use the Resolution of the growth parameter to direct the PRG algorithm to grow on a coarser intermediate version of the input raster. In this case, once the desired regions are selected from the candidate regions using the intermediate raster, the resulting regions are resampled to the Cell Size to produce the final output raster. The resolution of the intermediate raster is determined by the number of cells associated with the specified Resolution of the growth.

To ensure there will be enough cells in each of the resulting regions and to reduce unnecessary processing, a second adjustment might occur to the resolution and total number of cells identified by each target Resolution of the growth for the intermediate raster. Based on the resolution determined from the specified Resolution of the growth, the number of cells in the average region size is identified. The average region size is calculated by dividing the desired total area by the specified number of regions. To ensure there are enough cells in each selected region, if there are too few cells in the average region size, the resolution of the intermediate raster is made finer (cell size is reduced, thus cell number is increased). To reduce unnecessary processing, if there are too many cells in the average region size, the resolution of the intermediate raster is coarsened.

The thresholds for determining whether the number of cells in the average region size is too small or too large are based on the selected Resolution of the growth. For example, if the LOW resolution option is selected and the number of cells in the average region size is too low for reasonable results (fewer than 1,800 cells for this option), the resolution of the intermediate raster will be made finer so that there are at least 1,800 cells in the average region size. This ensures that there are enough cells to produce a reasonable region. Conversely, to reduce unnecessary processing, if there are more than 5,400 cells in the average region size, the resolution of the intermediate raster for LOW resolution is coarsened until there are 5,400 cells in the average region size.

These same adjustments occur for the MEDIUM and HIGH selections of Resolution of the growth, but the thresholds vary. For MEDIUM resolution, the lower threshold for the average region size is 3,200 cells and the upper limit is 9,600 cells. For HIGH resolution, the lower threshold for the average region size is 7,200 cells and the upper limit is 21,600 cells.
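The threshold check can be sketched as follows. The thresholds come from the text above; everything else is an assumption for illustration: the function name, the idea that the starting intermediate cell size is already known, square cells, and the way the new cell size is derived from the threshold.

```python
import math

# (lower, upper) cell-count thresholds for the average region size.
THRESHOLDS = {
    "LOW": (1_800, 5_400),
    "MEDIUM": (3_200, 9_600),
    "HIGH": (7_200, 21_600),
}

def adjust_cell_size(cell_size, total_area, number_of_regions, resolution):
    """Return an intermediate cell size that keeps the average region
    between the lower and upper cell-count thresholds (hypothetical helper)."""
    lower, upper = THRESHOLDS[resolution]
    average_area = total_area / number_of_regions
    cells = average_area / cell_size ** 2
    if cells < lower:
        # Too few cells: make the raster finer so the count reaches `lower`.
        return math.sqrt(average_area / lower)
    if cells > upper:
        # Too many cells: coarsen the raster so the count drops to `upper`.
        return math.sqrt(average_area / upper)
    return cell_size

# Average region of 180,000 square units on 30-unit cells holds only 200 cells.
finer = adjust_cell_size(30, 540_000, 3, "LOW")
# Average region of 2,160,000 square units on 10-unit cells holds 21,600 cells.
coarser = adjust_cell_size(10, 2_160_000, 1, "LOW")
```

In the first call the cell size shrinks from 30 to 10 units so that the average region holds 1,800 cells. In the second it grows from 10 to 20 units so that the average region holds 5,400 cells.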

As a result of this second adjustment, the total number of cells in the intermediate resampled raster on which the PRG is performed can be lower or higher than the target number of cells for the specified Resolution of the growth.

How regions are determined when a minimum and maximum area are specified

When Region minimum area and Region maximum area are specified, there would be too many combinations of regions to compare if every possible region size between the specified minimum and maximum were considered from each seed. Therefore, for each seed, the algorithm defines a limited number of region sizes between the minimum and maximum that are created by the PRG process and considered in the COMBINATORIAL or SEQUENTIAL selection process to identify the best regions.

All region sizes are generated from the minimum, maximum, and average region sizes. To determine the average region size, the algorithm divides the total area by the number of specified regions. The average region size is the first region size that will be generated from each seed. Generally, the average region size will be closer to either the specified minimum or the specified maximum area. The larger of the two distances, Abs(maximum - average) and Abs(minimum - average), is referred to as LargerDist.

To calculate the step interval that defines the region sizes falling between the average region size and the larger distance, the following formula is used:

StepInterval = LargerDist/(N - 1)
  • where N is the number of specified regions.

Starting at the average region size, the StepInterval is added or subtracted sequentially until the larger distance value is reached. The same StepInterval is added or subtracted sequentially in the opposite direction until the smaller distance value is reached.

In this processing step, if the number of region sizes is less than 4, two additional sizes are added between each of the existing values. If the number of sizes is less than 7 but greater than 3, an additional size is added between each of the existing values. As a result, the minimum number of region sizes that will be created from each seed is 7 and, depending on the number of regions specified, the maximum number of region sizes is 15.
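The steps above can be sketched as a single function. This is one literal reading of the rules, not the tool's code: the function and parameter names are hypothetical, at least two regions are assumed, and any cap that enforces the stated maximum of 15 sizes is not modeled.

```python
def region_sizes(total_area, n_regions, min_area, max_area):
    """Generate the region sizes grown from each seed (illustrative sketch)."""
    if n_regions < 2:
        raise ValueError("this sketch needs at least two regions")
    average = total_area / n_regions
    up, down = max_area - average, average - min_area
    larger_dist = max(up, down)
    if larger_dist <= 0:
        return [average]

    eps = 1e-9  # tolerance for floating-point comparisons
    sizes = [average]
    k = 1
    while True:
        # k-th multiple of StepInterval = LargerDist / (N - 1)
        offset = larger_dist * k / (n_regions - 1)
        added = False
        if offset <= up + eps:
            sizes.append(average + offset)
            added = True
        if offset <= down + eps:
            sizes.append(average - offset)
            added = True
        if not added:
            break
        k += 1
    sizes.sort()

    # Densify: two extra sizes per gap below 4 sizes, one extra per gap below 7.
    extra = 2 if len(sizes) < 4 else 1 if len(sizes) < 7 else 0
    dense = []
    for a, b in zip(sizes, sizes[1:]):
        dense.append(a)
        for j in range(1, extra + 1):
            dense.append(a + (b - a) * j / (extra + 1))
    dense.append(sizes[-1])
    return dense

example_1 = region_sizes(300, 6, 40, 100)
example_2 = region_sizes(100, 4, 10, 60)
print([round(s, 4) for s in example_1])
print([round(s, 4) for s in example_2])
```

The two calls use the parameter values from Example 1 and Example 2 later in this section. The second result can differ from the listed values in the fourth decimal place because the sketch does not round the StepInterval before adding it.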

Some examples showing the interaction of these parameters are available further down in this section.

When the Region minimum area and Region maximum area are specified, during either the COMBINATORIAL or the SEQUENTIAL selection process, each of the region sizes is considered for each seed as a candidate region and is tested in the selection process to identify the best regions.

If only a Region minimum area is specified and no Region maximum area is identified, the maximum area is determined from the minimum area size, the total area, and the number of regions specified. For example, suppose the Region minimum area is set to 5 square miles, the total area to 50 square miles, and the number of regions to 5. The maximum possible area is determined by assuming 4 of the regions are the size of the minimum area (5 square miles in this example), which totals 20 square miles. Thirty square miles remain, which is the greatest maximum area possible, so that value is assigned. Similar logic is applied when only a Region maximum area is specified, but the minimum area must be greater than 0.
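The arithmetic in this example can be expressed as a small sketch. Both function names are hypothetical. The mirrored calculation for the minimum area, and raising an error when the result is not greater than 0, are assumptions about what "similar logic" means; how the tool handles that case is not stated here.

```python
def implied_max_area(total_area, n_regions, min_area):
    """Area left for one region when every other region is at the minimum."""
    return total_area - (n_regions - 1) * min_area

def implied_min_area(total_area, n_regions, max_area):
    """Mirrored case (an assumption): every other region is at the maximum."""
    min_area = total_area - (n_regions - 1) * max_area
    if min_area <= 0:
        raise ValueError("implied minimum area must be greater than 0")
    return min_area

print(implied_max_area(50, 5, 5))  # 30
```

For the values in the text (a total area of 50 square miles, 5 regions, and a minimum of 5 square miles), the first function returns 30.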

Example 1

In this example, the following parameters are set:

  • Total area is set to 300 square miles
  • Number of regions is set to 6
  • Region minimum area is set to 40 square miles
  • Region maximum area is set to 100 square miles

The first region size that will be created by the PRG is the average region size, which is determined by dividing the total area by the number of regions: 50 square miles (300/6). The LargerDist is 50 (LargerDist = Abs(100 - 50)). The StepInterval is 10 (StepInterval = 50/(6 - 1)).

The second region size to create from each seed is determined by adding the StepInterval to the average region size (50 + 10), giving 60 square miles. The StepInterval of 10 continues to be added until the larger distance value is reached, which identifies the third, fourth, fifth, and sixth region sizes: 70, 80, 90, and 100 square miles. Finally, iteratively subtracting the StepInterval from the average region size until the smaller distance value is reached identifies the seventh region size to create; in this case, 40 square miles. In this example, the number of region sizes that will be created from each seed is 7; they are 40, 50, 60, 70, 80, 90, and 100 square miles.

Example 2

In this example, the following parameters are set:

  • Total area is set to 100 square miles
  • Number of regions is set to 4
  • Region minimum area is set to 10 square miles
  • Region maximum area is set to 60 square miles

The first region size that will be created by the PRG is the average region size, which is determined by dividing the total area by the number of regions: 25 square miles (100/4).

The LargerDist is 35 square miles (Abs(60 - 25)). The StepInterval is 11.6667 (35/(4 - 1)). Iteratively adding 11.6667 to the average region size until the larger distance value is reached results in values of 36.6667, 48.3334, and 60. Subtracting the StepInterval from the average region size for as long as the result stays at or above the minimum results in 13.3333. Thus far, the number of region sizes is 5; they are 13.3333, 25, 36.6667, 48.3334, and 60. Notice that the minimum or maximum value that created the smaller distance is not guaranteed to be included in the region sizes (in this example, the minimum of 10 is not included, because the next subtraction, 13.3333 - 11.6667 = 1.6666, falls below 10). Again, the minimum number of region sizes that will be created from each seed is 7, and the maximum number of region sizes is 15. Since 5 is less than the required minimum of 7, an additional region size is added between each pair of the 5 existing region sizes. In this example, the number of region sizes that will be created from each seed is 9; they are 13.3333, 19.1667, 25, 30.8334, 36.6667, 42.5001, 48.3334, 54.1667, and 60 square miles.

References

Brooks, Christopher J. 1997. A parameterized region growing programme for site allocation on raster suitability maps. International Journal of Geographical Information Science, Vol. 11, No. 4, 375-396.

Brooks, Christopher J. 1997. A genetic algorithm for locating optimal sites on raster suitability maps. Transactions in GIS, Vol. 2, No. 3, 201-212.

Brooks, Christopher J. 1998. A genetic algorithm for designing optimal region configurations in GIS. A thesis submitted for the degree of Doctor of Philosophy, University College, University of London, London.

Brooks, Christopher J. 2001. A genetic algorithm for designing optimal patch configurations in GIS. International Journal of Geographical Information Science, Vol. 15, No. 6, 539-559.

Li, Xia and Anthony Gar-On Yeh. 2005. Integration of genetic algorithms and GIS for optimal location search. International Journal of Geographical Information Science, Vol. 19, No. 5, 581-601.