The Focal Statistics tool performs an operation that calculates a statistic for input cells within a set of overlapping windows or neighborhoods. The statistic (for example, mean, maximum, or sum) is calculated for all input cells contained within each neighborhood.
Neighborhood processing
Conceptually, the algorithm visits each cell in the input raster and calculates a statistic for the cells that fall in the specified neighborhood shape around it. The cell for which the statistic is being calculated is referred to as the processing cell. The value of the processing cell is typically included in the neighborhood statistics calculation, but depending on the shape of the neighborhood, it might not be. Since neighborhoods overlap in the scan process, input cells that are included in the calculation for one processing cell may also contribute to the calculation for another processing cell.
Several predefined neighborhood shapes are available to choose from. You can also create a custom shape. The statistics that you can calculate for a neighborhood are majority, maximum, mean, median, minimum, minority, percentile, range, standard deviation, sum, and variety.
Example calculation
To illustrate the neighborhood processing for Focal Statistics, consider calculating a Sum statistic of the neighborhood around the processing cell with the value of 5 in the following diagram. A rectangular 3 by 3 cell neighborhood shape is specified and the Ignore NoData in calculations parameter is left at the default checked setting. The sum of the values of the neighboring cells (3 + 2 + 3 + 4 + 2 + 1 + 4 = 19) plus the value of the processing cell (5) equals 24 (19 + 5 = 24). A value of 24 is given to the cell in the output raster in the same location as the processing cell in the input raster.
The above diagram demonstrates how the calculations are performed on a single cell in the input raster. In the following diagram, the results for all the input cells are shown. The cells highlighted in yellow identify the same processing cell and neighborhood as in the example above.
NoData cells
The Ignore NoData in calculations parameter controls how NoData cells within the neighborhood window are processed. When this parameter is checked (ignore_nodata = "DATA" in Python), any cells in the neighborhood that are NoData will be ignored in the calculation of the output value for the processing cell. When unchecked (ignore_nodata = "NODATA" in Python), if any cell in the neighborhood is NoData, the output value for the processing cell will be NoData.
If the processing cell itself is NoData, with the Ignore NoData in calculations option selected, the output value for the cell will be calculated based on the other cells in the neighborhood that have a valid value. If all of the cells in the neighborhood are NoData, the output will be NoData.
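The two NoData behaviors described above can be sketched in plain Python. This is an illustrative sketch, not the tool's implementation: `focal_sum` is a hypothetical helper, `None` stands in for NoData, the neighborhood is fixed at a 3 by 3 rectangle, and the input values are made up rather than taken from the diagrams.

```python
def focal_sum(grid, row, col, ignore_nodata=True):
    """Sum the 3 by 3 window centered on (row, col), clipped at the raster edges.

    None stands in for NoData (an assumption of this sketch).
    """
    n_rows, n_cols = len(grid), len(grid[0])
    values = []
    for r in range(max(0, row - 1), min(n_rows, row + 2)):
        for c in range(max(0, col - 1), min(n_cols, col + 2)):
            values.append(grid[r][c])
    valid = [v for v in values if v is not None]
    if not ignore_nodata and len(valid) < len(values):
        return None  # "NODATA" behavior: any NoData cell in the window gives NoData
    if not valid:
        return None  # every cell in the window is NoData
    return sum(valid)


# Hypothetical input values; the processing cell at (1, 1) is itself NoData.
grid = [
    [1, 2, None],
    [4, None, 6],
    [7, 8, 9],
]
center_ignore = focal_sum(grid, 1, 1, ignore_nodata=True)   # 37, from the 7 valid cells
center_strict = focal_sum(grid, 1, 1, ignore_nodata=False)  # None
```

With NoData ignored, the NoData processing cell still receives a value from its valid neighbors. With the parameter unchecked, a single NoData cell in the window is enough to make the output NoData.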
Corner and edge cells
When the processing cell is near the corners or edges of the input raster, the neighborhood is reduced to the cells that fall within the raster, and the statistic is calculated from those available cells only.
The following diagrams illustrate how the output statistic is calculated for each processing cell from the available cells in each individual neighborhood. The process starts at the upper left corner of the input raster and scans from left to right across each row before proceeding to the next row. The neighborhood used in this example is a 3 by 3 cell rectangle, and the statistic used is sum. The Ignore NoData in calculations parameter is left at the default checked setting. In the diagrams, the neighborhood is outlined in yellow, and the processing cell is outlined in cyan.
For the first processing cell, because it is at the upper left corner of the 6 by 6 cell input raster, there are only four cells available to be in the neighborhood. Adding those values together results in the first output cell being assigned a value of 11. For the next cell to the right, there are now six cells in the neighborhood, and the sum is calculated for those. The scan proceeds across all the cells in the first row. To save space, not all the processing cells are shown.
Note that in the first row, for the third processing cell from the left (value = 1), one of the input cells has a value of NoData. Because the tool was set to ignore NoData, that particular cell will be ignored in the calculations. If the statistic to be calculated had been set to Mean instead of Sum, it would be calculated as the sum of all the cells in the neighborhood that are not NoData, divided by 5.
For the second row of input cells, the statistic for the first processing cell will be calculated based on having six cells available in the neighborhood. For the next processing cell, there will be nine cells to consider in the calculation. For the subsequent cell, there will be eight input values to calculate with, since one of the cells in the 3 by 3 neighborhood is NoData. The process continues for the rest of the cells in the row and then on to the following rows until all the processing cells have been analyzed.
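The cell counts mentioned above (four at a corner, six along an edge, nine in the interior) follow from clipping the 3 by 3 window to the raster extent. The sketch below shows that clipping and a mean that divides by the number of valid cells. `window_cells` and `focal_mean` are hypothetical helper names, `None` stands in for NoData, and the small grid is made up for illustration.

```python
def window_cells(n_rows, n_cols, row, col):
    """Return the (row, col) positions of the 3 by 3 window that fall inside the raster."""
    return [
        (r, c)
        for r in range(max(0, row - 1), min(n_rows, row + 2))
        for c in range(max(0, col - 1), min(n_cols, col + 2))
    ]


def focal_mean(grid, row, col):
    """Mean of the valid (non-None) cells in the clipped window; None if there are none."""
    cells = window_cells(len(grid), len(grid[0]), row, col)
    valid = [grid[r][c] for r, c in cells if grid[r][c] is not None]
    return sum(valid) / len(valid) if valid else None


# Counts for a 6 by 6 raster, matching the walkthrough above.
corner_count = len(window_cells(6, 6, 0, 0))    # 4
edge_count = len(window_cells(6, 6, 0, 1))      # 6
interior_count = len(window_cells(6, 6, 1, 1))  # 9

# Hypothetical values: the window around (0, 2) has six cells, one of them NoData,
# so the mean divides by 5.
grid = [
    [3, 2, 1, None],
    [4, 5, 6, 7],
]
edge_mean = focal_mean(grid, 0, 2)  # (2 + 1 + 5 + 6 + 7) / 5 = 4.2
```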
Neighborhood size and performance
The tool can process very large neighborhoods. However, as the neighborhood increases in size, performance decreases because more input cells are included in each calculation. The rectangle neighborhood type has some optimizations that allow for increased performance relative to other neighborhood shapes for a given area.
The maximum size of any dimension of a neighborhood is limited to 4,096 cells. This means that rectangular neighborhoods cannot exceed this number of cells in either the horizontal or vertical direction. For circular neighborhoods, the radius cannot exceed 2,047 cells.
Neighborhood types
The shape of a neighborhood can be an annulus (a donut), a circle, a rectangle, or a wedge. Using a kernel file, you can also define a custom neighborhood shape, as well as assign different weights to specific cells in the neighborhood before the statistic is calculated.
Following are descriptions of the neighborhood shapes and how they are defined:
- Annulus
- The annulus shape is composed of two circles, one inside the other to make a donut shape. Cells with centers that fall outside the radius of the smaller circle but inside the radius of the larger circle will be included in processing the neighborhood. The area that falls between the two circles constitutes the annulus neighborhood.
- The radius is identified in cells or map units, measured perpendicular to the x- or y-axis. When the radii are specified in map units, they are converted to radii in cell units. The resulting radii in cell units produce an area that most closely represents the area calculated using the original radii in map units. Any cell center encompassed by the annulus will be included in the processing of the neighborhood.
- The default annulus neighborhood is an inner radius of one cell and an outer radius of three cells.
- An example illustration of an annulus neighborhood follows:
- Circle
- A circle neighborhood is created by specifying a radius value.
- The radius is identified in cell or map units, measured perpendicular to the x- or y-axis. When the radius is specified in map units, additional logic is used to determine which cells are included in the processing neighborhood. First, the exact area of a circle defined by the specified radius value is calculated. Next, the area is calculated for two additional circles, one with the specified radius value rounded down and one with the specified radius value rounded up. These two areas are compared to the result from the specified radius, and the radius of the area that is closest will be used in the operation.
- The default circle neighborhood radius is three cells.
- An example illustration of a circle neighborhood follows:
- Rectangle
- The rectangle neighborhood is specified by providing a width and a height in either cells or map units.
- Only the cells with centers that fall within the defined object are processed as part of the rectangle neighborhood.
- The default rectangle neighborhood is a square with a height and width of three cells.
- The x,y position for the processing cell within the neighborhood, with respect to the upper left corner of the neighborhood, is determined by the following equations:
x = (width of the neighborhood + 1)/2
y = (height of the neighborhood + 1)/2
If the input number of cells is even, the x,y coordinates are computed using truncation. For example, in a 5 by 5 cell neighborhood, the x- and y-values are 3,3. In a 4 by 4 neighborhood, the x- and y-values are 2,2.
- The following are example illustrations of two rectangle neighborhoods:
- Wedge
- A wedge is a pie-shaped neighborhood specified by a radius, a starting angle, and an ending angle.
- The wedge extends counterclockwise from the starting angle to the ending angle. Angles are specified in arithmetic degrees from 0 to 360, where 0 is on the positive x-axis (3:00 on a clock), and can be integer or floating point. Negative angles can be used.
- The radius is identified in cells or map units, measured perpendicular to the x- or y-axis. When the radius is specified in map units, it is converted to a radius in cell units. The resulting radius in cell units produces an area that most closely represents the area calculated using the original radius in map units. Any cell center encompassed by the wedge will be included in the processing of the neighborhood.
- The default wedge neighborhood is from 0 to 90 degrees, with a radius of three cells.
- An example illustration of a wedge neighborhood follows:
- Irregular
- This allows you to specify an irregularly shaped neighborhood around the processing cell.
- The irregular kernel file specifies the cell positions to be included within the neighborhood.
- The x,y position for the processing cell within the neighborhood, with respect to the upper left corner of the neighborhood, is determined by the following equations:
x = (width + 1)/2
y = (height + 1)/2
If the input number of cells is even, the x- and y-coordinates are computed using truncation.
- The following apply to a kernel file for an irregular neighborhood:
- The irregular kernel file is an ASCII text file that defines the values and shape of an irregular neighborhood. The file can be created with any plain text editor. It must have a .txt file extension and no spaces in the file name.
- The first line specifies the width and height of the neighborhood (the number of cells in the x direction, followed by a space, and the number of cells in the y direction).
- The subsequent lines define the value to use for each position in the neighborhood they represent. A space between each value is necessary.
- The values define whether a position in the neighborhood will be included in the calculation. Typically, the value 1 is used to identify the positions to include in the calculations for an irregular neighborhood, but any positive or negative value other than 0 can be used. Floating point values can also be used.
- To exclude a location in the neighborhood from the calculation, use a value of 0 (not a blank space) at the corresponding location in the kernel file.
- The following example shows the contents of an irregular kernel file and the neighborhood it represents:
- Weight
- Similar to the irregular neighborhood type, the weight neighborhood allows you to define an irregular neighborhood around the processing cell but also allows you to apply weights to the input values.
- The weight kernel file specifies the cell positions to include within the neighborhood and the weights by which they will be multiplied.
- The weight neighborhood is only available for the mean, standard deviation, and sum statistics types.
- The x,y position for the processing cell within the neighborhood, with respect to the upper left corner of the neighborhood, is determined by the following equations:
x = (width + 1)/2
y = (height + 1)/2
If the input number of cells is even, the x- and y-coordinates are computed using truncation.
- The following apply for the kernel file for a weight neighborhood:
- The weight kernel file is an ASCII text file that defines the values and shape of a weight neighborhood. The file can be created with any plain text editor. It must have a .txt file extension and no spaces in the file name.
- The first line specifies the width and height of the neighborhood (the number of cells in the x direction, followed by a space, and the number of cells in the y direction).
- The subsequent lines define the value to use for each position in the neighborhood they represent. A space between each value is necessary.
- For the sum statistic, a weight can be any positive, negative, integer, or floating point value.
- For the mean and standard deviation statistics, a weight can be any positive integer or floating point value. Negative values are not allowed for these statistics, so any position with a negative weight will be ignored in the calculations.
- To exclude a location in the neighborhood from the calculation, use a value of 0 (not a blank space) at the corresponding location in the kernel file.
- The following example shows the contents of a weight kernel file and the neighborhood it represents:
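The kernel file layout and the processing cell position rule described in this section can be sketched as follows. This is a minimal illustration, not the tool's parser: `parse_kernel` and `processing_cell` are hypothetical names, the kernel is read from a string rather than a .txt file, and positions are treated as 1-based from the upper left corner so that a 5 by 5 neighborhood gives 3,3. The kernel values are the ones used in the weighted mean example later in this document.

```python
def parse_kernel(text):
    """Parse kernel text: first line is 'width height', then one row of values per line."""
    lines = [line.split() for line in text.strip().splitlines()]
    width, height = int(lines[0][0]), int(lines[0][1])
    rows = [[float(v) for v in line] for line in lines[1:]]
    if len(rows) != height or any(len(row) != width for row in rows):
        raise ValueError("kernel rows do not match the declared width and height")
    return width, height, rows


def processing_cell(width, height):
    """1-based x,y of the processing cell, truncating when a dimension is even."""
    return (width + 1) // 2, (height + 1) // 2


kernel_text = """3 3
0.0 0.5 0.0
0.5 2.0 0.5
0.0 0.5 0.0
"""
width, height, rows = parse_kernel(kernel_text)

# Positions with a value of 0 are excluded from the neighborhood.
included = [
    (x, y)
    for y, row in enumerate(rows, start=1)
    for x, value in enumerate(row, start=1)
    if value != 0
]
center = processing_cell(width, height)  # (2, 2)
```

Integer division reproduces the truncation rule: `processing_cell(5, 5)` gives `(3, 3)` and `processing_cell(4, 4)` gives `(2, 2)`.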
Statistics types
The available statistics are majority, maximum, mean, median, minimum, minority, percentile, range, standard deviation, sum, and variety. The default statistics type is mean.
Certain statistics types are only available when the input raster is of integer type.
- Majority
- Only an integer raster can be used as input.
- The frequency of each unique cell value in each neighborhood is determined. If a single value has the highest frequency (occurs the most often), that value is assigned to the processing cell. If there is a tie, the lowest of the tied values is assigned, unless the value of the processing cell is one of the tied values. In that case, the original value of the processing cell is returned.
- Maximum
- If the input raster is integer, the values on the output raster will be integer; if the values on the input are floating point, the values on the output will be floating point.
- Mean
- The input can be an integer or a float raster.
- The output raster will always be floating point.
- For the weight neighborhood type, this is one of the subset of statistics types that is supported. See the Weighted neighborhood calculations section for details on how this statistic is calculated.
- Median
- The input can be an integer or a float raster.
- The output raster will always be floating point.
- If there is an odd number of valid cell values in the neighborhood, the median value is calculated by ranking the values and selecting the middle value. If there is an even number of values in a neighborhood, the values will be ranked and the middle two values will be averaged.
- Minimum
- If the input raster is integer, the values on the output raster will be integer; if the values on the input are floating point, the values on the output will be floating point.
- Minority
- Only an integer raster can be used as input.
- The frequency of each unique cell value in each neighborhood is determined. If a single value has the lowest frequency (occurs the least often), that value is assigned to the processing cell. If there is a tie, the lowest of the tied values is assigned, unless the value of the processing cell is one of the tied values. In that case, the original value of the processing cell is returned.
- Percentile
- The input can be an integer or a float raster.
- The output raster will always be floating point.
- The result for the percentile statistic is calculated based on the following formula (Hyndman and Fan, 1996):
pk = (k-1)/(n-1)
- Range
- If the input raster is integer, the values on the output raster will be integer; if the values on the input are floating point, the values on the output will be floating point.
- The values for each cell location on the output raster are determined on a cell-by-cell basis by applying the formula: Focal Range = Focal Maximum – Focal Minimum
- Standard deviation
- The output raster will always be floating point.
- For the weight neighborhood type, this is one of the subset of statistics types that is supported. See the Weighted neighborhood calculations section for details on how this statistic is calculated.
- The standard deviation is calculated on the entire population (the N method); it is not estimated based on a sample (the N-1 method).
- Sum
- If the input raster is integer, the values on the output raster will be integer; if the values on the input are floating point, the values on the output will be floating point.
- For the weight neighborhood type, this is one of the subset of statistics types that is supported. See the Weighted neighborhood calculations section for details on how this statistic is calculated.
- Variety
- Only an integer raster can be used as input.
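The tie-breaking rule described in the Majority entry above can be sketched in a few lines of Python. `focal_majority` is a hypothetical helper, and the sketch assumes that `values` already holds only the valid (non-NoData) integer cells of one neighborhood, including the processing cell.

```python
from collections import Counter


def focal_majority(values, processing_value):
    """Most frequent value in the neighborhood, with the tie rule from the text."""
    counts = Counter(values)
    top = max(counts.values())
    tied = [value for value, n in counts.items() if n == top]
    if len(tied) == 1:
        return tied[0]           # a single most frequent value
    if processing_value in tied:
        return processing_value  # the processing cell value wins a tie it is part of
    return min(tied)             # otherwise the lowest tied value


# Hypothetical neighborhood where 3 and 4 each occur three times.
values = [1, 1, 2, 3, 3, 3, 4, 4, 4]
tie_with_processing_cell = focal_majority(values, processing_value=4)  # 4
tie_without = focal_majority(values, processing_value=1)               # 3
```

Minority follows the same pattern with `min(counts.values())` in place of `max(counts.values())`.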
Weighted neighborhood calculations
The amount of influence that each value in the neighborhood has on the final result for the processing cell can be adjusted by applying weights.
In the following sections, the formulas used to calculate the results for the weighted mean, standard deviation, and sum statistics are shown. An example accompanies each, showing the calculations for a processing cell and the results for a 3 x 3 cell rectangle neighborhood.
Weighted mean statistic
For the weight neighborhood with the mean statistic, the output value for a central processing cell is the sum of the product of the kernel weight values multiplied by the input values, divided by the sum of the kernel weight values.
The formula applied to the cells within a neighborhood is as follows:
$$\mu_W = \frac{\sum_{i=1}^{N} w_i x_i}{\sum_{i=1}^{N} w_i}$$
Where:
- µW is the population weighted mean value for the processing cell.
- N is the number of cells in the neighborhood.
- wi is a weight value defined in the kernel.
- xi is an input cell value.
Legacy:
In previous releases, the calculations used the number of cells in the neighborhood as the denominator.
Weight values must be positive values and can be integer or floating point.
Example
Consider the following processing cell of value 7 and its eight surrounding neighbors:
4 6 7
6 7 8
4 5 6
The mathematical average (sum / count) of these values is 53 / 9 = 5.889.
Consider the following 3 x 3 weighted cell kernel:
3 3
0.0 0.5 0.0
0.5 2.0 0.5
0.0 0.5 0.0
This kernel gives the highest degree of influence to the center cell (weight of 2), lessens the influence of the four orthogonal neighbors to the processing cell (weight of 0.5), and makes the four corner cells have no influence (weight of 0).
Applying the weighted mean equation provided above, the following shows the calculations for achieving the final value.
= (w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6 + w7x7 + w8x8 + w9x9) /
(w1 + w2 + w3 + w4 + w5 + w6 + w7 + w8 + w9)
= ((0*4) + (0.5*6) + (0*7) + (0.5*6) + (2.0*7) + (0.5*8) + (0*4) + (0.5*5) + (0*6)) /
(0 + 0.5 + 0 + 0.5 + 2.0 + 0.5 + 0 + 0.5 + 0)
= (0 + 3.0 + 0 + 3.0 + 14.0 + 4.0 + 0 + 2.5 + 0) /
(0.5 + 0.5 + 2.0 + 0.5 + 0.5)
= (3.0 + 3.0 + 14.0 + 4.0 + 2.5) / 4.0
= 26.5 / 4.0
= 6.625
For comparison, the regular average of the nine input cells would be 5.889. If only the five input cells that are within the kernel (where the weight != 0) are included, but the values of the weights are not considered, the average would be 6.4 (6 + 6 + 7 + 8 + 5 = 32, divided by the count of five).
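The worked example above can be checked with a short Python sketch. The input values and kernel weights come from the example, read row by row; `weighted_mean` is a hypothetical helper name.

```python
# Neighborhood values and kernel weights from the example, read row by row.
values = [4, 6, 7, 6, 7, 8, 4, 5, 6]
weights = [0.0, 0.5, 0.0, 0.5, 2.0, 0.5, 0.0, 0.5, 0.0]


def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


result = weighted_mean(values, weights)  # 26.5 / 4.0 = 6.625

# The two comparison values mentioned in the text.
plain_mean = sum(values) / len(values)                 # 53 / 9, about 5.889
in_kernel = [x for w, x in zip(weights, values) if w != 0]
unweighted_kernel_mean = sum(in_kernel) / len(in_kernel)  # 32 / 5 = 6.4
```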
Weighted standard deviation statistic
For the weight neighborhood with the standard deviation statistic, the output value for a processing cell is the result of the following equation:
$$SD_W = \sqrt{\frac{\sum_{i=1}^{N} w_i \left(x_i - \mu_W\right)^2}{\sum_{i=1}^{N} w_i}}$$
Where:
- SDW is the population weighted standard deviation value for the processing cell.
- µW is the population weighted mean value for the processing cell.
- N is the number of cells in the neighborhood.
- wi is a weight value defined in the kernel.
- xi is an input cell value.
Weight values must be positive values and can be integer or floating point.
If all the input values in a neighborhood are the same, the standard deviation value for that processing cell will be 0.
Example
The same neighborhood values that were used in the weighted mean example above will be used again for this example.
4 6 7
6 7 8
4 5 6
The same weighted kernel values will also be used:
3 3
0.0 0.5 0.0
0.5 2.0 0.5
0.0 0.5 0.0
Applying the weighted standard deviation equation provided above for the central processing cell of value 7, the result of the weighted standard deviation calculation is approximately 0.85696.
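The intermediate steps are not shown above, so the following sketch spells them out. It assumes the population form of the weighted standard deviation, in which the weighted squared deviations from the weighted mean are divided by the sum of the weights. That form is an assumption of this sketch, but it reproduces the stated result for these inputs. `weighted_std` is a hypothetical helper name.

```python
import math

# Neighborhood values and kernel weights from the example, read row by row.
values = [4, 6, 7, 6, 7, 8, 4, 5, 6]
weights = [0.0, 0.5, 0.0, 0.5, 2.0, 0.5, 0.0, 0.5, 0.0]


def weighted_std(values, weights):
    """Population weighted standard deviation (assumed form, see the lead-in)."""
    total_weight = sum(weights)
    mean = sum(w * x for w, x in zip(weights, values)) / total_weight
    variance = sum(w * (x - mean) ** 2 for w, x in zip(weights, values)) / total_weight
    return math.sqrt(variance)


result = weighted_std(values, weights)  # sqrt(2.9375 / 4.0), about 0.85696
```

With a weighted mean of 6.625, the weighted squared deviations sum to 2.9375, and dividing by the weight total of 4.0 gives a variance of 0.734375.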
Weighted sum statistic
For the weight neighborhood with the sum statistic, the output value for a processing cell is the result of the following equation:
$$S_W = \sum_{i=1}^{N} w_i x_i$$
Where:
- SW is the weighted sum value for the processing cell.
- N is the number of cells in the neighborhood.
- wi is a weight value defined in the kernel.
- xi is an input cell value.
Weight values can be positive or negative values and can be integer or floating point.
Example
Consider the following neighborhood input values:
4 6 7
6 7 8
4 5 6
Consider the following 3 x 3 weighted cell kernel:
3 3
-1 -2 -1
0 0 0
1 2 1
Applying the equation provided above, the following shows the calculations used to achieve the final value:
= (w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6 + w7x7 + w8x8 + w9x9)
= ((-1*4) + (-2*6) + (-1*7) + (0*6) + (0*7) + (0*8) + (1*4) + (2*5) + (1*6))
= (-4) + (-12) + (-7) + 4 + 10 + 6
= -3
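The same calculation as a short Python sketch, using the input values and kernel weights from the example, read row by row. `weighted_sum` is a hypothetical helper name.

```python
# Neighborhood values and kernel weights from the example, read row by row.
values = [4, 6, 7, 6, 7, 8, 4, 5, 6]
weights = [-1, -2, -1, 0, 0, 0, 1, 2, 1]


def weighted_sum(values, weights):
    """Sum of each input value multiplied by its kernel weight."""
    return sum(w * x for w, x in zip(weights, values))


result = weighted_sum(values, weights)  # -23 + 20 = -3
```

Because the weights in this kernel sum to zero, a neighborhood in which every input cell has the same value produces a weighted sum of 0.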
References
- Hyndman, R.J. and Y. Fan, November 1996. "Sample Quantiles in Statistical Packages." The American Statistician 50 (4): 361-365.