Spatial Aggregation calculates statistics in areas where an input layer overlaps a boundary layer.
Example
A business analyst for a consortium of colleges is doing research for a marketing campaign in states with high-value colleges and wants to know which state has the most colleges with a high return on investment (ROI). Spatial Aggregation can be used to aggregate the colleges into states and find the number of colleges with above-average ROI in each state.
Run Spatial Aggregation
Spatial Aggregation can be run on maps with two layers: one area layer with the boundaries that will be used for aggregation (for example, counties, census tracts, or police districts) and one layer to aggregate.
Complete the following steps to calculate spatial statistics:
- Click the map card to activate it if necessary.
A card is active when the toolbar and Action button appear.
- Click the Action button and choose Spatial Aggregation.
- For Choose area layer, select the boundary layer, and for Choose layer to summarize, select the layer to aggregate.
- For Style by, select the field or statistic to calculate and display.
- Optionally, use Additional options to select additional fields and statistics.
- Click Run.
Tip:
You can also run Spatial Aggregation by dragging a dataset onto the Spatial aggregation drop zone on an existing map.
Usage notes
Use the Choose area layer and Choose layer to summarize parameters to select the boundary layer and the layer that will be summarized. For the Choose area layer parameter, only layers with area features are available.
Use the Style by parameter to change the statistic being calculated. The default statistic depends on the type of layer being summarized. Use the drop-down menu to select a different style option. The following table summarizes the Style by options for each layer type:
Summary layer type | Default style option | Other style options |
---|---|---|
Point | Count | Number or rate/ratio field (sum, minimum, maximum, average, or mode) String fields (mode) |
Line | Number (sum) or rate/ratio (average) field | Number or rate/ratio field (sum, minimum, maximum, average, or mode) String fields (mode) Sum of length (meters, kilometers, feet, or miles) |
Area | Number (sum) or rate/ratio (average) field | Number or rate/ratio field (sum, minimum, maximum, average, or mode) String fields (mode) Sum of area (square meters, square kilometers, square feet, or square miles) |
Note:
A best practice is to use numbers rather than rates or ratios when calculating statistics for lines and areas so that the proportional calculations make logical sense. For more information, see the How Spatial Aggregation works section below.
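The defaults in the table above can be expressed as a small lookup. The following Python sketch is illustrative only: `default_style_option` and its string arguments are hypothetical names chosen for this example and are not part of any Insights API.

```python
def default_style_option(layer_type: str, field_type: str = "number") -> str:
    """Return the default Style by option described in the table above.

    layer_type: "point", "line", or "area" (hypothetical labels).
    field_type: "number" or "rate/ratio"; ignored for point layers.
    """
    layer_type = layer_type.lower()
    if layer_type == "point":
        # Point layers default to a simple count of features.
        return "count"
    if layer_type in ("line", "area"):
        # Number fields default to sum; rate/ratio fields default to average.
        return "sum" if field_type == "number" else "average"
    raise ValueError(f"unknown layer type: {layer_type!r}")
```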
You can expand the Additional options parameter and assign additional statistics. Each time a field is added to the list of summary statistics, a new field appears below it.
Limitations
When you perform spatial aggregation or spatial filtering on data from the same database connection, you must ensure that all the data is stored in the same spatial reference system. For datasets from SQL Server, the data must also have the same data type (geography or geometry).
The following limitations apply for Google BigQuery, Snowflake, and database platforms that are not supported out of the box:
- Spatial Aggregation using line or area features as the Choose layer to summarize parameter is not supported for read-only connections.
- Both input layers must be from the same database connection.
Google BigQuery does not support mode calculations.
How Spatial Aggregation works
Average statistics are calculated using weighted mean for line and area features. The following equation is used to calculate weighted mean:

$$\bar{x}_w = \frac{\sum_{i=1}^{N} w_i x_i}{\sum_{i=1}^{N} w_i}$$

where:
$N$ = number of observations
$x_i$ = observations
$w_i$ = weights
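As a minimal Python sketch of the weighted mean, assuming the standard definition (each of the N observations is multiplied by its weight and the total is divided by the sum of the weights): the function name, observations, and weights below are hypothetical, and how Insights derives the weights is not specified here.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical observations and weights, for illustration only.
observations = [10.0, 20.0, 40.0]
weights = [1.0, 1.0, 2.0]
result = weighted_mean(observations, weights)  # (10 + 20 + 80) / 4 = 27.5
```

When every weight is equal, the result is the same as the ordinary arithmetic mean.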
Points
Point layers are summarized using only the point features within the input boundary. None of the calculations are weighted.
The figure and tables below explain the statistical calculations of a point layer within a hypothetical boundary. The Population field was used to calculate the numeric statistics (count, sum, minimum, maximum, and average) and the Type field was used for mode.
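Field | Statistic | Result District A | Result District B |
---|---|---|---|
Population | Count | 6 | 6 |
Population | Sum | 2,568 | 3,400 |
Population | Minimum | Minimum of the six Population values in District A | Minimum of the six Population values in District B |
Population | Maximum | Maximum of the six Population values in District A | Maximum of the six Population values in District B |
Population | Average | 2,568 / 6 = 428 | 3,400 / 6 ≈ 566.67 |
Type | Mode | Primary School | Primary School |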
A real-life scenario in which this analysis could be used is determining the total number of students in each school district. Each point represents a school. The Type field displays the type of school (elementary, middle school, or secondary) and the Population field displays the number of students enrolled at each school. The calculations and results are shown in the table above. The results show that District A has 2,568 students and District B has 3,400 students.
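The point calculations can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's implementation: the school records are hypothetical values (not the figures from the table), each point is assumed to have already been assigned to the district that contains it, and `summarize_points` is a name invented for this example.

```python
from collections import Counter, defaultdict

# Hypothetical school points: (district, enrolled students, school type).
schools = [
    ("A", 300, "Elementary"),
    ("A", 450, "Elementary"),
    ("A", 700, "Secondary"),
    ("B", 250, "Middle school"),
    ("B", 500, "Middle school"),
    ("B", 900, "Secondary"),
]


def summarize_points(points):
    """Group points by boundary and compute unweighted statistics."""
    grouped = defaultdict(list)
    for district, population, school_type in points:
        grouped[district].append((population, school_type))

    summary = {}
    for district, rows in grouped.items():
        populations = [population for population, _ in rows]
        types = [school_type for _, school_type in rows]
        summary[district] = {
            "count": len(populations),
            "sum": sum(populations),
            "minimum": min(populations),
            "maximum": max(populations),
            "average": sum(populations) / len(populations),
            # Most common value; the first one encountered wins ties here.
            "mode": Counter(types).most_common(1)[0][0],
        }
    return summary


stats = summarize_points(schools)
```

Because none of the point statistics are weighted, the average is simply the sum divided by the count for each district.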
Lines
Line layers are summarized numerically using only the proportions of the line features that are within the input boundary. When summarizing lines, use fields with counts and amounts rather than rates or ratios so proportional calculations make logical sense in the analysis. The results are displayed using graduated symbols.
The mode for line layers is based on the count of features that intersect the boundary. Lines do not need to be completely contained within a boundary to be counted toward the mode, and each line is counted as one feature, regardless of the proportion that is contained within the boundary. The results are displayed using unique symbols.
The figure and table below show the statistical calculations of a line layer within a hypothetical boundary. The volume was used to calculate the statistics (sum, minimum, maximum, and average) for the layer. The statistics are calculated using only the proportion of the lines that are within the boundary. The mode is calculated for the type of water feature.
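Statistic | Field | Result |
---|---|---|
Sum of length | Length | 6.5 miles. Note: Length can also be calculated in feet, meters, and kilometers. |
Sum | Volume | 1,200 |
Minimum | Volume | Minimum of the Volume values, using only the proportion of each line within the boundary |
Maximum | Volume | Maximum of the Volume values, using only the proportion of each line within the boundary |
Average | Volume | Weighted mean of the Volume values |
Mode | Type | River |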
A real-life scenario in which this analysis could be used is determining the total volume of water in rivers within the boundaries of a state park. Each line represents a river that is partially located inside the park. The results show that there are 6.5 miles of rivers within the park and the total volume is 1,200 units.
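A rough Python sketch of the proportional calculation for lines follows. It reflects one reading of the description above: each value is scaled by the fraction of the line's length inside the boundary, while the mode counts every intersecting line once. The river records and the name `summarize_lines` are hypothetical, and the average is left out because the weights used for the weighted mean are not specified here.

```python
from collections import Counter

# Hypothetical rivers: (type, volume, total length in miles, miles inside the boundary).
rivers = [
    ("River", 600.0, 4.0, 2.0),
    ("River", 900.0, 3.0, 3.0),
    ("Stream", 100.0, 4.0, 1.0),
]


def summarize_lines(lines):
    """Summarize lines using only the proportion inside the boundary."""
    # Keep only lines that intersect the boundary at all.
    inside = [line for line in lines if line[3] > 0]
    # Scale each value by the fraction of the line's length inside the boundary.
    proportional = [
        volume * (inside_len / total_len)
        for _, volume, total_len, inside_len in inside
    ]
    return {
        "sum_of_length": sum(inside_len for *_, inside_len in inside),
        "sum": sum(proportional),
        "minimum": min(proportional),
        "maximum": max(proportional),
        # Each intersecting line counts once toward the mode, regardless of proportion.
        "mode": Counter(kind for kind, *_ in inside).most_common(1)[0][0],
    }


line_stats = summarize_lines(rivers)
```

In this sketch the first river contributes half of its volume because half of its length is inside the boundary, yet it still counts as one full feature toward the mode.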
Areas
Area layers are summarized using only the proportions of the area features that are within the input boundary. When summarizing areas, use fields with counts and amounts rather than rates or ratios so proportional calculations make logical sense in the analysis. The results are displayed using graduated colors.
The mode for area layers is based on the count of features that intersect the boundary. Areas do not need to be completely contained within a boundary to be counted toward the mode, and each area is counted as one feature, regardless of the proportion that is contained within the boundary. The results are displayed using unique symbols.
The figure and table below show the statistical calculations of an area layer within a hypothetical boundary. The populations were used to calculate the statistics (sum, minimum, maximum, and average) for the layer. The statistics are calculated using only the proportion of the area that is within the boundary. The mode is calculated using the tapestry segment designation for each area.
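Statistic | Field | Result |
---|---|---|
Sum of area | Area | Sum of the areas within the boundary. Note: Area can also be calculated in square feet, square meters, and square kilometers. |
Sum | Population | 10,841 |
Minimum | Population | Minimum of the Population values, using only the proportion of each area within the boundary |
Maximum | Population | Maximum of the Population values, using only the proportion of each area within the boundary |
Average | Population | Approximately 2,666 (weighted mean of the Population values) |
Mode | Segment | Segment 2 |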
A real-life scenario in which this analysis could be used is determining the population in a city neighborhood. The blue outline represents the boundary of the neighborhood, and the smaller areas represent census blocks. The results show that there are 10,841 people in the neighborhood and an average of approximately 2,666 people per census block.
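The area case can be sketched the same way. The Python example below is a simplified illustration, not the tool's implementation: it assumes axis-aligned rectangles in place of real polygons so the overlap can be computed without a geometry library, and the boundary, census blocks, populations, and the name `overlap_fraction` are all hypothetical.

```python
def overlap_fraction(feature, boundary):
    """Fraction of an axis-aligned rectangle's area that lies inside the boundary.

    Rectangles are (xmin, ymin, xmax, ymax); real features are arbitrary polygons.
    """
    fx0, fy0, fx1, fy1 = feature
    bx0, by0, bx1, by1 = boundary
    # Width and height of the overlapping region, clamped at zero when disjoint.
    width = max(0.0, min(fx1, bx1) - max(fx0, bx0))
    height = max(0.0, min(fy1, by1) - max(fy0, by0))
    feature_area = (fx1 - fx0) * (fy1 - fy0)
    return (width * height) / feature_area


# Hypothetical neighborhood boundary and census blocks: (rectangle, population).
neighborhood = (0.0, 0.0, 4.0, 4.0)
blocks = [
    ((0.0, 0.0, 2.0, 2.0), 1000),  # fully inside the boundary
    ((3.0, 0.0, 5.0, 2.0), 800),   # half inside the boundary
    ((6.0, 6.0, 8.0, 8.0), 500),   # entirely outside the boundary
]

# Apportion each block's population by the share of its area inside the boundary.
apportioned = [pop * overlap_fraction(rect, neighborhood) for rect, pop in blocks]
total_population = sum(apportioned)  # 1000 + 400 + 0
```

Apportioning works for counts such as population because half of a block plausibly holds about half of its people. A rate or ratio does not scale with area in this way, which is why counts and amounts are recommended for area and line layers.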