Available in big data analytics.
The Summarize Within tool calculates statistics in areas where an input layer is within or overlaps a boundary layer. The area being summarized can be an area layer or a hexagonal or square bin.
Workflow diagram
Examples
The following are example uses of the Summarize Within tool:
- A cable provider is starting a pilot program that provides low-cost internet access to low-income community college students. Summarize Within with square bins can be used to count the low-income students within bins of a defined size so the cable provider can determine an appropriate region for its pilot program.
- To complete routine maintenance projects efficiently, the city uses the Summarize Within tool to count the street lights and to sum the miles of bike lanes within each maintenance assessment district. It can then estimate the material and staff needed to complete the work in each district.
Usage notes
Keep the following in mind when working with the Summarize Within tool:
- The input layer to be summarized can be a point, line, or polygon layer.
- The output layer is always a polygon area or bin layer, and only the area or bin features where summarized features occur are returned.
- You can think of summarize within as taking two layers, the area features and the input summary features, and stacking them on top of each other. After stacking these layers, you view down through the stack and count the number of input summary features that fall within the areas. In addition to the number of features, you can also calculate simple statistics about the attributes of the input summary features, such as sum, mean, minimum, maximum, and so on.
- You can use the Summarize Within tool to calculate standard statistics and geographically weighted statistics. Standard statistics summarize the statistical values without weighting. Weighted statistics calculate values using the geographically weighted values of the proportion of lines within a polygon, or proportion of polygons within a polygon. Weighted statistics do not apply to points within polygons.
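The stacking idea described in the notes above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's implementation: each area is simplified to an axis-aligned rectangle, and the names `Area` and `summarize_points` and the sample data are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Area:
    """Hypothetical rectangular summary area."""
    name: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def contains(self, x: float, y: float) -> bool:
        return self.xmin <= x <= self.xmax and self.ymin <= y <= self.ymax


def summarize_points(areas, points):
    """Count points per area and compute simple statistics on their values.

    points is a list of (x, y, value) tuples. Areas that contain no points
    are omitted, mirroring the note that only areas where summarized
    features occur are returned.
    """
    results = {}
    for area in areas:
        values = [v for x, y, v in points if area.contains(x, y)]
        if not values:
            continue
        results[area.name] = {
            "count": len(values),
            "sum": sum(values),
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
        }
    return results


# Illustrative data only: area C contains no points and is not returned.
areas = [Area("A", 0, 0, 10, 10), Area("B", 10, 0, 20, 10), Area("C", 20, 0, 30, 10)]
points = [(2, 3, 100), (5, 5, 300), (12, 4, 50)]
summary = summarize_points(areas, points)
```

Real polygon areas would need a proper point-in-polygon test; the rectangle check is used here only to keep the sketch self-contained.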
How the Summarize Within tool works
The following sections describe how the Summarize Within tool works.
Equations
For summarized line and area features, weighted statistics incorporate Summary Area weights. None of the statistics for point features are weighted. The following table shows the equations used to calculate variance, the weighted mean, and the weighted standard deviation.
Statistic | Equation | Variables | Features |
---|---|---|---|
Variance | | | Points |
Weighted Mean | | Weights are calculated as the percentage of the feature within the summary area. | Lines and Areas |
Weighted Standard Deviation | | Weights are calculated as the percentage of the feature within the summary area. | Lines and Areas |
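The three statistics named in the table can be sketched in Python. The formulas below are common textbook definitions and are assumptions here, not confirmed as the tool's exact equations: sample variance with an N - 1 denominator, weighted mean as the sum of weight times value divided by the sum of weights, and a weighted standard deviation whose denominator is corrected by the number of nonzero weights.

```python
import math


def variance(values):
    """Sample variance (assumes an N - 1 denominator and at least two values)."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / (n - 1)


def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


def weighted_std(values, weights):
    """Weighted standard deviation (assumed form; needs two or more nonzero weights)."""
    wm = weighted_mean(values, weights)
    n_nonzero = sum(1 for w in weights if w != 0)
    numerator = sum(w * (x - wm) ** 2 for x, w in zip(values, weights))
    denominator = (n_nonzero - 1) / n_nonzero * sum(weights)
    return math.sqrt(numerator / denominator)


# Hypothetical values with equal weights (the proportion of each feature
# inside the summary area).
values = [600, 1000]
weights = [0.5, 0.5]
wm = weighted_mean(values, weights)
wsd = weighted_std(values, weights)
```

With equal weights, this form of the weighted standard deviation reduces to the ordinary sample standard deviation, which is a useful sanity check on the chosen denominator.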
Points
Point layers are summarized using only the point features that fall within the Summary Area. Weighted statistics cannot be applied when summarizing points.
The figure and table below explain the statistical calculations of a point Summarized Layer within hypothetical areas. The Population field was used to calculate the statistics (Count, Sum, Minimum, Maximum, Range, Mean, Standard Deviation, and Variance) for the layer.
Numeric statistic | Results District A |
---|---|
Count | Count of: |
Sum | |
Minimum | Minimum of: |
Maximum | Maximum of: |
Range | |
Mean | |
Variance | |
Standard Deviation | |
String statistic | Results District A |
---|---|
Count | |
Any | = Secondary School |
Note:
The count statistic (for strings and numeric fields) counts the number of nonnull values. For example, the count of [0, 1, 10, 5, null, 6] is 5. The count of [Primary, Primary, Secondary, null] is 3.
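The counting rule in the note can be checked with a short Python sketch. Here `None` stands in for a null value, and the helper name `count_non_null` is hypothetical.

```python
def count_non_null(values):
    """Count entries that are not None (None stands in for null)."""
    return sum(1 for v in values if v is not None)


numeric_count = count_non_null([0, 1, 10, 5, None, 6])
string_count = count_non_null(["Primary", "Primary", "Secondary", None])
print(numeric_count, string_count)  # 5 3
```

The check uses `is not None` rather than truthiness so that a value of 0 is still counted.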
A real-life scenario in which this analysis could be used is in determining the total number of students in each school district. Each point represents a school. The Type field gives the type of school (elementary, middle school, or secondary) and a student population field gives the number of students enrolled at each school. The calculations and results are given for District A in the table above. From the results, you can see that District A has 2,568 students. When running the Summarize Within tool, the results would also be given for District B.
Lines
For weighted statistics, line layers are summarized using only the proportions of line features that are within the Summary Area. Standard (non-weighted) statistics summarize any line intersecting the Summary Area. When summarizing lines using weighted statistics, use counts and amounts (rather than rates or indices) so proportional calculations make logical sense in your analysis.
The figure and table below explain the statistical calculations of a line Summarized Layer within a hypothetical Summary Area. The Volume field was used to calculate the statistics (Count, Sum, Minimum, Maximum, Range, Mean, Standard Deviation, and Variance) for the layer. The standard statistics are calculated using lines that intersect the boundary and the weighted statistics are calculated using the proportion of the lines that are within the Summary Area.
Numeric statistics | Standard statistics | Weighted statistics |
---|---|---|
Calculating Weights | Not applicable | Weights of the brown line (value = 600) and the blue line (value = 1000) |
Count | Count of: | Count of: |
Sum | | |
Minimum | Minimum of: | Minimum of: |
Maximum | Maximum of: | Maximum of: |
Range | | |
Mean | | |
Variance | | |
Standard Deviation | | |
A real-life scenario in which this analysis could be used is in determining the total volume of water in rivers within the boundaries of a state park. Each line represents a river that is partially located inside the park. From the results, you can see that there are 5 miles of rivers within the park and the total volume is 900 units.
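The difference between a standard sum and a weighted sum for lines can be sketched as follows. The line values 600 and 1000 come from the example above, but the proportions 0.5 and 0.6 are assumed for illustration only; they are one combination that yields a weighted total of 900, not necessarily the weights used in the original figure.

```python
# Each line: (name, volume value, proportion of its length inside the summary area).
# The proportions are assumed values for illustration only.
lines = [("brown", 600, 0.5), ("blue", 1000, 0.6)]

# Standard statistics use the full value of every line that intersects the area.
standard_sum = sum(value for _, value, _ in lines)

# Weighted statistics scale each value by the proportion inside the area.
weighted_sum = sum(value * weight for _, value, weight in lines)

print(standard_sum, round(weighted_sum, 6))
```

The standard sum counts the entire volume of both rivers, while the weighted sum counts only the share attributed to the portions inside the park.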
Areas
For weighted statistics, area layers are summarized using only the proportions of the area features that are within the Summary Area. Standard (non-weighted) statistics summarize any area that intersects the Summary Area. When summarizing areas using weighted statistics, use counts or amounts (rather than rates or indices) so proportional calculations make logical sense in your analysis.
The figure and table below explain the statistical calculations of an area Summarized Layer within a hypothetical Summary Area. The Population field was used to calculate the statistics (Count, Sum, Minimum, Maximum, Range, Mean, Standard Deviation, and Variance) for the layer. The standard statistics are calculated using areas that intersect the Summary Area, and the weighted statistics are calculated using a proportional weight based on the portion of each summarized area that is within the Summary Area.
Numeric statistics | Standard statistics: Results Neighborhood 1 | Weighted statistics: Results Neighborhood 1 |
---|---|---|
Calculating Weights | Weights of the yellow area (value = 3200), green area (value = 4700), pink area (value = 1000), blue area (value = 4500), and orange area (value = 3600) | |
Count | Count of: | Count of: |
Sum | | |
Minimum | Minimum of: | Minimum of: |
Maximum | Maximum of: | Maximum of: |
Range | | |
Mean | | |
Variance | | |
Standard Deviation | | |
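The way an area weight is derived can be sketched with simple rectangles. This is an illustration under stated assumptions, not the tool's implementation: all shapes are axis-aligned rectangles given as `(xmin, ymin, xmax, ymax)`, the helper names are hypothetical, and the population values are made up.

```python
def rect_area(rect):
    """Area of an (xmin, ymin, xmax, ymax) rectangle; zero if it is empty."""
    xmin, ymin, xmax, ymax = rect
    return max(0.0, xmax - xmin) * max(0.0, ymax - ymin)


def overlap_weight(feature, summary_area):
    """Proportion of the feature rectangle that lies inside the summary area."""
    intersection = (
        max(feature[0], summary_area[0]),
        max(feature[1], summary_area[1]),
        min(feature[2], summary_area[2]),
        min(feature[3], summary_area[3]),
    )
    return rect_area(intersection) / rect_area(feature)


neighborhood = (0, 0, 10, 10)

# (rectangle, population) pairs; all values are hypothetical.
tracts = [
    ((-5, 0, 5, 10), 2000),   # half inside the neighborhood
    ((5, 0, 10, 10), 1000),   # fully inside
    ((8, 8, 12, 12), 4000),   # one quarter inside
]

weights = [overlap_weight(rect, neighborhood) for rect, _ in tracts]
weighted_sum = sum(w * pop for w, (_, pop) in zip(weights, tracts))
standard_sum = sum(pop for _, pop in tracts)
```

In this sketch the standard sum uses the full population of every intersecting tract, while the weighted sum attributes only the overlapping share of each tract to the neighborhood.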
Parameters
The following are the parameters for the Summarize Within tool:
Parameter | Description | Data type |
---|---|---|
Input Layer | The point, line, or polygon features that will be summarized within area features. | Features |
Bin Type | The bin shape that will be used to create the regular bins. Options are Square and Hexagon. If a polygon source is connected to the join port of this tool, this parameter will no longer appear or be required. | String |
Bin Size | The distance interval that represents the bin size into which the input points will be aggregated. For square bins, the bin size represents the height of a square. This is the default. For hexagonal bins, the bin size represents the height between two parallel sides. If a polygon source is connected to the join port of this tool, this parameter will no longer appear or be required. | String |
Summarize Shapes | Specifies whether shape information will be summarized as part of the analysis (length of lines or area of polygons). If the input summary features are points, there is no shape information to summarize. Only the count of points within each area feature is added. | Boolean |
Shape Units | The unit in which to calculate shape summary attributes. If the input summary features are lines, specify a linear unit. If the input summary features are polygons, specify an areal unit. | String |
Summary Fields | The statistics that will be calculated for specified fields. Different statistics are available depending on whether the specified field is a string, numeric, or date field. | String |
Weighted Statistics | The geographically weighted statistics that will be calculated for specified fields. Weighted statistics calculate values using the geographically weighted values of the proportion of lines within a polygon, or proportion of polygons within a polygon. Weighted statistics do not apply to points within polygons. Different statistics are available depending on whether the specified field is a string, numeric, or date field. | String |
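How the Bin Type and Bin Size parameters shape the result can be sketched for square bins. This is a simplified illustration, not the tool's implementation: it assumes point coordinates and the bin size share the same projected linear unit, that the bin grid is anchored at the origin, and that only a count and a sum are needed. Hexagonal bins are not covered, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def square_bin(x, y, bin_size):
    """Return the (column, row) index of the square bin containing (x, y)."""
    return (math.floor(x / bin_size), math.floor(y / bin_size))


def summarize_into_bins(points, bin_size):
    """Aggregate (x, y, value) points into square bins.

    Only bins that receive at least one point appear in the result.
    """
    bins = defaultdict(lambda: {"count": 0, "sum": 0.0})
    for x, y, value in points:
        key = square_bin(x, y, bin_size)
        bins[key]["count"] += 1
        bins[key]["sum"] += value
    return dict(bins)


# Illustrative points; bin_size is the height of each square.
points = [(10, 10, 5.0), (90, 40, 2.0), (150, 20, 7.0), (-10, 5, 1.0)]
result = summarize_into_bins(points, bin_size=100)
```

Using `math.floor` rather than integer truncation keeps points with negative coordinates in the correct bin.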
Output layer
The output layer will contain the following fields in place of the original fields. If you configured summary fields, those fields will also be calculated for the output layer.
Field name | Description | Field type |
---|---|---|
COUNT | The number of features from the input layer that were summarized into this polygon bin. | Float64 |
sum_length_<units> | If the input layer is a polyline layer and the Summarize Shapes parameter is set to Yes, the output includes this field, which reports the total length of polyline features within each bin in the units specified by the Shape Units parameter. | Float64 |
sum_area_<units> | If the input layer is a polygon layer and the Summarize Shapes parameter is set to Yes, the output includes this field, which reports the total area of polygon features within each bin in the units specified by the Shape Units parameter. | Float64 |
Considerations and limitations
Lines and areas are summarized using proportions; therefore, it is best to summarize absolute data (such as population) rather than relative data (such as average income) when lines or areas are being summarized.
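A small numeric sketch shows why this matters. The proportion and field values below are hypothetical.

```python
# Assumed share of a polygon that falls inside the summary area.
proportion_inside = 0.25

population = 4000        # absolute data: a count of people
average_income = 52000   # relative data: an average, not a total

# Scaling a count by the proportion gives a plausible estimate of the
# people inside the summary area.
population_inside = population * proportion_inside

# Scaling an average the same way gives a number with no sensible meaning:
# the average income of the residents inside does not shrink with area.
scaled_income = average_income * proportion_inside

print(population_inside, scaled_income)
```

The first result can be read as an estimated population for the overlapping part of the polygon, while the second is not a valid income figure for any group of residents.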