Summary statistics—ArcGIS Online

Summary statistics are calculated by the Aggregate Points, Summarize Within, Summarize Nearby, Join Features, and Dissolve Boundaries tools.

Equations

For line and polygon features, the mean and standard deviation are calculated as a weighted mean and a weighted standard deviation. The weight is the length or area of the feature that falls within the boundary. None of the statistics for point features are weighted.

The following table shows the equations used to calculate standard deviation, weighted mean, and weighted standard deviation:


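Statistic	Equation	Variables	Features
Standard Deviation	$s = \sqrt{\dfrac{\sum_{i=1}^{N}(x_i - \bar{x})^2}{N - 1}}$	where: N = Number of observations; x_i = Observations; x̄ = Mean	Points
Weighted Mean	$\bar{x}_w = \dfrac{\sum_{i=1}^{N} w_i x_i}{\sum_{i=1}^{N} w_i}$	where: N = Number of observations; x_i = Observations; w_i = Weights	Lines and polygons
Weighted Standard Deviation	$s_w = \sqrt{\dfrac{\sum_{i=1}^{N} w_i (x_i - \bar{x}_w)^2}{\dfrac{N' - 1}{N'} \sum_{i=1}^{N} w_i}}$	where: N = Number of observations; x_i = Observations; w_i = Weights; x̄_w = Weighted mean; N' = Number of non-zero weights	Lines and polygons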
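
As a rough illustration, the three statistics in the table can be sketched in plain Python. The function names below are hypothetical and are not part of any ArcGIS API. The weighted standard deviation follows the form used in the worked examples later in this topic, with N' counted as the number of non-zero weights.

```python
import math


def standard_deviation(values):
    """Sample standard deviation (N - 1 in the denominator), as used for points."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))


def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


def weighted_standard_deviation(values, weights):
    """Weighted standard deviation with the (N' - 1) / N' correction."""
    mean_w = weighted_mean(values, weights)
    n_nonzero = sum(1 for w in weights if w != 0)  # N' in the table
    numerator = sum(w * (x - mean_w) ** 2 for x, w in zip(values, weights))
    denominator = (n_nonzero - 1) / n_nonzero * sum(weights)
    return math.sqrt(numerator / denominator)


# Illustrative values only (not taken from any tool output).
sample = [2.0, 4.0, 9.0]
equal_weights = [1.0, 1.0, 1.0]
plain_sd = standard_deviation(sample)
weighted_sd = weighted_standard_deviation(sample, equal_weights)
print(plain_sd, weighted_sd)
```

With equal weights, the weighted forms reduce to the unweighted ones, which is a quick sanity check on the weighted formulas.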

Note:

Null values are excluded from all statistical calculations. For example, the mean of 10, 5, and a null value is:

(10+5)/2=7.5
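
The null-handling rule can be sketched in Python as follows. This is a minimal illustration, assuming that a null field value is represented by `None`. The helper name `mean_ignoring_nulls` is hypothetical, and returning `None` when every value is null is an assumption rather than documented tool behavior.

```python
def mean_ignoring_nulls(values):
    """Average the non-null entries; None stands in for a null field value."""
    present = [v for v in values if v is not None]
    if not present:
        return None  # assumption: no usable values yields no result
    return sum(present) / len(present)


result = mean_ignoring_nulls([10, 5, None])
print(result)  # 7.5
```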

Points

Point layers are summarized using only the point features within the boundary areas.

A real-life scenario in which points could be summarized is in determining the total number of students in each school district. Each point represents a school. The Type field gives the type of school (primary school, middle school, or secondary school) and a population field gives the number of students enrolled at each school.

The figure below shows a hypothetical point and boundary layer, and the table summarizes the attributes for the point layer.


ObjectID	District	Type	Population
1	A	Primary school	280
2	A	Primary school	408
3	A	Primary school	356
4	A	Middle school	361
5	A	Middle school	450
6	A	Secondary school	713
7	B	Primary school	370
8	B	Primary school	422
9	B	Primary school	495
10	B	Middle school	607
11	B	Middle school	574
12	B	Secondary school	932

The calculations and results for District A are given in the table below. From the results, you can see that District A has 2,568 students. When running a tool, the results would also be given for District B.


Statistic	Result District A
Sum	`280+408+356+361+450+713 = 2568`
Minimum	Minimum of: `[280, 408, 356, 361, 450, 713] = 280`
Maximum	Maximum of: `[280, 408, 356, 361, 450, 713] = 713`
Mean	`2568/6 = 428`
Standard Deviation	`√(((280-428)²+(408-428)²+(356-428)²+(361-428)²+(450-428)²+(713-428)²)/(6-1)) = 150.79`
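
The District A results can be reproduced with a short Python sketch that groups the school records by district and applies the unweighted statistics. The tuple layout and variable names are illustrative assumptions. `statistics.stdev` from the standard library uses the same N - 1 denominator as the calculation above.

```python
import statistics
from collections import defaultdict

# (ObjectID, District, Type, Population) rows copied from the attribute table.
schools = [
    (1, "A", "Primary school", 280),
    (2, "A", "Primary school", 408),
    (3, "A", "Primary school", 356),
    (4, "A", "Middle school", 361),
    (5, "A", "Middle school", 450),
    (6, "A", "Secondary school", 713),
    (7, "B", "Primary school", 370),
    (8, "B", "Primary school", 422),
    (9, "B", "Primary school", 495),
    (10, "B", "Middle school", 607),
    (11, "B", "Middle school", 574),
    (12, "B", "Secondary school", 932),
]

# Collect the population values for each district.
by_district = defaultdict(list)
for _object_id, district, _school_type, population in schools:
    by_district[district].append(population)

# Points are not weighted, so the plain statistics apply.
summary = {}
for district, populations in by_district.items():
    summary[district] = {
        "sum": sum(populations),
        "min": min(populations),
        "max": max(populations),
        "mean": statistics.mean(populations),
        "std": statistics.stdev(populations),  # sample form, N - 1
    }

print(summary["A"])
```

The same loop also produces the District B statistics, which the table omits.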

Lines

Line layers are summarized using only the proportions of the line features that are within the boundary areas.

Tip:

When summarizing lines, use fields with counts or amounts so proportional calculations make logical sense in your analysis. For example, use population rather than population density.

A real-life scenario in which you can use this analysis is determining the total volume of water in rivers within a specified boundary. Each line represents a river that is partially located inside the boundary.

The figure below shows a hypothetical line and boundary layer, and the table summarizes the attributes for the line layer.


River	Length (miles)	Volume (gallons)
Yellow	3	6,000
Blue	8	10,000

The calculations for volume are given in the table below. From the results, you can see that the total volume is 9,000 gallons.

Note:

The calculations use the proportions of the lines within the boundary area. For example, the yellow line has a total volume of 6,000 gallons with two of its three total miles within the boundary. Therefore, the calculations are performed using 4,000 gallons as the volume for the yellow line:

6000*(2/3)=4000
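
The proportional step can be written as a one-line helper. This is a sketch only: the function name `apportion` and its parameters are hypothetical, and it assumes the attribute is a count or amount that scales with length.

```python
def apportion(value, length_inside, total_length):
    """Scale a count or amount by the share of the feature inside the boundary."""
    return value * length_inside / total_length


# Yellow river: 6,000 gallons, 2 of its 3 miles inside the boundary.
yellow_volume_inside = apportion(6000, 2, 3)
print(yellow_volume_inside)  # 4000.0
```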


Statistic	Result
Sum	`4000+5000 = 9000`
Minimum	Minimum of: `[4000, 5000] = 4000`
Maximum	Maximum of: `[4000, 5000] = 5000`
Mean	`((2*4000)+(3*5000))/(2+3) = (8000+15000)/5 = 4600`
Standard Deviation	`√((2(4000-4600)²+3(5000-4600)²)/(((2-1)/2)(2+3))) = 692.8`
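
The line results can be checked with a few lines of Python. The sketch below takes the apportioned volumes (4,000 and 5,000) and the weights 2 and 3 from the denominator of the Mean row as given. The variable names are illustrative.

```python
import math

# Apportioned volumes and their weights, as used in the table above.
volumes = [4000, 5000]
weights = [2, 3]

total = sum(volumes)

# Weighted mean: sum of weight * value over the sum of the weights.
mean_w = sum(w * v for v, w in zip(volumes, weights)) / sum(weights)

# Weighted standard deviation with the (N' - 1) / N' correction,
# where N' is the number of non-zero weights.
n_nonzero = sum(1 for w in weights if w != 0)
variance_w = sum(w * (v - mean_w) ** 2 for v, w in zip(volumes, weights)) / (
    (n_nonzero - 1) / n_nonzero * sum(weights)
)
std_w = math.sqrt(variance_w)

print(total, min(volumes), max(volumes), mean_w, round(std_w, 1))
```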

Polygons

Polygon layers are summarized using only the proportions of the polygon features that are within the boundary areas.

Tip:

When summarizing polygons, use fields with counts or amounts so proportional calculations make logical sense in your analysis. For example, use population rather than population density.

A real-life scenario in which you can use this analysis is determining the population in a city neighborhood. The blue outline represents the boundary of the neighborhood and the smaller polygons represent census blocks.

The figure below shows a hypothetical polygon and boundary layer, and the table summarizes the attributes for the polygon layer.


Census block	Area (miles²)	Population
Yellow	6	3,200
Green	6	4,700
Pink	2.5	1,000
Blue	8	4,500
Orange	4	3,600

The calculations for population are given in the table below. From the results, you can see that there are 10,841 people in the neighborhood and an average (mean) of approximately 2,666 people per census block.

Note:

The calculations use the proportions of the polygons within the boundary area. For example, the yellow polygon has a total population of 3,200 with four of its six total square miles within the boundary. Therefore, the calculations are performed using 2,133 as the population for the yellow polygon:

3200*(4/6)=2133


Statistic	Result
Sum	`2133+3133+400+3375+1800 = 10841`
Minimum	Minimum of: `[2133, 3133, 400, 3375, 1800] = 400`
Maximum	Maximum of: `[2133, 3133, 400, 3375, 1800] = 3375`
Mean	`((4*2133)+(4*3133)+(1*400)+(6*3375)+(2*1800))/(4+4+1+6+2) = 2665.53`
Standard Deviation	`√((4(2133-2665.53)²+4(3133-2665.53)²+1(400-2665.53)²+6(3375-2665.53)²+2(1800-2665.53)²)/(((5-1)/5)(4+4+1+6+2))) = 925.91`
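
The polygon example can be reproduced end to end with the sketch below, which first apportions each population by area and then applies the weighted statistics. The areas inside the boundary (4, 4, 1, 6, and 2 square miles) are inferred from the weights in the Mean row; only the yellow block's four square miles are stated explicitly in the text. Rounding to whole people mirrors the values shown in the table and is an assumption about how the example was prepared.

```python
import math

# (block, total area, area inside boundary, population).
# Inside areas are inferred from the weights used in the Mean row.
blocks = [
    ("Yellow", 6, 4, 3200),
    ("Green", 6, 4, 4700),
    ("Pink", 2.5, 1, 1000),
    ("Blue", 8, 6, 4500),
    ("Orange", 4, 2, 3600),
]

weights = [inside for _, _, inside, _ in blocks]

# Apportion each population by its area share, rounded to whole people.
people = [round(pop * inside / total) for _, total, inside, pop in blocks]

total_people = sum(people)
mean_w = sum(w * p for p, w in zip(people, weights)) / sum(weights)

n_nonzero = sum(1 for w in weights if w != 0)  # N'
std_w = math.sqrt(
    sum(w * (p - mean_w) ** 2 for p, w in zip(people, weights))
    / ((n_nonzero - 1) / n_nonzero * sum(weights))
)

print(people, total_people, round(mean_w, 2), round(std_w, 2))
```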
