Summary statistics

Summary statistics are calculated by the Aggregate Points, Summarize Within, Summarize Nearby, Join Features, and Dissolve Boundaries tools.

Equations

For line and polygon features, the mean and standard deviation are calculated as a weighted mean and a weighted standard deviation. None of the statistics for point features are weighted. The weight is the length (for lines) or area (for polygons) of the part of the feature that falls within the boundary.

The equations used to calculate the standard deviation, weighted mean, and weighted standard deviation are given below, along with the feature types each one applies to.

Standard deviation (points)

$$s = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}}$$

where:

  • N = Number of observations
  • xi = Observations
  • x̄ = Mean

Weighted mean (lines and polygons)

$$\bar{x}_w = \frac{\sum_{i=1}^{N} w_i x_i}{\sum_{i=1}^{N} w_i}$$

where:

  • N = Number of observations
  • xi = Observations
  • wi = Weights

Weighted standard deviation (lines and polygons)

$$s_w = \sqrt{\frac{\sum_{i=1}^{N} w_i (x_i - \bar{x}_w)^2}{\frac{N' - 1}{N'} \sum_{i=1}^{N} w_i}}$$

where:

  • N = Number of observations
  • xi = Observations
  • wi = Weights
  • x̄w = Weighted mean
  • N' = Number of non-zero weights
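The equations above can be sketched in Python as follows. This is a minimal illustration, not the tools' own implementation; the function names are hypothetical, and the sample values come from the District A and river examples later in this topic.

```python
import math
from typing import Sequence


def standard_deviation(x: Sequence[float]) -> float:
    """Sample standard deviation (N - 1 in the denominator), used for points."""
    n = len(x)
    mean = sum(x) / n
    return math.sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))


def weighted_mean(x: Sequence[float], w: Sequence[float]) -> float:
    """Weighted mean sum(w_i * x_i) / sum(w_i), used for lines and polygons."""
    return sum(wi * xi for xi, wi in zip(x, w)) / sum(w)


def weighted_standard_deviation(x: Sequence[float], w: Sequence[float]) -> float:
    """Weighted standard deviation with the (N' - 1) / N' correction,
    where N' counts only the non-zero weights."""
    mean_w = weighted_mean(x, w)
    n_prime = sum(1 for wi in w if wi != 0)
    numerator = sum(wi * (xi - mean_w) ** 2 for xi, wi in zip(x, w))
    denominator = (n_prime - 1) / n_prime * sum(w)
    return math.sqrt(numerator / denominator)


# Point example: school populations in District A (unweighted)
district_a = [280, 408, 356, 361, 450, 713]
print(round(standard_deviation(district_a), 2))  # 150.79

# Line example: apportioned volumes weighted by miles inside the boundary
line_values, line_weights = [4000, 5000], [2, 3]
print(weighted_mean(line_values, line_weights))  # 4600.0
print(round(weighted_standard_deviation(line_values, line_weights), 1))  # 692.8
```

Because N' counts only non-zero weights, a feature whose weight is 0 (no length or area inside the boundary) changes neither the weighted mean nor the weighted standard deviation.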

Note:

Null values are excluded from all statistical calculations. For example, the mean of 10, 5, and a null value is:

(10+5)/2=7.5
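A minimal Python sketch of this rule, assuming null field values are represented as `None`; `mean_excluding_nulls` is a hypothetical helper name. The null is dropped before counting, so the divisor is 2 rather than 3.

```python
from statistics import mean
from typing import Optional, Sequence


def mean_excluding_nulls(values: Sequence[Optional[float]]) -> float:
    """Mean of the non-null values only; None stands in for a null field value."""
    present = [v for v in values if v is not None]
    return mean(present)


print(mean_excluding_nulls([10, 5, None]))  # 7.5
```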

Points

Point layers are summarized using only the point features within the boundary areas.

One real-life scenario for summarizing points is determining the total number of students in each school district. Each point represents a school. The Type field gives the type of school (primary school, middle school, or secondary school), and the Population field gives the number of students enrolled at each school.

The figure below shows a hypothetical point and boundary layer, and the table summarizes the attributes for the point layer.

Summarizing a point layer

ObjectID | District | Type | Population
1 | A | Primary school | 280
2 | A | Primary school | 408
3 | A | Primary school | 356
4 | A | Middle school | 361
5 | A | Middle school | 450
6 | A | Secondary school | 713
7 | B | Primary school | 370
8 | B | Primary school | 422
9 | B | Primary school | 495
10 | B | Middle school | 607
11 | B | Middle school | 574
12 | B | Secondary school | 932

The calculations and results for District A are given in the table below. From the results, you can see that District A has 2,568 students. When you run a tool, results are also calculated for District B.

Statistic | Result for District A
Sum | 280+408+356+361+450+713 = 2568
Minimum | Minimum of [280, 408, 356, 361, 450, 713] = 280
Maximum | Maximum of [280, 408, 356, 361, 450, 713] = 713
Mean | 2568/6 = 428
Standard deviation | √(((280-428)²+(408-428)²+(356-428)²+(361-428)²+(450-428)²+(713-428)²)/(6-1)) = 150.79
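The point summary above can be reproduced with a short Python sketch. It assumes the District column already records which boundary each school falls in, standing in for the spatial overlay the tools perform; `summarize_by_district` is a hypothetical helper. `statistics.stdev` from the Python standard library computes the sample (N - 1) standard deviation used for points.

```python
import statistics
from collections import defaultdict

# (ObjectID, District, Type, Population) rows from the point attribute table
schools = [
    (1, "A", "Primary school", 280),
    (2, "A", "Primary school", 408),
    (3, "A", "Primary school", 356),
    (4, "A", "Middle school", 361),
    (5, "A", "Middle school", 450),
    (6, "A", "Secondary school", 713),
    (7, "B", "Primary school", 370),
    (8, "B", "Primary school", 422),
    (9, "B", "Primary school", 495),
    (10, "B", "Middle school", 607),
    (11, "B", "Middle school", 574),
    (12, "B", "Secondary school", 932),
]


def summarize_by_district(rows):
    """Group point populations by district and compute unweighted statistics."""
    groups = defaultdict(list)
    for _object_id, district, _school_type, population in rows:
        groups[district].append(population)
    summary = {}
    for district, pops in sorted(groups.items()):
        summary[district] = {
            "sum": sum(pops),
            "min": min(pops),
            "max": max(pops),
            "mean": statistics.mean(pops),
            # Sample standard deviation: N - 1 in the denominator
            "stdev": statistics.stdev(pops),
        }
    return summary


summary = summarize_by_district(schools)
a = summary["A"]
print(f"District A: sum={a['sum']}, mean={a['mean']:.2f}, stdev={a['stdev']:.2f}")
# District A: sum=2568, mean=428.00, stdev=150.79
```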

Lines

Line layers are summarized using only the proportions of the line features that are within the boundary areas.

Tip:

When summarizing lines, use fields with counts or amounts so proportional calculations make logical sense in your analysis. For example, use population rather than population density.

A real-life scenario in which you can use this analysis is determining the total volume of water in rivers within a specified boundary. Each line represents a river that is partially located inside the boundary.

The figure below shows a hypothetical line and boundary layer, and the table summarizes the attributes for the line layer.

Summarizing a line layer

River | Length (miles) | Volume (gallons)
Yellow | 3 | 6,000
Blue | 8 | 10,000

The calculations for volume are given in the table below. From the results, you can see that the total volume is 9,000 gallons.

Note:

The calculations use the proportions of the lines within the boundary area. For example, the yellow line has a total volume of 6,000 gallons, with two of its three total miles within the boundary. Therefore, the calculations are performed using 4,000 gallons as the volume for the yellow line:

6000*(2/3)=4000

Statistic | Result
Sum | 4000+5000 = 9000
Minimum | Minimum of [4000, 5000] = 4000
Maximum | Maximum of [4000, 5000] = 5000
Mean | ((2*4000)+(3*5000))/(2+3) = (8000+15000)/5 = 4600
Standard deviation | √((2(4000-4600)²+3(5000-4600)²)/(((2-1)/2)(2+3))) = 692.8

Polygons

Polygon layers are summarized using only the proportions of the polygon features that are within the boundary areas.

Tip:

When summarizing polygons, use fields with counts or amounts so proportional calculations make logical sense in your analysis. For example, use population rather than population density.

A real-life scenario in which you can use this analysis is determining the population in a city neighborhood. The blue outline represents the boundary of the neighborhood and the smaller polygons represent census blocks.

The figure below shows a hypothetical polygon and boundary layer, and the table summarizes the attributes for the polygon layer.

Summarizing a polygon layer

Census block | Area (square miles) | Population
Yellow | 6 | 3,200
Green | 6 | 4,700
Pink | 2.5 | 1,000
Blue | 8 | 4,500
Orange | 4 | 3,600

The calculations for population are given in the table below. From the results, you can see that there are 10,841 people in the neighborhood and an average (mean) of approximately 2,666 people per census block.

Note:

The calculations use the proportions of the polygons within the boundary area. For example, the yellow polygon has a total population of 3,200, with four of its six total square miles within the boundary. Therefore, the calculations are performed using 2,133 as the population for the yellow polygon:

3200*(4/6)=2133

Statistic | Result
Sum | 2133+3133+400+3375+1800 = 10841
Minimum | Minimum of [2133, 3133, 400, 3375, 1800] = 400
Maximum | Maximum of [2133, 3133, 400, 3375, 1800] = 3375
Mean | ((4*2133)+(4*3133)+(1*400)+(6*3375)+(2*1800))/(4+4+1+6+2) = 2665.53
Standard deviation | √((4(2133-2665.53)²+4(3133-2665.53)²+1(400-2665.53)²+6(3375-2665.53)²+2(1800-2665.53)²)/(((5-1)/5)(4+4+1+6+2))) = 925.91
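The polygon calculations above can be sketched end to end in Python: apportion each attribute by the share of area inside the boundary, then compute the statistics with those areas as weights. The area of each census block inside the boundary is taken from the weights in the mean calculation (4, 4, 1, 6, and 2 square miles). `summarize_polygons` is a hypothetical helper, and the apportioned values are rounded to whole numbers only to match the worked example; an actual tool may not round.

```python
import math

# (census block, total area in sq mi, area inside the boundary in sq mi, population)
blocks = [
    ("Yellow", 6, 4, 3200),
    ("Green", 6, 4, 4700),
    ("Pink", 2.5, 1, 1000),
    ("Blue", 8, 6, 4500),
    ("Orange", 4, 2, 3600),
]


def summarize_polygons(rows):
    # Apportion each population by the fraction of its area inside the boundary.
    # Assumption: round to whole people to mirror the worked example.
    values = [round(pop * inside / total) for _, total, inside, pop in rows]
    weights = [inside for _, _, inside, _ in rows]

    total_w = sum(weights)
    mean_w = sum(w * v for v, w in zip(values, weights)) / total_w

    # Weighted variance with the (N' - 1) / N' correction; N' = non-zero weights
    n_prime = sum(1 for w in weights if w != 0)
    var_w = sum(w * (v - mean_w) ** 2 for v, w in zip(values, weights)) / (
        (n_prime - 1) / n_prime * total_w
    )
    return {
        "values": values,
        "sum": sum(values),
        "min": min(values),
        "max": max(values),
        "mean": mean_w,
        "stdev": math.sqrt(var_w),
    }


s = summarize_polygons(blocks)
print(s["sum"], f"{s['mean']:.2f}", f"{s['stdev']:.2f}")  # 10841 2665.53 925.91
```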
