
Calculation estimates for user-created areas

Community Analyst employs a GeoEnrichment service, which uses the concept of a study area to define the location of the point or area that you want to enrich with additional information. If one or more points are input as a study area, the service creates a one-mile ring buffer around the point or points to collect and append enrichment data. You can optionally change the ring buffer size or create drive-time service areas around a point.

The GeoEnrichment service uses a sophisticated geographic retrieval methodology to aggregate data for rings and other polygons. A geographic retrieval methodology determines how data is gathered and summarized or aggregated for input features. For standard geographic units, such as states, provinces, counties, or postal codes, the link between a designated area and its attribute data is a simple one-to-one relationship. For example, if an input study trade area contains a selection of ZIP Codes, the data retrieval is a simple process of gathering the data for those areas.

Data Allocation Method

The Data Allocation method allocates block group data to custom areas by examining where the population is located within the block group and determining how much of that population overlaps a custom area. This method is used in the United States, and similarly in Canada. The population data reported for census blocks, a more granular level of geography than block groups, is used to determine where the population is distributed within a block group. If the geographic center of a block falls within the custom area, the entire population for the block is used to weight the block group data. The geographic distribution of the population at the census block level determines the proportion of census block group data that is allocated to user-specified areas, as shown in the example.

Calculation estimates

Note:

Depending on the data, households, housing units, or businesses at the block group level are used as weights. Employing block centroids is superior because it accounts for the possibility that the population may not be evenly distributed geographically throughout a block group.
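
The centroid-based weighting described above can be illustrated with a minimal sketch. It is not the service's implementation: the Block class, the point_in_polygon helper, and the allocate_block_group_value function are hypothetical names, coordinates are treated as planar, and block population is used as the weight (the same logic would apply to households, housing units, or businesses).

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical census block: centroid coordinates plus a weight."""
    x: float
    y: float
    population: int


def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def allocate_block_group_value(blocks, block_group_value, custom_area):
    """Allocate a block group value to a custom area.

    A block counts in full when its centroid falls inside the custom area;
    the share of weighted blocks inside scales the block group value.
    """
    total = sum(b.population for b in blocks)
    if total == 0:
        return 0.0
    inside = sum(
        b.population for b in blocks if point_in_polygon(b.x, b.y, custom_area)
    )
    return block_group_value * inside / total


# Illustrative data only: four blocks in one block group, and a custom
# area that covers the western half of the block group.
blocks = [Block(1, 1, 100), Block(3, 1, 300), Block(1, 3, 50), Block(3, 3, 50)]
custom_area = [(0, 0), (2, 0), (2, 4), (0, 4)]
print(allocate_block_group_value(blocks, 1000, custom_area))  # prints 300.0
```

In this example, 150 of the 500 people live in blocks whose centroids fall inside the custom area, so 30 percent of the block group value is allocated, even though the custom area covers half of the block group's extent.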

How data apportionment works

Community Analyst uses a data apportionment algorithm to redistribute demographic, business, economic, and landscape variables to input polygon features. The algorithm analyzes each polygon to be enriched relative to a point dataset and a detailed dataset of reporting unit polygons that contain attributes for the selected variables. Based on how each polygon being enriched overlays these datasets, the algorithm determines the appropriate amount of each variable to assign.

Depending on which country the enrichment polygon is located in, the granular point dataset represents one of the following:

  • Census Block Points: U.S. and Canada only. These points are initially produced as centroids from the most detailed census tabulation areas in these countries: census blocks in the U.S. and dissemination areas in Canada. In some cases, Esri has moved these points to be located within residential areas, rather than obviously industrial or other non-residential areas. Each point contains attributes for the count of people and households living in the corresponding tabulation area.
  • Settlement Points: For most other countries, Esri produces settlement points based on a settlement likelihood model that uses Landsat 8 imagery and road intersections. Road intersections particularly help in areas where dense forest canopy obscures dwellings. Settlement points are initially produced as a dasymetric raster surface, meaning that places where people cannot live, or do not live, have been removed. This raster surface is produced at a resolution of 75 meters, which is roughly the size of a city block. The model assigns each cell or point a settlement likelihood score, representing the likelihood of people living there.
  • Address Based Settlement Points: Switzerland and the Netherlands only. Some countries track and make available the points representing residential addresses of their citizens. Esri aggregates the count of these address points into a 75-meter resolution raster and converts that to a point dataset like settlement points.

The GeoEnrichment service uses the most detailed geographies with the most recent census data, or authoritative estimates, available for commercial use from each country. For instance, in 2018, South Africa's dataset had 85,483 features, Hungary's had 3,177, and Japan's had 217,201.

For most countries, data is updated every two years; a few countries are updated annually because their data is readily available. Esri spreads the updates throughout the year on a quarterly basis. The data for each country is the most recently available estimate. Generally, the data Esri releases reflects the demographic and economic conditions nine months prior to the release date.

Apportionment methodology

The illustration below shows how the purple polygon to be enriched relates to the dark blue settlement points and detailed statistical polygons with gray outlines that will support enrichment. Here is how the process works to enrich the purple ring with total population:

  1. Select the statistical polygons that are completely inside the ring polygon. These polygons are shown in white. Compute the sum of the total population variable for these polygons.
  2. Select the statistical polygons that partially intersect the ring polygon. These are shown in light green. For each of these polygons do the following:
    1. Select all the dark blue settlement points that are inside. Using the total population variable from the statistical polygon and the sum of settlement likelihood scores, determine the ratio of people per unit of settlement score.
    2. For only the points that are inside the purple ring, compute the sum of settlement likelihood, and from that derive the number of people represented by those points.

      Settlement points

      The dark blue settlement points represent two types of information. First, a regularly spaced, 75-meter grid of points that is produced as described above. Second, because some reporting units are small enough to fall between the 75-meter grid of points, the centroids of these units are added to guarantee these areas are not omitted.
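
The steps above can be sketched in a few lines of code. This is an illustrative sketch under stated assumptions, not the service's implementation: the apportion_population function, the dictionary keys, and the ring_contains callback are hypothetical, and the selection of fully contained and partially intersecting polygons is assumed to have been done already.

```python
def apportion_population(full_polygons, partial_polygons, ring_contains):
    """Estimate total population for a ring.

    full_polygons: statistical polygons completely inside the ring,
        each a dict with a "population" value.
    partial_polygons: statistical polygons that partially intersect the ring,
        each a dict with "population" and "points"; every point is a dict
        with "x", "y", and a settlement likelihood "score".
    ring_contains: callable (x, y) -> bool, True when a point is in the ring.
    """
    # Step 1: polygons fully inside the ring contribute their whole population.
    total = sum(p["population"] for p in full_polygons)

    # Step 2: polygons that partially intersect contribute a share.
    for poly in partial_polygons:
        score_all = sum(pt["score"] for pt in poly["points"])
        if score_all == 0:
            continue  # no settlement evidence to apportion by
        # Step 2.1: people per unit of settlement score in this polygon.
        people_per_score = poly["population"] / score_all
        # Step 2.2: score of the points inside the ring, converted to people.
        score_inside = sum(
            pt["score"] for pt in poly["points"] if ring_contains(pt["x"], pt["y"])
        )
        total += people_per_score * score_inside
    return total


# Illustrative data only: a ring of radius 5 centered on the origin.
def in_ring(x, y):
    return x * x + y * y <= 25


full = [{"population": 1200}, {"population": 800}]
partial = [
    {
        "population": 1000,
        "points": [
            {"x": 4.0, "y": 0.0, "score": 0.5},
            {"x": 4.5, "y": 1.0, "score": 0.25},
            {"x": 6.0, "y": 0.0, "score": 0.25},
        ],
    }
]
print(apportion_population(full, partial, in_ring))  # prints 2750.0
```

Here the two fully contained polygons contribute 2,000 people, and the partially intersecting polygon contributes 750 of its 1,000 people because points carrying 0.75 of its total settlement score of 1.0 fall inside the ring.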

Variations in apportionment method

The above description applies to most countries. In the U.S. and Canada, however, the process is simpler because the points already have an attribute with the population living there. Thus, the sum of the population attribute for the points inside the enrichment polygon is all that is needed to determine the total population. The values for other variables are determined based on population or on summaries of precalculated means or rates.

The above information describes the default apportionment method, which is called BlockApportionment in the ArcGIS REST API for the GeoEnrichment service. If the service detects a significantly large polygon, a faster, less computationally intensive method is used. This method is called CentroidsInPolygon. The metadata for the results of an enrichment operation supplies the name of the method used.

The CentroidsInPolygon method uses coarser geographies and their centroids as the basis for apportionment. For example, in the U.S., the U.S. Census Bureau's census tract boundaries are used instead of its block group polygons, and tract centroids are used instead of block points as the basis for apportionment. As the size of polygons increases, the method uses progressively coarser polygon geographies and their centroids. The thresholds are based on the diameters of buffers:

  • In the United States, these diameters and polygon datasets are used:
    • 100 to 150 miles: Census Block Groups and Block Group centroids.
    • 151 to 200 miles: Census Tracts and Tract centroids.
    • 201 to 400 miles: Counties and County centroids.
  • Outside the United States, the polygon datasets vary by country, depending on availability:
    • 150 to 250 km: the second most detailed geography available and its centroids.
    • 250 to 400 km: the third most detailed geography available and its centroids.

Thus, it is important to segregate large polygons from smaller polygons when managing data as inputs for the Enrich Layer tools.
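
One way to segregate inputs before enrichment is to group polygons by the tier their diameter falls into. The sketch below encodes the U.S. thresholds listed above; it is illustrative only. The function names are hypothetical, the assumption that polygons under 100 miles use the default BlockApportionment method is inferred rather than stated, and values that fall between listed ranges (for example, 150.5 miles) are assigned to the next coarser tier as a simplification.

```python
def us_aggregation_level(diameter_miles):
    """Return (method, geography) for a U.S. polygon of the given diameter.

    Returns None when the diameter exceeds the largest listed threshold.
    """
    if diameter_miles < 100:
        # Assumption: below the first listed threshold the default applies.
        return ("BlockApportionment", "census block points")
    if diameter_miles <= 150:
        return ("CentroidsInPolygon", "block groups")
    if diameter_miles <= 200:
        return ("CentroidsInPolygon", "census tracts")
    if diameter_miles <= 400:
        return ("CentroidsInPolygon", "counties")
    return None  # larger than any listed tier


def group_by_level(diameters_miles):
    """Group polygon diameters by the aggregation level they would receive."""
    groups = {}
    for d in diameters_miles:
        groups.setdefault(us_aggregation_level(d), []).append(d)
    return groups


# Illustrative diameters only.
for level, members in group_by_level([50, 120, 180, 300, 500]).items():
    print(level, members)
```

Submitting each group as a separate batch keeps small polygons from being mixed with polygons that would be handled by a coarser tier.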

While the differences in results between the methods are usually quite small, sometimes the coarser geographies produce less than optimal results. The GeoEnrichment service has a detailedAggregationMethod property, which can be set to override the default behavior described above. However, when the detailedAggregationMethod is set, only a single large polygon at a time may be enriched.

Note:

Extremely large polygons, exceeding the sizes described above for the CentroidsInPolygon method, are not processed, and the service returns a warning.

Similarly, tiny polygons (that is, polygons smaller than a city block in an urban area, or smaller than a square kilometer or mile in a rural area) may not produce any results because they do not intersect or contain any settlement or census block points.

Additional considerations

One of the most often asked questions about GeoEnrichment is: how reliable are the results? The answer is that it varies depending on the data available for each country. It also varies within most countries depending on whether the area to be enriched is heavily or sparsely populated.

Each country has two reliability scores. The potential range is 1.0 (best) to 5.0 (worst), though no countries are rated with these extreme scores. The most important score that affects the reliability of data apportionment is the ratio of the population polygon's area to the number of people estimated to live there. When the polygon is large and the number of people is small, the probability that the settlement points intersect where people live goes down. Most countries have a mix of circumstances. Saudi Arabia is a good example: the cities have many polygons representing small areas, but the vast tracts of desert are represented with only a few polygons.

A second reliability score represents the overall reliability. It combines the ratio of the polygon area to the population, a rating of the reliability of the census data for that country, and the complexity of the footprint of settlement. The census reliability accounts for the age of the last official census, the collection method, and the completeness of that census and of the other estimates and surveys used to derive the current estimate. The complexity of the footprint is important because the creation of the settlement points starts with a raster model, which is susceptible to lower settlement likelihood values at the edges of settlement due to the effects of resampling.