ArcGIS Business Analyst Web App uses the GeoEnrichment service, which relies on the concept of a study area to define the location of the point or area that you want to enrich with additional information. If one or more points are input as a study area, the service creates a one-mile ring buffer around the point or points to collect and append enrichment data. You can optionally change the ring buffer size or create drive-time service areas around a point.
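To make the default one-mile ring concrete, here is a minimal sketch that approximates the ring around a single point as a polygon. It is not how the GeoEnrichment service builds its buffers. It assumes the point is in a projected coordinate system measured in meters, so a plain Euclidean circle is a reasonable approximation, and `ring_buffer` is a hypothetical helper name.

```python
import math

METERS_PER_MILE = 1609.344  # international mile


def ring_buffer(x, y, radius_miles=1.0, segments=64):
    """Approximate a circular ring buffer around (x, y) as a list of polygon vertices.

    Assumes (x, y) are projected coordinates in meters (hypothetical helper,
    not part of any Esri API).
    """
    r = radius_miles * METERS_PER_MILE
    return [
        (x + r * math.cos(2 * math.pi * i / segments),
         y + r * math.sin(2 * math.pi * i / segments))
        for i in range(segments)
    ]


# Default one-mile ring around an arbitrary projected point.
ring = ring_buffer(500_000.0, 4_200_000.0)
```

Changing `radius_miles` corresponds to changing the ring buffer size. Drive-time areas cannot be built this way because they depend on a street network.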
The GeoEnrichment service uses a sophisticated geographic retrieval methodology to aggregate data for rings and other polygons. A geographic retrieval methodology determines how data is gathered and summarized, or aggregated, for input features. For standard geographic units, such as states, provinces, counties, or postal codes, the link between a designated area and its attribute data is a simple one-to-one relationship. For example, if an input study area contains a selection of ZIP Codes, data retrieval is simply a matter of gathering the data for those areas.
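For standard geographic units, the one-to-one relationship means no apportionment is needed: the study area's value is just the sum of its member units' values. The sketch below shows this with a hypothetical lookup table; the ZIP Codes and population values are made up purely for illustration.

```python
# Hypothetical attribute table: ZIP Code -> total population (made-up values).
zip_population = {"00001": 31000, "00002": 40000, "00003": 2500}


def enrich_standard_units(selected_zips, attributes):
    """One-to-one retrieval: sum the attribute values of the selected units."""
    return sum(attributes[z] for z in selected_zips)


total = enrich_standard_units(["00001", "00002"], zip_population)
```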
Data Allocation Method
The Data Allocation method allocates block group data to custom areas by examining where the population is located within each block group and determining how much of the block group's population overlaps a custom area. This method is used in the United States, and a similar method is used in Canada. Population counts reported for census blocks, a more granular level of geography than block groups, are used to determine where the population is distributed within a block group. If the geographic center of a block falls within the custom area, the block's entire population is used to weight the block group data. The distribution of population at the census block level therefore determines the proportion of block group data that is allocated to user-specified areas, as shown in the example.
Depending on the data, households, housing units, or businesses at the block group level are used as weights. Using block centroids is superior to simple area-based weighting because it accounts for the possibility that the population is not evenly distributed across a block group.
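A minimal sketch of block-centroid weighting follows. It assumes block centroids are available as `(x, y, weight)` tuples in the same planar coordinates as the custom area, where the weight is population, households, or another count. The function names are hypothetical, and this is an illustration of the idea, not Esri's implementation.

```python
def point_in_polygon(px, py, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > py) != (y2 > py):
            # x coordinate where this edge crosses the horizontal ray through py
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def allocate_block_group(value, blocks, area):
    """Allocate a block group value to a custom area using block centroid weights.

    A block counts in full if its centroid falls inside the area, otherwise not at all.
    """
    total_weight = sum(w for _, _, w in blocks)
    if total_weight == 0:
        return 0.0
    inside_weight = sum(w for x, y, w in blocks if point_in_polygon(x, y, area))
    return value * inside_weight / total_weight


# Illustrative block group with three blocks; only the first two centroids are inside.
area = [(0, 0), (10, 0), (10, 10), (0, 10)]
blocks = [(2, 2, 300), (8, 5, 100), (15, 5, 600)]
households = allocate_block_group(400, blocks, area)  # 400 * 400 / 1000 = 160.0
```

Note that a block is never split: the centroid test is all or nothing, which is what keeps the method simple.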
How data apportionment works
ArcGIS Business Analyst Web App uses a data apportionment algorithm to redistribute demographic, business, economic, and landscape variables to input polygon features. The algorithm analyzes each polygon to be enriched relative to a point dataset and a detailed dataset of reporting unit polygons that contain attributes for the selected variables. Based on how each polygon being enriched overlays these datasets, the algorithm determines the appropriate amount of each variable to assign.
Depending on which country the enrichment polygon is located in, the granular point dataset represents one of the following:
- Census Block Points—U.S. and Canada only. These points are initially produced as centroids of the most detailed census tabulation areas in these countries: census blocks in the U.S. and dissemination areas in Canada. In some cases, Esri has moved these points into residential areas and away from obviously industrial or other nonresidential areas. Each point contains attributes for the number of people and households living in the corresponding tabulation area.
- Settlement Points—For most other countries, Esri produces settlement points from a settlement likelihood model that uses Landsat 8 imagery and road intersections. Road intersections are particularly helpful in areas where dense forest canopy obscures dwellings. Settlement points are initially produced as a dasymetric raster surface, meaning that places where people cannot live, or do not live, have been removed. This raster surface has a resolution of 75 meters, which is roughly the size of a city block. The model assigns each cell, or point, a settlement likelihood score representing the likelihood that people live there.
- Address Based Settlement Points—Switzerland and the Netherlands only. Some countries track and publish points representing the residential addresses of their citizens. Esri aggregates the count of these address points into a 75-meter resolution raster and converts that raster to a point dataset, as with settlement points.
- Building Footprint Settlement Points—Spain AIS Group Data only. The centroids of residential building footprints are counted within each cell of a 75-meter resolution raster to produce a dataset of settlement points.
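The address-based and building-footprint variants both count input points per 75-meter raster cell and then turn each occupied cell into a weighted point. The sketch below illustrates that aggregation under stated assumptions: projected coordinates in meters, a grid anchored at (0, 0), and one output point per occupied cell placed at the cell center. The function name is hypothetical.

```python
from collections import Counter

CELL_SIZE = 75.0  # meters, matching the 75-meter raster resolution


def points_to_settlement_points(points, cell=CELL_SIZE):
    """Count input points (addresses or building centroids) per grid cell and
    emit one (x, y, count) point at the center of each occupied cell.

    Assumes projected coordinates in meters with the grid anchored at (0, 0).
    """
    counts = Counter((int(x // cell), int(y // cell)) for x, y in points)
    return [
        ((col + 0.5) * cell, (row + 0.5) * cell, n)
        for (col, row), n in sorted(counts.items())
    ]


addresses = [(10, 10), (40, 60), (80, 20), (200, 200)]
settlement = points_to_settlement_points(addresses)
```

Here the first two addresses share a cell, so that cell's point carries a count of 2.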
The GeoEnrichment service uses the most detailed geographies with the most recent census data, or authoritative estimates, available for commercial use from each country. For instance, in 2018, South Africa's dataset had 85,483 features, Hungary's had 3,177, and Japan's had 217,201.
For most countries, data is updated every two years; a few countries are updated annually because data is readily available. Esri spreads these updates throughout the year on a quarterly schedule. The data for each country is the most recently available estimate. Generally, the data Esri releases reflects demographic and economic conditions about nine months before the release date.
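The nine-month rule of thumb is easy to apply to a release date. This small sketch steps back a number of calendar months; the helper name is hypothetical, and clamping the day to 28 is an assumption made only to avoid invalid dates such as February 30.

```python
from datetime import date


def approximate_data_vintage(release, months_back=9):
    """Step back a number of calendar months from a release date (day clamped to 28)."""
    total_months = release.year * 12 + (release.month - 1) - months_back
    year, month_index = divmod(total_months, 12)
    return date(year, month_index + 1, min(release.day, 28))


print(approximate_data_vintage(date(2024, 6, 15)))  # 2023-09-15
```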
The illustration below shows how the purple polygon to be enriched relates to the dark blue settlement points and to the detailed statistical polygons, outlined in gray, that support enrichment. Here is how the process enriches the purple ring with total population:
- Select the statistical polygons that are completely inside the ring polygon. These polygons are shown in white. Compute the sum of the total population variable for these polygons.
- Select the statistical polygons that partially intersect the ring polygon. These are shown in light green. For each of these polygons, do the following:
  - Select all the dark blue settlement points inside the polygon. Divide the polygon's total population by the sum of those points' settlement likelihood scores to determine the number of people per unit of settlement score.
  - For only the points that are also inside the purple ring, compute the sum of settlement likelihood scores, and multiply it by that ratio to derive the number of people represented by those points.
- Add the population apportioned from the partially intersecting polygons to the sum for the completely contained polygons to get the total population for the ring.
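The steps above can be sketched in plain Python as follows. The data layout is a hypothetical assumption: each statistical unit is a dict with its polygon vertices, its total population, and the `(x, y, likelihood)` settlement points that fall inside it. The test for a unit being completely inside the ring checks that all of the unit's vertices are inside, which is only valid because the ring is assumed to be convex. This illustrates the described logic and is not Esri's implementation.

```python
def point_in_polygon(px, py, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def apportion_population(ring, units):
    """Apportion total population to a convex ring polygon using settlement points."""
    total = 0.0
    for unit in units:
        if all(point_in_polygon(x, y, ring) for x, y in unit["vertices"]):
            # Unit completely inside the (convex) ring: take its full population.
            total += unit["population"]
            continue
        # Partial (or no) overlap: people per unit of settlement score for this unit.
        score_sum = sum(s for _, _, s in unit["points"])
        if score_sum == 0:
            continue
        people_per_score = unit["population"] / score_sum
        # Only the settlement points that are also inside the ring contribute.
        inside_score = sum(s for x, y, s in unit["points"] if point_in_polygon(x, y, ring))
        total += inside_score * people_per_score
    return total


ring = [(0, 0), (10, 0), (10, 10), (0, 10)]
units = [
    {"vertices": [(2, 2), (4, 2), (4, 4), (2, 4)], "population": 500, "points": []},
    {"vertices": [(8, 0), (14, 0), (14, 6), (8, 6)], "population": 300,
     "points": [(9, 3, 0.5), (12, 3, 1.0)]},
    {"vertices": [(20, 20), (25, 20), (25, 25), (20, 25)], "population": 1000,
     "points": [(22, 22, 1.0)]},
]
population = apportion_population(ring, units)  # 500 + 0.5 * (300 / 1.5) = 600.0
```

Units that do not overlap the ring need no special case: none of their points fall inside, so they contribute zero.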
The dark blue settlement points come from two sources. The first is the regularly spaced, 75-meter grid of points produced as described above. The second is the set of centroids of reporting units that are small enough to fall between the grid points; these centroids are added to guarantee that such areas are not omitted.
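A sketch of how the two sources could be combined: build a 75-meter grid of cell-center points over an extent, then add a centroid for every reporting unit that contains no grid point. The function names and the `(min_x, min_y, max_x, max_y)` extent format are hypothetical assumptions, coordinates are assumed to be in meters, and the vertex average is used as a simple stand-in for a true polygon centroid.

```python
import math


def point_in_polygon(px, py, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def vertex_centroid(polygon):
    """Average of the vertices, a simple stand-in for a true polygon centroid."""
    xs, ys = zip(*polygon)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def supporting_points(units, extent, cell=75.0):
    """Grid of cell-center points plus centroids of units that no grid point hits."""
    min_x, min_y, max_x, max_y = extent
    cols = math.ceil((max_x - min_x) / cell)
    rows = math.ceil((max_y - min_y) / cell)
    grid = [(min_x + (c + 0.5) * cell, min_y + (r + 0.5) * cell)
            for r in range(rows) for c in range(cols)]
    extra = [vertex_centroid(u) for u in units
             if not any(point_in_polygon(x, y, u) for x, y in grid)]
    return grid + extra


large_unit = [(0, 0), (75, 0), (75, 75), (0, 75)]   # contains a grid point
tiny_unit = [(50, 50), (60, 50), (60, 60), (50, 60)]  # falls between grid points
pts = supporting_points([large_unit, tiny_unit], (0, 0, 150, 150))
```

With a 150 m by 150 m extent, the grid has four points, and only the tiny unit gets an extra centroid at (55.0, 55.0).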
Variations in apportionment method
The above description applies to most countries; in the U.S. and Canada, however, the process is simpler because the points already have an attribute for the population living there. The sum of the population attribute for the points inside the enrichment polygon is therefore all that is needed to determine the total population. Values for other variables are determined from population or from pre-calculated means or rates.
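For the U.S. and Canada variant, the population step reduces to a point-in-polygon filter and a sum. The sketch assumes block points as hypothetical `(x, y, population)` tuples in the same planar coordinates as the enrichment polygon. It does not model how other variables are derived from means or rates.

```python
def point_in_polygon(px, py, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def enrich_with_block_points(polygon, block_points):
    """Total population = sum of the population attribute of points inside the polygon."""
    return sum(pop for x, y, pop in block_points if point_in_polygon(x, y, polygon))


polygon = [(0, 0), (10, 0), (10, 10), (0, 10)]
block_points = [(1, 1, 120), (5, 9, 80), (12, 3, 500)]
total_population = enrich_with_block_points(polygon, block_points)  # 200
```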
The process described above is the default apportionment method, which is called BlockApportionment in the ArcGIS REST API for the GeoEnrichment service. If the service detects a significantly large polygon, it uses a faster, less computationally intensive method called CentroidsInPolygon. The metadata returned with the results of an enrichment operation supplies the name of the method used.
The CentroidsInPolygon method uses coarser geographies and their centroids as the basis for apportionment. For example, in the U.S., it uses U.S. Census Bureau census tract boundaries instead of block group polygons, and tract centroids instead of block points. As polygon size increases, the method uses progressively coarser polygon geographies and their centroids. These thresholds are based on buffer diameters:
- In the United States, these diameters and polygon/point datasets are used:
  - 0 to 504 miles: Census Block Groups and Block points.
  - 505 to 786 miles: Census Tracts and Block points based on generalization level 2.
  - 787 to 866 miles: Census Tracts and Block points based on generalization level 3.
  - 867 to 954 miles: Census Tracts and Block points based on generalization level 4.
  - Beyond 954 miles: Counties and Block points based on generalization level 5.
- For a global list of all apportionment settings, see this spreadsheet.
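The U.S. thresholds listed above can be expressed as a lookup table. In this sketch, the buffer diameter is assumed to be already known in miles, and each range is treated as inclusive at its upper bound. That assumption also decides the otherwise unspecified gaps between ranges, such as 504.5 miles. The table and function names are hypothetical, and the sketch covers only the U.S. values quoted in this section.

```python
# (maximum diameter in miles, polygon geography, point dataset) for the U.S.
US_THRESHOLDS = [
    (504, "Census Block Groups", "Block points"),
    (786, "Census Tracts", "Block points, generalization level 2"),
    (866, "Census Tracts", "Block points, generalization level 3"),
    (954, "Census Tracts", "Block points, generalization level 4"),
]


def us_apportionment_geography(diameter_miles):
    """Return (polygon geography, point dataset) for a buffer diameter in miles."""
    for max_diameter, polygons, points in US_THRESHOLDS:
        if diameter_miles <= max_diameter:
            return polygons, points
    return "Counties", "Block points, generalization level 5"


print(us_apportionment_geography(800))
```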
It is therefore important to separate large polygons from smaller ones when preparing input data for the Enrich Layer tools.
While the differences between the methods' results are usually small, the coarser geographies sometimes produce less than optimal results. The GeoEnrichment service has a detailedAggregationMethod property that can be set to override the default behavior described above. However, when detailedAggregationMethod is set, only one large polygon can be enriched at a time.
Extremely large polygons, exceeding the sizes described above for the CentroidsInPolygon method, are not processed, and the service returns a warning.
Similarly, tiny polygons (smaller than a city block in an urban area, or smaller than a square kilometer or square mile in a rural area) may not produce any results because they do not intersect or contain any settlement or census block points.
One of the most frequently asked questions about GeoEnrichment is: how reliable are the results? The answer depends on the data available for each country. Within most countries, it also depends on whether the area to be enriched is densely or sparsely populated.
Each country has two reliability scores. Each score ranges from 1.0 (best) to 5.0 (worst), though no country is rated at either extreme. The score that most affects the reliability of data apportionment is based on the ratio of a population polygon's area to the number of people estimated to live there. When a polygon is large and its population is small, the probability that the settlement points coincide with where people actually live goes down. Most countries have a mix of circumstances. Saudi Arabia is a good example: its cities are covered by many polygons representing small areas, but its vast tracts of desert are represented by only a few polygons.
The second score represents overall reliability. It combines the ratio of polygon area to population with a rating of the reliability of the country's census data and the complexity of the settlement footprint. The census rating accounts for the age of the last official census, the collection method, and the completeness of that census and of the other estimates and surveys used to derive the current estimate. Footprint complexity matters because creating the settlement points starts with a raster model, which tends to produce lower settlement likelihood values at the edges of settlements due to the effects of resampling.