
Calculation estimates for user-created areas

How data is calculated for user-created areas

The Business Analyst web app employs a GeoEnrichment service, which uses the concept of a study area to define the location of the point or area that you want to enrich with additional information. If one or more points are input as a study area, the service creates a one-mile ring buffer around the point or points to collect and append enrichment data. You can optionally change the ring buffer size or create drive-time service areas around a point.
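A ring buffer is simply the area within a fixed distance of a point. The minimal sketch below approximates one as a regular polygon, assuming the point is given in a projected coordinate system measured in meters. `ring_buffer` is a hypothetical helper for illustration only, not how the GeoEnrichment service builds its buffers internally.

```python
import math

# One international mile in meters.
METERS_PER_MILE = 1609.344

def ring_buffer(center, radius_m=METERS_PER_MILE, segments=64):
    """Approximate a circular ring buffer as a regular polygon (hypothetical helper).

    Assumes `center` is (x, y) in a projected coordinate system in meters.
    The default radius mirrors the service's one-mile default ring.
    """
    cx, cy = center
    step = 2 * math.pi / segments
    return [
        (cx + radius_m * math.cos(k * step), cy + radius_m * math.sin(k * step))
        for k in range(segments)
    ]

# The default one-mile ring, and a ring with a user-chosen 3-mile radius.
default_ring = ring_buffer((0.0, 0.0))
wider_ring = ring_buffer((0.0, 0.0), radius_m=3 * METERS_PER_MILE)
```

More segments give a closer approximation of the circle at the cost of more vertices.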

The GeoEnrichment service uses a sophisticated geographic retrieval methodology to aggregate data for rings and other polygons. A geographic retrieval methodology determines how data is gathered and summarized, or aggregated, for input features. For standard geographic units, such as states, provinces, counties, or postal codes, the link between a designated area and its attribute data is a simple one-to-one relationship. For example, if an input study area consists of a selection of ZIP Codes, data retrieval is simply a matter of gathering the data for those ZIP Codes.
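Because each standard unit maps one-to-one to its attribute record, aggregation reduces to a lookup and a sum. The sketch below assumes a hypothetical attribute table keyed by made-up unit IDs; the names and values are illustrative only.

```python
# Hypothetical attribute table for standard geographic units (made-up IDs and values).
ZIP_POPULATION = {
    "ZIP_A": 1200,
    "ZIP_B": 2300,
    "ZIP_C": 800,
}

def aggregate_whole_units(selected_ids, table):
    """Sum an attribute over whole standard geographic units.

    Each unit links one-to-one to its attribute record, so no apportioning
    is needed: look up each selected unit and add its value.
    """
    return sum(table[unit_id] for unit_id in selected_ids)

total = aggregate_whole_units(["ZIP_A", "ZIP_B"], ZIP_POPULATION)  # 3500
```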

Data Allocation Method

The Data Allocation method allocates block group data to custom areas by examining where the population is located within each block group and determining how much of a block group's population overlaps a custom area. This method is used in the United States, and a similar method is used in Canada. The population data reported for census blocks, a more granular level of geography than block groups, is used to determine how the population is distributed within a block group. If the geographic center (centroid) of a block falls within the custom area, the entire population of that block is used to weight the block group data. The geographic distribution of the population at the census block level thus determines the proportion of census block group data that is allocated to user-specified areas, as shown in the example.

Calculation estimates
Note:

Depending on the data, households, housing units, or businesses at the block level are used as weights. Employing block centroids is superior because it accounts for the possibility that the population may not be evenly distributed throughout a block group.
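The centroid-based weighting described above can be sketched as follows, assuming planar block centroids, a custom area given as a simple polygon, and block group values keyed by a block group ID. All names and the tiny data set are hypothetical; this illustrates the weighting idea, not Esri's implementation.

```python
from collections import defaultdict
from dataclasses import dataclass

Point = tuple[float, float]

@dataclass
class Block:
    block_group: str  # ID of the block group that contains this block
    centroid: Point   # geographic center of the block, planar (x, y)
    weight: float     # block-level weight, e.g. population or households

def point_in_polygon(pt: Point, polygon: list[Point]) -> bool:
    """Ray casting: toggle for each polygon edge crossed by a ray to the right of pt."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def allocate_to_area(area: list[Point], blocks: list[Block],
                     bg_values: dict[str, float]) -> float:
    """Allocate block group values to a custom area using block centroids as weights."""
    total_weight = defaultdict(float)   # all block weight per block group
    inside_weight = defaultdict(float)  # weight of blocks whose centroid is in the area
    for b in blocks:
        total_weight[b.block_group] += b.weight
        if point_in_polygon(b.centroid, area):
            # The whole block counts when its centroid falls inside the area.
            inside_weight[b.block_group] += b.weight
    return sum(
        value * inside_weight[bg] / total_weight[bg]
        for bg, value in bg_values.items()
        if total_weight[bg] > 0
    )

square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
blocks = [
    Block("BG1", (5.0, 5.0), 30.0),   # centroid inside the square
    Block("BG1", (15.0, 5.0), 10.0),  # centroid outside
    Block("BG2", (20.0, 20.0), 5.0),  # centroid outside
]
estimate = allocate_to_area(square, blocks, {"BG1": 100.0, "BG2": 50.0})  # 75.0
```

Here 30 of BG1's 40 weight units fall inside the area, so 75% of BG1's value is allocated, and none of BG2's.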

Data apportionment outside the United States

The GeoEnrichment service uses the underlying statistical boundaries (e.g. postal code boundaries in Turkey) and population data in each country to aggregate demographic attributes for any given study area.

A traditional GIS approach to aggregating the population for a study area would find all the statistical areas whose center points fall inside a ring buffer and sum their populations to get the total. The challenge with this approach is that it excludes statistical areas whose center points fall outside the ring, even though a portion of those areas might be inside the ring. For example, how should the population of the Turkish postal code highlighted below (Postcode 20160, with a population of 94,294) be counted in the aggregation? Because neither the whole postal code nor its center point falls within the ring buffer, the traditional GIS approach would exclude Postcode 20160 from the aggregation entirely.

Postcode5 layer
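The traditional approach can be sketched as a centroid-in-ring filter followed by a sum. This assumes planar centroid coordinates in kilometers and uses made-up areas; `StatArea` and `centroids_in_ring` are hypothetical names.

```python
import math
from dataclasses import dataclass

@dataclass
class StatArea:
    name: str
    centroid: tuple[float, float]  # center point, planar (x, y) in kilometers
    population: int

def centroids_in_ring(areas, center, radius_km):
    """Traditional approach: sum whole populations of areas whose centroid is in the ring."""
    cx, cy = center
    return sum(
        a.population
        for a in areas
        if math.hypot(a.centroid[0] - cx, a.centroid[1] - cy) <= radius_km
    )

areas = [
    StatArea("A", (2.0, 3.0), 1000),  # centroid inside a 10 km ring
    StatArea("B", (12.0, 0.0), 500),  # centroid outside, even if part of B overlaps the ring
]
print(centroids_in_ring(areas, (0.0, 0.0), 10.0))  # 1000
```

Area B contributes nothing, however much of it lies inside the ring, which is exactly the weakness described above.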

To better handle statistical areas that are only partially contained inside the ring buffer, Esri uses a weighted population (also known as dasymetric mapping) approach to aggregate population and other demographic attributes for smaller study areas. In the example above, Esri maintains a behind-the-scenes weighted population layer of almost 11 million points that represent how population is distributed throughout each statistical area (e.g. postal codes). This allows the GeoEnrichment service to better determine how the population of 94,294 is distributed across Postcode 20160 and to estimate the portion of that population within the ring buffer study area. This process is repeated for every partially intersected postal code in the 10-kilometer ring study area, yielding an estimated population of 544,097 for the ring.
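The weighted population idea can be sketched by splitting each area's population in proportion to the weight of its points that fall inside the ring. This is a simplified illustration assuming planar kilometer coordinates and made-up points; it does not use or reproduce Esri's actual weighted population layer.

```python
import math

def weighted_population_in_ring(areas, center, radius_km):
    """Dasymetric-style estimate of the population inside a ring (hypothetical helper).

    `areas` maps an area ID to (population, points), where each point is
    (x, y, weight) in planar kilometers. Each area's population is split in
    proportion to the weight of its points that fall inside the ring.
    """
    cx, cy = center
    estimate = 0.0
    for population, points in areas.values():
        total_w = sum(w for _, _, w in points)
        inside_w = sum(
            w for x, y, w in points if math.hypot(x - cx, y - cy) <= radius_km
        )
        if total_w > 0:
            estimate += population * inside_w / total_w
    return estimate

areas = {
    "P1": (1000, [(1.0, 1.0, 1.0), (2.0, 2.0, 1.0)]),                   # fully inside
    "P2": (800, [(9.0, 0.0, 1.0), (11.0, 0.0, 1.0), (13.0, 0.0, 2.0)]),  # partially inside
}
print(weighted_population_in_ring(areas, (0.0, 0.0), 10.0))  # 1200.0
```

P2 keeps a quarter of its population (one of four weight units lies inside the ring) instead of being dropped or counted whole.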

Challenges with the weighted population approach for large study areas

The weighted population approach described above for handling partially intersected statistical areas works very well for small input study areas (e.g. a ring buffer less than 100 kilometers in diameter). Because this technique uses very detailed population distribution points, aggregating those points can become slow and consume an enormous amount of computing power for large study areas. Therefore, the GeoEnrichment service falls back to the traditional GIS approach when aggregating data for large study areas.

Small and large study areas are distinguished by the diameter of the area. For non-circular areas, such as custom polygons or drive-time service areas, the diameter is the larger of the height and width of the area's bounding box, often referred to as the extent of the study area. For example, the map below shows the same location in Turkey as the earlier example with a 50-kilometer radial buffer (a diameter of 100 kilometers). Any study area this size or smaller uses the population-weighted approach for aggregating data; study areas larger than this fall back to the traditional GIS approach.

50 km ring buffer
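The size rule can be sketched as a bounding-box calculation followed by a threshold check. This assumes vertices in planar kilometers and the 100-kilometer threshold given above, and ignores the exceptions for detailedAggregationMethod and drive-time areas; the returned names mirror the values of the service's aggregationMethod property, while the helper names are hypothetical.

```python
def extent_diameter(vertices):
    """Diameter used to size a study area: the larger side of its bounding box."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return max(max(xs) - min(xs), max(ys) - min(ys))

def expected_method(vertices, threshold_km=100.0):
    """Return the aggregation method name expected for a study area's extent."""
    if extent_diameter(vertices) <= threshold_km:
        return "BlockApportionment"  # weighted population approach
    return "CentroidsInPolygon"      # traditional GIS approach

drive_time_area = [(0.0, 0.0), (80.0, 10.0), (60.0, 120.0), (5.0, 90.0)]
print(extent_diameter(drive_time_area))  # 120.0
print(expected_method(drive_time_area))  # CentroidsInPolygon
```

The area is only 80 kilometers wide, but its 120-kilometer height sets the diameter.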

Understanding what aggregation approach is used

When you use the GeoEnrichment service, the enrich response indicates which approach was used to aggregate the demographic data. For example, a 10-kilometer ring in Turkey returns the following response:

"features" : [ {
  "attributes" : {
    "ID" : "1",
    "OBJECTID" : 2,
    "sourceCountry" : "TR",
    "myID" : "point2",
    "areaType" : "RingBufferBands",
    "bufferUnits" : "esriKilometers",
    "bufferUnitsAlias" : "Kilometers",
    "bufferRadii" : 10,
    "aggregationMethod" : "BlockApportionment:TR.Postcodes5",
    "HasData" : 1,
    "TOTPOP" : 544097
  }
} ]

The aggregationMethod property in the response describes how the data is aggregated for the input study area:

aggregationMethod=BlockApportionment:TR.Postcodes5

This means that the data for the 10-kilometer ring was aggregated with the BlockApportionment (weighted population) approach using the Postcode5 statistical boundaries.
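Reading the approach from a response can be sketched as splitting the aggregationMethod value at the colon. This assumes the "Method:Layer" shape seen in the example values; `parse_aggregation_method` and the `APPROACHES` labels are hypothetical, and the `attributes` dictionary copies a few fields from the 10-kilometer response excerpt.

```python
def parse_aggregation_method(value: str) -> tuple[str, str]:
    """Split a value such as 'BlockApportionment:TR.Postcodes5' into (method, layer).

    Assumes the 'Method:Layer' shape seen in the example responses.
    """
    method, sep, layer = value.partition(":")
    if not sep:
        raise ValueError(f"unexpected aggregationMethod value: {value!r}")
    return method, layer

APPROACHES = {
    "BlockApportionment": "weighted population",
    "CentroidsInPolygon": "traditional GIS (centroids in polygon)",
}

# A few attributes copied from the 10 km ring response excerpt.
attributes = {
    "bufferRadii": 10,
    "aggregationMethod": "BlockApportionment:TR.Postcodes5",
    "TOTPOP": 544097,
}
method, layer = parse_aggregation_method(attributes["aggregationMethod"])
print(APPROACHES.get(method, "unknown"), layer)  # weighted population TR.Postcodes5
```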

Conversely, a 120-kilometer ring study area has an aggregation method of:

aggregationMethod=CentroidsInPolygon:TR.Postcodes5

120-kilometer study area response

"features" : [ {
  "attributes" : {
    "ID" : "1",
    "OBJECTID" : 7,
    "sourceCountry" : "TR",
    "myID" : "point2",
    "areaType" : "RingBufferBands",
    "bufferUnits" : "esriKilometers",
    "bufferUnitsAlias" : "Kilometers",
    "bufferRadii" : 120,
    "aggregationMethod" : "CentroidsInPolygon:TR.Postcodes5",
    "HasData" : 1,
    "TOTPOP" : 3071779
  }
} ]

This means that the traditional GIS approach, which selects the postal code boundaries whose center points fall inside the study area (CentroidsInPolygon), was used with the Postcode5 statistical boundaries for this larger study area.

Improving how population is estimated for large study areas

A new property of the GeoEnrichment service, detailedAggregationMethod, applies the weighted population approach to study areas as large as 300 kilometers in diameter. This property limits the number of study areas that can be processed at a time. For small study areas, the GeoEnrichment service allows up to 100 input study areas in a single request, but this limit is overridden when detailedAggregationMethod is set to true: users can run only a single study area per request, because aggregating data for areas larger than 100 kilometers is computationally intensive. Supporting multiple concurrent requests with very large study areas would result in request timeouts, very poor performance, server overload, and other negative impacts.
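On the client side, these limits can be respected by batching study areas before sending requests. The sketch below only shows batching: `batch_study_areas` is a hypothetical helper, and how detailedAggregationMethod is actually passed in a request is not shown here.

```python
MAX_STUDY_AREAS_PER_REQUEST = 100  # limit without detailed aggregation

def batch_study_areas(study_areas, detailed_aggregation):
    """Split study areas into request-sized batches (hypothetical client helper).

    With detailed aggregation only one study area is allowed per request;
    otherwise up to 100 study areas fit in one request.
    """
    size = 1 if detailed_aggregation else MAX_STUDY_AREAS_PER_REQUEST
    return [study_areas[i:i + size] for i in range(0, len(study_areas), size)]

areas = [f"area{i}" for i in range(250)]
print(len(batch_study_areas(areas, detailed_aggregation=False)))     # 3
print(len(batch_study_areas(areas[:3], detailed_aggregation=True)))  # 3
```

Sending the single-area batches one after another, rather than concurrently, also fits the concern about overloading the service.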

Note:

  • When detailedAggregationMethod is used, larger thresholds apply to the population-weighted approach. In most countries, the diameter at which the service switches from the weighted population approach to the traditional GIS approach increases from 100 kilometers to 150 kilometers.
  • The GeoEnrichment service always uses the weighted population approach for drive-time service areas of up to and including 90 minutes.
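The rules in this section can be combined into a rough predictor of which approach to expect. This is a hypothetical helper, not part of any Esri API. The thresholds are parameters because they vary by country: the defaults follow the "most countries" figures in the note, while the description of detailedAggregationMethod mentions study areas up to 300 kilometers in diameter, so adjust `detailed_threshold_km` to the value that applies.

```python
def expected_approach(diameter_km, detailed=False, drive_time_minutes=None,
                      base_threshold_km=100.0, detailed_threshold_km=150.0):
    """Predict which aggregation approach the service is expected to use.

    Thresholds are parameters because they vary by country; the defaults
    follow the "most countries" figures given in the note.
    """
    # Drive-time areas of up to and including 90 minutes always use weighting.
    if drive_time_minutes is not None and drive_time_minutes <= 90:
        return "BlockApportionment"
    limit = detailed_threshold_km if detailed else base_threshold_km
    return "BlockApportionment" if diameter_km <= limit else "CentroidsInPolygon"

print(expected_approach(120.0))                         # CentroidsInPolygon
print(expected_approach(120.0, detailed=True))          # BlockApportionment
print(expected_approach(400.0, drive_time_minutes=90))  # BlockApportionment
```

Checking the aggregationMethod property of the actual response remains the authoritative way to confirm which approach was used.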