You can use this model with the Detect Objects Using Deep Learning tool in the Image Analyst toolbox in ArcGIS Pro. Follow the steps below to use the model to detect humans in drone images. The model can also be fine-tuned using the Train Deep Learning Model tool. See the Fine-tune the model page for details on how to fine-tune this model.
Detect humans
To detect humans in drone imagery, complete the following steps:
- Download the Human Detection (Drone Imagery) model and add the imagery layer in ArcGIS Pro.
- Zoom to an area of interest.
- On the Analysis tab, click Tools.
- Click the Toolboxes tab in the Geoprocessing pane, click Image Analyst Tools, expand Deep Learning, and click the Detect Objects Using Deep Learning tool.
- On the Parameters tab, set the parameter values as follows:
- Input Raster—Select the imagery.
- Output Detected Objects—Set the output feature class that will contain the detected objects.
- Model Definition—Select the pretrained or fine-tuned model .dlpk file.
- Model Arguments—Change the values of the arguments if necessary. (The same arguments are used in the scripted example after these steps.)
- padding—The number of pixels at the border of image tiles from which predictions are blended for adjacent tiles. Increase its value to smooth the output while reducing edge artifacts. The maximum value of the padding can be half of the tile size value.
- threshold—The detections that have a confidence score higher than this threshold are included in the result. The allowed values range from 0 to 1.0. The recommended threshold value range is between 0.75 and 0.9.
- nms_overlap—The maximum allowed overlap between bounding boxes; when two detections overlap by more than this ratio, the one with the lower confidence score is removed.
- batch_size—The number of image tiles processed in each step of the model inference. This depends on the memory of your graphics card.
- exclude_pad_detections—If true, the model ignores detections in the padded border areas of image tiles, improving precision by focusing on relevant content. If false, detections in the padded areas are included, which may introduce noise and irrelevant detections.
- test_time_augmentation—Performs test time augmentation while predicting, a technique that improves the robustness and accuracy of predictions by generating multiple slightly modified versions of the input during inference and aggregating their predictions. If true, predictions from flipped and rotated orientations of the input image are merged into the final output and their confidence values are averaged. This may cause the confidence of objects detected in only a few orientations to fall below the threshold.
- Non Maximum Suppression—Optionally, check the check box to remove overlapping features with lower confidence scores. If checked, set the following parameters:
- Confidence Score Field
- Class Value Field (optional)
- Maximum Overlap Ratio (optional)
- On the Environments tab, set the values as follows:
- Processing Extent—Select Current Display Extent or any other option from the drop-down menu.
- Cell Size (required)—Set the value to the resolution of the imagery. Keep the default value to use the cell size of the input imagery.
- Processor Type—Select CPU or GPU.
It is recommended that you select GPU, if available, and set GPU ID to the GPU to be used.
- Click Run.
The output layer is added to the map.
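The same workflow can be scripted with ArcPy for batch or repeated runs. The following is a minimal sketch, assuming an ArcGIS Pro Python environment with the Image Analyst extension available; the file paths, cell size, GPU ID, and model argument values are placeholders to replace with your own imagery, output location, and downloaded .dlpk file.

```python
import arcpy

# The deep learning tools require the Image Analyst extension.
arcpy.CheckOutExtension("ImageAnalyst")

# Environment settings (Environments tab).
arcpy.env.processorType = "GPU"   # use "CPU" if no suitable GPU is available
arcpy.env.gpuId = "0"             # ID of the GPU to be used
arcpy.env.cellSize = 0.05         # placeholder: resolution of the drone imagery
# arcpy.env.extent can optionally be set to restrict processing to an area of interest

# Placeholder paths: point these to your imagery, output geodatabase,
# and the downloaded Human Detection (Drone Imagery) .dlpk file.
in_raster = r"C:\data\drone_imagery.tif"
out_detected_objects = r"C:\data\results.gdb\detected_humans"
in_model_definition = r"C:\models\HumanDetection_DroneImagery.dlpk"

# Model arguments (Parameters tab), passed as semicolon-separated "name value" pairs.
arguments = ("padding 56;threshold 0.8;nms_overlap 0.1;batch_size 4;"
             "exclude_pad_detections True;test_time_augmentation False")

arcpy.ia.DetectObjectsUsingDeepLearning(
    in_raster,               # Input Raster
    out_detected_objects,    # Output Detected Objects
    in_model_definition,     # Model Definition (.dlpk)
    arguments,               # Model Arguments
    "NMS",                   # Non Maximum Suppression: remove overlapping detections
    "Confidence",            # Confidence Score Field
    "Class",                 # Class Value Field
    0,                       # Maximum Overlap Ratio
)
```

When the script is run from the ArcGIS Pro Python window, the resulting feature class can be added to the map and reviewed in the same way as the tool output.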