
Use the model

You can use this model in the Detect Objects Using Deep Learning tool available in the Image Analyst toolbox in ArcGIS Pro. Follow the steps below to use the model for parsing text in images.

Supported imagery

This model works with high-resolution, three-band street-level or oriented imagery that contains medium-to-large text. The input can also be a mosaic image or an image service.

Detect and recognize text

Complete the following steps to read text from images:

  1. Download the Scene Text Parsing model and add an image or street-level imagery that contains text to ArcGIS Pro.
    Three-band image in ArcGIS Pro
  2. Zoom to an area of interest.
    Zoomed in to an area of interest
  3. Browse to Tools on the Analysis tab.
    Tools on the Analysis tab
  4. Click the Toolboxes tab in the Geoprocessing pane, select Image Analyst Tools, and browse to the Detect Objects Using Deep Learning tool under Deep Learning.
    Detect Objects Using Deep Learning tool
  5. Set the variables on the Parameters tab as follows:
    1. Input Raster—Select the image.
    2. Output Detected Objects—Specify the output feature class that will contain the text detection and recognition results.
    3. Model Definition—Select the pretrained model .dlpk file.
    4. Arguments (optional)—Change the values of the arguments if required.
      • threshold—The detections with a confidence score higher than this threshold are included in the result. The allowed values range from 0 to 1.0.
      • test_time_augmentation—Performs test time augmentation while predicting. If true, predictions of flipped and rotated variants of the input image will be merged into the final output.
    Detect Objects Using Deep Learning tool Parameters tab
  6. Set the variables on the Environments tab as follows:
    1. Processing Extent—Select Default or any other option from the drop-down menu.
    2. Processor Type—Select CPU.

      This model runs only on a CPU.

    Detect Objects Using Deep Learning tool Environments tab
  7. Click Run.

    The output layer is added to the map.

    Detected and recognized text as a result
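
The same steps can also be scripted. The following is a minimal sketch, not an official workflow. It assumes ArcGIS Pro with the Image Analyst extension is installed and that the tool is called through `arcpy.ia.DetectObjectsUsingDeepLearning`. The helper names `build_arguments` and `parse_text` are hypothetical, as are the placeholder values for `threshold` and `test_time_augmentation`. The semicolon-delimited "name value" argument string follows the pattern used in Esri's scripting samples for this tool.

```python
def build_arguments(threshold=0.5, test_time_augmentation=False):
    """Build the string passed to the tool's optional Arguments parameter.

    The default values here are placeholders chosen for illustration,
    not values documented for the Scene Text Parsing model.
    """
    # The guide states that allowed threshold values range from 0 to 1.0.
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1.0")
    return (
        f"threshold {threshold}; "
        f"test_time_augmentation {test_time_augmentation}"
    )


def parse_text(in_raster, out_features, model_dlpk, **model_args):
    """Run Detect Objects Using Deep Learning with the processor type set to CPU.

    Hypothetical wrapper: requires ArcGIS Pro and an Image Analyst license.
    """
    # Imported lazily so build_arguments() can be used without ArcGIS Pro.
    import arcpy

    arcpy.CheckOutExtension("ImageAnalyst")
    # Step 6 of the guide: the model runs only on a CPU.
    with arcpy.EnvManager(processorType="CPU"):
        arcpy.ia.DetectObjectsUsingDeepLearning(
            in_raster,                      # Input Raster
            out_features,                   # Output Detected Objects
            model_dlpk,                     # Model Definition (.dlpk file)
            build_arguments(**model_args),  # Arguments (optional)
        )
    return out_features
```

To use the sketch, call `parse_text` with the path to the image, the path for the output feature class, and the path to the downloaded .dlpk file, optionally passing `threshold` or `test_time_augmentation` as keyword arguments. Validating the threshold before the tool starts surfaces an out-of-range value immediately rather than partway through a run.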