
Use the model

You can use this model in the Classify Objects Using Deep Learning tool available in the Image Analyst toolbox in ArcGIS Pro. Follow the steps below to use the model for visual question answering in images.

Use the following steps to generate a response to a question about the imagery:

  1. Download the HF Visual Question Answering model and add the imagery layer to a map in ArcGIS Pro.
  2. Zoom to an area of interest.
  3. Browse to Tools under the Analysis tab.
  4. Click the Toolboxes tab in the Geoprocessing pane, select Image Analyst Tools, and browse to the Classify Objects Using Deep Learning tool under Deep Learning.
  5. Set the variables under the Parameters tab as follows:
    1. Input Raster—Select the imagery.
    2. Output Classified Objects Feature Class—Set the output feature class that will contain the response generated by the model.
    3. Model Definition—Select the pretrained model .dlpk file.
    4. Arguments—Change the values of the arguments if required.

      • huggingface_id—The model id of a pretrained Visual Question Answering model hosted on huggingface.co.

        To find Visual Question Answering models, select the Visual Question Answering tag in the Tasks list on the Hugging Face model hub.

        The model id has the form {username}/{repository} and is displayed at the top of the model page.

        Only models that include both config.json and preprocessor_config.json are supported. You can verify that these files are present on the Files and versions tab of the model page.

      • padding—Number of pixels at the border of image tiles from which predictions are blended for adjacent tiles. Increase the value to smooth the output and reduce edge artifacts. The maximum padding value is half of the tile size.
      • question—The question to be answered based on the image.
      • batch_size—Number of image tiles processed in each step of model inference. The value you can use depends on the memory of your graphics card.

  6. Set the variables under the Environments tab as follows:
    1. Processing Extent—Select Current Display Extent or any other option from the drop-down menu.
    2. Cell Size (required)—Set the value to the resolution of the imagery. You can keep the default value.
    3. Processor Type—Select CPU or GPU.

      Selecting GPU is recommended if one is available. If you select it, set GPU ID to the GPU you want to use.

  7. Click Run.

    Once processing finishes, the output layer is added to the map.
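If you run this workflow from a Python script instead of the tool dialog, it can help to check the Arguments values from step 5 before the tool runs. The following is a minimal sketch under stated assumptions: the id pattern only approximates the {username}/{repository} form and is not Hugging Face's official naming rule, `tile_size` is a hypothetical input standing in for the model's tile size, and the names `build_model_arguments` and `missing_config_files` are illustrative, not part of ArcGIS.

```python
import re

# Approximate pattern for "{username}/{repository}"; an assumption, not
# Hugging Face's official naming rule.
HF_ID_PATTERN = re.compile(r"^[A-Za-z0-9][\w.-]*/[\w.-]+$")

# Files a model repository must contain to be supported.
REQUIRED_FILES = ("config.json", "preprocessor_config.json")


def missing_config_files(repo_files):
    """Return the required configuration files that are absent from repo_files."""
    present = set(repo_files)
    return [name for name in REQUIRED_FILES if name not in present]


def build_model_arguments(huggingface_id, question, padding, batch_size,
                          tile_size=None):
    """Validate the Arguments values and return them as [name, value] pairs.

    tile_size is hypothetical: the model's tile size in pixels, used only to
    enforce the rule that padding is at most half of the tile size.
    """
    if not HF_ID_PATTERN.match(huggingface_id):
        raise ValueError(f"expected username/repository, got {huggingface_id!r}")
    if not question.strip():
        raise ValueError("question must not be empty")
    if padding < 0:
        raise ValueError("padding must be non-negative")
    if tile_size is not None and padding > tile_size / 2:
        raise ValueError(f"padding {padding} exceeds half of tile size {tile_size}")
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Each argument becomes one [name, value] row, with values as strings.
    return [
        ["huggingface_id", huggingface_id],
        ["padding", str(padding)],
        ["question", question],
        ["batch_size", str(batch_size)],
    ]


# Hypothetical model id and question, for illustration only.
args = build_model_arguments(
    "example-user/example-vqa-model",
    "How many buildings are visible?",
    padding=64,
    batch_size=4,
    tile_size=512,
)
print(args)
```

The returned list uses the list-of-lists form that arcpy generally accepts for value-table parameters such as model arguments. On a machine with network access, `huggingface_hub.list_repo_files(huggingface_id)` returns a repository's file names, which can be passed to `missing_config_files` to confirm that the model is supported.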

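The Environments settings in step 6 correspond to geoprocessing environments that can also be set from Python. The sketch below assumes the arcpy environment names `extent` (Processing Extent), `cellSize` (Cell Size), `processorType` (Processor Type), and `gpuId` (GPU ID); the helper `environment_settings` is hypothetical and only collects the values so that they can be applied in one place.

```python
def environment_settings(cell_size=None, use_gpu=True, gpu_id=0, extent=None):
    """Collect the step 6 choices as {arcpy environment name: value}.

    Omitting cell_size or extent leaves that environment unset, which is
    roughly like keeping the default in the tool dialog.
    """
    settings = {"processorType": "GPU" if use_gpu else "CPU"}
    if use_gpu:
        # GPU ID only matters when processing runs on the GPU.
        settings["gpuId"] = gpu_id
    if cell_size is not None:
        settings["cellSize"] = cell_size
    if extent is not None:
        # For example, a space-separated "xmin ymin xmax ymax" string.
        settings["extent"] = extent
    return settings


print(environment_settings(cell_size=0.5))
```

In ArcGIS Pro, the dictionary can be unpacked into `arcpy.EnvManager`, for example `with arcpy.EnvManager(**environment_settings(cell_size=0.5)):`, so that the settings apply only to the tool calls inside the `with` block.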

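Steps 5 through 7 can also be run as a single call from the Python window in ArcGIS Pro. The sketch below assumes the `arcpy.ia.ClassifyObjectsUsingDeepLearning` parameter names `in_raster`, `out_feature_class`, `in_model_definition`, and `model_arguments`; confirm them against the tool reference for your ArcGIS Pro version. The function `run_visual_question_answering` is hypothetical, and its `tool` parameter exists only so that the call can be exercised without ArcGIS Pro.

```python
def run_visual_question_answering(in_raster, out_feature_class, model_definition,
                                  model_arguments, tool=None):
    """Run Classify Objects Using Deep Learning with the given model arguments.

    model_arguments is a list of [name, value] pairs, for example
    [["question", "How many buildings are visible?"], ["batch_size", "4"]].
    """
    if tool is None:
        # Imported lazily: arcpy is only available inside an ArcGIS Pro environment.
        import arcpy

        # The tool belongs to the Image Analyst extension.
        arcpy.CheckOutExtension("ImageAnalyst")
        tool = arcpy.ia.ClassifyObjectsUsingDeepLearning
    return tool(
        in_raster=in_raster,
        out_feature_class=out_feature_class,
        in_model_definition=model_definition,
        model_arguments=model_arguments,
    )
```

A call might look like `run_visual_question_answering("imagery.tif", "C:/data/output.gdb/answers", "C:/models/vqa.dlpk", [["huggingface_id", "example-user/example-vqa-model"], ["question", "How many buildings are visible?"]])`, where every path and the model id are placeholders.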