Use the model

You can use this model in the Classify Objects Using Deep Learning tool available in the Image Analyst toolbox in ArcGIS Pro. Follow the steps below to use the model for visual question answering in images.

Use the following steps to generate a response, given a question based on the imagery (a scripted ArcPy equivalent is sketched after the steps):

  1. Download the HF Visual Question Answering model and add the imagery layer in ArcGIS Pro.
  2. Zoom to an area of interest.
  3. Browse to Tools on the Analysis tab.
    Tools on the Analysis tab in ArcGIS Pro
  4. Click the Toolboxes tab in the Geoprocessing pane, select Image Analyst Tools, and browse to the Classify Objects Using Deep Learning tool under Deep Learning.
    Classify Objects Using Deep Learning tool
  5. Set the variables under the Parameters tab as follows:
    1. Input Raster—Select the imagery.
    2. Output Classified Objects Feature Class—Set the output feature layer that will contain the generated response from the model.
    3. Model Definition—Select the pretrained model .dlpk file.
    4. Class Label Field—Specify the name of the field that will contain the class or category label in the output feature class.
    5. Arguments—Change the values of the arguments if required.

      • huggingface_id—The model id of a pretrained Visual Question Answering model hosted on huggingface.co.

        Visual Question Answering models can be filtered by choosing the Visual Question Answering tag in the Tasks list on the Hugging Face model hub, as shown below:

        Hugging Face hub model categories

        The model id is in the form {username}/{repository} and is displayed at the top of the model page, as shown below:

        Hugging Face model name

        Only models that include config.json and preprocessor_config.json are supported. The presence of these files can be verified on the Files and versions tab of the model page, as shown below (a scripted check is also sketched at the end of this section):

        Configuration files for huggingface model

      • question—The question to be answered based on the image.
      • batch_size—The number of image tiles processed in each step of the model inference. This value depends on the memory of your graphics card.
      • blacken_around_feature—Specifies whether the pixels surrounding each object or feature in an image tile are masked out. If set to False (the default), the surrounding pixels are not masked and remain visible.

  6. Set the variables under the Environments tab as follows:
    1. Processing Extent—Select Current Display Extent or any other option from the drop-down menu.
    2. Cell Size (required)—Set the value to the resolution of the imagery, or keep the default value.
    3. Processor Type—Select CPU or GPU.

      It is recommended that you select GPU, if available, and set GPU ID to the GPU to be used.

    Classify Objects Using Deep Learning tool Environments tab
  7. Click Run.

    Once processing finishes, the output layer is added to the map.

    Extracted features
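
The same workflow can be scripted for repeatable runs. The following is a minimal sketch, assuming the ArcGIS Pro Python environment with an Image Analyst license; the paths, the Hugging Face model id, the question, and the field name are placeholders to replace with your own values, and the parameter names should be verified against the Classify Objects Using Deep Learning tool documentation for your version of ArcGIS Pro.

```python
# Minimal ArcPy sketch of steps 5 through 7. Assumes the ArcGIS Pro Python
# environment and an Image Analyst license. All paths, the question, and the
# Hugging Face model id below are placeholders.
import arcpy

arcpy.CheckOutExtension("ImageAnalyst")

# Environment settings (step 6): cell size, processor type, and optionally the extent.
in_raster = r"C:\data\imagery.tif"                    # placeholder imagery path
arcpy.env.cellSize = in_raster                        # use the resolution of the imagery
arcpy.env.processorType = "GPU"                       # or "CPU" if no GPU is available
arcpy.env.gpuId = "0"                                 # GPU ID of the GPU to be used
# arcpy.env.extent = "xmin ymin xmax ymax"            # optionally limit processing to an area of interest

# Tool parameters (step 5): output feature class, model definition, and model arguments.
out_features = r"C:\data\results.gdb\vqa_responses"   # placeholder output feature class
model_dlpk = r"C:\models\hf_vqa\hf_vqa.dlpk"          # placeholder path to the downloaded .dlpk

# Model arguments as semicolon-delimited name/value pairs; the values are illustrative.
model_arguments = ("huggingface_id dandelin/vilt-b32-finetuned-vqa;"
                   "question How many buildings are visible?;"
                   "batch_size 4;"
                   "blacken_around_feature False")

# Run the tool (step 7); the generated responses are written to the output feature class.
arcpy.ia.ClassifyObjectsUsingDeepLearning(
    in_raster=in_raster,
    out_feature_class=out_features,
    in_model_definition=model_dlpk,
    class_label_field="ClassLabel",
    model_arguments=model_arguments,
)
```

Once the script completes, the output feature class can be added to the map to review the generated responses.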

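
Before setting the huggingface_id argument, a candidate model repository can also be checked from Python for the required configuration files. This is a minimal sketch, assuming the huggingface_hub package is installed and the repository is public; the model id shown is only an example.

```python
# Minimal sketch: check a public Hugging Face repository for the configuration
# files required by the tool (config.json and preprocessor_config.json).
# Assumes the huggingface_hub package is installed; the model id is an example.
from huggingface_hub import list_repo_files

model_id = "dandelin/vilt-b32-finetuned-vqa"  # example {username}/{repository}
required = {"config.json", "preprocessor_config.json"}

repo_files = set(list_repo_files(model_id))
missing = required - repo_files

if missing:
    print(f"{model_id} is missing: {', '.join(sorted(missing))}")
else:
    print(f"{model_id} has all required configuration files.")
```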