Use the model—ArcGIS pretrained models

You can use this model in the Classify Objects Using Deep Learning tool available in the Image Analyst toolbox in ArcGIS Pro. Follow the steps below to use the model for visual question answering in images.

Use the model

Complete the following steps to generate a response to a question about the imagery:

Download the HF Visual Question Answering model and add the imagery layer in ArcGIS Pro.
Zoom to an area of interest.
Browse to Tools under the Analysis tab.
Click the Toolboxes tab in the Geoprocessing pane, select Image Analyst Tools, and browse to the Classify Objects Using Deep Learning tool under Deep Learning.
Set the variables under the Parameters tab as follows:
1. Input Raster—Select the imagery.
2. Output Classified Objects Feature Class—Set the output feature layer that will contain the generated response from the model.
3. Model Definition—Select the pretrained model .dlpk file.
4. Arguments—Change the values of the arguments if required.
  - huggingface_id—The model ID of a pretrained Visual Question Answering model hosted on huggingface.co.
    To find Visual Question Answering models, filter by the Visual Question Answering tag in the Tasks list on the Hugging Face model hub.
    The model ID has the form {username}/{repository} and is displayed at the top of the model page.
    Only models that include both config.json and preprocessor_config.json are supported. You can verify that these files are present on the Files and versions tab of the model page.
  - padding—The number of pixels at the border of image tiles from which predictions are blended for adjacent tiles. Increase the value to smooth the output and reduce edge artifacts. The padding can be at most half of the tile size.
  - question—The question to be answered based on the image.
  - batch_size—The number of image tiles processed in each step of the model inference. The value you can use depends on the memory of your graphics card.
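
The constraints on the arguments above can be checked before the tool is run. The following is a minimal sketch in plain Python, not part of ArcGIS or the Hugging Face libraries: the function names, the model ID pattern, and the `tile_size` input are assumptions made for illustration, and `some-user/some-vqa-model` is a placeholder rather than a real repository.

```python
import re

# Files that a supported model repository must contain, per the text above.
REQUIRED_FILES = {"config.json", "preprocessor_config.json"}

# {username}/{repository}. The allowed characters are an assumption for
# illustration, not Hugging Face's official naming rule.
MODEL_ID_PATTERN = re.compile(r"[\w.-]+/[\w.-]+")


def is_valid_model_id(model_id):
    """Return True if model_id looks like {username}/{repository}."""
    return MODEL_ID_PATTERN.fullmatch(model_id) is not None


def missing_required_files(repo_files):
    """Return the required files absent from a list of repository file names."""
    return sorted(REQUIRED_FILES - set(repo_files))


def build_arguments(huggingface_id, question, padding, batch_size, tile_size):
    """Validate the tool arguments and return them as a dictionary.

    tile_size is a hypothetical input: it is assumed to be known from the
    model, and is used only to enforce the "half of the tile size" limit.
    """
    if not is_valid_model_id(huggingface_id):
        raise ValueError("huggingface_id must look like {username}/{repository}")
    if not question.strip():
        raise ValueError("question must not be empty")
    if not 0 <= padding <= tile_size // 2:
        raise ValueError("padding must be between 0 and half of the tile size")
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return {
        "huggingface_id": huggingface_id,
        "padding": padding,
        "question": question,
        "batch_size": batch_size,
    }


# Example with placeholder values.
example = build_arguments(
    "some-user/some-vqa-model",
    "How many buildings are visible?",
    padding=56,
    batch_size=4,
    tile_size=224,
)
print(example)
```

The file names passed to `missing_required_files` would come from the Files and versions tab of the model page; the sketch does not query the hub itself.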
Set the variables under the Environments tab as follows:
1. Processing Extent—Select Current Display Extent or any other option from the drop-down menu.
2. Cell Size (required)—Set this to the resolution of the imagery. You can keep the default value.
3. Processor Type—Select CPU or GPU.
  It is recommended that you select GPU, if available, and set GPU ID to the GPU to be used.
Click Run.
Once processing finishes, the output layer is added to the map.
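
The same workflow can also be scripted. The sketch below assumes the ArcGIS Pro Python environment with the Image Analyst extension and uses `arcpy.ia.ClassifyObjectsUsingDeepLearning`. The wrapper function, its parameter names, and the `name value;name value` layout of the arguments string are assumptions for illustration, so check them against the tool reference before relying on them. The processing extent and cell size are left at their environment defaults here.

```python
def format_model_arguments(arguments):
    """Join {name: value} pairs into one 'name value;name value' string.

    The semicolon-delimited layout is an assumption about how the tool
    expects its model arguments when they are passed as text.
    """
    for name, value in arguments.items():
        if ";" in str(name) or ";" in str(value):
            raise ValueError("argument names and values must not contain ';'")
    return ";".join(f"{name} {value}" for name, value in arguments.items())


def run_visual_question_answering(in_raster, out_feature_class, model_dlpk,
                                  arguments, use_gpu=True, gpu_id=0):
    """Hypothetical wrapper that runs the tool with the settings from the steps above."""
    # Imported here so the rest of the sketch can be read and tested
    # on a machine without ArcGIS Pro.
    import arcpy

    arcpy.CheckOutExtension("ImageAnalyst")

    # Environments tab: Processor Type and GPU ID.
    arcpy.env.processorType = "GPU" if use_gpu else "CPU"
    if use_gpu:
        arcpy.env.gpuId = gpu_id

    # Parameters tab: input raster, output feature class, model definition, arguments.
    arcpy.ia.ClassifyObjectsUsingDeepLearning(
        in_raster=in_raster,
        out_feature_class=out_feature_class,
        in_model_definition=model_dlpk,
        model_arguments=format_model_arguments(arguments),
    )
    return out_feature_class


# Example arguments string with placeholder values (no ArcGIS call is made here).
argument_text = format_model_arguments({"padding": 56, "batch_size": 4})
print(argument_text)
```

A question normally contains spaces, and this sketch does not establish how such values must be quoted inside the arguments string; setting the question in the tool dialog avoids the issue.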
