
Introduction to the model


The HF Visual Question Answering deep learning package integrates pretrained Visual Question Answering (VQA) models from the Hugging Face Hub with ArcGIS, so you can run a wide range of Hugging Face VQA models directly from ArcGIS. With this deep learning package, you can generate answers to open-ended questions about an image, using a pretrained model designed to interpret and analyze visual data.
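
To illustrate what a VQA model does with an image and a question, the following is a minimal sketch of calling a Hugging Face VQA model outside ArcGIS. It assumes the `transformers` and `Pillow` Python packages are installed. The model ID `dandelin/vilt-b32-finetuned-vqa` and the helper names `top_answer` and `ask` are illustrative assumptions, and the sketch does not represent how the deep learning package is implemented internally.

```python
from typing import Any


def top_answer(results: list[dict[str, Any]]) -> str:
    """Return the highest-scoring answer from a VQA pipeline result list.

    Some models return only an 'answer' key, so a missing score counts as 0.
    """
    if not results:
        raise ValueError("the pipeline returned no answers")
    best = max(results, key=lambda r: r.get("score", 0.0))
    return str(best["answer"])


def ask(image_path: str, question: str,
        model_id: str = "dandelin/vilt-b32-finetuned-vqa") -> str:
    """Answer one question about one image (model ID is an assumed example)."""
    # Imported lazily so top_answer can be used without these packages.
    from PIL import Image
    from transformers import pipeline

    vqa = pipeline("visual-question-answering", model=model_id)
    image = Image.open(image_path).convert("RGB")  # 3-band RGB input
    return top_answer(vqa(image=image, question=question))
```

For example, `ask("chip.png", "Is there a building in this image?")` would return the model's answer as a string, where `chip.png` is a hypothetical image file on disk.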

Before running a model, ensure compliance with its licensing terms, which can be found on the Hugging Face model page. Run only trusted models, as they include weights and code that could impact system security. Since model sizes vary, ensure adequate CPU/GPU memory is available for inference.
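
As a rough way to judge whether a model is likely to fit in memory, the sketch below estimates the memory needed to hold the model weights from the parameter count and numeric precision. The function names and the 1.5 headroom factor for activations and buffers are assumptions for illustration, and the actual memory use during inference can be higher.

```python
# Bytes used to store one parameter at common precisions.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def weights_gib(num_params: int, dtype: str = "float32") -> float:
    """Approximate memory needed just to hold the weights, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


def fits(num_params: int, free_gib: float, dtype: str = "float32",
         headroom: float = 1.5) -> bool:
    """Rough check: weights times a headroom factor must fit in free memory.

    The default headroom of 1.5 is an assumed rule of thumb, not a guarantee.
    """
    return weights_gib(num_params, dtype) * headroom <= free_gib
```

The parameter count is usually listed on the Hugging Face model page. A result of `False` from `fits` suggests choosing a smaller model or a machine with more memory.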

Model details

This model has the following characteristics:

  • Input—8-bit, 3-band RGB imagery.
  • Output—A feature class containing the response generated by the model.
  • Compute—This workflow is compute intensive, and a GPU with minimum CUDA compute capability of 6.0 is recommended.
  • Applicable geographies—The model is expected to work globally.
  • Architecture—This deep learning package uses the model that matches the model ID you provide, taken from the Hugging Face Visual Question Answering Models page.
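
The input and compute characteristics above can be checked before running the model. The following is a minimal sketch under stated assumptions: the band count and bit depth are values you read from the raster's properties, the function names are hypothetical, and the GPU query relies on PyTorch being installed.

```python
from typing import Optional, Tuple

MIN_CAPABILITY = (6, 0)  # recommended minimum CUDA compute capability


def raster_matches_spec(band_count: int, bit_depth: int) -> bool:
    """True when the raster is 3-band and 8-bit, as the model expects."""
    return band_count == 3 and bit_depth == 8


def capability_ok(capability: Tuple[int, int]) -> bool:
    """True when (major, minor) meets the recommended minimum."""
    return tuple(capability) >= MIN_CAPABILITY


def detect_capability() -> Optional[Tuple[int, int]]:
    """Return the first GPU's compute capability, or None if unavailable.

    Assumes PyTorch; returns None when it is missing or no CUDA GPU is found.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)
```

A `None` result from `detect_capability` means that no CUDA GPU was found, in which case inference would fall back to the CPU and is likely to be slow.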

Access and download the model

Download the HF Visual Question Answering pretrained model from ArcGIS Living Atlas of the World. Alternatively, access the model directly from ArcGIS Pro, or consume it in ArcGIS Image for ArcGIS Online.

  1. Browse to ArcGIS Living Atlas of the World.
  2. Sign in with your ArcGIS Online credentials.
  3. Search for HF Visual Question Answering and open the item page from the search results.
  4. Click the Download button to download the model.

    You can use the downloaded .dlpk file directly in ArcGIS Pro or upload and use it in ArcGIS Enterprise.

Release notes

The following are the release notes:

Date            Description
February 2025   First release of HF Visual Question Answering