Introduction to the model—ArcGIS pretrained models

The Vision Language Context-Based Classification pretrained model available on ArcGIS Living Atlas of the World is a deep learning model that is used to classify images.

This deep learning package (DLPK) connects ArcGIS Pro to vision language models, supporting OpenAI's GPT-4 and GPT-4o as well as Llama models. OpenAI and Llama Vision models are known for their advanced capabilities in understanding natural language and in interpreting and generating human-like text. Integrating these models into a DLPK extends their utility by enabling them to process images and perform zero-shot classification of objects in imagery.

Use this DLPK to apply OpenAI's vision language models and Llama Vision models to object classification on images and rasters in ArcGIS Pro. Classification is not restricted to predefined classes; you can specify custom class labels when you run the tool. This flexibility allows professionals in fields such as environmental science, urban planning, and remote sensing to extract meaningful insights from their visual datasets.
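To illustrate what specifying class labels at run time means in a zero-shot workflow, the following minimal Python sketch builds a text prompt from a user-supplied label list and maps a free-text model reply back onto one of those labels. The function names `build_prompt` and `match_label` and the prompt wording are hypothetical and are not taken from the DLPK; the prompt and parsing used inside the package may differ.

```python
from typing import Optional, Sequence


def build_prompt(class_labels: Sequence[str]) -> str:
    """Build a zero-shot classification prompt from user-supplied labels.

    Hypothetical wording: the prompt used inside the DLPK is not shown here.
    """
    labels = [label.strip() for label in class_labels if label.strip()]
    if not labels:
        raise ValueError("At least one class label is required.")
    return (
        "Classify the image into exactly one of these classes: "
        + ", ".join(labels)
        + ". Reply with the class name only."
    )


def match_label(reply: str, class_labels: Sequence[str]) -> Optional[str]:
    """Map a free-text model reply to one of the allowed labels, or None."""
    normalized = reply.strip().strip(".").lower()
    # Prefer an exact match on the whole reply.
    for label in class_labels:
        if normalized == label.strip().lower():
            return label
    # Otherwise fall back to the longest non-empty label contained in the reply.
    contained = [
        label
        for label in class_labels
        if label.strip() and label.strip().lower() in normalized
    ]
    return max(contained, key=len) if contained else None
```

Because the label list is an ordinary argument, changing the classes requires no retraining; a reply that matches none of the labels returns `None` so it can be flagged for review.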

License requirements

This workflow has the following license requirements:

ArcGIS Desktop—ArcGIS Image Analyst extension for ArcGIS Pro
ArcGIS Enterprise—ArcGIS Image Server with raster analytics configured
ArcGIS Online—ArcGIS Pro or Professional Plus user type

Model details

This model has the following characteristics:

Input—8-bit RGB imagery.
Output—Feature class with information about classification of the image.
Compute—This workflow can run on CPU or GPU.
Applicable geographies—This model is expected to work well globally.
Architecture—The implementation uses either OpenAI's vision language models or Llama Vision models.
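Because the model expects 8-bit RGB input, it can help to check a raster's properties before running the tool. The sketch below is a plain-Python check with a hypothetical helper name, `is_8bit_rgb`. In ArcGIS Pro, the two values could be read from the `bandCount` and `pixelType` properties of an `arcpy.Raster` object, where unsigned 8-bit pixels are reported as `U8`. The check does not verify band order.

```python
def is_8bit_rgb(band_count: int, pixel_type: str) -> bool:
    """Return True when a raster has three bands of unsigned 8-bit pixels."""
    return band_count == 3 and pixel_type.strip().upper() == "U8"


# With arcpy available (assumption, not run here; the path is hypothetical):
# raster = arcpy.Raster(r"C:\data\scene.tif")
# is_8bit_rgb(raster.bandCount, raster.pixelType)
```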

Access and download the model

Download the Vision Language Context-Based Classification pretrained model from ArcGIS Living Atlas of the World. Alternatively, access the model directly from ArcGIS Pro, or use it in ArcGIS Image for ArcGIS Online.

To download the model, complete the following steps:

Browse to ArcGIS Living Atlas of the World.
Sign in with your ArcGIS Online credentials.
Search for Vision Language Context-Based Classification and open the item page from the search results.
Click the Download button to download the model.
You can use the downloaded .dlpk file directly in ArcGIS Pro.
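Before adding the downloaded file to a tool, you can inspect its contents. A .dlpk file is a ZIP-compatible archive that typically includes an Esri model definition (.emd) file; the sketch below assumes that layout, and the helper names `list_dlpk_contents` and `find_model_definitions` are hypothetical.

```python
import zipfile
from typing import List


def list_dlpk_contents(dlpk_path: str) -> List[str]:
    """Return the file names stored in a .dlpk archive.

    Assumes the package is ZIP-compatible; raises ValueError otherwise.
    """
    if not zipfile.is_zipfile(dlpk_path):
        raise ValueError(f"{dlpk_path} is not a ZIP-compatible archive.")
    with zipfile.ZipFile(dlpk_path) as archive:
        return archive.namelist()


def find_model_definitions(names: List[str]) -> List[str]:
    """Pick out Esri model definition (.emd) files from a list of names."""
    return [name for name in names if name.lower().endswith(".emd")]
```

Calling `find_model_definitions(list_dlpk_contents(path))` on the downloaded package should return at least one .emd entry if the download is intact and the assumed layout holds.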

Release notes

The following are the release notes:


Date	Description
March 2025	Second release of Vision Language Context-Based Classification
December 2024	First release of Vision Language Context-Based Classification
