Introduction to the model

Banner image for the model showing prompts and detection

This document explains how to use the Text SAM pretrained model available on ArcGIS Living Atlas of the World. The model detects objects in an image from a text prompt.

Text SAM is an open-source sample model that extracts features of various kinds from free-form text prompts. It does this by combining Grounding DINO and Segment Anything Model (SAM). Grounding DINO is an open-set object detector that finds objects given a text prompt. Segment Anything Model segments any object in a region of interest represented by a bounding box or a point. The two models are called sequentially within this deep learning package: the bounding boxes of the objects detected by Grounding DINO are fed into Segment Anything Model as prompts to generate masks for those objects. Finally, the masks are converted to polygons and returned as GIS features. These features, which are described by the input text prompts, can be any object of interest, such as vehicles, swimming pools, ships, airplanes, or solar panels.
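The three stages described above (text prompt to bounding boxes, bounding boxes to masks, masks to polygons) can be sketched as a simple chain of functions. This is an illustrative sketch only, not the code in the deep learning package: the function names, the stand-in detector and segmenter, and the rectangle-based vectorization are all hypothetical and exist only to show how each stage's output becomes the next stage's input.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]   # (x_min, y_min, x_max, y_max) in pixels, max exclusive
Mask = List[List[bool]]           # row-major binary mask, same size as the image
Polygon = List[Tuple[int, int]]   # closed ring of (x, y) vertices
Image = List[List[int]]           # hypothetical stand-in for a raster: a 2D grid


def text_to_features(
    image: Image,
    prompt: str,
    detect: Callable[[Image, str], List[Box]],
    segment: Callable[[Image, Box], Mask],
    vectorize: Callable[[Mask], Polygon],
) -> List[Polygon]:
    """Chain the three stages; each stage consumes the previous stage's output."""
    boxes = detect(image, prompt)                    # stage 1: text prompt -> boxes
    masks = [segment(image, box) for box in boxes]   # stage 2: each box -> one mask
    return [vectorize(mask) for mask in masks]       # stage 3: each mask -> polygon


# --- Hypothetical stand-ins so the sketch runs without any model ---

def fake_detect(image: Image, prompt: str) -> List[Box]:
    """Pretend a detector found one object, whatever the prompt says."""
    return [(1, 1, 3, 4)]


def fake_segment(image: Image, box: Box) -> Mask:
    """Pretend a segmenter: mark every pixel inside the box as foreground."""
    x0, y0, x1, y1 = box
    height, width = len(image), len(image[0])
    return [[x0 <= x < x1 and y0 <= y < y1 for x in range(width)]
            for y in range(height)]


def mask_extent_polygon(mask: Mask) -> Polygon:
    """Simplified vectorization: the rectangle enclosing all foreground pixels."""
    xs = [x for row in mask for x, value in enumerate(row) if value]
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not xs:
        return []  # empty mask, nothing to outline
    x0, x1 = min(xs), max(xs) + 1
    y0, y1 = min(ys), max(ys) + 1
    return [(x0, y0), (x1, y0), (x1, y1), (x0, y1), (x0, y0)]


if __name__ == "__main__":
    blank = [[0] * 6 for _ in range(5)]  # 5 rows x 6 columns
    print(text_to_features(blank, "swimming pool",
                           fake_detect, fake_segment, mask_extent_polygon))
```

Passing the stages in as callables keeps the chaining logic separate from the models themselves, which is why the stand-ins can be swapped for real implementations without touching `text_to_features`.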

This workflow has the following license requirements:

  • ArcGIS Desktop: ArcGIS Image Analyst extension for ArcGIS Pro
  • ArcGIS Enterprise: ArcGIS Image Server
  • ArcGIS Online: ArcGIS Pro or Professional Plus user type

Model details

This model has the following characteristics:

  • Input— 8-bit, 3-band RGB imagery.
  • Output—Feature class containing masks of various objects in the image.
  • Compute—This workflow is compute-intensive, and a GPU with minimum CUDA compute capability of 6.0 is recommended. This model requires a GPU with at least 8 GB of GPU memory.
  • Applicable geographies—The model is expected to work globally.
  • Architecture—This model is based on the open-source Grounding DINO by IDEA-Research (The International Digital Economy Academy) and Segment Anything Model (SAM) by Meta. You can check the source code of this sample deep learning package (DLPK) for additional information.
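The input and compute characteristics listed above can be turned into a small pre-flight check. The sketch below is a minimal illustration that assumes you already know the raster's band count and bit depth and the GPU's compute capability and memory. The `RasterInfo` and `GpuInfo` classes and the `preflight` function are hypothetical names introduced here, not part of any ArcGIS API.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Thresholds taken from the model characteristics listed above.
REQUIRED_BANDS = 3                    # 3-band RGB imagery
REQUIRED_BIT_DEPTH = 8                # 8-bit pixels
RECOMMENDED_COMPUTE = (6, 0)          # CUDA compute capability 6.0 (recommended)
REQUIRED_GPU_MEMORY_GB = 8.0          # at least 8 GB of GPU memory (required)


@dataclass
class RasterInfo:
    """Hypothetical container for the raster properties you looked up."""
    band_count: int
    bit_depth: int


@dataclass
class GpuInfo:
    """Hypothetical container for the GPU properties you looked up."""
    compute_capability: Tuple[int, int]  # (major, minor), e.g. (7, 5)
    memory_gb: float


def preflight(raster: RasterInfo, gpu: GpuInfo) -> List[str]:
    """Return human-readable problems; an empty list means all checks pass."""
    problems = []
    if raster.band_count != REQUIRED_BANDS:
        problems.append(f"expected {REQUIRED_BANDS} bands, got {raster.band_count}")
    if raster.bit_depth != REQUIRED_BIT_DEPTH:
        problems.append(f"expected {REQUIRED_BIT_DEPTH}-bit pixels, got {raster.bit_depth}-bit")
    if gpu.compute_capability < RECOMMENDED_COMPUTE:
        major, minor = gpu.compute_capability
        problems.append(f"compute capability {major}.{minor} is below the recommended 6.0")
    if gpu.memory_gb < REQUIRED_GPU_MEMORY_GB:
        problems.append(f"GPU memory {gpu.memory_gb} GB is below the required 8 GB")
    return problems


if __name__ == "__main__":
    for line in preflight(RasterInfo(4, 16), GpuInfo((5, 2), 4.0)):
        print("problem:", line)
```

Compute capability is stored as a (major, minor) tuple rather than a float so that versions compare component by component.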

Access and download the model

Download the Text SAM pretrained model from ArcGIS Living Atlas of the World. Alternatively, access the model directly from ArcGIS Pro, or consume it in ArcGIS Image for ArcGIS Online.

  1. Browse to ArcGIS Living Atlas of the World.
  2. Sign in with your ArcGIS Online credentials.
  3. Search for Text SAM and open the item page from the search results.
  4. Click the Download button to download the model.

    You can use the downloaded .dlpk file directly in ArcGIS Pro or upload it and use it in ArcGIS Enterprise. Additionally, you can fine-tune the pretrained model if necessary.
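    Before using the downloaded file, it can be helpful to look at what it contains. The sketch below assumes that a .dlpk file is a ZIP-format archive that includes an Esri model definition (.emd) file alongside the model's source code; if your file does not open as a ZIP archive, this approach does not apply. The function names and the file path in the usage example are hypothetical.

```python
import zipfile
from pathlib import Path
from typing import List, Union

PathLike = Union[str, Path]


def list_dlpk_contents(dlpk_path: PathLike) -> List[str]:
    """Return the sorted names of all entries in the package.

    Assumes the .dlpk is a ZIP archive; zipfile.BadZipFile is raised otherwise.
    """
    with zipfile.ZipFile(dlpk_path) as archive:
        return sorted(archive.namelist())


def find_model_definitions(dlpk_path: PathLike) -> List[str]:
    """Return the entries that look like Esri model definition (.emd) files."""
    return [name for name in list_dlpk_contents(dlpk_path)
            if name.lower().endswith(".emd")]


if __name__ == "__main__":
    package = Path("TextSAM.dlpk")  # hypothetical path to the downloaded file
    if package.exists():
        for entry in list_dlpk_contents(package):
            print(entry)
    else:
        print(f"{package} not found; point this at your downloaded .dlpk file")
```

    Listing the entries only reads the archive index, so the package itself is left unchanged.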

Release notes

The following are the release notes:

Date          Description
March 2024    First release of Text SAM