<!--Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->
*This model was published in HF papers on 2023-04-06 and contributed to Hugging Face Transformers on 2024-02-26.*

# SegGPT

## Overview

The SegGPT model was proposed in [SegGPT: Segmenting Everything In Context](https://huggingface.co/papers/2304.03284) by Xinlong Wang, Xiaosong Zhang, Yue Cao, Wen Wang, Chunhua Shen, and Tiejun Huang. SegGPT employs a decoder-only Transformer that generates a segmentation mask given an input image, a prompt image, and the corresponding prompt mask. The model achieves strong one-shot results, with 56.1 mIoU on COCO-20<sup>i</sup> and 85.6 mIoU on FSS-1000.

The abstract from the paper is the following:

*We present SegGPT, a generalist model for segmenting everything in context. We unify various segmentation tasks into a generalist in-context learning framework that accommodates different kinds of segmentation data by transforming them into the same format of images. The training of SegGPT is formulated as an in-context coloring problem with random color mapping for each data sample. The objective is to accomplish diverse tasks according to the context, rather than relying on specific colors. After training, SegGPT can perform arbitrary segmentation tasks in images or videos via in-context inference, such as object instance, stuff, part, contour, and text. SegGPT is evaluated on a broad range of tasks, including few-shot semantic segmentation, video object segmentation, semantic segmentation, and panoptic segmentation. Our results show strong capabilities in segmenting in-domain and out-of-domain targets, either qualitatively or quantitatively.*

Tips:

- Use [`SegGptImageProcessor`] to prepare the input image, the prompt image, and the prompt mask for the model.
- Prompt masks can be either segmentation maps or RGB images. If you use RGB images, make sure to set `do_convert_rgb=False` in the `preprocess` method.
- When using `segmentation_maps`, it is highly advisable to pass `num_labels` (not counting the background) during both preprocessing and postprocessing with [`SegGptImageProcessor`].
- When running inference with [`SegGptForImageSegmentation`] and a `batch_size` greater than 1, you can ensemble features across your images by passing `feature_ensemble=True` to the forward method.

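
The second and third tips both relate to how class ids become colors: a segmentation map has to be turned into an RGB image before the model can treat it as a prompt, and the predicted colors have to be mapped back to class ids afterwards. The sketch below illustrates the idea with a simple, evenly spaced palette. It is an assumption made for illustration, not necessarily the palette that [`SegGptImageProcessor`] builds internally, and `build_palette` and `mask_to_rgb` are hypothetical helper names.

```python
def build_palette(num_labels):
    """Return num_labels + 1 distinct RGB colors; index 0 (background) is black.

    Hypothetical scheme: each class index is written in base `base` and every
    digit controls how far one color channel is shifted down from 255.
    """
    base = int(num_labels ** (1 / 3)) + 1
    margin = 256 // base

    palette = [(0, 0, 0)]  # background
    for index in range(num_labels):
        r_step = index // base**2
        g_step = (index % base**2) // base
        b_step = index % base
        palette.append((255 - r_step * margin, 255 - g_step * margin, 255 - b_step * margin))
    return palette


def mask_to_rgb(mask, palette):
    """Map a 2D grid of class ids (list of rows) to a grid of RGB tuples."""
    return [[palette[class_id] for class_id in row] for row in mask]


# Toy segmentation map with three foreground classes plus background (0).
palette = build_palette(num_labels=3)
rgb = mask_to_rgb([[0, 1], [2, 3]], palette)
```

In a scheme like this one, the spacing between colors depends on `num_labels`, so decoding with a different value than the one used for encoding would assign colors to the wrong classes. That is one way to see why passing the same `num_labels` to both preprocessing and postprocessing matters.
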
Here's how to use the model for one-shot semantic segmentation:

```python
import torch
from datasets import load_dataset

from transformers import SegGptForImageSegmentation, SegGptImageProcessor


checkpoint = "BAAI/seggpt-vit-large"
image_processor = SegGptImageProcessor.from_pretrained(checkpoint)
model = SegGptForImageSegmentation.from_pretrained(checkpoint, device_map="auto")

dataset_id = "EduardoPacheco/FoodSeg103"
ds = load_dataset(dataset_id, split="train")
# Number of labels in FoodSeg103 (not including background)
num_labels = 103

# Query image with its ground truth, plus one prompt image/mask pair
image_input = ds[4]["image"]
ground_truth = ds[4]["label"]
image_prompt = ds[29]["image"]
mask_prompt = ds[29]["label"]

inputs = image_processor(
    images=image_input,
    prompt_images=image_prompt,
    segmentation_maps=mask_prompt,
    num_labels=num_labels,
    return_tensors="pt",
).to(model.device)  # keep the inputs on the same device as the model

with torch.no_grad():
    outputs = model(**inputs)

# PIL reports size as (width, height); post-processing expects (height, width)
target_sizes = [image_input.size[::-1]]
mask = image_processor.post_process_semantic_segmentation(outputs, target_sizes, num_labels=num_labels)[0]
```

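
The example above loads `ground_truth` but never compares it with the predicted `mask`. One common way to score a prediction is mean intersection-over-union (mIoU), the metric quoted in the overview. The sketch below is a dependency-free illustration on toy label maps, not the evaluation protocol used in the paper. `mean_iou` is a hypothetical helper, and it assumes both maps have the same size and use the same integer class ids, with 0 as background.

```python
from itertools import chain


def mean_iou(pred, target, num_labels, ignore_background=True):
    """Mean intersection-over-union between two 2D label maps (lists of rows)."""
    pred_flat = list(chain.from_iterable(pred))
    target_flat = list(chain.from_iterable(target))
    if len(pred_flat) != len(target_flat):
        raise ValueError("pred and target must have the same number of pixels")

    first_label = 1 if ignore_background else 0
    ious = []
    for label in range(first_label, num_labels + 1):
        pairs = list(zip(pred_flat, target_flat))
        intersection = sum(p == label and t == label for p, t in pairs)
        union = sum(p == label or t == label for p, t in pairs)
        if union > 0:  # skip classes that appear in neither map
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else float("nan")


# Toy 3x4 label maps: 0 is background, 1 and 2 are object classes.
ground_truth = [
    [0, 1, 1, 0],
    [0, 1, 1, 2],
    [0, 0, 2, 2],
]
prediction = [
    [0, 1, 1, 0],
    [0, 1, 0, 2],
    [0, 0, 2, 2],
]
score = mean_iou(prediction, ground_truth, num_labels=2)
```

To apply this to real outputs, you would first convert `mask` and `ground_truth` to label maps of the same size. For full-resolution images, a vectorized NumPy or PyTorch version would be much faster than this pure-Python loop.
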
This model was contributed by [EduardoPacheco](https://huggingface.co/EduardoPacheco).
The original code can be found [here](https://github.com/baaivision/Painter/tree/main).

## SegGptConfig

[[autodoc]] SegGptConfig

## SegGptImageProcessor

[[autodoc]] SegGptImageProcessor
    - preprocess
    - post_process_semantic_segmentation

## SegGptImageProcessorPil

[[autodoc]] SegGptImageProcessorPil
    - preprocess
    - post_process_semantic_segmentation

## SegGptModel

[[autodoc]] SegGptModel
    - forward

## SegGptForImageSegmentation

[[autodoc]] SegGptForImageSegmentation
    - forward