Fast Segment Anything

The paper proposes using a CNN-based detector instead of a Transformer architecture, achieving a roughly 50x increase in run-time speed on the segmentation task.

The authors replace the Transformer (ViT) architecture with a YOLOv8 model. The task is also reformulated into two sequential stages: (1) producing segmentation masks for everything in the image using a CNN-based architecture; and (2) outputting the region of interest corresponding to the prompt.
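To make the second stage more concrete, here is a minimal sketch of how a prompt could pick a region of interest out of the stage-one masks. It is an illustration under stated assumptions, not the authors' implementation: masks are represented as plain sets of (row, col) pixels to keep the example dependency-free, a point prompt keeps every mask that contains the point, and a box prompt keeps the mask whose bounding box has the highest IoU with the prompt box. The names `select_by_point`, `select_by_box`, `mask_to_box` and `box_iou` are hypothetical helpers.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]          # (row, col)
Mask = Set[Pixel]                # assumption: a mask is the set of pixels it covers
Box = Tuple[int, int, int, int]  # (row_min, col_min, row_max, col_max), inclusive


def mask_to_box(mask: Mask) -> Box:
    """Tightest axis-aligned box around a non-empty mask."""
    rows = [r for r, _ in mask]
    cols = [c for _, c in mask]
    return min(rows), min(cols), max(rows), max(cols)


def box_area(box: Box) -> int:
    return (box[2] - box[0] + 1) * (box[3] - box[1] + 1)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two inclusive pixel boxes."""
    inter_h = min(a[2], b[2]) - max(a[0], b[0]) + 1
    inter_w = min(a[3], b[3]) - max(a[1], b[1]) + 1
    if inter_h <= 0 or inter_w <= 0:
        return 0.0
    inter = inter_h * inter_w
    return inter / (box_area(a) + box_area(b) - inter)


def select_by_point(masks: List[Mask], point: Pixel) -> List[int]:
    """Point prompt: indices of all masks that cover the point."""
    return [i for i, m in enumerate(masks) if point in m]


def select_by_box(masks: List[Mask], box: Box) -> int:
    """Box prompt: index of the mask whose bounding box best matches the prompt box."""
    return max(range(len(masks)), key=lambda i: box_iou(mask_to_box(masks[i]), box))


# Stand-in for stage one: two toy masks that a segmentation model might output.
masks = [
    {(r, c) for r in range(0, 4) for c in range(0, 4)},
    {(r, c) for r in range(5, 10) for c in range(5, 10)},
]
point_hits = select_by_point(masks, (6, 6))
box_hit = select_by_box(masks, (4, 4, 9, 9))
```

Because stage one runs once per image and stage two is only a cheap lookup over the precomputed masks, several prompts can be answered without re-running the network.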

This opens up several potential industrial-grade applications in building extraction from EOS imagery, salient object detection and anomaly detection.

The FastSAM method seems to have some weaknesses, primarily: (1) small segmentation masks can be of low quality despite having high confidence scores, because the confidence score comes from the YOLOv8 model and is not strongly correlated with mask quality; (2) masks of some tiny objects tend to be close to square, while masks of larger objects tend to have artifacts at the borders of their bounding boxes.
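One simple way to spot the second weakness in practice is to measure how much of its own bounding box a mask fills. The sketch below is a hypothetical diagnostic, not something taken from the paper: `box_fill_ratio` is an assumed helper name, and masks are again modelled as sets of (row, col) pixels. A ratio at or near 1.0 on a tiny object suggests the mask has collapsed to its box rather than following the object outline.

```python
from typing import Set, Tuple

Pixel = Tuple[int, int]  # (row, col)


def box_fill_ratio(mask: Set[Pixel]) -> float:
    """Fraction of the mask's tight bounding box that the mask covers."""
    rows = [r for r, _ in mask]
    cols = [c for _, c in mask]
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1
    return len(mask) / (height * width)


# A mask that fills its 3x3 box completely (the "near-square" failure case).
square = {(r, c) for r in range(3) for c in range(3)}
# A plus-shaped mask inside the same 3x3 box.
plus = {(0, 1), (1, 0), (1, 1), (1, 2), (2, 1)}

square_ratio = box_fill_ratio(square)
plus_ratio = box_fill_ratio(plus)
```

The ratio alone cannot tell a genuinely square object from a degenerate mask, so it is best read as a flag for manual inspection rather than a quality score.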

 

Paper📓: https://arxiv.org/abs/2306.12156
Code🛠️: https://github.com/CASIA-IVA-Lab/FastSAM
Dataset💽: SA-1B Dataset
