Video Class Agnostic Segmentation

Video Class Agnostic Segmentation Benchmark

Video class agnostic segmentation is the task of segmenting objects regardless of their semantic class by combining appearance, motion and geometry cues from monocular video sequences. The main motivation is to account for unknown objects in the scene and to provide a redundant signal alongside the segmentation of known classes for better safety, as shown in the following figure.

There are two main formulations of this problem, and we provide a benchmark with one track for each.


Motion Segmentation Track

This track poses the problem as segmenting moving objects, such as animals crossing the street or unusual construction vehicles.

To that end, we provide an improved dataset for motion instance segmentation, focusing on increasing the number of sequences and categories to avoid overfitting to a particular semantic class of moving objects. We build upon the publicly available Cityscapes-VPS[6] and KITTI-MOTS[7] datasets. We also provide baselines for real-time joint panoptic and motion instance segmentation, which are publicly released under Models.
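As a rough illustration of what a motion instance annotation could look like, the sketch below derives a moving-instance mask from a per-pixel instance ID map and a set of instance IDs flagged as moving. The function name `moving_instance_mask`, the convention that ID 0 means background, and the toy array are assumptions made for this example; they are not taken from the released dataset format.

```python
import numpy as np


def moving_instance_mask(instance_ids: np.ndarray, moving_ids) -> np.ndarray:
    """Keep the instance ID for pixels of moving instances, 0 elsewhere.

    Assumes 0 is the background ID (hypothetical convention).
    """
    is_moving = np.isin(instance_ids, list(moving_ids))
    return np.where(is_moving, instance_ids, 0)


# Toy 3x3 instance map with three instances (IDs 1, 2, 3).
instance_ids = np.array([
    [0, 1, 1],
    [2, 2, 0],
    [3, 0, 0],
])

# Suppose instances 1 and 3 are annotated as moving.
motion = moving_instance_mask(instance_ids, {1, 3})
print(motion)
```

The result keeps instance identities, so the same mask can serve both class agnostic motion segmentation (any nonzero pixel) and motion instance segmentation (per-ID regions).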

Dataset	#Frames	#Seqs	#Cats	Instances	Panoptic	Tracking	Annotation Type
DAVIS[1]	6208	90	78	Yes	No	Yes	Manually Labelled
Kitti-Motion[5]	455	-	1	No	No	No	Manually Labelled
Kitti-MoSeg[2][3]	12919	38	1	No	No	No	Weak Annotations
Cityscapes-Motion[5]	3475	-	1	Yes	No	No	Manually Labelled
Kitti-MoSeg Extended[4]	12919	38	5	Yes	No	No	Weak Annotations
Ours	11008	520	8	Yes	Yes	Yes	Manually Labelled

Figure: Cityscapes-VPS Motion and Kitti-MOTS Motion.


[1] Sergi Caelles, Jordi Pont-Tuset, Federico Perazzi, Alberto Montes, Kevis-Kokitsi Maninis, and Luc Van Gool. The 2019 DAVIS challenge on VOS: Unsupervised multi-object segmentation. arXiv preprint arXiv:1905.00737, 2019.
[2] Mennatullah Siam, Heba Mahgoub, Mohamed Zahran, Senthil Yogamani, Martin Jagersand, and Ahmad El-Sallab. Modnet: Moving object detection network with motion and appearance for autonomous driving. arXiv preprint arXiv:1709.04821, 2017.
[3] Hazem Rashed, Mohamed Ramzy, Victor Vaquero, Ahmad El Sallab, Ganesh Sistu, and Senthil Yogamani. Fusemodnet: Real-time camera and lidar based moving object detection for robust low-light autonomous driving. In The IEEE International Conference on Computer Vision (ICCV) Workshops, Oct 2019.
[4] Eslam Mohamed, Mahmoud Ewaisha, Mennatullah Siam, Hazem Rashed, Senthil Yogamani, and Ahmad El-Sallab. Instancemotseg: Real-time instance motion segmentation for autonomous driving. arXiv preprint arXiv:2008.07008, 2020.
[5] Johan Vertens, Abhinav Valada, and Wolfram Burgard. Smsnet: Semantic motion segmentation using deep convolutional neural networks. In Proceedings of the IEEE International Conference on Intelligent Robots and Systems (IROS), Vancouver, Canada, 2017.
[6] Dahun Kim, Sanghyun Woo, Joon-Young Lee, and In So Kweon. Video panoptic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9859–9868, 2020.
[7] Paul Voigtlaender, Michael Krause, Aljosa Osep, Jonathon Luiten, Berin Balachandar Gnana Sekar, Andreas Geiger, and Bastian Leibe. Mots: Multi-object tracking and segmentation. In Conference on Computer Vision and Pattern Recognition (CVPR), 2019.

Open-set Segmentation Track

This track poses the problem as segmenting classes outside the closed set of known classes. It can therefore segment unknown static objects, such as traffic warnings or other rare objects in parking lots near markets.

We build custom CARLA scenarios and provide synthetic data for open-set segmentation with fine-grained class annotations for the unknown objects. The fine-grained annotations enable controlled experiments on which objects are labelled as unknown during training versus testing, in order to understand the generalization ability of the model. To collect large-scale data, the CARLA basic agent is modified to cope with unknown objects on the road and avoid them by changing lanes. A large-scale dataset of approximately 70,000 frames is collected with random traffic, varying weather conditions, and different towns and scenarios.
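The controlled experiments described above could be set up roughly as follows: fine-grained unknown classes are collapsed into a single "unknown" label, and a separate mask marks pixels of unknown classes that were held out from training. This is only a sketch; the class IDs (10, 11), the `UNKNOWN` label value of 255, and the helper names are hypothetical and do not reflect the dataset's actual label encoding.

```python
import numpy as np

UNKNOWN = 255  # hypothetical label value for the collapsed "unknown" class


def collapse_unknowns(labels: np.ndarray, unknown_ids) -> np.ndarray:
    """Map every fine-grained unknown class ID to the single UNKNOWN label."""
    out = labels.copy()
    out[np.isin(labels, list(unknown_ids))] = UNKNOWN
    return out


def novel_unknown_mask(labels: np.ndarray, train_unknown, test_unknown) -> np.ndarray:
    """Boolean mask of pixels whose unknown class was never seen in training."""
    novel_ids = set(test_unknown) - set(train_unknown)
    return np.isin(labels, list(novel_ids))


# Toy 2x2 label map: 0 and 1 are known classes; 10 and 11 are
# fine-grained unknown classes (hypothetical IDs).
labels = np.array([
    [0, 10],
    [11, 1],
])

train_unknown = {10}       # unknown classes present during training
test_unknown = {10, 11}    # unknown classes present during testing

collapsed = collapse_unknowns(labels, test_unknown)
novel = novel_unknown_mask(labels, train_unknown, test_unknown)
print(collapsed)
print(novel)
```

Evaluating separately on the `novel` pixels is one way to measure how well a model generalizes to unknown objects it never saw during training.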

Figure panels: Construction, Parking, Barrier.

Dataset Coming Soon ...