Code for the CVPR 2019 paper 'Shifting More Attention to Video Salient Object Detection'
Authors: Deng-Ping Fan, Wenguan Wang, Ming-Ming Cheng, Jianbing Shen.
[Supplementary material is also attached in the repo]
Paper with code: https://paperswithcode.com/task/video-salient-object-detection
Figure 1: Overall architecture of the proposed SSAV model.
The last decade has witnessed a growing interest in video salient object detection (VSOD). However, the research community has long lacked a well-established VSOD dataset that is representative of real dynamic scenes and has high-quality annotations. To address this issue, we elaborately collected a visual-attention-consistent Densely Annotated VSOD (DAVSOD) dataset, which contains 226 videos with 23,938 frames that cover diverse realistic scenes, objects, instances and motions. With corresponding real human eye-fixation data, we obtain precise ground-truths. This is the first work that explicitly emphasizes the challenge of saliency shift, i.e., the video salient object(s) may dynamically change. To further contribute a complete benchmark to the community, we systematically assess 17 representative VSOD algorithms over seven existing VSOD datasets and our DAVSOD, totaling ~84K frames (largest-scale). Utilizing three widely used metrics, we then present a comprehensive and insightful performance analysis. Furthermore, we propose a baseline model. It is equipped with a saliency-shift-aware convLSTM, which can efficiently capture video saliency dynamics through learning human attention-shift behavior. Extensive experiments open up promising future directions for model development and comparison.
The saliency shift is not represented merely as a binary signal, i.e., whether it happens in a certain frame. Since we focus on an object-level task, we change the saliency values of different objects according to the shift of human attention. The rich annotations, including saliency shift, object-/instance-level ground-truths (GT), salient object numbers, scene/object categories, and camera/object motions, provide a solid foundation for the VSOD task and benefit a wide range of potential applications.
Figure 2: Annotation examples of our DAVSOD dataset.
Figure 3: Statistics of the proposed DAVSOD dataset.
Figure 3 shows (a) Scene/object categories. (b, c) Distribution of annotated instances and image frames, respectively. (d) Ratio distribution of the objects/instances. (e) Mutual dependencies among scene categories in (a).
DAVSOD dataset.
Note:
SSAV model.
Popular Existing Datasets.
Previous datasets have different formats, causing laborious data preprocessing when training or testing. Here we unified the format of all the datasets for easier use:
Year | Publisher | Dataset | Clips (size) | Download Link 1 | Download Link 2 |
---|---|---|---|---|---|
2010 | BMVC | SegV1 | 5(11.9MB) | Baidu Pan | Google Drive |
2013 | ICCV | SegV2 | 14(84.9MB) | Baidu Pan | Google Drive |
2014 | TPAMI | FBMS | 30(790MB) | Baidu Pan | Google Drive |
2014 | TPAMI | FBMS-59 (overall) | 59(817.52MB) | Baidu Pan | Google Drive |
2015 | TIP | MCL | 9(308MB) | Baidu Pan | Google Drive |
2015 | TIP | ViSal | 17(59.1MB) | Baidu Pan | Google Drive |
2016 | CVPR | DAVIS | 50(842MB) | Baidu Pan | Google Drive |
2017 | TCSVT | UVSD | 18(207MB) | Baidu Pan | Google Drive |
2018 | TIP | VOS | 200(3.33GB) | Baidu Pan | Google Drive |
2018 | TIP | VOS_test | 40(672MB) | Baidu Pan | Google Drive |
2019 | CVPR | DAVSOD | 187(10.24GB) | Baidu Pan (fetch code: ivzo) | Google Drive |
- | - | All datasets | - | - | Google Drive |
* We do not include the “penguin sequence” of the SegTrack-V2 dataset due to its inaccurate segmentation. Thus the results for the SegV2 dataset contain only 13 sequences in our benchmark. For the VOS dataset, we benchmark only the test sequences defined by our split (Training Set : Val Set : Test Set = 6:2:2).
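The 6:2:2 ratio above is consistent with the table: 200 VOS clips give a 40-clip test set. A minimal sketch of such a ratio-based split follows. It is not the authors' actual split; the function name `split_sequences`, the seeded shuffle, and the placeholder clip names are all assumptions made for illustration.

```python
import random


def split_sequences(names, ratios=(6, 2, 2), seed=0):
    """Split sequence names into train/val/test lists by integer ratios.

    Hypothetical helper: the benchmark's real assignment of clips to
    splits is fixed by the authors and is not reproduced here.
    """
    names = sorted(names)               # fixed starting order for reproducibility
    random.Random(seed).shuffle(names)  # seeded shuffle, independent of the global RNG
    total = sum(ratios)
    n_train = len(names) * ratios[0] // total
    n_val = len(names) * ratios[1] // total
    train = names[:n_train]
    val = names[n_train:n_train + n_val]
    test = names[n_train + n_val:]      # any remainder goes to the test split
    return train, val, test


# 200 placeholder names standing in for the VOS clips
clips = ["clip_%03d" % i for i in range(200)]
train, val, test = split_sequences(clips)
print(len(train), len(val), len(test))  # 120 40 40
```

Splitting at the sequence level, rather than the frame level, keeps all frames of one clip in the same subset.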
Papers & Code & Results (continuously updated).
We spent about half a year running all of the code. For the convenience of research, you can download all the results directly. Please cite our paper if you use our results. The overall results link is here (Baidu|Google) (Update: 2019-11-17)
Year & Pub & Paper | Model | DAVIS | FBMS | MCL | SegV1 | SegV2 | UVSD | ViSal | VOS | DAVSOD & Overall |
---|---|---|---|---|---|---|---|---|---|---|
2008 & CVPR & PQFT | Code | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google |
2009 & JOV & SST | Code | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google |
2010 & ECCV & SIVM | Code | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google |
2014 & CVPR & TIMP | Code | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google |
2014 & TCSVT & SPVM | Code | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google |
2015 & CVPR & SAG | Code | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google |
2015 & TIP & GFVM | Code | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google |
2015 & TIP & RWRV | Code | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google |
2015 & ICCV & MB+ | Code | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google |
2016 & CVPR & MST | Code | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google |
2017 & TCSVT & SGSP | Code | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google |
2017 & TIP & SFLR | Code | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google |
2017 & TIP & STBP | Code | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google |
2017 & TCYB & VSOP | Code | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google |
2017 & BMVC & DSR3D | Code | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google |
2018 & TIP & STCRF | Code | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google |
2018 & TIP & SCOM | Code | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google |
2018 & TIP & DLVSD | Code | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google |
2018 & TCSVT & SCNN | Code | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google |
2018 & ECCV & MBNM | Code | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google |
2018 & ECCV & PDB | Code | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google |
2018 & CVPR & FGRNE | Code | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google |
2019 & CVPR & SSAV | Code | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google | Baidu / Google |
The code has been tested successfully on Ubuntu 16.04 with CUDA 8.0 and OpenCV 3.1.0.
Prerequisites:
Use DAVSOD/mycaffe-convlstm/Makefile.config to build Caffe, following the official instructions:
make all -j8
make pycaffe
Downloading necessary data:
Download the training/testing dataset and move it into Datasets/.
Download the pre-trained caffemodel and move it into model/. It can be found on Baidu Pan (fetch code: pb0h) | Google Drive.
Testing Configuration:
After you download the pre-trained model and testing dataset, run generateTestList.py to get the test list, then run SSAV_test.py to generate the saliency maps in results/SSAV/.
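To illustrate what a test list amounts to, the sketch below walks a dataset folder and collects frame paths in a stable order. It is not the repository's generateTestList.py, whose output format is not described here. The function name `build_test_list`, the frame folder name `Imgs`, and the accepted file extensions are assumptions for illustration only.

```python
import os

# Assumed image extensions; adjust to match the downloaded data.
IMAGE_EXTS = (".jpg", ".jpeg", ".png", ".bmp")


def build_test_list(dataset_root, frame_dir="Imgs"):
    """Return sorted paths (relative to dataset_root) of all frames.

    Only files inside folders named `frame_dir` are collected, so that
    ground-truth masks stored in sibling folders are skipped. The folder
    name is an assumption about the layout, not a documented fact.
    """
    frames = []
    for dirpath, _dirnames, filenames in os.walk(dataset_root):
        if os.path.basename(dirpath) != frame_dir:
            continue
        for name in filenames:
            if name.lower().endswith(IMAGE_EXTS):
                full_path = os.path.join(dirpath, name)
                frames.append(os.path.relpath(full_path, dataset_root))
    return sorted(frames)


if __name__ == "__main__":
    # Prints nothing if Datasets/ does not exist or holds no matching frames.
    for rel_path in build_test_list("Datasets")[:5]:
        print(rel_path)
```

Sorting the paths keeps the frames of each sequence in temporal order as long as the file names are zero-padded, which matters for a recurrent model that consumes frames sequentially.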
Just enjoy it!
The prediction results of all competitors and our PNS-Net can be found at Google Drive (7MB).
Training Configuration:
Evaluating model:
Run main.m in DAVSOD/EvaluateTool/. For details of the evaluation metrics, please refer to the following papers.
[1] Structure-measure: A New Way to Evaluate Foreground Maps, ICCV 2017.
[2] Enhanced-alignment Measure for Binary Foreground Map Evaluation, IJCAI 2018.
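The two papers above define the structure and alignment measures. For a quick sanity check of a saliency map, two simpler pixel-wise measures that are commonly reported in salient object detection, mean absolute error (MAE) and the thresholded F-measure, can be sketched in plain Python. This is an illustrative sketch, not the MATLAB evaluation tool shipped in this repository; the function names, the fixed threshold of 0.5, and the conventional weight β² = 0.3 are assumptions.

```python
def mae(pred, gt):
    """Mean absolute error between a saliency map and its ground truth.

    Both arguments are 2-D lists of floats in [0, 1] with the same shape.
    """
    total = 0.0
    count = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            total += abs(p - g)
            count += 1
    return total / count


def f_measure(pred, gt, threshold=0.5, beta2=0.3):
    """F-measure of a saliency map binarized at a fixed threshold.

    beta2 = 0.3 is the weight conventionally used in the saliency
    literature to emphasize precision (an assumption in this sketch).
    """
    tp = fp = fn = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            is_pred = p >= threshold
            is_gt = g >= 0.5
            if is_pred and is_gt:
                tp += 1
            elif is_pred:
                fp += 1
            elif is_gt:
                fn += 1
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return (1 + beta2) * precision * recall / (beta2 * precision + recall)


# Toy 2x2 example: one correct foreground pixel and one false positive.
prediction = [[1.0, 1.0], [0.0, 0.0]]
ground_truth = [[1.0, 0.0], [0.0, 0.0]]
print(mae(prediction, ground_truth))        # 0.25
print(f_measure(prediction, ground_truth))  # about 0.565
```

Lower MAE is better, while a higher F-measure is better. Benchmarks often sweep the threshold over the full range rather than fixing it, which this sketch does not do.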
Note: This version provides only the implicit manner of learning attention shift. The explicit way of training this model will not be released for commercial reasons (Hua Wei, IIAI).
Leaderboard.
Figure 4: Summary of 36 previous representative VSOD methods and the proposed SSAV model.
Benchmark.
Figure 5: Benchmarking results of 17 state-of-the-art VSOD models on 7 datasets.
Performance Preview
Figure 6: Blackswan sequence result generated by our SSAV model. More results can be found in supplemental materials.
DAVSOD Dataset Profile
Figure 7: Samples from our dataset, with instance-level ground truth segmentation and fixation map.
Figure 8: Examples to show our rigorous standards for segmentation labeling.
Figure 9: Example sequence of saliency shift considered in the proposed DAVSOD dataset. Unlike traditional work, which labels all the salient objects (2nd row) from static frames, the proposed DAVSOD dataset (5th row) was strictly annotated according to real human fixation records (3rd row), thus revealing the dynamic human attention mechanism.
Watch the video for Supplemental Material
If you find this useful, please cite the following works:
@InProceedings{Fan_2019_CVPR,
author = {Fan, Deng-Ping and Wang, Wenguan and Cheng, Ming-Ming and Shen, Jianbing},
title = {Shifting More Attention to Video Salient Object Detection},
booktitle = {IEEE CVPR},
year = {2019}
}
@inproceedings{song2018pyramid,
title={Pyramid dilated deeper ConvLSTM for video salient object detection},
author={Song, Hongmei and Wang, Wenguan and Zhao, Sanyuan and Shen, Jianbing and Lam, Kin-Man},
booktitle={ECCV},
pages={715--731},
year={2018}
}
Contact: Deng-Ping Fan (dengpingfan@mail.nankai.edu.cn).