DiffMOT (CVPR2024)

DiffMOT: A Real-time Diffusion-based Multiple Object Tracker with Non-linear Prediction (arXiv paper)

Teaser

Framework


News

Tracking performance

Benchmark Evaluation

Dataset HOTA IDF1 AssA MOTA DetA Weight Results
DanceTrack 62.3 63.0 47.2 92.8 82.5 download DanceTrack_Results
SportsMOT 76.2 76.1 65.1 97.1 89.3 download SportsMOT_Results
MOT17 64.5 79.3 64.6 79.8 64.7 download MOT17_Results
MOT20 61.7 74.9 60.5 76.7 63.2 download MOT20_Results

Results on DanceTrack test set with different detectors

Detector HOTA IDF1 MOTA FPS
YOLOX-S 53.3 56.6 88.4 30.3
YOLOX-M 57.2 58.6 91.2 25.4
YOLOX-L 61.5 61.7 92.0 24.2
YOLOX-X 62.3 63.0 92.8 22.7

The reported tracking speed (detection plus tracking) is measured on an RTX 3090 GPU. Smaller detectors achieve higher FPS, which means DiffMOT can flexibly pair with different detectors for various real-world application scenarios. With YOLOX-S, the entire system reaches up to 30.3 FPS.
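As a rough illustration of how such end-to-end FPS numbers are obtained, the sketch below times detection plus tracking over a sequence. It is only a sketch: run_detector and run_tracker are hypothetical stand-ins for the YOLOX inference and the DiffMOT tracking step, not functions from this repository.

import time

def measure_fps(frames, run_detector, run_tracker):
    # Times the full per-frame pipeline: detector forward pass plus
    # motion prediction / association. Both callables are placeholders.
    start = time.perf_counter()
    for frame in frames:
        detections = run_detector(frame)   # detector forward pass
        run_tracker(frame, detections)     # motion prediction + association
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed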

Video demos

I. Installation.

conda create -n diffmot python=3.9
conda activate diffmot
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2
pip install -r requirement.txt
cd external/YOLOX/
pip install -r requirements.txt && python setup.py develop
cd ../deep-person-reid/
pip install -r requirements.txt && python setup.py develop
cd ../fast_reid/
pip install -r docs/requirements.txt
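After the last step, a quick sanity check (a minimal sketch, not a script shipped with DiffMOT) confirms that PyTorch and the GPU are visible before preparing the data:

# Illustrative environment check; not part of the repository.
import torch
import torchvision

print("torch:", torch.__version__)              # expect 2.0.1
print("torchvision:", torchvision.__version__)  # expect 0.15.2
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))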

II. Prepare Data.

The file structure should look like:

{DanceTrack ROOT}
|-- dancetrack
|   |-- train
|   |   |-- dancetrack0001
|   |   |   |-- img1
|   |   |   |   |-- 00000001.jpg
|   |   |   |   |-- ...
|   |   |   |-- gt
|   |   |   |   |-- gt.txt            
|   |   |   |-- seqinfo.ini
|   |   |-- ...
|   |-- val
|   |   |-- ...
|   |-- test
|   |   |-- ...
{SportsMOT ROOT}
|-- sportsmot
|   |-- splits_txt
|   |-- scripts
|   |-- dataset
|   |   |-- train
|   |   |   |-- v_1LwtoLPw2TU_c006
|   |   |   |   |-- img1
|   |   |   |   |   |-- 000001.jpg
|   |   |   |   |   |-- ...
|   |   |   |   |-- gt
|   |   |   |   |   |-- gt.txt
|   |   |   |   |-- seqinfo.ini         
|   |   |   |-- ...
|   |   |-- val
|   |   |   |-- ...
|   |   |-- test
|   |   |   |-- ...
{MOT17/20 ROOT}
|-- mot
|   |-- train
|   |   |-- MOT17-02
|   |   |   |-- img1
|   |   |   |   |-- 000001.jpg
|   |   |   |   |-- ...
|   |   |   |-- gt
|   |   |   |   |-- gt.txt            
|   |   |   |-- seqinfo.ini
|   |   |-- ...
|   |   |-- MOT20-01
|   |   |   |-- img1
|   |   |   |   |-- 000001.jpg
|   |   |   |   |-- ...
|   |   |   |-- gt
|   |   |   |   |-- gt.txt            
|   |   |   |-- seqinfo.ini
|   |   |-- ...
|   |-- test
|   |   |-- ...

and run:

python dancetrack_data_process.py
python sports_data_process.py
python mot_data_process.py
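To verify that your directories match the layout above, a small check like the sketch below can catch a misplaced split. It only assumes the tree shown earlier; the root paths are placeholders to adapt to wherever the datasets were unpacked.

# Illustrative layout check; adjust the placeholder roots to your setup.
import os

expected = {
    "dancetrack": ["train", "val", "test"],
    "sportsmot/dataset": ["train", "val", "test"],
    "mot": ["train", "test"],
}

for root, splits in expected.items():
    for split in splits:
        path = os.path.join(root, split)
        print(path, "ok" if os.path.isdir(path) else "MISSING")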

III. Model ZOO.

Detection Model

We provide trained YOLOX weights for DiffMOT (see the download link). Some of them are inherited from ByteTrack, DanceTrack, and MixSort.

ReID Model

Our ReID models for MOT17/MOT20 are the same as those used in BoT-SORT; you can download them from MOT17-SBS-S50 and MOT20-SBS-S50. The ReID model for DanceTrack is the same as the one used in Deep-OC-SORT; you can download it from Dance-SBS-S50. The ReID model for SportsMOT is trained by ourselves; you can download it from Sports-SBS-S50.

Notes:

Motion Model (D$^2$MP)

Refer to the models download. We train the motion model on DanceTrack and MOT17/20 for 800 epochs, and on SportsMOT for 1200 epochs.

IV. Training.

Train the detection model

Train the ReID model

Train the motion model (D$^2$MP)

python main.py --config ./configs/dancetrack.yaml
python main.py --config ./configs/sportsmot.yaml
python main.py --config ./configs/mot.yaml

V. Tracking.

Prepare detections

Prepare ReID embeddings

Track on DanceTrack

python main.py --config ./configs/dancetrack_test.yaml

Track on SportsMOT

python main.py --config ./configs/sportsmot_test.yaml

Track on MOT17

python main.py --config ./configs/mot17_test.yaml

Track on MOT20

python main.py --config ./configs/mot20_test.yaml
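The commands above write per-sequence result files. Assuming they follow the standard MOTChallenge text format (frame, id, x, y, w, h, score, -1, -1, -1), which is the format the benchmark result files linked above are evaluated in, a result file can be inspected with a few lines; the file name below is a placeholder:

# Minimal sketch assuming the standard MOTChallenge result layout:
# frame, id, x, y, w, h, score, -1, -1, -1 (comma-separated).
import csv
from collections import defaultdict

tracks = defaultdict(list)              # track id -> list of (frame, box)
with open("dancetrack0003.txt") as f:   # placeholder result file
    for row in csv.reader(f):
        frame, track_id = int(float(row[0])), int(float(row[1]))
        x, y, w, h = map(float, row[2:6])
        tracks[track_id].append((frame, (x, y, w, h)))

print(len(tracks), "tracks")
for tid, boxes in sorted(tracks.items())[:3]:
    print("track", tid, "spans", len(boxes), "frames; first box", boxes[0])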

Contact

If you have any questions, please contact kroery@shu.edu.cn.

Acknowledgement

A large part of the code is borrowed from DDM-Public and Deep-OC-SORT. Thanks for their wonderful work.

Citation

@inproceedings{lv2024diffmot,
  title={DiffMOT: A Real-time Diffusion-based Multiple Object Tracker with Non-linear Prediction},
  author={Lv, Weiyi and Huang, Yuhang and Zhang, Ning and Lin, Ruei-Sung and Han, Mei and Zeng, Dan},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={19321--19330},
  year={2024}
}