Multi-Camera Vehicle Tracking and Re-identification

This repository contains our source code of Track 3 in the NVIDIA AI City Challenge Workshop at CVPR 2018.

[Full source code], [Slides], [Paper], [Poster], [Project Page], [2018 NVIDIA AI City Challenge]

How It Works

We achieve multi-camera vehicle tracking and re-identification by fusing histogram-based adaptive appearance models, DCNN features, detected license plates, detected car types, and travel-time information.
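As a rough illustration of this kind of late fusion, the sketch below combines per-cue similarity scores with a weighted average. The cue names, weights, and score values are illustrative assumptions, not the values used in our method:

```python
# Hypothetical sketch of late score fusion across matching cues.
# Cue names and weights are illustrative, not the authors' actual values.
def fuse_similarity(scores, weights=None):
    """Combine per-cue similarities (each in [0, 1]) into a single score."""
    if weights is None:
        weights = {cue: 1.0 for cue in scores}  # default: equal weights
    total = sum(weights[cue] for cue in scores)
    return sum(weights[cue] * s for cue, s in scores.items()) / total

score = fuse_similarity({
    "appearance_hist": 0.82,  # histogram-based adaptive appearance model
    "dcnn_feature":    0.74,  # deep feature similarity
    "car_type":        1.00,  # same detected car type
    "travel_time":     0.60,  # plausibility of the travel-time gap
})
```

In practice a hard constraint (e.g. a license plate mismatch) would typically veto a candidate pair outright rather than being averaged in.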

Getting Started

Prerequisites

The code has been tested on Ubuntu 16.04.

Dataset

Track 3 dataset

The Track 3 dataset contains 15 videos from 4 different locations, each around 0.5 to 1.5 hours long, recorded at 30 fps and 1080p (1920×1080) resolution. The task is to identify all vehicles that pass through each recorded location at least once in the given set of videos. The camera locations and their linked areas are shown below:

UA-DETRAC dataset

The UA-DETRAC dataset includes 10 hours of videos captured with a Canon EOS 550D camera at 24 different locations in Beijing and Tianjin, China. The videos are recorded at 25 frames per second (fps) at a resolution of 960×540 pixels. The dataset contains more than 140 thousand frames and 8,250 manually annotated vehicles, for a total of 1.21 million labeled object bounding boxes. The primary aim of this dataset is to train vehicle detection algorithms.

CompCars dataset

The Comprehensive Cars (CompCars) dataset contains data from two scenarios: web-nature and surveillance-nature images. The web-nature data covers 163 car makes with 1,716 car models. The dataset is well prepared for a range of computer vision tasks.

You can find a pre-trained model here. Please refer to the dataset website and follow the authors' instructions to download the dataset.

BoxCars dataset

The BoxCars dataset contains 116k images of vehicles with fine-grained labels, captured by surveillance cameras from various viewpoints.

Input/Output Format

Simply run bash src/run_all.sh from the command line. The input is the single-camera tracking results for all 15 videos, produced by our Track 1 method. The format of each line is as follows:

<video_id> <frame_id> <obj_id> <xmin> <ymin> <xmax> <ymax> <speed> <confidence>
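Assuming whitespace-separated fields, a minimal parser for one line of this input format could look like the following sketch (the record type and field types are assumptions based on the field names):

```python
from typing import NamedTuple

# Hypothetical record type mirroring the input line layout above.
class TrackRecord(NamedTuple):
    video_id: int
    frame_id: int
    obj_id: int
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    speed: float
    confidence: float

def parse_track_line(line: str) -> TrackRecord:
    """Parse one whitespace-separated single-camera tracking line."""
    v, f, o, x1, y1, x2, y2, spd, conf = line.split()
    return TrackRecord(int(v), int(f), int(o),
                       float(x1), float(y1), float(x2), float(y2),
                       float(spd), float(conf))

# Example with made-up values:
rec = parse_track_line("1 120 7 100.0 200.0 180.0 260.0 35.5 0.98")
```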

The output lists all possible candidates, which will be used for license plate comparison. The format of each line is as follows:

<img_path> <similarity>
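A simple way to consume this output is to parse each line and keep only candidates above a similarity threshold; the threshold value and file paths below are illustrative assumptions:

```python
def top_candidates(lines, threshold=0.5):
    """Parse '<img_path> <similarity>' lines, keep those at or above the
    threshold, and sort by similarity, highest first.
    The threshold value is an illustrative assumption."""
    pairs = []
    for line in lines:
        # rsplit from the right so image paths containing spaces still parse
        path, sim = line.rsplit(maxsplit=1)
        pairs.append((path, float(sim)))
    return sorted((p for p in pairs if p[1] >= threshold),
                  key=lambda p: p[1], reverse=True)

# Example with made-up paths and scores:
cands = top_candidates([
    "crops/cam1/obj7_f120.jpg 0.91",
    "crops/cam2/obj3_f980.jpg 0.42",
])
```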

Demo [video]

Reference

Please cite this paper in your publications if it helps your research:

@inproceedings{tang2018vehicle,
  author = {Zheng Tang and Gaoang Wang and Hao Xiao and Aotian Zheng and Jenq-Neng Hwang},
  booktitle = {CVPR Workshop (CVPRW) on the AI City Challenge},
  title = {Single-camera and Inter-camera Vehicle Tracking and 3D Speed Estimation Based on Fusion of Visual and Semantic Features},
  year = {2018},
  pages = {108--115}
}