DISB: DNN Inference Serving Benchmark

DISB is a DNN Inference Serving Benchmark with diverse workloads and models. It was originally designed to simulate real-time scenarios, e.g. autonomous driving systems, where both low latency and high throughput are demanded.

DISB uses a client-server architecture: clients send DNN inference requests to the server via RPC, and the server returns the inference results. Clients can submit inference requests periodically or randomly. An inference request may contain the model name (or id), the input data, and other customized attributes (e.g., priority or deadline).
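
As a rough illustration of the request contents described above, the sketch below models a request as a plain C++ struct. The struct name, field names, attribute keys, and the `attributeOr` helper are all assumptions made for this example; they are not taken from libdisb, and the actual RPC payload may be laid out differently.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical request layout for illustration only; not DISB's wire format.
struct InferenceRequest {
    std::string model;                              // model name or id
    std::vector<std::uint8_t> input;                // raw input data
    std::map<std::string, std::string> attributes;  // optional extras, e.g. "priority"
};

// Returns the attribute stored under `key`, or `fallback` if the client
// did not set that attribute on the request.
std::string attributeOr(const InferenceRequest& req,
                        const std::string& key,
                        const std::string& fallback) {
    const auto it = req.attributes.find(key);
    return it == req.attributes.end() ? fallback : it->second;
}
```

Keeping the customized attributes in a string map lets a server ignore keys it does not understand, which is one simple way to keep priority or deadline optional.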

Note: Please use git lfs to clone this repo in order to download model files.

DISB Toolkit

DISB provides a C++ library (libdisb) to perform benchmarking. To integrate your own DNN inference system with DISB, you only need to implement DISB::Client to wrap your inference interface. See usage for details.
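
The wrapping idea can be sketched as follows. This is not the real DISB::Client interface: `sketch::Client`, its method names, and the `MyEngine` backend are hypothetical stand-ins chosen only to show the shape of an adapter between a benchmark client and an existing inference call. Refer to the usage documentation for the actual methods to override.

```cpp
#include <string>
#include <vector>

namespace sketch {

// Stand-in for the role DISB::Client plays; method names are hypothetical.
class Client {
public:
    virtual ~Client() = default;
    virtual void prepareInput() = 0;         // stage input for the next request
    virtual void infer() = 0;                // run one inference to completion
    virtual std::string result() const = 0;  // summarize the last output
};

// A made-up backend standing in for "your inference interface".
struct MyEngine {
    std::vector<float> run(const std::vector<float>& input) const {
        std::vector<float> out;
        out.reserve(input.size());
        for (float v : input) out.push_back(v * 2.0f);  // placeholder computation
        return out;
    }
};

// Adapter: forwards each benchmark step to the backend.
class MyEngineClient : public Client {
public:
    void prepareInput() override { input_ = {1.0f, 2.0f, 3.0f}; }
    void infer() override { output_ = engine_.run(input_); }
    std::string result() const override {
        return "outputs=" + std::to_string(output_.size());
    }

private:
    MyEngine engine_;
    std::vector<float> input_;
    std::vector<float> output_;
};

}  // namespace sketch
```

The adapter keeps all backend-specific state private, so the benchmark driver only ever sees the abstract client type.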

DISB Workloads

Currently, DISB provides 6 workloads with different DNN models and different numbers of clients.

There are five patterns for submitting inference requests in DISB clients:

  1. Uniform Distribution (U): The client sends inference requests periodically, with a fixed frequency (e.g., 20 reqs/s). This pattern is common in data-driven applications (e.g., obstacle detection with cameras).
  2. Poisson Distribution (P): The client sends inference requests following a Poisson process with a given average arrival rate (e.g., 25 reqs/s). This pattern can simulate event-driven applications (e.g., speech recognition).
  3. Closed-loop (C): The client continuously sends inference requests, which simulates a contention load.
  4. Trace (T): The client sends inference requests according to a given trace file, which contains a series of request time points. This pattern can reproduce real-world workloads.
  5. Dependent (D): The client sends an inference request once all of its prior tasks have completed; prior tasks can belong to other clients. This pattern can simulate an inference graph (or inference DAG), where a model needs the output of another model as its input.
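
To make the difference between the Uniform and Poisson patterns concrete, here is a minimal sketch that generates request send times (in seconds from the start of a run) for each. The function names are hypothetical and this is not how libdisb is implemented; it only relies on the standard fact that a Poisson arrival process has exponentially distributed gaps with mean 1/rate.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helpers for illustration; not part of libdisb.

// Uniform (U): requests are spaced exactly 1/rate seconds apart.
std::vector<double> uniformArrivals(double ratePerSec, std::size_t count) {
    std::vector<double> times;
    times.reserve(count);
    const double interval = 1.0 / ratePerSec;
    for (std::size_t i = 0; i < count; ++i) {
        times.push_back(static_cast<double>(i) * interval);
    }
    return times;
}

// Poisson (P): gaps between requests are drawn from an exponential
// distribution whose mean is 1/rate, so the average rate matches ratePerSec.
std::vector<double> poissonArrivals(double ratePerSec, std::size_t count,
                                    unsigned seed) {
    std::mt19937 rng(seed);
    std::exponential_distribution<double> gap(ratePerSec);
    std::vector<double> times;
    times.reserve(count);
    double now = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        now += gap(rng);
        times.push_back(now);
    }
    return times;
}
```

With a rate of 20 reqs/s, `uniformArrivals` yields one request every 50 ms, while `poissonArrivals` at the same rate produces bursts and gaps around that average. A Trace client would read such a list of time points from a file instead of generating it.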

We combined these patterns into 6 typical workloads for benchmarking; see workloads for details.

[TBD] We're still working on providing more representative and general DNN inference serving workloads.

Build & Install

Install dependencies:

sudo apt install build-essential cmake
sudo apt install libjsoncpp-dev

Build and install DISB tools:

# will build and install into disb/install
make build

Usage

Samples

Benchmark Results

We have integrated DISB with several mainstream DNN inference serving frameworks, including:

We tested these DNN inference serving frameworks under 6 DISB Workloads. Test results are shown in results.md.

[TBD] We're still working on supporting more DNN inference serving frameworks.

Paper

If you use DISB in your research, please cite our paper:

@inproceedings {osdi2022reef,
  author = {Mingcong Han and Hanze Zhang and Rong Chen and Haibo Chen},
  title = {Microsecond-scale Preemption for Concurrent {GPU-accelerated} {DNN} Inferences},
  booktitle = {16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22)},
  year = {2022},
  isbn = {978-1-939133-28-1},
  address = {Carlsbad, CA},
  pages = {539--558},
  url = {https://www.usenix.org/conference/osdi22/presentation/han},
  publisher = {USENIX Association},
  month = jul,
}

The Team

DISB is developed and maintained by members from IPADS@SJTU and Shanghai AI Laboratory. See Contributors.

Contact Us

If you have any questions about DISB, feel free to contact us.

Weihang Shen: shenwhang@sjtu.edu.cn

Mingcong Han: mingconghan@sjtu.edu.cn

Rong Chen: rongchen@sjtu.edu.cn

License

DISB is released under the Apache License 2.0.