MiniAutoML

This repository contains source code examples for Engineering Deep Learning Systems. Purchase a copy here.

To demonstrate design principles introduced in the book, we built this mini system (5 micro-services) with Java (web API) and Python (model training code). The system can run on Mac or Linux if you have Docker and Kubernetes (optional) installed, and it addresses the core requirements of a machine learning system: dataset management, model training and model serving.

Learning a deep learning system can be intimidating. The complexity of web service setup, the intricacies of business logic, and the variety of libraries and frameworks to install all get in the way of reaching the core of a deep learning system. To give you easy access to the core implementation, we did several things to keep this demo system simple:

  1. Use gRPC to build the web API to avoid boilerplate code.
  2. Offer shell scripts to run the system and interact with services.
  3. Keep the code simple and short. Each service of the system is built with fewer than a few thousand lines of code. We keep only the bare minimum of code needed to demonstrate the design principles.

System Overview

Our mini deep learning system consists of four services and one storage system.

In the book Engineering Deep Learning Systems, each of these services gets its own chapter. To play with the system locally, please install the system requirements and then follow the instructions in the lab section.

System Requirements

The scripts in the scripts folder do not install the system requirements. Please make sure those requirements are met before executing any of those scripts.
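As a quick sanity check before running anything, the sketch below reports which command-line tools are missing from your PATH. It checks Docker, grpcurl and jq, which the lab section names as requirements. The function name check_tools is hypothetical and not part of this repository, and MinIO is left out on the assumption that it may run as a container rather than as a local binary.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): report which of the
# given command-line tools are available on PATH.
# Returns 0 when every tool is found, 1 if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# MinIO is omitted here on the assumption that it is not a local binary.
check_tools docker grpcurl jq || echo "Install the missing tools before running the lab scripts."
```

If any line starts with "missing:", install that tool first; the check itself changes nothing on your machine.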

Module List

In the root folder you'll find a Maven project description file pom.xml, which describes a multi-module Java project.

Lab

After installing the system requirements (Docker, MinIO, grpcurl and jq), you can use our lab scripts to set up the sample deep learning system locally and start to play with it, for example by building an NLP model. By executing these scripts one by one, you will see a complete deep learning cycle, from data collection to model training and model serving.

It is strongly recommended to create and activate a brand new Conda environment before running any lab scripts, to avoid long setup times and possible failures. You can do so by running:

conda create --name miniautoml-lab
conda activate miniautoml-lab

Please run the lab scripts from the root folder of the project, that is, the folder where you downloaded this repository.

The lab scripts are located in the scripts folder, and their names start with the word "lab". Please execute them in order. See the explanation of each script below.
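To see which lab scripts exist before running them, a small sketch like the following lists every file in the scripts folder whose name starts with "lab". The function name list_lab_scripts is hypothetical. The sketch assumes that the sorted listing matches the intended execution order, which only holds if the script names sort that way. It deliberately does not execute anything, since individual scripts may expect arguments.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): print the lab scripts
# found in a folder, one per line, in the shell's sorted glob order.
list_lab_scripts() {
    dir="${1:-scripts}"
    for script in "$dir"/lab*; do
        # An unmatched glob stays literal, so skip anything that is not a file.
        [ -f "$script" ] && echo "$script"
    done
    return 0
}

# Run from the root folder of the project.
list_lab_scripts scripts
```

Each listed script can then be run by hand from the root folder, one at a time.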

Next Steps

  1. Look at the service definitions in grpc-contract
  2. Play with data-management
  3. Play with training-service