This repository contains the implementation of "Privacy Preserving Vertical Federated Learning for Tree-based Models". The paper proposes a private and efficient solution for tree-based models, including decision tree (DT), random forest (RF), and gradient boosting decision tree (GBDT), under the vertical federated learning (VFL) setting. The solution is based on a hybrid of threshold partially homomorphic encryption (TPHE) and secure multiparty computation (MPC) techniques.
You can build the Docker image using tools/docker/Dockerfile (tested on Ubuntu 20.04), or download the pre-built image from Docker Hub.
After building the image, follow the steps in tools/docker/README.md to run the test on a single machine.
If you want to build from source, you can follow the steps in tools/docker/Dockerfile, but you need to update some configurations on your host machine.
data/networks/Parties.txt: defines the participating parties' IP addresses and ports
src/include/common.h:
  DEFAULT_PARAM_DATA_FILE: the SPDZ-related party file (in the Pivot-SPDZ folder)
  SPDZ_PORT_NUM_DT: the port for connecting to the SPDZ decision tree MPC program
  SPDZ_PORT_NUM_DT_ENHANCED: the port for connecting to the SPDZ decision tree prediction of the enhanced protocol
src/include/common.h: e.g., the number of parties
Programs/Source/vfl_decision_tree.mpc:
  PORT_NUM: same as SPDZ_PORT_NUM_DT
  MAX_NUM_CLIENTS: the maximum number of clients that can be handled
  MAX_CLASSES_NUM: the maximum number of classes for classification (for regression, the default is 2)
Programs/Source/vfl_dt_enhanced_prediction.mpc:
  PORT_NUM: same as SPDZ_PORT_NUM_DT_ENHANCED
  MAX_NUM_CLIENTS: the maximum number of clients that can be handled
  MAX_TREE_DEPTH: the maximum depth of the evaluated tree, must be the same as in Pivot
  TESTING_NUM: the number of samples in the testing stage, must be exact at the moment
fast-make.sh: modify Setup.x and setup-online.sh (the security parameter is 128 bits)
MY_CFLAGS = -DINSECURE is in the CONFIG.mine file (for running the fake online protocol)
Run make mpir to generate the required mpir library;
Run bash fast-make.sh to generate the prerequisite programs and parameters;
./compile.py ${PIVOT_SPDZ_HOME}/Programs/Source/vfl_decision_tree.mpc
./compile.py ${PIVOT_SPDZ_HOME}/Programs/Source/vfl_dt_enhanced_prediction.mpc
mkdir build
cmake -Bbuild -H.
cd build/
make
./semi-party.x -F -N 3 -I -p 0 vfl_decision_tree
./semi-party.x -F -N 3 -I -p 1 vfl_decision_tree
./semi-party.x -F -N 3 -I -p 2 vfl_decision_tree
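The three commands above differ only in the party index passed to -p, so they can be generated rather than typed by hand. The following is a minimal sketch, not part of the repository: the function name spdz_party_cmd and the NUM_PARTIES variable are hypothetical, and the script only prints the commands so that each one can be started in its own terminal from the directory where semi-party.x was built.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): print the semi-party.x
# command line for one party, with the same flags as the commands above.
# Arguments: party index, number of parties, MPC program name.
spdz_party_cmd() {
  echo "./semi-party.x -F -N $2 -I -p $1 $3"
}

NUM_PARTIES="${NUM_PARTIES:-3}"   # assumption: three parties, as above

# Print one command per party; run each printed command in its own terminal.
p=0
while [ "$p" -lt "$NUM_PARTIES" ]; do
  spdz_party_cmd "$p" "$NUM_PARTIES" vfl_decision_tree
  p=$((p + 1))
done
```

Keeping the party count in one variable makes it harder for the -N value and the range of -p indices to drift apart when the number of parties changes.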
./Pivot --client-id 0 --client-num 3 --class-num 2 --algorithm-type 0 \
    --tree-type 0 --solution-type 0 --optimization-type 1 \
    --network-file ${PIVOT_HOME}/data/networks/Parties.txt \
    --data-file ${PIVOT_HOME}/data/bank_marketing_data/client_0.txt \
    --logger-file ${PIVOT_HOME}/log/release_test/bank_marketing_data \
    --max-bins 16 --max-depth 3 --num-trees 1
./Pivot --client-id 1 --client-num 3 --class-num 2 --algorithm-type 0 \
    --tree-type 0 --solution-type 0 --optimization-type 1 \
    --network-file ${PIVOT_HOME}/data/networks/Parties.txt \
    --data-file ${PIVOT_HOME}/data/bank_marketing_data/client_1.txt \
    --logger-file ${PIVOT_HOME}/log/release_test/bank_marketing_data \
    --max-bins 16 --max-depth 3 --num-trees 1
./Pivot --client-id 2 --client-num 3 --class-num 2 --algorithm-type 0 \
    --tree-type 0 --solution-type 0 --optimization-type 1 \
    --network-file ${PIVOT_HOME}/data/networks/Parties.txt \
    --data-file ${PIVOT_HOME}/data/bank_marketing_data/client_2.txt \
    --logger-file ${PIVOT_HOME}/log/release_test/bank_marketing_data \
    --max-bins 16 --max-depth 3 --num-trees 1
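The Pivot commands above are identical except for the client id, which appears both in --client-id and in the data file name. As a hedged sketch (the function name pivot_client_cmd, the NUM_CLIENTS variable, and the placeholder fallback path are assumptions, not part of the repository), the per-client command can be generated like this and then run in one terminal per client:

```shell
#!/bin/sh
# Placeholder fallback only; set PIVOT_HOME to your own checkout.
PIVOT_HOME="${PIVOT_HOME:-/path/to/Pivot}"

# Hypothetical helper (not part of the repository): print the Pivot command
# for one client, with the same option values as the commands above.
# Arguments: client id, number of clients.
pivot_client_cmd() {
  echo "./Pivot --client-id $1 --client-num $2 --class-num 2 --algorithm-type 0" \
    "--tree-type 0 --solution-type 0 --optimization-type 1" \
    "--network-file ${PIVOT_HOME}/data/networks/Parties.txt" \
    "--data-file ${PIVOT_HOME}/data/bank_marketing_data/client_$1.txt" \
    "--logger-file ${PIVOT_HOME}/log/release_test/bank_marketing_data" \
    "--max-bins 16 --max-depth 3 --num-trees 1"
}

NUM_CLIENTS="${NUM_CLIENTS:-3}"   # assumption: three clients, as above

# Print one command per client; run each printed command in its own terminal.
c=0
while [ "$c" -lt "$NUM_CLIENTS" ]; do
  pivot_client_cmd "$c" "$NUM_CLIENTS"
  c=$((c + 1))
done
```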
For the enhanced protocol, also start the SPDZ program vfl_dt_enhanced_prediction for the model prediction stage:
./semi-party.x -F -N 3 -I -p 0 -pn 6000 vfl_dt_enhanced_prediction
./semi-party.x -F -N 3 -I -p 1 -pn 6000 vfl_dt_enhanced_prediction
./semi-party.x -F -N 3 -I -p 2 -pn 6000 vfl_dt_enhanced_prediction
The -pn parameter is the port for vfl_dt_enhanced_prediction connections (if not specified, the default is 5000, which is the port used for vfl_decision_tree).
If you use our code in your research, please kindly cite:
@article{DBLP:journals/pvldb/WuCXCO20,
author = {Yuncheng Wu and
Shaofeng Cai and
Xiaokui Xiao and
Gang Chen and
Beng Chin Ooi},
title = {Privacy Preserving Vertical Federated Learning for Tree-based Models},
journal = {Proc. {VLDB} Endow.},
volume = {13},
number = {11},
pages = {2090--2103},
year = {2020}
}
To ask questions or report issues, please drop us an email.