This artifact is for the paper "Demystifying the Dependency Challenge in Kernel Fuzzing". Fuzz testing operating system kernels remains a daunting task to date. One known challenge is that much of the kernel code is locked under specific kernel states, and current kernel fuzzers are not effective in exploring such an enormous state space. We refer to this problem as the dependency challenge. Though there have been some efforts to address the dependency challenge, the prevalence and categorization of dependencies have never been studied. Most prior work simply attempted to recover dependencies opportunistically whenever they were relatively easy to recognize. We undertake a substantial measurement study to systematically understand the real challenge behind dependencies. In short, the artifact helps researchers understand the dependency challenge in kernel fuzzing.
https://doi.org/10.5281/zenodo.6029158
https://drive.google.com/drive/folders/1Ts4P4iC2PHihtBviSXMUkn3My0PLkowN?usp=sharing
https://doi.org/10.5281/zenodo.6029520
https://github.com/ZHYfeng/Dependency
https://doi.org/10.5281/zenodo.5441138
data.tar.gz in https://drive.google.com/drive/folders/1Ts4P4iC2PHihtBviSXMUkn3My0PLkowN?usp=sharing
sudo apt install -y git
git clone https://github.com/ZHYfeng/Dependency.git
cd Dependency
bash build_script/build.bash
path-of-Dependency/workdir/image
doc of syzkaller: https://github.com/google/syzkaller/blob/master/docs/linux/setup_ubuntu-host_qemu-vm_x86-64-kernel.md
the image we build: image.tar.gz in https://drive.google.com/drive/folders/1Ts4P4iC2PHihtBviSXMUkn3My0PLkowN?usp=sharing
add -fsanitize-coverage=no-prune to CFLAGS_KCOV in kernel config
path-of-Dependency/workdir/13-linux-clang-np
the kernel we build: linux-clang-np.tar.gz in
https://drive.google.com/drive/folders/1Ts4P4iC2PHihtBviSXMUkn3My0PLkowN?usp=sharing
-fembed-bitcode -save-temps=obj
https://github.com/ZHYfeng/Generate_Linux_Kernel_Bitcode/tree/master/Achieve/01-change-makefile
the bitcode we build: linux-clang-np-bc-f.tar.gz in https://drive.google.com/drive/folders/1Ts4P4iC2PHihtBviSXMUkn3My0PLkowN?usp=sharing
cd path-of-Dependency/workdir/13-linux-clang-np
objdump -d vmlinux > vmlinux.objdump
a2l -objdump=vmlinux.objdump
the workdir we prepare: workdir.tar.gz in
https://drive.google.com/drive/folders/1Ts4P4iC2PHihtBviSXMUkn3My0PLkowN?usp=sharing
dev_xxx in path-of-Dependency/workdir
built-in.bc and built-in.s
path-of-Dependency/04-experiment_script/json/dra.json and path-of-Dependency/04-experiment_script/json/syzkaller.json.
change the value of file_bc in dra.json to the relative path of the bitcode of the device driver you test
change the value of path_s in dra.json to the relative path of the device driver you test
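The two edits to dra.json can also be scripted. The following is a minimal sketch, assuming that dra.json is a JSON object with file_bc and path_s as top-level keys (the layout of the file is not shown here); the helper name set_driver_paths and the example paths are hypothetical.

```python
import json
from pathlib import Path


def set_driver_paths(dra_json: Path, bitcode_rel: str, driver_rel: str) -> dict:
    """Update the two driver-specific fields in dra.json and write the file back.

    Assumes file_bc and path_s are top-level keys; all other keys are preserved.
    """
    config = json.loads(dra_json.read_text())
    config["file_bc"] = bitcode_rel  # relative path of the driver's bitcode
    config["path_s"] = driver_rel    # relative path of the driver under test
    dra_json.write_text(json.dumps(config, indent=2) + "\n")
    return config


if __name__ == "__main__":
    # Hypothetical paths, for illustration only; adjust them to your checkout.
    target = Path("04-experiment_script/json/dra.json")
    if target.exists():
        set_driver_paths(target, "drivers/cdrom/built-in.bc", "drivers/cdrom")
```

Keeping the rest of the object untouched means that fields not mentioned in this document survive the rewrite.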
path-of-Dependency/04-experiment_script/python/run.py
https://zenodo.org/record/5348989/files/static-taint-analysis-component.zip
(the paths are based on the virtual machine)
activate the environment
source /home/icse22ae/Dependency/environment.sh
pick one device driver in /home/icse22ae/Dependency/workdir/workdir, for example cdrom:
cd /home/icse22ae/Dependency/workdir/workdir/dev_cdrom
configure the run script
time_run: the fuzzing time in seconds.
number_execute: the number of fuzzing runs.
number_vm_count: the number of VMs in each fuzzing run.
In our paper, time_run is at least 48 hours, number_execute is 3 and number_vm_count is 32.
For artifact evaluation, number_execute and number_vm_count can be 1. time_run should be at least 5 minutes (20 minutes for the device driver kvm).
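Because time_run is given in seconds while the recommended durations are stated in hours and minutes, the conversion is worth spelling out. This is a sketch of the arithmetic only; the RunConfig class and the constant names are hypothetical, and how run.py actually stores these three values is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    time_run: int         # fuzzing time in seconds
    number_execute: int   # number of fuzzing runs
    number_vm_count: int  # number of VMs in each fuzzing run


def hours(n: float) -> int:
    """Convert hours to whole seconds."""
    return int(n * 3600)


def minutes(n: float) -> int:
    """Convert minutes to whole seconds."""
    return int(n * 60)


# Settings described for the paper (48 hours is the stated minimum).
PAPER = RunConfig(time_run=hours(48), number_execute=3, number_vm_count=32)

# Minimum settings suggested for artifact evaluation.
AE_DEFAULT = RunConfig(time_run=minutes(5), number_execute=1, number_vm_count=1)
AE_KVM = RunConfig(time_run=minutes(20), number_execute=1, number_vm_count=1)

if __name__ == "__main__":
    for name, cfg in [("paper", PAPER), ("ae", AE_DEFAULT), ("ae-kvm", AE_KVM)]:
        print(name, cfg)
```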
run our tool using the script. It will automatically stop after time_run.
python3 run.py
read the results
stay in the same environment as in step 1 and the same path as in step 2.
go run /home/icse22ae/Dependency/03-syzkaller/tools/read_result/ -a2i
Depending on the fuzzing configuration and the device driver, the time required will differ.
For cdrom, it should take several minutes. For kvm, it takes several hours.
You can find the results used in our paper in /home/icse22ae/Dependency/workdir/data.
dataDependency.bin, dataResult.bin, dataRunTime.bin, statistics.bin in ./0 or ./1 or ./2 are the results in protobuf format.
The protobuf files are in
/home/icse22ae/Dependency/05-proto
0_coverage.txt is the coverage of the fuzzing run in ./0. coverage.txt is the average coverage of all runs. Each line is time@number-of-edge.
conditionD.txt lists all unresolved conditions related to dependency.
conditionND.txt lists all unresolved conditions not related to dependency.
conditionDN.txt lists all unresolved conditions related to dependency for which our static analysis cannot find the write statements.
intersection.txt is the intersection coverage of all runs and union_coverage.txt is the union coverage of all runs. Each line is the address of the edge.
OutsideFunctions.txt is the Unreachable Functions Elimination mentioned in our paper.
statistic.txt is the statistic used in our paper.
uncovered.txt lists all uncovered edges and their unresolved conditions, and uncovered_more.txt lists more details about them.
We still use dev_cdrom as the example; the results can be found in data.tar.gz as mentioned in Section Evaluation Data.
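The text files above are simple enough to load with a few lines of Python. The sketch below assumes that both fields of a time@number-of-edge line are numeric (the unit of the time field is not stated here) and that edge addresses are hexadecimal, one per line; the function names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def parse_coverage(lines: Iterable[str]) -> List[Tuple[float, float]]:
    """Parse 'time@number-of-edge' lines into (time, edges) pairs."""
    points = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        time_str, edges_str = line.split("@", 1)
        # float is used because averaged coverage may not be an integer.
        points.append((float(time_str), float(edges_str)))
    return points


def parse_edges(lines: Iterable[str]) -> Set[int]:
    """Parse one hexadecimal edge address per line, e.g. 0xffffffff8579b9b7."""
    return {int(line.strip(), 16) for line in lines if line.strip()}


def intersection_and_union(runs: Iterable[Set[int]]) -> Tuple[Set[int], Set[int]]:
    """Return the edges covered by every run and the edges covered by any run."""
    runs = list(runs)
    if not runs:
        return set(), set()
    return set.intersection(*runs), set.union(*runs)


if __name__ == "__main__":
    print(parse_coverage(["10@250", "20@300"]))
    common, any_run = intersection_and_union(
        [parse_edges(["0x10", "0x20"]), parse_edges(["0x20", "0x30"])]
    )
    print(sorted(hex(a) for a in common), sorted(hex(a) for a in any_run))
```

The set operations mirror how intersection.txt and union_coverage.txt are described, though the tool itself may compute them differently.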
All unresolved conditions related to dependency are in conditionD.txt, for example:
0xffffffff8579b9b7@https://elixir.bootlin.com/linux/v4.16/source/drivers/cdrom/cdrom.c#L2279@0xffffffff8579b960@https://elixir.bootlin.com/linux/v4.16/source/drivers/cdrom/cdrom.c#L2279@mmc_ioctl_cdrom_read_audio@if.end11.i@
@ @0xffffffff857a3eaa@https://elixir.bootlin.com/linux/v4.16/source/drivers/cdrom/cdrom.c#L2124@1@
@ @0xffffffff8579b421@https://elixir.bootlin.com/linux/v4.16/source/drivers/cdrom/cdrom.c#L2228@0@
@ @0xffffffff8579b05a@https://elixir.bootlin.com/linux/v4.16/source/drivers/cdrom/cdrom.c#L2187@1@
0xffffffff8579b9b7 is the assembly address of the unresolved branch in the binary, and https://elixir.bootlin.com/linux/v4.16/source/drivers/cdrom/cdrom.c#L2279 is the source code of the unresolved dependency. 0xffffffff8579b960 is the assembly address of the condition of the unresolved branch, and https://elixir.bootlin.com/linux/v4.16/source/drivers/cdrom/cdrom.c#L2279 is again its source code. if.end11.i is the name of the basic block in LLVM bitcode.
Next lines are the write addresses for the unresolved dependency.
Then we can find a file 0xffffffff8579b9b7.txt, which is named after the assembly address of the unresolved branch. Inside this file, we can find the number of dominator instructions of this unresolved dependency, the inputs (test cases) from syzkaller that can reach the unresolved dependency, and the inputs that can reach the write address. We can also find the call chain of the write address starting from the entry function.
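The record layout described above can be read mechanically by splitting each line on "@". The following is a minimal sketch based only on the example shown; the class and field names are hypothetical, the field called function is an assumption (the example value mmc_ioctl_cdrom_read_audio is not explained in the text), and the meaning of the trailing 0 or 1 on write lines is not documented here, so it is kept as an opaque flag.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class WriteSite:
    address: str     # assembly address of the write
    source_url: str  # source location of the write
    flag: str        # trailing 0/1 field; meaning not documented here


@dataclass
class UnresolvedCondition:
    branch_address: str        # assembly address of the unresolved branch
    branch_source_url: str
    condition_address: str     # assembly address of the branch condition
    condition_source_url: str
    function: str              # assumed to be the enclosing function name
    basic_block: str           # basic block name in LLVM bitcode
    writes: List[WriteSite] = field(default_factory=list)

    @property
    def detail_file(self) -> str:
        """Name of the per-branch detail file, e.g. 0xffffffff8579b9b7.txt."""
        return f"{self.branch_address}.txt"


def parse_conditions(lines: Iterable[str]) -> List[UnresolvedCondition]:
    """Group each header line with the write-address lines that follow it."""
    records: List[UnresolvedCondition] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        parts = [p.strip() for p in line.split("@")]
        if line.startswith("@"):
            # Write line: two empty leading fields, then address, URL, flag.
            if not records:
                raise ValueError("write line appears before any condition line")
            records[-1].writes.append(WriteSite(parts[2], parts[3], parts[4]))
        else:
            records.append(UnresolvedCondition(*parts[:6]))
    return records
```

For the cdrom example, this would yield one record with three write sites, and detail_file gives the name of the companion text file.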
02-dependency
02-dependency/lib/DMM/: mapping between assembly address in the binary and basic block in LLVM bitcode
02-dependency/lib/RPC/: work with fuzzing component (syzkaller) using Protobuf and gRPC
02-dependency/lib/STA/: work with static analysis component using JSON
02-dependency/lib/DCC/: output human-readable information and statistics for unresolved conditions
03-syzkaller
03-syzkaller/syz-fuzzer/: modification for collecting more complete coverage and other related useful information from fuzzing
03-syzkaller/pkg/dra/: work with mapping component and output results using Protobuf and gRPC
05-proto: all Protobuf files