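Clang Static Analyzer is a Clang-based source code analysis framework for C, C++, and Objective-C. It first preprocesses the source file and then uses symbolic execution to explore the whole file. Developers can write their own plugins that interact with the framework through hooks during symbolic execution. The framework exposes a large set of APIs through which developers can obtain a great deal of useful information to help find potential problems.
We have currently implemented three checkers: NewDereferenceChecker, DoubleFreeChecker, and OverflowChecker.
Recommended environment: Ubuntu 16.04 LTS x64
Install cmake and Z3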
apt-get install cmake
git clone https://github.com/Z3Prover/z3.git ~/z3
cd ~/z3
python scripts/mk_make.py
cd build
make
sudo make install
Download the source and build
cd ~
mkdir clang
cd clang
git clone https://github.com/GoSSIP-SJTU/TripleDoggy.git ./llvm
mkdir build
cd build
cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release ../llvm
make
cd ..
# Test NewDereferenceChecker
./build/bin/clang -cc1 -analyze -analyzer-checker=alpha.unix.NewDereference ./llvm/tripledoggy_test/nulldereference.c
# Test DoubleFreeChecker
./build/bin/clang -cc1 -analyze -analyzer-checker=alpha.unix.DoubleFree ./llvm/tripledoggy_test/doublefree.c
# Test OverflowChecker
./build/bin/clang -cc1 -analyze -analyzer-checker=alpha.unix.OverFlow ./llvm/tripledoggy_test/overflow.c
Building on the DereferenceChecker (null pointer dereference) that the Clang project has already developed, we implemented our own NewDereferenceChecker.
First, we observe that, regardless of the operating system and the low-level architecture, pointers in source code originally come from five sources:
We collected several CVEs from 2017 with a score higher than 7.5, inspected them manually, and concluded that the vast majority of null pointer dereference bugs come from a missing check on the return value of a memory allocation function. We therefore made the following assumptions:
Note that case 1 also covers the form return "abc", which is equivalent to taking the address of a global variable. Two aliases are the same pointer, for example **p=q.
Our algorithm is therefore:
We identify memory allocation functions based on several assumptions:
Note that these are empirical assumptions and may change in the future.
We tested the plugin on 8 CVEs and detected 87% of the bugs (all but one), with an average of 6 warnings per CVE file. We even found a previously unreported null pointer dereference bug caused by incorrect initialization of a structure. For the one CVE that was not detected, we debugged the analysis manually and found that the cause was path explosion in symbolic execution.
if (xxx)
    return "sdfsdf";
else
    return NULL;
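To make the bug pattern concrete, here is a minimal C sketch of an allocation whose result can be NULL, together with a caller that performs the check whose absence the conclusion above is about. The names dup_text and first_char are hypothetical and do not come from the project's test files. What the checker actually reports depends on its heuristics, so this only illustrates the pattern.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocation wrapper: like malloc, it may return NULL. */
char *dup_text(const char *src) {
    if (src == NULL)
        return NULL;
    size_t n = strlen(src) + 1;
    char *copy = malloc(n);
    if (copy == NULL)          /* without this check, memcpy would write through NULL */
        return NULL;
    memcpy(copy, src, n);
    return copy;
}

/* Caller that checks the returned pointer before dereferencing it.
 * Returns the first character, or -1 when no copy could be made. */
int first_char(const char *src) {
    char *copy = dup_text(src);
    if (copy == NULL)          /* dropping this check gives the unchecked-return pattern */
        return -1;
    int c = (unsigned char)copy[0];
    free(copy);
    return c;
}
```

Removing either NULL check yields the kind of code in which an unchecked allocation result reaches a dereference.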
A double-free vulnerability, as its name suggests, occurs when the same block of memory is freed more than once. The checker that the Clang project has already developed hooks the memory allocation and release functions to record the state of each memory region, and reports a warning when an already freed region is about to be freed again. In real source files, however, things are more complicated, for two main reasons:
Based on these two reasons, we propose the following analysis for double free, memory leaks, and use after free (UAF). We identify allocation and release functions with the heuristic described above for recognizing memory allocation functions, and define two sets: A, the memory that has been allocated but not yet freed, and B, the memory that has been freed. During symbolic execution, we add a memory region to A when the analysis meets an allocation function. When it meets a release function, we add the region to B and remove it from A if it is there. We report a double-free vulnerability when the analysis tries to free a region that is already in B. When a symbol becomes dead, we check whether it is still in A and, if so, report a memory leak. When memory is accessed, we check whether the region is in B and, if so, report a UAF vulnerability.
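The two-set bookkeeping described above can be sketched in plain C as follows. This is a simplified model under stated assumptions, not the checker's code: regions are represented by integer ids that are assumed to be fresh for every allocation, the sets are fixed-size arrays, and the names Tracker, on_alloc, on_free, on_access, and on_symbol_dead are hypothetical stand-ins for the analyzer's callbacks.

```c
#include <stddef.h>

#define MAX_REGIONS 64

typedef enum {
    REPORT_NONE,
    REPORT_DOUBLE_FREE,
    REPORT_USE_AFTER_FREE,
    REPORT_MEMORY_LEAK
} Report;

typedef struct {
    int ids[MAX_REGIONS];
    size_t count;
} RegionSet;

typedef struct {
    RegionSet allocated; /* set A: allocated, not yet freed */
    RegionSet freed;     /* set B: already freed */
} Tracker;

static int set_contains(const RegionSet *s, int id) {
    for (size_t i = 0; i < s->count; i++)
        if (s->ids[i] == id)
            return 1;
    return 0;
}

static void set_add(RegionSet *s, int id) {
    /* Silently ignores new ids once the fixed capacity is reached. */
    if (!set_contains(s, id) && s->count < MAX_REGIONS)
        s->ids[s->count++] = id;
}

static void set_remove(RegionSet *s, int id) {
    for (size_t i = 0; i < s->count; i++) {
        if (s->ids[i] == id) {
            s->count--;
            s->ids[i] = s->ids[s->count]; /* move last element into the hole */
            return;
        }
    }
}

/* Allocation function seen: record the region in A. */
void on_alloc(Tracker *t, int id) {
    set_add(&t->allocated, id);
}

/* Release function seen: a region already in B is a double free. */
Report on_free(Tracker *t, int id) {
    if (set_contains(&t->freed, id))
        return REPORT_DOUBLE_FREE;
    set_remove(&t->allocated, id);
    set_add(&t->freed, id);
    return REPORT_NONE;
}

/* Memory access: touching a region in B is a use after free. */
Report on_access(const Tracker *t, int id) {
    return set_contains(&t->freed, id) ? REPORT_USE_AFTER_FREE : REPORT_NONE;
}

/* Symbol became dead: a region still in A can no longer be freed. */
Report on_symbol_dead(Tracker *t, int id) {
    if (!set_contains(&t->allocated, id))
        return REPORT_NONE;
    set_remove(&t->allocated, id);
    return REPORT_MEMORY_LEAK;
}
```

A zero-initialized Tracker starts with both sets empty. Feeding it the event sequence alloc, free, free for the same id produces a double-free report on the second free.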
Detecting integer overflow vulnerabilities is more complicated, for the following main reasons:
We implement our checker using taint analysis.
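As a rough illustration of the general idea behind taint tracking for integer overflow, the toy C sketch below marks values that come from untrusted input, propagates that mark through a multiplication, and flags the operation only when a tainted operand is involved and the product would wrap. It works on concrete values, whereas a static checker reasons symbolically, and every name here (TaintedSize, from_input, from_const, taint_mul, should_warn) is hypothetical rather than taken from the OverflowChecker.

```c
#include <stddef.h>
#include <stdint.h>

/* A size value paired with a flag saying whether it derives from untrusted input. */
typedef struct {
    size_t value;
    int tainted;
} TaintedSize;

/* Taint source: a value controlled by the outside world. */
TaintedSize from_input(size_t v) {
    TaintedSize t = { v, 1 };
    return t;
}

/* Untainted value: a constant chosen by the programmer. */
TaintedSize from_const(size_t v) {
    TaintedSize t = { v, 0 };
    return t;
}

/* Returns 1 if a * b does not fit in size_t. */
int mul_overflows(size_t a, size_t b) {
    return a != 0 && b > SIZE_MAX / a;
}

/* Propagation: the product is tainted if either operand is.
 * Unsigned multiplication wraps on overflow, which is well defined in C. */
TaintedSize taint_mul(TaintedSize a, TaintedSize b) {
    TaintedSize r = { a.value * b.value, a.tainted || b.tainted };
    return r;
}

/* Sink check: warn only for attacker-influenced products that would wrap. */
int should_warn(TaintedSize a, TaintedSize b) {
    return (a.tainted || b.tainted) && mul_overflows(a.value, b.value);
}
```

Restricting warnings to tainted operands is the design choice that keeps such a check from firing on every arithmetic operation in the program.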