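LLVM Code Obfuscation

These are my notes on the Kanxue Academy (看雪学院) course "LLVM and Code Obfuscation Techniques", taught by @34r7hm4n. The course is very good; please support the original author if you can.

My environment differs slightly from the author's (WSL Kali + LLVM 16.0.0). Much of the source code provided with the course fails to compile or crashes at run time in this environment, so I have modified it until it builds and runs correctly. Most of the changes are marked with comments or explained in the text.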

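LLVM Architecture

LLVM is split into a front end and a back end. The front end takes source code, runs lexical, syntactic, and semantic analysis, and emits intermediate code (IR). The back end takes the IR, the optimizer transforms it, and the target program is emitted at the end.

Building LLVM

How LLVM compiles an executable

LLVM IR comes in a human-readable text form (.ll) and a binary form (.bc) that is easier for machines to process. First, convert the C++ source to LLVM IR:

# -emit-llvm tells clang to emit LLVM IR
# Text form
clang -S -emit-llvm hello.cpp -o hello.ll

# Binary form
clang -c -emit-llvm hello.cpp -o hello.bc

Then run opt over the IR:

# -load  loads the shared library that contains the LLVM passes
# -hlw   option registered by the custom pass; selects which pass to run
# With recent LLVM versions this also needs -enable-new-pm=0 (see the test script below)
opt -load LLVMObfuscator.so -hlw -S hello.ll -o hello.opt.ll

Finally, compile the result to an executable:

clang hello.opt.ll -o hello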

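Your First LLVM Pass

Background

The LLVM pass framework lets you hook into the optimization pipeline, which is how passes can be used for code obfuscation. A compiled pass is loaded by opt, where it can analyze and modify the IR and thereby change the generated target code. Passes have a rich API available, described in the LLVM documentation.

Some important directories:

  • llvm/include/llvm: the public headers provided by LLVM
  • llvm/lib: most of the source code plus some private headers
  • llvm/lib/Transforms: source code of the passes; LLVM ships with a number of its own

There are three ways to build an LLVM pass:

  • rebuild it together with all of LLVM, with the pass source placed under llvm/lib/Transforms
  • build it separately with CMake (the approach used here)
  • build it separately from the command line

Before writing a pass you need to pick its type. LLVM offers several, including ModulePass, FunctionPass, CallGraphSCCPass, and LoopPass. FunctionPass is the most common: it works one function at a time. A subclass of FunctionPass must implement runOnFunction, which is called for every function in the program when the pass runs.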

Writing the pass

The goal is a pass of type FunctionPass that prints the name of every function in the program. The basic steps:

  • create a class that inherits from FunctionPass
  • implement runOnFunction(Function &F) in that class
  • register the pass class with LLVM

A template for writing a pass:

#include "llvm/Pass.h"
#include "llvm/IR/Function.h"
#include "llvm/Support/raw_ostream.h"
using namespace llvm;

namespace {
class Demo : public FunctionPass {
public:
    static char ID;
    Demo() : FunctionPass(ID) {}
    bool runOnFunction(Function &F) override;
};
}

// Implementation of runOnFunction
bool Demo::runOnFunction(Function &F) {
    // do something
    return false; // return true only if the IR was modified
}

char Demo::ID = 0;

// Register the Demo pass.
// "xxx" is the opt option that selects the pass (see the practice section).
static RegisterPass<Demo> X("xxx", "Description");

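Building

Use CMake to build the pass into a .so file. The build is described by a CMakeLists.txt file, shown in the practice section.

Loading

Use opt to load the compiled pass, process the IR, and emit new IR.

Practice

Create this directory layout:

.
├── Build
├── Test
├── test.sh
└── Transforms
    ├── CMakeLists.txt
    ├── include
    └── src
        └── HelloWorld.cpp

TestProgram.cpp (a simple CTF reversing challenge and the target the passes are applied to; test.sh expects it in the Test directory):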

#include <cstdio>
#include <cstring>
char input[100] = {0};
char enc[100] = "\x86\x8a\x7d\x87\x93\x8b\x4d\x81\x80\x8a\
\x43\x7f\x49\x49\x86\x71\x7f\x62\x53\x69\x28\x9d";

void encrypt(unsigned char *dest, char *src) {
    int len = strlen(src);
    for (int i = 0; i < len; i++) {
        dest[i] = (src[i] + (32 - i)) ^ i;
    }
}

// flag{s1mpl3_11vm_d3m0}
int main() {
    printf("Please input your flag: ");
    scanf("%s", input);
    unsigned char dest[100] = {0};
    encrypt(dest, input);
    bool result = strlen(input) == 22 && !memcmp(dest, enc, 22);
    if (result) {
        printf("Congratulations~\n");
    } else {
        printf("Sorry try again.\n");
    }
}

CMakeLists.txt (manages the project):

project(OLLVM++)
cmake_minimum_required(VERSION 3.13.4)
find_package(LLVM REQUIRED CONFIG)

list(APPEND CMAKE_MODULE_PATH "${LLVM_CMAKE_DIR}")
include(AddLLVM)
include_directories("./include") # headers from the ./include directory
separate_arguments(LLVM_DEFINITIONS_LIST NATIVE_COMMAND ${LLVM_DEFINITIONS})
add_definitions(${LLVM_DEFINITIONS_LIST})
include_directories(${LLVM_INCLUDE_DIRS})
add_llvm_library( LLVMObfuscator MODULE
    src/HelloWorld.cpp
)

HelloWorld.cpp (the pass, written from the template):

#include "llvm/Pass.h"
#include "llvm/IR/Function.h"
#include "llvm/Support/raw_ostream.h"
using namespace llvm;

namespace {
class HelloWorld : public FunctionPass {
public:
    static char ID;
    HelloWorld() : FunctionPass(ID) {}

    bool runOnFunction(Function &F) override;
};
}

bool HelloWorld::runOnFunction(Function &F) {
    outs() << "Hello, " << F.getName() << "\n";
    return false; // the IR is only read, not modified
}

char HelloWorld::ID = 0;
static RegisterPass<HelloWorld> X("hlw", "First Pass");

test.sh (test script; running it performs the whole test):

cd ./Build
cmake ../Transforms
make
cd ../Test
clang -S -emit-llvm TestProgram.cpp -o TestProgram.ll
opt -enable-new-pm=0 -load ../Build/LLVMObfuscator.so -hlw -S TestProgram.ll -o TestProgram_hlw.ll
clang TestProgram_hlw.ll -o TestProgram_hlw
./TestProgram_hlw

-enable-new-pm=0 tells opt not to use the new pass manager. Without it, loading the pass with opt may fail with an Unknown Pass error.

Run test.sh and enter flag{s1mpl3_11vm_d3m0}. The program prints Congratulations~, and the first pass is done.
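As a cross-check on the test target, the transformation in encrypt can be inverted by hand. The sketch below is not part of the course material; recoverFlag and kEnc are hypothetical names, and the byte values are copied from enc in TestProgram.cpp. It undoes dest[i] = (src[i] + (32 - i)) ^ i one byte at a time.

```cpp
#include <cstddef>
#include <string>

// Ciphertext bytes copied from enc in TestProgram.cpp.
static const unsigned char kEnc[22] = {
    0x86, 0x8a, 0x7d, 0x87, 0x93, 0x8b, 0x4d, 0x81, 0x80, 0x8a, 0x43,
    0x7f, 0x49, 0x49, 0x86, 0x71, 0x7f, 0x62, 0x53, 0x69, 0x28, 0x9d};

// Hypothetical helper: invert dest[i] = (src[i] + (32 - i)) ^ i.
std::string recoverFlag() {
    std::string flag;
    for (std::size_t n = 0; n < sizeof(kEnc); n++) {
        int i = static_cast<int>(n);
        int v = (kEnc[n] ^ i) - (32 - i);   // undo the xor, then the addition
        flag.push_back(static_cast<char>(v & 0xff));
    }
    return flag;
}
```

Calling recoverFlag() and printing the result is enough to confirm that the flag in the source comment matches the stored ciphertext.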

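LLVM IR

LLVM IR is a low-level programming language, similar to assembly, that is designed to make code optimization convenient. It has two equivalent representations, a text form (.ll) and a binary form (.bc), which can be converted into each other with llvm-dis and llvm-as.

Structure

A module corresponds to one source file. The module header records the target platform and some other information.

A function in the IR corresponds to a function in the source code. A function consists of basic blocks, and the block that executes first is called the entry block.

A basic block consists of a label and a sequence of instructions. Normally the last instruction of a basic block is a jump (br, switch) or a return (ret); such an instruction is called a terminator instruction.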

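IR-Based Code Obfuscation

IR-based obfuscation mostly works at the level of functions or smaller units:

  • per function: control flow flattening
  • per basic block: bogus control flow
  • per instruction: instruction substitution

Static Single Assignment

Static single assignment (SSA) is a property of the IR: put simply, each variable in the program is assigned by exactly one statement. LLVM IR is designed around SSA.

An ordinary C/C++ loop assigns i more than once (the initialization and every i++):

for(int i=0; i<100; i++)
    printf("Hello, %d", i);

To satisfy SSA, it can be rewritten so that the value lives in memory and is accessed through a pointer that is itself assigned only once:

int *i = (int *)malloc(4);
for(*i=0; *i<100; (*i)++)
    printf("Hello, %d", *i);

LLVM IR uses this approach.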

Common IR Instructions

Terminator instructions

  • ret corresponds to return in C/C++. Syntax:

    ret <type> <value>    ; return a value of the given type
    ret void              ; return from a void function

    Examples:

    ret i32 5                          ; return the integer 5
    ret void                           ; no return value
    ret { i32, i8 } { i32 4, i8 2 }    ; return a struct

  • br has a conditional and an unconditional form, corresponding to if in C/C++ and to conditional and unconditional jumps in assembly.

    br i1 <cond>, label <iftrue>, label <iffalse>    ; conditional: jump to iftrue if cond is true, otherwise to iffalse
    br label <dest>                                  ; unconditional

    Example:

    Test:
      %cond = icmp eq i32 %a, %b
      br i1 %cond, label %IfEqual, label %IfUnequal
    IfEqual:
      ret i32 1
    IfUnequal:
      ret i32 0

  • switch is similar to switch in C/C++.

    switch <intty> <value>, label <defaultdest> [ <intty> <val>, label <dest> ... ]

    Examples:

    ; equivalent to a conditional branch
    %Val = zext i1 %value to i32
    switch i32 %Val, label %truedest [ i32 0, label %falsedest ]

    ; equivalent to an unconditional branch
    switch i32 0, label %dest [ ]

    switch i32 %val, label %otherwise [ i32 0, label %onzero
                                        i32 1, label %onone
                                        i32 2, label %ontwo ]

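Comparison instructions

  • icmp compares integers or pointers.

    <result> = icmp <cond> <ty> <op1>, <op2>
    ; cond can be eq (equal), ne (not equal),
    ; ugt (unsigned greater than), sle (signed less than or equal), and so on
    ; ty is the type of op1 and op2

  • fcmp compares floating-point numbers.

    <result> = fcmp <cond> <ty> <op1>, <op2>
    ; cond can be oeq (ordered and equal),
    ; ueq (unordered or equal), false (never true), and so on
    ; "ordered" means that neither operand is NaN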
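The ordered/unordered distinction is easiest to see with NaN operands. The following is a small C++ model of the two predicates, not LLVM code; fcmpOeq and fcmpUeq are hypothetical names chosen for this sketch.

```cpp
#include <cmath>

// Model of fcmp oeq: true only if neither operand is NaN and they are equal.
bool fcmpOeq(double a, double b) {
    return !std::isnan(a) && !std::isnan(b) && a == b;
}

// Model of fcmp ueq: true if either operand is NaN, or if they are equal.
bool fcmpUeq(double a, double b) {
    return std::isnan(a) || std::isnan(b) || a == b;
}
```

With a NaN operand the ordered predicate is false and the unordered one is true; for ordinary numbers both behave like ==.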

Binary operations

  • add, integer addition

    <result> = add <ty> <op1>, <op2>

  • sub, integer subtraction

    <result> = sub <ty> <op1>, <op2>    ; result = op1 - op2

  • mul, integer multiplication

    <result> = mul <ty> <op1>, <op2>

  • udiv, unsigned integer division; the exact keyword may be added

    <result> = udiv <ty> <op1>, <op2>          ; result = op1 / op2
    <result> = udiv exact <ty> <op1>, <op2>    ; the result is a poison value if op1 is not a multiple of op2

  • sdiv, signed integer division; exact may be added as well

    <result> = sdiv <ty> <op1>, <op2>          ; result = op1 / op2
    <result> = sdiv exact <ty> <op1>, <op2>    ; the result is a poison value if op1 is not a multiple of op2

  • urem, unsigned integer remainder

    <result> = urem <ty> <op1>, <op2>    ; result = op1 % op2

  • srem, signed integer remainder

    <result> = srem <ty> <op1>, <op2>    ; result = op1 % op2

Bitwise binary operations

  • shl, integer shift left

    <result> = shl <ty> <op1>, <op2>     ; result = op1 << op2

  • lshr, logical shift right (zeros are shifted in on the left)

    <result> = lshr <ty> <op1>, <op2>    ; result = op1 >> op2

  • ashr, arithmetic shift right (the sign bit is shifted in on the left)

    <result> = ashr <ty> <op1>, <op2>

  • and, bitwise AND

    <result> = and <ty> <op1>, <op2>     ; result = op1 & op2

  • or, bitwise OR

    <result> = or <ty> <op1>, <op2>      ; result = op1 | op2

  • xor, bitwise XOR

    <result> = xor <ty> <op1>, <op2>     ; result = op1 ^ op2
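The difference between lshr and ashr only shows on values whose top bit is set. The sketch below models both on 32-bit patterns using unsigned arithmetic only, so it does not rely on how a given compiler shifts negative signed values; lshr32 and ashr32 are hypothetical names, and the shift amount is assumed to be in the range 0 to 31.

```cpp
#include <cstdint>

// Model of lshr: the vacated high bits are filled with zeros.
std::uint32_t lshr32(std::uint32_t x, unsigned n) {
    return x >> n;  // n is assumed to be in [0, 31]
}

// Model of ashr: the vacated high bits are filled with copies of the sign bit.
std::uint32_t ashr32(std::uint32_t x, unsigned n) {
    std::uint32_t shifted = x >> n;
    bool negative = (x & 0x80000000u) != 0;
    // Mask with the top n bits set, used only when the sign bit is 1.
    std::uint32_t fill = negative ? ~(0xFFFFFFFFu >> n) : 0u;
    return shifted | fill;
}
```

For 0xFFFFFFF8 (the pattern of -8) shifted by 1, the logical shift gives 0x7FFFFFFC while the arithmetic shift gives 0xFFFFFFFC (the pattern of -4).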

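Memory access and addressing instructions

  • alloca allocates space on the stack and returns a pointer to it.

    <result> = alloca <type> [, <ty> <NumElements>] [, align <alignment>]
    ; allocates sizeof(type) * NumElements bytes, aligned to alignment; the pointer points to type

    Examples:

    %ptr = alloca i32                       ; allocates 4 bytes, pointer to i32
    %ptr = alloca i32, i32 4                ; allocates 4 * 4 = 16 bytes
    %ptr = alloca i32, i32 4, align 1024    ; the address is aligned to 1024

  • store writes a value to the memory a pointer refers to.

    store <ty> <value>, <ty>* <pointer>

    Example:

    %ptr = alloca i32
    store i32 3, i32* %ptr    ; write 3 to the memory %ptr points to

  • load reads a value from the memory a pointer refers to.

    <result> = load <ty>, <ty>* <pointer>

    Example:

    %ptr = alloca i32
    store i32 3, i32* %ptr
    %val = load i32, i32* %ptr    ; reads 3 from the address in %ptr

    These examples use the typed pointer syntax <ty>*; LLVM 15 and later print the opaque pointer type ptr in its place.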

Conversion instructions

  • trunc .. to truncates a value of one type to a smaller type.

    <result> = trunc <ty> <value> to <ty2>

    Examples:

    %X = trunc i32 257 to i8                           ; %X = 1
    %W = trunc <2 x i16> <i16 8, i16 7> to <2 x i8>    ; %W = <i8 8, i8 7>

  • zext .. to zero-extends a value to a larger type. The value, read as an unsigned number, does not change.

    <result> = zext <ty> <value> to <ty2>

  • sext .. to sign-extends a value by copying the sign bit.

    <result> = sext <ty> <value> to <ty2>

    Example:

    %X = sext i8 -1 to i16    ; i8 -1 is 11111111 in two's complement
                              ; sext fills the high bits with the sign bit (1)
                              ; result: 11111111 11111111 = -1
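The same three conversions can be reproduced with fixed-width integer casts in C++, which makes the examples above easy to check. This is an illustrative sketch; the function names are made up for it.

```cpp
#include <cstdint>

// trunc i32 -> i8: only the low 8 bits survive.
std::uint8_t truncI32ToI8(std::uint32_t v) {
    return static_cast<std::uint8_t>(v);
}

// zext i8 -> i16: the new high bits are zeros.
std::uint16_t zextI8ToI16(std::uint8_t v) {
    return static_cast<std::uint16_t>(v);
}

// sext i8 -> i16: the new high bits are copies of the sign bit.
std::int16_t sextI8ToI16(std::int8_t v) {
    return static_cast<std::int16_t>(v);
}
```

truncI32ToI8(257) yields 1, matching the trunc example, and sextI8ToI16(-1) yields -1, whose 16-bit pattern is 0xFFFF, while zextI8ToI16(0xFF) yields 255.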

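Other instructions

  • phi exists because SSA allows a variable to be assigned only once. Its result is determined by which predecessor of the current basic block control arrived from.

    <result> = phi <ty> [ <val0>, <label0> ], ...    ; result = val0 if the predecessor is label0

    With phi, a for loop can be written in another form that keeps the SSA property:

    Loop:
      %indvar = phi i32 [ 0, %LoopHeader ], [ %nextindvar, %Loop ]
      %nextindvar = add i32 %indvar, 1
      br label %Loop

  • select is similar to the ternary operator ? :.

    <result> = select i1 <cond>, <ty> <value1>, <ty> <value2>
    ; result = value1 if cond is true, otherwise value2

  • call calls a function. Unlike the x86 call instruction, the IR call carries the arguments itself.

    <result> = call <ty>|<fnty> <fnptrval>(<function args>)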

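C++

LLVM is written in C++ and uses the STL, so a reasonable C++ background is needed. The course refers to the C++11 standard; recent LLVM releases, including 16, are built as C++17.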

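Common LLVM Pass APIs

Common classes

  • Function gives access to a function's properties, such as its name and entry block, and lets you iterate over the function's basic blocks.

    bool runOnFunction(Function &F){
        for(BasicBlock &BB : F){
            // do something
        }
        return false;
    }

  • BasicBlock supports cloning, splitting, and moving basic blocks, and lets you iterate over the instructions in a block.

    bool runOnFunction(Function &F){
        for(BasicBlock &BB : F){
            for(Instruction &I : BB){
                // do something
            }
        }
        return false;
    }

  • Instruction has subclasses such as BinaryOperator, AllocaInst, and BranchInst. Instructions can be created, deleted, and modified, and their operands can be iterated.

    bool runOnFunction(Function &F){
        for(BasicBlock &BB : F){
            for(Instruction &I : BB){
                for(unsigned i = 0; i < I.getNumOperands(); i++){
                    Value *V = I.getOperand(i);
                }
            }
        }
        return false;
    }

  • Value is the base class: every class that can be used as an instruction operand is a subclass of Value, including Constant, Argument, Instruction, Function, and BasicBlock.

Output streams

LLVM recommends three output streams: outs(), errs(), and dbgs().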

Useful Documentation

Documentation home: About — LLVM 16.0.0git documentation

Programmer's manual: LLVM Programmer’s Manual — LLVM 16.0.0git documentation

Setting up on Windows: Getting Started with the LLVM System using Microsoft Visual Studio — LLVM 16.0.0git documentation

Building a pass separately with CMake: Building LLVM with CMake — LLVM 16.0.0git documentation

Writing a pass (the closest match to these notes): Writing an LLVM Pass — LLVM 16.0.0git documentation

Writing a pass with the new pass manager: Writing an LLVM Pass — LLVM 16.0.0git documentation

About SSA: MemorySSA — LLVM 16.0.0git documentation

API documentation: LLVM: LLVM

Command-line tools: LLVM Command Guide — LLVM 16.0.0git documentation

LLVM IR language reference: LLVM Language Reference Manual — LLVM 16.0.0git documentation

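Basic Block Splitting

Principle

Split one basic block into several blocks that are together equivalent to it, and connect the pieces with unconditional branches.

Splitting basic blocks makes some other obfuscations more effective.

The pass only has to visit every basic block of every function and split it. Blocks that contain phi instructions must be skipped, otherwise errors may occur.

APIs used:

  • Extra command-line options, for passing custom parameters in from outside. In LLVM these are read with the cl::opt class template; opt here is short for option.

    #include "llvm/Support/CommandLine.h"

    // Optional parameter: how many blocks each basic block is split into (default 2)
    static cl::opt<int> splitNum("split_num", cl::init(2), cl::desc("Split <split_num> time(s) each BasicBlock"));

    Usage on the command line:

    opt -load ../Build/LLVMObfuscator.so -split -split_num 5 -S TestProgram.ll -o TestProgram_split.ll

  • splitBasicBlock, a member function of BasicBlock that is documented in detail in BasicBlock.h. It has two overloads, which are essentially the same:

    BasicBlock *splitBasicBlock(iterator I, const Twine &BBName = "",
                                bool Before = false);
    BasicBlock *splitBasicBlock(Instruction *I, const Twine &BBName = "",
                                bool Before = false) {
        return splitBasicBlock(I->getIterator(), BBName, Before);
    }

    The second one is used here. It splits the block in two at instruction I: I goes into the later block, and an unconditional branch is created between the two blocks. BBName is the name of the new block. When Before is true, the new block is placed before the original one.

  • isa<>, a function template that tests whether the object a pointer refers to is of a given type. Here it checks whether an instruction is a PHI instruction.

Implementation

Continue in the project from the practice part of the first LLVM pass section above. Create SplitBasicBlock.cpp in Transforms/src, structured like HelloWorld.cpp, with this content: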

#include "llvm/Pass.h"
#include "llvm/IR/Function.h"
#include "llvm/Support/raw_ostream.h"
#include <llvm/IR/BasicBlock.h>
#include <llvm/IR/Instruction.h>
#include <llvm/IR/Instructions.h>
#include "llvm/Support/CommandLine.h"
#include <vector>
using std::vector;
using namespace llvm;

static cl::opt<int> splitNum("split_num", cl::init(2), cl::desc("Split <split_num> time(s) each BasicBlock"));

namespace {
class SplitBasicBlock : public FunctionPass {
public:
    static char ID;
    SplitBasicBlock() : FunctionPass(ID) {}

    bool runOnFunction(Function &F) override;

    bool containsPHI(BasicBlock *BB);
    bool split(BasicBlock *BB);
};
}

bool SplitBasicBlock::runOnFunction(Function &F) {
    // Save pointers to the original blocks in a vector first, so that the
    // blocks created while splitting do not interfere with the iteration
    vector<BasicBlock*> origBB;
    for (BasicBlock &BB : F) {
        origBB.push_back(&BB);
    }

    bool changed = false;
    for (BasicBlock *BB : origBB) {
        if (!containsPHI(BB)) {
            changed |= split(BB);
        }
    }
    return changed;
}

// Returns true if BB contains a PHI instruction
bool SplitBasicBlock::containsPHI(BasicBlock *BB) {
    for (Instruction &I : *BB) {
        if (isa<PHINode>(&I)) {
            return true;
        }
    }
    return false;
}

bool SplitBasicBlock::split(BasicBlock *BB) {
    // Size of each block after splitting:
    // original block size / number of splits (rounded down)
    int splitsize = BB->size() / splitNum;
    BasicBlock *curBB = BB;
    bool changed = false;
    for (int i = 1; i < splitNum; i++) {
        int cnt = 0;
        for (Instruction &I : *curBB) {
            if (cnt++ == splitsize) {
                curBB = curBB->splitBasicBlock(&I);
                changed = true;
                break;
            }
        }
    }
    return changed;
}

char SplitBasicBlock::ID = 0;
static RegisterPass<SplitBasicBlock> X("split", "Split Basic Block");

The key part is the split function. splitBasicBlock returns a pointer to the second of the two blocks produced by the split.
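To see how the arithmetic in split distributes instructions, here is a standalone model that needs no LLVM. It is only a sketch: splitSizes is a hypothetical helper that counts the original instructions landing in each piece and ignores the branch that splitBasicBlock appends to the first piece. The guard against a non-positive split_num exists only in this model.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model of SplitBasicBlock::split: returns how many of the
// original instructions end up in each resulting block.
std::vector<std::size_t> splitSizes(std::size_t blockSize, int splitNum) {
    std::vector<std::size_t> sizes;
    if (splitNum <= 0) {               // guard added for the model only
        sizes.push_back(blockSize);
        return sizes;
    }
    std::size_t splitsize = blockSize / static_cast<std::size_t>(splitNum);
    std::size_t remaining = blockSize; // instructions still in curBB
    for (int i = 1; i < splitNum; i++) {
        // The inner loop of split() reaches index `splitsize` only if
        // curBB holds more than `splitsize` instructions.
        if (remaining > splitsize) {
            sizes.push_back(splitsize);
            remaining -= splitsize;
        }
    }
    sizes.push_back(remaining);        // the last block keeps the rest
    return sizes;
}
```

For a block of 10 instructions and split_num 3 the model gives pieces of 3, 3, and 4. When a block is shorter than split_num, splitsize is 0 and every split happens at the first instruction, so the earlier pieces hold no original instructions at all.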

To build the new pass into the module, edit CMakeLists.txt (the only change is the new source file on the second-to-last line):

project(OLLVM++)
cmake_minimum_required(VERSION 3.13.4)
find_package(LLVM REQUIRED CONFIG)

list(APPEND CMAKE_MODULE_PATH "${LLVM_CMAKE_DIR}")
include(AddLLVM)
include_directories("./include") # headers from the ./include directory
separate_arguments(LLVM_DEFINITIONS_LIST NATIVE_COMMAND ${LLVM_DEFINITIONS})
add_definitions(${LLVM_DEFINITIONS_LIST})
include_directories(${LLVM_INCLUDE_DIRS})
add_llvm_library( LLVMObfuscator MODULE
    src/HelloWorld.cpp
    src/SplitBasicBlock.cpp
)

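To make later changes easier, restructure things a little. First add SplitBasicBlock.h under Transforms/include:

#pragma once
#include "llvm/IR/Function.h"
#include "llvm/Pass.h"

namespace llvm {
FunctionPass* createSplitBasicBlockPass();
}

Then add its implementation to Transforms/src/SplitBasicBlock.cpp. The definition is qualified with llvm:: so that it defines the function declared in the header; an unqualified definition would introduce a separate global function and leave llvm::createSplitBasicBlockPass undefined:

#include "SplitBasicBlock.h"
...
FunctionPass* llvm::createSplitBasicBlockPass(){
    return new SplitBasicBlock();
}

With this change, other passes can reuse the basic block splitting pass through llvm::createSplitBasicBlockPass.

Next, add two directories, IR and Bin, under Test to keep the generated files organized (the test script below writes to IR/ and Bin/ after changing into Test).

To make testing easier, TestProgram is also changed to accept the flag as a command-line argument: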

#include <cstdio>
#include <cstring>
char input[100] = {0};
char enc[100] = "\x86\x8a\x7d\x87\x93\x8b\x4d\x81\x80\x8a\
\x43\x7f\x49\x49\x86\x71\x7f\x62\x53\x69\x28\x9d";

void encrypt(unsigned char *dest, char *src) {
    int len = strlen(src);
    for (int i = 0; i < len; i++) {
        dest[i] = (src[i] + (32 - i)) ^ i;
    }
}

// flag{s1mpl3_11vm_d3m0}
int main(int argc, char* argv[]) {
    if (argc != 2) {
        printf("Please input your flag: ");
        scanf("%99s", input);
    } else {
        // Copy at most 99 bytes so that input stays NUL-terminated
        strncpy(input, argv[1], sizeof(input) - 1);
    }
    unsigned char dest[100] = {0};
    encrypt(dest, input);
    bool result = strlen(input) == 22 && !memcmp(dest, enc, 22);
    if (result) {
        printf("Congratulations~\n");
    } else {
        printf("Sorry try again.\n");
    }
}

Finally, update the test.sh script:

cd ./Build
cmake ../Transforms
make
cd ../Test
clang -S -emit-llvm TestProgram.cpp -o IR/TestProgram.ll
opt -enable-new-pm=0 -load ../Build/LLVMObfuscator.so -hlw -S IR/TestProgram.ll -o IR/TestProgram_hlw.ll
clang IR/TestProgram_hlw.ll -o Bin/TestProgram_hlw
./Bin/TestProgram_hlw flag{s1mpl3_11vm_d3m0}

opt -enable-new-pm=0 -load ../Build/LLVMObfuscator.so -split -S IR/TestProgram.ll -o IR/TestProgram_split.ll
clang IR/TestProgram_split.ll -o Bin/TestProgram_split
./Bin/TestProgram_split flag{s1mpl3_11vm_d3m0}

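Code Obfuscation and OLLVM

Common Concepts

  • Code obfuscation: transforming a program into a form that is functionally equivalent but hard to read and understand.
  • Function: the basic unit of code obfuscation. A function consists of basic blocks, has one entry block and possibly several exit blocks, and can be represented as a control flow graph.
  • Basic block: a straight-line sequence of instructions with one entry point (the first instruction executed) and one exit point (the last instruction executed). The terminator instruction either jumps to another basic block or returns from the function.
  • Control flow: all the paths a program may take during execution. It normally reflects the program's logic; after obfuscation the real logic is hard to make out.
  • Opaque predicate: an expression whose value the obfuscator knows for certain but which is hard for a deobfuscator to infer.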
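A classic illustration of an opaque predicate, written here as plain C++ rather than as IR: the product of two consecutive integers is always even, so the condition below is always true even though that is not obvious from the code alone. The names opaqueTrue and guardedAdd are hypothetical, and this is only one possible predicate, not necessarily the one any particular obfuscator uses.

```cpp
#include <cstdint>

// x * (x + 1) is a product of two consecutive integers and therefore even.
// Unsigned arithmetic wraps modulo 2^32, which leaves the lowest bit intact,
// so the result is true for every x.
bool opaqueTrue(std::uint32_t x) {
    return (x * (x + 1u)) % 2u == 0u;
}

// Real logic guarded by the predicate; the second return never executes.
int guardedAdd(int a, int b, std::uint32_t x) {
    if (opaqueTrue(x)) {
        return a + b;   // real path
    }
    return a - b;       // bogus path, dead at run time
}
```

guardedAdd(2, 3, x) returns 5 for any x, but a static reader has to reason about the predicate before the bogus path can be discarded.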

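Common Obfuscation Approaches

  • Symbol obfuscation: strip or scramble function symbols.
  • Control flow obfuscation: scramble the program's normal control flow so that it no longer reflects the original logic while the behavior stays the same. This includes control flow flattening, bogus control flow, and random control flow.
  • Computation obfuscation: scramble the computation or the data it uses so that an analyst cannot easily tell what a piece of code computes. This includes instruction substitution and constant substitution.
  • Virtual machine obfuscation: translate a set of instructions into an unknown, custom instruction set that is executed by an interpreter bundled with the program.

OLLVM

Obfuscator-LLVM (OLLVM) provides code obfuscation and tamper-proofing tools. It offers three classic obfuscations: control flow flattening (Control Flow Flattening), bogus control flow (Bogus Control Flow), and instruction substitution (Instruction Substitution). Development of OLLVM stopped in 2017, but it is still well worth studying.

Building

Because OLLVM has not been updated for a long time, building it is difficult, so a docker container is used here.

docker pull nickdiego/ollvm-build
git clone https://github.com/nickdiego/docker-ollvm.git
git clone -b llvm-4.0 https://github.com/obfuscator-llvm/obfuscator.git

In docker-ollvm/ollvm-build.sh, at line 150 (before docker run), add:

DOCKER_CMD+=" -DLLVM_INCLUDE_TESTS=OFF"

Run the build script:

chmod 777 ollvm-build.sh
sudo ./ollvm-build.sh

The compiled binaries are placed in obfuscator/build_release. Run ./clang --version in obfuscator/build_release/bin to check that the build succeeded; this clang reports version 4.0.1. First compile an unobfuscated TestProgram to compare against the obfuscated results later. The OLLVM clang is not linked into /usr/bin here, so the clang in obfuscator/build_release/bin has to be invoked explicitly.

./clang /tmp/TestProgram.cpp -o /tmp/TestProgram

The result in IDA:

[Figure: the unobfuscated TestProgram]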

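Control flow flattening

./clang -mllvm -fla -mllvm -split -mllvm -split_num=3 /tmp/TestProgram.cpp -o /tmp/TestProgram_fla

Options used:

  • -mllvm -fla enables control flow flattening
  • -mllvm -split enables basic block splitting
  • -mllvm -split_num=3 sets how many blocks each basic block is split into

Result:

[Figure: effect of control flow flattening]

Bogus control flow

./clang -mllvm -bcf -mllvm -bcf_loop=3 -mllvm -bcf_prob=40 /tmp/TestProgram.cpp -o /tmp/TestProgram_bcf

Options:

  • -mllvm -bcf enables bogus control flow
  • -mllvm -bcf_loop=3 sets the number of obfuscation rounds (default 1)
  • -mllvm -bcf_prob=40 sets the probability that each basic block is obfuscated (default 30)

Result:

[Figure: effect of bogus control flow]

Instruction substitution

./clang -mllvm -sub -mllvm -sub_loop=3 /tmp/TestProgram.cpp -o /tmp/TestProgram_sub

Options:

  • -mllvm -sub enables instruction substitution
  • -mllvm -sub_loop=3 sets the number of obfuscation rounds (default 1)

Instruction substitution leaves the control flow unchanged and only replaces instructions with equivalent ones. The encrypt function after substitution: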

__int64 __fastcall encrypt(unsigned __int8 *a1, char *a2)
{
__int64 result; // rax
char v3; // r10
char v4; // r10
char v5; // r11
char v6; // cl
char v7; // r14
char v8; // dl
char v9; // si
char v10; // r14
char v11; // dl
char v12; // cl
char v13; // si
int i; // [rsp+0h] [rbp-40h]
int v15; // [rsp+4h] [rbp-3Ch]

v15 = strlen(a2);
for ( i = 0; ; i = -(-i - 86661348 + 86661347) )
{
result = (unsigned int)i;
if ( i >= v15 )
break;
v3 = ~(~(a2[i] + -81 - i + 113) | 0xBA);
v4 = ((((v3 ^ ~((a2[i] + -81 - i + 113) | 0x45) | v3 & ~((a2[i] + -81 - i + 113) | 0x45)) & 0x4D | ~(v3 ^ ~((a2[i] + -81 - i + 113) | 0x45) | v3 & ~((a2[i] + -81 - i + 113) | 0x45)) & 0xB2) ^ 0xF7) & 0xAA | ~(((v3 ^ ~((a2[i] + -81 - i + 113) | 0x45) | v3 & ~((a2[i] + -81 - i + 113) | 0x45)) & 0x4D | ~(v3 ^ ~((a2[i] + -81 - i + 113) | 0x45) | v3 & ~((a2[i] + -81 - i + 113) | 0x45)) & 0xB2) ^ 0xF7) & 0x55) ^ 0xAA;
v5 = ~(~((i & 0x88 | ~(_BYTE)i & 0x77) ^ 0x88) | ~((((i & 0x88 | ~(_BYTE)i & 0x77) ^ 0x88) & 0xAF | ~((i & 0x88 | ~(_BYTE)i & 0x77) ^ 0x88) & 0x50) ^ 0x43));
v6 = ~(~(((~((a2[i] + -81 - i + 113) | 0x13) | (~(a2[i] + -81 - i + 113) & 0x12 | (a2[i] + -81 - i + 113) & 0xED) ^ 1) & 0xA1 | ~(~((a2[i] + -81 - i + 113) | 0x13) | (~(a2[i] + -81 - i + 113) & 0x12 | (a2[i] + -81 - i + 113) & 0xED) ^ 1) & 0x5E) ^ 0xA1) | v4 ^ 0x13 | v4 & 0x13);
v7 = ~(((~((a2[i] + -81 - i + 113) | 0x13) | (~(a2[i] + -81 - i + 113) & 0x12 | (a2[i] + -81 - i + 113) & 0xED) ^ 1) & 0xA1 | ~(~((a2[i] + -81 - i + 113) | 0x13) | (~(a2[i] + -81 - i + 113) & 0x12 | (a2[i] + -81 - i + 113) & 0xED) ^ 1) & 0x5E) ^ 0xA1) & 0xCD;
v8 = ~(((~((a2[i] + -81 - i + 113) | 0x13) | (~(a2[i] + -81 - i + 113) & 0x12 | (a2[i] + -81 - i + 113) & 0xED) ^ 1) & 0xA1 | ~(~((a2[i] + -81 - i + 113) | 0x13) | (~(a2[i] + -81 - i + 113) & 0x12 | (a2[i] + -81 - i + 113) & 0xED) ^ 1) & 0x5E) ^ 0xA1) | v4 ^ 0x13 | v4 & 0x13;
v9 = ~(((((~((a2[i] + -81 - i + 113) | 0x13) | (~(a2[i] + -81 - i + 113) & 0x12 | (a2[i] + -81 - i + 113) & 0xED) ^ 1) & 0xA1 | ~(~((a2[i] + -81 - i + 113) | 0x13) | (~(a2[i] + -81 - i + 113) & 0x12 | (a2[i] + -81 - i + 113) & 0xED) ^ 1) & 0x5E) ^ 0xA1) & 0x32 | v7) ^ (~(v4 ^ 0x13 | v4 & 0x13) & 0x32 | (v4 ^ 0x13 | v4 & 0x13) & 0xCD));
v10 = ~(v9 | v8) | ((((((~((a2[i] + -81 - i + 113) | 0x13) | (~(a2[i] + -81 - i + 113) & 0x12 | (a2[i] + -81 - i + 113) & 0xED) ^ 1) & 0xA1 | ~(~((a2[i] + -81 - i + 113) | 0x13) | (~(a2[i] + -81 - i + 113) & 0x12 | (a2[i] + -81 - i + 113) & 0xED) ^ 1) & 0x5E) ^ 0xA1) & 0x32 | v7) ^ (~(v4 ^ 0x13 | v4 & 0x13) & 0x32 | (v4 ^ 0x13 | v4 & 0x13) & 0xCD)) & 0x21 | v9 & 0xDE) ^ (v6 & 0x21 | v8 & 0xDE);
v11 = v5 & ((~(_BYTE)i | ~(i & 0x13 | ~(_BYTE)i & 0xEC)) ^ v5);
v12 = (~(~(_BYTE)i | ~(i & 0x13 | ~(_BYTE)i & 0xEC)) & 0xB0 | (~(_BYTE)i | ~(i & 0x13 | ~(_BYTE)i & 0xEC)) & 0x4F) ^ (v5 & 0xB0 | (~((i & 0x88 | ~(_BYTE)i & 0x77) ^ 0x88) | ~((((i & 0x88 | ~(_BYTE)i & 0x77) ^ 0x88) & 0xAF | ~((i & 0x88 | ~(_BYTE)i & 0x77) ^ 0x88) & 0x50) ^ 0x43)) & 0x4F);
v13 = v12 ^ v11 | v12 & v11;
a1[i] = ~(v13 | ~v10) ^ v13 & (v10 ^ v13) | ~(v13 | ~v10) & v13 & (v10 ^ v13);
}
return result;
}
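The long expressions in the decompiled output are built by repeatedly applying small identities. The sketch below shows a few identities of this kind in plain C++; they are standard arithmetic and bitwise equivalences chosen for illustration, not a list taken from the OLLVM source, and the function names are hypothetical.

```cpp
#include <cstdint>

// Each function computes the same value as the operator named in its comment.

// a + b, rewritten as a - (-b)
std::uint32_t subAdd(std::uint32_t a, std::uint32_t b) {
    return a - (0u - b);
}

// a ^ b, rewritten as (~a & b) | (a & ~b)
std::uint32_t subXor(std::uint32_t a, std::uint32_t b) {
    return (~a & b) | (a & ~b);
}

// a & b, rewritten as (a ^ ~b) & a
std::uint32_t subAnd(std::uint32_t a, std::uint32_t b) {
    return (a ^ ~b) & a;
}

// a | b, rewritten as (a & b) | (a ^ b)
std::uint32_t subOr(std::uint32_t a, std::uint32_t b) {
    return (a & b) | (a ^ b);
}
```

Applying such rewrites several times (as -sub_loop does) nests them inside each other, which is why a single xor or add in encrypt turns into the multi-line expressions above.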

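Control Flow Flattening

Control flow flattening removes the direct jumps between the basic blocks of the normal control flow and lets a central dispatch block decide the order in which the blocks run. It is applied one function at a time.

Structure:

[Figure: structure of control flow flattening]

When analyzing a flattened program, studying each basic block separately without knowing the execution order is very hard. Recovering the order requires analyzing the scheduling logic of the dispatch block, which is also very difficult.

Steps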

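  • Save the original basic blocks. Store every basic block except the entry block in a vector for later processing. If the entry block's terminator is a conditional branch, split that instruction off into its own basic block and put it at the front of the vector. This guarantees that the entry block has exactly one successor.

    vector<BasicBlock*> origBB;
    for(BasicBlock &BB : F){
        origBB.push_back(&BB);
    }

    // Remove the first basic block (the entry block)
    origBB.erase(origBB.begin());

    // If the entry block ends in a conditional branch, split the branch off
    // into a separate basic block and put it at the front of the vector
    BasicBlock &entryBB = F.getEntryBlock();
    if(BranchInst *br = dyn_cast<BranchInst>(entryBB.getTerminator())){
        if(br->isConditional()){
            BasicBlock *newBB = entryBB.splitBasicBlock(br, "newBB");
            origBB.insert(origBB.begin(), newBB);
        }
    }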
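  • Create the dispatch block and the return block. The dispatch block schedules the execution order of the basic blocks, and an unconditional jump from the entry block to the dispatch block is required. Every basic block jumps to the return block when it finishes, and the return block jumps straight back to the dispatch block.

    // Created in front of the entry block for now
    BasicBlock *dispatchBB = BasicBlock::Create(F.getContext(), "dispatchBB", &F, &entryBB);
    BasicBlock *returnBB = BasicBlock::Create(F.getContext(), "returnBB", &F, &entryBB);
    // The return block jumps back to the dispatch block
    BranchInst::Create(dispatchBB, returnBB);
    // Move the entry block back to the front
    entryBB.moveBefore(dispatchBB);
    // Remove the jump at the end of the first basic block
    entryBB.getTerminator()->eraseFromParent();
    // Make the first basic block jump to dispatchBB
    BranchInst *brDispatchBB = BranchInst::Create(dispatchBB, &entryBB);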
  • Implement the dispatch. First create and initialize the switch variable in the entry block, then insert a switch instruction into the dispatch block to do the scheduling. After that, move the basic blocks in front of the return block, assign each a random case value, and add it as a case of the switch instruction.

    // Insert alloca and store into the entry block to create and initialize
    // the switch variable with a random value
    int randNumCase = rand();
    AllocaInst *swVarPtr = new AllocaInst(TYPE_I32, 0, "swVar.ptr", brDispatchBB);
    new StoreInst(CONST_I32(randNumCase), swVarPtr, brDispatchBB);
    // Insert a load into the dispatch block to read the switch variable
    LoadInst *swVar = new LoadInst(TYPE_I32, swVarPtr, "swVar", false, dispatchBB);
    // Insert the switch instruction into the dispatch block to schedule the blocks
    BasicBlock *swDefault = BasicBlock::Create(F.getContext(), "swDefault", &F, returnBB);
    BranchInst::Create(returnBB, swDefault);
    SwitchInst *swInst = SwitchInst::Create(swVar, swDefault, 0, dispatchBB);
    // Move the basic blocks in front of the return block and assign case values
    for(BasicBlock *BB : origBB){
        BB->moveBefore(returnBB);
        swInst->addCase(CONST_I32(randNumCase), BB);
        randNumCase = rand();
    }
  • Update the switch variable automatically. At the end of every original basic block, add instructions that modify the switch variable so that the correct next block runs after control returns to the dispatch block. Remove the jump at the end of the original block and make it jump to the return block instead.

    // Add the switch variable update and the jump to the return block to every basic block
    for(BasicBlock *BB : origBB){
        // No successor: the function returns here, nothing to do
        if(BB->getTerminator()->getNumSuccessors() == 0){
            continue;
        }
        // Unconditional jump with one successor: store the case value of the
        // original successor into the switch variable
        else if(BB->getTerminator()->getNumSuccessors() == 1){
            BasicBlock *sucBB = BB->getTerminator()->getSuccessor(0);
            BB->getTerminator()->eraseFromParent();
            ConstantInt *numCase = swInst->findCaseDest(sucBB);
            new StoreInst(numCase, swVarPtr, BB);
            BranchInst::Create(returnBB, BB);
        }
        // Conditional jump with two successors: replace the original branch
        // with a select that picks one of the two case values, and store it
        else if(BB->getTerminator()->getNumSuccessors() == 2){
            ConstantInt *numCaseTrue = swInst->findCaseDest(BB->getTerminator()->getSuccessor(0));
            ConstantInt *numCaseFalse = swInst->findCaseDest(BB->getTerminator()->getSuccessor(1));
            BranchInst *br = cast<BranchInst>(BB->getTerminator());
            SelectInst *sel = SelectInst::Create(br->getCondition(), numCaseTrue, numCaseFalse, "", BB->getTerminator());
            BB->getTerminator()->eraseFromParent();
            new StoreInst(sel, swVarPtr, BB);
            BranchInst::Create(returnBB, BB);
        }
    }
  • Fix PHI instructions and escaping variables. After flattening, the predecessor of every original basic block is the dispatch block, so the PHI instructions are broken. An escaping variable is one that is defined in one basic block and used in another. In the source program some blocks may use variables from earlier blocks; after flattening there is no fixed predecessor/successor relation between the original blocks, so such uses may break (this presumably refers to register variables). The fix is to convert both PHI instructions and escaping variables into memory loads and stores.

    void Flattening::fixStack(Function &F){
        vector<PHINode*> origPHI;
        vector<Instruction*> origReg;
        BasicBlock &entryBB = F.getEntryBlock();
        for(BasicBlock &BB : F){
            for(Instruction &I : BB){
                if(PHINode *PN = dyn_cast<PHINode>(&I)){
                    origPHI.push_back(PN);
                }else if(!(isa<AllocaInst>(&I) && I.getParent() == &entryBB)
                    // An alloca in the entry block is not an escaping variable
                    && I.isUsedOutsideOfBlock(&BB))
                    // Anything else that is used outside its own block is one
                {
                    origReg.push_back(&I);
                }
            }
        }
        for(PHINode *PN : origPHI){
            DemotePHIToStack(PN, entryBB.getTerminator());
        }
        for(Instruction *I : origReg){
            // Variables in LLVM are also called virtual registers
            DemoteRegToStack(*I, entryBB.getTerminator());
        }
    }
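The effect of these steps can be imitated by hand on the encrypt function from TestProgram.cpp. The sketch below is ordinary C++, not pass output: the case constants are arbitrary values picked for illustration, and encryptFlattened is a hypothetical name. The for (;;) plus switch plays the role of the dispatch block, swVar is the switch variable, and each case is one original basic block that ends by storing the case value of its successor.

```cpp
#include <cstring>

// Original function, as in TestProgram.cpp.
void encrypt(unsigned char *dest, const char *src) {
    int len = static_cast<int>(std::strlen(src));
    for (int i = 0; i < len; i++) {
        dest[i] = (src[i] + (32 - i)) ^ i;
    }
}

// Hand-flattened equivalent (case values are arbitrary).
void encryptFlattened(unsigned char *dest, const char *src) {
    int len = 0;
    int i = 0;
    unsigned swVar = 0x1a2bu;          // initialized in the "entry block"
    for (;;) {                         // dispatch block, re-entered after every case
        switch (swVar) {
        case 0x1a2bu:                  // loop initialization
            len = static_cast<int>(std::strlen(src));
            i = 0;
            swVar = 0x77c1u;
            break;
        case 0x77c1u:                  // loop condition: a select between two case values
            swVar = (i < len) ? 0x3e09u : 0x5d42u;
            break;
        case 0x3e09u:                  // loop body
            dest[i] = (src[i] + (32 - i)) ^ i;
            swVar = 0x0b6fu;
            break;
        case 0x0b6fu:                  // increment
            i++;
            swVar = 0x77c1u;
            break;
        case 0x5d42u:                  // block without successors: returns directly
            return;
        default:                       // not reached with the values above
            return;
        }
    }
}
```

Both functions write the same bytes for the same input. Note that len and i live in ordinary variables shared by all cases, which corresponds to demoting PHI nodes and escaping values to stack slots.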

Implementation

The original author's code fails to compile or crashes at run time in my environment, so it has been modified in several places.

Building on the earlier project, create Utils.h in the Transforms/include directory:

#pragma once

#include "llvm/IR/Function.h"
#include "llvm/IR/Constants.h"

#define INIT_CONTEXT(F) CONTEXT=&F.getContext()
#define TYPE_I32 Type::getInt32Ty(*CONTEXT)
#define CONST_I32(V) ConstantInt::get(TYPE_I32, V,false)

// extern llvm::LLVMContext *CONTEXT;
// Declaring CONTEXT here leads to an "Undefined Symbol CONTEXT" error,
// so CONTEXT is declared as a member of each pass instead

namespace llvm{
void fixStack(Function &F);
}

Transforms/src/Flattening.cpp:

#include "llvm/IR/Function.h"
#include "llvm/Pass.h"
#include "llvm/IR/LegacyPassManager.h"
#include "llvm/Transforms/IPO/PassManagerBuilder.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/IR/Instructions.h"
#include "llvm/Transforms/Utils.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Transforms/Utils/Local.h"
#include "SplitBasicBlock.h"
#include "Utils.h"
#include <vector>
#include <cstdlib>
#include <ctime>

using namespace llvm;
using std::vector;

namespace{
class Flattening : public FunctionPass{
public:
    static char ID;
    Flattening() : FunctionPass(ID){
        srand(time(0));
    }

    // Flatten the function F
    void flatten(Function &F);

    bool runOnFunction(Function &F);

    llvm::LLVMContext *CONTEXT;
};
}

bool Flattening::runOnFunction(Function &F){
    INIT_CONTEXT(F);
    // Same problem as the original author ran into: calling SplitBasicBlock
    // here causes a link error
    // FunctionPass *pass = createSplitBasicBlockPass();
    // pass->runOnFunction(F);
    flatten(F);
    return true;
}

void Flattening::flatten(Function &F){
    // Nothing to flatten if there is at most one basic block
    if(F.size() <= 1){
        return;
    }
    // Lower switch
    // Calling Lower switch here crashes; the fix is unknown
    //FunctionPass *pass = createLowerSwitchPass();
    //pass->runOnFunction(F);
    // Save every basic block except the entry block (the first block) in a
    // vector for later processing. First save all basic blocks
    vector<BasicBlock*> origBB;
    for(BasicBlock &BB : F){
        origBB.push_back(&BB);
    }
    // Remove the first basic block from the vector
    origBB.erase(origBB.begin());
    BasicBlock &entryBB = F.getEntryBlock();
    // If the first basic block ends in a conditional jump, split it off
    if(BranchInst *br = dyn_cast<BranchInst>(entryBB.getTerminator())){
        if(br->isConditional()){
            BasicBlock *newBB = entryBB.splitBasicBlock(br, "newBB");
            origBB.insert(origBB.begin(), newBB);
        }
    }

    // Create the dispatch block and the return block
    BasicBlock *dispatchBB = BasicBlock::Create(*CONTEXT, "dispatchBB", &F, &entryBB);
    BasicBlock *returnBB = BasicBlock::Create(*CONTEXT, "returnBB", &F, &entryBB);
    BranchInst::Create(dispatchBB, returnBB);
    entryBB.moveBefore(dispatchBB);
    // Remove the jump at the end of the first basic block
    entryBB.getTerminator()->eraseFromParent();
    // Make the first basic block jump to dispatchBB
    BranchInst *brDispatchBB = BranchInst::Create(dispatchBB, &entryBB);

    // Insert alloca and store into the entry block to create and initialize
    // the switch variable with a random value
    int randNumCase = rand();
    AllocaInst *swVarPtr = new AllocaInst(TYPE_I32, 0, "swVar.ptr", brDispatchBB);
    new StoreInst(CONST_I32(randNumCase), swVarPtr, brDispatchBB);
    // Insert a load into the dispatch block to read the switch variable
    LoadInst *swVar = new LoadInst(TYPE_I32, swVarPtr, "swVar", false, dispatchBB);
    // Insert the switch instruction into the dispatch block to schedule the blocks
    BasicBlock *swDefault = BasicBlock::Create(*CONTEXT, "swDefault", &F, returnBB);
    BranchInst::Create(returnBB, swDefault);
    SwitchInst *swInst = SwitchInst::Create(swVar, swDefault, 0, dispatchBB);
    // Move the original basic blocks in front of the return block and assign case values
    for(BasicBlock *BB : origBB){
        BB->moveBefore(returnBB);
        swInst->addCase(CONST_I32(randNumCase), BB);
        randNumCase = rand();
    }

    // Add the switch variable update and the jump to the return block at the
    // end of every basic block
    for(BasicBlock *BB : origBB){
        // Returning block: no successor, nothing to do
        if(BB->getTerminator()->getNumSuccessors() == 0){
            continue;
        }
        // Unconditional jump
        else if(BB->getTerminator()->getNumSuccessors() == 1){
            BasicBlock *sucBB = BB->getTerminator()->getSuccessor(0);
            BB->getTerminator()->eraseFromParent();
            ConstantInt *numCase = swInst->findCaseDest(sucBB);
            new StoreInst(numCase, swVarPtr, BB);
            BranchInst::Create(returnBB, BB);
        }
        // Conditional jump
        else if(BB->getTerminator()->getNumSuccessors() == 2){
            ConstantInt *numCaseTrue = swInst->findCaseDest(BB->getTerminator()->getSuccessor(0));
            ConstantInt *numCaseFalse = swInst->findCaseDest(BB->getTerminator()->getSuccessor(1));
            BranchInst *br = cast<BranchInst>(BB->getTerminator());
            SelectInst *sel = SelectInst::Create(br->getCondition(), numCaseTrue, numCaseFalse, "", BB->getTerminator());
            BB->getTerminator()->eraseFromParent();
            new StoreInst(sel, swVarPtr, BB);
            BranchInst::Create(returnBB, BB);
        }
    }
    fixStack(F);
}

char Flattening::ID = 0;
static RegisterPass<Flattening> X("fla", "Flatten the basic blocks in each function.");

Transforms/src/Utils.cpp:

#include "Utils.h"
#include <vector>
#include "llvm/IR/Instructions.h"
#include "llvm/Transforms/Utils/Local.h"
#include "llvm/Transforms/Utils/ValueMapper.h"
#include "llvm/Transforms/Utils/Cloning.h"

using std::vector;
using namespace llvm;

void llvm::fixStack(Function &F){
vector<PHINode*> origPHI;
vector<Instruction*> origReg;
BasicBlock &entryBB = F.getEntryBlock();
for(BasicBlock &BB:F){
for(Instruction &I: BB){
if(PHINode *PN = dyn_cast<PHINode>(&I)){
origPHI.push_back(PN);
}else if( !(isa<AllocaInst>(&I) && I.getParent() == &entryBB)
// An alloca in the entry block is not an escaped variable
&& I.isUsedOutsideOfBlock(&BB))
// Any other value that is used outside its defining block is an escaped variable
{
origReg.push_back(&I);
}
}
}
for(PHINode *PH: origPHI){
DemotePHIToStack(PH, entryBB.getTerminator());
}
for(Instruction *I: origReg){
DemoteRegToStack(*I, entryBB.getTerminator());
// In LLVM, such values are also called virtual registers
}
}

Add Utils.cpp and Flattening.cpp to CMakeLists.txt:

cmake_minimum_required(VERSION 3.13.4)
project(OLLVM++)
find_package(LLVM REQUIRED CONFIG)

list(APPEND CMAKE_MODULE_PATH "${LLVM_CMAKE_DIR}")
include(AddLLVM)
include_directories("./include") # Add the headers in the ./include folder
separate_arguments(LLVM_DEFINITIONS_LIST NATIVE_COMMAND ${LLVM_DEFINITIONS})
add_definitions(${LLVM_DEFINITIONS_LIST})
include_directories(${LLVM_INCLUDE_DIRS})
add_llvm_library( LLVMObfuscator MODULE
src/HelloWorld.cpp
src/SplitBasicBlock.cpp
src/Flattening.cpp
src/Utils.cpp
)

Update the Build.sh script:

cd ./Build
cmake ../Transforms
make
cd ../Test
clang -S -emit-llvm TestProgram.cpp -o IR/TestProgram.ll

echo "-----Hello World Test-----"
opt -enable-new-pm=0 -load ../Build/LLVMObfuscator.so -hlw -S IR/TestProgram.ll -o IR/TestProgram_hlw.ll
clang IR/TestProgram_hlw.ll -o Bin/TestProgram_hlw
./Bin/TestProgram_hlw flag{s1mpl3_11vm_d3m0}

echo "-----Split Basic Block Test-----"
opt -enable-new-pm=0 -load ../Build/LLVMObfuscator.so -split --split_num 5 -S IR/TestProgram.ll -o IR/TestProgram_split.ll
clang IR/TestProgram_split.ll -o Bin/TestProgram_split
./Bin/TestProgram_split flag{s1mpl3_11vm_d3m0}

echo "-----Control Flow Flattening Test-----"
# Run lowerswitch first; see the comments in Flattening.cpp
opt -enable-new-pm=0 -lowerswitch -S IR/TestProgram.ll -o IR/TestProgram_lowerswitch.ll
opt -enable-new-pm=0 -load ../Build/LLVMObfuscator.so -fla IR/TestProgram_lowerswitch.ll -o IR/TestProgram_fla.ll
clang IR/TestProgram_fla.ll -o Bin/TestProgram_fla
./Bin/TestProgram_fla flag{s1mpl3_11vm_d3m0}

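After building, inspect the obfuscated control flow in IDA.

Bogus Control Flow

Bogus control flow inserts unreachable basic blocks into the normal control flow, together with fake branches guarded by opaque predicates. The result is a large amount of junk code that hinders analysis. The obfuscation works on one basic block at a time: each block is split, cloned, and then wired up with bogus branches.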

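Steps

  • Split the basic block into three blocks: head, body, and tail. getFirstNonPHI returns the first instruction that is not a PHI node, so every PHI node (if any) precedes it; splitting there separates the head from the body. Splitting at the original block's terminator separates the body from the tail.
  • Clone the body block. References in the clone to values defined in the original block must be fixed; use the value map (ValueToValueMapTy) that CloneBasicBlock fills in to remap them.
  • (Optional) Mutate the cloned block by inserting some random instructions. OLLVM performs this step.
  • Build the bogus branches. Turn the unconditional branches from head to body and from body to tail into conditional branches, and add an unconditional branch from the cloned body block back to the original body block.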
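
The steps above can be pictured at the source level roughly as follows. This is a sketch under stated assumptions, not the pass itself: gx, gy, opaqueTrue, original, and withBogusFlow are hypothetical names, the globals are marked volatile only so that a compiler does not fold the predicate in this demo, and the clone is deliberately mutated (the optional step) to make it visible that it never runs.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for the always-zero global variables used by the predicate.
volatile uint32_t gx = 0, gy = 0;

// Opaque predicate: always true, but not obviously so to a reader.
static bool opaqueTrue() {
    uint32_t x = gx, y = gy;
    return y < 10 || (x * (x + 1)) % 2 == 0;
}

// The block before obfuscation.
uint32_t original(uint32_t a) { return a * 3 + 1; }

// The same block after splitting, cloning, and adding bogus branches.
uint32_t withBogusFlow(uint32_t a) {
    uint32_t r = 0;
    // head: conditional branch whose real target is always the body
    if (!opaqueTrue()) goto clone;
body:
    r = a * 3 + 1;
    // body -> tail is conditional as well
    if (opaqueTrue()) goto tail;
clone:
    r = a * 7 - 5;   // mutated copy of the body; never executed
    goto body;       // unconditional branch from the clone back to the body
tail:
    return r;
}
```

Because both predicates always evaluate to true, withBogusFlow computes exactly what original does, while a disassembler sees two extra conditional edges and a loop through the clone.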

Implementation

Create Transforms/src/BogusControlFlow.cpp:

#include "llvm/IR/Function.h"
#include "llvm/Pass.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Transforms/Utils/ValueMapper.h"
#include "llvm/Transforms/Utils/Cloning.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/Instructions.h"
#include "SplitBasicBlock.h"
#include "Utils.h"
#include <vector>
#include <cstdlib>
#include <ctime>
using std::vector;
using namespace llvm;

// Number of obfuscation rounds; more rounds produce a more complex result
static cl::opt<int> obfuTimes("bcf_loop", cl::init(1), cl::desc("Obfuscate a function <bcf_loop> time(s)."));

namespace{
class BogusControlFlow : public FunctionPass {
public:
static char ID;
BogusControlFlow() : FunctionPass(ID) {
srand(time(NULL));
}

bool runOnFunction(Function &F);

// Obfuscate the basic block BB
void bogus(BasicBlock *BB);

// Create a condition value that is always true
// The condition is: y < 10 || x * (x + 1) % 2 == 0
// where x and y are global variables that always hold 0
Value* createBogusCmp(BasicBlock *insertAfter);

llvm::LLVMContext *CONTEXT;
};

}

bool BogusControlFlow::runOnFunction(Function &F){
INIT_CONTEXT(F);

for(int i = 0;i < obfuTimes;i ++){
vector<BasicBlock*> origBB;
for(BasicBlock &BB : F){
origBB.push_back(&BB);
}
for(BasicBlock *BB : origBB){
bogus(BB);
}
}
return true;
}

Value* BogusControlFlow::createBogusCmp(BasicBlock *insertAfter){
// if((y < 10 || x * (x + 1) % 2 == 0))
// is equivalent to if(true)
Module *M = insertAfter->getModule();
GlobalVariable *xptr = new GlobalVariable(*M, TYPE_I32, false, GlobalValue::CommonLinkage, CONST_I32(0), "x");
GlobalVariable *yptr = new GlobalVariable(*M, TYPE_I32, false, GlobalValue::CommonLinkage, CONST_I32(0), "y");
LoadInst *x = new LoadInst(TYPE_I32, xptr, "", insertAfter);
LoadInst *y = new LoadInst(TYPE_I32, yptr, "", insertAfter);
ICmpInst *cond1 = new ICmpInst(*insertAfter, CmpInst::ICMP_SLT, y, CONST_I32(10));
BinaryOperator *op1 = BinaryOperator::CreateAdd(x, CONST_I32(1), "", insertAfter);
BinaryOperator *op2 = BinaryOperator::CreateMul(op1, x, "", insertAfter);
BinaryOperator *op3 = BinaryOperator::CreateURem(op2, CONST_I32(2), "", insertAfter);
ICmpInst *cond2 = new ICmpInst(*insertAfter, CmpInst::ICMP_EQ, op3, CONST_I32(0));
return BinaryOperator::CreateOr(cond1, cond2, "", insertAfter);
}

void BogusControlFlow::bogus(BasicBlock *entryBB){
// Step 1: split the block into entryBB, bodyBB, and endBB
// All PHI nodes (if any) stay in entryBB
// endBB contains only the terminator
BasicBlock *bodyBB = entryBB->splitBasicBlock(entryBB->getFirstNonPHI(), "bodyBB");
BasicBlock *endBB = bodyBB->splitBasicBlock(bodyBB->getTerminator(), "endBB");

// Step 2: clone bodyBB to get cloneBB
BasicBlock *cloneBB = createCloneBasicBlock(bodyBB);

// Step 3: build the bogus branches
// 1. Remove the unconditional branches at the end of entryBB, bodyBB, and cloneBB
entryBB->getTerminator()->eraseFromParent();
bodyBB->getTerminator()->eraseFromParent();
cloneBB->getTerminator()->eraseFromParent();
// 2. Append an always-true bogus comparison to entryBB and to bodyBB
Value *cond1 = createBogusCmp(entryBB);
Value *cond2 = createBogusCmp(bodyBB);
// 3. Replace the unconditional branch from entryBB to bodyBB with a conditional branch
BranchInst::Create(bodyBB, cloneBB, cond1, entryBB);
// 4. Replace the unconditional branch from bodyBB to endBB with a conditional branch
BranchInst::Create(endBB, cloneBB, cond2, bodyBB);
// 5. Add an unconditional branch from cloneBB back to bodyBB
BranchInst::Create(bodyBB, cloneBB);
}

char BogusControlFlow::ID = 0;
static RegisterPass<BogusControlFlow> X("bcf", "Add bogus control flow to each function.");
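
The second clause of the predicate built by createBogusCmp, x * (x + 1) % 2 == 0, holds for every 32-bit x, not only for x = 0: one of two consecutive integers is even, and wrapping modulo 2^32 does not change parity. The standalone check below illustrates this on sample values; productIsEven and holdsOnSamples are hypothetical helper names, and uint32_t is used here to model the wrapping i32 arithmetic.

```cpp
#include <cassert>
#include <cstdint>

// x * (x + 1) % 2 == 0, evaluated with 32-bit wraparound arithmetic.
bool productIsEven(uint32_t x) {
    return (x * (x + 1u)) % 2u == 0u;
}

// Check a spread of values, including ones where the product overflows.
bool holdsOnSamples() {
    const uint32_t edge[] = {0u, 1u, 2u, 65535u, 65536u,
                             0x7FFFFFFFu, 0x80000000u,
                             0xFFFFFFFEu, 0xFFFFFFFFu};
    for (uint32_t x : edge)
        if (!productIsEven(x)) return false;
    for (uint32_t x = 0; x < 100000u; ++x)
        if (!productIsEven(x)) return false;
    return true;
}
```

In the pass, x and y always hold 0, so the first clause (y < 10) is already true on its own; the second clause only adds arithmetic that makes the predicate harder to recognize.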

Add the declaration of createCloneBasicBlock to Utils.h:

#pragma once

#include "llvm/IR/Function.h"
#include "llvm/IR/Constants.h"

#define INIT_CONTEXT(F) CONTEXT=&F.getContext()
#define TYPE_I32 Type::getInt32Ty(*CONTEXT)
#define CONST_I32(V) ConstantInt::get(TYPE_I32, V,false)

// extern llvm::LLVMContext *CONTEXT;
// Declaring CONTEXT here causes an "Undefined Symbol CONTEXT" error,
// so each Pass declares CONTEXT as a member instead

namespace llvm{
void fixStack(Function &F);
BasicBlock* createCloneBasicBlock(BasicBlock *BB);
}

Then add the implementation to Utils.cpp:

#include "Utils.h"
#include <vector>
#include "llvm/IR/Instructions.h"
#include "llvm/Transforms/Utils/Local.h"
#include "llvm/Transforms/Utils/ValueMapper.h"
#include "llvm/Transforms/Utils/Cloning.h"

using std::vector;
using namespace llvm;

void llvm::fixStack(Function &F){
vector<PHINode*> origPHI;
vector<Instruction*> origReg;
BasicBlock &entryBB = F.getEntryBlock();
for(BasicBlock &BB:F){
for(Instruction &I: BB){
if(PHINode *PN = dyn_cast<PHINode>(&I)){
origPHI.push_back(PN);
}else if( !(isa<AllocaInst>(&I) && I.getParent() == &entryBB)
// An alloca in the entry block is not an escaped variable
&& I.isUsedOutsideOfBlock(&BB))
// Any other value that is used outside its defining block is an escaped variable
{
origReg.push_back(&I);
}
}
}
for(PHINode *PH: origPHI){
DemotePHIToStack(PH, entryBB.getTerminator());
}
for(Instruction *I: origReg){
DemoteRegToStack(*I, entryBB.getTerminator());
// In LLVM, such values are also called virtual registers
}
}

BasicBlock* llvm::createCloneBasicBlock(BasicBlock *BB){
// Demote all escaped variables to the stack before cloning
vector<Instruction*> origReg;
BasicBlock &entryBB = BB->getParent()->getEntryBlock();
for(Instruction &I : *BB){
if(!(isa<AllocaInst>(&I) && I.getParent() == &entryBB)
&& I.isUsedOutsideOfBlock(BB)){
origReg.push_back(&I);
}
}
for(Instruction *I : origReg){
DemoteRegToStack(*I, entryBB.getTerminator());
}
ValueToValueMapTy VMap;
BasicBlock *cloneBB = CloneBasicBlock(BB, VMap, "cloneBB", BB->getParent());
// Fix up the operand references inside the cloned block
for(Instruction &I : *cloneBB){
for(unsigned i = 0;i < I.getNumOperands();i ++){
Value *V = MapValue(I.getOperand(i), VMap);
if(V){
I.setOperand(i, V);
}
}
}
return cloneBB;
}
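
Why the operand fix-up loop is needed can be shown with a toy model that does not depend on LLVM. In this sketch, Inst, Block, cloneBlock, remapOperands, and demoCloneAndRemap are all hypothetical names: a copied instruction still points at the original operands until a map from original to clone (the role VMap plays above) is applied.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for an instruction: a name plus pointers to its operands.
struct Inst {
    std::string name;
    std::vector<Inst*> operands;
};

using Block = std::vector<std::unique_ptr<Inst>>;

// Copy every instruction and record original -> clone in vmap.
Block cloneBlock(const Block &src, std::map<const Inst*, Inst*> &vmap) {
    Block dst;
    for (const auto &I : src) {
        dst.push_back(std::make_unique<Inst>(*I)); // operands still point at originals
        dst.back()->name += ".clone";
        vmap[I.get()] = dst.back().get();
    }
    return dst;
}

// Redirect operands that have a clone; leave all other operands untouched.
void remapOperands(Block &blk, const std::map<const Inst*, Inst*> &vmap) {
    for (auto &I : blk)
        for (Inst *&op : I->operands) {
            auto it = vmap.find(op);
            if (it != vmap.end()) op = it->second;
        }
}

// Block {a = f(outside); b = g(a)}: clone it and check the operand links.
bool demoCloneAndRemap() {
    Inst outside{"outside", {}};
    Block blk;
    blk.push_back(std::make_unique<Inst>(Inst{"a", {&outside}}));
    blk.push_back(std::make_unique<Inst>(Inst{"b", {blk[0].get()}}));

    std::map<const Inst*, Inst*> vmap;
    Block cl = cloneBlock(blk, vmap);
    bool staleBefore = cl[1]->operands[0] == blk[0].get(); // still the original a
    remapOperands(cl, vmap);
    return staleBefore
        && cl[1]->operands[0] == cl[0].get()   // b.clone now uses a.clone
        && cl[0]->operands[0] == &outside;     // external operand is unchanged
}
```

Operands defined outside the block have no entry in the map and are left alone, which mirrors the if(V) check in createCloneBasicBlock.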

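After testing, inspect the control flow in IDA.

Instruction Substitution

Instruction substitution replaces ordinary binary operations (addition, subtraction, bitwise operations, and so on) with equivalent but more complicated instruction sequences, which obscures the computation. It does not change the control flow, but it makes the arithmetic hard to recognize.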

Addition Substitution

Substitutions for a = b + c:

  • addNeg: a = b - (-c)
  • addDoubleNeg: a = -(-b + (-c))
  • addRand: r = rand(); a = b + r; a = a + c; a = a - r
  • addRand2: r = rand(); a = b - r; a = a + c; a = a + r
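
The four addition variants can be checked outside LLVM with plain unsigned arithmetic. This is a minimal sketch: uint32_t models the wrapping behaviour of an i32 add, negation is written as 0u - v, and the functions simply reuse the variant names for readability.

```cpp
#include <cassert>
#include <cstdint>

// a = b - (-c)
uint32_t addNeg(uint32_t b, uint32_t c) { return b - (0u - c); }

// a = -(-b + (-c))
uint32_t addDoubleNeg(uint32_t b, uint32_t c) {
    return 0u - ((0u - b) + (0u - c));
}

// a = b + r; a = a + c; a = a - r
uint32_t addRand(uint32_t b, uint32_t c, uint32_t r) {
    uint32_t a = b + r;
    a = a + c;
    return a - r;
}

// a = b - r; a = a + c; a = a + r
uint32_t addRand2(uint32_t b, uint32_t c, uint32_t r) {
    uint32_t a = b - r;
    a = a + c;
    return a + r;
}
```

The random constant r cancels out regardless of its value, so intermediate overflow is harmless.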

Subtraction Substitution

Substitutions for a = b - c:

  • subNeg: a = b + (-c)
  • subRand: r = rand(); a = b + r; a = a - c; a = a - r
  • subRand2: r = rand(); a = b - r; a = a - c; a = a + r
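
A similar sketch for the subtraction variants, again assuming uint32_t as a model of wrapping i32 arithmetic. The first test case below has a negative mathematical result, which shows that the identities also hold when the subtraction wraps.

```cpp
#include <cassert>
#include <cstdint>

// a = b + (-c)
uint32_t subNeg(uint32_t b, uint32_t c) { return b + (0u - c); }

// a = b + r; a = a - c; a = a - r
uint32_t subRand(uint32_t b, uint32_t c, uint32_t r) {
    uint32_t a = b + r;
    a = a - c;
    return a - r;
}

// a = b - r; a = a - c; a = a + r
uint32_t subRand2(uint32_t b, uint32_t c, uint32_t r) {
    uint32_t a = b - r;
    a = a - c;
    return a + r;
}
```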

AND Substitution

Substitutions for a = b & c:

  • andSubstitute: a = (b ^ ~c) & b
  • andSubstituteRand: r = rand(); a = ~(~b | ~c) & (r | ~r)
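
Bitwise identities act on each bit independently, so checking every pair of 8-bit operands is enough to be convinced that they hold at any width. The sketch below does exactly that for the two AND variants as they are implemented in the pass; andVariantsMatch is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>

// a = (b ^ ~c) & b
uint8_t andSubstitute(uint8_t b, uint8_t c) {
    return static_cast<uint8_t>((b ^ ~c) & b);
}

// a = ~(~b | ~c) & (r | ~r); the first factor is De Morgan's form of b & c
uint8_t andSubstituteRand(uint8_t b, uint8_t c, uint8_t r) {
    return static_cast<uint8_t>(~(~b | ~c) & (r | ~r));
}

// Compare both variants against b & c for all 65536 operand pairs.
bool andVariantsMatch(uint8_t r) {
    for (unsigned b = 0; b < 256; ++b)
        for (unsigned c = 0; c < 256; ++c) {
            uint8_t lhs = static_cast<uint8_t>(b);
            uint8_t rhs = static_cast<uint8_t>(c);
            uint8_t expect = static_cast<uint8_t>(b & c);
            if (andSubstitute(lhs, rhs) != expect) return false;
            if (andSubstituteRand(lhs, rhs, r) != expect) return false;
        }
    return true;
}
```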

OR Substitution

Substitutions for a = b | c:

  • orSubstitute: a = (b & c) | (b ^ c)
  • orSubstituteRand: r = rand(); a = ~(~b & ~c) & (r | ~r)
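
In the random variant, the factor (r | ~r) is all ones for any r, so it changes nothing and only adds noise. The sketch below isolates that mask in a hypothetical helper, allOnesMask, and checks the two OR variants as the pass implements them.

```cpp
#include <cassert>
#include <cstdint>

// a = (b & c) | (b ^ c)
uint32_t orSubstitute(uint32_t b, uint32_t c) { return (b & c) | (b ^ c); }

// r | ~r sets every bit, whatever r is.
uint32_t allOnesMask(uint32_t r) { return r | ~r; }

// a = ~(~b & ~c) & (r | ~r); the first factor is De Morgan's form of b | c
uint32_t orSubstituteRand(uint32_t b, uint32_t c, uint32_t r) {
    return ~(~b & ~c) & allOnesMask(r);
}
```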

XOR Substitution

Substitutions for a = b ^ c:

  • xorSubstitute: a = (~b & c) | (b & ~c)
  • xorSubstituteRand: r = rand(); a = (b ^ r) ^ (c ^ r), expanded as a = (~b & r | b & ~r) ^ (~c & r | c & ~r)
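
The random XOR variant relies on r cancelling itself: (b ^ r) ^ (c ^ r) = b ^ c. The sketch below writes each inner XOR in AND/OR/NOT form, as the pass does; xorWith is a hypothetical helper introduced only to avoid repeating that expansion.

```cpp
#include <cassert>
#include <cstdint>

// a = (~b & c) | (b & ~c)
uint32_t xorSubstitute(uint32_t b, uint32_t c) { return (~b & c) | (b & ~c); }

// v ^ r expressed with AND, OR, and NOT only.
uint32_t xorWith(uint32_t v, uint32_t r) { return (~v & r) | (v & ~r); }

// a = (b ^ r) ^ (c ^ r); the two copies of r cancel
uint32_t xorSubstituteRand(uint32_t b, uint32_t c, uint32_t r) {
    return xorWith(b, r) ^ xorWith(c, r);
}
```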

Multiplication Substitution (to be continued)

Implementation

Add Transforms/src/Substitution.cpp:

#include "llvm/IR/Function.h"
#include "llvm/Pass.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/IR/Instructions.h"
#include "llvm/Support/CommandLine.h"
#include "Utils.h"
#include <vector>
#include <cstdlib>
#include <ctime>
using namespace llvm;
using std::vector;

#define NUMBER_ADD_SUBST 4
#define NUMBER_SUB_SUBST 3
#define NUMBER_AND_SUBST 2
#define NUMBER_OR_SUBST 2
#define NUMBER_XOR_SUBST 2

// Number of obfuscation rounds; more rounds produce a more complex result
static cl::opt<int> ObfuTime("sub_loop", cl::init(1), cl::desc("Obfuscate a function <sub_loop> time(s)."));

namespace{

class Substitution : public FunctionPass {
public:
static char ID;
Substitution() : FunctionPass(ID) {
srand(time(NULL));
}

bool runOnFunction(Function &F);

void substitute(BinaryOperator *BI);

// Replace an Add instruction
void substituteAdd(BinaryOperator *BI);
// Add substitution: a = b + c -> a = b - (-c)
void addNeg(BinaryOperator *BI);
// Add substitution: a = b + c -> a = -(-b + (-c))
void addDoubleNeg(BinaryOperator *BI);
// Add substitution: a = b + c -> r = rand(); a = b + r; a = a + c; a = a - r
void addRand(BinaryOperator *BI);
// Add substitution: a = b + c -> r = rand(); a = b - r; a = a + c; a = a + r
void addRand2(BinaryOperator *BI);

// Replace a Sub instruction
void substituteSub(BinaryOperator *BI);
// Sub substitution: a = b - c -> a = b + (-c)
void subNeg(BinaryOperator *BI);
// Sub substitution: a = b - c -> r = rand(); a = b + r; a = a - c; a = a - r
void subRand(BinaryOperator *BI);
// Sub substitution: a = b - c -> r = rand(); a = b - r; a = a - c; a = a + r
void subRand2(BinaryOperator *BI);

// Replace an And instruction
void substituteAnd(BinaryOperator *BI);
// And substitution: a = b & c -> a = (b ^ ~c) & b
void andSubstitute(BinaryOperator *BI);
// And substitution: a = b & c -> a = ~(~b | ~c) & (r | ~r)
void andSubstituteRand(BinaryOperator *BI);

// Replace an Or instruction
void substituteOr(BinaryOperator *BI);
// Or substitution: a = b | c -> a = (b & c) | (b ^ c)
void orSubstitute(BinaryOperator *BI);
// Or substitution: a = b | c -> a = ~(~b & ~c) & (r | ~r)
void orSubstituteRand(BinaryOperator *BI);

// Replace a Xor instruction
void substituteXor(BinaryOperator *BI);
// Xor substitution: a = b ^ c -> a = ~b & c | b & ~c
void xorSubstitute(BinaryOperator *BI);
// Xor substitution: a = b ^ c -> (b ^ r) ^ (c ^ r) <=> (~b & r | b & ~r) ^ (~c & r | c & ~r)
void xorSubstituteRand(BinaryOperator *BI);
};
}

bool Substitution::runOnFunction(Function &F){
for(int i = 0;i < ObfuTime;i ++){
for(BasicBlock &BB : F){
vector<Instruction*> origInst;
for(Instruction &I : BB){
origInst.push_back(&I);
}
for(Instruction *I : origInst){
if(isa<BinaryOperator>(I)){
BinaryOperator *BI = cast<BinaryOperator>(I);
substitute(BI);
}
}
}
}
return true;
}

void Substitution::substitute(BinaryOperator *BI){
bool flag = true;
switch (BI->getOpcode()) {
case BinaryOperator::Add:
substituteAdd(BI);
break;
case BinaryOperator::Sub:
substituteSub(BI);
break;
case BinaryOperator::And:
substituteAnd(BI);
break;
case BinaryOperator::Or:
substituteOr(BI);
break;
case BinaryOperator::Xor:
substituteXor(BI);
break;
default:
flag = false;
break;
}
if(flag){
BI->eraseFromParent();
}
}


void Substitution::substituteAdd(BinaryOperator *BI){
int choice = rand() % NUMBER_ADD_SUBST;
switch (choice) {
case 0:
addNeg(BI);
break;
case 1:
addDoubleNeg(BI);
break;
case 2:
addRand(BI);
break;
case 3:
addRand2(BI);
break;
default:
break;
}
}

void Substitution::addNeg(BinaryOperator *BI){
BinaryOperator *op;
op = BinaryOperator::CreateNeg(BI->getOperand(1), "", BI);
op = BinaryOperator::CreateSub(BI->getOperand(0), op, "", BI);
BI->replaceAllUsesWith(op);
}

void Substitution::addDoubleNeg(BinaryOperator *BI){
BinaryOperator *op, *op1, *op2;
op1 = BinaryOperator::CreateNeg(BI->getOperand(0), "", BI);
op2 = BinaryOperator::CreateNeg(BI->getOperand(1), "", BI);
op = BinaryOperator::CreateAdd(op1, op2, "", BI);
op = BinaryOperator::CreateNeg(op, "", BI);
BI->replaceAllUsesWith(op);
}

void Substitution::addRand(BinaryOperator *BI){
ConstantInt *r = (ConstantInt*)CONST(BI->getType(), rand());
BinaryOperator *op, *op1, *op2;
op = BinaryOperator::CreateAdd(BI->getOperand(0), r, "", BI);
op = BinaryOperator::CreateAdd(op, BI->getOperand(1), "", BI);
op = BinaryOperator::CreateSub(op, r, "", BI);
BI->replaceAllUsesWith(op);
}

void Substitution::addRand2(BinaryOperator *BI){
ConstantInt *r = (ConstantInt*)CONST(BI->getType(), rand());
BinaryOperator *op, *op1, *op2;
op = BinaryOperator::CreateSub(BI->getOperand(0), r, "", BI);
op = BinaryOperator::CreateAdd(op, BI->getOperand(1), "", BI);
op = BinaryOperator::CreateAdd(op, r, "", BI);
BI->replaceAllUsesWith(op);
}

void Substitution::substituteSub(BinaryOperator *BI){
int choice = rand() % NUMBER_SUB_SUBST;
switch (choice) {
case 0:
subNeg(BI);
break;
case 1:
subRand(BI);
break;
case 2:
subRand2(BI);
break;
default:
break;
}
}

void Substitution::subNeg(BinaryOperator *BI){
BinaryOperator *op;
op = BinaryOperator::CreateNeg(BI->getOperand(1), "", BI);
op = BinaryOperator::CreateAdd(BI->getOperand(0), op, "", BI);
BI->replaceAllUsesWith(op);
}

void Substitution::subRand(BinaryOperator *BI){
ConstantInt *r = (ConstantInt*)CONST(BI->getType(), rand());
BinaryOperator *op, *op1, *op2;
op = BinaryOperator::CreateAdd(BI->getOperand(0), r, "", BI);
op = BinaryOperator::CreateSub(op, BI->getOperand(1), "", BI);
op = BinaryOperator::CreateSub(op, r, "", BI);
BI->replaceAllUsesWith(op);
}

void Substitution::subRand2(BinaryOperator *BI){
ConstantInt *r = (ConstantInt*)CONST(BI->getType(), rand());
BinaryOperator *op, *op1, *op2;
op = BinaryOperator::CreateSub(BI->getOperand(0), r, "", BI);
op = BinaryOperator::CreateSub(op, BI->getOperand(1), "", BI);
op = BinaryOperator::CreateAdd(op, r, "", BI);
BI->replaceAllUsesWith(op);
}

void Substitution::substituteXor(BinaryOperator *BI){
int choice = rand() % NUMBER_XOR_SUBST;
switch (choice) {
case 0:
xorSubstitute(BI);
break;
case 1:
xorSubstituteRand(BI);
break;
default:
break;
}
}

void Substitution::xorSubstitute(BinaryOperator *BI){
BinaryOperator *op, *op1, *op2, *op3;
op1 = BinaryOperator::CreateNot(BI->getOperand(0), "", BI);
op1 = BinaryOperator::CreateAnd(op1, BI->getOperand(1), "", BI);
op2 = BinaryOperator::CreateNot(BI->getOperand(1), "", BI);
op2 = BinaryOperator::CreateAnd(BI->getOperand(0), op2, "", BI);
op = BinaryOperator::CreateOr(op1, op2, "", BI);
BI->replaceAllUsesWith(op);
}

void Substitution::xorSubstituteRand(BinaryOperator *BI){
ConstantInt *r = (ConstantInt*)CONST(BI->getType(), rand());
BinaryOperator *op, *op1, *op2, *op3;
op1 = BinaryOperator::CreateNot(BI->getOperand(0), "", BI);
op1 = BinaryOperator::CreateAnd(op1, r, "", BI);
op2 = BinaryOperator::CreateNot(r, "", BI);
op2 = BinaryOperator::CreateAnd(BI->getOperand(0), op2, "", BI);
op = BinaryOperator::CreateOr(op1, op2, "", BI);
op1 = BinaryOperator::CreateNot(BI->getOperand(1), "", BI);
op1 = BinaryOperator::CreateAnd(op1, r, "", BI);
op2 = BinaryOperator::CreateNot(r, "", BI);
op2 = BinaryOperator::CreateAnd(BI->getOperand(1), op2, "", BI);
op3 = BinaryOperator::CreateOr(op1, op2, "", BI);
op = BinaryOperator::CreateXor(op, op3, "", BI);
BI->replaceAllUsesWith(op);
}

void Substitution::substituteAnd(BinaryOperator *BI){
int choice = rand() % NUMBER_AND_SUBST;
switch (choice) {
case 0:
andSubstitute(BI);
break;
case 1:
andSubstituteRand(BI);
break;
default:
break;
}
}

void Substitution::andSubstitute(BinaryOperator *BI){
BinaryOperator *op;
op = BinaryOperator::CreateNot(BI->getOperand(1), "", BI);
op = BinaryOperator::CreateXor(BI->getOperand(0), op, "", BI);
op = BinaryOperator::CreateAnd(op, BI->getOperand(0), "", BI);
BI->replaceAllUsesWith(op);
}

void Substitution::andSubstituteRand(BinaryOperator *BI){
ConstantInt *r = (ConstantInt*)CONST(BI->getType(), rand());
BinaryOperator *op, *op1;
op = BinaryOperator::CreateNot(BI->getOperand(0), "", BI);
op1 = BinaryOperator::CreateNot(BI->getOperand(1), "", BI);
op = BinaryOperator::CreateOr(op, op1, "", BI);
op = BinaryOperator::CreateNot(op, "", BI);
op1 = BinaryOperator::CreateNot(r, "", BI);
op1 = BinaryOperator::CreateOr(r, op1, "", BI);
op = BinaryOperator::CreateAnd(op, op1, "", BI);
BI->replaceAllUsesWith(op);
}

void Substitution::substituteOr(BinaryOperator *BI){
int choice = rand() % NUMBER_OR_SUBST;
switch (choice) {
case 0:
orSubstitute(BI);
break;
case 1:
orSubstituteRand(BI);
break;
default:
break;
}
}

void Substitution::orSubstitute(BinaryOperator *BI){
BinaryOperator *op, *op1;
op = BinaryOperator::CreateAnd(BI->getOperand(0), BI->getOperand(1), "", BI);
op1 = BinaryOperator::CreateXor(BI->getOperand(0), BI->getOperand(1), "", BI);
op = BinaryOperator::CreateOr(op, op1, "", BI);
BI->replaceAllUsesWith(op);
}

void Substitution::orSubstituteRand(BinaryOperator *BI){
ConstantInt *r = (ConstantInt*)CONST(BI->getType(), rand());
BinaryOperator *op, *op1;
op = BinaryOperator::CreateNot(BI->getOperand(0), "", BI);
op1 = BinaryOperator::CreateNot(BI->getOperand(1), "", BI);
op = BinaryOperator::CreateAnd(op, op1, "", BI);
op = BinaryOperator::CreateNot(op, "", BI);
op1 = BinaryOperator::CreateNot(r, "", BI);
op1 = BinaryOperator::CreateOr(r, op1, "", BI);
op = BinaryOperator::CreateAnd(op, op1, "", BI);
BI->replaceAllUsesWith(op);
}

char Substitution::ID = 0;
static RegisterPass<Substitution> X("sub", "Replace a binary instruction with equivalent instructions.");

Add the CONST macro to Utils.h:

#pragma once

#include "llvm/IR/Function.h"
#include "llvm/IR/Constants.h"

#define INIT_CONTEXT(F) CONTEXT=&F.getContext()
#define TYPE_I32 Type::getInt32Ty(*CONTEXT)
#define CONST_I32(V) ConstantInt::get(TYPE_I32, V,false)
#define CONST(T, V) ConstantInt::get(T, V,false)

// extern llvm::LLVMContext *CONTEXT;
// Declaring CONTEXT here causes an "Undefined Symbol CONTEXT" error,
// so each Pass declares CONTEXT as a member instead

namespace llvm{
void fixStack(Function &F);
BasicBlock* createCloneBasicBlock(BasicBlock *BB);
}

Add the new tests to test.sh:

cd ./Build
cmake ../Transforms
make
cd ../Test
clang -S -emit-llvm TestProgram.cpp -o IR/TestProgram.ll

echo "-----Hello World Test-----"
opt -enable-new-pm=0 -load ../Build/LLVMObfuscator.so -hlw -S IR/TestProgram.ll -o IR/TestProgram_hlw.ll
clang IR/TestProgram_hlw.ll -o Bin/TestProgram_hlw
./Bin/TestProgram_hlw flag{s1mpl3_11vm_d3m0}

echo "-----Split Basic Block Test-----"
opt -enable-new-pm=0 -load ../Build/LLVMObfuscator.so -split --split_num 5 -S IR/TestProgram.ll -o IR/TestProgram_split.ll
clang IR/TestProgram_split.ll -o Bin/TestProgram_split
./Bin/TestProgram_split flag{s1mpl3_11vm_d3m0}

echo "-----Control Flow Flattening Test-----"
opt -enable-new-pm=0 -lowerswitch -S IR/TestProgram.ll -o IR/TestProgram_lowerswitch.ll
opt -enable-new-pm=0 -load ../Build/LLVMObfuscator.so -fla IR/TestProgram_lowerswitch.ll -o IR/TestProgram_fla.ll
clang IR/TestProgram_fla.ll -o Bin/TestProgram_fla
./Bin/TestProgram_fla flag{s1mpl3_11vm_d3m0}

echo "-----Bogus Control Flow Test-----"
opt -enable-new-pm=0 -load ../Build/LLVMObfuscator.so -bcf IR/TestProgram.ll -o IR/TestProgram_bcf.ll
clang IR/TestProgram_bcf.ll -o Bin/TestProgram_bcf
./Bin/TestProgram_bcf flag{s1mpl3_11vm_d3m0}

echo "-----Instruction Substitution Test-----"
opt -enable-new-pm=0 -load ../Build/LLVMObfuscator.so -sub IR/TestProgram.ll -o IR/TestProgram_sub.ll
clang IR/TestProgram_sub.ll -o Bin/TestProgram_sub
./Bin/TestProgram_sub flag{s1mpl3_11vm_d3m0}

After building, compare the encrypt function before and after obfuscation. Before obfuscation:

.text:0000000000401180 ; =============== S U B R O U T I N E =======================================
.text:0000000000401180
.text:0000000000401180 ; Attributes: bp-based frame
.text:0000000000401180
.text:0000000000401180 ; __int64 __fastcall encrypt(unsigned __int8 *, char *)
.text:0000000000401180 public _Z7encryptPhPc
.text:0000000000401180 _Z7encryptPhPc proc near ; CODE XREF: main+B6↓p
.text:0000000000401180
.text:0000000000401180 var_18 = dword ptr -18h
.text:0000000000401180 var_14 = dword ptr -14h
.text:0000000000401180 s = qword ptr -10h
.text:0000000000401180 var_8 = qword ptr -8
.text:0000000000401180
.text:0000000000401180 ; __unwind {
.text:0000000000401180 push rbp
.text:0000000000401181 mov rbp, rsp
.text:0000000000401184 sub rsp, 20h
.text:0000000000401188 mov [rbp+var_8], rdi
.text:000000000040118C mov [rbp+s], rsi
.text:0000000000401190 mov rdi, [rbp+s] ; s
.text:0000000000401194 call _strlen
.text:0000000000401199 mov ecx, eax
.text:000000000040119B mov [rbp+var_14], ecx
.text:000000000040119E mov [rbp+var_18], 0
.text:00000000004011A5
.text:00000000004011A5 loc_4011A5: ; CODE XREF: encrypt(uchar *,char *)+62↓j
.text:00000000004011A5 mov eax, [rbp+var_18]
.text:00000000004011A8 cmp eax, [rbp+var_14]
.text:00000000004011AB jge loc_4011E7
.text:00000000004011B1 mov eax, 20h ; ' '
.text:00000000004011B6 mov rcx, [rbp+s]
.text:00000000004011BA movsxd rdx, [rbp+var_18]
.text:00000000004011BE movsx esi, byte ptr [rcx+rdx]
.text:00000000004011C2 sub eax, [rbp+var_18]
.text:00000000004011C5 add esi, eax
.text:00000000004011C7 xor esi, [rbp+var_18]
.text:00000000004011CA mov dil, sil
.text:00000000004011CD mov rcx, [rbp+var_8]
.text:00000000004011D1 movsxd rdx, [rbp+var_18]
.text:00000000004011D5 mov [rcx+rdx], dil
.text:00000000004011D9 mov eax, [rbp+var_18]
.text:00000000004011DC add eax, 1
.text:00000000004011DF mov [rbp+var_18], eax
.text:00000000004011E2 jmp loc_4011A5
.text:00000000004011E7 ; ---------------------------------------------------------------------------
.text:00000000004011E7
.text:00000000004011E7 loc_4011E7: ; CODE XREF: encrypt(uchar *,char *)+2B↑j
.text:00000000004011E7 add rsp, 20h
.text:00000000004011EB pop rbp
.text:00000000004011EC retn
.text:00000000004011EC ; } // starts at 401180
.text:00000000004011EC _Z7encryptPhPc endp

After obfuscation:

.text:0000000000401174 ; ---------------------------------------------------------------------------
.text:0000000000401176 align 20h
.text:0000000000401180
.text:0000000000401180 ; =============== S U B R O U T I N E =======================================
.text:0000000000401180
.text:0000000000401180 ; Attributes: bp-based frame
.text:0000000000401180
.text:0000000000401180 ; __int64 __fastcall encrypt(unsigned __int8 *, char *)
.text:0000000000401180 public _Z7encryptPhPc
.text:0000000000401180 _Z7encryptPhPc proc near ; CODE XREF: main+C2↓p
.text:0000000000401180
.text:0000000000401180 var_40 = dword ptr -40h
.text:0000000000401180 var_3C = dword ptr -3Ch
.text:0000000000401180 s = qword ptr -38h
.text:0000000000401180 var_30 = qword ptr -30h
.text:0000000000401180
.text:0000000000401180 ; __unwind {
.text:0000000000401180 push rbp
.text:0000000000401181 mov rbp, rsp
.text:0000000000401184 push r15
.text:0000000000401186 push r14
.text:0000000000401188 push r13
.text:000000000040118A push r12
.text:000000000040118C push rbx
.text:000000000040118D sub rsp, 18h
.text:0000000000401191 mov [rbp+var_30], rdi
.text:0000000000401195 mov [rbp+s], rsi
.text:0000000000401199 mov rdi, [rbp+s] ; s
.text:000000000040119D call _strlen
.text:00000000004011A2 mov ecx, eax
.text:00000000004011A4 mov [rbp+var_3C], ecx
.text:00000000004011A7 mov [rbp+var_40], 0
.text:00000000004011AE
.text:00000000004011AE loc_4011AE: ; CODE XREF: encrypt(uchar *,char *)+98D↓j
.text:00000000004011AE mov eax, [rbp+var_40]
.text:00000000004011B1 cmp eax, [rbp+var_3C]
.text:00000000004011B4 jge loc_401B12
.text:00000000004011BA mov eax, 0FFFFFFFFh
.text:00000000004011BF xor ecx, ecx
.text:00000000004011C1 mov edx, 0B7BAA771h
.text:00000000004011C6 mov esi, 4F61C73Dh
.text:00000000004011CB mov rdi, [rbp+s]
.text:00000000004011CF movsxd r8, [rbp+var_40]
.text:00000000004011D3 movsx r9d, byte ptr [rdi+r8]
.text:00000000004011D8 mov r10d, [rbp+var_40]
.text:00000000004011DC add esi, 20h ; ' '
.text:00000000004011DF sub esi, 0D202945Bh
.text:00000000004011E5 sub esi, 4F61C73Dh
.text:00000000004011EB add esi, 78BFC784h
.text:00000000004011F1 add esi, 8376570Dh
.text:00000000004011F7 sub esi, 78BFC784h
.text:00000000004011FD add esi, 0F9A40F84h
.text:0000000000401203 add esi, 0D202945Bh
.text:0000000000401209 sub esi, 0F9A40F84h
.text:000000000040120F mov r11d, ecx
.text:0000000000401212 sub r11d, 0DAD96671h
.text:0000000000401219 sub r11d, r10d
.text:000000000040121C add r11d, 0DAD96671h
.text:0000000000401223 sub esi, 866B8B02h
.text:0000000000401229 add esi, r11d
.text:000000000040122C add esi, 866B8B02h
.text:0000000000401232 mov r10d, ecx
.text:0000000000401235 sub r10d, 6229D27Eh
.text:000000000040123C add esi, r10d
.text:000000000040123F sub esi, 0AF76AEF6h
.text:0000000000401245 sub esi, 8376570Dh
.text:000000000040124B add esi, 0AF76AEF6h
.text:0000000000401251 sub esi, 1BB9C015h
.text:0000000000401257 add esi, 6229D27Eh
.text:000000000040125D add esi, 1BB9C015h
.text:0000000000401263 add edx, 0
.text:0000000000401266 sub edx, 8FA4185Eh
.text:000000000040126C sub edx, 0B7BAA771h
.text:0000000000401272 mov r10d, ecx
.text:0000000000401275 sub r10d, esi
.text:0000000000401278 add edx, r10d
.text:000000000040127B mov esi, ecx
.text:000000000040127D sub esi, 8FA4185Eh
.text:0000000000401283 sub edx, esi
.text:0000000000401285 mov esi, ecx
.text:0000000000401287 sub esi, edx
.text:0000000000401289 add esi, 0
.text:000000000040128C sub ecx, esi
.text:000000000040128E sub r9d, ecx
.text:0000000000401291 mov ecx, [rbp+var_40]
.text:0000000000401294 mov edx, r9d
.text:0000000000401297 xor edx, 0FFFFFFFFh
.text:000000000040129A and edx, 0FFFFFFFFh
.text:00000000004012A0 mov esi, eax
.text:00000000004012A2 xor esi, 0FFFFFFFFh
.text:00000000004012A8 mov r10d, r9d
.text:00000000004012AB and r10d, esi
.text:00000000004012AE or edx, r10d
.text:00000000004012B1 mov esi, eax
.text:00000000004012B3 xor esi, 9CAA49BAh
.text:00000000004012B9 xor edx, 0FFFFFFFFh
.text:00000000004012BC mov r10d, eax
.text:00000000004012BF xor r10d, 7A781398h
.text:00000000004012C6 or esi, edx
.text:00000000004012C8 or r10d, 7A781398h
.text:00000000004012CF xor esi, 0FFFFFFFFh
.text:00000000004012D2 and esi, r10d
.text:00000000004012D5 mov edx, eax
.text:00000000004012D7 xor edx, 9CAA49BAh
.text:00000000004012DD and edx, 1742B494h
.text:00000000004012E3 mov r10d, eax
.text:00000000004012E6 xor r10d, 1742B494h
.text:00000000004012ED mov r11d, r10d
.text:00000000004012F0 and r11d, 9CAA49BAh
.text:00000000004012F7 mov ebx, eax
.text:00000000004012F9 xor ebx, 0FFFFFFFFh
.text:00000000004012FF and ebx, 1742B494h
.text:0000000000401305 and r10d, 0FFFFFFFFh
.text:000000000040130C or edx, r11d
.text:000000000040130F or ebx, r10d
.text:0000000000401312 xor edx, ebx
.text:0000000000401314 mov r10d, r9d
.text:0000000000401317 xor r10d, 0FFFFFFFFh
.text:000000000040131B mov r11d, edx
.text:000000000040131E xor r11d, 0FFFFFFFFh
.text:0000000000401322 mov ebx, eax
.text:0000000000401324 xor ebx, 6366956Eh
.text:000000000040132A or r10d, r11d
.text:000000000040132D or ebx, 6366956Eh
.text:0000000000401333 xor r10d, 0FFFFFFFFh
.text:0000000000401337 and r10d, ebx
.text:000000000040133A mov r11d, eax
.text:000000000040133D xor r11d, 0FFFFFFFFh
.text:0000000000401344 and r11d, 0D7FDF0BAh
.text:000000000040134B mov ebx, eax
.text:000000000040134D xor ebx, 0D7FDF0BAh
.text:0000000000401353 mov r14d, ebx
.text:0000000000401356 and r14d, 0FFFFFFFFh
.text:000000000040135D mov r15d, eax
.text:0000000000401360 xor r15d, 0FFFFFFFFh
.text:0000000000401367 and r15d, 0D7FDF0BAh
.text:000000000040136E and ebx, 0FFFFFFFFh
.text:0000000000401374 or r11d, r14d
.text:0000000000401377 or r15d, ebx
.text:000000000040137A xor r11d, r15d
.text:000000000040137D xor r11d, 0FFFFFFFFh
.text:0000000000401381 mov ebx, eax
.text:0000000000401383 xor ebx, 9CAA49BAh
.text:0000000000401389 mov r14d, eax
.text:000000000040138C xor r14d, 4547DE05h
.text:0000000000401393 or r11d, ebx
.text:0000000000401396 or r14d, 4547DE05h
.text:000000000040139D xor r11d, 0FFFFFFFFh
.text:00000000004013A1 and r11d, r14d
.text:00000000004013A4 xor edx, 0FFFFFFFFh
.text:00000000004013A7 xor edx, 0FFFFFFFFh
.text:00000000004013AD and edx, 0FFFFFFFFh
.text:00000000004013B0 mov ebx, esi
.text:00000000004013B2 and ebx, r10d
.text:00000000004013B5 xor esi, r10d
.text:00000000004013B8 or ebx, esi
.text:00000000004013BA mov esi, r11d
.text:00000000004013BD xor esi, 0FFFFFFFFh
.text:00000000004013C0 mov r10d, edx
.text:00000000004013C3 xor r10d, 0FFFFFFFFh
.text:00000000004013C7 mov r14d, eax
.text:00000000004013CA xor r14d, 0E19CC5AFh
.text:00000000004013D1 mov r15d, esi
.text:00000000004013D4 and r15d, 0E19CC5AFh
.text:00000000004013DB and r11d, r14d
.text:00000000004013DE mov r12d, r10d
.text:00000000004013E1 and r12d, 0E19CC5AFh
.text:00000000004013E8 and edx, r14d
.text:00000000004013EB or r15d, r11d
.text:00000000004013EE or r12d, edx
.text:00000000004013F1 xor r15d, r12d
.text:00000000004013F4 or esi, r10d
.text:00000000004013F7 xor esi, 0FFFFFFFFh
.text:00000000004013FA or r14d, 0E19CC5AFh
.text:0000000000401401 and esi, r14d
.text:0000000000401404 or r15d, esi
.text:0000000000401407 mov edx, ebx
.text:0000000000401409 xor edx, 0FFFFFFFFh
.text:000000000040140C and edx, 0EFDD26B2h
.text:0000000000401412 mov esi, eax
.text:0000000000401414 xor esi, 0EFDD26B2h
.text:000000000040141A and ebx, esi
.text:000000000040141C mov r10d, r15d
.text:000000000040141F xor r10d, 0FFFFFFFFh
.text:0000000000401423 and r10d, 0EFDD26B2h
.text:000000000040142A and r15d, esi
.text:000000000040142D or edx, ebx
.text:000000000040142F or r10d, r15d
.text:0000000000401432 xor edx, r10d
.text:0000000000401435 mov esi, eax
.text:0000000000401437 xor esi, 0F4761DECh
.text:000000000040143D and esi, 0FFFFFFFFh
.text:0000000000401443 mov r10d, eax
.text:0000000000401446 xor r10d, 0FFFFFFFFh
.text:000000000040144D and r10d, 0F4761DECh
.text:0000000000401454 or esi, r10d
.text:0000000000401457 mov r10d, edx
.text:000000000040145A xor r10d, 0FFFFFFFFh
.text:000000000040145E and r10d, 2E09355h
.text:0000000000401465 mov r11d, eax
.text:0000000000401468 xor r11d, 2E09355h
.text:000000000040146F and edx, r11d
.text:0000000000401472 mov ebx, eax
.text:0000000000401474 xor ebx, 0FFFFFFFFh
.text:000000000040147A and ebx, 2E09355h
.text:0000000000401480 and r11d, 0FFFFFFFFh
.text:0000000000401487 or r10d, edx
.text:000000000040148A or ebx, r11d
.text:000000000040148D xor r10d, ebx
.text:0000000000401490 mov edx, eax
.text:0000000000401492 xor edx, 5CBF33DDh
.text:0000000000401498 and edx, 0FFFFFFFFh
.text:000000000040149E mov r11d, eax
.text:00000000004014A1 xor r11d, 0FFFFFFFFh
.text:00000000004014A8 and r11d, 5CBF33DDh
.text:00000000004014AF or edx, r11d
.text:00000000004014B2 mov r11d, esi
.text:00000000004014B5 and r11d, r10d
.text:00000000004014B8 xor esi, r10d
.text:00000000004014BB or r11d, esi
.text:00000000004014BE mov esi, eax
.text:00000000004014C0 xor esi, 5CBF33DDh
.text:00000000004014C6 mov r10d, edx
.text:00000000004014C9 xor r10d, 0FFFFFFFFh
.text:00000000004014CD mov ebx, eax
.text:00000000004014CF xor ebx, 7DB0FD97h
.text:00000000004014D5 mov r14d, esi
.text:00000000004014D8 and r14d, 7DB0FD97h
.text:00000000004014DF mov r15d, ebx
.text:00000000004014E2 and r15d, 5CBF33DDh
.text:00000000004014E9 mov r12d, r10d
.text:00000000004014EC and r12d, 7DB0FD97h
.text:00000000004014F3 and edx, ebx
.text:00000000004014F5 or r14d, r15d
.text:00000000004014F8 or r12d, edx
.text:00000000004014FB xor r14d, r12d
.text:00000000004014FE or esi, r10d
.text:0000000000401501 xor esi, 0FFFFFFFFh
.text:0000000000401504 or ebx, 7DB0FD97h
.text:000000000040150A and esi, ebx
.text:000000000040150C or r14d, esi
.text:000000000040150F mov edx, r11d
.text:0000000000401512 xor edx, 0FFFFFFFFh
.text:0000000000401515 and edx, 0FFFFFFFFh
.text:000000000040151B mov esi, eax
.text:000000000040151D xor esi, 0FFFFFFFFh
.text:0000000000401523 and r11d, esi
.text:0000000000401526 or edx, r11d
.text:0000000000401529 xor r14d, 0FFFFFFFFh
.text:000000000040152D mov esi, edx
.text:000000000040152F xor esi, r14d
.text:0000000000401532 and esi, edx
.text:0000000000401534 mov edx, eax
.text:0000000000401536 xor edx, 0F4761DECh
.text:000000000040153C and edx, 0FFFFFFFFh
.text:0000000000401542 mov r10d, eax
.text:0000000000401545 xor r10d, 0FFFFFFFFh
.text:000000000040154C and r10d, 0F4761DECh
.text:0000000000401553 or edx, r10d
.text:0000000000401556 xor edx, 0FFFFFFFFh
.text:0000000000401559 xor edx, 0FFFFFFFFh
.text:000000000040155F and edx, 0FFFFFFFFh
.text:0000000000401562 mov r10d, eax
.text:0000000000401565 xor r10d, 0FFFFFFFFh
.text:000000000040156C and r10d, 0FFFFFFFFh
.text:0000000000401573 mov r11d, eax
.text:0000000000401576 xor r11d, 0FFFFFFFFh
.text:000000000040157D and r11d, 0FFFFFFFFh
.text:0000000000401584 or r10d, r11d
.text:0000000000401587 xor r10d, 0FFFFFFFFh
.text:000000000040158B xor r10d, 0F4761DECh
.text:0000000000401592 and r10d, 0F4761DECh
.text:0000000000401599 mov r11d, edx
.text:000000000040159C xor r11d, 0FFFFFFFFh
.text:00000000004015A0 mov ebx, r10d
.text:00000000004015A3 xor ebx, 0FFFFFFFFh
.text:00000000004015A6 mov r14d, eax
.text:00000000004015A9 xor r14d, 8FA9D697h
.text:00000000004015B0 mov r15d, r11d
.text:00000000004015B3 and r15d, 8FA9D697h
.text:00000000004015BA and edx, r14d
.text:00000000004015BD mov r12d, ebx
.text:00000000004015C0 and r12d, 8FA9D697h
.text:00000000004015C7 and r10d, r14d
.text:00000000004015CA or r15d, edx
.text:00000000004015CD or r12d, r10d
.text:00000000004015D0 xor r15d, r12d
.text:00000000004015D3 or r11d, ebx
.text:00000000004015D6 xor r11d, 0FFFFFFFFh
.text:00000000004015DA or r14d, 8FA9D697h
.text:00000000004015E1 and r11d, r14d
.text:00000000004015E4 or r15d, r11d
.text:00000000004015E7 mov edx, r9d
.text:00000000004015EA xor edx, 0FFFFFFFFh
.text:00000000004015ED and edx, 0FFFFFFFFh
.text:00000000004015F3 mov r10d, eax
.text:00000000004015F6 xor r10d, 0FFFFFFFFh
.text:00000000004015FD and r9d, r10d
.text:0000000000401600 or edx, r9d
.text:0000000000401603 mov r9d, r15d
.text:0000000000401606 xor r9d, 0FFFFFFFFh
.text:000000000040160A and r9d, 310A780Dh
.text:0000000000401611 mov r10d, eax
.text:0000000000401614 xor r10d, 310A780Dh
.text:000000000040161B mov r11d, r15d
.text:000000000040161E and r11d, r10d
.text:0000000000401621 mov ebx, eax
.text:0000000000401623 xor ebx, 0FFFFFFFFh
.text:0000000000401629 and ebx, 310A780Dh
.text:000000000040162F and r10d, 0FFFFFFFFh
.text:0000000000401636 or r9d, r11d
.text:0000000000401639 or ebx, r10d
.text:000000000040163C xor r9d, ebx
.text:000000000040163F mov r10d, eax
.text:0000000000401642 xor r10d, 91898CEEh
.text:0000000000401649 and r10d, 0FFFFFFFFh
.text:0000000000401650 mov r11d, eax
.text:0000000000401653 xor r11d, 0FFFFFFFFh
.text:000000000040165A and r11d, 91898CEEh
.text:0000000000401661 or r10d, r11d
.text:0000000000401664 mov r11d, edx
.text:0000000000401667 xor r11d, 0FFFFFFFFh
.text:000000000040166B mov ebx, r9d
.text:000000000040166E xor ebx, 0FFFFFFFFh
.text:0000000000401671 mov r14d, eax
.text:0000000000401674 xor r14d, 1C6B4AEDh
.text:000000000040167B mov r12d, r11d
.text:000000000040167E and r12d, 1C6B4AEDh
.text:0000000000401685 and edx, r14d
.text:0000000000401688 mov r13d, ebx
.text:000000000040168B and r13d, 1C6B4AEDh
.text:0000000000401692 and r9d, r14d
.text:0000000000401695 or r12d, edx
.text:0000000000401698 or r13d, r9d
.text:000000000040169B xor r12d, r13d
.text:000000000040169E or r11d, ebx
.text:00000000004016A1 xor r11d, 0FFFFFFFFh
.text:00000000004016A5 or r14d, 1C6B4AEDh
.text:00000000004016AC and r11d, r14d
.text:00000000004016AF or r12d, r11d
.text:00000000004016B2 mov edx, eax
.text:00000000004016B4 xor edx, 91898CEEh
.text:00000000004016BA mov r9d, r10d
.text:00000000004016BD xor r9d, 0FFFFFFFFh
.text:00000000004016C1 mov r11d, eax
.text:00000000004016C4 xor r11d, 0DE97EC9h
.text:00000000004016CB mov ebx, edx
.text:00000000004016CD and ebx, 0DE97EC9h
.text:00000000004016D3 mov r14d, r11d
.text:00000000004016D6 and r14d, 91898CEEh
.text:00000000004016DD mov r13d, r9d
.text:00000000004016E0 and r13d, 0DE97EC9h
.text:00000000004016E7 and r10d, r11d
.text:00000000004016EA or ebx, r14d
.text:00000000004016ED or r13d, r10d
.text:00000000004016F0 xor ebx, r13d
.text:00000000004016F3 or edx, r9d
.text:00000000004016F6 xor edx, 0FFFFFFFFh
.text:00000000004016F9 or r11d, 0DE97EC9h
.text:0000000000401700 and edx, r11d
.text:0000000000401703 or ebx, edx
.text:0000000000401705 mov edx, r12d
.text:0000000000401708 xor edx, 0FFFFFFFFh
.text:000000000040170B and edx, 0F4C83E5Eh
.text:0000000000401711 mov r9d, eax
.text:0000000000401714 xor r9d, 0F4C83E5Eh
.text:000000000040171B and r12d, r9d
.text:000000000040171E mov r10d, eax
.text:0000000000401721 xor r10d, 0FFFFFFFFh
.text:0000000000401728 and r10d, 0F4C83E5Eh
.text:000000000040172F and r9d, 0FFFFFFFFh
.text:0000000000401736 or edx, r12d
.text:0000000000401739 or r10d, r9d
.text:000000000040173C xor edx, r10d
.text:000000000040173F xor edx, 0FFFFFFFFh
.text:0000000000401742 xor ebx, 0FFFFFFFFh
.text:0000000000401745 mov r9d, eax
.text:0000000000401748 xor r9d, 0C0FE19B9h
.text:000000000040174F or edx, ebx
.text:0000000000401751 or r9d, 0C0FE19B9h
.text:0000000000401758 xor edx, 0FFFFFFFFh
.text:000000000040175B and edx, r9d
.text:000000000040175E mov r9d, ecx
.text:0000000000401761 xor r9d, 0FFFFFFFFh
.text:0000000000401765 and r9d, 0E690A677h
.text:000000000040176C mov r10d, eax
.text:000000000040176F xor r10d, 0E690A677h
.text:0000000000401776 mov r11d, ecx
.text:0000000000401779 and r11d, r10d
.text:000000000040177C mov ebx, eax
.text:000000000040177E xor ebx, 0FFFFFFFFh
.text:0000000000401784 and ebx, 0E690A677h
.text:000000000040178A and r10d, 0FFFFFFFFh
.text:0000000000401791 or r9d, r11d
.text:0000000000401794 or ebx, r10d
.text:0000000000401797 xor r9d, ebx
.text:000000000040179A mov r10d, eax
.text:000000000040179D xor r10d, 0FFFFFFFFh
.text:00000000004017A4 xor r9d, 0FFFFFFFFh
.text:00000000004017A8 mov r11d, eax
.text:00000000004017AB xor r11d, 43B73160h
.text:00000000004017B2 or r10d, r9d
.text:00000000004017B5 or r11d, 43B73160h
.text:00000000004017BC xor r10d, 0FFFFFFFFh
.text:00000000004017C0 and r10d, r11d
.text:00000000004017C3 mov r9d, eax
.text:00000000004017C6 xor r9d, 0FFFFFFFFh
.text:00000000004017CD and r9d, 0FFFFFFFFh
.text:00000000004017D4 mov r11d, eax
.text:00000000004017D7 xor r11d, 0FFFFFFFFh
.text:00000000004017DE and r11d, 0FFFFFFFFh
.text:00000000004017E5 or r9d, r11d
.text:00000000004017E8 mov r11d, ecx
.text:00000000004017EB xor r11d, 0FFFFFFFFh
.text:00000000004017EF xor r9d, 0FFFFFFFFh
.text:00000000004017F3 mov ebx, eax
.text:00000000004017F5 xor ebx, 91453A87h
.text:00000000004017FB or r11d, r9d
.text:00000000004017FE or ebx, 91453A87h
.text:0000000000401804 xor r11d, 0FFFFFFFFh
.text:0000000000401808 and r11d, ebx
.text:000000000040180B mov r9d, r10d
.text:000000000040180E and r9d, r11d
.text:0000000000401811 xor r10d, r11d
.text:0000000000401814 or r9d, r10d
.text:0000000000401817 mov r10d, eax
.text:000000000040181A xor r10d, 0F4761DECh
.text:0000000000401821 and r10d, 0FFFFFFFFh
.text:0000000000401828 mov r11d, eax
.text:000000000040182B xor r11d, 0FFFFFFFFh
.text:0000000000401832 and r11d, 0F4761DECh
.text:0000000000401839 or r10d, r11d
.text:000000000040183C mov r11d, r9d
.text:000000000040183F xor r11d, 0FFFFFFFFh
.text:0000000000401843 and r11d, 5D947950h
.text:000000000040184A mov ebx, eax
.text:000000000040184C xor ebx, 5D947950h
.text:0000000000401852 mov r14d, r9d
.text:0000000000401855 and r14d, ebx
.text:0000000000401858 mov r12d, r10d
.text:000000000040185B xor r12d, 0FFFFFFFFh
.text:000000000040185F and r12d, 5D947950h
.text:0000000000401866 and r10d, ebx
.text:0000000000401869 or r11d, r14d
.text:000000000040186C or r12d, r10d
.text:000000000040186F xor r11d, r12d
.text:0000000000401872 xor r11d, 0FFFFFFFFh
.text:0000000000401876 xor r9d, 0FFFFFFFFh
.text:000000000040187A mov r10d, eax
.text:000000000040187D xor r10d, 0DD0D38D2h
.text:0000000000401884 or r11d, r9d
.text:0000000000401887 or r10d, 0DD0D38D2h
.text:000000000040188E xor r11d, 0FFFFFFFFh
.text:0000000000401892 and r11d, r10d
.text:0000000000401895 mov r9d, r15d
.text:0000000000401898 xor r9d, 0FFFFFFFFh
.text:000000000040189C and r9d, 0C5CC33F1h
.text:00000000004018A3 mov r10d, eax
.text:00000000004018A6 xor r10d, 0C5CC33F1h
.text:00000000004018AD and r15d, r10d
.text:00000000004018B0 mov ebx, eax
.text:00000000004018B2 xor ebx, 0FFFFFFFFh
.text:00000000004018B8 and ebx, 0C5CC33F1h
.text:00000000004018BE and r10d, 0FFFFFFFFh
.text:00000000004018C5 or r9d, r15d
.text:00000000004018C8 or ebx, r10d
.text:00000000004018CB xor r9d, ebx
.text:00000000004018CE mov r10d, ecx
.text:00000000004018D1 xor r10d, 0FFFFFFFFh
.text:00000000004018D5 mov ebx, r9d
.text:00000000004018D8 and ebx, r10d
.text:00000000004018DB xor r9d, 0FFFFFFFFh
.text:00000000004018DF mov r10d, ecx
.text:00000000004018E2 and r10d, r9d
.text:00000000004018E5 or ebx, r10d
.text:00000000004018E8 xor ebx, 0FFFFFFFFh
.text:00000000004018EB xor ecx, 0FFFFFFFFh
.text:00000000004018EE mov r9d, eax
.text:00000000004018F1 xor r9d, 76AD7BE8h
.text:00000000004018F8 or ebx, ecx
.text:00000000004018FA or r9d, 76AD7BE8h
.text:0000000000401901 xor ebx, 0FFFFFFFFh
.text:0000000000401904 and ebx, r9d
.text:0000000000401907 mov ecx, esi
.text:0000000000401909 xor ecx, 0FFFFFFFFh
.text:000000000040190C mov r9d, edx
.text:000000000040190F xor r9d, 0FFFFFFFFh
.text:0000000000401913 mov r10d, eax
.text:0000000000401916 xor r10d, 4A20E799h
.text:000000000040191D or ecx, r9d
.text:0000000000401920 or r10d, 4A20E799h
.text:0000000000401927 xor ecx, 0FFFFFFFFh
.text:000000000040192A and ecx, r10d
.text:000000000040192D mov r9d, esi
.text:0000000000401930 xor r9d, 0FFFFFFFFh
.text:0000000000401934 and r9d, 11AA83CDh
.text:000000000040193B mov r10d, eax
.text:000000000040193E xor r10d, 11AA83CDh
.text:0000000000401945 and esi, r10d
.text:0000000000401948 mov r14d, edx
.text:000000000040194B xor r14d, 0FFFFFFFFh
.text:000000000040194F and r14d, 11AA83CDh
.text:0000000000401956 and edx, r10d
.text:0000000000401959 or r9d, esi
.text:000000000040195C or r14d, edx
.text:000000000040195F xor r9d, r14d
.text:0000000000401962 mov edx, ecx
.text:0000000000401964 xor edx, 0FFFFFFFFh
.text:0000000000401967 mov esi, r9d
.text:000000000040196A xor esi, 0FFFFFFFFh
.text:000000000040196D mov r10d, eax
.text:0000000000401970 xor r10d, 0C4E327DEh
.text:0000000000401977 mov r14d, edx
.text:000000000040197A and r14d, 0C4E327DEh
.text:0000000000401981 and ecx, r10d
.text:0000000000401984 mov r15d, esi
.text:0000000000401987 and r15d, 0C4E327DEh
.text:000000000040198E and r9d, r10d
.text:0000000000401991 or r14d, ecx
.text:0000000000401994 or r15d, r9d
.text:0000000000401997 xor r14d, r15d
.text:000000000040199A or edx, esi
.text:000000000040199C xor edx, 0FFFFFFFFh
.text:000000000040199F or r10d, 0C4E327DEh
.text:00000000004019A6 and edx, r10d
.text:00000000004019A9 or r14d, edx
.text:00000000004019AC mov ecx, ebx
.text:00000000004019AE xor ecx, 0FFFFFFFFh
.text:00000000004019B1 mov edx, r11d
.text:00000000004019B4 xor edx, ecx
.text:00000000004019B6 and edx, r11d
.text:00000000004019B9 mov ecx, r11d
.text:00000000004019BC xor ecx, 0FFFFFFFFh
.text:00000000004019BF and ecx, 0FE1D374Fh
.text:00000000004019C5 mov esi, eax
.text:00000000004019C7 xor esi, 0FE1D374Fh
.text:00000000004019CD and r11d, esi
.text:00000000004019D0 mov r9d, ebx
.text:00000000004019D3 xor r9d, 0FFFFFFFFh
.text:00000000004019D7 and r9d, 0FE1D374Fh
.text:00000000004019DE and ebx, esi
.text:00000000004019E0 or ecx, r11d
.text:00000000004019E3 or r9d, ebx
.text:00000000004019E6 xor ecx, r9d
.text:00000000004019E9 mov esi, edx
.text:00000000004019EB and esi, ecx
.text:00000000004019ED xor edx, ecx
.text:00000000004019EF or esi, edx
.text:00000000004019F1 mov ecx, r14d
.text:00000000004019F4 xor ecx, 0FFFFFFFFh
.text:00000000004019F7 and ecx, 0FFFFFFFFh
.text:00000000004019FD mov edx, eax
.text:00000000004019FF xor edx, 0FFFFFFFFh
.text:0000000000401A05 mov r9d, r14d
.text:0000000000401A08 and r9d, edx
.text:0000000000401A0B or ecx, r9d
.text:0000000000401A0E xor ecx, 0FFFFFFFFh
.text:0000000000401A11 mov edx, esi
.text:0000000000401A13 xor edx, ecx
.text:0000000000401A15 and edx, esi
.text:0000000000401A17 mov ecx, esi
.text:0000000000401A19 xor ecx, 0FFFFFFFFh
.text:0000000000401A1C and ecx, 0FFFFFFFFh
.text:0000000000401A22 mov r9d, eax
.text:0000000000401A25 xor r9d, 0FFFFFFFFh
.text:0000000000401A2C and esi, r9d
.text:0000000000401A2F or ecx, esi
.text:0000000000401A31 xor r14d, 0FFFFFFFFh
.text:0000000000401A35 xor ecx, 0FFFFFFFFh
.text:0000000000401A38 xor eax, 62168BC8h
.text:0000000000401A3D or r14d, ecx
.text:0000000000401A40 or eax, 62168BC8h
.text:0000000000401A45 xor r14d, 0FFFFFFFFh
.text:0000000000401A49 and r14d, eax
.text:0000000000401A4C mov eax, edx
.text:0000000000401A4E and eax, r14d
.text:0000000000401A51 xor edx, r14d
.text:0000000000401A54 or eax, edx
.text:0000000000401A56 mov rdi, [rbp+var_30]
.text:0000000000401A5A movsxd r8, [rbp+var_40]
.text:0000000000401A5E mov [rdi+r8], al
.text:0000000000401A62 xor eax, eax
.text:0000000000401A64 mov ecx, 17141EDCh
.text:0000000000401A69 mov edx, 957C3CFAh
.text:0000000000401A6E mov esi, 39D401DFh
.text:0000000000401A73 mov edi, [rbp+var_40]
.text:0000000000401A76 add esi, 0
.text:0000000000401A79 sub esi, edi
.text:0000000000401A7B sub esi, 39D401DFh
.text:0000000000401A81 mov edi, eax
.text:0000000000401A83 sub edi, 0ABF5E842h
.text:0000000000401A89 sub edi, 8F128D86h
.text:0000000000401A8F add edi, 0ABF5E842h
.text:0000000000401A95 add esi, 974BFD68h
.text:0000000000401A9B add esi, edi
.text:0000000000401A9D sub esi, 974BFD68h
.text:0000000000401AA3 add edx, 0
.text:0000000000401AA6 sub edx, esi
.text:0000000000401AA8 sub edx, 957C3CFAh
.text:0000000000401AAE add ecx, 0
.text:0000000000401AB1 sub ecx, edx
.text:0000000000401AB3 sub ecx, 17141EDCh
.text:0000000000401AB9 mov edx, eax
.text:0000000000401ABB sub edx, 1
.text:0000000000401ABE add edx, 0
.text:0000000000401AC1 sub ecx, 52A58E3h
.text:0000000000401AC7 add ecx, edx
.text:0000000000401AC9 add ecx, 52A58E3h
.text:0000000000401ACF mov edx, eax
.text:0000000000401AD1 sub edx, 278C624Eh
.text:0000000000401AD7 sub edx, ecx
.text:0000000000401AD9 add edx, 278C624Eh
.text:0000000000401ADF sub eax, 3B1CEF0Ch
.text:0000000000401AE4 sub edx, eax
.text:0000000000401AE6 sub edx, 0F9FA2AF8h
.text:0000000000401AEC sub edx, 8F128D86h
.text:0000000000401AF2 add edx, 0F9FA2AF8h
.text:0000000000401AF8 sub edx, 915413BCh
.text:0000000000401AFE sub edx, 3B1CEF0Ch
.text:0000000000401B04 add edx, 915413BCh
.text:0000000000401B0A mov [rbp+var_40], edx
.text:0000000000401B0D jmp loc_4011AE
.text:0000000000401B12 ; ---------------------------------------------------------------------------
.text:0000000000401B12
.text:0000000000401B12 loc_401B12: ; CODE XREF: encrypt(uchar *,char *)+34↑j
.text:0000000000401B12 add rsp, 18h
.text:0000000000401B16 pop rbx
.text:0000000000401B17 pop r12
.text:0000000000401B19 pop r13
.text:0000000000401B1B pop r14
.text:0000000000401B1D pop r15
.text:0000000000401B1F pop rbp
.text:0000000000401B20 retn
.text:0000000000401B20 ; } // starts at 401180
.text:0000000000401B20 _Z7encryptPhPc endp

As the listing shows, the arithmetic in the program becomes far more complex once instruction substitution has been applied.
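
Several instruction patterns repeat throughout the listing above, for example the `and`/`xor`/`or` triple at 4014B2, the complemented `xor`/`and` pair at 401529, the two-sided masking at 4018CE, and the `add`/`add`/`sub` sequence at 401A95. The sketch below is one reading of those patterns as algebraic rewrites, not the source of the pass itself; the helper names and the parameter `r` (standing in for a random constant chosen by the pass) are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// a ^ b rewritten as (~a & b) | (a & ~b)
inline uint32_t sub_xor(uint32_t a, uint32_t b) {
    return (~a & b) | (a & ~b);
}

// a & b rewritten as (a ^ ~b) & a
inline uint32_t sub_and(uint32_t a, uint32_t b) {
    return (a ^ ~b) & a;
}

// a | b rewritten as (a & b) | (a ^ b)
inline uint32_t sub_or(uint32_t a, uint32_t b) {
    return (a & b) | (a ^ b);
}

// a + b rewritten as ((a + r) + b) - r; unsigned arithmetic wraps modulo 2^32,
// so the random constant r always cancels out
inline uint32_t sub_add(uint32_t a, uint32_t b, uint32_t r) {
    return ((a + r) + b) - r;
}

// Checks every rewrite against the plain operator for one pair of inputs.
inline void check_rewrites(uint32_t a, uint32_t b, uint32_t r) {
    assert(sub_xor(a, b) == (a ^ b));
    assert(sub_and(a, b) == (a & b));
    assert(sub_or(a, b) == (a | b));
    assert(sub_add(a, b, r) == a + b);
}
```

Each rewrite is cheap on its own; the listing grows so large because the pass is applied several rounds in a row, so every operator introduced by one round is rewritten again by the next.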

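Random Control Flow

Random control flow is a variant of bogus control flow. It obfuscates the control flow by cloning basic blocks and adding random jumps, each of which goes to one of two functionally identical blocks. Because it contains neither unreachable basic blocks nor opaque predicates, the techniques used to strip bogus control flow do not work against it. The random jumps and the redundant cloned blocks produce a large amount of junk code that hampers analysis, and the rdrand instruction can also interfere with some symbolic execution engines.

Random control flow also obfuscates one basic block at a time.

Steps

  • Split the basic block. Same as in bogus control flow.
  • Clone the basic block. The clone may be mutated, as long as its behavior stays the same. Escaping variables must be fixed up here, because either copy can actually be executed.
  • Build the random jump. Insert the instructions that generate a random number into the entry block, followed by a branch that depends on that number. The random number can come from LLVM's rdrand intrinsic. The random value is also passed through an equivalent transformation to make the branch condition more complex.
  • Build the fake random jumps. Add branches between the original block and its clone, and from each of them to the end block, but control the variable the branch tests so that both the original block and the clone always jump to the end block.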
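
The steps above can be modelled without LLVM. The sketch below is a hypothetical simulation, not part of the pass: `rnd` stands in for the value returned by rdrand, and the function returns a trace of the blocks that run. It assumes every inserted branch tests the same condition (the lowest bit of the random value), which is what makes the jumps out of the body and the clone fake.

```cpp
#include <cstdint>
#include <string>

// Hypothetical model of one obfuscated basic block.
// Returns the sequence of blocks visited for a given random value.
std::string run_obfuscated_block(uint32_t rnd) {
    std::string trace = "entry";
    // The condition tested by all three inserted branches.
    const bool odd = (rnd & 1u) == 1u;

    if (odd) {
        // entry: the real random jump picks the original body
        trace += ">body";
        // body: "br cond, end, clone"; cond is still true here,
        // so the edge to the clone is never taken
        trace += odd ? ">end" : ">clone";
    } else {
        // entry: the real random jump picks the clone
        trace += ">clone";
        // clone: "br cond, body, end"; cond is still false here,
        // so the edge back to the body is never taken
        trace += odd ? ">body" : ">end";
    }
    return trace;
}
```

Whichever copy runs, it runs exactly once and then reaches the end block. The extra edges only exist in the control flow graph, where they look just as plausible as the real ones.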

Implementation

Add Transform/src/RandomControlFlow.cpp:

#include "llvm/IR/Function.h"
#include "llvm/Pass.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/Instructions.h"
#include "llvm/Transforms/Utils/Local.h"
#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/IntrinsicsX86.h"
#include "SplitBasicBlock.h"
#include "Utils.h"
#include <vector>
#include <cstdlib>
#include <ctime>
using std::vector;
using namespace llvm;

// Number of obfuscation rounds; more rounds yield a more complex result
static cl::opt<int> obfuTimes("rcf_loop", cl::init(1), cl::desc("Obfuscate a function <rcf_loop> time(s)."));

namespace{
struct RandomControlFlow : public FunctionPass{
static char ID;

RandomControlFlow() : FunctionPass(ID){
srand(time(NULL));
}

bool runOnFunction(Function &F);

// Build a sequence of instructions whose result equals origVar
Value* alterVal(Value *origVar,BasicBlock *insertAfter);

void insertRandomBranch(Value *randVar, BasicBlock *ifTrue, BasicBlock *ifFalse, BasicBlock *insertAfter);

// Apply random control flow obfuscation to a single basic block
bool randcf(BasicBlock *BB);

llvm::LLVMContext *CONTEXT;

};
}


bool RandomControlFlow::runOnFunction(Function &F){
INIT_CONTEXT(F);

for(int i = 0;i < obfuTimes;i ++){
vector<BasicBlock*> origBB;
for(BasicBlock &BB : F){
origBB.push_back(&BB);
}
for(BasicBlock *BB : origBB){
randcf(BB);
}
}
return true;
}

void RandomControlFlow::insertRandomBranch(Value *randVar, BasicBlock *ifTrue, BasicBlock *ifFalse, BasicBlock *insertAfter){
// Apply an equivalent transformation to the random value
Value *alteredRandVar = alterVal(randVar, insertAfter);
Value *randMod2 = BinaryOperator::Create(Instruction::And, alteredRandVar, CONST_I32(1), "", insertAfter);
ICmpInst *condition = new ICmpInst(*insertAfter, ICmpInst::ICMP_EQ, randMod2, CONST_I32(1));
BranchInst::Create(ifTrue, ifFalse, condition, insertAfter);
}

bool RandomControlFlow::randcf(BasicBlock *BB){
// Split into entryBB, bodyBB and endBB
// All PHI instructions (if any) stay in entryBB
// endBB contains only a terminator instruction
BasicBlock *entryBB = BB;
BasicBlock *bodyBB = entryBB->splitBasicBlock(BB->getFirstNonPHIOrDbgOrLifetime(), "bodyBB");
BasicBlock *endBB = bodyBB->splitBasicBlock(bodyBB->getTerminator(), "endBB");
BasicBlock *cloneBB = createCloneBasicBlock(bodyBB);

// Insert a random branch at the end of entryBB so that it jumps randomly to bodyBB or to its clone cloneBB
entryBB->getTerminator()->eraseFromParent();
Function *rdrand = Intrinsic::getDeclaration(entryBB->getModule(), Intrinsic::x86_rdrand_32);
CallInst *randVarStruct = CallInst::Create(rdrand->getFunctionType(), rdrand, "", entryBB);
// Obtain a random number through the rdrand intrinsic
Value *randVar = ExtractValueInst::Create(randVarStruct, 0, "", entryBB);
insertRandomBranch(randVar, bodyBB, cloneBB, entryBB);

// Add the fake random branch from bodyBB to bodyBB.clone
bodyBB->getTerminator()->eraseFromParent();
insertRandomBranch(randVar, endBB, cloneBB, bodyBB);
// Add the fake random branch from bodyBB.clone to bodyBB
cloneBB->getTerminator()->eraseFromParent();
insertRandomBranch(randVar, bodyBB, endBB, cloneBB);

return true;
}

Value* RandomControlFlow::alterVal(Value *startVar,BasicBlock *insertAfter){
uint32_t code = rand() % 3;
Value *result;
if(code == 0){
//x = x * (x + 1) - x^2
BinaryOperator *op1 = BinaryOperator::Create(Instruction::Add, startVar, CONST_I32(1), "", insertAfter);
BinaryOperator *op2 = BinaryOperator::Create(Instruction::Mul, startVar, op1, "", insertAfter);
BinaryOperator *op3 = BinaryOperator::Create(Instruction::Mul, startVar, startVar, "", insertAfter);
BinaryOperator *op4 = BinaryOperator::Create(Instruction::Sub, op2, op3, "", insertAfter);
result = op4;
}else if(code == 1){
//x = 3 * x * (x - 2) - 3 * x^2 + 7 * x
BinaryOperator *op1 = BinaryOperator::Create(Instruction::Mul, startVar, CONST_I32(3), "", insertAfter);
BinaryOperator *op2 = BinaryOperator::Create(Instruction::Sub, startVar, CONST_I32(2), "", insertAfter);
BinaryOperator *op3 = BinaryOperator::Create(Instruction::Mul, op1, op2, "", insertAfter);
BinaryOperator *op4 = BinaryOperator::Create(Instruction::Mul, startVar, startVar, "", insertAfter);
BinaryOperator *op5 = BinaryOperator::Create(Instruction::Mul, op4, CONST_I32(3), "", insertAfter);
BinaryOperator *op6 = BinaryOperator::Create(Instruction::Mul, startVar, CONST_I32(7), "", insertAfter);
BinaryOperator *op7 = BinaryOperator::Create(Instruction::Sub, op3, op5, "", insertAfter);
BinaryOperator *op8 = BinaryOperator::Create(Instruction::Add, op6, op7, "", insertAfter);
result = op8;
}else if(code == 2){
//x = (x - 1) * (x + 3) - (x + 4) * (x - 3) - 9
BinaryOperator *op1 = BinaryOperator::Create(Instruction::Sub, startVar, CONST_I32(1), "", insertAfter);
BinaryOperator *op2 = BinaryOperator::Create(Instruction::Add, startVar, CONST_I32(3), "", insertAfter);
BinaryOperator *op3 = BinaryOperator::Create(Instruction::Add, startVar, CONST_I32(4), "", insertAfter);
BinaryOperator *op4 = BinaryOperator::Create(Instruction::Sub, startVar, CONST_I32(3), "", insertAfter);
BinaryOperator *op5 = BinaryOperator::Create(Instruction::Mul, op1, op2, "", insertAfter);
BinaryOperator *op6 = BinaryOperator::Create(Instruction::Mul, op3, op4, "", insertAfter);
BinaryOperator *op7 = BinaryOperator::Create(Instruction::Sub, op5, op6, "", insertAfter);
BinaryOperator *op8 = BinaryOperator::Create(Instruction::Sub, op7, CONST_I32(9), "", insertAfter);
result = op8;
}
return result;
}

char RandomControlFlow::ID = 0;
static RegisterPass<RandomControlFlow> X("rcf", "Add random control flow to each function.");
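
The three expressions built by alterVal are only safe if they evaluate back to x for every 32-bit input, including the cases where the intermediate products overflow. The sketch below restates them over uint32_t, whose arithmetic wraps modulo 2^32 in the same way as LLVM's i32 add, sub and mul; the function names are made up for illustration.

```cpp
#include <cstdint>

// x * (x + 1) - x^2
uint32_t alter0(uint32_t x) {
    return x * (x + 1u) - x * x;
}

// 3 * x * (x - 2) - 3 * x^2 + 7 * x
uint32_t alter1(uint32_t x) {
    return 3u * x * (x - 2u) - 3u * x * x + 7u * x;
}

// (x - 1) * (x + 3) - (x + 4) * (x - 3) - 9
uint32_t alter2(uint32_t x) {
    return (x - 1u) * (x + 3u) - (x + 4u) * (x - 3u) - 9u;
}

// True when all three rewrites give back the input unchanged.
bool identities_hold(uint32_t x) {
    return alter0(x) == x && alter1(x) == x && alter2(x) == x;
}
```

Expanding each polynomial shows that the quadratic terms cancel and exactly one x remains, and that cancellation holds in modular arithmetic as well. This is why the branch condition computed from the altered value always agrees with the one computed in the entry block.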

Update CMakeLists.txt to include the new file:

cmake_minimum_required(VERSION 3.13.4)
project(OLLVM++)
find_package(LLVM REQUIRED CONFIG)

list(APPEND CMAKE_MODULE_PATH "${LLVM_CMAKE_DIR}")
include(AddLLVM)
include_directories("./include") # Add the headers in the ./include folder
separate_arguments(LLVM_DEFINITIONS_LIST NATIVE_COMMAND ${LLVM_DEFINITIONS})
add_definitions(${LLVM_DEFINITIONS_LIST})
include_directories(${LLVM_INCLUDE_DIRS})
add_llvm_library( LLVMObfuscator MODULE
src/HelloWorld.cpp
src/SplitBasicBlock.cpp
src/Flattening.cpp
src/Utils.cpp
src/BogusControlFlow.cpp
src/Substitution.cpp
src/RandomControlFlow.cpp
)

Update test.sh to add the new test:

cd ./Build
cmake ../Transforms
make
cd ../Test
clang -S -emit-llvm TestProgram.cpp -o IR/TestProgram.ll

echo "-----Hello World Test-----"
opt -enable-new-pm=0 -load ../Build/LLVMObfuscator.so -hlw -S IR/TestProgram.ll -o IR/TestProgram_hlw.ll
clang IR/TestProgram_hlw.ll -o Bin/TestProgram_hlw
./Bin/TestProgram_hlw flag{s1mpl3_11vm_d3m0}

echo "-----Split Basic Block Test-----"
opt -enable-new-pm=0 -load ../Build/LLVMObfuscator.so -split --split_num 5 -S IR/TestProgram.ll -o IR/TestProgram_split.ll
clang IR/TestProgram_split.ll -o Bin/TestProgram_split
./Bin/TestProgram_split flag{s1mpl3_11vm_d3m0}

echo "-----Control Flow Flattening Test-----"
opt -enable-new-pm=0 -lowerswitch -S IR/TestProgram.ll -o IR/TestProgram_lowerswitch.ll
opt -enable-new-pm=0 -load ../Build/LLVMObfuscator.so -fla IR/TestProgram_lowerswitch.ll -o IR/TestProgram_fla.ll
clang IR/TestProgram_fla.ll -o Bin/TestProgram_fla
./Bin/TestProgram_fla flag{s1mpl3_11vm_d3m0}

echo "-----Bogus Control Flow Test-----"
opt -enable-new-pm=0 -load ../Build/LLVMObfuscator.so -bcf -bcf_loop 3 IR/TestProgram.ll -o IR/TestProgram_bcf.ll
clang IR/TestProgram_bcf.ll -o Bin/TestProgram_bcf
./Bin/TestProgram_bcf flag{s1mpl3_11vm_d3m0}

echo "-----Instruction Substitution Test-----"
opt -enable-new-pm=0 -load ../Build/LLVMObfuscator.so -sub -sub_loop 3 IR/TestProgram.ll -o IR/TestProgram_sub.ll
clang IR/TestProgram_sub.ll -o Bin/TestProgram_sub
./Bin/TestProgram_sub flag{s1mpl3_11vm_d3m0}

echo "-----Random Control Flow Test-----"
opt -enable-new-pm=0 -load ../Build/LLVMObfuscator.so -rcf -rcf_loop 1 IR/TestProgram.ll -o IR/TestProgram_rcf.ll
llc -filetype=obj -mattr=+rdrnd --relocation-model=pic IR/TestProgram_rcf.ll -o Bin/TestProgram_rcf.o
clang Bin/TestProgram_rcf.o -o Bin/TestProgram_rcf
./Bin/TestProgram_rcf flag{s1mpl3_11vm_d3m0}

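In test.sh, the object file is built by hand with llc -filetype=obj -mattr=+rdrnd --relocation-model=pic IR/TestProgram_rcf.ll -o Bin/TestProgram_rcf.o. Without -mattr=+rdrnd the rdrand instruction cannot be found. --relocation-model=pic is required as well; without it the link step fails with the following error: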

/usr/bin/ld: Bin/TestProgram_rcf.o: relocation R_X86_64_32 against symbol `input' can not be used when making a PIE object; recompile with -fPIE
/usr/bin/ld: failed to set dynamic section sizes: bad value

The resulting control flow looks similar to bogus control flow, but the path actually taken at run time is random.

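Constant Substitution

Constant substitution replaces the constants used in binary operations with equivalent but more complex expressions, in order to obscure the computation or hide distinctive constants (for example, the constant 0x9e3779b used in TEA encryption can be replaced with 12167*16715 + 18858*32146 - 643678438).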
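
The arithmetic in that example can be checked at compile time. The sketch below only evaluates the expression from the text and compares it with the constant as written there; the names `kTarget` and `substituted_constant` are made up for illustration.

```cpp
#include <cstdint>

// The constant exactly as written in the text above.
constexpr uint32_t kTarget = 0x9e3779bu;

// The replacement expression, evaluated in 32-bit unsigned arithmetic.
constexpr uint32_t substituted_constant() {
    return 12167u * 16715u + 18858u * 32146u - 643678438u;
}

// Fails to compile if the expression and the constant ever disagree.
static_assert(substituted_constant() == kTarget,
              "replacement expression must equal the original constant");
```

An optimizing compiler would fold this expression straight back into the constant, which is why the pass described below loads x and y from global variables instead of using literals.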

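Only integer constants are replaced for now, because replacing floating-point constants would introduce rounding errors. Integer substitution depends on the bit width; only 32-bit integers are handled at present, but the approach can be extended to any width.

Constant substitution can be extended further to constant-array substitution and string substitution. Constant-array substitution can erase the signature tables of ciphers such as AES or DES, and string substitution keeps an analyst from locating key code through its strings.

The idea is fairly simple: for instructions whose operands are 32-bit integers, replace each constant operand using one of two schemes:

  • Linear substitution. val -> ax + by + c, where a and b are random constants, x and y are random global variables, and c = val - (ax + by)
  • Bitwise substitution. val -> (x << 5 | y >> 3) ^ c, where x and y are random global variables and c = val ^ (x << 5 | y >> 3)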
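
Both schemes work the same way: pick the random parts first, then solve for the correction term c so that the whole expression evaluates back to val. The sketch below models this on plain uint32_t values, outside LLVM; the struct and function names are made up for illustration, and x and y are ordinary fields here rather than global variables.

```cpp
#include <cstdint>

// Parameters of a linear substitution val -> a*x + b*y + c.
struct Linear { uint32_t a, b, x, y, c; };

// Parameters of a bitwise substitution val -> (x << 5 | y >> 3) ^ c.
struct Bitwise { uint32_t x, y, c; };

// Solve c = val - (a*x + b*y); unsigned arithmetic wraps modulo 2^32.
Linear make_linear(uint32_t val, uint32_t a, uint32_t b, uint32_t x, uint32_t y) {
    return Linear{a, b, x, y, val - (a * x + b * y)};
}

// Evaluate the linear expression the pass would emit.
uint32_t eval(const Linear &s) {
    return s.a * s.x + s.b * s.y + s.c;
}

// Solve c = val ^ (x << 5 | y >> 3).
Bitwise make_bitwise(uint32_t val, uint32_t x, uint32_t y) {
    return Bitwise{x, y, val ^ ((x << 5) | (y >> 3))};
}

// Evaluate the bitwise expression the pass would emit.
uint32_t eval(const Bitwise &s) {
    return ((s.x << 5) | (s.y >> 3)) ^ s.c;
}
```

For example, `eval(make_linear(0x9e3779b, 12167, 18858, 16715, 32146))` gives back 0x9e3779b, matching the TEA example above. No restriction on the random values is needed for correctness, since c absorbs whatever they produce.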

Implementation

Add Transform/src/ConstantSubstitution.cpp:

#include "llvm/IR/Function.h"
#include "llvm/IR/Module.h"
#include "llvm/Pass.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/IR/Instructions.h"
#include "llvm/Support/CommandLine.h"
#include <vector>
#include <cstdlib>
#include <ctime>
#include "Utils.h"
using namespace llvm;
using std::vector;

#define MAX_RAND 32767
#define NUMBER_CONST_SUBST 2

// Number of obfuscation rounds; more rounds yield a more complex result
static cl::opt<int> obfuTimes("csub_loop", cl::init(1), cl::desc("Obfuscate a function <csub_loop> time(s)."));

namespace{

class ConstantSubstitution : public FunctionPass {
public:
static char ID;

ConstantSubstitution() : FunctionPass(ID) {
srand(time(NULL));
}

bool runOnFunction(Function &F);

// Substitute the constant operands of a single instruction BI
void substitute(BinaryOperator *BI);

// Linear substitution: val -> ax + by + c
// where val is the original constant, a and b are random constants, x and y are random global variables, and c = val - (ax + by)
void linearSubstitute(BinaryOperator *BI, int i);

// Bitwise substitution: val -> (x << 5 | y >> 3) ^ c
// where val is the original constant, x and y are random global variables, and c = val ^ (x << 5 | y >> 3)
void bitwiseSubstitute(BinaryOperator *BI, int i);

llvm::LLVMContext *CONTEXT;
};
}

bool ConstantSubstitution::runOnFunction(Function &F){
INIT_CONTEXT(F);
for(int i = 0;i < obfuTimes;i ++){
for(BasicBlock &BB : F){
vector<Instruction*> origInst;
for(Instruction &I : BB){
origInst.push_back(&I);
}
for(Instruction *I : origInst){
// Only constants in binary operators are substituted
if(BinaryOperator *BI = dyn_cast<BinaryOperator>(I)){
// Only 32-bit integers are substituted
if(BI->getType()->isIntegerTy(32)){
substitute(BI);
}
}
}
}
}
// The function was modified, so report that to the pass manager
return true;
}

void ConstantSubstitution::linearSubstitute(BinaryOperator *BI, int i){
Module &M = *BI->getModule();
ConstantInt *val = cast<ConstantInt>(BI->getOperand(i));
// Randomly generate x, y, a, b
int randX = rand() % MAX_RAND, randY = rand() % MAX_RAND;
int randA = rand() % MAX_RAND, randB = rand() % MAX_RAND;
// Compute c = val - (ax + by)
APInt c = val->getValue() - (randA * randX + randB * randY);
ConstantInt *constX = ConstantInt::get(val->getType(), randX);
ConstantInt *constY = ConstantInt::get(val->getType(), randY);
ConstantInt *constA = ConstantInt::get(val->getType(), randA);
ConstantInt *constB = ConstantInt::get(val->getType(), randB);
ConstantInt *constC = (ConstantInt*)ConstantInt::get(val->getType(), c);
// 创建全局变量 x, y
GlobalVariable *x = new GlobalVariable(M, val->getType(), false, GlobalValue::PrivateLinkage, constX, "x");
GlobalVariable *y = new GlobalVariable(M, val->getType(), false, GlobalValue::PrivateLinkage, constY, "y");
LoadInst *opX = new LoadInst(val->getType(), x, "", BI);
LoadInst *opY = new LoadInst(val->getType(), y, "", BI);
// 构造 op = ax + by + c 表达式
BinaryOperator *op1 = BinaryOperator::CreateMul(opX, constA, "", BI);
BinaryOperator *op2 = BinaryOperator::CreateMul(opY, constB, "", BI);
BinaryOperator *op3 = BinaryOperator::CreateAdd(op1, op2, "", BI);
BinaryOperator *op4 = BinaryOperator::CreateAdd(op3, constC, "", BI);
// 用表达式 ax + by + c 替换原常量操作数
BI->setOperand(i, op4);
}

void ConstantSubstitution::bitwiseSubstitute(BinaryOperator *BI, int i){
    Module &M = *BI->getModule();
    ConstantInt *val = cast<ConstantInt>(BI->getOperand(i));
    // Randomly generate x, y
    unsigned randX = rand() % MAX_RAND, randY = rand() % MAX_RAND;
    // Compute c = val ^ (x << 5 | y >> 3)
    APInt c = val->getValue() ^ (randX << 5 | randY >> 3);
    ConstantInt *constX = ConstantInt::get(val->getType(), randX);
    ConstantInt *constY = ConstantInt::get(val->getType(), randY);
    ConstantInt *const5 = ConstantInt::get(val->getType(), 5);
    ConstantInt *const3 = ConstantInt::get(val->getType(), 3);
    ConstantInt *constC = cast<ConstantInt>(ConstantInt::get(val->getType(), c));
    // Create the global variables x, y
    GlobalVariable *x = new GlobalVariable(M, val->getType(), false, GlobalValue::PrivateLinkage, constX, "x");
    GlobalVariable *y = new GlobalVariable(M, val->getType(), false, GlobalValue::PrivateLinkage, constY, "y");
    LoadInst *opX = new LoadInst(val->getType(), x, "", BI);
    LoadInst *opY = new LoadInst(val->getType(), y, "", BI);
    // Build the expression op = (x << 5 | y >> 3) ^ c
    BinaryOperator *op1 = BinaryOperator::CreateShl(opX, const5, "", BI);
    BinaryOperator *op2 = BinaryOperator::CreateLShr(opY, const3, "", BI);
    BinaryOperator *op3 = BinaryOperator::CreateOr(op1, op2, "", BI);
    BinaryOperator *op4 = BinaryOperator::CreateXor(op3, constC, "", BI);
    // Replace the original constant operand with (x << 5 | y >> 3) ^ c
    BI->setOperand(i, op4);
}

void ConstantSubstitution::substitute(BinaryOperator *BI){
    int operandNum = BI->getNumOperands();
    for(int i = 0;i < operandNum;i ++){
        // Only operands that are integer constants are replaced
        if(isa<ConstantInt>(BI->getOperand(i))){
            // Pick one of the NUMBER_CONST_SUBST strategies at random
            int choice = rand() % NUMBER_CONST_SUBST;
            switch (choice) {
                case 0:
                    linearSubstitute(BI, i);
                    break;
                case 1:
                    bitwiseSubstitute(BI, i);
                    break;
                default:
                    break;
            }
        }
    }
}

char ConstantSubstitution::ID = 0;
static RegisterPass<ConstantSubstitution> X("csub", "Replace a constant value with equivalent instructions.");
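
The two identities the pass relies on can be checked outside LLVM. Below is a minimal standalone sketch, assuming 32-bit wraparound arithmetic as for `i32`. The names `linearC`, `linearEval`, `bitwiseC`, `bitwiseEval`, `nextRand` and `checkIdentities` are made up for this illustration and are not part of the pass.

```cpp
#include <cassert>
#include <cstdint>

// Same bound as MAX_RAND in the pass: every random operand is below this value.
constexpr uint32_t kMaxRand = 32767;

// Linear form: choose c so that a*x + b*y + c == val (mod 2^32).
constexpr uint32_t linearC(uint32_t val, uint32_t a, uint32_t x, uint32_t b, uint32_t y) {
    return val - (a * x + b * y);
}

// What the inserted mul/mul/add/add chain computes at run time.
constexpr uint32_t linearEval(uint32_t a, uint32_t x, uint32_t b, uint32_t y, uint32_t c) {
    return a * x + b * y + c;
}

// Bitwise form: choose c so that ((x << 5) | (y >> 3)) ^ c == val.
constexpr uint32_t bitwiseC(uint32_t val, uint32_t x, uint32_t y) {
    return val ^ ((x << 5) | (y >> 3));
}

// What the inserted shl/lshr/or/xor chain computes at run time.
constexpr uint32_t bitwiseEval(uint32_t x, uint32_t y, uint32_t c) {
    return ((x << 5) | (y >> 3)) ^ c;
}

// Compile-time worked example: 42 - (3*10 + 4*20) wraps around, yet 42 is recovered.
static_assert(linearEval(3, 10, 4, 20, linearC(42, 3, 10, 4, 20)) == 42,
              "linear identity must recover the constant");

// Small deterministic generator (hypothetical helper) so the check is reproducible.
inline uint32_t nextRand(uint32_t &state) {
    state = state * 1664525u + 1013904223u;
    return (state >> 8) % kMaxRand;
}

// Asserts that both forms recover val for `rounds` sampled operand sets.
inline void checkIdentities(int rounds) {
    uint32_t state = 2022;
    for (int i = 0; i < rounds; ++i) {
        uint32_t val = state;  // arbitrary 32-bit constant
        uint32_t a = nextRand(state), b = nextRand(state);
        uint32_t x = nextRand(state), y = nextRand(state);
        assert(linearEval(a, x, b, y, linearC(val, a, x, b, y)) == val);
        assert(bitwiseEval(x, y, bitwiseC(val, x, y)) == val);
    }
}
```

Calling `checkIdentities(1000)` from a `main` function exercises both forms. The linear identity holds even when `val - (a*x + b*y)` is negative, because the arithmetic wraps modulo 2^32; the bitwise identity holds because xor is its own inverse.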

Add the new source file to CMakeLists.txt:

cmake_minimum_required(VERSION 3.13.4) # should come before project()
project(OLLVM++)
find_package(LLVM REQUIRED CONFIG)

list(APPEND CMAKE_MODULE_PATH "${LLVM_CMAKE_DIR}")
include(AddLLVM)
include_directories("./include") # headers located in ./include
separate_arguments(LLVM_DEFINITIONS_LIST NATIVE_COMMAND ${LLVM_DEFINITIONS})
add_definitions(${LLVM_DEFINITIONS_LIST})
include_directories(${LLVM_INCLUDE_DIRS})
add_llvm_library( LLVMObfuscator MODULE
    src/HelloWorld.cpp
    src/SplitBasicBlock.cpp
    src/Flattening.cpp
    src/Utils.cpp
    src/BogusControlFlow.cpp
    src/Substitution.cpp
    src/RandomControlFlow.cpp
    src/ConstantSubstitution.cpp
)

Add the new test to test.sh:

cd ./Build
cmake ../Transforms
make
cd ../Test
clang -S -emit-llvm TestProgram.cpp -o IR/TestProgram.ll

echo "-----Hello World Test-----"
opt -enable-new-pm=0 -load ../Build/LLVMObfuscator.so -hlw -S IR/TestProgram.ll -o IR/TestProgram_hlw.ll
clang IR/TestProgram_hlw.ll -o Bin/TestProgram_hlw
./Bin/TestProgram_hlw flag{s1mpl3_11vm_d3m0}

echo "-----Split Basic Block Test-----"
opt -enable-new-pm=0 -load ../Build/LLVMObfuscator.so -split --split_num 5 -S IR/TestProgram.ll -o IR/TestProgram_split.ll
clang IR/TestProgram_split.ll -o Bin/TestProgram_split
./Bin/TestProgram_split flag{s1mpl3_11vm_d3m0}

echo "-----Control Flow Flattening Test-----"
opt -enable-new-pm=0 -lowerswitch -S IR/TestProgram.ll -o IR/TestProgram_lowerswitch.ll
opt -enable-new-pm=0 -load ../Build/LLVMObfuscator.so -fla IR/TestProgram_lowerswitch.ll -o IR/TestProgram_fla.ll
clang IR/TestProgram_fla.ll -o Bin/TestProgram_fla
./Bin/TestProgram_fla flag{s1mpl3_11vm_d3m0}

echo "-----Bogus Control Flow Test-----"
opt -enable-new-pm=0 -load ../Build/LLVMObfuscator.so -bcf -bcf_loop 3 IR/TestProgram.ll -o IR/TestProgram_bcf.ll
clang IR/TestProgram_bcf.ll -o Bin/TestProgram_bcf
./Bin/TestProgram_bcf flag{s1mpl3_11vm_d3m0}

echo "-----Instruction Substitution Test-----"
opt -enable-new-pm=0 -load ../Build/LLVMObfuscator.so -sub -sub_loop 3 IR/TestProgram.ll -o IR/TestProgram_sub.ll
clang IR/TestProgram_sub.ll -o Bin/TestProgram_sub
./Bin/TestProgram_sub flag{s1mpl3_11vm_d3m0}

echo "-----Random Control Flow Test-----"
opt -enable-new-pm=0 -load ../Build/LLVMObfuscator.so -rcf -rcf_loop 1 IR/TestProgram.ll -o IR/TestProgram_rcf.ll
llc -filetype=obj -mattr=+rdrnd --relocation-model=pic IR/TestProgram_rcf.ll -o Bin/TestProgram_rcf.o
clang Bin/TestProgram_rcf.o -o Bin/TestProgram_rcf
./Bin/TestProgram_rcf flag{s1mpl3_11vm_d3m0}

echo "-----Constant Substitution Test-----"
opt -enable-new-pm=0 -load ../Build/LLVMObfuscator.so -csub IR/TestProgram.ll -o IR/TestProgram_csub.ll
clang IR/TestProgram_csub.ll -o Bin/TestProgram_csub
./Bin/TestProgram_csub flag{s1mpl3_11vm_d3m0}

To be implemented:

  • Substitution of integers of arbitrary bit width
  • Substitution of arrays
  • Substitution of strings
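
For the first item, one point seems worth checking before relaxing the 32-bit restriction. The following is a hedged sketch in plain C++ (C++14 or later) that simulates an N-bit integer with a mask; it is not LLVM code, and all names in it (`maskOf`, `linearConstN`, `bitwiseConstN`, `bitwiseConstNaive`, `checkWidths`, and so on) are hypothetical. It assumes that x and y would be stored in N-bit globals, so they are truncated before the shifts run. Under that assumption the linear form carries over unchanged, while the bitwise constant has to be computed from the truncated x and y; shifting at 32 bits and truncating afterwards can give a wrong c. Widths of 5 bits or fewer are left out, since a shift by 5 on such a type yields a poison value in LLVM IR.

```cpp
#include <cassert>
#include <cstdint>

// Mask that keeps the low `bits` bits (1 <= bits <= 32).
constexpr uint32_t maskOf(unsigned bits) {
    return bits >= 32 ? 0xFFFFFFFFu : ((1u << bits) - 1u);
}

// Linear form at width `bits`: wraparound arithmetic commutes with truncation,
// so masking once at the end is enough.
constexpr uint32_t linearConstN(uint32_t val, uint32_t a, uint32_t x,
                                uint32_t b, uint32_t y, unsigned bits) {
    return (val - (a * x + b * y)) & maskOf(bits);
}

constexpr uint32_t linearEvalN(uint32_t a, uint32_t x, uint32_t b, uint32_t y,
                               uint32_t c, unsigned bits) {
    return (a * x + b * y + c) & maskOf(bits);
}

// Models a `bits`-wide shl/lshr/or/xor chain: x and y are truncated to the
// type width before they are shifted.
constexpr uint32_t bitwiseEvalN(uint32_t x, uint32_t y, uint32_t c, unsigned bits) {
    const uint32_t m = maskOf(bits);
    return ((((x & m) << 5) & m) | ((y & m) >> 3)) ^ (c & m);
}

// Width-aware constant: truncate first, then shift (xor is its own inverse).
constexpr uint32_t bitwiseConstN(uint32_t val, uint32_t x, uint32_t y, unsigned bits) {
    return bitwiseEvalN(x, y, val, bits);
}

// Width-unaware constant: shift at 32 bits, truncate afterwards.
constexpr uint32_t bitwiseConstNaive(uint32_t val, uint32_t x, uint32_t y, unsigned bits) {
    return (val ^ ((x << 5) | (y >> 3))) & maskOf(bits);
}

// With 8 bits and y = 0x100, bit 8 of y leaks into the naive constant via y >> 3.
static_assert(bitwiseEvalN(0, 0x100, bitwiseConstN(0x5A, 0, 0x100, 8), 8) == 0x5A,
              "width-aware constant recovers val");
static_assert(bitwiseEvalN(0, 0x100, bitwiseConstNaive(0x5A, 0, 0x100, 8), 8) != 0x5A,
              "width-unaware constant does not");

// Asserts both width-aware identities for every width from 6 to 32 bits.
inline void checkWidths() {
    for (unsigned bits = 6; bits <= 32; ++bits) {
        const uint32_t val = 0xDEADBEEFu & maskOf(bits);
        const uint32_t a = 12345, b = 31000, x = 777, y = 32766;
        assert(linearEvalN(a, x, b, y, linearConstN(val, a, x, b, y, bits), bits) == val);
        assert(bitwiseEvalN(x, y, bitwiseConstN(val, x, y, bits), bits) == val);
    }
}
```

Whether the real pass would hit the truncation problem depends on how the constants are built once the `isIntegerTy(32)` check is lifted, so the sketch should be read as a checklist item rather than a finished design.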

Author: lll

Published: 2022-08-30

Updated: 2023-03-24

License