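These are my notes on Fuzzing101 with LibAFL - Part I: Fuzzing Xpdf [1] and Fuzzing101 with LibAFL - Part I.V: Speed Improvements to Part I [2]. libafl gives you a lot of freedom, so I expect the learning curve to be steep; this time around I won't try to understand every detail.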

Reproduction

First, download xpdf:

cd fuzzing-101-solutions/exercise-1
wget https://dl.xpdfreader.com/old/xpdf-3.02.tar.gz
tar xvf xpdf-3.02.tar.gz
rm xpdf-3.02.tar.gz
mv xpdf-3.02 xpdf

build.rs essentially does the following:

# these are example commands that will be executed automatically by build.rs
# and were taken almost verbatim from Fuzzing101's README
cd fuzzing-101-solutions/exercise-1/xpdf
make clean
rm -rf install 
export LLVM_CONFIG=llvm-config-15
CC=afl-clang-fast CXX=afl-clang-fast++ ./configure --prefix=./install
make
make install

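I'll look at how it is implemented later; for now I just copy it.

After copying the code I found that the default libafl version is 0.10.1, which would not compile for me, so I bumped it to 0.13.2. That turned out to change quite a lot: for example, libafl::bolts became libafl_bolts, and the executors changed too: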

We deleted TimeoutExecutor and TimeoutForkserverExecutor and make it mandatory for InProcessExecutor and ForkserverExecutor to have the timeout. Now InProcessExecutor and ForkserverExecutor have the default timeout of 5 seconds.


After fixing a pile of problems with the help of the official examples, it compiles and runs:

cd exercise-1
cargo build --release
../target/release/exercise-one-solution

To run a different program, just change the executor's arguments. Here they are:

let mut executor = ForkserverExecutor::builder()
  .program("./xpdf/install/bin/pdftotext")
  .parse_afl_cmdline(["@@"])
  .coverage_map_size(MAP_SIZE)
  .build(tuple_list!(time_observer, edges_observer))
  .unwrap();
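The "@@" handed to parse_afl_cmdline follows the AFL convention of a placeholder for the path of the file that holds the current input. The sketch below is only a std-only model of that idea, not libafl's implementation: the function name expand_afl_args and the example paths are made up for illustration, and it only handles an argument that is exactly "@@".

```rust
/// Toy model of AFL-style "@@" expansion (NOT libafl's real code).
/// Every argument that is exactly "@@" is replaced by the input file path;
/// all other arguments are passed through unchanged.
fn expand_afl_args(args: &[&str], input_path: &str) -> Vec<String> {
    args.iter()
        .map(|arg| {
            if *arg == "@@" {
                input_path.to_string()
            } else {
                arg.to_string()
            }
        })
        .collect()
}
```

So a target that needs extra flags would simply list them next to "@@" in the argument array, and the fuzzer fills in the file path on each run.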

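Workflow

Let's walk through the workflow:

  1. Corpus
    • corpus_dirs: the seed directories;
    • input_corpus: the corpus kept in memory;
    • timeouts_corpus: the corpus of inputs that satisfy the objective;
  2. Observer
    • time_observer: records execution time;
    • edges_observer: records edge coverage for each execution;
  3. Feedback
    • feedback: decides which inputs are interesting;
      • combines edges_observer and time_observer
    • objective: decides which inputs meet our goal (timeout or crash);
  4. Monitor: tracks all fuzzing clients
    • monitor: a SimpleMonitor that prints reports to the terminal;
  5. Event Manager
    • mgr: one of the three core components; here the simplest one, a SimpleEventManager built from monitor
  6. State
    • state: one of the three core components; holds the information needed during fuzzing;
      • combines input_corpus, timeouts_corpus, feedback and objective
  7. Scheduler
    • scheduler: the scheduling policy; the author uses IndexesLenTimeMinimizerScheduler to favor the fastest and smallest seeds;
  8. Fuzzer
    • fuzzer: one of the three core components; generates test cases and processes the state and feedback after each execution;
      • combines scheduler, feedback and objective
  9. Executor
    • executor: runs the target;
      • specifies the program to run and its arguments;
      • combines time_observer and edges_observer
  10. Load the corpus
  11. Mutator
    • mutator: the mutator
  12. Stage
    • stage: an operation on a single input; here it applies mutator to the input;
  13. Run the fuzzer
    • combines stages, executor, state and mgr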
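To see how the pieces listed above fit together, here is a deliberately tiny, std-only model of the loop: a round-robin "scheduler", a counter-driven "mutator", a fake "executor" that reports which edges were hit, and a "feedback" that keeps inputs with new edges. None of this is libafl code; every function name and the fake target are assumptions made purely for illustration.

```rust
use std::collections::HashSet;

/// Toy executor + observer: returns the set of "edges" an input reaches.
/// The fake target only goes deeper when the input starts with "%P".
fn execute(input: &[u8]) -> HashSet<u8> {
    let mut edges = HashSet::new();
    edges.insert(0); // the entry edge is always hit
    if input.first() == Some(&b'%') {
        edges.insert(1);
        if input.get(1) == Some(&b'P') {
            edges.insert(2);
        }
    }
    edges
}

/// Toy mutator: overwrite one byte, driven by a counter instead of an RNG
/// so that the example is deterministic.
fn mutate(input: &[u8], step: usize) -> Vec<u8> {
    let mut out = input.to_vec();
    if out.is_empty() {
        out.push(0);
    }
    let pos = step % out.len();
    out[pos] = ((step / out.len()) % 256) as u8;
    out
}

/// Toy fuzz loop: returns the final corpus and the set of edges seen.
fn fuzz(seed: Vec<u8>, iterations: usize) -> (Vec<Vec<u8>>, HashSet<u8>) {
    // "State": the edges seen so far, initialised from the seed.
    let mut seen = execute(&seed);
    let mut corpus = vec![seed];
    for step in 0..iterations {
        // "Scheduler": round-robin over the corpus.
        let parent = corpus[step % corpus.len()].clone();
        // "Stage" + "Mutator": derive one child from the parent.
        let child = mutate(&parent, step);
        // "Executor" + "Observer": run the child and collect its edges.
        let edges = execute(&child);
        // "Feedback": keep the child only if it reached a new edge.
        let before = seen.len();
        seen.extend(edges);
        if seen.len() > before {
            corpus.push(child);
        }
    }
    (corpus, seen)
}
```

Calling fuzz(vec![0, 0], 200) grows the corpus step by step until the "%P" prefix is found, which is the same coverage-guided feedback cycle that the real components implement with far more machinery.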

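Modifications

After running for a while I only got timeouts and no crashes. That is because the author configured only TimeoutFeedback, while on a fast machine the target crashes before the timeout is reached, so I suggest adding CrashFeedback as well. First, let's understand the original Feedback and how it is used: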

 
// A Feedback, in most cases, processes the information reported by one or more observers to
// decide if the execution is interesting. This one is composed of two Feedbacks using a logical
// OR.
//
// Due to the fact that TimeFeedback can never classify a testcase as interesting on its own,
// we need to use it alongside some other Feedback that has the ability to perform said
// classification. These two feedbacks are combined to create a boolean formula, i.e. if the
// input triggered a new code path, OR, false.
let mut feedback = feedback_or!(
    // New maximization map feedback (attempts to maximize the map contents) linked to the
    // edges observer. This one will track indexes, but will not track novelties,
    // i.e. new_tracking(... true, false).
    MaxMapFeedback::new(&edges_observer),
    // Time feedback, this one never returns true for is_interesting, However, it does keep
    // track of testcase execution time by way of its TimeObserver
    TimeFeedback::new(&time_observer)
);

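As we can see, two Feedbacks are combined here. According to the comment, TimeFeedback cannot decide on its own whether a testcase is interesting, so the feedback_or macro is used: an input is considered interesting if MaxMapFeedback reports that it triggered a new path.

At first I misread TimeFeedback as TimeoutFeedback, but they are different. TimeFeedback never returns true; what it does is track the execution time of the input.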
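The division of labour described above can be pictured with a small std-only model. This is not libafl code: Testcase, coverage_vote, time_vote and evaluate are hypothetical names, and the model only captures the idea that the coverage side decides while the time side merely contributes metadata to whatever gets kept.

```rust
/// A corpus entry in this toy model; `exec_millis` stands for the metadata
/// that a TimeFeedback-like component contributes.
#[derive(Debug, PartialEq)]
struct Testcase {
    input: Vec<u8>,
    exec_millis: u64,
}

/// Models MaxMapFeedback's vote: interesting only when new coverage appeared.
fn coverage_vote(new_coverage: bool) -> bool {
    new_coverage
}

/// Models TimeFeedback's vote: never interesting on its own.
fn time_vote(_exec_millis: u64) -> bool {
    false
}

/// Logical OR of both votes. A kept input is stored together with its timing,
/// so a scheduler could later prefer fast testcases.
fn evaluate(
    corpus: &mut Vec<Testcase>,
    input: &[u8],
    new_coverage: bool,
    exec_millis: u64,
) -> bool {
    let interesting = coverage_vote(new_coverage) | time_vote(exec_millis);
    if interesting {
        corpus.push(Testcase {
            input: input.to_vec(),
            exec_millis,
        });
    }
    interesting
}
```

In this model a slow input without new coverage is dropped no matter how long it ran, which matches the comment's "new code path, OR, false" description.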

Being interesting is not enough. We also want to save the inputs that meet our goal; here the author saves the seeds that trigger a timeout:

// A feedback is used to choose if an input should be added to the corpus or not. In the case
// below, we're saying that in order for a testcase's input to be added to the corpus, it must:
//   1: be a timeout
//        AND
//   2: have created new coverage of the binary under test
//
// The goal is to do similar deduplication to what AFL does
//
// The feedback_and_fast macro combines the two feedbacks with a fast AND operation, which
// means only enough feedback functions will be called to know whether or not the objective
// has been met, i.e. short-circuiting logic.
let mut objective =
    feedback_and_fast!(TimeoutFeedback::new(), MaxMapFeedback::new(&edges_observer));

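Here the author uses feedback_and_fast to impose two constraints: the input must time out, and it must discover a new path. The point is to deduplicate in a way similar to AFL.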
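A rough way to picture this deduplication is a "max map" that remembers the highest hit count per edge, consulted only when the run timed out. The following is a simplified std-only model under that assumption, not libafl's MaxMapFeedback; the MaxMap type and is_objective function are illustrative names.

```rust
/// Toy max-map: remembers the highest hit count seen so far per edge index.
struct MaxMap {
    history: Vec<u8>,
}

impl MaxMap {
    fn new(size: usize) -> Self {
        MaxMap { history: vec![0; size] }
    }

    /// Returns true (and updates the history) if any entry of `map`
    /// exceeds the maximum recorded for that index.
    fn is_new(&mut self, map: &[u8]) -> bool {
        let mut new = false;
        for (old, &cur) in self.history.iter_mut().zip(map) {
            if cur > *old {
                *old = cur;
                new = true;
            }
        }
        new
    }
}

/// Short-circuit AND: the map is only consulted (and updated) for timeouts.
fn is_objective(timed_out: bool, map: &[u8], seen: &mut MaxMap) -> bool {
    timed_out && seen.is_new(map)
}
```

A second timeout that produces the same map is rejected, so only timeouts that reach something new end up in the objective corpus.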

Finally, both feedback and objective are passed to state:

//
// Component: State
//
 
// Creates a new State, taking ownership of all of the individual components during fuzzing.
//
// On the initial pass, setup_restarting_mgr_std returns (None, LlmpRestartingEventManager).
// On each successive execution (i.e. on a fuzzer restart), it returns the state from the prior
// run that was saved off in shared memory. The code below handles the initial None value
// by providing a default StdState. After the first restart, we'll simply unwrap the
// Some(StdState) returned from the call to setup_restarting_mgr_std
let mut state = StdState::new(
    // random number generator with a time-based seed
    StdRand::with_seed(current_nanos()),
    input_corpus,
    timeouts_corpus,
    // States of the feedbacks that store the data related to the feedbacks that should be
    // persisted in the State.
    &mut feedback,
    &mut objective,
)
.unwrap();

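The first argument is the random number generator, the second is the corpus, and the third is where inputs that meet the objective are stored. For the last two arguments, feedback records the interesting seeds and objective saves the seeds that meet our goal.

Now that the author's intent is clear, the next step is obvious: besides timeouts we certainly also care about inputs that cause crashes, and CrashFeedback is what we need. So how do we use it?

Following the official example again, it uses the feedback_or_fast! macro to select seeds that trigger either a crash or a timeout: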

// A feedback to choose if an input is a solution or not
let mut objective = feedback_or_fast!(CrashFeedback::new(), TimeoutFeedback::new());

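We do the same: import the relevant items and change the objective. Running the fuzzer after this change, we can see that inputs triggering a crash are saved.

Coming back to this post a few days later, I found I had already forgotten the build commands, so here they are again: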

cargo build --release
../target/release/exercise-one-solution

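Open questions

So the next questions are:

  1. Judging from the arguments of state, timeouts and crashes seem to be saved in the same directory. In libafl's design this directory holds the inputs that meet our objective, so is it possible to specify separate directories for crashes and timeouts?
  2. When inputs are saved through feedback_or_fast!(CrashFeedback::new(), TimeoutFeedback::new());, how does libafl deduplicate them? The earlier feedback_and_fast!(TimeoutFeedback::new(), MaxMapFeedback::new(&edges_observer)); deduplicates by whether a new path was found. Here, would an input that times out without finding a new path also be saved? If so, could we wrap the feedback_or_fast in a feedback_and_fast with MaxMapFeedback to deduplicate for us?

I'll answer these as I keep learning.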
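Without claiming anything about what libafl actually does, the combination proposed in question 2, "(crash OR timeout) AND new coverage", can at least be written down as a std-only model to see what it would mean. Exit and is_solution are hypothetical names, and Exit is a local stand-in, not libafl's ExitKind.

```rust
use std::collections::HashSet;

/// Local stand-in for the outcome of one execution.
#[derive(Clone, Copy)]
enum Exit {
    Ok,
    Crash,
    Timeout,
}

/// (crash OR timeout) AND new coverage, in toy form.
/// `seen` holds the edges already covered by earlier solutions.
fn is_solution(exit: Exit, edges: &[usize], seen: &mut HashSet<usize>) -> bool {
    // OR part: normal exits are never solutions and leave `seen` untouched.
    if !matches!(exit, Exit::Crash | Exit::Timeout) {
        return false;
    }
    // AND part: at least one edge must be new among the saved solutions.
    let mut new_edge = false;
    for &edge in edges {
        if seen.insert(edge) {
            new_edge = true;
        }
    }
    new_edge
}
```

One consequence visible in the model: a crash and a timeout that cover exactly the same edges collapse into a single saved input, which may or may not be what you want.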


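Speeding up

We can speed up fuzzing by using persistent mode instead of the forkserver. In Fuzzing101 with LibAFL - Part I.V: Speed Improvements to Part I, the author patches the xpdf source and writes a harness.cc; I won't go into that here.

Since the fuzzer is now compiled as a library, the main.rs from above is renamed to lib.rs. Let's see what changes compared with the workflow above.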

Observer

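To compile the harness, the author uses libafl_cc instead of the afl-clang-[fast|lto] used above. With the traditional afl-clang-[fast|lto], libafl can obtain coverage through the __AFL_SHM_ID environment variable, whereas with libafl_cc the EDGES_MAP has to be exposed through libafl_targets: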

let edges_observer =
    HitcountsMapObserver::new(unsafe { std_edges_map_observer("edges") }).track_indices();

Monitor

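To keep the target's output from getting mixed up with the fuzzer's output, the author replaces SimpleMonitor with MultiMonitor. MultiMonitor can display per-client statistics as well as the accumulated totals.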

let monitor = MultiMonitor::new(|s| {
    println!("{}", s);
});

Event Manager

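With MultiMonitor you need to start two fuzzer instances. According to the author the first instance started is also a client, but since it opens a port and listens for messages from the other clients, it looks like a server to me: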

let (state, mut mgr) = match setup_restarting_mgr_std(monitor, 1337, EventConfig::AlwaysUnique)
{
    Ok(res) => res,
    Err(err) => match err {
        Error::ShuttingDown => {
            return Ok(());
        }
        _ => {
            panic!("Failed to setup the restarting manager: {}", err);
        }
    },
};

Harness

This component does not exist in the forkserver setup; it is built specifically for InProcessExecutor. The libfuzzer_test_one_input called inside is the LLVMFuzzerTestOneInput we wrote in the harness:

let mut harness = |input: &BytesInput| {
    let target = input.target_bytes();
    let buffer = target.as_slice();
    libfuzzer_test_one_input(buffer);
    ExitKind::Ok
};

Executor

Since we are using persistent mode, the executor changes accordingly. InProcessExecutor needs more components than ForkserverExecutor:

let mut executor = InProcessExecutor::new(
    &mut harness,
    tuple_list!(edges_observer, time_observer),
    &mut fuzzer,
    &mut state,
    &mut mgr,
)
.unwrap();

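How exactly this works is something I'll study later.

Running the Fuzzer

This part mainly adds a mechanism similar to __AFL_LOOP: it sets how many iterations run before the process restarts, and it hands the state over manually for that restart: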

fuzzer
    .fuzz_loop_for(&mut stages, &mut executor, &mut state, &mut mgr, 10000)
    .unwrap();
 
// Since we're using this fuzz_loop_for in a restarting scenario to only run for n iterations
// before exiting, we need to ensure we call on_restart() and pass it the state. This way, the
// state will be available in the next, respawned, iteration.
mgr.on_restart(&mut state).unwrap();
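The restart pattern above can be modelled in a few lines of std-only Rust: each "process lifetime" restores the saved state (or starts from a default on the first pass), runs a bounded number of iterations, and hands the state on. This is only a sketch of the control flow; State, run_once and run_with_restarts are hypothetical names, and returning the state stands in for what mgr.on_restart does with shared memory.

```rust
/// Toy state that must survive a restart.
#[derive(Clone, Debug, PartialEq)]
struct State {
    executions: u64,
}

/// Models fuzz_loop_for: run a bounded number of iterations, then return.
fn fuzz_loop_for(state: &mut State, iters: u64) {
    for _ in 0..iters {
        state.executions += 1; // one "execution" of the target
    }
}

/// Models one process lifetime: restore the saved state (None on the first
/// pass), fuzz for `iters` iterations, and hand the state back for the next
/// lifetime (this return value stands in for on_restart).
fn run_once(saved: Option<State>, iters: u64) -> State {
    let mut state = saved.unwrap_or(State { executions: 0 });
    fuzz_loop_for(&mut state, iters);
    state
}

/// Models the restarting manager respawning the fuzzer `restarts` times.
fn run_with_restarts(restarts: u32, iters: u64) -> State {
    let mut saved: Option<State> = None;
    for _ in 0..restarts {
        saved = Some(run_once(saved.take(), iters));
    }
    saved.unwrap_or(State { executions: 0 })
}
```

If the state were not handed over at the end of each lifetime, every respawn would start counting from zero again, which is why the on_restart call matters.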

The author uses cargo make to drive the whole process, with each step defined in Makefile.toml:

exercise-1.5/Makefile.toml
[tasks.clean]
dependencies = ["cargo-clean", "afl-clean", "clean-xpdf"]
 
[tasks.afl-clean]
script = '''
rm -rf .cur_input* timeouts fuzzer fuzzer.o libexerciseonepointfive.a
'''
 
[tasks.clean-xpdf]
cwd = "xpdf"
script = """
make --silent clean
rm -rf built-with-* ../build/*
"""
 
[tasks.cargo-clean]
command = "cargo"
args = ["clean"]
 
[tasks.rebuild]
dependencies = ["afl-clean", "clean-xpdf", "build-compilers", "build-xpdf", "build-fuzzer"]
 
[tasks.build-compilers]
script = """
cargo build --release
cp -f ../target/release/libexerciseonepointfive.a .
"""
 
[tasks.build-xpdf]
cwd = "build"
script = """
cmake ../xpdf -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=$(pwd)/../../target/release/compiler -DCMAKE_CXX_COMPILER=$(pwd)/../../target/release/compiler_pp
make
"""
 
[tasks.build-fuzzer]
script = """
../target/release/compiler_pp -I xpdf/goo -I xpdf/fofi -I xpdf/splash -I xpdf/xpdf -I xpdf -o fuzzer harness.cc build/*/*.a -lm -ldl -lpthread -lstdc++ -lgcc -lutil -lrt
"""

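Then run cargo make rebuild to execute all the steps and produce the fuzzer binary.

Finally, run the compiled fuzzer in two separate terminals; the one started first acts as the server.

Summary

The author built a complete fuzzing workflow with libafl, covering both forkserver-based and persistent-mode fuzzing. These notes briefly summarize how libafl's components are stacked together and used, which is enough to show how much freedom libafl offers.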

Footnotes

  1. Fuzzing101 with LibAFL - Part I: Fuzzing Xpdf

  2. Fuzzing101 with LibAFL - Part I.V: Speed Improvements to Part I