syzkaller is an open-source tool developed by a Google team for fuzzing the Linux kernel, and it is still under active maintenance. The project address is given at the end of this article.

This article is divided into the following parts:

1. An explanation of the syzkaller installation process;

2. A brief description of how syzkaller works;

3. Analysis of part of the source code.

Environment setup:

Ubuntu requirement: 64-bit Ubuntu 14.04 (the syzkaller authors have not tested on 32-bit Ubuntu);

1. svn is installed (apt-get install subversion);

2. g++ is installed (apt-get install g++);

3. a text editor is installed, e.g. vim (apt-get install vim);

4. git is installed (apt-get install git).

Once all prerequisites are in place, you can start building the environment:

1. Install GCC

A recent gcc is needed, e.g. gcc 7.1.0, in order to use the coverage support in newer gcc.

(Prerequisites for building GCC; run the following:)

sudo apt-get install flex bison libc6-dev libc6-dev-i386 linux-libc-dev linux-libc-dev:i386 libgmp3-dev libmpfr-dev libmpc-dev

1.1 Run the following commands

(prerequisite 1) ($GCC is a directory of your choice; make sure it is writable):


svn checkout svn://gcc.gnu.org/svn/gcc/trunk $GCC

cd $GCC

svn ls -v ^/tags | grep gcc_7_1_0_release

svn up -r 247494

Explanation:

1. svn checkout svn://gcc.gnu.org/svn/gcc/trunk $GCC

pulls the latest gcc from the repository (what I got was the gcc 8.0 development tree);

2. cd $GCC

3. svn ls -v ^/tags | grep gcc_7_1_0_release

looks up the gcc 7.1.0 release tag;

4. svn up -r 247494

rolls the fresh checkout back to revision 247494, which corresponds to gcc 7.1.0 (I do not recall the exact revision of the fresh checkout; it is shown after step 1 completes).

1.2 Fix a bug in GCC; the bug is in tree.h:


 diff --git a/gcc/tree.h b/gcc/tree.h

 index 3bca90a..fdaa7af 100644

 --- a/gcc/tree.h

 +++ b/gcc/tree.h

 @@ -897,8 +897,8 @@  extern void omp_clause_range_check_failed (const_tree, const char *, int,

  /* If this is true, we should insert a __cilk_detach call just before

     this function call.  */

  #define EXPR_CILK_SPAWN(NODE) \

-  (tree_check2 (NODE, __FILE__, __LINE__, __FUNCTION__, \

-                CALL_EXPR, AGGR_INIT_EXPR)->base.u.bits.unsigned_flag)

+  (TREE_CHECK2 (NODE, CALL_EXPR, \

+                AGGR_INIT_EXPR)->base.u.bits.unsigned_flag)

 /* In a RESULT_DECL, PARM_DECL and VAR_DECL, means that it is

    passed by invisible reference (and the TREE_TYPE is a pointer to the true

1.2.1 Go to the directory containing the broken file:

cd $GCC/gcc

1.2.2 Edit the file (I used vim (prerequisite 3)):

 sudo vim tree.h

1.2.3 Jump to line 900:

 :900

1.2.4 Change:


(tree_check2 (NODE, __FILE__, __LINE__, __FUNCTION__, \

                 CALL_EXPR, AGGR_INIT_EXPR)->base.u.bits.unsigned_flag)

to:


 (TREE_CHECK2 (NODE, CALL_EXPR, \

                 AGGR_INIT_EXPR)->base.u.bits.unsigned_flag)

and save.

1.3 Build GCC (from inside $GCC):

 


mkdir build

 mkdir install

 cd build/

 ../configure --enable-languages=c,c++ --disable-bootstrap --enable-checking=no --with-gnu-as --with-gnu-ld --with-ld=/usr/bin/ld.bfd --disable-multilib --prefix=$GCC/install/

 make -j64

 make install

1.4 If the installation succeeded, ls $GCC/install/bin/ should show:


 c++  gcc-ar      gcov-tool                x86_64-pc-linux-gnu-gcc-7.1.0

 cpp  gcc-nm      x86_64-pc-linux-gnu-c++  x86_64-pc-linux-gnu-gcc-ar

 g++  gcc-ranlib  x86_64-pc-linux-gnu-g++  x86_64-pc-linux-gnu-gcc-nm

 gcc  gcov        x86_64-pc-linux-gnu-gcc  x86_64-pc-linux-gnu-gcc-ranlib

2. Install the kernel under test:

2.1 Clone the kernel sources from GitHub (prerequisite 4) ($KERNEL is a path of your choice) (cloning can be very slow; copying an existing source tree into the directory also works):

git clone https://github.com/torvalds/linux.git $KERNEL

2.2 Generate the kernel configuration:


cd $KERNEL

 make defconfig

 make kvmconfig

2.3 Edit the kernel configuration file .config

(I used vim to add the following four lines, which enable the debugging/coverage options):


 CONFIG_KCOV=y

 CONFIG_DEBUG_INFO=y

 CONFIG_KASAN=y

 CONFIG_KASAN_INLINE=y

2.4 Regenerate the configuration:

 make oldconfig

2.5 Build the Linux kernel with the GCC installed earlier (note the double quotes, so that $GCC is expanded):

make CC="$GCC/install/bin/gcc" -j64

2.6 Run ls $KERNEL/vmlinux and ls $KERNEL/arch/x86/boot/bzImage; if both files exist, the build succeeded.

3. Create the image (a minimal Debian Wheezy Linux image)

3.1 First install debootstrap:

sudo apt-get install debootstrap

3.2 Create the image with the following script:


 #!/bin/bash

 # Copyright 2016 syzkaller project authors. All rights reserved.

 # Use of this source code is governed by Apache 2 LICENSE that can be found in the LICENSE file.

 # create-image.sh creates a minimal Debian-wheezy Linux image suitable for syzkaller.

 set -eux

 # Create a minimal Debian-wheezy distributive as a directory.

 sudo rm -rf wheezy

 mkdir -p wheezy

 sudo debootstrap --include=openssh-server,curl,tar,gcc,libc6-dev,time,strace,sudo,less,psmisc wheezy wheezy

 # Set some defaults and enable prompt-less ssh to the machine for root.

 sudo sed -i '/^root/ { s/:x:/::/ }' wheezy/etc/passwd

 echo 'T0:23:respawn:/sbin/getty -L ttyS0 115200 vt100' | sudo tee -a wheezy/etc/inittab

 printf '\nauto eth0\niface eth0 inet dhcp\n' | sudo tee -a wheezy/etc/network/interfaces

 echo 'debugfs /sys/kernel/debug debugfs defaults 0 0' | sudo tee -a wheezy/etc/fstab

 echo "kernel.printk = 7 4 1 3" | sudo tee -a wheezy/etc/sysctl.conf

 echo 'debug.exception-trace = 0' | sudo tee -a wheezy/etc/sysctl.conf

 echo "net.core.bpf_jit_enable = 1" | sudo tee -a wheezy/etc/sysctl.conf

 echo "net.core.bpf_jit_harden = 2" | sudo tee -a wheezy/etc/sysctl.conf

 echo "net.ipv4.ping_group_range = 0 65535" | sudo tee -a wheezy/etc/sysctl.conf

 echo -en "127.0.0.1\tlocalhost\n" | sudo tee wheezy/etc/hosts

 echo "nameserver 8.8.8.8" | sudo tee -a wheezy/etc/resolv.conf

 echo "syzkaller" | sudo tee wheezy/etc/hostname

 sudo mkdir -p wheezy/root/.ssh/

 rm -rf ssh

 mkdir -p ssh

 ssh-keygen -f ssh/id_rsa -t rsa -N ''

 cat ssh/id_rsa.pub | sudo tee wheezy/root/.ssh/authorized_keys

 # Build a disk image

 dd if=/dev/zero of=wheezy.img bs=1M seek=2047 count=1

 sudo mkfs.ext4 -F wheezy.img

 sudo mkdir -p /mnt/wheezy

 sudo mount -o loop wheezy.img /mnt/wheezy

 sudo cp -a wheezy/. /mnt/wheezy/.

 sudo umount /mnt/wheezy

Reference link

3.3 When the script completes you should see the wheezy.img file.

3.4 (Some extra tools that are not required for running syzkaller but are very useful:)


 sudo chroot wheezy /bin/bash -c "apt-get update; apt-get install -y curl tar time strace gcc make sysbench git vim screen usbutils"

 sudo chroot wheezy /bin/bash -c "mkdir -p ~; cd ~/; wget https://github.com/kernelslacker/trinity/archive/v1.5.tar.gz -O trinity-1.5.tar.gz; tar -xf trinity-1.5.tar.gz"

 sudo chroot wheezy /bin/bash -c "cd ~/trinity-1.5 ; ./configure.sh ; make -j16 ; make install"

 cp -r $KERNEL wheezy/tmp/

 sudo chroot wheezy /bin/bash -c "apt-get update; apt-get install -y flex bison python-dev libelf-dev libunwind7-dev libaudit-dev libslang2-dev libperl-dev binutils-dev liblzma-dev libnuma-dev"

 sudo chroot wheezy /bin/bash -c "cd /tmp/linux/tools/perf/; make"

 sudo chroot wheezy /bin/bash -c "cp /tmp/linux/tools/perf/perf /usr/bin/"

 rm -r wheezy/tmp/linux

4. Install QEMU:

4.1

sudo apt-get install kvm qemu-kvm

4.2 After installing, make sure the kernel boots:


qemu-system-x86_64 \

   -kernel $KERNEL/arch/x86/boot/bzImage \

   -append "console=ttyS0 root=/dev/sda debug earlyprintk=serial slub_debug=QUZ" \

   -hda $IMAGE/wheezy.img \

   -net user,hostfwd=tcp::10021-:22 -net nic \

   -enable-kvm \

   -nographic \

   -m 2G \

   -smp 2 \

   -pidfile vm.pid \

   2>&1 | tee vm.log

Normal boot output:


early console in setup code

 early console in extract_kernel

 input_data: 0x0000000005d9e276

 input_len: 0x0000000001da5af3

 output: 0x0000000001000000

 output_len: 0x00000000058799f8

 kernel_total_size: 0x0000000006b63000

 Decompressing Linux... Parsing ELF... done.

 Booting the kernel.

[    0.000000] Linux version 4.12.0-rc3+ ...

[    0.000000] Command line: console=ttyS0 root=/dev/sda debug earlyprintk=serial

...

[ ok ] Starting enhanced syslogd: rsyslogd.

[ ok ] Starting periodic command scheduler: cron.

[ ok ] Starting OpenBSD Secure Shell server: sshd.

4.3 After that you should be able to ssh into the QEMU instance from another terminal:


ssh -i $IMAGE/ssh/id_rsa -p 10021 -o "StrictHostKeyChecking no" root@localhost

Because create-image.sh copied ssh/id_rsa.pub into the image's /root/.ssh/authorized_keys, this key-based login as root@localhost requires no password. If you are asked for credentials, check that -i points at the private key generated by the script.

4.4 Kill the QEMU process:

kill $(cat vm.pid)

5. Install Go (see the upstream docs for a 32-bit reference):


wget https://storage.googleapis.com/golang/go1.8.1.linux-amd64.tar.gz

 tar -xf go1.8.1.linux-amd64.tar.gz

 mv go goroot

 export GOROOT=`pwd`/goroot

 export PATH=$PATH:$GOROOT/bin

 mkdir gopath

 export GOPATH=`pwd`/gopath

Make sure the GOROOT and GOPATH variables are set correctly.

6. Install syzkaller:

6.1 Fetch the sources:


go get -u -d github.com/google/syzkaller/...

cd gopath/src/github.com/google/syzkaller/

mkdir workdir

make

6.2 Add a configuration file:


 {

     "http": "127.0.0.1:56741",

     "workdir": "/gopath/src/github.com/google/syzkaller/workdir",

     "vmlinux": "/linux/upstream/vmlinux",

     "image": "/image/wheezy.img",

     "sshkey": "/image/ssh/id_rsa",

     "syzkaller": "/gopath/src/github.com/google/syzkaller",

     "procs": 8,

     "type": "qemu",

     "vm": {

         "count": 4,

         "kernel": "/linux/arch/x86/boot/bzImage",

         "cpu": 2,

         "mem": 2048

     }

 }

Change the path fields in the configuration to the paths on your own machine.

Meaning of the configuration fields

6.3 Run the syzkaller manager:

 ./bin/syz-manager -config=my.cfg

When running ./bin/syz-manager -config=my.cfg you may see:

failed to copy binary: try running the command with sudo;

/sys/kernel/debug/kcov is missing. Enable CONFIG_KCOV and mount debugfs: enable the CONFIG_KCOV option in the kernel, or add "cover": false to the configuration file to fuzz without coverage information.

This problem is unrelated to the Ubuntu kernel version; it occurs on both Ubuntu 14 and Ubuntu 16.

On success you will see a screen like the following (screenshot omitted).

At this point, the syzkaller runtime environment is fully set up.

Next, a brief overview of how syzkaller works.

syzkaller operates on the topology shown in the (omitted) diagram: the user interacts with syz-manager rather than with the fuzzer directly. Based on the field values in its configuration file, syz-manager establishes the connections to the fuzzers, writes the log files, and so on. Each VM in the diagram is, in my setup, a QEMU instance (usually 4 to 8 such instances are started for testing), and every instance contains the two core components that actually perform the testing: syz-fuzzer and syz-executor. While syz-executor keeps feeding the kernel under test programs containing random syscall sequences, syz-fuzzer keeps receiving the kernel's coverage information and uses it to hand syz-executor more effective test programs.

If a crash occurs during this process, the crash log and report files are shown on the web UI and written under workdir.

Here is a result from one of my runs (screenshot omitted):

Four types of errors showed up (three of them are typical ones). Let's walk through the code path that produces them.

First, the entry point of the whole program is the main function in manager.go:


func main() {

    flag.Parse()

    EnableLogCaching(1000, 1<<20)

    cfg, syscalls, err := mgrconfig.LoadFile(*flagConfig)

    if err != nil {

        Fatalf("%v", err)

    }

    initAllCover(cfg.Vmlinux)

    RunManager(cfg, syscalls)

}

The first line parses the command-line flags;

the second line configures log caching for the web UI (up to 1000 lines, 1 MB);

the third line reads the relevant options from the configuration file and derives the corresponding set of syscalls. The whole process is as follows:


The LoadFile function is located in mgrconfig.go:

func LoadFile(filename string) (*Config, map[int]bool, error) {

    return load(nil, filename)

}

The load function is as follows:


func load(data []byte, filename string) (*Config, map[int]bool, error) {

    cfg := &Config{

        Cover:     true,

        Reproduce: true,

        Sandbox:   "setuid",

        Rpc:       "localhost:0",

        Procs:     1,

    }

    if data != nil {

        if err := config.LoadData(data, cfg); err != nil {

            return nil, nil, err

        }

    } else {

        if err := config.LoadFile(filename, cfg); err != nil {

            return nil, nil, err

        }

    }

    if !osutil.IsExist(filepath.Join(cfg.Syzkaller, "bin/syz-fuzzer")) {

        return nil, nil, fmt.Errorf("bad config syzkaller param: can't find bin/syz-fuzzer")

    }

    if !osutil.IsExist(filepath.Join(cfg.Syzkaller, "bin/syz-executor")) {

        return nil, nil, fmt.Errorf("bad config syzkaller param: can't find bin/syz-executor")

    }

    if !osutil.IsExist(filepath.Join(cfg.Syzkaller, "bin/syz-execprog")) {

        return nil, nil, fmt.Errorf("bad config syzkaller param: can't find bin/syz-execprog")

    }

    if cfg.Http == "" {

        return nil, nil, fmt.Errorf("config param http is empty")

    }

    if cfg.Workdir == "" {

        return nil, nil, fmt.Errorf("config param workdir is empty")

    }

    if cfg.Vmlinux == "" {

        return nil, nil, fmt.Errorf("config param vmlinux is empty")

    }

    if cfg.Type == "" {

        return nil, nil, fmt.Errorf("config param type is empty")

    }

    if cfg.Procs < 1 || cfg.Procs > 32 {

        return nil, nil, fmt.Errorf("bad config param procs: '%v', want [1, 32]", cfg.Procs)

    }

    switch cfg.Sandbox {

    case "none", "setuid", "namespace":

    default:

        return nil, nil, fmt.Errorf("config param sandbox must contain one of none/setuid/namespace")

    }

    cfg.Workdir = osutil.Abs(cfg.Workdir)

    cfg.Vmlinux = osutil.Abs(cfg.Vmlinux)

    cfg.Syzkaller = osutil.Abs(cfg.Syzkaller)

    if cfg.Kernel_Src == "" {

        cfg.Kernel_Src = filepath.Dir(cfg.Vmlinux) // assume in-tree build by default

    }

    syscalls, err := parseSyscalls(cfg)

    if err != nil {

        return nil, nil, err

    }

    if err := parseSuppressions(cfg); err != nil {

        return nil, nil, err

    }

    if cfg.Hub_Client != "" && (cfg.Name == "" || cfg.Hub_Addr == "" || cfg.Hub_Key == "") {

        return nil, nil, fmt.Errorf("hub_client is set, but name/hub_addr/hub_key is empty")

    }

    if cfg.Dashboard_Client != "" && (cfg.Name == "" ||

        cfg.Dashboard_Addr == "" ||

        cfg.Dashboard_Key == "") {

        return nil, nil, fmt.Errorf("dashboard_client is set, but name/dashboard_addr/dashboard_key is empty")

    }

    return cfg, syscalls, nil

}

The core of producing the syscall set is the parseSyscalls function:


func parseSyscalls(cfg *Config) (map[int]bool, error) {

    match := func(call *sys.Call, str string) bool {

        if str == call.CallName || str == call.Name {

            return true

        }

        if len(str) > 1 && str[len(str)-1] == '*' && strings.HasPrefix(call.Name, str[:len(str)-1]) {

            return true

        }

        return false

    }

    syscalls := make(map[int]bool)

    if len(cfg.Enable_Syscalls) != 0 {

        for _, c := range cfg.Enable_Syscalls {

            n := 0

            for _, call := range sys.Calls {

                if match(call, c) {

                    syscalls[call.ID] = true

                    n++

                }

            }

            if n == 0 {

                return nil, fmt.Errorf("unknown enabled syscall: %v", c)

            }

        }

    } else {

        for _, call := range sys.Calls {

            syscalls[call.ID] = true

        }

    }

    for _, c := range cfg.Disable_Syscalls {

        n := 0

        for _, call := range sys.Calls {

            if match(call, c) {

                delete(syscalls, call.ID)

                n++

            }

        }

        if n == 0 {

            return nil, fmt.Errorf("unknown disabled syscall: %v", c)

        }

    }

    // mmap is used to allocate memory.

    syscalls[sys.CallMap["mmap"].ID] = true

    return syscalls, nil

}
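To make the wildcard semantics of match concrete, here is a standalone Go sketch (my own illustration, not syzkaller code) that reimplements its three checks: exact CallName match, exact Name match, and a trailing-`*` prefix match:

```go
package main

import (
	"fmt"
	"strings"
)

// call mirrors the two name fields that match looks at: CallName is the
// plain syscall name ("accept"), Name includes the variant suffix
// ("accept$inet").
type call struct {
	CallName, Name string
}

// match reproduces the logic from parseSyscalls: an entry in
// enable_syscalls/disable_syscalls matches a syscall if it equals either
// name, or if it ends in '*' and the part before the '*' is a prefix of
// the full name.
func match(c call, str string) bool {
	if str == c.CallName || str == c.Name {
		return true
	}
	if len(str) > 1 && str[len(str)-1] == '*' && strings.HasPrefix(c.Name, str[:len(str)-1]) {
		return true
	}
	return false
}

func main() {
	c := call{CallName: "accept", Name: "accept$inet"}
	fmt.Println(match(c, "accept"))      // true: matches CallName
	fmt.Println(match(c, "accept$inet")) // true: matches full Name
	fmt.Println(match(c, "accept$*"))    // true: prefix wildcard
	fmt.Println(match(c, "acc"))         // false: no wildcard, no exact match
}
```

So an entry like "accept" enables every accept$... variant at once (they share the CallName), while "accept$*" does the same via the prefix rule.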

This function checks whether enable_syscalls or disable_syscalls is specified in the configuration file, in order to determine the set of syscalls to be tested.

If neither of these two fields is specified, all syscalls (sys.Calls) are used by default. The syscall information is stored in syscall.h:


// AUTOGENERATED FILE

#define __NR_syz_emit_ethernet 1000006

#define __NR_syz_extract_tcp_res 1000008

#define __NR_syz_fuse_mount 1000004

#define __NR_syz_fuseblk_mount 1000005

#define __NR_syz_kvm_setup_cpu 1000007

#define __NR_syz_open_dev 1000002

#define __NR_syz_open_pts 1000003

#define __NR_syz_test 1000001

struct call_t {

    const char* name;

    int sys_nr;

};

#if defined(__x86_64__) || 0

static call_t syscalls[] = {

    {"accept", 43},

    {"accept$alg", 43},

    {"accept$ax25", 43},

    {"accept$inet", 43},

    {"accept$inet6", 43},

    {"accept$ipx", 43},

    {"accept$llc", 43},

    {"accept$netrom", 43},

    {"accept$nfc_llcp", 43},

      ...

};

#endif

#if defined(__aarch64__) || 0

static call_t syscalls[] = {

    {"accept", 202},

    {"accept$alg", 202},

    {"accept$ax25", 202},

    {"accept$inet", 202},

    {"accept$inet6", 202},

    {"accept$ipx", 202},

       ...

};

#endif

#if defined(__ppc64__) || defined(__PPC64__) || defined(__powerpc64__) || 0

static call_t syscalls[] = {

    {"accept", 330},

    {"accept$alg", 330},

    {"accept$ax25", 330},

    {"accept$inet", 330},

    {"accept$inet6", 330},

    {"accept$ipx", 330},

    {"accept$llc", 330},

    ...

};

#endif

The matching syscall-number table is selected according to the target architecture.
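As an illustration of the two fields parseSyscalls reads (a hypothetical fragment to be merged into the manager configuration from section 6.2, using syscall names that appear in the tables above):

```json
{
    "enable_syscalls": ["mmap", "accept*"],
    "disable_syscalls": ["accept$ax25"]
}
```

With this fragment, every accept variant except accept$ax25 is fuzzed, and mmap is always kept enabled anyway because the code above forces it on for memory allocation.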

If LoadFile returns without error, execution continues with initAllCover and RunManager. (I have not yet worked out what initAllCover does.) The code that produces the crash information is in RunManager:


func RunManager(cfg *mgrconfig.Config, syscalls map[int]bool) {

    env := mgrconfig.CreateVMEnv(cfg, *flagDebug)

    vmPool, err := vm.Create(cfg.Type, env)

    if err != nil {

        Fatalf("%v", err)

    }

    crashdir := filepath.Join(cfg.Workdir, "crashes")

    osutil.MkdirAll(crashdir)

    enabledSyscalls := ""

    if len(syscalls) != 0 {

        buf := new(bytes.Buffer)

        for c := range syscalls {

            fmt.Fprintf(buf, ",%v", c)

        }

        enabledSyscalls = buf.String()[1:]

        Logf(1, "enabled syscalls: %v", enabledSyscalls)

    }

    mgr := &Manager{

        cfg:             cfg,

        vmPool:          vmPool,

        crashdir:        crashdir,

        startTime:       time.Now(),

        stats:           make(map[string]uint64),

        crashTypes:      make(map[string]bool),

        enabledSyscalls: enabledSyscalls,

        corpus:          make(map[string]RpcInput),

        disabledHashes:  make(map[string]struct{}),

        corpusSignal:    make(map[uint32]struct{}),

        maxSignal:       make(map[uint32]struct{}),

        corpusCover:     make(map[uint32]struct{}),

        fuzzers:         make(map[string]*Fuzzer),

        fresh:           true,

        vmStop:          make(chan bool),

    }

    Logf(0, "loading corpus...")

    mgr.corpusDB, err = db.Open(filepath.Join(cfg.Workdir, "corpus.db"))

    if err != nil {

        Fatalf("failed to open corpus database: %v", err)

    }

    deleted := 0

    for key, rec := range mgr.corpusDB.Records {

        p, err := prog.Deserialize(rec.Val)

        if err != nil {

            if deleted < 10 {

                Logf(0, "deleting broken program: %v\n%s", err, rec.Val)

            }

            mgr.corpusDB.Delete(key)

            deleted++

            continue

        }

        disabled := false

        for _, c := range p.Calls {

            if !syscalls[c.Meta.ID] {

                disabled = true

                break

            }

        }

        if disabled {

            // This program contains a disabled syscall.

            // We won't execute it, but remeber its hash so

            // it is not deleted during minimization.

            // TODO: use mgr.enabledCalls which accounts for missing devices, etc.

            // But it is available only after vm check.

            mgr.disabledHashes[hash.String(rec.Val)] = struct{}{}

            continue

        }

        mgr.candidates = append(mgr.candidates, RpcCandidate{

            Prog:      rec.Val,

            Minimized: true, // don't reminimize programs from corpus, it takes lots of time on start

        })

    }

    mgr.fresh = len(mgr.corpusDB.Records) == 0

    Logf(0, "loaded %v programs (%v total, %v deleted)", len(mgr.candidates), len(mgr.corpusDB.Records), deleted)

    // Now this is ugly.

    // We duplicate all inputs in the corpus and shuffle the second part.

    // This solves the following problem. A fuzzer can crash while triaging candidates,

    // in such case it will also lost all cached candidates. Or, the input can be somewhat flaky

    // and doesn't give the coverage on first try. So we give each input the second chance.

    // Shuffling should alleviate deterministically losing the same inputs on fuzzer crashing.

    mgr.candidates = append(mgr.candidates, mgr.candidates...)

    shuffle := mgr.candidates[len(mgr.candidates)/2:]

    for i := range shuffle {

        j := i + rand.Intn(len(shuffle)-i)

        shuffle[i], shuffle[j] = shuffle[j], shuffle[i]

    }

    // Create HTTP server.

    mgr.initHttp()

    // Create RPC server for fuzzers.

    s, err := NewRpcServer(cfg.Rpc, mgr)

    if err != nil {

        Fatalf("failed to create rpc server: %v", err)

    }

    Logf(0, "serving rpc on tcp://%v", s.Addr())

    mgr.port = s.Addr().(*net.TCPAddr).Port

    go s.Serve()

    if cfg.Dashboard_Addr != "" {

        mgr.dash = dashapi.New(cfg.Dashboard_Client, cfg.Dashboard_Addr, cfg.Dashboard_Key)

    }

    go func() {

        for lastTime := time.Now(); ; {

            time.Sleep(10 * time.Second)

            now := time.Now()

            diff := now.Sub(lastTime)

            lastTime = now

            mgr.mu.Lock()

            if mgr.firstConnect.IsZero() {

                mgr.mu.Unlock()

                continue

            }

            mgr.fuzzingTime += diff * time.Duration(atomic.LoadUint32(&mgr.numFuzzing))

            executed := mgr.stats["exec total"]

            crashes := mgr.stats["crashes"]

            mgr.mu.Unlock()

            Logf(0, "executed programs: %v, crashes: %v", executed, crashes)

        }

    }()

    if *flagBench != "" {

        f, err := os.OpenFile(*flagBench, os.O_WRONLY|os.O_CREATE|os.O_EXCL, osutil.DefaultFilePerm)

        if err != nil {

            Fatalf("failed to open bench file: %v", err)

        }

        go func() {

            for {

                time.Sleep(time.Minute)

                vals := make(map[string]uint64)

                mgr.mu.Lock()

                if mgr.firstConnect.IsZero() {

                    mgr.mu.Unlock()

                    continue

                }

                mgr.minimizeCorpus()

                vals["corpus"] = uint64(len(mgr.corpus))

                vals["uptime"] = uint64(time.Since(mgr.firstConnect)) / 1e9

                vals["fuzzing"] = uint64(mgr.fuzzingTime) / 1e9

                vals["signal"] = uint64(len(mgr.corpusSignal))

                vals["coverage"] = uint64(len(mgr.corpusCover))

                for k, v := range mgr.stats {

                    vals[k] = v

                }

                mgr.mu.Unlock()

                data, err := json.MarshalIndent(vals, "", "  ")

                if err != nil {

                    Fatalf("failed to serialize bench data")

                }

                if _, err := f.Write(append(data, '\n')); err != nil {

                    Fatalf("failed to write bench data")

                }

            }

        }()

    }

    if mgr.cfg.Hub_Client != "" {

        go func() {

            for {

                time.Sleep(time.Minute)

                mgr.hubSync()

            }

        }()

    }

    go func() {

        c := make(chan os.Signal, 2)

        signal.Notify(c, syscall.SIGINT)

        <-c

        close(vm.Shutdown)

        Logf(0, "shutting down...")

        <-c

        Fatalf("terminating")

    }()

    mgr.vmLoop()

}

I will not belabor the code in between; the crucial crash handling happens in the final statement, mgr.vmLoop():

func (mgr *Manager) vmLoop() {

    Logf(0, "booting test machines...")

    Logf(0, "wait for the connection from test machine...")

    instancesPerRepro := 4

    vmCount := mgr.vmPool.Count()

    if instancesPerRepro > vmCount {

        instancesPerRepro = vmCount

    }

    instances := make([]int, vmCount)

    for i := range instances {

        instances[i] = vmCount - i - 1

    }

    runDone := make(chan *RunResult, 1)

    pendingRepro := make(map[*Crash]bool)

    reproducing := make(map[string]bool)

    reproInstances := 0

    var reproQueue []*Crash

    reproDone := make(chan *ReproResult, 1)

    stopPending := false

    shutdown := vm.Shutdown

    for {

        mgr.mu.Lock()

        phase := mgr.phase

        mgr.mu.Unlock()

        for crash := range pendingRepro {

            if reproducing[crash.desc] {

                continue

            }

            delete(pendingRepro, crash)

            if !mgr.needRepro(crash.desc) {

                continue

            }

            Logf(1, "loop: add to repro queue '%v'", crash.desc)

            reproducing[crash.desc] = true

            reproQueue = append(reproQueue, crash)

        }

        Logf(1, "loop: phase=%v shutdown=%v instances=%v/%v %+v repro: pending=%v reproducing=%v queued=%v",

            phase, shutdown == nil, len(instances), vmCount, instances,

            len(pendingRepro), len(reproducing), len(reproQueue))

        canRepro := func() bool {

            return phase >= phaseTriagedHub &&

                len(reproQueue) != 0 && reproInstances+instancesPerRepro <= vmCount

        }

        if shutdown == nil {

            if len(instances) == vmCount {

                return

            }

        } else {

            for canRepro() && len(instances) >= instancesPerRepro {

                last := len(reproQueue) - 1

                crash := reproQueue[last]

                reproQueue[last] = nil

                reproQueue = reproQueue[:last]

                vmIndexes := append([]int{}, instances[len(instances)-instancesPerRepro:]...)

                instances = instances[:len(instances)-instancesPerRepro]

                reproInstances += instancesPerRepro

                Logf(1, "loop: starting repro of '%v' on instances %+v", crash.desc, vmIndexes)

                go func() {

                    res, err := repro.Run(crash.output, mgr.cfg, mgr.vmPool, vmIndexes)

                    reproDone <- &ReproResult{vmIndexes, crash.desc, res, err}

                }()

            }

            for !canRepro() && len(instances) != 0 {

                last := len(instances) - 1

                idx := instances[last]

                instances = instances[:last]

                Logf(1, "loop: starting instance %v", idx)

                go func() {

                    crash, err := mgr.runInstance(idx)

                    runDone <- &RunResult{idx, crash, err}

                }()

            }

        }

        var stopRequest chan bool

        if !stopPending && canRepro() {

            stopRequest = mgr.vmStop

        }

        select {

        case stopRequest <- true:

            Logf(1, "loop: issued stop request")

            stopPending = true

        case res := <-runDone:

            Logf(1, "loop: instance %v finished, crash=%v", res.idx, res.crash != nil)

            if res.err != nil && shutdown != nil {

                Logf(0, "%v", res.err)

            }

            stopPending = false

            instances = append(instances, res.idx)

            // On shutdown qemu crashes with "qemu: terminating on signal 2",

            // which we detect as "lost connection". Don't save that as crash.

            if shutdown != nil && res.crash != nil && !mgr.isSuppressed(res.crash) {

                mgr.saveCrash(res.crash)

                if mgr.needRepro(res.crash.desc) {

                    Logf(1, "loop: add pending repro for '%v'", res.crash.desc)

                    pendingRepro[res.crash] = true

                }

            }

        case res := <-reproDone:

            crepro := false

            desc := ""

            if res.res != nil {

                crepro = res.res.CRepro

                desc = res.res.Desc

            }

            Logf(1, "loop: repro on %+v finished '%v', repro=%v crepro=%v desc='%v'",

                res.instances, res.desc0, res.res != nil, crepro, desc)

            if res.err != nil {

                Logf(0, "repro failed: %v", res.err)

            }

            delete(reproducing, res.desc0)

            instances = append(instances, res.instances...)

            reproInstances -= instancesPerRepro

            if res.res == nil {

                mgr.saveFailedRepro(res.desc0)

            } else {

                mgr.saveRepro(res.res)

            }

        case <-shutdown:

            Logf(1, "loop: shutting down...")

            shutdown = nil

        }

    }

}

Following along, you can see that the return value of crash, err := mgr.runInstance(idx) is crash-related, so let's step into runInstance. This function still belongs to the manager, so it does not depend on the concrete VM type:


func (mgr *Manager) runInstance(index int) (*Crash, error) {

    inst, err := mgr.vmPool.Create(index)

    if err != nil {

        return nil, fmt.Errorf("failed to create instance: %v", err)

    }

    defer inst.Close()

    fwdAddr, err := inst.Forward(mgr.port)

    if err != nil {

        return nil, fmt.Errorf("failed to setup port forwarding: %v", err)

    }

    fuzzerBin, err := inst.Copy(filepath.Join(mgr.cfg.Syzkaller, "bin", "syz-fuzzer"))

    if err != nil {

        return nil, fmt.Errorf("failed to copy binary: %v", err)

    }

    executorBin, err := inst.Copy(filepath.Join(mgr.cfg.Syzkaller, "bin", "syz-executor"))

    if err != nil {

        return nil, fmt.Errorf("failed to copy binary: %v", err)

    }

    // Leak detection significantly slows down fuzzing, so detect leaks only on the first instance.

    leak := mgr.cfg.Leak && index == 0

    fuzzerV := 0

    procs := mgr.cfg.Procs

    if *flagDebug {

        fuzzerV = 100

        procs = 1

    }

    // Run the fuzzer binary.

    start := time.Now()

    atomic.AddUint32(&mgr.numFuzzing, 1)

    defer atomic.AddUint32(&mgr.numFuzzing, ^uint32(0))

    cmd := fmt.Sprintf("%v -executor=%v -name=vm-%v -manager=%v -procs=%v -leak=%v -cover=%v -sandbox=%v -debug=%v -v=%d",

        fuzzerBin, executorBin, index, fwdAddr, procs, leak, mgr.cfg.Cover, mgr.cfg.Sandbox, *flagDebug, fuzzerV)

    outc, errc, err := inst.Run(time.Hour, mgr.vmStop, cmd)

    if err != nil {

        return nil, fmt.Errorf("failed to run fuzzer: %v", err)

    }

    desc, text, output, crashed, timedout := vm.MonitorExecution(outc, errc, true, mgr.cfg.ParsedIgnores)

    if timedout {

        // This is the only "OK" outcome.

        Logf(0, "vm-%v: running for %v, restarting (%v)", index, time.Since(start), desc)

        return nil, nil

    }

    if !crashed {

        // syz-fuzzer exited, but it should not.

        desc = "lost connection to test machine"

    }

    return &Crash{index, desc, text, output}, nil

}

The first statement, inst, err := mgr.vmPool.Create(index), returns an inst whose concrete VM type is already determined (my VMs are QEMU instances, so QEMU is used as the example from here on). The last statement, return &Crash{index, desc, text, output}, nil, returns a Crash structure containing the crash description and related data, so let's step into MonitorExecution:


func MonitorExecution(outc <-chan []byte, errc <-chan error, needOutput bool, ignores []*regexp.Regexp) (desc string, text, output []byte, crashed, timedout bool) {

    waitForOutput := func() {

        dur := time.Second

        if needOutput {

            dur = 10 * time.Second

        }

        timer := time.NewTimer(dur).C

        for {

            select {

            case out, ok := <-outc:

                if !ok {

                    return

                }

                output = append(output, out...)

            case <-timer:

                return

            }

        }

    }

    matchPos := 0

    const (

        beforeContext = 1024 << 10

        afterContext  = 128 << 10

    )

    extractError := func(defaultError string) (string, []byte, []byte, bool, bool) {

        // Give it some time to finish writing the error message.
        waitForOutput()
        if bytes.Contains(output, []byte("SYZ-FUZZER: PREEMPTED")) {
            return "preempted", nil, nil, false, true
        }
        if !report.ContainsCrash(output[matchPos:], ignores) {
            return defaultError, nil, output, defaultError != "", false
        }
        desc, text, start, end := report.Parse(output[matchPos:], ignores)
        start = start + matchPos - beforeContext
        if start < 0 {
            start = 0
        }
        end = end + matchPos + afterContext
        if end > len(output) {
            end = len(output)
        }
        return desc, text, output[start:end], true, false
    }

    lastExecuteTime := time.Now()
    ticker := time.NewTimer(3 * time.Minute)
    tickerFired := false
    for {
        if !tickerFired && !ticker.Stop() {
            <-ticker.C
        }
        tickerFired = false
        ticker.Reset(3 * time.Minute)
        select {
        case err := <-errc:
            switch err {
            case nil:
                // The program has exited without errors,
                // but wait for kernel output in case there is some delayed oops.
                return extractError("")
            case TimeoutErr:
                return err.Error(), nil, nil, false, true
            default:
                // Note: connection lost can race with a kernel oops message.
                // In such case we want to return the kernel oops.
                return extractError("lost connection to test machine")
            }
        case out := <-outc:
            output = append(output, out...)
            if bytes.Index(output[matchPos:], []byte("executing program")) != -1 { // syz-fuzzer output
                lastExecuteTime = time.Now()
            }
            if bytes.Index(output[matchPos:], []byte("executed programs:")) != -1 { // syz-execprog output
                lastExecuteTime = time.Now()
            }
            if report.ContainsCrash(output[matchPos:], ignores) {
                return extractError("unknown error")
            }
            if len(output) > 2*beforeContext {
                copy(output, output[len(output)-beforeContext:])
                output = output[:beforeContext]
            }
            matchPos = len(output) - 128
            if matchPos < 0 {
                matchPos = 0
            }
            // In some cases kernel constantly prints something to console,
            // but fuzzer is not actually executing programs.
            if time.Since(lastExecuteTime) > 3*time.Minute {
                return "test machine is not executing programs", nil, output, true, false
            }
        case <-ticker.C:
            tickerFired = true
            return "no output from test machine", nil, output, true, false
        case <-Shutdown:
            return "", nil, nil, false, false
        }
    }
}

As you can see, dur = 10 * time.Second specifies the 10-second interval at which the web UI mentioned above emits a status message, while the error information is obtained from the Parse function.

The functions that actually execute each syscall are execute and execute1 in fuzzer.go. Since I am still analyzing them, I will not go into further detail here; if any reader understands the flow and principles of the fuzzer's functions, feel free to leave a comment!

The work above was done during my internship at Qihoo 360 in 2017. Many thanks to the company for its careful guidance!

Project address

Original article address

Appendix: command-line options of the syzkaller tools

syz-manager

-bench string

write execution statistics into this file periodically

string specifies the name of a target file (which must not already exist) into which execution statistics are periodically written, e.g.:


{

  "corpus": 0,

  "coverage": 0,

  "exec candidate": 0,

  "exec fuzz": 0,

  "exec gen": 214,

  "exec minimize": 0,

  "exec smash": 0,

  "exec total": 214,

  "exec triage": 0,

  "executor restarts": 9,

  "fuzzer new inputs": 0,

  "fuzzing": 60,

  "signal": 0,

  "uptime": 45,

  "vm restarts": 2

}

This data matches the table shown in the web UI.
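Each bench record is a plain JSON object with numeric values. A minimal Go sketch of reading one back (the helper name is mine; the field set is taken from the sample above):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// parseBenchRecord decodes one statistics record as written by
// syz-manager -bench. Keys contain spaces ("exec total"), so a map
// is more convenient than a struct with field tags.
func parseBenchRecord(data []byte) (map[string]uint64, error) {
	stats := make(map[string]uint64)
	err := json.Unmarshal(data, &stats)
	return stats, err
}

func main() {
	// The sample record from above, abbreviated.
	rec := []byte(`{"corpus": 0, "exec gen": 214, "exec total": 214,
		"executor restarts": 9, "fuzzing": 60, "uptime": 45, "vm restarts": 2}`)
	stats, err := parseBenchRecord(rec)
	if err != nil {
		panic(err)
	}
	fmt.Println(stats["exec total"], stats["vm restarts"]) // prints: 214 2
}
```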

-config string

configuration file

Specifies the configuration file.
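For reference, a minimal configuration sketch in the flat JSON format used by syzkaller around the time this article was written. All paths here are placeholders and the field names have changed between syzkaller versions, so treat this as illustrative only and consult the documentation matching your checkout:

```json
{
    "http": "127.0.0.1:56741",
    "workdir": "/home/user/syz/workdir",
    "syzkaller": "/home/user/gopath/src/github.com/google/syzkaller",
    "vmlinux": "/home/user/syz/kernel/vmlinux",
    "kernel": "/home/user/syz/kernel/arch/x86/boot/bzImage",
    "image": "/home/user/syz/image/wheezy.img",
    "sshkey": "/home/user/syz/image/ssh/id_rsa",
    "sandbox": "setuid",
    "type": "qemu",
    "count": 2,
    "cpu": 2,
    "mem": 2048,
    "procs": 1
}
```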

-debug

dump all VM output to console

Dumps all VM output to the console, including everything QEMU prints while booting, e.g.:


2017/08/01 22:33:50 loading corpus...

2017/08/01 22:33:50 loaded 0 programs (0 total, 0 deleted)

2017/08/01 22:33:50 serving http on http://127.0.0.1:56741

2017/08/01 22:33:50 serving rpc on tcp://[::]:45365

2017/08/01 22:33:50 booting test machines...

2017/08/01 22:33:50 wait for the connection from test machine...

2017/08/01 22:33:50 running command: qemu-system-x86_64 []string{"-m", "2048", "-net", "nic", "-net", "user,host=10.0.2.10,hostfwd=tcp::25985-:22", "-display", "none", "-serial", "stdio", "-no-reboot", "-numa", "node,nodeid=0,cpus=0-1", "-numa", "node,nodeid=1,cpus=2-3", "-smp", "sockets=2,cores=2,threads=1", "-enable-kvm", "-usb", "-usbdevice", "mouse", "-usbdevice", "tablet", "-soundhw", "all", "-hda", "/home/peter/Desktop/syz/image/wheezy.img", "-snapshot", "-kernel", "/home/peter/Desktop/syz/kernel/arch/x86/boot/bzImage", "-append", "console=ttyS0 vsyscall=native rodata=n oops=panic nmi_watchdog=panic panic_on_warn=1 panic=86400 ftrace_dump_on_oops=orig_cpu earlyprintk=serial net.ifnames=0 biosdevname=0 kvm-intel.nested=1 kvm-intel.unrestricted_guest=1 kvm-intel.vmm_exclusive=1 kvm-intel.fasteoi=1 kvm-intel.ept=1 kvm-intel.flexpriority=1 kvm-intel.vpid=1 kvm-intel.emulate_invalid_guest_state=1 kvm-intel.eptad=1 kvm-intel.enable_shadow_vmcs=1 kvm-intel.pml=1 kvm-intel.enable_apicv=1 root=/dev/sda "}

early console in extract_kernel

input_data: 0x0000000001e8a276

input_len: 0x00000000006d62e5

output: 0x0000000001000000

output_len: 0x000000000154e9fc

kernel_total_size: 0x0000000001179000

booted via startup_32()

Physical KASLR using RDTSC...

Virtual KASLR using RDTSC...

-v int

verbosity

Verbosity: specifies the verbosity level of the crash log files. Only int values ≤ 1 are accepted.

syz-fuzzer

-abort_signal int

initial signal to send to executor in error conditions; upgrades to SIGKILL if executor does not exit

The initial signal sent to the executor in error conditions; escalates to SIGKILL if the executor does not exit.


-buffer_size uint

internal buffer size (in bytes) for executor output

Internal buffer size (in bytes) for executor output.

-collide

collide syscalls to provoke data races (default true)

Whether to execute colliding syscalls concurrently to provoke data races (default true).

-cover

collect feedback signals (coverage) (default true)

Collects coverage feedback signals (default true).

-debug

debug output from executor

Whether to print all debug output from the executor.

-executor string

path to executor binary

Specifies the path to the syz-executor binary.

-leak

detect memory leaks

Whether to detect memory leaks.

-manager string

manager rpc address

The RPC address of the manager.

-name string

unique name for manager

A unique identifying name for the manager.

-output string

write programs to none/stdout/dmesg/file (default “stdout”)

Specifies where test programs are written (none/stdout/dmesg/file; default "stdout").

-pprof string

address to serve pprof profiles

The address on which to serve pprof profiles.

-procs int

number of parallel test processes (default 1)

Number of parallel test processes (default 1).

-sandbox string

sandbox for fuzzing (none/setuid/namespace) (default “setuid”)

The sandbox type used for fuzzing (none/setuid/namespace; default "setuid").

-threaded

use threaded mode in executor (default true)

Uses threaded (multi-threaded) mode in syz-executor (default true).

-timeout duration

execution timeout (default 1m0s)

Execution timeout (default 1 minute).

-v int

verbosity

Whether to enable verbose output (only effective when syz-manager's debug option is enabled).

syz-executor

Running syz-executor by hand only prints "mmap of input file failed (errno 9)": it is not meant to be started directly, but is launched by syz-fuzzer or syz-execprog, which pass it input through pre-opened file descriptors, so mapping the missing input fd fails with EBADF.

syz-execprog

-abort_signal int

initial signal to send to executor in error conditions; upgrades to SIGKILL if executor does not exit

The initial signal sent to the executor in error conditions; escalates to SIGKILL if the executor does not exit.

-buffer_size uint

internal buffer size (in bytes) for executor output

Internal buffer size (in bytes) for executor output.

-collide

collide syscalls to provoke data races (default true)

Whether to execute colliding syscalls concurrently to provoke data races (default true).

-cover

collect feedback signals (coverage) (default true)

Collects coverage feedback signals (default true).

-coverfile string

write coverage to the file

Specifies the file to which coverage information is written.

-debug

debug output from executor

Whether to print all debug output from the executor.

-executor string

path to executor binary (default “./syz-executor”)

Specifies the path to the syz-executor binary (default "./syz-executor").

-fault_call int

inject fault into this call (0-based) (default -1)

Injects a fault into the specified call (0-based; default -1, i.e. disabled).

-fault_nth int

inject fault on n-th operation (0-based)

Injects a fault on the n-th operation (0-based).

-output string

write programs to none/stdout (default “none”)

Specifies where test programs are written (none/stdout; default "none").

-procs int

number of parallel processes to execute programs (default 1)

Number of parallel processes used to execute programs (default 1).

-repeat int

repeat execution that many times (0 for infinite loop) (default 1)

Specifies how many times to repeat execution (0 means an infinite loop; default 1).

-sandbox string

sandbox for fuzzing (none/setuid/namespace) (default “setuid”)

The sandbox type used for fuzzing (none/setuid/namespace; default "setuid").

-threaded

use threaded mode in executor (default true)

Uses threaded (multi-threaded) mode in syz-executor (default true).

-timeout duration

execution timeout (default 1m0s)

Execution timeout (default 1 minute).

-v int

verbosity

Whether to enable verbose output (only effective when syz-manager's debug option is enabled).

*Author of this article: 阡陌时空