Jonathan Salwan


Concolic execution - Taint analysis with Valgrind and constraints path solver with Z3

by Jonathan Salwan - 2013-06-10


Last summer, with my friends Ahmed Bougacha and Pierre Collet, we worked on a personal project called Taminoo. Basically, Taminoo is a constraint path solver using Valgrind and Z3. At first, we didn't plan to release it because it was just a PoC. But we hope it will give ideas to our readers! In this blog entry I will try to explain how we built Taminoo :).


Concolic execution

Concolic execution is a technic that uses both symbolic and concrete execution to solve a constraint path. For example, when you want to fuzz a target binary, you do want to maximize the code coverage. Imagine the following code:

int foo(int a, char mod){
  if (a == 1234){
    switch (mod){
      case MOD_VULN:
        /* vulnerability here */
        return 0;
        return -1;
  return -1;

The user can control both "a" and "mod" arguments. But if you do dumb-fuzzing on those inputs, you will rarely match the different constraints to trigger the "MOD_VULN" code path. And you will miss a potential exploitable bug.


With the concolic execution, we first run the program with a = 0 and mod = 0. When we hit the line "if (a == 1234 )" we save the constant value "1234" somewhere and we re-run the target with a = 1234 and mod = 0. We also do that with the switch case. At the end of the process, you got a tree full of your constraints. If you want to maximize the code coverage, you have to try to take every code path possible.

If you want to go further in the concolic execution world, you can read the dedicated page on wikipedia.


Our approach

First of all, Valgrind is a Dynamic Binary Instrumentation framework: you can easily build dynamic analysis tools. It also uses a RISC-like intermediate language called VEX. We used Valgrind to taint the memory controled by the user and to track the data flow. The second part is based on Z3, a high-performance theorem prover developed at Microsoft Research. Z3 is used to solve all the constraints from our VEX's output.

We based our tests on this following code:

#include <stdio.h>
#include <sys/types.h>
#include <stdlib.h>
#include <fcntl.h>

char *serial = "\x30\x39\x3c\x21\x30";

int main(void)
  int fd, i = 0;
  char buf[260] = {0};
  char *r = buf;

  fd = open("serial.txt", O_RDONLY);
  read(fd, r, 256);
  while (i < 5){
    if ((*r ^ 0x55) != *serial)
      return 0;
    r++, serial++, i++;
  if (!*r)
    printf("Good boy\n");
  return 0;

As we said earlier, Taminoo's goal is to find a way to have the "Good boy" message. Taminoo's process is made of two parts:

  1. Taint the user inputs with Valgrind and VEX
  2. Solve the constraints with z3

If we wanted to do a complete PoC we would have tainted all the user input like: the environment variable, the syscalls, etc. But as you can see in the previous code, the user input is data read in a file called serial.txt. Then we just need to taint the memory where the read syscall stores the bytes read.


Catching the system calls is very straightforward to do with Valgrind. Its API allows us to do pre and post operations.

static void pre_syscall(ThreadId tId, UInt syscall_number, UWord* args, UInt nArgs){

static void post_syscall(ThreadId tId, UInt syscall_number, UWord* args, UInt nArgs, 
                         SysRes res){

static void init(void)
  VG_(details_name)            ("Taminoo");
  VG_(details_version)         (NULL);
  VG_(details_description)     ("Taint analysis poc");
  VG_(basic_tool_funcs)        (init, instrument, fini);
  VG_(needs_syscall_wrapper)   (pre_syscall, post_syscall);


Then, to propagate correctly the taints, we instrument each instruction of the binary. If it is a LOAD, PUT or STORE instruction we spread the taints. For example, imagine we have three variables - a, b and c - the variable a is tainted. When b = a and c = b the b and c will also be tainted because they can be controlled via a.

uint32_t a, b, c;

a = atoi(user_input);
b = a; /* b is tainted */
c = b; /* c is tainted */

And in our valgrind module, we have a code like this :

switch (st->tag) {
    case Ist_Store:
                    /* dst_addr */ st->Ist.Store.addr,
                    /* src_tmp  */ INSERT_TMP_NUMBER(st->,
                    /* size     */ INSERT_EXPR_SIZE(st->;
    case Ist_Put:
                    /* dst_reg */ mkIRExpr_HWord(st->Ist.Put.offset),
                    /* src_tmp */ INSERT_TMP_NUMBER(st->,
                    /* size    */ INSERT_EXPR_SIZE(st->;
    case Iex_Get:
                    /* dst_tmp */ mkIRExpr_HWord(dst),
                    /* src_reg */ mkIRExpr_HWord(data->Iex.Get.offset),
                    /* size    */ mkIRExpr_HWord(sizeofIRType(data->Iex.Get.ty)));
    case Iex_Load:
                    /* dst_tmp  */ mkIRExpr_HWord(st->Ist.WrTmp.tmp),
                    /* src_addr */ st->>Iex.Load.addr,
                    /* size     */ INSERT_TYPE_SIZE(data->Iex.Load.ty));

And the respective handler :

/* STORE */
static void helper_store(Addr addr, IRTemp src, Int size)
  struct id *src_id = tainted_tmps[src];
  unsigned i;

  if (multibyte_needs_splitting(size, multibyte_from_addr, addr)) {
    for (i = 0; i < size; ++i) {
      multibyte_split(multibyte_from_addr, addr + i);

  for (i = 0; i < size; ++i) {
    if (src_id == 0)
      fb_remove_addr(&tainted_addr, addr + i);
      fb_add_addr(&tainted_addr, addr + i, src_id, i);

/* LOAD */
static void helper_load(IRTemp dst, Addr data, Int size)
  static struct id *data_id[32];
  struct id_mb *mb;
  int i;

  mb = multibyte_from_addr(data);
  data_id[0] = (mb ? mb->id : 0);
  if (multibyte_needs_splitting(size, multibyte_from_addr, data)) {
    for (i = 0; i < size; ++i) {
      data_id[i] = multibyte_split(multibyte_from_addr, data + i);
    tainted_tmps[dst] = merge_tainted_bytes(data_id, size, read_from_mem, data);
  } else {
    take_id(&tainted_tmps[dst], data_id[0]);

/* PUT */
static void helper_put(Int reg_dst, IRTemp src, Int size)
  struct id *src_id = tainted_tmps[src];
  unsigned i;

  if (multibyte_needs_splitting(size, multibyte_from_reg, reg_dst)) {
    for (i = 0; i < size; ++i) {
      multibyte_split(multibyte_from_reg, reg_dst + i);

  for (i = 0; i < size; ++i) {
    take_id(&tainted_regs[reg_dst + i].id, src_id);
    tainted_regs[reg_dst + i].offset = i;

/* GET */
static void helper_get(IRTemp dst, Int reg_src, Int size)
  static struct id *reg_id[32];
  int i;

  reg_id[0] = tainted_regs[reg_src].id;
  if (multibyte_needs_splitting(size, multibyte_from_reg, reg_src)) {
    for (i = 0; i < size; ++i) {
      reg_id[i] = multibyte_split(multibyte_from_reg, reg_src + i);
    tainted_tmps[dst] = merge_tainted_bytes(reg_id, size, read_from_regs, reg_src);
  } else {
    take_id(&tainted_tmps[dst], reg_id[0]);

Ok, let's try to test our valgrind module - foobar - with the string "test" in the serial.txt file.

# printf "test" > serial.txt
# ./vg-in-place --tool=foobar --log-file=.log --taint-filename=serial.txt ../crackme/crackmeXor2
# cat .log
==14567== -----
==14567== Taminoo, Taint analysis poc
==14567== Using Valgrind-3.8.0 and LibVEX; rerun with -h for copyright info
==14567== Command: ../crackme/crackmeXor2
==14567== Parent PID: 14565
#1:8 = Read(4,0)
#2:8 = Read(4,1)
#3:8 = Read(4,2)
#4:8 = Read(4,3)
#5:32 = 8Uto32(#1:8)
#6:32 = Xor32(#5:32,0x55)
#7:8 = 32to8_0(#6:32)
#8:8 = 32to8_1(#6:32)
#9:8 = 32to8_2(#6:32)
#10:8 = 32to8_3(#6:32)
#11:32 = 8Uto32(#7:8)
#12:8 = 32to8(#11:32)
#13:1 = CmpEQ8(#12:8,0x30) = False
#14:32 = 1Uto32(#13:1)
#15:1 = 32to1(#14:32)
Jump(#15:1) = False
#6 freed
#5 freed
#14 freed
#13 freed
#12 freed
#15 freed
#11 freed
#7 freed

What this output means ? Really straightforward actually! For each action, we assign an ID and they depend on the previous action ID. First, we read and taints the input string in memory. Then we can see the action ID #6: #6:32 = Xor32(#5:32,0x55). That means this action depends on the #5 action, but #5 depends on #1 which is a simple read on the file descriptor 4 at the offset 0. After that, we can see the action #13 compares the action #12 with the constant value 0x30, and it also tells us the comparaison is false.

To be more clear, here is the constraint chain:

Constraint tree

The first constraint can be written like this:


Now, it's time to introduce Z3. To do the concolic execution we have a Python script which runs Valgrind and parses the VEX output. Then, it converts all the different constraints in Z3 patterns. It solves the first equation, and will re-run Valgrind to solve the second equation, and so on until the target displays the "Good boy" message. You can find the dump of all constraints here.

Below, an example for the first equation CmpEQ8(Xor32(Read(4,0),0x55),0x30) translated in z3py:

# cat
from z3 import *

x = BitVec('x', 32)
s = Solver()
s.add((x ^ 0x55) == 0x30)
print s.check()
print s.model()

# ./
[x = 101]

As you can see the concolic execution can be really fun and efficient. We are really sorry to not release the sources, because the code really sucks and it wasn't a serious project. If you want to see a real project using concolic execution, please read the Fuzzgrind paper which inspired us a lot.

However we will give share our sources with Axel "0vercl0k" Souchet who want to plays with it. We hope he will, at least, clean the sources and will release a forked project.

PS: Just for fun, here are some pictures of the Taminoo mascot :) IMG1 IMG2


Edit 2014-01-22: See also this blog post which is a fork of Taminoo project but this time using Pin.