Binary analysis: Concolic execution with Pin and z3

by @Jonathan Salwan - 2013-08-28

 

Edit 2015-09-06: Check out our concolic execution library using python bindings.

 

1 - Introduction

In a previous post, I talked about the concolic execution using Valgrind to the taint analysis and z3 to the constraint path solving. So why another blog post about this technique? Because recently my previous researchs was around Pin and because Pin is supported on Linux, Windows and Mac. I also wanted to see how it's possible to do it without IR - With Valgrind and z3 it was pretty easy because Valgrind provides an IR (VEX). Then, it can be useful for other people or it can be give some new ideas. :-).

 

2 - Concolic execution

We can find two types of analysis, static and dynamic analysis. Both approaches have some advantages and disadvantages. If you use dynamic analysis, we can't cover all the code but you will be more reliable. If you use static analysis, you can cover the code, but you can't get the context information at runtime. The concolic execution is a technic that uses both symbolic and concrete execution to solve a constraint path. The concolic execution is mainly used to cover a code. To list the constraints, the symbolic execution is used. Below, a little example about the symbolic execution:

int foo(int i1, int i2)
{
    int x = i1;
    int y = i2;

    if (x > 80){
        x = y * 2;
        y = 0;
        if (x == 256)
            return True;
    }
    else{
        x = 0;
        y = 0;
    }
    /* ... */
    return False;
}

Based that code, we can see 3 differents paths, for each path we have a specific constraint. The constraints tree look like that:

Constraint Path

 

So, we can say that this code can return False via two differents paths and True via only one path. With the symbolic execution, it's possible to know which constraints are necessary to return False or True.

 

Constraint Path

 

The concolic execution will use the concrete execution to save and solve the constraints at runtime. With this above case, to cover this code, the program will be executed three times and for each execution, one constraint will be solved to take another path. That's what we'll see in the next chapter.

 

3 - Proof of concept on dumb crackme

3.1 - Introduction

We will start to analyze a simple code which contains only three simple conditions. The goal will be to solve this crackme automatically via the concolic execution.

#include <stdio.h>
#include <sys/types.h>
#include <stdlib.h>
#include <fcntl.h>

int main(void)
{
  int   fd;
  char  buff[260] = {0};

  fd = open("serial.txt", O_RDONLY);
  read(fd, buff, 256);
  close(fd);

  if (buff[0] != 'a') return False;
  if (buff[1] != 'b') return False;
  if (buff[2] != 'c') return False;

  printf("Good boy\n");

  return True;
}

Based on that code, if we represent all paths and constraints, our constraints tree will look like that:

This code contains four possible paths. Each path has its constraints.

      +-----------+----------------------------------------------------+--------------+
      | PC number |                    Constraints                     | return value |
      +-----------+----------------------------------------------------+--------------+
      |     1     | buff[0] != 'a'                                     | return False | 
      |     2     | buff[0] == 'a' && buff[1] != 'b'                   | return False |
      |     3     | buff[0] == 'a' && buff[1] == 'b' && buff[2] != 'c' | return False |
      |     4     | buff[0] == 'a' && buff[1] == 'b' && buff[2] == 'c' | return True  |
      +-----------+----------------------------------------------------+--------------+

Now that we have listed all constraints which are possible, we can cover the code. For that, we need to run the program with the first constraint, then we re-run the program with the second constraint and repeat this operation until the last constraint is executed. This operation is called the concolic execution. Below you can see a diagram representing this execution.

Crackme1 constraints concolic

As you can see above, we can cover all the code with only four executions. Now, we will see how it's possible to implement it with Pin. For that we need to:

  • Taint the serial.txt buffer.
  • Follow our data (Spread the taint).
  • Save the first constraint.
  • Solve this constraint.
  • Re-run the binary with the first constraint solved.
  • And repeat this operation for each constraint (each path)...

In this blog post, we will not talk about the taint analysis, for that, you can read my previous post. Then, to solve the constraints we will use the theorem prover Z3 and its C++ API.

 

3.2 - Compile a Pin tool with Z3 C++ API

We will use the Z3 C++ API inside the Pin tool. So, you need to install the Z3 library and add the headers/lib in the compile line. In my case, I downloaded the z3.zip in my pintool directory and I compiled the library. Then, to compile my Pin tool, I created a shell script which compiles with the Z3 headers/lib. This script looks like that:

$ pwd
/home/jonathan/Works/Tools/pin-2.12-58423-gcc.4.4.7-linux/source/tools/ConcolicExecution

$ cat compile.sh
g++ -DBIGARRAY_MULTIPLIER=1 -DUSING_XED -Wall -Werror -Wno-unknown-pragmas -fno-stack-protector -DTARGET_IA32E -DHOST_IA32E -fPIC -DTARGET_LINUX  -I../../../source/include/pin -I../../../source/include/pin/gen -I../../../extras/components/include -I./z3/src/api/c++ -I../../../extras/xed2-intel64/include -I../../../source/tools/InstLib -O3 -fomit-frame-pointer -fno-strict-aliasing    -c -o obj-intel64/ConcolicExecution.o ConcolicExecution.cpp

g++ -shared -Wl,--hash-style=sysv -Wl,-Bsymbolic -Wl,--version-script=../../../source/include/pin/pintool.ver    -o obj-intel64/ConcolicExecution.so obj-intel64/ConcolicExecution.o  -L../../../intel64/lib -L../../../intel64/lib-ext -L../../../intel64/runtime/glibc -L../../../extras/xed2-intel64/lib -lpin -lxed -ldwarf -lelf -ldl -lz3
$

 

3.3 - Save and solve the constraints

If we look the ASM representation of our C code. We can see that this code loads in the "eax" register our character ("rbp-0x110" points on our serial buffer). Then, it compares the smaller size register "al" with a constant and it jumps to two different space if it's true or false.

.text:400683:  movzx  eax,BYTE PTR [rbp-0x110]
.text:40068a:  cmp    al,0x61
.text:40068c:  je     400695 <main+0x81>
.text:40068e:  mov    eax,0x1
.text:400693:  jmp    4006c8 <main+0xb4>

.text:400695:  movzx  eax,BYTE PTR [rbp-0x10f]
.text:40069c:  cmp    al,0x62
.text:40069e:  je     4006a7 <main+0x93>
.text:4006a0:  mov    eax,0x1
.text:4006a5:  jmp    4006c8 <main+0xb4>

.text:4006a7:  movzx  eax,BYTE PTR [rbp-0x10e]
.text:4006ae:  cmp    al,0x63
.text:4006b0:  je     4006b9 <main+0xa5>
.text:4006b2:  mov    eax,0x1
.text:4006b7:  jmp    4006c8 <main+0xb4>

.text:4006b9:  mov    edi,0x4007c7
.text:4006be:  call   4004e0 <puts@plt>

.text:4006c3:  mov    eax,0x0
.text:4006c8:  leave  
.text:4006c9:  ret

This code is pretty simple and if we taint the datas used in our serial.txt, we have something like that:

$ printf "xxx" > serial.txt
$ ../../../pin -t ./obj-intel64/ConcolicExecution.so -taint-file serial.txt -- ./crackme1
[TAINT]                 bytes tainted from 0x7fff194a6600 to 0x7fff194a6700 (via read)
[READ in 7fff194a6600]  400683: movzx eax, byte ptr [rbp-0x110]
                        eax is now tainted
[FOLLOW]                40068a: cmp al, 0x61
[SPREAD]                40068e: mov eax, 0x1
                        output: eax | input: constant
                        eax is now freed

Now, the real question is: Where does start and stop the equation? I think it's real good/complicated question and I am currently still working on that! However, in your case, we will start an equation when a byte controllable by the user is LOAD and we will stop it when the "cmp" instruction occurs. We will also assign an unique ID for each constraint.

$ printf "xxx" > serial.txt
$ ../../../pin -t ./obj-intel64/ConcolicExecution.so -taint-file serial.txt -- ./crackme1
[TAINT]                 bytes tainted from 0x7fff194a6600 to 0x7fff194a6700 (via read)
[READ in 7fff194a6600]  400683: movzx eax, byte ptr [rbp-0x110]
[Constraint]            #0 = 0x78
                        eax is now tainted
[FOLLOW]                40068a: cmp al, 0x61
[Equation]              cmp(#0, 0x61)
[Equation]              cmp(x, 0x61)
[SPREAD]                40068e: mov eax, 0x1
                        output: eax | input: constant
                        eax is now freed

As you can see above, we assign the first constraint with the unique ID #0. This constraint was the first, so we tag it to remember that's possible to control it via the user input. Then, when the "cmp" occurs, we display the full equation.

To maintain a link between a register and a constraint number, a table is updated. When a constraint is assigned, it's also assigned to a register.

Reg constraint table

That means eax = #0 = 0x78 - 0x78 is our first character in our serial. Then cmp(al, 0x61) = cmp(#0, 0x61) because eax = #0 and cmp(#0, 0x61) = cmp(x, 0x61) because #0 is the first constraint of our equation.

Now to solve this equation we just use Z3.

$ printf "xxx" > serial.txt
$ ../../../pin -t ./obj-intel64/ConcolicExecution.so -taint-file serial.txt -- ./crackme1
[TAINT]                 bytes tainted from 0x7fff194a6600 to 0x7fff194a6700 (via read)
[READ in 7fff194a6600]  400683: movzx eax, byte ptr [rbp-0x110]
[Constraint]            #0 = 0x78
                        eax is now tainted
[FOLLOW]                40068a: cmp al, 0x61
[Equation]              cmp(#0, 0x61)
[Equation]              cmp(x, 0x61)
[Z3 Solver]-------------------------------------
(solver
  (= x #x00000061))
(define-fun x () (_ BitVec 32)
  #x00000061)
The good value is 0x61
[Z3 Solver]-------------------------------------
[SPREAD]                40068e: mov eax, 0x1
                        output: eax | input: constant
                        eax is now freed

Z3 tries to solve this equation (= x #x00000061)) and finds that the result is 0x61. At this time, the Pin tool writes the good character (0x61) in our serial.txt.

 

3.4 - Demo on the first crackme

To solve this crackme and to generate the good serial.txt, we need to run three times this Pin tool. For each execution, one character is found and written in the serial file.

$ printf "xxx" > serial.txt
$ ../../../pin -t ./obj-intel64/ConcolicExecution.so -taint-file serial.txt -- ./crackme1
[TAINT]                 bytes tainted from 0x7fff065b1ab0 to 0x7fff065b1bb0 (via read)
[READ in 7fff065b1ab0]  400683: movzx eax, byte ptr [rbp-0x110]
[Constraint]            #0 = 0x78
                        eax is now tainted
[FOLLOW]                40068a: cmp al, 0x61
[Equation]              cmp(#0, 0x61)
[Equation]              cmp(x, 0x61)
[Z3 Solver]-------------------------------------
(solver
  (= x #x00000061))
(define-fun x () (_ BitVec 32)
  #x00000061)
The good value is 0x61
[Z3 Solver]-------------------------------------
[SPREAD]                40068e: mov eax, 0x1
                        output: eax | input: constant
                        eax is now freed

$ cat serial.txt
a%

$ ../../../pin -t ./obj-intel64/ConcolicExecution.so -taint-file serial.txt -- ./crackme1
[TAINT]                 bytes tainted from 0x7fff0c1677a0 to 0x7fff0c1678a0 (via read)
[READ in 7fff0c1677a0]  400683: movzx eax, byte ptr [rbp-0x110]
[Constraint]            #0 = 0x61
                        eax is now tainted
[FOLLOW]                40068a: cmp al, 0x61
[Equation]              cmp(#0, 0x61)
[Equation]              cmp(x, 0x61)
[Z3 Solver]-------------------------------------
(solver
  (= x #x00000061))
(define-fun x () (_ BitVec 32)
  #x00000061)
The good value is 0x61
[Z3 Solver]-------------------------------------
[READ in 7fff0c1677a1]  400695: movzx eax, byte ptr [rbp-0x10f]
[Constraint]            #1 = 0x00
                        eax is already tainted
[FOLLOW]                40069c: cmp al, 0x62
[Equation]              cmp(#1, 0x62)
[Equation]              cmp(cmp(x, 0x61), 0x62)
[Z3 Solver]-------------------------------------
(solver
  (= x #x00000062))
(define-fun x () (_ BitVec 32)
  #x00000062)
The good value is 0x62
[Z3 Solver]-------------------------------------
[SPREAD]                4006a0: mov eax, 0x1
                        output: eax | input: constant
                        eax is now freed

$ cat serial.txt
ab%

$ ../../../pin -t ./obj-intel64/ConcolicExecution.so -taint-file serial.txt -- ./crackme1
[TAINT]                 bytes tainted from 0x7fff4acd2e60 to 0x7fff4acd2f60 (via read)
[READ in 7fff4acd2e60]  400683: movzx eax, byte ptr [rbp-0x110]
[Constraint]            #0 = 0x61
                        eax is now tainted
[FOLLOW]                40068a: cmp al, 0x61
[Equation]              cmp(#0, 0x61)
[Equation]              cmp(x, 0x61)
[Z3 Solver]-------------------------------------
(solver
  (= x #x00000061))
(define-fun x () (_ BitVec 32)
  #x00000061)
The good value is 0x61
[Z3 Solver]-------------------------------------
[READ in 7fff4acd2e61]  400695: movzx eax, byte ptr [rbp-0x10f]
[Constraint]            #1 = 0x62
                        eax is already tainted
[FOLLOW]                40069c: cmp al, 0x62
[Equation]              cmp(#1, 0x62)
[Equation]              cmp(cmp(x, 0x61), 0x62)
[Z3 Solver]-------------------------------------
(solver
  (= x #x00000062))
(define-fun x () (_ BitVec 32)
  #x00000062)
The good value is 0x62
[Z3 Solver]-------------------------------------
[READ in 7fff4acd2e62]  4006a7: movzx eax, byte ptr [rbp-0x10e]
[Constraint]            #2 = 0x00
                        eax is already tainted
[FOLLOW]                4006ae: cmp al, 0x63
[Equation]              cmp(#2, 0x63)
[Equation]              cmp(cmp(cmp(x, 0x61), 0x62), 0x63)
[Z3 Solver]-------------------------------------
(solver
  (= x #x00000063))
(define-fun x () (_ BitVec 32)
  #x00000063)
The good value is 0x63
[Z3 Solver]-------------------------------------
[SPREAD]                4006b2: mov eax, 0x1
                        output: eax | input: constant
                        eax is now freed

$ cat serial.txt
abc%

$ ./crackme1
Good boy

 

3.5 - Another crackme using XOR-based algorithm

To complicate things a bit, let's use this following dumb crackme using an XOR-based algorithm.

#include <stdio.h>
#include <sys/types.h>
#include <stdlib.h>
#include <fcntl.h>

char *serial = "\x30\x39\x3c\x21\x30";

int main(void)
{
  int fd, i = 0;
  char buf[260] = {0};
  char *r = buf;

  fd = open("serial.txt", O_RDONLY);
  read(fd, r, 256);
  close(fd);
  while (i < 5){
    if ((*r ^ (0x55)) != *serial)
      return 0;
    r++, serial++, i++;
  }
  if (!*r)
    printf("Good boy\n");
  return 0;
}

This code reads and applies an XOR with the constant key "0x55" on each character in the serial file. Then, it checks the result with a constant string serial. This code is intersting to study the execution concolic, because we have a simple algorithm. On the following CFG, the blue blox is our algorithm.

CFG

Now. let's see what happens when we taint and follow our datas from the serial file.

$ printf "xxx" > ./serial.txt
$ ../../../pin -t obj-intel64/ConcolicExecution.so -taint-file serial.txt -- ./crackme2
[TAINT]                 bytes tainted from 0x7fff3cab6cc0 to 0x7fff3cab6dc0 (via read)
[READ in 7fff3cab6cc0]  400698: movzx eax, byte ptr [rax]
                        eax is now tainted
[SPREAD]                40069b: mov edx, eax
                        output: edx | input: eax
                        edx is now tainted
[FOLLOW]                40069b: mov edx, eax
[FOLLOW]                4195997: xor edx, 0x55
[READ in 4007ec]        4006a7: movzx eax, byte ptr [rax]
                        eax is now freed
[FOLLOW]                4006aa: cmp dl, al
[SPREAD]                7feb884e67db: mov edx, 0x1
                        output: edx | input: constant
                        edx is now freed

Same for the first example, we need to assign a unique constraint for each spread. Then, when the cmp instruction occurs, we need to solve the equation via Z3.

$ ../../../pin -t obj-intel64/ConcolicExecution.so -taint-file serial.txt -- ./crackme2
[TAINT]                 bytes tainted from 0x7fff3cab6cc0 to 0x7fff3cab6dc0 (via read)
[READ in 7fff3cab6cc0]  400698: movzx eax, byte ptr [rax]
[Constraint]            #0 = 0x61
                        eax is now tainted
[SPREAD]                40069b: mov edx, eax
                        output: edx | input: eax
                        edx is now tainted
[FOLLOW]                40069b: mov edx, eax
[Constraint]            #1 = #0
[FOLLOW]                4195997: xor edx, 0x55
[Constraint]            #2 = xor(#1, 0x55)
[READ in 4007ec]        4006a7: movzx eax, byte ptr [rax]
                        eax is now freed
[FOLLOW]                4006aa: cmp dl, al
[Equation]              cmp(#2, 0x30)
[Equation]              cmp(xor(x, 0x55), 0x30)
[Z3 Solver]-------------------------------------
(solver
  (= (bvxor x #x00000055) #x00000030))
(define-fun x () (_ BitVec 32)
  #x00000065)
The good value is 0x65
[Z3 Solver]-------------------------------------
[SPREAD]                7feb884e67db: mov edx, 0x1
                        output: edx | input: constant
                        edx is now freed

$ cat serial.txt
e%
$

As you can see above, my constraint on the xor instruction looks like that: xor(#1, 0x55), it means we need to display/follow all ALU operations on a specific convention. Like:

add(a, b)
sub(a, b)
mul(a, b)
div(a, b)
xor(a, b)
...

This is a real problem with Pin. Because it doesn't provide an IR, we need to implement all operations. For example, with the xor instruction, we need to catch these following encoding:

xor reg, reg         
xor mem, reg         
xor reg, mem         
xor reg, immed       
xor mem, immed       
xor accum, immed

Then, when we need to build our equation like cmp(#2, 0x30), we need to replace the constraint number by its content - And for that we will use the constraints table.

equation mutation

After the first constraint solved, we set the first character in the serial file and we re-run the Pin tool to solve the second constraint. We repeat this operation until all constraints are solved. The following diagram represent our executions. As you can see, for each execution, only one constraint is solved.

 

concolic execution

 

And the full result which generates a valide file key is pasted below. As you can see, for each execution, one character is found until the valide key.

$ printf "xxx" > ./serial.txt
$ ../../../pin -t ./obj-intel64/ConcolicExecution.so -taint-file serial.txt -- ./crackme1
[TAINT]                 bytes tainted from 0x7fff2d60e7d0 to 0x7fff2d60e8d0 (via read)
[READ in 7fff2d60e7d0]  400698: movzx eax, byte ptr [rax]
[Constraint]            #0 = 0x41
                        eax is now tainted
[SPREAD]                40069b: mov edx, eax
                        output: edx | input: eax
                        edx is now tainted
[FOLLOW]                40069b: mov edx, eax
[Constraint]            #1 = #0
[FOLLOW]                4195997: xor edx, 0x55
[Constraint]            #2 = xor(#1, 0x55)
[READ in 4007ec]        4006a7: movzx eax, byte ptr [rax]
                        eax is now freed
[FOLLOW]                4006aa: cmp dl, al
[Equation]              cmp(#2, 0x30)
[Equation]              cmp(xor(x, 0x55), 0x30)
[Z3 Solver]-------------------------------------
(solver
  (= (bvxor x #x00000055) #x00000030))
(define-fun x () (_ BitVec 32)
  #x00000065)
The good value is 0x65
[Z3 Solver]-------------------------------------
[SPREAD]                7ff3541837db: mov edx, 0x1
                        output: edx | input: constant
                        edx is now freed

$ cat serial.txt
e%

$ ../../../pin -t ./obj-intel64/ConcolicExecution.so -taint-file serial.txt -- ./crackme1
[TAINT]                 bytes tainted from 0x7fff6d1f8730 to 0x7fff6d1f8830 (via read)
[READ in 7fff6d1f8730]  400698: movzx eax, byte ptr [rax]
[Constraint]            #0 = 0x65
                        eax is now tainted
[SPREAD]                40069b: mov edx, eax
                        output: edx | input: eax
                        edx is now tainted
[FOLLOW]                40069b: mov edx, eax
[Constraint]            #1 = #0
[FOLLOW]                4195997: xor edx, 0x55
[Constraint]            #2 = xor(#1, 0x55)
[READ in 4007ec]        4006a7: movzx eax, byte ptr [rax]
                        eax is now freed
[FOLLOW]                4006aa: cmp dl, al
[Equation]              cmp(#2, 0x30)
[Equation]              cmp(xor(x, 0x55), 0x30)
[Z3 Solver]-------------------------------------
(solver
  (= (bvxor x #x00000055) #x00000030))
(define-fun x () (_ BitVec 32)
  #x00000065)
The good value is 0x65
[Z3 Solver]-------------------------------------
[READ in 7fff6d1f8731]  400698: movzx eax, byte ptr [rax]
[Constraint]            #3 = 0x00
                        eax is now tainted
[FOLLOW]                40069b: mov edx, eax
[Constraint]            #4 = #3
[FOLLOW]                4195997: xor edx, 0x55
[Constraint]            #5 = xor(#4, 0x55)
[READ in 4007ed]        4006a7: movzx eax, byte ptr [rax]
                        eax is now freed
[FOLLOW]                4006aa: cmp dl, al
[Equation]              cmp(#5, 0x39)
[Equation]              cmp(xor(x, 0x55), 0x39)
[Z3 Solver]-------------------------------------
(solver
  (= (bvxor x #x00000055) #x00000039))
(define-fun x () (_ BitVec 32)
  #x0000006c)
The good value is 0x6c
[Z3 Solver]-------------------------------------
[SPREAD]                7fe0b6aa47db: mov edx, 0x1
                        output: edx | input: constant
                        edx is now freed

$ cat serial.txt
el%

$ ../../../pin -t ./obj-intel64/ConcolicExecution.so -taint-file serial.txt -- ./crackme1
[TAINT]                 bytes tainted from 0x7fff2d1e1e00 to 0x7fff2d1e1f00 (via read)
[READ in 7fff2d1e1e00]  400698: movzx eax, byte ptr [rax]
[Constraint]            #0 = 0x65
                        eax is now tainted
[SPREAD]                40069b: mov edx, eax
                        output: edx | input: eax
                        edx is now tainted
[FOLLOW]                40069b: mov edx, eax
[Constraint]            #1 = #0
[FOLLOW]                4195997: xor edx, 0x55
[Constraint]            #2 = xor(#1, 0x55)
[READ in 4007ec]        4006a7: movzx eax, byte ptr [rax]
                        eax is now freed
[FOLLOW]                4006aa: cmp dl, al
[Equation]              cmp(#2, 0x30)
[Equation]              cmp(xor(x, 0x55), 0x30)
[Z3 Solver]-------------------------------------
(solver
  (= (bvxor x #x00000055) #x00000030))
(define-fun x () (_ BitVec 32)
  #x00000065)
The good value is 0x65
[Z3 Solver]-------------------------------------
[READ in 7fff2d1e1e01]  400698: movzx eax, byte ptr [rax]
[Constraint]            #3 = 0x6c
                        eax is now tainted
[FOLLOW]                40069b: mov edx, eax
[Constraint]            #4 = #3
[FOLLOW]                4195997: xor edx, 0x55
[Constraint]            #5 = xor(#4, 0x55)
[READ in 4007ed]        4006a7: movzx eax, byte ptr [rax]
                        eax is now freed
[FOLLOW]                4006aa: cmp dl, al
[Equation]              cmp(#5, 0x39)
[Equation]              cmp(xor(x, 0x55), 0x39)
[Z3 Solver]-------------------------------------
(solver
  (= (bvxor x #x00000055) #x00000039))
(define-fun x () (_ BitVec 32)
  #x0000006c)
The good value is 0x6c
[Z3 Solver]-------------------------------------
[READ in 7fff2d1e1e02]  400698: movzx eax, byte ptr [rax]
[Constraint]            #6 = 0x00
                        eax is now tainted
[FOLLOW]                40069b: mov edx, eax
[Constraint]            #7 = #6
[FOLLOW]                4195997: xor edx, 0x55
[Constraint]            #8 = xor(#7, 0x55)
[READ in 4007ee]        4006a7: movzx eax, byte ptr [rax]
                        eax is now freed
[FOLLOW]                4006aa: cmp dl, al
[Equation]              cmp(#8, 0x3c)
[Equation]              cmp(xor(x, 0x55), 0x3c)
[Z3 Solver]-------------------------------------
(solver
  (= (bvxor x #x00000055) #x0000003c))
(define-fun x () (_ BitVec 32)
  #x00000069)
The good value is 0x69
[Z3 Solver]-------------------------------------
[SPREAD]            7f7e919ef7db: mov edx, 0x1
                    output: edx | input: constant
                    edx is now freed

$ cat serial.txt
eli%

$ ../../../pin -t ./obj-intel64/ConcolicExecution.so -taint-file serial.txt -- ./crackme1
[TAINT]                 bytes tainted from 0x7fff597b37a0 to 0x7fff597b38a0 (via read)
[READ in 7fff597b37a0]  400698: movzx eax, byte ptr [rax]
[Constraint]            #0 = 0x65
                        eax is now tainted
[SPREAD]                40069b: mov edx, eax
                        output: edx | input: eax
                        edx is now tainted
[FOLLOW]                40069b: mov edx, eax
[Constraint]            #1 = #0
[FOLLOW]                4195997: xor edx, 0x55
[Constraint]            #2 = xor(#1, 0x55)
[READ in 4007ec]        4006a7: movzx eax, byte ptr [rax]
                        eax is now freed
[FOLLOW]                4006aa: cmp dl, al
[Equation]              cmp(#2, 0x30)
[Equation]              cmp(xor(x, 0x55), 0x30)
[Z3 Solver]-------------------------------------
(solver
  (= (bvxor x #x00000055) #x00000030))
(define-fun x () (_ BitVec 32)
  #x00000065)
The good value is 0x65
[Z3 Solver]-------------------------------------
[READ in 7fff597b37a1]  400698: movzx eax, byte ptr [rax]
[Constraint]            #3 = 0x6c
                        eax is now tainted
[FOLLOW]                40069b: mov edx, eax
[Constraint]            #4 = #3
[FOLLOW]                4195997: xor edx, 0x55
[Constraint]            #5 = xor(#4, 0x55)
[READ in 4007ed]        4006a7: movzx eax, byte ptr [rax]
                        eax is now freed
[FOLLOW]                4006aa: cmp dl, al
[Equation]              cmp(#5, 0x39)
[Equation]              cmp(xor(x, 0x55), 0x39)
[Z3 Solver]-------------------------------------
(solver
  (= (bvxor x #x00000055) #x00000039))
(define-fun x () (_ BitVec 32)
  #x0000006c)
The good value is 0x6c
[Z3 Solver]-------------------------------------
[READ in 7fff597b37a2]  400698: movzx eax, byte ptr [rax]
[Constraint]            #6 = 0x69
                        eax is now tainted
[FOLLOW]                40069b: mov edx, eax
[Constraint]            #7 = #6
[FOLLOW]                4195997: xor edx, 0x55
[Constraint]            #8 = xor(#7, 0x55)
[READ in 4007ee]        4006a7: movzx eax, byte ptr [rax]
                        eax is now freed
[FOLLOW]                4006aa: cmp dl, al
[Equation]              cmp(#8, 0x3c)
[Equation]              cmp(xor(x, 0x55), 0x3c)
[Z3 Solver]-------------------------------------
(solver
  (= (bvxor x #x00000055) #x0000003c))
(define-fun x () (_ BitVec 32)
  #x00000069)
The good value is 0x69
[Z3 Solver]-------------------------------------
[READ in 7fff597b37a3]  400698: movzx eax, byte ptr [rax]
[Constraint]            #9 = 0x00
                        eax is now tainted
[FOLLOW]                40069b: mov edx, eax
[Constraint]            #10 = #9
[FOLLOW]                4195997: xor edx, 0x55
[Constraint]            #11 = xor(#10, 0x55)
[READ in 4007ef]        4006a7: movzx eax, byte ptr [rax]
                        eax is now freed
[FOLLOW]                4006aa: cmp dl, al
[Equation]              cmp(#11, 0x21)
[Equation]              cmp(xor(x, 0x55), 0x21)
[Z3 Solver]-------------------------------------
(solver
  (= (bvxor x #x00000055) #x00000021))
(define-fun x () (_ BitVec 32)
  #x00000074)
The good value is 0x74
[Z3 Solver]-------------------------------------
[SPREAD]                7f9ac23db7db: mov edx, 0x1
                        output: edx | input: constant
                        edx is now freed

$ cat serial.txt
elit%

$ ../../../pin -t ./obj-intel64/ConcolicExecution.so -taint-file serial.txt -- ./crackme1
[TAINT]                 bytes tainted from 0x7fff313be550 to 0x7fff313be650 (via read)
[READ in 7fff313be550]  400698: movzx eax, byte ptr [rax]
[Constraint]            #0 = 0x65
                        eax is now tainted
[SPREAD]                40069b: mov edx, eax
                        output: edx | input: eax
                        edx is now tainted
[FOLLOW]                40069b: mov edx, eax
[Constraint]            #1 = #0
[FOLLOW]                4195997: xor edx, 0x55
[Constraint]            #2 = xor(#1, 0x55)
[READ in 4007ec]        4006a7: movzx eax, byte ptr [rax]
                        eax is now freed
[FOLLOW]                4006aa: cmp dl, al
[Equation]              cmp(#2, 0x30)
[Equation]              cmp(xor(x, 0x55), 0x30)
[Z3 Solver]-------------------------------------
(solver
  (= (bvxor x #x00000055) #x00000030))
(define-fun x () (_ BitVec 32)
  #x00000065)
The good value is 0x65
[Z3 Solver]-------------------------------------
[READ in 7fff313be551]  400698: movzx eax, byte ptr [rax]
[Constraint]            #3 = 0x6c
                        eax is now tainted
[FOLLOW]                40069b: mov edx, eax
[Constraint]            #4 = #3
[FOLLOW]                4195997: xor edx, 0x55
[Constraint]            #5 = xor(#4, 0x55)
[READ in 4007ed]        4006a7: movzx eax, byte ptr [rax]
                        eax is now freed
[FOLLOW]                4006aa: cmp dl, al
[Equation]              cmp(#5, 0x39)
[Equation]              cmp(xor(x, 0x55), 0x39)
[Z3 Solver]-------------------------------------
(solver
  (= (bvxor x #x00000055) #x00000039))
(define-fun x () (_ BitVec 32)
  #x0000006c)
The good value is 0x6c
[Z3 Solver]-------------------------------------
[READ in 7fff313be552]  400698: movzx eax, byte ptr [rax]
[Constraint]            #6 = 0x69
                        eax is now tainted
[FOLLOW]                40069b: mov edx, eax
[Constraint]            #7 = #6
[FOLLOW]                4195997: xor edx, 0x55
[Constraint]            #8 = xor(#7, 0x55)
[READ in 4007ee]        4006a7: movzx eax, byte ptr [rax]
                        eax is now freed
[FOLLOW]                4006aa: cmp dl, al
[Equation]              cmp(#8, 0x3c)
[Equation]              cmp(xor(x, 0x55), 0x3c)
[Z3 Solver]-------------------------------------
(solver
  (= (bvxor x #x00000055) #x0000003c))
(define-fun x () (_ BitVec 32)
  #x00000069)
The good value is 0x69
[Z3 Solver]-------------------------------------
[READ in 7fff313be553]  400698: movzx eax, byte ptr [rax]
[Constraint]            #9 = 0x74
                        eax is now tainted
[FOLLOW]                40069b: mov edx, eax
[Constraint]            #10 = #9
[FOLLOW]                4195997: xor edx, 0x55
[Constraint]            #11 = xor(#10, 0x55)
[READ in 4007ef]        4006a7: movzx eax, byte ptr [rax]
                        eax is now freed
[FOLLOW]                4006aa: cmp dl, al
[Equation]              cmp(#11, 0x21)
[Equation]              cmp(xor(x, 0x55), 0x21)
[Z3 Solver]-------------------------------------
(solver
  (= (bvxor x #x00000055) #x00000021))
(define-fun x () (_ BitVec 32)
  #x00000074)
The good value is 0x74
[Z3 Solver]-------------------------------------
[READ in 7fff313be554]  400698: movzx eax, byte ptr [rax]
[Constraint]            #12 = 0x00
                        eax is now tainted
[FOLLOW]                40069b: mov edx, eax
[Constraint]            #13 = #12
[FOLLOW]                4195997: xor edx, 0x55
[Constraint]            #14 = xor(#13, 0x55)
[READ in 4007f0]        4006a7: movzx eax, byte ptr [rax]
                        eax is now freed
[FOLLOW]                4006aa: cmp dl, al
[Equation]              cmp(#14, 0x30)
[Equation]              cmp(xor(x, 0x55), 0x30)
[Z3 Solver]-------------------------------------
(solver
  (= (bvxor x #x00000055) #x00000030))
(define-fun x () (_ BitVec 32)
  #x00000065)
The good value is 0x65
[Z3 Solver]-------------------------------------
[SPREAD]                7f0d00e1f7db: mov edx, 0x1
                        output: edx | input: constant
                        edx is now freed

$ cat serial.txt
elite%

$ ./crackme1
Good boy

 

4 - Conclusion

I think that the concolic execution is a great technique and it need to be investigated and improved. I hope that more and more people will look into it. Also, I think it isn't a good idea to do a concolic execution with a DBI (Dynamic Binary Instrumentation) without intermediate language like Pin. Why? Because without IR, you need to implement all instructions set and their different encodings. This is possible but that's really boring and you can forget an operation... To the theorem solver conclusion, I'm not a Z3 expert, I do know it's used internally by Microsoft for a lot of purpose (I guess they got pretty big equations), but I have only used it with toy-equations, so I can't really say.

 

4.1 - My Pin tool

First of all, my Pin tool is not reliable and it works only with the above examples... I only implemented the instruction necessary for my examples (mov, cmp, xor). So, if you want use it, you need to implement all the x86 instructions... This Pin tool is just a PoC but it can give you a base to your project. The sources are here.

 

4.2 - References

 

4.3 - Special thanks

I would like to thanks Axel "0vercl0k" Souchet for his skills in Z3 and proofreading.