A binary analysis, count me if you can

by @Jonathan Salwan - 2013-05-02

Recently, I analysed a binary using the number of instructions executed to infer the internal binary conception. This will be the subject of this short blog post.

To begin, let's take for example a basic crackme program that needs a password on stdin. Usually, it relies on a simple string-compare function that returns as soon as the function sees a difference between your two strings. Here is what we have in x64 assembler:

Typically we have :

mov   rcx, 16h
mov   rsi, rdx
mov   rdi, rax
repe  cmpsb

Or we can have also :

cmp   ax, dx
jz    .1b

In those previous examples, when the first difference occurs, the loop breaks. It means as long as the comparison is correct, the number of instruction executed increases. Therefore, it is possible to guess the good password via the number of instruction executed.

This time let's take a "real" example with my VMNDH (NDH2k13 CTF - Crackme 500). This crackme implements a 16 bits virtual machine and some anti-debugging technics. The purpose of the challenge was to find the password giving you the "good-boy" message.

vm_ndh

To make the debugging and the analysis harder the ELF header has been corrupted. If you are interested in knowing how you can read my old post "Linux process execution and the useless ELF header fields"

# nm ./crackme
nm: ./crackme: File format not recognized

# objdump -d ./crackme
objdump: ./crackme: File format not recognized

# gdb ./crackme
GNU gdb (Gentoo 7.5.1 p2) 7.5.1
"./crackme": not in executable format: File format not recognized
gdb-peda$ r
Starting program:  
No executable file specified.
gdb-peda$ quit

# ./crackme
Password: test
Bad password

As I said, the virtual machine was written in C, and of course the interesting code was written our custom 16 bits language. This language is highly inspirated from two well known architectures: x86 and ARM. The source was

; Crackme 500 - NDH 2k13

#include inc/stdlib.inc

; strlen 
.label  strlen
xor     r1, r1      
xor     r2, r2

; printf 
.label  printf
mov     r7, r0   
call    :strlen
mov     r2, r7  
mov     r3, r0 
movb    r1, #1 
movb    r0, #4  
syscall
ret

; bad password 
.label  badpassword
movl    r0, :badpwd
call    :printf
ret

; good password 
.label  goodpassword
movl    r0, :goodpwd
call    :printf
ret

; checkserial 
.label  checkserial
mov     r7, r0     
movl    r6, :key  
call    :strlen
cmpb    r0, #9    
jz      :sok0
call    :badpassword
end

.label  sok0
mov     r0, [r7]  
mov     r1, [r6]  
xor     r0, r1
cmpb    r0, #x53
jz      :sok1
call    :badpassword
end

.label  sok1
inc     r7
inc     r6
mov     r0, [r7] 
mov     r1, [r6]  
xor     r0, r1
cmpb    r0, #x5b
jz      :sok2
call    :badpassword
end

.label  sok2
inc     r7
inc     r6
mov     r0, [r7]    
mov     r1, [r6]   
xor     r0, r1
cmpb    r0, #x4b
jz      :sok3
call    :badpassword
end

.label  sok3
inc     r7
inc     r6
mov     r0, [r7]  
mov     r1, [r6] 
xor     r0, r1
cmpb    r0, #x29
jz      :sok4
call    :badpassword
end

.label  sok4
inc     r7
inc     r6
mov     r0, [r7]   
mov     r1, [r6]  
xor     r0, r1
cmpb    r0, #x52
jz      :sok5
call    :badpassword
end

.label  sok5
inc     r7
inc     r6
mov     r0, [r7]   
mov     r1, [r6]  
xor     r0, r1
cmpb    r0, #x76
jz      :sok6
call    :badpassword
end

.label  sok6
inc     r7
inc     r6
mov     r0, [r7]   
mov     r1, [r6]  
xor     r0, r1
cmpb    r0, #x5a
jz      :sok7
call    :badpassword
end

.label  sok7
inc     r7
inc     r6
mov     r0, [r7]   
mov     r1, [r6]  
xor     r0, r1
cmpb    r0, #x49
jz      :endok
call    :badpassword
end

.label  endok
xor     r0, r0
ret


; main 
.label  main
movl    r0, :str1
call    :printf
subb    sp, #32     
mov     r2, sp      
movb    r1, #0      
movb    r3, #31     
movb    r0, #3      
syscall
mov     r0, r2      
call    :checkserial
call    :goodpassword
end


.label key
.db 0x12, 0x21, 0x02, 0x19, 0x25, 0x34, 0x29, 0x11

.label str1
.db "Password: ",0x00

.label goodpwd
.db "Good password",0x0a,0x00

.label badpwd
.db "Bad password",0x0a,0x00

The crackme waits for a password on stdin and compares all characters with constante values. This source was compiled to the NDH architecture via our personal compiler. The goal of the challenge was to reverse the virtual machine and to understand how it works to find the secret password. You can find a great write-up written by @w4kfu on his blog. Now we will see how it was possible to solve this challenge in less than 10 minutes with that instruction counting black-magic!

To count the number of instruction executed, I used Pin developped by Intel. Pin is a dynamic binary instrumentation framework for the IA-32 and x86-64 instruction-set architectures. In the Pin samples, we can find a 'pintool' counting the instructions (./source/tools/ManualExamples/inscount0.cpp).

// The running count of instructions is kept here
// make it static to help the compiler optimize docount
static UINT64 icount = 0;

// This function is called before every instruction is executed
VOID docount() { icount++; }

// Pin calls this function every time a new instruction is encountered
VOID Instruction(INS ins, VOID *v)
{
    // Insert a call to docount before every instruction, no arguments are passed
    INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)docount, IARG_END);
}

To know the length of the password, we run Pin with the inscount0 pintool. See what happens:

# ../pin -t ./inscount0.so -- ./crackme <<< 'a' >> /dev/null ; cat inscount.out
Count 160524
# ../pin -t ./inscount0.so -- ./crackme <<< 'aa' >> /dev/null ; cat inscount.out
Count 163320
# ../pin -t ./inscount0.so -- ./crackme <<< 'aaa' >> /dev/null ; cat inscount.out
Count 166116
# ../pin -t ./inscount0.so -- ./crackme <<< 'aaaa' >> /dev/null ; cat inscount.out
Count 168912
# ../pin -t ./inscount0.so -- ./crackme <<< 'aaaaa' >> /dev/null ; cat inscount.out
Count 171708
# ../pin -t ./inscount0.so -- ./crackme <<< 'aaaaaa' >> /dev/null ; cat inscount.out
Count 174504
# ../pin -t ./inscount0.so -- ./crackme <<< 'aaaaaaa' >> /dev/null ; cat inscount.out
Count 177300
# ../pin -t ./inscount0.so -- ./crackme <<< 'aaaaaaaa' >> /dev/null ; cat inscount.out
Count 183021

At each step, we can see the difference between the instruction counters is 2796 ; excepted with a 8 characters long password.

>>> 163320 - 160524
2796
>>> 166116 - 163320
2796
>>> 168912 - 166116
2796
>>> 171708 - 168912
2796
>>> 174504 - 171708
2796
>>> 177300 - 174504
2796
>>> 183021 - 177300
5721

We can say that the length of the password is 8 characters. Now let's see what happens if we find the first character of the password:

# ../pin -t ./inscount0.so -- ./crackme <<< 'a_______' >> /dev/null ; cat inscount.out
Count 183021
# ../pin -t ./inscount0.so -- ./crackme <<< 'b_______' >> /dev/null ; cat inscount.out
Count 183021
# ../pin -t ./inscount0.so -- ./crackme <<< 'c_______' >> /dev/null ; cat inscount.out
Count 183021
[...]
# ../pin -t ./inscount0.so -- ./crackme <<< '>_______' >> /dev/null ; cat inscount.out
Count 183020
# ../pin -t ./inscount0.so -- ./crackme <<< '@_______' >> /dev/null ; cat inscount.out
Count 183020
# ../pin -t ./inscount0.so -- ./crackme <<< 'A_______' >> /dev/null ; cat inscount.out
Count 187059
# ../pin -t ./inscount0.so -- ./crackme <<< 'B_______' >> /dev/null ; cat inscount.out
Count 183020
# ../pin -t ./inscount0.so -- ./crackme <<< 'C_______' >> /dev/null ; cat inscount.out
Count 183020

We can see for each step, the number of instructions executed is ~183021, excepted when the good characters is found. See the A character, at this step, we can see 187059 instructions executed, this is 4038 instructions more than another step. So, We can imagine that the first good character is A. Now let's go with the second character.

[...]
# ../pin -t ./inscount0.so -- ./crackme <<< 'Ay______' >> /dev/null ; cat inscount.out
Count 187058
# ../pin -t ./inscount0.so -- ./crackme <<< 'Az______' >> /dev/null ; cat inscount.out
Count 191097
# ../pin -t ./inscount0.so -- ./crackme <<< 'A0______' >> /dev/null ; cat inscount.out
Count 187058
[...]

It's the same thing, when the second character is z, we have 4039 instructions executed more than another step. Ok, now we can say that we match the good character, we have ~4030 instructions executed more than the wrong character.

Now, it's easy to find the password - we do simple bruteforce. Here the number of combinations is (26*2+10) * 8 = 496 instead of (26*2+10) ** 8 = 218 340 105 584 896.

import sys
import commands

if __name__ == "__main__":
    pwd  = "________"
    base = 0x2e
    off  = 0x00
    sav  = 0x00
    while pwd.find("Good Password") == -1:
        pwd = pwd[:off] + chr(base) + pwd[off+1:];
        cmd = "./pin -t ./inscount0.so -- ./crackme <<< %s > /dev/null; cat inscount.out" %(pwd)
        res = int(commands.getstatusoutput(cmd)[1].split("Count")[1])
        print "insert('%s') = %d ins" %(pwd, res)
        if sav == 0x00:
            sav = res
        if res - sav > 200:
            off += 1
            if off >= len(pwd):
                break
            base = 0x2d
            sav = 0
        base += 1
        sav = res
    print "The password is %s" %(pwd)
    sys.exit(0)

Let's run this script and wait some minutes.

# ./ins_counts_bf.py
insert('._______') = 182899 ins
insert('/_______') = 182899 ins
insert('0_______') = 182899 ins
insert('1_______') = 182899 ins
insert('2_______') = 182899 ins
insert('3_______') = 182899 ins
insert('?_______') = 182899 ins
[...]
insert('@_______') = 182899 ins
insert('A_______') = 186938 ins
insert('A.______') = 186937 ins
insert('A/______') = 186937 ins
insert('Az._____') = 190975 ins
[...]
insert('Az/_____') = 190975 ins
insert('Az0_____') = 190975 ins
insert('Az1_____') = 190975 ins
insert('AzH_____') = 190975 ins
[...]
insert('AzI_____') = 195014 ins
insert('AzI.____') = 195014 ins
insert('AzI/____') = 195014 ins
insert('AzI0____') = 199052 ins
insert('AzI0.___') = 199051 ins
insert('AzI0/___') = 199051 ins
insert('AzI0w___') = 203089 ins
[...]
insert('AzI0w.__') = 203089 ins
insert('AzI0w/__') = 203089 ins
insert('AzI0w0__') = 203089 ins
insert('AzI0wA__') = 203089 ins
[...]
insert('AzI0wB__') = 207128 ins
insert('AzI0wB._') = 207127 ins
insert('AzI0wB/_') = 207127 ins
insert('AzI0wBs_') = 211166 ins
[...]
insert('AzI0wBs.') = 211165 ins
insert('AzI0wBs/') = 211165 ins
insert('AzI0wBs0') = 211165 ins
insert('AzI0wBsW') = 211165 ins
[...]
insert('AzI0wBsX') = 215072 ins
The password is AzI0wBsX

# ./crackme
Password: AzI0wBsX
Good password

In conclusion, we saw It was possible to solve this challenge with a black-box approach just by analyzing the execution flow. The password is found with only 336 tests (executed in less than 5 minutes). The full dump is here. But you may ask you why 5 minutes ? Because Pin instruments all the instructions of the program, that's why the bruteforce takes some time. Of course this technic doesn't always work for example, if the input_user is hashed or if the string compare function does not break at the first wrong character.