Solver Interval Narrowing

2018/10/21

After solving a couple challenges for Hitcon CTF 2018 this weekend, I felt it was time to return to Raptor, a higher-order IR and analysis framework over Falcon.

I am currently implementing, or attempting to implement, a strided intervals analysis for Raptor. I will need this to resolve jump tables, and provide a minimally-accurate disassembly for following analyses.

Motivating Example

Let’s start with a motivating example.

I have a the following simple program in C:

#include <stdio.h>

int main() {

    int c = getchar();

    switch (c) {
    case 0: puts("0\n"); break;
    case 1: puts("1\n"); break;
    case 2: puts("2\n"); break;
    case 3: puts("3\n"); break;
    case 4: puts("4\n"); break;
    case 5: puts("5\n"); break;
    default: puts("Unknown number\n"); break;
    }

    return 0;
}

When this function is analyzed and optimized by raptor, we get the following graph:

Graph of program

Comments are the, “Strided Intervals,” for rax. You’ll notice the strides are wrong, this part is still in progress. However, you will notice at address 0x62e we have the following:

02 0x062e temp_0.0:32 = [(0x878:64 + (rax:64 * 0x4:64))] // rax=0(0x0:64,0x5:64)

Great, we’ve successfully bound rax to valid entries for the jump table. We could use this information to resolve the entries for this jump table, and continue to successfully disassemble this instruction.

Oh, but how I did it

To accomplish this feat, I used z3’s optimization functionality to find the minimum and maximum values for variables constrained by edge conditions, and used this information to narrow variables as needed. I have added this functionality against Falcon IL in falcon-z3, and it has been ported to Raptor.

I am sure the logic I am using now is unsound (though I can add soundness checks later). The logic is, generally:

when an edge is encountered, and the edge is conditionally guarded
    create a set of constraints beginning with the condition guarding the edge
    for every location in the use_def chain for this edge
        if the location is an assignment
            add a constraint that the assignment is true
        else
            be_unsound()

    for every variable in "used_variables"
        solve for the lo and hi values, where
            lo is the lowest value possible (unsigned)
            hi is the highest value possible (unsigned)

        if a value exists for this variable, narrow it by lo/hi
        else set the variable equal to an interval of lo/hi

Where “used_variables” are the variables read by the assignments in the use-def chains.

The performance is deplorable

Analyzing this trivial, target binary without the strided intervals analysis which invokes z3 is very fast. I have spent quite some time chasing down performance issues. Let’s take a look:

Alexs-MacBook-Pro:raptor endeavor$ time target/release/raptor-test ~/code/test-programs/jump-table-0/jump-table-0 

real    0m0.017s
user    0m0.012s
sys 0m0.005s
Alexs-MacBook-Pro:raptor endeavor$ time target/release/raptor-test ~/code/test-programs/jump-table-0/jump-table-0 

real    0m0.016s
user    0m0.011s
sys 0m0.004s
Alexs-MacBook-Pro:raptor endeavor$ time target/release/raptor-test ~/code/test-programs/jump-table-0/jump-table-0 

real    0m0.018s
user    0m0.012s
sys 0m0.004s

I can live with an average runtime of 0.017s. As soon as I add the strided intervals analysis in one time (as opposed to each time Raptor’s optimizing/analysis loop runs, which is multiple times), I get the following:

Alexs-MacBook-Pro:raptor endeavor$ time target/release/raptor-test ~/code/test-programs/jump-table-0/jump-table-0 

real    0m2.746s
user    0m2.691s
sys 0m0.021s
Alexs-MacBook-Pro:raptor endeavor$ time target/release/raptor-test ~/code/test-programs/jump-table-0/jump-table-0 

real    0m2.695s
user    0m2.675s
sys 0m0.013s
Alexs-MacBook-Pro:raptor endeavor$ time target/release/raptor-test ~/code/test-programs/jump-table-0/jump-table-0 

real    0m2.785s
user    0m2.763s
sys 0m0.014s

I rebuilt the test in docker for Debian Stretch, and ran the same benchmarks. I got the following times:

root@5900dab6b398:/code/raptor# time target/release/raptor-test jump-table-0 

real    0m1.254s
user    0m1.220s
sys 0m0.000s
root@5900dab6b398:/code/raptor# time target/release/raptor-test jump-table-0 

real    0m1.235s
user    0m1.210s
sys 0m0.000s
root@5900dab6b398:/code/raptor# time target/release/raptor-test jump-table-0 

real    0m1.276s
user    0m1.250s
sys 0m0.000s
root@5900dab6b398:/code/raptor

Better, but far worse that the times without the strided intervals.

If I still run the strided intervals analysis, but comment out the solver logic and do nothing when we hit those conditionally-guarded edges:

Alexs-MacBook-Pro:raptor endeavor$ time target/release/raptor-test ~/code/test-programs/jump-table-0/jump-table-0 

real    0m0.017s
user    0m0.012s
sys 0m0.004s
Alexs-MacBook-Pro:raptor endeavor$ time target/release/raptor-test ~/code/test-programs/jump-table-0/jump-table-0 

real    0m0.018s
user    0m0.012s
sys 0m0.004s
Alexs-MacBook-Pro:raptor endeavor$ time target/release/raptor-test ~/code/test-programs/jump-table-0/jump-table-0 

real    0m0.020s
user    0m0.012s
sys 0m0.004s

That solver is a mother. The performance hit is unacceptable.

Moving forward

Using a solver to find the bounds of variables seemed like a good, quick and easy win. It certainly works. However, the performance trade-offs are unacceptable. I guess it’s back to good-old-math to solve for narrowing values.