TWiF 12 Aug 2017 - Palindrome Tutorial

2017/08/12

This Week in Falcon

Falcon is a Binary Analysis Framework in Rust.

In, “This Week in Falcon,” we break down an example use of Falcon to solve for eip in the CGC Palindrome binary (CADET_00001) by artificially constraining state.

Falcon Updates

Palindrome Tutorial

The code for this tutorial is available at github.com/endeav0r/falcon-palindrome.

Introduction

I have been using the Palindrome challenge from CGC, graciously made available for native Linux by Trail of Bits’ cb-multios project, to test and develop Falcon. In this tutorial, we work through writing a Rust program to solve for eip in the Palindrome binary with Falcon.

Let’s start at the end. The following is the ouptut of the program we are about to create.

TRACE - SYS_READ 0x0:32 0xAFFFEF4F:32 0x1:32
Past read
End of read_delim
TRACE - Got invalid concretized load address 0xefffeece
TRACE - Got invalid concretized load address 0xefffeecf
TRACE - Got invalid concretized load address 0xb001eed3
TRACE - Got invalid concretized load address 0xefffeecf
TRACE - Got invalid concretized load address 0xefffeece
Hit return address in check
0xAFFFEF2C:32 ((((zext.32(fd_0_92:8) << 0x0:32) | (zext.32(fd_0_93:8) << 0x8:32)) | (zext.32(fd_0_94:8) << 0x10:32)) | (zext.32(fd_0_95:8) << 0x18:32))
\x00\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\x00\x00\x00\x80\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xef\xbe\xad\xde\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5

real    0m50.958s
user    0m33.130s
sys 0m14.440s
root@d7aeb6201641:/falcon-palindrome#

We feed that input into Palindrome and observe the crash with eip set to deadbeef.

dev@ubuntu-16-04-2-32-bit:~$ echo -ne '\x00\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\x00\x00\x00\x80\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xef\xbe\xad\xde\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5\xf5' | ./Palindrome

Welcome to Palindrome Finder

    Please enter a possible palindrome: Segmentation fault (core dumped)
dev@ubuntu-16-04-2-32-bit:~$ dmesg | tail -n 1
[  200.539254] Palindrome[1531]: segfault at deadbeef ip deadbeef sp bf85d420 error 15
dev@ubuntu-16-04-2-32-bit:~$ exit

Setting up an environment for Falcon.

The easiest way to get a working environment for Falcon is to use the provided Dockerfile.

git clone https://github.com/endeav0r/falcon
docker build -t falcon falcon

Clone the repository for this project, and then mount it inside the docker container.

git clone https://github.com/endeav0r/falcon-palindrome
docker run --rm -ti -v $(pwd)/falcon-palindrome:/falcon-palindrome falcon

Run this tutorial with the following commands:

source ~/.profile
cd /falcon-palindrome
cargo run --release

Rust is infamous for its terrible build times. It may take 5 minutes or more to pull all dependencies, build falcon and the example, run, and find a solution.

When developing falcon, I often mount both falcon and an example project side-by-side in a docker. To do this, I run:

docker run --rm -ti -v $(pwd)/falcon:/falcon -v $(pwd)/example:/example falcon

You will also want to set the falcon dependency in your Cargo.toml to

falcon = { path = "/falcon" }

Crates and Falcon modules

I’ve broken down the crates and Falcon modules we use, and why we’re using them. You can probably skip this safely if you’re not interested, and come back if needed.

// We use error-chain to simplify the handling of errors in Rust. If you're not
// familiar with error-chain, see
// http://brson.github.io/2016/11/30/starting-with-error-chain
#[macro_use]
extern crate error_chain;

// Of course, we'll need falcon
extern crate falcon;

// Falcon uses the standard rust log crate. It's your preferene whether you want to
// see Falcon's logging output, but I do, so I always set this up.
extern crate log;

// Rayon is a, "Data-parallelism library for Rust." We use Rayon to simplify parallelizing
// Falcon, allowing us to step multiple symbolic executors forward simultaneously. Falcon
// is thread-safe and meets Rust's thread-safe guarantees.
extern crate rayon;

// The il module is used to create and modify Falcon IL directly.
// We use this module to create constraints and query a SymbolicEngine, as all
// state is represented in Falcon IL.
use falcon::il;

// The engine module contains everything needed for Symbolic Execution.
use falcon::engine;

// The loader module loads and links the Elfs we have provided, allowing
// us to bootstrap a SymbolicMemory, and lift functions. If needed, we could even
// attempt to lift an entire program across all Elfs with relocations pre-resolved.
use falcon::loader::Loader;

// The platform module models a system. We use the LinuxX86 platform. Currently LinuxX86
// only has two system calls modelled, read and write. Fortunately, this is enough
// for Palindrome.
use falcon::platform;

// We also need to use the Platform trait.
use falcon::platform::Platform;

// Requirements for the log crate.
use log::{LogRecord, LogLevel, LogLevelFilter, LogMetadata};

// We need rayon prelude for magical parallel iterators.
use rayon::prelude::*;

// We specify the path of the binary to load as a rust Path object.
use std::path::Path;

// We need to pass certain things wrapped in Arc to get our EngineDriver started.
use std::sync::Arc;


// Create our logger
struct StdoutLogger;

impl log::Log for StdoutLogger {
    fn enabled(&self, metadata: &LogMetadata) -> bool {
        metadata.level() <= LogLevel::Trace
    }

    fn log(&self, record: &LogRecord) {
        if self.enabled(record.metadata()) {
            println!("{} - {}", record.level(), record.args());
        }
    }
}


// Bootstrap error-chain
pub mod error {
    error_chain! {
        types {
            Error, ErrorKind, ResultExt, Result;
        }

        foreign_links {
            Falcon(::falcon::error::Error);
        }
    }
}

pub use error::*;

Loading the Palindrome binary, and instantiating an EngineDriver

To symbolically execute our program, we need a properly configured and instantiated EngineDriver, and to understand how we do that we need to understand some of Falcon’s core components and vocabulary.

engine::EngineDriver

An engine::EngineDriver, “Drives,” a, engine::SymbolicEngine through an il::Program. The most important components of an EngineDriver are a SymbolicEngine, a platform::Platform, and a Program. We initialize an EngineDriver with these three components, as well as a translator::Arch, which can be used for opportunistic translation of newly discovered functions, and an engine::ProgramLocation, which points to the next location in an il::Program to execute.

We step() an EngineDriver forward over a Program, and receive zero to many EngineDriver back in return. When stepping an EngineDriver causes a SymbolicEngine to fork, we receive a new EngineDriver for each new SymbolicEngine.

engine::SymbolicEngine

An engine::SymbolicEngine is the heart of our symbolic executor. If you’re coming from another symbolic execution engine, SymbolicEngine, may be a confusing term because it encompasses multiple components into one. The SymbolicEngine holds our symbolic state, applies il::Operations over that state, holds path constraints, manages forking, and is used to query the solver.

engine::SymbolicEngine also holds SymbolicMemory, which we will need to setup up ourselves and pass to the engine.

When we refer to, “Symbolic state,” as a concept, we are referring to the SymbolicEngine as a component of the Falcon library. There is no explicitly separate symbolic state.

platform::LinuxX86

platform::LinuxX86, “Implements,” the x86 Linux Operating System. We are using the term, “Implements,” very loosely here, because at the time of this writing it only implements the read and write system calls. It is stateful, but maintaining this state is managed for us by EngineDriver.

The important things we are going to get out of LinuxX86 are:

  1. An instantiation an initial LinuxX86 state. This will also update a SymbolicEngine and ensure it is in the proper state, for example by initializing the stack and command line arguments.
  2. Symbolic variables for reads.
  3. A Vec of all symbolic variables it has created.

loader::elf::ElfLinker

The ElfLinker takes an x86 Elf, links it with its dependencies, performs necessary relocations, creates an initial memory state, and does everything we need to get up and running with our target binary.

Putting it all together

Let’s bootstrap our first EngineDriver.

fn run () -> Result<()> {
    // Load our target binary
    let path = Path::new("Palindrome/Palindrome");
    let elf = falcon::loader::elf::ElfLinker::new(&path)?;

    // Create one function in our program at the entry to this Elf. We need to
    // do this so we can create a valid ProgramLocation.
    let mut program = il::Program::new();
    program.add_function(elf.function(elf.program_entry())?);

    // Get a valid ProgramLocation for this Elf.
    let program_location = engine::ProgramLocation::from_address(elf.program_entry(), &program).unwrap();

    // Create a new `SymbolicMemory`
    let mut memory = engine::SymbolicMemory::new(engine::Endian::Little);

    // Copy over, one byte at a time, the memory from our Loader into the `SymbolicMemory`.
    for (address, segment) in elf.memory()?.segments() {
        let bytes = segment.bytes();
        for i in 0..bytes.len() {
            memory.store(*address + i as u64, il::expr_const(bytes[i] as u64, 8))?;
        }
    }

    // Create a new SymbolicEngine, passing it our initial memory model.
    let mut engine = engine::SymbolicEngine::new(memory);

    // Create a new LinuxX86 platform.
    let mut platform = platform::LinuxX86::new();

    // Initialize both the LinuxX86 platform, and the SymbolicEngine, together.
    platform.initialize(&mut engine)?;

    // Get the architecture translator from the loader so our EngineDriver can
    // opportunistically lift new code.
    let translator = elf.translator()?;

    // Create our EngineDriver.
    let driver = engine::EngineDriver::new(
        Arc::new(program),
        program_location,
        engine,
        &translator,
        Arc::new(platform)
    );

Path explosion, and manually constraining state

Falcon does not yet have any means to deal with Path Explosion. We will need to manually constrain state by limiting the number of paths we follow. Let’s take a look at Palindrome so we can understand where we need to limit exploration.

The comments are to help highlight relevant sections of Palindrome.

int main(int cgc_argc, char *cgc_argv[]) {
    int r;

    if (cgc_transmit_all(1, HI, sizeof(HI)-1) != 0) {
        cgc__terminate(0);
    }
    
    while(1){
        if (cgc_transmit_all(1, ASK, sizeof(ASK)-1) != 0) {
            cgc__terminate(0);
        }
        // This is the function we care about, cgc_check. We don't care about
        // anything after this function.
        r = cgc_check();
...
int cgc_check(){
    int len = -1;
    int i;
    int pal = 1;
    char string[64];
    for (i = 0; i < sizeof(string); i++)
        string[i] = '\0';

    // This is the bug in Palindrome, a read of up to 128 bytes into a buffer
    // only 64 bytes large. Palindrome is a class stack buffer overflow. However,
    // the implementation of cgc_receive_delim is actually very important to us.
    if (cgc_receive_delim(0, string, 128, '\n') != 0)
        return -1;

    // This loop causes a variable to be read from and written to the stack
    // several times in a loop, and was the reason for the recent no-split
    // memory changes to Falcon.
    // We don't actually care about the value of `len`, but if our engine
    // cannot handle this well, we will become stuck on this loop.
    for(i = 0; string[i] != '\0'; i++){
        len++;
    }

    int steps = len;

    // We don't care about this loop, other than same issues as in previous
    // loop.
    if(len % 2 == 1){
        steps--;
    }
    // We don't care about this loop.
    for(i = 0; i <= steps/2; i++){
        if(string[i] != string[len-1-i]){
            pal = 0;
        }
    }
    // We want to ensure we do not hit the CGC EASTEREGG, as we want to
    // overwrite the return address from this function.
    if(string[0] == '^'){
        if (cgc_transmit_all(1, EASTEREGG, sizeof(EASTEREGG)-1) != 0) {
            cgc__terminate(0);
        }
    }

    // We want to reach this instruction with a return address on the stack
    // which has been overwritten by our input.
    return pal;
}
int cgc_receive_delim(int fd, char *buf, const cgc_size_t size, char delim) {
    cgc_size_t rx = 0;
    cgc_size_t rx_now = 0;
    int ret;

    if (!buf) 
        return 1;

    if (!size)
        return 2;

    // We need this loop to execute enough times to overwrite the stack pointer
    // in the `check` function's stack frame.
    while (rx < size) {
        ret = cgc_receive(fd, buf + rx, 1, &rx_now);

        // State-forking condition.
        if (rx_now == 0) {
            //should never return until at least something was received
            //so consider this an error too
            return 3;
        }

        // State-forking condition.
        if (ret != 0) {
            return 3;
        }

        // State-forking condition.
        if (buf[rx] == delim) {
           break;
        }
        rx += rx_now;
    }

    // The target return address.
    return 0;
}

Let’s spend a minute to talk about…

Symbolic Execution and Path Explosion

Up until the point we enter the cgc_receive_delim function, we have not yet placed any symbolic variables into our SymbolicEngine, and therefor we are not forking symbolic state. Upon execution of the first read, which will take place in cgc_receive, we begin forking. We need to read many (ideally, we’d like the full 0x80) bytes in this cgc_receive_delim function in order to overflow the return address in check. However, each forked state will be explored in parallel, and will lead to further forked states in check. Soon we will have an unmanageable number of symbolic states to explore. We will exhaust our resources before overflowing the return address in check.

This is the classic path explosion problem which both Veritesting and Driller attempt to mitigate. In a fully-autonomous system, we need mechanisms to mitigate path explosion.

We aren’t behaving fully autonomously. Falcon has no means to deal with path explosion… yet. Instead, we will hard code decision points to constrain state as we see fit. Let’s take a look at cgc_receive_delim in a disassembler.

Binja Receive Delim Screenshot 0

If we take a look at each state when it’s at the address 0x8048800, we can check the value of eax to determine how many times we have looped. However, as we’re symbolically executing, we’re also forking a large number of states we don’t want to continue.

The easiest place to deal with these states will be the return address out of receive_delim.

Binja Receive Delim Screenshot 1

We have enough information to guide our execution through receive_delim until we reach a state where we have read enough bytes of input. Let’s create a function which captures this knowledge and can advance multiple EngineDrivers simultaneously.

fn get_past_read<P>(driver: engine::EngineDriver<P>)
    -> Result<engine::EngineDriver<P>> where P: platform::Platform<P> + Send + Sync {

    // We keep track of the possible forked states in a Vec of EngineDriver.
    // We update this Vec in each iteration of the loop which follows.
    let mut drivers = vec![driver];

    loop {
        // step_drivers is a Vec of EngineDriver we want to continue stepping through.
        // Not adding an EngineDriver to step_drivers is equivalent to killing, or
        // terminating, that state.
        let mut step_drivers = Vec::new();

        for driver in drivers {
            // If the EngineDriver is currently executing an Instruction (as opposed to,
            // for eaxmple, a conditional edge), we can query the EngineDriver for the
            // address of the instruction being executed. This address is added by the
            // binary lifter, and is the address in memory the instruction was lifted
            // from.
            if let Some(address) = driver.address() {

                // We are going to check each state when it reaches address 0x8048800 
                if address == 0x8048800 {

                    // Get the eax scalar from the SymbolicEngine
                    let eax = driver.engine().get_scalar("eax").unwrap();

                    // We are going to ask the Engine to solve for a value of eax, while also
                    // providing a constraint that eax == 0x80. In the event there is a satisfiable
                    // value state where eax == 0x80, an il::Constant will be returned with the
                    // value of eax (which will be 0x80 because of our constraint).
                    //
                    // At this time in the program, eax is actually an il::Constant, and we could
                    // just fetch that constant, but the method below is more generic and will
                    // work for scalars which are also symbolic.
                    if let Some(eax) = driver.engine().eval(
                        eax,
                        Some(vec![il::Expression::cmpeq(il::expr_scalar("eax", 32), il::expr_const(0x80, 32))?])
                    )? {
                        // Double-check that eax is 0x80, and return this EngineDriver.
                        if eax.value() == 0x80 {
                            return Ok(driver.clone());
                        }
                    }
                }
                // We are returning from read_delim without having read enough
                // bytes. We, "Kill," this state by not adding it to the Vec of
                // states we will continue to step.
                else if address == 0x804880d {
                    continue
                }
            }
            // The current EngineDriver has not reached our desired condition at
            // 0x8048800, nor has it attempted to exit the function, so we continue
            // to step it forward.
            step_drivers.push(driver);
        }
        // into_par_iter() is a threaded, parallel iterator provided by rayon that
        // allows for simple multi-threading of operations when thread-safe guarantees
        // are met by code in Rust. Falcon meets these thread-safety guarantees, and
        // we can step through multiple drivers in a parallel fashion.
        //
        // We don't see much benefit using rayon in this particular function, but when
        // we have many states to step we see great performance improvements from
        // parallelization.
        drivers = step_drivers.into_par_iter()
                              .map(|d| d.step().unwrap())
                              .reduce(|| Vec::new(), |mut v, mut d| {
                                v.append(&mut d);
                                v
                            });

        // In the event we run out of drivers, all states have been terminated before
        // reaching our desired condition at 0x8048800. This is an error condition
        // for us.
        if drivers.len() == 0 {
            bail!("Ran out of drivers");
        }
    }
}

And we add the following to our run() function:

let driver = get_past_read(driver)?;

Onward to, and then solving in, check.

We can now step our EngineDriver, which has read enough bytes to overflow the return address, through the check function. To do this, we’re going to provide one more convenience function, step_driver_to_address. This function will take an EngineDriver, a target address, and a number of steps we want to step forward. This function will step that EngineDriver, and its forks, forward, and then return all EngineDrivers which reach the target address.

With everything we’ve talked about, and some working knowledge of Rust, you should be able to determine the logic of this function on your own.

fn step_driver_to_address<P> (
    driver: engine::EngineDriver<P>,
    target_address: u64,
    steps_to_take: usize
) -> Result<Vec<engine::EngineDriver<P>>> where P: platform::Platform<P> + Send + Sync {

    let mut final_drivers = Vec::new();
    let mut drivers = vec![driver];

    for _ in 0..steps_to_take {
        if drivers.len() == 0 {
            break;
        }
        let mut step_drivers = Vec::new();
        for driver in drivers {
            if let Some(address) = driver.address() {
                if address == target_address {
                    final_drivers.push(driver);
                    continue;
                }
            }
            step_drivers.append(&mut driver.step()?);
        }
        drivers = step_drivers;
    }

    final_drivers.append(&mut drivers);

    Ok(final_drivers)
}

If you remember, our current state hasn’t made it all the way out of the receive_delim function, and we really only need one state upon completion of this function. Let’s use our step_to_driver_address function to step to the return address of receive_delim, and then choose just one state to carry forward.

let drivers = step_driver_to_address(driver, 0x804880e, 256)?;

let driver = drivers[0].clone();

We’re almost done, we just need one state to reach the end of the check function. The return address of the check function is at 0x80489ce.

let drivers = step_driver_to_address(driver, 0x80489ce, 256)?;

Output the solution

We’re done! We just need to output the solution.

To do this, we’ll need to:

  1. Get the expression currently in memory at the location of the return address.
  2. Create a constraint that this expression is equal to 0xdeadbeef (or whatever other value you wish).
  3. Solve for our inputs.
// We only need one state.
let driver = drivers[0].clone();

// Get the address of the stack pointer. This is an il::Constant.
let esp = driver.engine().get_scalar_only_concrete("esp")?.unwrap();

// Read a 32-bit dword from symbolic memory at the stack pointer.
let retaddr = driver.engine().memory().load(esp.value(), 32)?.unwrap();

// Print out the stack pointer, and memory pointed to by the stack pointer.
println!("{} {}", esp, retaddr);

// Convenience reference to this EngineDriver's SymbolicEngine
let engine = driver.engine();

// Create a constraint that the expression we found for the return address will be
// equal to 0xdeadbeef.
let constraint = Some(vec![il::Expression::cmpeq(retaddr, il::expr_const(0xdeadbeef, 32))?]);

// Get the symbolic_scalars from our platform. This returns all symbolic scalars created
// from symbolic reads in order. We only read from stdin, so we will receive an in-order
// Vec of scalars for each byte of input read over stdin.
let scalars = driver.platform().symbolic_scalars();

// Solve for a valid value for each symbolic byte of input while enforcing the
// constraint that the return address is equal to 0xdeadbeef.
let values = scalars.iter()
                    .map(|s| engine.eval(&s.clone().into(), constraint.clone()).unwrap().unwrap())
                    .collect::<Vec<il::Constant>>();

// Create a nicely formatted string we can copy and paste elsewhere.
let input = values.iter()
                  .map(|v| format!("\\x{:02x}", v.value()))
                  .collect::<Vec<String>>()
                  .join("");

// Print out our solved-for inputs.
println!("{}", input);

Closing out

The code for this tutorial is available at github.com/endeav0r/falcon-palindrome. If you find any issues with this example, please open an issue.

Falcon is still in an early state of development. Things may change.

I am available for falcon-related questions and help in irc.freenode.net#rust-falcon.