The IL Nop

2018/07/31

What is an IL/IR?

IL stands for Intermediate Language, and IR stands for Intermediate Representation. An intermediate representation is how we store a computer program as it transforms from one state to another, for example from source code to a compiled binary. Intermediate Language is the language an IR uses to encode the semantics of the program for analysis. These two terms are often used interchangeably.

Houston, We Have a Problem

I have a problem with Falcon IL. It’s time for another change.

Falcon lifts binary code to an intermediate language called Falcon IL. This is a simple IL, which captures both instruction semantics and control flow. I use Falcon for implementing static analyses and symbolic executors which work over x86 and MIPS binaries.

As I am about to make yet another change to Falcon IL, I decided to write a blog post on the history of Falcon’s IL. I will talk about the decisions I made, what worked well, what didn’t, and what changed.

In The Beginning

Falcon IL is not the first IL I have implemented. My first IL was almost an exact clone of RREIL, which captures individual instruction semantics but not Control-Flow. Queso is another binary analysis framework, now abandoned, with its own IL, this time lifting programs to full Control-Flow Graphs for static symbolic execution. Binary Toolkit, an experimental Dynamic Binary Instrumentation engine, also implements its own IL, called Bins for Binary toolkit INStruction.

I have experienced the, “Joys,” of VEX IR in Angr’s pyvex, and the forever popular LLVM IR. While LLVM IR is geared more towards representing programs from source than from binaries, McSema and s2e have found ways to apply LLVM IR to binaries. I work with BAP IL now through my work at ForAllSecure, though my experience with BAP IL post-dates my initial design decisions with Falcon. As always, I find Binary Ninja’s LLIL and MLIL great for getting work done quick and dirty (I wrote an MLIL tool just a couple of weeks ago).

When it came time to implement Falcon’s IL, I felt I had a solid grasp of what to do, and what I wanted.

ILs by number of instructions

ILs can be plotted on a spectrum of how many types of instructions they implement. RREIL is perhaps the simplest, a three-form IL, meaning all (or almost all) instructions are of the form OP, DST, LHS, RHS. An add instruction in RREIL would be ADD RAX, RAX, 7, for example. There are no expressions in RREIL, only instructions, and the IL is designed to keep the number of instruction types as small as possible.

On the opposite end, we have VEX IR. VEX IR is not designed for program analysis; it is designed to make Valgrind fast. VEX IR has hundreds of instruction types, allowing Valgrind to choose the most optimal instruction for instrumented code and keeping the instrumented execution fast. The appeal of VEX IR is its completeness and its price (GPL), though adapting it for program analysis can be daunting given those hundreds of instruction types.

In the middle we have BAP IL, LLVM IR, and Binary Ninja’s ILs. All of these ILs support expressions. With expressions, we can have a = b * (c + 7);. Without expressions, we must say: temp0 = c + 7; a = b * temp0;. Gross. It is my opinion that, in practice, expressions are just as easy to work with as three-form ILs. Transformations and simplifications can be written over expressions once and applied everywhere.

All of these ILs, from RREIL to Binary Ninja’s ILs, are designed to hold the same information; they just represent it differently. Where one IL may have ADD DST, LHS, RHS, another IL may have DST = (LHS + RHS). The semantic meaning remains the same.

So what qualifies as an instruction in an IL, and which ILs have more instructions? I believe an instruction in an IL tells us how data will be handled and moved, though I know this isn’t a complete definition. Assignment, Load, Store, and Branch are all basic instruction types.

Take assignment, for example. DST = LHS OP RHS where OP ∈ {ADD, SUB, MUL} is equivalent to ADD DST, LHS, RHS, SUB DST, LHS, RHS, and MUL DST, LHS, RHS. It’s all assignment in the end, but one form allows us to encode many different types of operations in one assignment instruction, while the other requires multiple different types of assignment instructions based on the operation being performed. I prefer the former.

With Falcon, I wanted to design a simple, expression-based IL. I originally arrived at five Operation types:

  * Assign { dst: Scalar,       src: Expression }
  * Store  { index: Expression, src: Expression }
  * Load   { dst: Scalar,       src: Expression }
  * Branch { target: Expression }
  * Raise  { expr: Expression }

Expression originally held the following values:

  * Scalar(scalar)
  * Constant(constant)
  * Add(lhs, rhs)
  * Sub(lhs, rhs)
  * Mul(lhs, rhs)
  * Divu(lhs, rhs)
  * Modu(lhs, rhs)
  * Divs(lhs, rhs)
  * Mods(lhs, rhs)
  * And(lhs, rhs)
  * Or(lhs, rhs)
  * Shl(lhs, rhs)
  * Shr(lhs, rhs)
  * Cmpeq(lhs, rhs)
  * Cmpneq(lhs, rhs)
  * Cmplts(lhs, rhs)
  * Cmpltu(lhs, rhs)
  * Trun(bits, rhs)
  * Sext(bits, rhs)
  * Zext(bits, rhs)

  scalar is of type Scalar
  constant is of type Constant
  lhs is of type Expression
  rhs is of type Expression
  bits is of type usize
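
If it helps to see those two lists as code, here is a minimal, self-contained Rust sketch of roughly that shape. The names follow the lists above, but the field layouts are simplified and shouldn't be read as Falcon's exact definitions; the main() encodes the earlier a = b * (c + 7) example as a single Assign over one nested expression.

// A simplified sketch of the original IL types; not Falcon's real definitions.
#[derive(Clone, Debug)]
struct Scalar { name: String, bits: usize }

#[derive(Clone, Debug)]
struct Constant { value: u64, bits: usize }

#[derive(Clone, Debug)]
enum Expression {
    Scalar(Scalar),
    Constant(Constant),
    Add(Box<Expression>, Box<Expression>),
    Mul(Box<Expression>, Box<Expression>),
    // ... the remaining arithmetic, comparison, and extension variants elided
}

#[derive(Clone, Debug)]
enum Operation {
    Assign { dst: Scalar, src: Expression },
    Store  { index: Expression, src: Expression },
    Load   { dst: Scalar, src: Expression },
    Branch { target: Expression },
    Raise  { expr: Expression },
}

fn main() {
    let scalar = |name: &str| Scalar { name: name.to_string(), bits: 32 };

    // a = b * (c + 7), encoded as one Assign over a nested expression
    let src = Expression::Mul(
        Box::new(Expression::Scalar(scalar("b"))),
        Box::new(Expression::Add(
            Box::new(Expression::Scalar(scalar("c"))),
            Box::new(Expression::Constant(Constant { value: 7, bits: 32 })),
        )),
    );
    let op = Operation::Assign { dst: scalar("a"), src };
    println!("{:?}", op);
}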

Each Instruction holds an Operation, as well as some other information, such as the address the operation was decoded from, an optional comment, and a position in a Block. Block and Edge form a ControlFlowGraph, and from there we have Function and then Program.

Here’s a link to Falcon’s original IL for reference.

ILs by readability

Binary Ninja is the first tool I know of which made it a point to have a readable IL. If you haven’t spent a lot of time looking at ILs for reverse-engineering, the experience is typically like reading a computer program written by your friend’s nephew who is having trouble with their Introduction to Java course, and is pretty sure their program to find the greatest number in an array is complete, but they’re having some trouble and here’s 300 lines of code and I guess they don’t teach whitespace anymore and all_the_variablesLookLike_THIS. It can be annoying.

RREIL is not readable. VEX IR is not readable. After experiencing Binary Ninja’s IL, I wanted readable IL. Here is an example of some (current) Falcon IL:

[ Block: 0x0 ]
804849B 00 ecx:32 = (esp:32 + 0x4:32)
804849F 01 temp_0.0:32 = (esp:32 & 0xFFFFFFF0:32)
804849F 02 ZF:1 = (temp_0.0:32 == 0x0:32)
804849F 03 SF:1 = trun.1((temp_0.0:32 >> 0x1F:32))
804849F 04 CF:1 = 0x0:1
804849F 05 OF:1 = 0x0:1
804849F 06 esp:32 = temp_0.0:32
80484A2 07 temp_0.0:32 = [(ecx:32 - 0x4:32)]
80484A2 08 esp:32 = (esp:32 - 0x4:32)
80484A2 09 [esp:32] = temp_0.0:32
80484A5 0A esp:32 = (esp:32 - 0x4:32)
80484A5 0B [esp:32] = ebp:32

Not great, but I’ve seen worse. First, while all of those may look like assignments to someone not familiar with Falcon IL, we actually have Assign, Store, and Load Operation types under the hood. Also, expressions help here, allowing us to encode things like temp_0.0:32 = [(ecx:32 - 0x4:32)] or SF:1 = trun.1((temp_0.0:32 >> 0x1F:32)) in a single operation, which I believe helps readability.

Falcon IL doesn’t go to the lengths of Binary Ninja’s IL to be readable, but some effort goes into not making it unreadable.

Encoding branches

Branching instructions can be broken into several categories, the most prominent of which are direct and indirect. A direct branch is one where the destination is encoded directly into the instruction, such as “Go to address 0x12345,” or, “Add -200 to the instruction pointer.” An indirect branch is one where the destination is not encoded directly into the instruction, such as, “Load the value from [0x12345], and go to the address held in that value,” or, “Go to the value held in the eax register.”

We also have, “Call,” and, “Not-call,” branch instructions. Perhaps better terms for them are, “Inter-procedural branch,” and, “Intra-procedural branch.” In most architectures, this information is right in the mnemonic. We have call for x86 (and syscall/sysenter if you want), and bal, jalr, and the other link instructions for MIPS. Yes, this gets tricky with some architectures, but we’re only talking about architectures Falcon supports here.

Last, branch instructions can be conditional. Conditional branches, such as je in x86 assembly, are branches that are taken only if some condition is met, such as the zero flag being set to 1.

Falcon encodes inter-procedural branches, or, “Call,” branches, as well as indirect branches, as branch instructions. Falcon encodes intra-procedural direct branches, including conditional intra-procedural direct branches, implicitly as optionally guarded edges in the ControlFlowGraph. This means where a jmp label instruction exists in assembly, no instruction will exist in Falcon IL, and an edge will be created to the jump target.

Falcon IL encodes intra-procedural indirect branches, most often seen in jump tables, simply as Branch operations.
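
To make the guarded-edge encoding concrete, here is a small hypothetical sketch, using simplified types that are not Falcon's actual graph API, of what lifting a je at the end of a block amounts to: no Branch operation is emitted, only two outgoing edges whose guards are complementary expressions.

// Hypothetical, simplified graph types to illustrate the idea; not Falcon's API.
#[derive(Debug)]
struct Edge {
    from: usize,            // index of the source block
    to: usize,              // index of the destination block
    guard: Option<String>,  // branch condition, or None for an unconditional edge
}

#[derive(Debug, Default)]
struct ControlFlowGraph {
    edges: Vec<Edge>,
}

fn main() {
    // Lifting a `je label` at the end of block 0: the instruction itself disappears,
    // and its condition moves onto the outgoing edges.
    let mut cfg = ControlFlowGraph::default();
    cfg.edges.push(Edge { from: 0, to: 1, guard: Some("(ZF:1 == 0x1:1)".into()) }); // taken
    cfg.edges.push(Edge { from: 0, to: 2, guard: Some("(ZF:1 == 0x0:1)".into()) }); // fall-through
    println!("{:#?}", cfg);
}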

Things started out splendidly, and then changes were made

At first, everything with Falcon was going splendidly. System call instructions, and other interesting instructions, were implemented as Raise operations. With this, Falcon’s IL was used to symbolically execute the Palindrome problem from the Cyber Grand Challenge, and I wrote some analyses to statically detect common errors in passing arguments to libc functions in MIPS binaries. When I needed to lift floating point instructions, I would simply emit a Raise operation, and if one was encountered I would stop analysis. I wasn’t encountering these in my analyses, so everything went well.

Raise becomes Intrinsic

Eventually, while symbolically executing MIPS instructions, I came across the rdhwr instruction. This instruction has special handling in the MIPS ABI. When the rd operand, or source operand, of this instruction is the stack pointer, the address of Thread Local Storage is moved into the rt operand, or destination operand. This isn’t something my Raise operation could handle. Uh oh.

I ended up replacing the Raise operation with the much more verbose, and battle-ready, Intrinsic operation. The Intrinsic operation has operands of its own, and provides capabilities for analysis even when the semantics of an operation are not fully captured. Perhaps I just need to know what scalars are written to? That’s captured in Intrinsic.
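
To make that concrete, here is a rough sketch of the kind of information an Intrinsic can carry; the field names are my own illustration, not Falcon's exact Intrinsic type. The point is that even without full semantics, an analysis can see which scalars an instruction reads and writes and, for example, havoc the written ones.

// Illustrative only; the field names are assumptions, not Falcon's exact Intrinsic type.
#[derive(Debug)]
struct Scalar { name: String, bits: usize }

#[derive(Debug)]
struct Intrinsic {
    mnemonic: String,              // e.g. "rdhwr"
    instruction_str: String,       // disassembly text, for humans
    scalars_written: Vec<Scalar>,  // a conservative analysis can at least havoc these
    scalars_read: Vec<Scalar>,
}

fn main() {
    let rdhwr = Intrinsic {
        mnemonic: "rdhwr".into(),
        instruction_str: "rdhwr $v1, $29".into(),
        scalars_written: vec![Scalar { name: "$v1".into(), bits: 32 }],
        scalars_read: vec![],
    };
    // Even without full semantics, an analysis knows $v1 was clobbered here.
    println!("{:?}", rdhwr.scalars_written);
}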

I have now completely replaced all occurrences of Raise with Intrinsic, and things are working well. I’m happy with this switch.

If-Then-Else, or Ite Expression

State-merging is a thing. Here’s a paper, and here’s another. Given two values with path constraints, such as A = 1 iff B = 2 and A = 2 iff B != 2, we can combine these values into a single expression, A = if B = 2 then 1 else 2, and merge state (assuming some other things hold, but that’s not in scope for this post). Seems relatively straightforward, unless of course you have no means of representing if-then-else expressions in your IL.

I originally left the Ite expression out of Falcon IL very consciously. Assembly instructions which conditionally assign values were translated to Control-Flow Graphs, where the condition of the assignment was broken out into conditional edges in the graph, so the assignment instruction was only encountered if the conditionally-guarded edge preceding it was true. This was working fine for me, until I needed to do state merging.

After a while, I relented, and now Falcon has an Ite expression. This expression is never lifted, but can be added later on by further analyses, such as state-merging in symbolic execution (or other things!).
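
Here is a minimal sketch, with simplified type names rather than Falcon's exact definitions, of what the merge from the earlier example looks like once an Ite expression exists: the two path-constrained values for A collapse into one expression guarded by the condition on B.

// Simplified expression type for illustration; not Falcon's exact definitions.
#[derive(Clone, Debug)]
enum Expression {
    Scalar(String, usize),   // name, bits
    Constant(u64, usize),    // value, bits
    Cmpeq(Box<Expression>, Box<Expression>),
    Ite(Box<Expression>, Box<Expression>, Box<Expression>), // cond, then, else
}

fn main() {
    use Expression::*;

    // A = 1 iff B = 2, and A = 2 otherwise, merged into one expression for A:
    let b_eq_2 = Cmpeq(
        Box::new(Scalar("B".into(), 32)),
        Box::new(Constant(2, 32)),
    );
    let merged_a = Ite(
        Box::new(b_eq_2),
        Box::new(Constant(1, 32)),
        Box::new(Constant(2, 32)),
    );
    println!("{:?}", merged_a);
}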

MIPS, you delay-slot tricky devil

To understand this problem, you need to understand a bit more about Falcon IL and MIPS.

If you don’t have the notion of delay slots explicitly encoded in your IL (really something I should write another post about), delay slots are the perfect storm.

Let’s start with how these instructions were originally implemented.

Assume we have the following:

1000: bal 0x123456:32
1004: addiu $v0, $zero, 1

Easy. Falcon lifts instructions by block, not by individual instruction. When we encounter a branching instruction with a delay slot, we can simply lift the branching instruction, lift the delay slot, put the delay slot before the branching instruction, and end the block. Easy. Let’s take a look at what that may look like:

1004 00 $v0:32 = (0x0:32 + 0x1:32)
1000 01 b 0x123456:32

Amazing. Victory in Europe. Mission Accomplished. Except, if during an analysis, there is another branch to 0x1000, what happens? Well, we find 0x1000 and start executing, missing the delay slot. Does this cause ld.so to fail, and take many hours, maybe days, I’ve lost count, to debug? Yes. So what is the fix?

Well, we can insert an instruction that does nothing before the 1004 instruction and assign it the address 1000. We still need an address for the branch instruction, because of the way the lifter currently works. Right now, Falcon does something like this:

1000 00 nop = 0x0:1
1004 01 $v0:32 = (0x0:32 + 0x1:32)
1005 02 b 0x123456:32

Oh wow, that’s ugly. It works, but it’s not beautiful, right? Well… sometimes.

Remember, conditional branches will calculate the branching condition before the delay slot is executed. Given an instruction sequence such as:

1000 beq $v0, $zero, label
1004 addiu $v0, $zero, 1

This becomes:

1000 00 nop = 0x0:1
1004 01 $v0:32 = (0x0:32 + 0x1:32)
// outgoing condition on edges is ($v0:32 == 0x0:32)

We’ve destroyed this conditional branch. The solution I found was to replace the nop with an assignment to a variable branching_condition:1. This now looks like:

1000 00 branching_condition:1 = ($v0:32 == 0x0:32)
1004 01 $v0:32 = (0x0:32 + 0x1:32)
// outgoing condition on edges is branching_condition:1

This works, and I believe this method will work for all architectures with a single-instruction delay slot.
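
To summarize the scheme, here is a rough sketch of the lifting order for a conditional branch with a delay slot. The code is hypothetical and not Falcon's actual lifter; the ordering is the point: capture the condition into branching_condition at the branch's own address, then lift the delay-slot instruction, then guard the outgoing edges with branching_condition.

// A rough, hypothetical sketch of the lifting order; not Falcon's actual lifter code.
fn main() {
    // Source:
    //   1000: beq $v0, $zero, label
    //   1004: addiu $v0, $zero, 1
    let mut block: Vec<(u64, String)> = Vec::new();

    // 1. At the branch's own address, capture the condition before the delay slot runs.
    block.push((0x1000, "branching_condition:1 = ($v0:32 == 0x0:32)".into()));

    // 2. Then lift the delay-slot instruction, which is free to clobber $v0.
    block.push((0x1004, "$v0:32 = (0x0:32 + 0x1:32)".into()));

    // 3. End the block. The outgoing edges are guarded by branching_condition:1,
    //    not by a re-evaluation of ($v0:32 == 0x0:32) that step 2 would have broken.
    for (address, operation) in &block {
        println!("{:X} {}", address, operation);
    }
    println!("taken edge guard:        branching_condition:1");
    println!("fall-through edge guard: (branching_condition:1 == 0x0:1)");
}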

The case of the missing branch instruction

We spoke earlier about how Falcon IL omits intra-procedural direct branches altogether, leaving those branches implicit as edges in the Control-Flow Graph. There’s a problem with this: sometimes, an analysis needs to know where that branch instruction was. The most obvious example is when a branch instruction branches to an intra-procedural direct branch. The omitted branch leaves nothing at that target address in the IL, so if there is no placeholder for it, we’re toast.

This problem was fixed in MIPS by the above ugly hack, but still exists in x86. I could continue to litter the IL with useless assignments whose sole purpose is to serve as a placeholder at an address where an instruction should be. However, this is ugly, inelegant, and prone to further issues down the road. For example, a dead-code elimination pass which eliminates instructions may present the same issue later on.

I believe it’s time for a full-fledged Nop operation. Because Nop operations have no semantic significance, I can simply skip over them in analysis. I can even omit them when rendering IL (preserving some of that IL readability). When I perform dead-code elimination, or otherwise transform the IL, I can leave nops alone and insert a nop for each operation removed, ensuring analyses which depend on locating addresses will continue to function correctly.
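
Here is a minimal sketch of what that could look like, with simplified types rather than a commitment to Falcon's final API: Nop carries no semantics, analyses skip over it, and dead-code elimination replaces a removed operation with Nop so the address placeholder survives.

// Simplified sketch; not Falcon's actual Operation or Instruction types.
#[derive(Clone, Debug)]
enum Operation {
    Assign { dst: String, src: String },
    Branch { target: String },
    Nop,
}

#[derive(Clone, Debug)]
struct Instruction {
    address: u64,
    operation: Operation,
}

// Dead-code elimination that preserves addresses: rather than deleting an
// instruction, replace its operation with Nop so later passes can still find
// "the thing that was at address X".
fn eliminate(instruction: &mut Instruction) {
    instruction.operation = Operation::Nop;
}

fn main() {
    let mut instructions = vec![
        Instruction {
            address: 0x1000,
            operation: Operation::Assign { dst: "eax:32".into(), src: "0x0:32".into() },
        },
        Instruction {
            address: 0x1004,
            operation: Operation::Branch { target: "0x2000:32".into() },
        },
    ];

    eliminate(&mut instructions[0]);

    // Analyses skip Nops, and rendering can omit them entirely.
    for instruction in &instructions {
        match &instruction.operation {
            Operation::Nop => continue,
            operation => println!("{:X} {:?}", instruction.address, operation),
        }
    }
}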

Falcon and Falcon IL continue

I don’t believe there’s a single IL which best fits all analyses. A wise man once told me, “You create the IL to fit the analysis, not the other way around.” We start with an IL which provides a good base, and transform it to another representation which best suits the analysis we’re attempting to perform. However, when information, such as the address of an omitted branching instruction, is lost in that first IL, problems arise. Falcon IL remains my favorite IL, and I’m constantly working to keep it complete.