Fil-C

Memory Safe Context Switching

Support for ucontext APIs is new since release 0.680. If you want to play with setcontext, getcontext, makecontext, and swapcontext then you have to build from source.

This document describes how Fil-C supports longjmp, setjmp, setcontext, getcontext, makecontext, and swapcontext in a totally memory-safe way. In particular, no misuse of those APIs in Fil-C can lead to stack corruption or any other violation of Fil-C's capability model.

These APIs are widely used:

The ucontext APIs are less commonly used than longjmp/setjmp and some OSes (like Darwin) have deprecated them. However, they remain well supported in glibc.

Implementing these APIs in a way that preserves memory safety is hard since their misuse can result in restoring a dangling stack. For example, you could either setjmp or getcontext within some function, and then do any of the following things:

Even more friendly APIs like makecontext and swapcontext can be straightforwardly misused:

In Yolo-C, execution on a dangling stack results in the most confusing kinds of crashes, since the debugger won't even be able to print a stack trace! Worse, if the program has subtle bugs in its handling of contexts, then an attacker could exploit those bugs to cause the program to do whatever the attacker likes. In Fil-C, execution on a dangling stack is not possible: all such cases are either panics at the point where you misused longjmp or one of the ucontext APIs, or they are reliably legal execution because of how Fil-C manages stacks.

Fil-C implements setjmp/longjmp and the ucontext APIs quite differently.

Making setjmp/longjmp Memory Safe

There is an impressive amount of depth to the depravity of setjmp. Before going into the details of how Fil-C implements setjmp/longjmp, we need to discuss exactly what makes this function so amazingly evil.

setjmp saves the context as it was at the moment when it was called so that when longjmp is called later, setjmp will return a second time. It is the fact that it returns twice that makes it so vile, and so we need to understand the implications precisely.

An Example

Consider this simple program:

#include <setjmp.h>
#include <stdio.h>

int main(int argc, char** argv)
{
    volatile int x = 42;
    jmp_buf jb;
    if (setjmp(jb)) {
        printf("x = %d\n", x);
        return 0;
    }
    x = 666;
    longjmp(jb, 1);
    printf("Should not get here.\n");
    return 1;
}

This program prints:

x = 666

And then exits. The flow is:

  1. On the first call to setjmp, it returns 0 and saves its caller's context in jb.
  2. Then we set x to 666 and longjmp to jb with the value 1.
  3. setjmp returns 1, so we printf and exit.

Note that we have to mark x as volatile for the program to reliably print 666. Otherwise, the compiler is allowed to optimize the access to x so that the program prints 42 instead. This might happen in the following ways:

Three things to reflect upon:

Here's a more diabolical version of the example that triggers spilling of x to two different spill slots (one for 42 and one for 666) in gcc, clang, and filcc.

#include <setjmp.h>
#include <stdio.h>

int main(int argc, char** argv)
{
    int x = 42;
    asm volatile("" : "+r"(x));
    jmp_buf jb;
    int a = 1, b = 2, c = 3, d = 4, e = 5, f = 6, g = 7, h = 9, i = 10;
    /* Force some spilling */
    asm volatile("" : "+r"(a), "+r"(b), "+r"(d), "+r"(e), "+r"(f), "+r"(g), "+r"(h), "+r"(i));
    if (setjmp(jb)) {
        asm volatile("" : "+r"(a), "+r"(b), "+r"(d), "+r"(e), "+r"(f), "+r"(g), "+r"(h), "+r"(i));
        printf("x = %d\n", x);
        return 0;
    }
    x = 666;
    void (*jump)(jmp_buf, int) = longjmp;
    asm volatile("" : "+r"(x));
    asm volatile("" : "+r"(jump), "+r"(a), "+r"(b), "+r"(d), "+r"(e), "+r"(f), "+r"(g), "+r"(h), "+r"(i));
    jump(jb, 1);
    asm volatile("" : "+r"(jump), "+r"(a), "+r"(b), "+r"(d), "+r"(e), "+r"(f), "+r"(g), "+r"(h), "+r"(i));
    asm volatile("" : "+r"(x));
    printf("Should not get here.\n");
    return 1;
}

This program will print x = 42 even though x is not constant folded or register-allocated.

Note that all of the examples so far work in Fil-C. Even the inline assembly that we're using to obfuscate variable values works in Fil-C, and has the desired effect.

What Is Even Happening

Let's take a look at how simple setjmp is by looking at the musl implementation on x86_64:

__setjmp:
_setjmp:
setjmp:
    mov %rbx,(%rdi)         /* rdi is jmp_buf, move registers onto it */
    mov %rbp,8(%rdi)
    mov %r12,16(%rdi)
    mov %r13,24(%rdi)
    mov %r14,32(%rdi)
    mov %r15,40(%rdi)
    lea 8(%rsp),%rdx        /* this is our rsp WITHOUT current ret addr */
    mov %rdx,48(%rdi)
    mov (%rsp),%rdx         /* save return addr ptr for new rip */
    mov %rdx,56(%rdi)
    xor %eax,%eax           /* always return 0 */
    ret

This is only saving the callee-save registers, plus the stack pointer and instruction pointer as they were at the callsite. It's not saving the stack itself.
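To make the size of that snapshot concrete, here is a hypothetical C struct that mirrors the eight 64-bit slots written by the assembly above. The name saved_regs and the helper check_saved_regs_layout are made up for illustration; musl does not define jmp_buf this way, and the offsets are simply read off the mov instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of what the x86_64 setjmp above stores.
   Illustration only: this is not musl's actual jmp_buf definition. */
struct saved_regs {
    uint64_t rbx;   /*  0(%rdi) */
    uint64_t rbp;   /*  8(%rdi) */
    uint64_t r12;   /* 16(%rdi) */
    uint64_t r13;   /* 24(%rdi) */
    uint64_t r14;   /* 32(%rdi) */
    uint64_t r15;   /* 40(%rdi) */
    uint64_t rsp;   /* 48(%rdi): caller's stack pointer after setjmp returns */
    uint64_t rip;   /* 56(%rdi): return address, i.e. where longjmp resumes */
};

/* Checks that the struct offsets match the offsets used by the assembly. */
void check_saved_regs_layout(void)
{
    assert(offsetof(struct saved_regs, rbx) == 0);
    assert(offsetof(struct saved_regs, rsp) == 48);
    assert(offsetof(struct saved_regs, rip) == 56);
    assert(sizeof(struct saved_regs) == 64);
}
```

Sixty-four bytes of registers is the whole snapshot. Nothing in it records the contents of the stack frame that rsp points into.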

Later, when longjmp is called, the register state is restored with only one difference: %eax (the return value register) will get the argument passed to longjmp.

Hence, the most basic safety issue with setjmp is that if we call it and then return from the function that had called it, the context saved by setjmp is not valid to longjmp to. Jumping to such a context will result in a torn machine state:

longjmp is only safe if it's called at a time when the stack frame used by setjmp could not have possibly been overwritten, since that is the only way to guarantee that the register state restored by longjmp matches the stack frame that the stack pointer points to. The easiest way to guarantee this is to ensure that longjmp is only called from within the function that called setjmp, or from some function called by the function that called setjmp (transitively).
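A minimal sketch of that safe pattern, where longjmp is called from a function that is (transitively) called by the function that called setjmp. The names recover_point, dive, and run_nested_longjmp are invented for this example.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf recover_point;

/* Recurse a few frames down, then unwind all of them with one jump. */
static void dive(int depth)
{
    if (depth == 0)
        longjmp(recover_point, 1);
    dive(depth - 1);
}

/* Returns the value of stage observed after the longjmp. */
int run_nested_longjmp(void)
{
    /* volatile because it is modified between setjmp and longjmp */
    volatile int stage = 0;

    if (setjmp(recover_point) != 0)
        return stage;   /* second return: the frame of this function is still live */

    stage = 1;
    dive(3);            /* never returns normally */
    assert(!"dive() returned");
    return -1;
}
```

The frame of run_nested_longjmp is still on the stack when dive calls longjmp, so the restored stack pointer refers to a frame that has not been overwritten. Calling run_nested_longjmp() returns 1.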

But that's not all!

The compiler has to know that setjmp returns twice to ensure that spill slots are not reused unsoundly. In fact, compilers detect calls to setjmp and treat the functions that call it specially by disabling any optimization that would lead to a reuse of spill slots. This is surfaced a bit to compiler users with the returns_twice attribute.

Let's consider our diabolical example, but with the setjmp call obfuscated:

#include <setjmp.h>
#include <stdio.h>

int main(int argc, char** argv)
{
    int x = 42;
    asm volatile("" : "+r"(x));
    jmp_buf jb;
    int a = 1, b = 2, c = 3, d = 4, e = 5, f = 6, g = 7, h = 9, i = 10;
    asm volatile("" : "+r"(a), "+r"(b), "+r"(d), "+r"(e), "+r"(f), "+r"(g), "+r"(h), "+r"(i));
    int (*setjump)(jmp_buf) = setjmp;
    asm volatile("" : "+r"(setjump));
    if (setjump(jb)) {
        asm volatile("" : "+r"(a), "+r"(b), "+r"(d), "+r"(e), "+r"(f), "+r"(g), "+r"(h), "+r"(i));
        printf("x = %d\n", x);
        return 0;
    }
    x = 666;
    void (*jump)(jmp_buf, int) = longjmp;
    asm volatile("" : "+r"(x));
    asm volatile("" : "+r"(jump), "+r"(a), "+r"(b), "+r"(d), "+r"(e), "+r"(f), "+r"(g), "+r"(h), "+r"(i));
    jump(jb, 1);
    asm volatile("" : "+r"(jump), "+r"(a), "+r"(b), "+r"(d), "+r"(e), "+r"(f), "+r"(g), "+r"(h), "+r"(i));
    asm volatile("" : "+r"(x));
    printf("Should not get here.\n");
    return 1;
}

Now, the results I see are:

The unsafe thing that is happening (and that Fil-C prevents by refusing to compile this program) is that if the compiler compiles a call to setjmp without knowing that it's calling setjmp then the spill slot used by x = 42 might get reused by some other variable in the code after the if (setjmp) { ... }.

Putting It All Together

Fil-C makes longjmp/setjmp memory safe by ensuring that:

The sketchiest part of this is that the Fil-C runtime strongly assumes that if a pointer variable was materialized as an SSA value in LLVM IR at the time that the FilPizlonator runs, then longjmping restores that value to the state it had at the time of the setjmp, so long as the setjmp is flagged as returning twice. I have so far confirmed that this is the case, but it's extremely confusing. If there were a bug in my longjmp/setjmp, this is where it would be, and it would manifest as follows: after the longjmp, the GC's view of the stack frame's roots is as if all of the local pointers were restored to their values before the setjmp, but some pointer's value was not restored and has a new value from after the setjmp call but before the longjmp call. Note that you cannot trigger the bug with something like making a pointer volatile, since that causes the pointer to be a stack allocation, not an SSA value. In that case, my transformation does the right thing (the "pointer" really ends up being an object in the heap, and the SSA value is a pointer to that pointer box).

Assuming my analysis of this hideous abomination is right, these rules are sufficient to allow almost all safe uses of longjmp/setjmp while prohibiting any possible use that corrupts the stack or causes any possible violation of the Fil-C capability model.

Making ucontext Memory Safe

It's almost possible to use setjmp/longjmp to implement fibers in Yolo-C. But two problems arise if we try to do this:

  1. Fibers need a context switch that simultaneously restores some state (the longjmp) while saving the current state (the setjmp). It's extremely confusing to write this in terms of setjmp/longjmp.
  2. It's not obvious how to bootstrap when we start a new fiber. We want to allocate a stack and produce a jmp_buf that we can longjmp to so that we start running the main function of the newly created fiber.

It turns out you can do this with the sigaltstack hack, but as brilliant as this hack is, folks usually prefer to use the much nicer ucontext APIs:

getcontext snapshots the current state into a context. This is like setjmp, though it's rarely used that way; it's mostly used for prepopulating a ucontext_t before calling makecontext.

setcontext is a one-way context switch to a context (it does not save the state before switching). This is mostly just used for exiting a fiber.

makecontext creates a new context that is bootstrapped to call some main function. In a bizarre twist of history, this function's contract requires a prior call to getcontext even though it mostly overwrites all of the state snapshotted by getcontext. Most modern uses of getcontext are just due to this twist.

swapcontext is a context switch that simultaneously saves the current context to one ucontext_t and switches to another ucontext_t.

Here's an example of how to use this API from the Linux man pages (I made some small changes to reduce its size):

#include <ucontext.h>
#include <stdio.h>
#include <stdlib.h>

static ucontext_t uctx_main, uctx_func1, uctx_func2;

static void func1(void)
{
    printf("func1: swapcontext(&uctx_func1, &uctx_func2)\n");
    swapcontext(&uctx_func1, &uctx_func2);
    printf("func1: returning\n");
}

static void func2(void)
{
    printf("func2: swapcontext(&uctx_func2, &uctx_func1)\n");
    swapcontext(&uctx_func2, &uctx_func1);
    printf("func2: returning\n");
}

int main()
{
    char func1_stack[16384];
    char func2_stack[16384];

    getcontext(&uctx_func1);
    uctx_func1.uc_stack.ss_sp = func1_stack;
    uctx_func1.uc_stack.ss_size = sizeof(func1_stack);
    uctx_func1.uc_link = &uctx_main;
    makecontext(&uctx_func1, func1, 0);

    getcontext(&uctx_func2);
    uctx_func2.uc_stack.ss_sp = func2_stack;
    uctx_func2.uc_stack.ss_size = sizeof(func2_stack);
    uctx_func2.uc_link = &uctx_func1;
    makecontext(&uctx_func2, func2, 0);

    printf("main: swapcontext(&uctx_main, &uctx_func2)\n");
    swapcontext(&uctx_main, &uctx_func2);

    printf("main: exiting\n");
    return 0;
}

This program prints:

main: swapcontext(&uctx_main, &uctx_func2)
func2: swapcontext(&uctx_func2, &uctx_func1)
func1: swapcontext(&uctx_func1, &uctx_func2)
func2: returning
func1: returning
main: exiting

Some notes:

For making this API memory-safe, we'll focus on the idiom above where getcontext is only for initializing ucontext before a call to makecontext.

Laws For Safe ucontext

Let's enumerate the laws we will enforce for ucontext. Note that these laws are more restrictive than what is strictly necessary to make ucontext memory-safe, but I wanted to start with the most conservative possible implementation that is useful to real users of the API.

Opaque state. We'll repeat the trick we used to make jmp_buf safe: inside the ucontext_t, we'll have a pointer to an opaque zfiber_context object. Fil-C code cannot access zfiber_context except by calling its API in pizlonated_runtime.h.

ss_sp doesn't matter. The implementation completely ignores the stack you provided in the ss_sp field. Internally, zfiber_context will allocate a stack that you cannot see. It will use your ss_size as the size of that stack (but it will add some padding that's necessary for Fil-C's stack overflow handling to work). The stack is allocated when you call makecontext.

zfiber_context has a restricted state machine. The states are:

This state machine forbids using ucontext for longjmp/setjmp because you cannot switch to an after_getcontext context. You can only switch to a runnable context, and the only way to get one is to either makecontext a new one or to save the current context using swapcontext.
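The switching rule can be sketched as a tiny state machine. This is a hypothetical model, not the zfiber_context implementation: it uses only the three states that this document mentions by name (after_getcontext, runnable, running), the real state machine may have more, and every identifier below is invented. The real runtime panics on an illegal transition, where this sketch returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the states named in the text. */
enum fiber_state {
    FIBER_AFTER_GETCONTEXT,
    FIBER_RUNNABLE,
    FIBER_RUNNING
};

struct fiber {
    enum fiber_state state;
};

/* makecontext turns a freshly snapshotted context into a runnable one. */
bool fiber_makecontext(struct fiber *f)
{
    assert(f != NULL);
    if (f->state != FIBER_AFTER_GETCONTEXT)
        return false;
    f->state = FIBER_RUNNABLE;
    return true;
}

/* setcontext/swapcontext may only switch to a runnable context. */
bool fiber_switch_to(struct fiber *f)
{
    assert(f != NULL);
    if (f->state != FIBER_RUNNABLE)
        return false;
    f->state = FIBER_RUNNING;
    return true;
}

/* swapcontext saves the current execution state into f, making it runnable. */
void fiber_save_current(struct fiber *f)
{
    assert(f != NULL);
    f->state = FIBER_RUNNABLE;
}
```

In this model, a context that has only seen getcontext can never be the target of a switch, which is what rules out the setjmp-style use of getcontext followed by setcontext.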

Thread affinity. The Fil-C ABI threads the filc_thread* through every function call, and the compiler is allowed to assume that no function call can ever change the filc_thread* that we're running on. This means that we cannot allow ucontext to cause a stack that had run on one thread to run on any other thread. Hence, zfiber_context tracks which zthread it was created on and disallows any calls into any zfiber_context API from other threads.
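The affinity check itself is simple to picture. Here is a rough sketch using plain pthreads as a stand-in for zthreads. The struct fiber_owner and both functions are hypothetical, and the real runtime panics where this sketch returns false.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: remembers which thread created the context. */
struct fiber_owner {
    pthread_t owner;
};

/* Record the creating thread at allocation time. */
void fiber_owner_init(struct fiber_owner *fo)
{
    assert(fo != NULL);
    fo->owner = pthread_self();
}

/* Every API entry point would check this before touching the context. */
bool fiber_owner_is_current(const struct fiber_owner *fo)
{
    assert(fo != NULL);
    return pthread_equal(fo->owner, pthread_self()) != 0;
}
```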

GC Integration

When the GC asks the zfiber_context object to mark its outgoing pointers during the mark phase and the context is runnable, the zfiber_context has to do the equivalent of what threads do when a stack scan is requested during a soft handshake.

But what if the following happens during a single GC mark phase:

  1. The GC marks a runnable zfiber_context and puts it on the mark stack.
  2. The GC pops the zfiber_context from the mark stack and marks its outgoing pointers. Let's say that the context is still runnable. So, we scan its stack.
  3. Mutator switches to that zfiber_context using either setcontext or swapcontext. Now, the zfiber_context is running, so its stack is not visible to the GC. This is fine, since the stack is owned by a thread, and the GC uses grey stacks; i.e. it will always rescan the stacks before declaring termination.
  4. Mutator switches away from that zfiber_context using swapcontext, making the zfiber_context runnable again.

Now we have a problem! The GC was expecting that whatever was on the stack doesn't need to be actively tracked by any barriers because we'll just rescan the grey stacks before termination. But now, that stack is no longer owned by any thread; instead it's owned by a runnable zfiber_context. Worse, that zfiber_context is black: we not only set its mark bit but we already popped it off the mark stack and marked its outgoing pointers - so the GC will not visit it again!

The way we solve this is by tracking grey zfiber_contexts. When we swapcontext from a context during marking, if the context is not already grey, then we add it to the current thread's grey_fibers list and set its grey bit. Whenever a thread is asked to rescan its stack, it reruns the stack walk of every grey fiber in its list, clears the grey bits of those fibers, and clears the list.
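Here is a small sketch of that bookkeeping. It is a simplified model, not the Fil-C runtime: the struct layouts, the fixed-size array, the scans counter (standing in for an actual stack walk), and all names other than grey_fibers are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_GREY 16   /* arbitrary bound for this sketch */

struct fiber {
    bool grey;   /* already on some thread's grey list? */
    int scans;   /* stand-in for "stack walk performed" */
};

struct thread_state {
    bool gc_is_marking;
    struct fiber *grey_fibers[MAX_GREY];
    size_t num_grey;
};

/* Called when the thread swapcontexts away from `from`. */
void note_swap_from(struct thread_state *t, struct fiber *from)
{
    if (!t->gc_is_marking || from->grey)
        return;   /* not marking, or already queued for a rescan */
    assert(t->num_grey < MAX_GREY);
    from->grey = true;
    t->grey_fibers[t->num_grey++] = from;
}

/* Called when the thread is asked to rescan its stack. */
void rescan_grey_fibers(struct thread_state *t)
{
    for (size_t i = 0; i < t->num_grey; i++) {
        t->grey_fibers[i]->scans++;        /* rerun the stack walk */
        t->grey_fibers[i]->grey = false;   /* clear the grey bit */
    }
    t->num_grey = 0;                       /* clear the list */
}
```

The grey bit keeps a fiber from being queued twice between rescans, so repeated switches during one mark phase cost one extra stack walk per rescan rather than one per switch.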

There's a fun almost-race at termination that may happen due to the use of soft handshakes. In an on-the-fly GC, we may have the following sequence of events:

  1. The GC runs out of work, so it triggers a soft handshake to scan all stacks.
  2. Thread 1 performs a stack scan, including walking and resetting its grey fibers, and finds no new objects.
  3. Thread 1 swapcontexts to a different context, causing its grey fibers to be nonempty.
  4. Thread 2 performs a stack scan and finds no new objects.
  5. There are no other threads, and since none of the threads found any new objects, the GC declares termination even though thread 1 has a grey fiber.

Currently, we just reset the grey fiber lists after termination. Reason: if thread 1 found no new objects to mark in step 2, thread 2 also found no new live objects, and the GC was out of work in step 1, then there's no way that any other unmarked objects could have been introduced into the context before we swapped from it. This is true because:

There's simply no way that thread 1 could have loaded an unmarked pointer from the heap in this scenario!

That being said, if there was a bug in my ucontext implementation, this is where it would be.

Putting It All Together

We only support ucontext APIs in the glibc build of Fil-C. Hence, you get it in /opt/fil and Pizlix, but not in the pizfix. It's implemented as follows:

getcontext allocates a new zfiber_context with zfiber_context_new and calls zfiber_context_bind_sigset (to cause zfiber_context to replicate its internal sigmask with the user-visible uc_sigmask) and then zfiber_context_getcontext.

setcontext calls zfiber_context_setcontext.

makecontext creates its own trampoline that manages passing arguments to the user-passed main function. It also manages handling fiber exit (switching to uc_link or calling exit). Other than that, it just calls zfiber_context_makecontext.
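From the caller's side, the two jobs the trampoline handles (argument passing and fiber exit through uc_link) look like this with the standard ucontext API. This is a minimal sketch; fiber_main, run_fiber_with_args, and the 64 KiB stack size are arbitrary choices for the example. Under Fil-C the malloc'd buffer would be ignored, per the ss_sp rule above, and only its size would matter.

```c
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

static ucontext_t main_ctx, fiber_ctx;
static int fiber_result;

/* Fiber entry point: receives the int arguments given to makecontext. */
static void fiber_main(int a, int b)
{
    fiber_result = a + b;
    /* Returning from here resumes uc_link, which is main_ctx. */
}

/* Runs fiber_main(a, b) on its own stack and returns its result, or -1 on error. */
int run_fiber_with_args(int a, int b)
{
    size_t stack_size = 64 * 1024;
    void *stack = malloc(stack_size);
    if (stack == NULL)
        return -1;

    fiber_result = 0;
    if (getcontext(&fiber_ctx) != 0) {
        free(stack);
        return -1;
    }
    fiber_ctx.uc_stack.ss_sp = stack;
    fiber_ctx.uc_stack.ss_size = stack_size;
    fiber_ctx.uc_link = &main_ctx;   /* where to go when fiber_main returns */
    makecontext(&fiber_ctx, (void (*)(void))fiber_main, 2, a, b);

    int rc = swapcontext(&main_ctx, &fiber_ctx);
    assert(rc == 0 || fiber_result == 0);   /* on failure the fiber never ran */
    free(stack);
    return rc == 0 ? fiber_result : -1;
}
```

Calling run_fiber_with_args(40, 2) returns 42: the two ints reach fiber_main, and its return lands back in main_ctx right after the swapcontext call.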

swapcontext mostly just does a zfiber_context_swapcontext. If the from context (aka the oucp) was not initialized, then it allocates a zfiber_context for it using zfiber_context_new and calls zfiber_context_bind_sigset.

Note that Fil-C does not allow using longjmp/setjmp as an alternate context switch path for ucontext. Some software mixes ucontext with longjmp/setjmp, though all cases of this that I've found (OpenSSL, I'm looking at you) have flags to disable the mixing, because Fil-C isn't the only security technology that breaks if you do that.

Conclusion

Fil-C supports memory-safe context switches using either the longjmp/setjmp style or the ucontext style. The ucontext style was added after release 0.680, so you'll need to build from source to play with it (and it's not yet thoroughly tested). The longjmp/setjmp implementation is older, and probably quite rugged by now.

As this document shows, it's possible to have memory-safe C even if you make the effort to support the most depraved features!