Corrode update: control flow translation correctness
I’ve been pretty quiet about Corrode for a few months. That’s partly because I was traveling some, partly because I’ve been busy with non-Corrode client work, partly because I’ve been setting up my next Rust-related contracts that I hope to get to announce soon… but mostly, I’ve been quiet about Corrode because I’ve been trying to fix some very difficult bugs since December, and it was demoralizing to fight with my code for so long.
So I’m very pleased to announce that Corrode now translates arbitrary C control flow to Rust, correctly!
One thing this has already enabled: the new rust-lzo crate on crates.io is a Rust implementation of the LZO compression algorithm, translated using Corrode from the Linux kernel’s implementation. Hooray!
I last talked about the challenges around getting control flow right in November (“Go To Statement Considered (Mostly) Harmless”). At that time I mentioned that Corrode couldn’t handle “irreducible” control flow, or some other cases. Later I discovered that sometimes it wasn’t even producing correct answers for the cases that I thought were working.
Corrode’s new control-flow algorithm is based on Emscripten’s “Relooper” algorithm, as described in “Emscripten: An LLVM-to-JavaScript Compiler”. I’d like to thank Brian Anderson of the Mozilla Rust team for reminding me that Emscripten exists and already solved almost exactly the problem I faced! (Unless I’m mixing people up again, in which case I’d like to thank whoever actually told me that, instead.)
The Relooper is fully general: it can handle any control flow that you can express in C, including arbitrary ‘goto’ statements. And I have confidence that my implementation really will handle any wacky control flow you can come up with, partly because at this point I have tested millions of randomly-generated control-flow graphs against relatively simple correctness properties using QuickCheck. I found so many bugs that way! QuickCheck then simplified each bug-triggering control-flow graph down to the smallest graph that would still trigger a bug, making it easier to figure out why it failed.
I now have Travis-CI testing 100,000 random CFGs after every build. This is not a hardship because the implementation is fast: on my 2013-era laptop, testing that many examples only takes around 20 seconds!
Some of the bugs I encountered were in cases which are not covered by the Emscripten paper. I never found the current implementation of the Relooper when I looked at the Emscripten source code, so I don’t know whether those cases are covered by Emscripten’s real implementation and were just left out of the paper, or even if they were covered in the paper and I just missed it. I also discovered that one of the cases in the paper can’t happen because it’s subsumed by the others. I intend to ping the paper’s author about these surprises sooner or later…
This algorithm is tricky to get right and requires very carefully defining invariants and preconditions. Because I needed to reason about the algorithm in a very formal way to have any hope of getting it right, I thought it might be a good use case for annotating the implementation with refinement types as implemented in LiquidHaskell. Unfortunately, I had too much trouble writing types that LiquidHaskell would accept, so I gave up. Still, just making the effort to write down a semi-formal model helped my understanding a lot, so it wasn’t wasted effort! I hope to find a good excuse to play with LiquidHaskell again sometime.
Although I wasn’t able to formalize my reasoning for why Corrode’s implementation of the Relooper is correct, I at least wrote down an English-language sketch of the correctness and termination proofs for the algorithm, as part of the Literate Haskell documentation for Corrode’s CFG module. Please read it, I spent a lot of time writing it ;-)
I’m slowly evolving this algorithm beyond Emscripten’s implementation, because for Corrode I want the output to be as human-readable and maintainable as possible. (Emscripten’s output is only intended for a web browser to parse, and you need only glance at JavaScript that humans write to realize that nobody’s much bothered if we give our web browsers massive headaches.) So far I have a simple post-processing pass to merge any “Simple” block with an immediately-following “Multiple” block, which eliminates the need to introduce an extra variable in the most common cases; and a straightforward check to avoid using labeled “break” and “continue” statements when referring to the most immediately enclosing loop. I have more clean-ups planned, inspired by trying to read the output from Corrode on real C source files.
I’m sure the question you’re asking yourself by now is, “But how is the CVS Rust port coming along?” Significantly better now, thank you! There are now only five source files that Corrode refuses to translate, for various reasons, although quite a few more still won’t compile after translation. And at least one source file now works fine that previously, when I tried converting it to Rust, made CVS fail its test suite. So a lot of the remaining work is just in cleaning up the output, plus a bit of handling more corner cases of C.
Corrode has many limitations left to fix. But at least control flow isn’t one of them any more!