Friday, July 15, 2016

Testing strategies for Corrode

Last week I posted about Corrode, my C-to-Rust translator.

I'm quite proud of Corrode, to be perfectly frank! And apparently other people think it's cool too, since it was on the front page of Hacker News and /r/rust for a solid day, and it spent the subsequent week on GitHub's list of trending projects. The past week has been very exciting, I gotta say!

A moment of reflection, then. There are a few things an open source project really ought to have:
  1. Responsive maintainers. People have been trying Corrode on their own code and filing issues, and a few people have been busily submitting pull requests. I've been responding to these as quickly as I can.
  2. Solid documentation. I think documentation is pretty well covered, considering some of the feedback:
    • "Absolutely blown away by the detail of the documentation."
    • "The documentation is amazing."
    • "That's an impressive use of literate programming."
    • "WOW... This is just amazing. The most important part of the source is highly educative literate haskell."
    • "Who are you and what did you do with Jamey?" —a former co-worker who had to maintain my code after I left
  3. Verification/validation. Um... well... did you see my documentation?
So it's time for a real plan for verifying that Corrode does what it's supposed to. I've explored several strategies so far, which have interesting trade-offs.

Randomized testing with Csmith and C-Reduce

"Csmith is a tool that can generate random C programs that statically and dynamically conform to the C99 standard. It is useful for stress-testing compilers, static analyzers, and other tools that process C code."

Corrode now has a script in scripts/csmith-test which performs the following steps:
  1. Generate a random C program using Csmith, and verify that the program compiles using GCC, as our reference compiler. If either of these fail, we won't discover any interesting Corrode-related bugs with this program, so discard it.
  2. Convert the program to Rust using Corrode, and compile it with rustc. If either of these fail, this may indicate an interesting bug; report it to the user.
  3. Run the GCC-compiled program. If it times out, or exits with an error, or produces output that doesn't look right for a Csmith-generated program, then odds are low that we'll learn anything from the Rust version and we just discard this case.
  4. Run the Rust-compiled program and compare its output to the GCC-compiled version. If they're different, that probably indicates an interesting bug; report it to the user.
Usually it only takes a few seconds to run through that process; but on the other hand, usually these tests all pass :-) so I've been running this script in a loop which stops once an interesting failure is actually found.

Once the script has produced an interestingly-buggy test case, then it tries to delete as much code from the test as possible while preserving the bug that made it interesting. C-Reduce does the heavy lifting for this part, using a variety of heuristics starting with "what happens if I delete this line? No? How about this one?"

Unlike the Csmith test cycle, C-Reduce can easily run for an hour or two trying to find chunks that are OK to delete. At the moment this is my only complaint about C-Reduce: I have so many possible bugs to investigate that spending an hour or two waiting on each one feels like a lot. That said, it's doing a very good job of narrowing down the portions of code actually responsible for bugs, so I guess I'll keep it. :-)

So far I've found and fixed three bugs in Corrode using Corrode's csmith-test script.
  • I applied C99's "usual arithmetic conversions" to the left-shift and right-shift operators, but I should instead have applied the "integer promotions" to both operands and then taken the left-hand side's type. Fixed in commit 3fd9040.
  • I never realized that the type of a C99 integer literal depends not only on whether it has 'U' or 'L' suffixes, but also on whether it's written in decimal (as opposed to octal or hex), and also on whether its value fits in a given type. Fixed in commit 93671a7.
  • Apparently, I find the "usual arithmetic conversions" hard to get right. I've "fixed" them in three commits in a row and Csmith keeps finding new corners for me.
My big problem with the usual arithmetic conversions is that I'm trying to translate C's long to Rust's isize, but in various corner cases that means you can distinguish between Corrode and a standard-conforming C compiler. I may just decide that's a "feature"...
I have not found any cases yet where I could blame rustc for any wrong output, but you never know... it could happen!

Building real software: musl libc

The musl library is a from-scratch implementation of the C standard library, which "is lightweight, fast, simple, free, and strives to be correct in the sense of standards-conformance and safety."

Adam Perry has been manually porting musl to Rust as an exercise to discover "the advantages and pitfalls of a C-to-Rust rewrite project," so I thought it might be a good candidate for testing Corrode.

Of particular importance for my purposes, musl doesn't use system headers, which means that compared with other real C code-bases, musl less frequently uses parts of C that Corrode doesn't support yet.

To test Corrode on musl, I first made the corrode executable accept all the options that GCC/Clang do (more or less), and just pass anything it doesn't understand to the C preprocessor.

With that, you can run "make CC=corrode -k" and get a "foo.rs" file alongside every "foo.c" file that Corrode was successfully able to translate! The "-k" option tells make to keep going as long as possible even when some commands fail.

At the moment, Corrode translates about 10% of the source files in musl, which I feel is pretty good! I was especially excited when I got it to translate bsearch.c and a bunch of the string.h functions, since those do funny things with pointers...

Now, I haven't yet tested whether any of these source files translate correctly, and in fact I know a few of them don't (if they have static local variables, or declare arrays). At the moment all I can tell you is that the output looks mostly reasonable, in my human judgment.

At some point I'd like to integrate Adam Perry's build and test scripts from his manual rewriting effort, which would let me validate that Corrode+rustc generate code that passes musl's test suite.

Software Analysis Workbench

"The Software Analysis Workbench (SAW) provides the ability to formally verify properties of code written in C, Java, and Cryptol. It leverages automated SAT and SMT solvers to make this process as automated as possible, and provides a scripting language, called SAW Script, to enable verification to scale up to more complex systems."

Csmith-generated programs print some computed values just before exiting, and that's all you can observe about them during testing. If there are errors mid-computation but they cancel out by the end, or otherwise have no effect on the final output, Csmith-based testing won't find those.

SAW, by contrast, is designed to show that two functions compute equivalent outputs. In principle, we could use SAW to prove that the code clang generates is equivalent to the code which Corrode+rustc generate on a per-function basis.

Unfortunately, current git master of SAW can't parse the LLVM bitcode which rustc produces. I hear SAW's maintainers are working on that, so I'm hoping this will be an effective option for Corrode testing soon!

Unit tests and randomized property testing

You were probably wondering why I haven't mentioned these yet, I bet?

Csmith was low-hanging fruit: It has proven to be effective at finding bugs in Corrode with very little work on my part. And musl has quickly shown me which C constructs are used by a fairly substantial real-world code base, so I could focus on implementing those.

Unit tests and QuickCheck-style property testing, on the other hand, require a lot more thought from the developer. I just haven't put in that level of thought yet.

As Corrode matures, I think we'll need both focused regression tests and randomized property tests to test at every commit, which can be supplemented by periodic longer-duration Csmith runs.

Conclusions

Corrode doesn't support enough of C yet to translate most real programs, largely due to system headers declaring things like unions and enums even if your program never uses them. But for the subset of C which Corrode does support, at this point I feel pretty good about the correctness of its translation!

Friday, July 8, 2016

Translating C to Rust (and how you can help)

I've been working for a few months on a C to Rust source code translator called "Corrode", and it's starting to come together into something that looks promising!

Corrode is kind of a weird project in that it's really not very much code—less than a thousand lines at its core!—and yet people expect it to be this huge scary thing. Admittedly, it has required a lot more thought per line of code than many programs do, so maybe you should be a little scared of the scope of a project like this.

But...

I'd like to convince you that you can contribute to Corrode's development, and in this post I'll talk about some reasons why it's easier than it sounds, and some of the ways I'm trying to make first contributions as painless as I can.

(If you just want to read up on how Corrode does its magic, or learn about some bizarre corners of C, check out the detailed design documentation.)

Type-directed development

Corrode is implemented in Haskell. Wait, come back! It's no big deal, I swear: Just about everything I wrote has a simple Rust equivalent, just with different syntax. The only reason I wrote Corrode in Haskell is because I couldn't find a complete enough C parser for Rust, and I was already familiar with the language-c parser for Haskell. So I believe that if you can write Rust, you can pick up enough Haskell to work on Corrode.

In most compiler front-ends, the parser's job is to construct an "abstract syntax tree" (AST), and language-c is no different. What does an AST look like? Take a peek at the Language.C.Syntax.AST module. It defines a series of types, where each type represents some piece of C syntax. For example, a CExternalDeclaration represents one thing that can appear at the top-level of a C source file (that is, outside of any function), while a CExpression represents one operator and its operands (like "a + 2").

So something you might wonder about a project like Corrode is: how can we tell when we're done? The neat thing is that these AST types tell us exactly what cases we have to handle before we'll be able to translate all legal C programs. Once we've written down a translation for every constructor of CExpression, for instance, there is no way for language-c to hand us some magic other kind of expression it hadn't previously told us about.

As a result, the process of creating Corrode has been an almost mechanical cycle of finding one of the unimplemented cases, thinking about what the equivalent Rust should be for that case, and writing code to generate that Rust.

Corrode defines a couple of common error-reporting functions, and one of them is called "unimplemented". One quick way to find some place you could contribute is to search for calls to that function and see if any of the missing cases looks like something you know how to tackle. (As of this writing, there are 11 such calls, although some are catch-all handlers for several unimplemented cases at once.)

Open issues tagged "easy"

Of course, some of the cases are not yet implemented because they're actually super difficult to handle correctly. For example, the switch and goto statements are in that category, for reasons I discussed in an earlier post (How to eliminate "goto" statements). Just searching for calls to unimplemented won't tell you whether you've picked one of the tricky cases.

As an alternative, I've configured Corrode's issue tracker with a label called "easy", which I use on issues that I think are a good introduction to hacking on Corrode. I am deliberately avoiding implementing these cases, even though I know how I would implement them, except when they keep me from finishing something else.

While I was a resident at the Recurse Center a few weeks ago (which was a wonderful experience, by the way, and highly recommended), some of the "Recursers" pair-programmed solutions to some of these easy-tagged issues with me. I was so happy to get to merge pull requests from Jeremie Jost (implement alignof) and Nabil Hassein (implement global variables). Jeremie also got a pull request merged in language-c for an issue we discovered along the way. These were all fairly small patches, and you could do this too!

Literate programming

Literate programming turns the usual emphasis of programming on its head. Instead of writing a bunch of machine-readable source code that has human-readable comments interspersed, in literate programming you write human-readable documentation that has some machine-readable source code scattered in it.

For Corrode, there isn't very much source code, but the assumptions and rationale behind the implementation are intricate. I was going to write a detailed design document to help other people get started hacking on this project. But then I realized that I'd want the documentation to refer to specific parts of the implementation, and when reading the implementation I'd want the relevant parts of the documentation to be easy to find.

The literate programming style is perfect for this! Check out the core source file of Corrode... which is just a Markdown document. My goal was to tell the story of why and how, and as you read that story the what is right there for easy cross-referencing.

GitHub renders the Markdown nicely, including syntax highlighting for the code snippets. (And you can use pandoc to generate PDF or other formats from it too.) I also throw in snippets of C or Rust source, which help when explaining some of the quirks but are stripped out along with the rest of the non-Haskell text during compilation.

If you want to use Markdown in your own Literate Haskell source, there are a couple of tricks to it:
  • Configure GHC to pre-process .lhs source files with markdown-unlit, by adding it to your .cabal file's build-depends and adding "-pgmL markdown-unlit" to your ghc-options. (That package's README documents how to turn your README into a test suite where all your code snippets are actually tested at build time, which is a very cool idea, but wasn't quite what I wanted in this case.)
  • Name your literate Haskell source files with a ".md" extension, so GitHub treats them as Markdown instead of as source code. Then create a symlink from "foo.lhs" to "foo.md" so GHC can find the module too.
Of course, I know how all the code I wrote works, so there are likely sections where my explanation of what's going on doesn't actually clarify anything. I would dearly love to get pull requests that improve the documentation without changing the code at all. If you find some of the text confusing, see if you can find a better way to say it! This is an excellent way to gain a deeper understanding of how the code works, too. Huge thanks to Amin Bandali for sending in one documentation pull request already, explaining how to compile and run Corrode. (Oops, I should have thought of that!)

Conclusion

I would very much like to work with you to make Corrode into a tool that many C programmers can use to try switching to Rust. You can help:
  • by improving the documentation, so more people will feel comfortable contributing to and using Corrode;
  • by writing translations for C features which are not yet implemented, especially features I've tagged as "easy";
  • or just by trying it out on your own C source and reporting issues as they come up.
I look forward to seeing your pull requests! :-)