How XCB's design affects its performance
I fairly often get questions about XCB’s performance as compared with Xlib’s, and I expect more as it starts to see more widespread distribution. (The XCB 0.9 “preview release” is out now thanks mostly to the hard work of Josh Triplett: spread the word!)
In architecting XCB we’ve chosen to keep the public interfaces very small and well-defined, with minimal critical sections and no user-visible locks. In the current implementation, this design choice makes XCBNoOperation roughly four times slower than XNoOperation. (I can improve those numbers with a bit of work, largely involving coercing GCC to inline my memcpy calls; nice gains for this particular benchmark will come from the enhancement in bug 6686, too.) Note that NoOperation, a completely useless request, suffers the most; everything else is less than a factor of four away, and in fact most requests are as fast on XCB as on Xlib, or faster.
Maybe you’re still thinking to yourself, “Wow, up to a factor of four: XCB’s performance is awful.” Let’s explore why that thought is unreasonable.
- For the vast majority of requests, the time spent in the server processing the request dwarfs the time spent in the client delivering it. If you want better performance, hacking on XCB’s implementation is not going to give it to you. Instead, reduce the number of requests, do useful work while waiting on round trips rather than blocking on each reply, and make the server process requests faster.
- Because XCB’s API successfully encapsulates everything that callers don’t need to know about the implementation, we can make dramatic changes in the implementation without breaking API or ABI compatibility with any application. For a particularly cool performance-related example, XCB could switch from mutexes to lock-free or wait-free algorithms tomorrow and no applications would even need to be recompiled. Such a substantial change would just show up as a magic performance boost for single-threaded apps and apps with low levels of lock contention, for free.
- XCB has a clean API. Sure, some things are messier than anyone would like because C sucks. But there are only 14 non-deprecated functions in xcb.h, and another 5 in xcbext.h, and that’s the entire public API. Clean, simple, and nicely orthogonal. We’ve delayed a wide release of XCB for a few years now while refining the API until it has become a carefully specified interface that addresses the needs of applications without unduly constraining the implementation. I’m confident this API will last a long time. That is the huge win we’ve bought with this tiny performance hit.
- The performance will get better, but the API doesn’t get to change once we release XCB 1.0. I know how to shave something like 25% off the time spent issuing requests like NoOperation. I know which instructions are the most costly in the request path. We’ll get the time trimmed down eventually.
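To make the encapsulation point concrete, here is a minimal sketch of how an opaque handle lets a library swap a mutex for an atomic operation without callers noticing. This is not XCB's actual code: `conn_t`, `conn_open`, `conn_next_sequence`, `conn_close`, and the `CONN_LOCK_FREE` switch are all hypothetical names invented for illustration. The only thing it is meant to show is that the three declarations at the top stay identical under either locking strategy.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical public interface: callers only ever hold an opaque pointer,
 * so they cannot depend on the struct layout or on any lock. */
typedef struct conn conn_t;
conn_t  *conn_open(void);
uint32_t conn_next_sequence(conn_t *c);
void     conn_close(conn_t *c);

/* Private implementation. Everything below can change without touching the
 * declarations above. Define CONN_LOCK_FREE to switch strategies. */
struct conn {
#ifdef CONN_LOCK_FREE
    _Atomic uint32_t next_seq;   /* lock-free variant: one atomic counter */
#else
    pthread_mutex_t lock;        /* mutex variant: counter guarded by a lock */
    uint32_t next_seq;
#endif
};

conn_t *conn_open(void)
{
    conn_t *c = malloc(sizeof *c);
    if (c == NULL)
        return NULL;
#ifdef CONN_LOCK_FREE
    atomic_init(&c->next_seq, 1);
#else
    if (pthread_mutex_init(&c->lock, NULL) != 0) {
        free(c);
        return NULL;
    }
    c->next_seq = 1;
#endif
    return c;
}

/* Hands out the next request sequence number, starting at 1. */
uint32_t conn_next_sequence(conn_t *c)
{
#ifdef CONN_LOCK_FREE
    return atomic_fetch_add(&c->next_seq, 1);
#else
    pthread_mutex_lock(&c->lock);
    uint32_t seq = c->next_seq++;
    pthread_mutex_unlock(&c->lock);
    return seq;
#endif
}

void conn_close(conn_t *c)
{
    if (c == NULL)
        return;
#ifndef CONN_LOCK_FREE
    pthread_mutex_destroy(&c->lock);
#endif
    free(c);
}
```

A caller would write `conn_t *c = conn_open();`, call `conn_next_sequence(c)` as often as needed, and finish with `conn_close(c)`. Because the struct is allocated and touched only inside the library, a shared-library build with `CONN_LOCK_FREE` defined could replace the mutex build without relinking or recompiling that caller.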