Thursday, May 19, 2016

Perspectives on commit history

I've just read three blog posts talking about what policy open source projects should adopt with regard to what the Linux kernel community calls "the perfect patch".
All three are writing with the perspective of extensive experience using git for collaboration on open source projects. They fully understand the alternatives, and I respect their positions.

I'd like to offer a fourth position that's close to Joey's, with a slightly different emphasis: If you want new contributors to your project, never tell them to rebase. Don't ask them to "git rebase -i", don't ask them to "git pull --rebase", none of that.

Similarly, if you're teaching people how to use git for the first time, please do not teach them anything about rebase, except perhaps to say, "There's this thing called rebase. Don't use it until you understand 'git reflog' and have a pretty good sense of git's object model."

My problem with rebase is that, as Joey wrote, it "works well except when it doesn't, when some detail airbrushed out of the only remaining history turns out to be important." Worse, the tools for fixing rebase mistakes require deep understanding of how git works.

If you have a merge commit that went wrong (perhaps there are merge conflict markers still in it, or conflicts were resolved incorrectly) then you can check out the pre-merge version on a branch, re-do the merge, and use "git merge -s ours" to replace the effects of the bad merge. It's a bit tedious, but the procedure is reliable, and someone more experienced can do it for you if necessary. If you have somebody else's rebased commits that had merge conflicts that they resolved incorrectly, there's nothing you can do. If they're your own commits then you can dig around in your reflog and hope you can find the pre-rebase branch head, but that takes much more effort than if you simply merged.

To Matthew's point, squashing branches instead of merging them has many of the same problems, plus the separate problems he called out in his post. So I agree with him that squashing pull requests is a poor choice as well.

A side effect of this advice is that the Linux kernel style of demanding perfect patches is impractical if you want newcomers to contribute, unless you're willing to put in a lot of work on mentoring. (Which is one reason why Outreachy is a wonderful program!)

I sympathize with the kernel community and others who've adopted similar policies. As Joey pointed out, using "git bisect" to track down a bug requires a lot more work if your history contains lots of commits which don't compile or fail for unrelated reasons—but as he also pointed out, adding a "--merges-only" option to bisect would make that much less of an issue, and we shouldn't design our processes around the UX failures of our tools if we can just fix the tools instead.

A more compelling concern is that code review, which is still one of the best tools we have for software quality, is more difficult if you're trying to review a messy series of patches. So evaluate your costs and benefits. Does your project actually do code review? If not, don't worry about clean patches, because you have bigger problems. Even if you do code review, is it really that hard to evaluate the history of how your contributors actually made their changes? The Linux kernel has tens of millions of lines of code, making code review inherently challenging even with perfect patches. Is your project honestly anywhere near that scale?

It may be that whenever Josh Triplett gets around to releasing his current project (which I've seen nothing public on yet), we'll be able to have a different conversation around the ergonomics of managing a patch series. Until then, relaxing OCD requirements around what your commit history looks like in gitk or what your individual diffs contain allows you to accept contributions from more people and get them ramped up on improving your project. For most projects, that's worth more than anything else.

1 comment:

  1. Yeah. I've talked before (but never in an eloquent blog post) about the conflict between "actual" and "idealized" Git histories. As far as I can tell, git should be extended to retain and synchronize the two...

    ReplyDelete